Norobiik · @Norobiik
284 followers · 4475 posts · Server noc.social

Chandrasekar argues that LLM developers are violating Stack Overflow's terms of service, as users own the content they post on the platform, but it falls under a Creative Commons (CC) license. AI companies sell their models to customers, but they are unable to attribute each community member whose questions and answers were used to train the model, which is in breach of the CC license.

Stack Overflow will charge AI giants for training data | generative AI | WIRED
wired.com/story/stack-overflow

#generativeAI #TrainingData #aigiants #stackoverflow #ai #creativecommons #developers #LLM

Last updated 1 year ago

Jay Robbie · @JayRobbie
40 followers · 93 posts · Server mastodon.art

Just catching up on recent topics after passing my kidney stone. I heard about the AI / ML Drake + The Weeknd song on the Daily Tech Headlines podcast, but Lew Later also covered it in a vlog. We're past the point of no return (hmm... I think that's a Nu Shooz song).

youtube.com/watch?v=sEW_R9sbnN

#pointofnoreturn #violation #copyright #music #artwork #art #aimusic #AIArtwork #aiart #mltrainingdata #aitrainingdata #TrainingData #ml #ai #lewlater #dailytechheadlines #TheWeekend #drake

Last updated 1 year ago

PR🪸🪼🧑‍💻 · @peterrenshaw
325 followers · 1171 posts · Server ioc.exchange

“In AI development, the dominant paradigm is that the more training data, the better. OpenAI’s GPT-2 model had a data set consisting of 40 gigabytes of text. GPT-3, which ChatGPT is based on, was trained on 570 GB of data. OpenAI has not shared how big the data set for its latest model, GPT-4, is.

But that hunger for larger models is now coming back to bite the company. In the past few weeks, several Western data protection authorities have started investigations into how OpenAI collects and processes the data powering ChatGPT. They believe it has scraped people’s personal data, such as names or email addresses, and used it without their consent.”

EU / data protection
<technologyreview.com/2023/04/1>

#ai #TrainingData #openai #dataset #chatgpt #eu #dataprotection

Last updated 1 year ago

Anupam Basu · @abasu
66 followers · 52 posts · Server mas.to

Reading this very interesting paper on training data sizes for machine learning applications like LLMs and computer vision. Speaks to so many things that have been on my mind about the quality, size, curation and modeling of data. But one preliminary thought that jumps out is how this insistence on massive amounts of training data essentially focuses on the anglophone and perhaps western world, or how it perpetuates differences of class and access in other places.
epochai.org/blog/will-we-run-o

#class #western #anglophone #computervision #llm #machinelearning #TrainingData

Last updated 2 years ago

Anupam Basu · @abasu
66 followers · 53 posts · Server mas.to

@jantic I might be wrong, but AFAIR from a discussion a few years ago with my uni's lawyers, intellectual property rights extend to forms of distribution that could recreate the original work. So, making word frequencies of a text available over an API was okay, but not the work itself. I wonder whether, if this issue ever ends up in court, it might play out along similar lines.

Raises so many questions on the level of granularity at which individual creative and legal agency operates.

#TrainingData #ai

Last updated 2 years ago