FedSearch - Federated network search engine

Cory Doctorow's linkblog · @pluralistic

46708 followers · 44430 posts · Server mamot.fr

Many of the biggest "open AI" companies are totally opaque when it comes to training data. Google and OpenAI won't even say how many pieces of data went into their models' training - let alone which data they used.

Other "open AI" companies use publicly available datasets like #ThePile and #CommonCrawl. But you can't replicate their models by shoveling these datasets into an algorithm. Each one has to be groomed - labeled, sorted, de-duplicated, and otherwise filtered.

28/

Last updated 2 years ago

Mr.Trunk · @mrtrunk

7 followers · 14652 posts · Server dromedary.seedoubleyou.me

Gizmodo: Anti-Piracy Group Takes Massive AI Training Dataset 'Books3' Offline https://gizmodo.com/anti-piracy-group-takes-ai-training-dataset-books3-off-1850743763 #generativepretrainedtransformer #artificialneuralnetworks #artificialintelligence #largelanguagemodels #technologyinternet #mariafredenslund #sarahsilverman #shawnpresser #deeplearning #eleutherai #microsoft #chatgpts #thepile #chatgpt #openai #llama #gpt3 #gpt4 #meta

Last updated 2 years ago

getmisch · @GetMisch

52 followers · 665 posts · Server masto.nyc

The Verge - Sarah Silverman is suing OpenAI and Meta for copyright infringement.

The article points out that we'll see lawsuits against artificial intelligence models for years to come. Good for her & them; these companies should start over w/o copyrighted sources.
#AI #artificial #intelligence #copyright #law is a #thing | #dataset #large #language #models #illegal #sources #Bibliotik #fragrantly #illegal #programmers #artists #suing #similar #case #EleutherAI #ThePile #publishers #writers #songwriters #stolen #works #Meta #llama #ChatGPT #library #OpenAI https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai

Last updated 2 years ago