BIML on the porch. Discussing emergent computation and the history of AI. #MLsec
The Atlantic on AI. Nice piece.
#MLsec really needs to ditch the red teaming nonsense.
https://www.theatlantic.com/magazine/archive/2023/09/sam-altman-openai-chatgpt-gpt-4/674764/
In his column he likens training LLMs to what humans do when learning from others. But LLMs, whatever they do, do not integrate and synthesize ideas from their teachers, then express and develop those ideas further in their own words.
Moreover, he completely misses the point that what LLMs have done when acquiring their training sets is wholesale theft of others’ intellectual property. By no stretch of the imagination could it be considered “fair use”.
Farhad Manjoo begins his NYT column today with:
“I’ve got 99 problems with A.I., but intellectual property ain’t one.”
He conflates learning by LLMs with learning by human beings. This is wrong on at least a couple of levels (see subsequent posts).
If you read his column, and are in a position to contact him directly, it might be worth your while to try to educate him.
#LLMs #mlsec #farhadmanjoo #nytimesopinion
Even a cursory read of this rudderless article shows the futility of the DEF CON AI red teaming bullshit. We need to do better as a discipline. #MLsec
NEW BIML Bibliography entry
https://knowingmachines.org/publications/9_ways_to_see_a_dataset
Knowing Machines
This is a rather vacuous treatment of a critically important problem. How do we represent things in ML, and what implications do such representations have? We were hoping for more treatment of: distributedness, bigness, sparseness, and modeling.
New BIML Bibliography entry (under popular press)
https://www.theatlantic.com/ideas/archive/2023/07/godel-escher-bach-geb-ai/674589/
Doug Hofstadter
An excellent view of LLM production as seen by a top cognitive scientist.
NEW BIML Bibliography entry
DATA VALIDATION FOR MACHINE LEARNING
Breck, et al.
This basic paper is about validating input data (as opposed to the validation set used alongside the training set).
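To make the distinction concrete, here is a minimal sketch of input-data validation in the spirit of Breck, et al.: checking incoming records against a schema before they reach a model. The schema, feature names, and checks below are our own illustration, not the paper's actual system.

```python
# Minimal input-data validation sketch (illustrative only; the Breck, et al.
# paper describes a far richer production system). A hand-written schema
# with type, range, and vocabulary checks applied to each incoming record.

SCHEMA = {
    "age": {"type": int, "min": 0, "max": 130},
    "country": {"type": str, "allowed": {"US", "DE", "FR"}},
}

def validate_row(row):
    """Return a list of anomaly strings for one input record."""
    anomalies = []
    for feature, spec in SCHEMA.items():
        if feature not in row:
            anomalies.append(f"missing feature: {feature}")
            continue
        value = row[feature]
        if not isinstance(value, spec["type"]):
            anomalies.append(f"{feature}: wrong type {type(value).__name__}")
            continue
        if "min" in spec and value < spec["min"]:
            anomalies.append(f"{feature}: {value} below min {spec['min']}")
        if "max" in spec and value > spec["max"]:
            anomalies.append(f"{feature}: {value} above max {spec['max']}")
        if "allowed" in spec and value not in spec["allowed"]:
            anomalies.append(f"{feature}: unexpected value {value!r}")
    return anomalies

print(validate_row({"age": 200, "country": "XX"}))
```

The point is that this gate sits in front of the model at serving and training time, catching malformed or drifted data; it has nothing to do with the held-out validation split.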
NEW BIML Bibliography entry
Red Teaming Language Models to Reduce Harms:
Methods, Scaling Behaviors, and Lessons Learned
Anthropic
https://arxiv.org/pdf/2209.07858.pdf
Absolute malarkey, informed by zero understanding of security, pen testing, and what a real red team does.
NEW BIML Bibliography top 5 entry!
THE CURSE OF RECURSION:
TRAINING ON GENERATED DATA MAKES MODELS FORGET
Shumailov, et al.
https://arxiv.org/pdf/2305.17493.pdf
A very easy-to-grasp discourse covering the math of eating your own tail. This is directly relevant to LLMs and the pollution of large datasets. We pointed out this risk in 2020. This is the math.
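The tail-eating effect can be cartooned in a few lines. The following toy sketch is our own illustration, not Shumailov, et al.'s actual math: repeatedly fit a Gaussian to finite samples drawn from the previous generation's fit. Each refit tends to lose the tails, so variance tends to drift downward over generations.

```python
import random
import statistics

# Toy "model collapse" cartoon (our illustration, assumed parameters):
# each generation trains (fits a Gaussian) on data generated by the
# previous generation's model instead of on real data.

def collapse_demo(generations=20, n_samples=100, seed=42):
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" distribution
    variances = [sigma ** 2]
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # MLE estimate, biased low
        variances.append(sigma ** 2)
    return variances

vs = collapse_demo()
print(f"variance gen 0: {vs[0]:.3f}, gen 20: {vs[-1]:.3f}")
```

Any single run is noisy, but in expectation the estimated variance shrinks every generation; with more expressive models and real datasets the analogous effect is the tails of the data distribution disappearing.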
Here. I said it in the press.
About that AI "red teaming"
Can you code using predictive statistical patterns? Nope.
https://www.theregister.com/2023/08/07/chatgpt_stack_overflow_ai/
Repeat after me. AI "red teaming" is bullshit. Do real #MLsec and stop the nonsense.
https://www.washingtonpost.com/technology/2023/08/08/ai-red-team-defcon/?wpisrc=nl_technology202
We know better than this. #MLsec https://www.nytimes.com/2023/08/06/technology/facial-recognition-false-arrest.html
"starting to doubt" my ass. #MLsec
https://fortune.com/2023/08/01/can-ai-chatgpt-hallucinations-be-fixed-experts-doubt-altman-openai/