#GoodhartsLaw holds that "When a measure becomes a target, it ceases to be a good measure." Give an MBA within HCA a metric ("get patients out of bed quicker") and they will find a way to hit that metric ("send patients off to die somewhere else, even if their doctors think they could recover"):
https://en.wikipedia.org/wiki/Goodhart%27s_law
Incentives matter! Any corporate measure immediately becomes a target.
26/
This may not be any kind of profound insight, but I just realized that part of the AI alignment problem is just Goodhart's Law.
https://en.wikipedia.org/wiki/Goodhart%27s_law says that any measure that becomes a target ceases to be a good measure.
It's an incentive mechanism: for example, you are running a doctor's clinic or hospital. You measure your revenue (because I'm presuming you're doing this in the US...) and then you make a target out of it.
Then the staff has an incentive to order unnecessary tests, code a patient's condition as more serious than it is, and so on. The incentive is to hit the target, not necessarily to improve the underlying reality (the health of your patients) that the measure was meant to track.
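A toy sketch of that incentive gap, with all the numbers invented purely for illustration: the billing metric and patient welfare are two different functions of the same decision, and maximizing one does not maximize the other.

```python
# Toy model of a clinic whose metric-turned-target is revenue.
# All numbers are invented for illustration.

def revenue(tests_ordered):
    """The target: each test is billed at a flat rate."""
    return 100 * tests_ordered

def patient_welfare(tests_ordered):
    """The underlying reality: the first two tests are diagnostically
    useful; every extra test adds cost, risk, and discomfort."""
    useful = 50 * min(tests_ordered, 2)
    harm = 20 * max(0, tests_ordered - 2)
    return useful - harm

options = range(0, 11)  # consider ordering 0..10 tests per patient
best_for_metric = max(options, key=revenue)        # 10 tests
best_for_patient = max(options, key=patient_welfare)  # 2 tests

print(best_for_metric, best_for_patient)
```

The two optima diverge: the revenue-maximizing choice actively harms the patient, which is Goodhart's Law in miniature.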
Meanwhile, folks are worried about AI alignment, and some powerful, superintelligent AI system that can act in ways that harm humans.
So we want to measure how well-aligned an AI system is. So you...ask it. Maybe you take its responses and do further training. But the concern is, you could just train the model to *say* that it's aligned, while actually it's not.
(A similar thing happened with the image classifier that was supposed to distinguish dogs from wolves -- and it did so wonderfully! Until we realized that it wasn't distinguishing dogs and wolves at all, but rather whether there was snow in the background of the photo: https://hackernoon.com/dogs-wolves-data-science-and-why-machines-must-learn-like-humans-do-41c43bc7f982 )
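A tiny, entirely hypothetical version of that failure mode, with the invented counts standing in for a real dataset: a "classifier" that only looks at the background scores well in training, where background and label are correlated, and collapses to chance once the correlation is removed.

```python
# Toy spurious-correlation demo (invented data): the background is a
# proxy for the label in training, but the proxy breaks at test time.

def make_data(wolves_snowy, wolves_plain, dogs_snowy, dogs_plain):
    data = []
    data += [("snow", "wolf")] * wolves_snowy
    data += [("grass", "wolf")] * wolves_plain
    data += [("snow", "dog")] * dogs_snowy
    data += [("grass", "dog")] * dogs_plain
    return data

def background_classifier(background):
    """Predicts the animal from the background alone."""
    return "wolf" if background == "snow" else "dog"

def accuracy(data):
    hits = sum(background_classifier(bg) == label for bg, label in data)
    return hits / len(data)

train = make_data(90, 10, 10, 90)  # background correlates with label
test = make_data(50, 50, 50, 50)   # correlation removed

print(accuracy(train))  # 0.9
print(accuracy(test))   # 0.5
```

High accuracy on the measure (training data), no grasp of the thing the measure was supposed to track.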
In other words, the analogy -- in SAT test-like phrasing:
Goodhart's Law:
measure : target
:: AI alignment:
AI actually being aligned with human interests : AI declaring itself to be aligned
There's even a name for this phenomenon: #GoodhartsLaw: "When a measure becomes a target, it ceases to be a good measure." The finance sector is spookily good at decoupling positive societal outcomes from positive investor outcomes. The real answer to medical companies that mutilate women with vaginal meshes, or destroy the planet with CO2, is *criminal sanctions* and regulation, not private lawsuits.
30/
If you think about it, #capitalism is the epitome of Goodhart's Law («when a measure becomes a target, it ceases to be a good measure»)
New year, new goals? Goodhart's law may be of value:
"When a measure becomes a target, it ceases to be a good measure".
There are many means to reach a goal. If you focus on just one of the means, you may eventually act against your primary goal.
A simple example.
Goal: good health
Means: workouts
Unintended consequences: overtraining, injuries, and worse health
Heuristic: Use multiple means (measures) to reach your goals.
#GoodhartsLaw #goals #metric #workout #heuristics #Health
The volume of bot accounts that autopost within seconds after you post any #art on #Instagram, asking you to "DM this to..." some other account for "promotion" (read: scamming for paid bot accounts, with zero actual value proposition), is increasing considerably.
Another sign that the current social media giants are losing more and more control of content moderation and allowing any dumb shit to proliferate to juice user and "engagement" metrics.
100% #GoodhartsLaw https://en.wikipedia.org/wiki/Goodhart%27s_law
The other remarkable thing is the way these AIs have leap-frogged to the top of Bloom's taxonomy. I think, though my mind could be changed, that it is accurate to say they are able to create original work (but not, mind you, understand what they've created).
That seems scary because it gets around the sorts of prompts we might have used for online open book exams in the past.
Here I ask the AI to write a fable about a group of #shoebills who succumb to #GoodhartsLaw.
IMO, remarkable.
Regarding the point about "learning concepts in parallel, not in chunks/grouped together":
It somehow reminds me of this profound blog post about "overoptimization", which @jsbarretto kindly shared a few weeks ago 😘
Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law - by Jascha Sohl-Dickstein
https://sohl-dickstein.github.io/2022/11/06/strong-Goodhart.html
This concept can be transferred to so many domains - it is just mind-boggling to me! 🤯
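The overfitting framing of that post can be sketched in a few lines (assuming numpy; the sine-plus-noise dataset and all constants are invented for illustration). Training loss is the measure-turned-target: pushing it toward zero with an ever-higher polynomial degree decouples it from test loss, the thing we actually care about.

```python
import numpy as np

# Synthetic data: a sine curve plus noise (all values invented).
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(10)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test)  # the noiseless truth

def rmse(degree, x, y):
    """Fit a polynomial of the given degree to the training set,
    then report root-mean-square error on (x, y)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

for degree in (1, 3, 9):
    print(degree, rmse(degree, x_train, y_train), rmse(degree, x_test, y_test))
```

With ten training points, degree 9 interpolates them exactly, so training error is driven to (numerically) zero; the fit chases the noise, and error against the true curve typically grows. Exact test-error values depend on the noise draw.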
3/3