Goodhart's Law: A Basic Introduction
Goodhart's Law applies to far more than just economic measurements. See how metrics and KPIs can unintentionally backfire in this basic introduction.
There is an old economic adage dating back to around 1975 that says “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” This statement was made by Charles Goodhart, chief economic advisor to the Bank of England, in an article discussing monetary policy in the United Kingdom. Goodhart was directly criticizing the policies and practices then used to measure the growth of the British economy. He was cautioning that when a feature of the economy is picked as an indicator of its performance, it inexorably ceases to function as that indicator because people start to game it. Now known as Goodhart’s Law, this concept applies to far more than just economic measurements.
You can see Goodhart’s Law in action when anything is determined as a key performance indicator (KPI) and has a goal tied to it as a performance metric, particularly for IT metrics. These days, what doesn’t have a goal and a specific measurement tied to it?
Tiny Russian Nails and Bad Service
A more relatable paraphrasing of Goodhart’s Law is “when a measure becomes a target, it ceases to be a good measure.” This should hit home to anyone in charge of setting a team's goals and determining what metrics should be monitored to attain those goals. This is illustrated through the parable of a Russian nail factory looking for ways to incent employees to make more nails.
The story goes that one day in Soviet Russia, a nail factory wanted to increase production and decided to give the workers a goal based on the number of nails they produced per day. The workers immediately focused on satisfying their new goal and decided they would produce thousands of tiny nails. To the dismay of their leadership, the workers hit their production goal but the nails were so small they were useless. Their goal then changed from the number of nails produced per day to the number of pounds (actually poods) of nails per day they produced. Again the workers focused on their new goal and produced one giant nail that outweighed the thousands of tiny nails they produced the day before, thus achieving their goal.
Chances are no one reading this has ever been in charge of maximizing the output of a Russian nail factory, but this tale can easily occur in any modern-day workplace. For example, at a customer service call center, it may seem like a good idea to incent employees based on the number of customers they help instead of the amount of time spent doing their job. So their hourly wage is replaced with a compensation plan driven strictly by the number of calls an employee makes or receives. Employees immediately focus on achieving their new goal and maximizing their incentives, soon doubling the number of calls they previously made. On a reporting dashboard, it would appear this new policy is a success, but upon further investigation, the quality of each call has plummeted as employees look to drive more calls instead of solving issues or caring for the customer. This leads to unhappy customers who leave poor reviews, resulting in sales losses and downstream impacts for escalation teams.
The Cobra Effect
Because the employees in these extreme scenarios are given a single new focus, increasing production to earn their paychecks, shortcuts are taken, pleasantries and follow-ups are eliminated, and the customer experience and quality of work are diminished. As Steve Jobs said, “Incentive structures work. So you have to be very careful of what you incent people to do, because various incentive structures create all sorts of consequences that you can’t anticipate.”
What Jobs is describing is also known as the Cobra Effect: unintended consequences that occur when an attempted solution to a problem ends up making the problem worse.
This phrase originates from an occurrence that also demonstrates Goodhart’s Law. When India was under British rule, the government sought to reduce the number of venomous cobras inhabiting the capital city of Delhi. The plan was to offer a bounty for each dead cobra a resident killed and delivered to those in charge. As the government paid out rewards and the dead snakes continued to pile up, the strategy seemed successful. Just as in our other examples, certain people focused on the goal and on maximizing their incentive, not on the overall intent of freeing Delhi of its cobra problem. Enterprising people gamed the program by breeding cobras for slaughter instead of hunting the problematic snakes. When the government found out about the exploitation, the bounty was discontinued and the breeders released their now-worthless snakes into the wild. The unintended consequence, or cobra effect, in this anecdote is that these newly freed snakes significantly increased the cobra population in Delhi.
What Should Be Measured?
Goodhart’s Law is not telling us to stop measuring things. Depending on the circumstance, applying Goodhart’s Law could reveal that even more needs to be measured to avoid creating an environment where promotions and pay scales are attached to measures and goals that can lead to the dreaded cobra effect. In data science specifically, Goodhart’s Law reminds us that proper metrics are needed for optimization. Relying on a single piece of data to determine the effectiveness of a solution can lead to detrimental consequences, yet we often find ourselves focusing on limited data, like mean-squared error for regression or F1 score for classification problems, to determine the effectiveness of a machine learning model.
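The danger of optimizing a single number can be shown with a minimal, self-contained sketch in plain Python. The data and the "model" here are purely illustrative assumptions: a degenerate classifier that always predicts the majority class looks excellent on accuracy alone, while a second metric (F1) exposes that it never identifies the class we actually care about.

```python
# Hypothetical illustration: one metric hides what a second metric reveals.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Imbalanced data: 95 negatives, 5 positives (illustrative numbers)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # "optimized" for accuracy by always predicting 0

print(accuracy(y_true, y_pred))  # 0.95 -- looks like a great model
print(f1_score(y_true, y_pred))  # 0.0  -- the positive class is never found
```

A team judged only on accuracy could ship this model and hit its target while delivering nothing of value, which is exactly the failure mode Goodhart's Law warns about.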
Choosing whatever metric seems useful at first glance, or deploying metrics without thoughtful consideration of the behavior they promote, is an all-too-common strategy for establishing measures of success. Metric design is both science and art and should be given careful consideration and thoughtful testing before the metrics become guideposts of success.
Pressure Testing Metrics
Immediacy: Can the metric be computed in real-time? Does it provide feedback rapidly enough to align incentives?
Simplicity: Is the metric easy to understand? Will participants understand it well enough for it to influence their behavior? Are its implications understood?
Fairness: Is the metric commensurate to actual goals? Does the metric provide disproportionate benefits to some groups? Do behaviors that get influenced by the metric impose costs elsewhere in the system?
Non-Corruptibility: Can the system be used by a party providing incentives to cheat? Does the metric introduce unfair information asymmetries?
Poorly designed metrics will be exploited. Before rolling out goals for a team or project in any industry, spend the time to ensure every goal or metric is coherent and has been socialized with all participants who could be impacted. It is also wise to pre-game metrics, probing them for ways they could be exploited, and to establish long-term checkpoints that can detect the system and behavior changes that happen over time. Putting in this effort won’t fix every problem or keep everyone happy, but it will lead to less flawed metrics and better results overall.
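Pre-gaming a metric can itself be sketched in code. The following is a hypothetical illustration based on the call-center scenario above; the agent names, call counts, and resolution rates are invented for the example. It compares a naive calls-per-day metric against a composite that also weighs resolution rate, and shows which behavior each one rewards.

```python
# Hypothetical pre-gaming sketch: score the same two (invented) agents
# under two candidate metrics before committing to either as a target.

agents = {
    # agent: (calls handled per day, fraction of calls actually resolved)
    "rushes_calls": (60, 0.30),
    "solves_issues": (25, 0.90),
}

def calls_only(calls, resolution_rate):
    """Naive metric: reward raw call volume, ignoring outcomes."""
    return calls

def composite(calls, resolution_rate):
    """Composite metric: reward resolved calls, not just answered ones."""
    return calls * resolution_rate

for name, (calls, rate) in agents.items():
    print(name, calls_only(calls, rate), composite(calls, rate))

# Under calls_only, "rushes_calls" wins (60 vs 25): the metric rewards
# churning through customers. Under composite, the agent who resolves
# issues wins (about 22.5 resolved calls vs about 18).
```

The sketch does not prove the composite metric is safe, only that the naive one is trivially gameable; a real pre-game would probe the composite the same way (for example, agents marking unresolved calls as resolved).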