Gaming the Metrics

One of the cornerstones of quality and Lean Six Sigma is data. We insist on it. Don’t tell us what you think the situation is; let the data do the talking. In God we trust, all others bring data. You get the idea.

An unfortunate side effect of this emphasis is the proliferation of useless data. If the useless data weren’t used, collecting it would merely be a waste of time. But if a person’s performance is being measured by this data, you can bet your last euro that the measurements will get a lot of attention and will drive a lot of behavior. And if the system itself doesn’t change, there’s still one way to make the measurements look better: cheat.

I often open my face-to-face training sessions with Dr. Deming’s Red Bead Experiment. It’s a great icebreaker, and it introduces some important statistical ideas. The experiment is actually a game with very simple rules. “Willing Workers” are required to use a paddle with holes in it to sample beads from a container holding a mix of red and white beads. “We don’t want any red beads,” the workers are told. To drive the point home, there are Quality Inspectors who check the samples for the unwanted red beads and record the results, and Supervisors who use the results to “coach” and discipline the hapless Willing Workers. Before the game concludes, there are always participants who, seeing a bunch of red beads on their paddle, quickly dump the sample back before the count can be made. Others deliberately pick out red beads and throw them back. Still others bring partially filled paddles to the Quality Inspectors. There are all manner of ways to try to beat the system. And this is just a fun game, played for no stakes at all. Imagine what people do when real consequences, such as pay and promotions, are on the line.

The most serious games are probably played in totalitarian countries, where factory managers are measured and sometimes executed when the results fall short of what the authorities require. According to the UK History Learning Site, in Stalin’s Russia:

Factories took to inflating their production figures and the products produced were frequently so poor that they could not be used – even if the factory producing those goods appeared to be meeting its target. The punishment for failure was severe.

In his book Eat the Rich, P.J. O’Rourke tells us that in the USSR:

The trouble wasn’t that factory managers disobeyed orders. The trouble was that they obeyed them precisely. If a shoe factory was told to produce 1000 shoes, it produced 1000 baby shoes because they were the cheapest and easiest to make. If it was told to produce 1000 men’s shoes, it made them all one size. If it was told to produce 1000 shoes in a variety for men, women and children, it produced 998 baby shoes, one pump and a wing tip. If it was told to produce 3000 pounds of shoes it produced one enormous pair of concrete sneakers.

Perhaps P.J. is exaggerating, but the point is still essentially valid: metrics can, and probably will, be gamed. In Lean Six Sigma there’s a common metric-gaming activity that I call Denominator Improvement. One of the most popular metrics is defects per million opportunities, or DPMO. The formula itself is quite simple: DPMO = 1,000,000 × Defects / Opportunities. If someone’s performance is measured using DPMO, they can make the metric look better either by reducing defects (the numerator) or by increasing the number of opportunities (the denominator). For example, we might be interested in the number of typing errors in this post. The DPMO metric might be 1,000,000 × Errors / Total Words. But if this number didn’t look good enough, I could switch to 1,000,000 × Errors / Total Letters, or even 1,000,000 × Errors / Total Characters, counting spaces and punctuation. The errors stay exactly the same; only the denominator grows, so the reported DPMO shrinks.
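To make Denominator Improvement concrete, here is a minimal Python sketch. The sample sentence, its two typos and the helper name `dpmo` are hypothetical, invented purely for illustration; the only thing taken from the text is the formula DPMO = 1,000,000 × Defects / Opportunities. The defect count never changes, yet each broader definition of an “opportunity” makes the reported figure smaller.

```python
# Hypothetical illustration: the same errors divided by three different "opportunity" counts.

def dpmo(defects: int, opportunities: int) -> float:
    """Defects per million opportunities: 1,000,000 x defects / opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return 1_000_000 * defects / opportunities

# Made-up sample text with two deliberate typing errors ("teh" and "recieve").
text = "Please send teh report so we can recieve feedback before Friday."
errors = 2

words = len(text.split())                      # whitespace-separated words
letters = sum(ch.isalpha() for ch in text)     # alphabetic characters only
characters = len(text)                         # everything, including spaces and punctuation

for label, opportunities in [("word", words), ("letter", letters), ("character", characters)]:
    print(f"DPMO per {label}: {dpmo(errors, opportunities):,.0f} ({opportunities} opportunities)")
```

With this sample sentence the same two errors come out at about 181,818 DPMO per word, 37,736 per letter and 31,250 per character: an apparent improvement of more than fivefold without fixing a single typo.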

The solution to metrics gaming is to use metrics to guide improvement, not to measure the performance of people. Metrics should be limited to those numbers that quantify an important outcome (Y metrics) or quantify an input that is critical to the quality of the outcome (a CTQ or X metric). The point of quantifying these things is to discover and validate a transfer function, Y = f(X), that models the cause-and-effect relationship between the inputs and the outcome, and then to use it to guide improvement planning and activity. When metrics serve a useful purpose such as this, the tendency to manipulate and game them is, if not eliminated, at least reduced.
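As one illustration of what it can mean to discover, validate and use a transfer function, here is a minimal Python sketch that assumes a single input X and a straight-line relationship, fitted by ordinary least squares. The data points, the function names and the choice of R² as the validation check are all assumptions made for this example, not a prescribed Lean Six Sigma procedure; real projects typically rely on designed experiments and proper statistical software.

```python
# Hypothetical illustration only: the X and Y values below are invented.

def fit_linear_transfer_function(xs, ys):
    """Ordinary least-squares fit of Y = b0 + b1 * X; returns (b0, b1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """Share of the variation in Y explained by the fitted line (a crude validation check)."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Invented data: X might be a critical input setting, Y the outcome it drives.
x_settings = [1.0, 2.0, 3.0, 4.0, 5.0]
y_outcomes = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_linear_transfer_function(x_settings, y_outcomes)   # discover
r2 = r_squared(x_settings, y_outcomes, b0, b1)                   # validate
print(f"Estimated transfer function: Y = {b0:.2f} + {b1:.2f} * X (R^2 = {r2:.3f})")

# Use: predict the outcome for a proposed new setting of X.
print(f"Predicted Y at X = 6: {b0 + b1 * 6:.2f}")
```

Nothing in this calculation scores a person; the numbers describe how the process responds to changes in X, which is the kind of use that leaves little reason to game the measurements.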
