The Pareto distribution of Pareto distributions

Tom Breur

2 December 2018

Vilfredo Pareto (1848-1923) was an Italian economist (although he was quite versatile) who studied income distribution. The famous “Pareto chart” was named after him; it was originally designed to illustrate inequality (skew) in income distribution, notably among households (a recurring theme in modern economics). Economists represent income distribution with a Lorenz curve and summarize it with the Gini coefficient, both closely related to their statistical cousin, the Pareto distribution.

In economics, the Gini coefficient is mostly used to compare income inequality across countries. The late, eminent Hans Rosling (1948-2017) gave a fantastic TED talk (very much worth watching – a highlight of data visualization) on income distribution and its historical development. The Occupy Wall Street movement had its origins here, too, claiming to represent the “99%” of the population, alluding to excessively skewed wealth distribution.

Pareto was concerned about growing inequality even as prosperity was rising. His approach was very data-centric, in stark contrast to economists before him, like Adam Smith (1723-1790), who approached economics more philosophically. Pareto showed that income distribution follows a power function, inspired by his observation that 20% of the people held about 80% of the nation’s riches in Italy, a pattern he found across all the nations he researched. His interest in economics didn’t arise until later in life, though, in his forties. Pareto was certainly not a socialist; in fact, his political reputation is dubious at best (he is associated with fascism).

One of the reasons the Pareto principle has become so widely known is that similar distributions have been observed in a surprisingly diverse range of settings: physical, biological, and social. It has been observed to hold for the size of cities, the distribution of internet traffic, the magnitude of earthquakes, and the size of many natural phenomena, so it almost appears to be a law of Mother Nature.

Nowadays, the 80/20 rule is common parlance, although income is not always distributed in exact accordance with that ratio. The Gini coefficient quantifies how far the distribution of incomes deviates from “perfect” (= uniform) equality. This is not meant as a value statement. For Gini = 0 everyone has the same income, and for Gini = 1 you have “perfect” inequality (one person holding everything). This allows for a convenient mathematical representation of the skewness of an income distribution, usually used to compare across nations.
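To make this concrete, here is a minimal sketch (in Python, with made-up incomes) of how the Gini coefficient can be computed for a sample, using the standard rank-weighted formula over sorted values:

```python
def gini(incomes):
    """Gini coefficient: 0 = everyone earns the same, 1 = one person holds everything."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))    # perfectly equal incomes -> 0.0
print(gini([0, 0, 0, 100]))  # one person holds everything -> 0.75
```

Note that for a finite sample of n people the maximum is (n - 1) / n, which is why four people yield 0.75 rather than 1; it only approaches 1 as n grows.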

In business it is often noted that 80% of sales come from the best 20% of customers, and (much) more extreme skew occurs, too. If only you could afford to concentrate exclusively on that elusive 20%… business would skyrocket! In quality control, oftentimes fixing the 20% most-reported bugs can address 80% of the issues, etc.
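As a sketch of how one might check this kind of claim on sales data (the revenue figures below are hypothetical), the following Python snippet finds the smallest fraction of customers that together account for a given share of total revenue:

```python
def fraction_for_share(revenues, share=0.8):
    """Smallest fraction of customers that together account for `share` of total revenue."""
    xs = sorted(revenues, reverse=True)  # best customers first
    total = sum(xs)
    running = 0.0
    for count, x in enumerate(xs, start=1):
        running += x
        if running >= share * total:
            return count / len(xs)

# Hypothetical revenues for nine customers: the top third carries 80% of sales.
revenues = [100, 50, 20, 10, 10, 5, 2, 2, 1]
print(fraction_for_share(revenues))  # -> 0.3333333333333333
```

If the result hovers around 0.2 for your own data, you are looking at a classic 80/20 customer base.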

Because the Pareto principle is so common, and its distribution so often replicated, it has become a universal rule of thumb. Like most heuristics, it usually holds. Except when it doesn’t. Note that there is nothing in the distribution itself that points to causal factors, so it can be tempting to opt for the ‘obvious’ but poorly informed quick fix. After you shed your 10% worst customers, you will still find that 20% of the remaining ones account for 80% of your profits. This feature is referred to as the “self-similarity” of these distributions, for all intents and purposes identical to fractal patterns.
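This self-similarity is easy to demonstrate by simulation. The sketch below (Python, using the standard library’s `random.paretovariate`) draws incomes from a Pareto distribution with shape α = log 5 / log 4 ≈ 1.16, the value for which the top 20% hold roughly 80%, and shows that the top-20% share barely moves after shedding the bottom 10%:

```python
import random

random.seed(42)
ALPHA = 1.16  # shape for which the top 20% hold ~80% (log(5) / log(4))
incomes = sorted((random.paretovariate(ALPHA) for _ in range(100_000)), reverse=True)

def top20_share(xs):
    """Share of the total held by the top 20% (xs must be sorted descending)."""
    k = len(xs) // 5
    return sum(xs[:k]) / sum(xs)

print(f"top 20% share:             {top20_share(incomes):.2f}")
trimmed = incomes[: int(len(incomes) * 0.9)]  # shed the bottom 10%
print(f"after shedding bottom 10%: {top20_share(trimmed):.2f}")
```

Both printed shares should come out close to 0.8: the trimmed population exhibits essentially the same skew, which is the fractal-like property described above.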

Applying the Pareto principle can (falsely) seduce you into overly shallow analysis. Please allow me to illustrate with an example of a pattern I have seen repeated so often that it hurts my eyes. Let’s say you have managed to drop your 10% least profitable customers, and you successfully navigated the treacherous cliffs of not alienating some of your “good” customers. Now, unless you structurally changed your cost structure (which would require drastic measures, like shutting down systems, closing offices, etc.), you may find that your overall profitability has gone down, because the same constant (fixed) costs are now allocated to fewer customers. It’s a well-known fallacy in cost accounting.

A similar counterintuitive effect occurred in my hometown, Boston. Traffic was horribly congested. To alleviate some of that, an additional ring road (I-95, Route 128) was built where the worst congestion happened. Excellent Pareto thinking: solve the worst problems first. Except that after all was said and done, the traffic problems had gotten worse… It’s (I hate to say: again) a well-documented disaster of inadequate urban planning; see e.g. Gonzalez & Winch or Armah et al. An adage from System Dynamics is that unless you understand why problems are happening, trying to “fix” them (with the best of intentions) can actually make them worse.

80/20 has become shorthand for “efficient” management: do the important things first, pick the low-hanging fruit, etc. It all sounds so plausible, doesn’t it? In my experience, this can tempt people into a somewhat superficial outlook. In earlier posts I made reference to the “inflation” in data scientists’ job titles and corresponding qualifications (see the post “Citizen Data Scientists” of 22 July 2018). A causal analysis of profit and growth drivers requires more than “80/20” skills. Would you bet the farm on it? Then why risk shareholder growth, I wonder. Do you choose your brain surgeon based on cost? I didn’t think so…

