Simpson strikes again – a closer look at data in support of COVID booster shots

Tom Breur

22 September 2021

Introduction

Simpson’s Paradox is a relatively well-known statistical oddity: a pattern seen at a more granular level reverses direction when the data are viewed at a coarser grain. I have heard several of my peers say they see this phenomenon “a lot.” The Wikipedia page reads: “This result is often encountered in social-science and medical-science statistics,” which raises the question of whether it occurs often in general, or mostly in social-science and medical-science statistics. In my professional experience I have encountered Simpson’s Paradox infrequently, so rarely in fact that every time I do, I find it a remarkable oddity.

Edward H. Simpson (1922-2019) was a British statistician who described this phenomenon comprehensively, but he was certainly not the first to encounter it. His paper “The Interpretation of Interaction in Contingency Tables,” published in the Journal of the Royal Statistical Society in 1951, is considered the academic source, although he had already written it in 1946 as a postgraduate student. Sometimes it is called the Yule-Simpson effect, because Udny Yule (1871-1951) described it as early as 1903.

One of the “classic” examples of Simpson’s Paradox arose when the University of California, Berkeley was accused of favoring male over female students in its admissions process. At Berkeley, in 1973 males were 1.8 times more likely to be admitted than females. Small wonder this led to the impression that females were being discriminated against! However, further analysis showed that in four out of six departments females had a higher chance of being accepted. The “paradox” appeared because there was a strong association between department and gender: women tended to apply to departments with much lower a priori admission rates. Chu et al. wrote about this case in 2018 in “Simpson’s Paradox: A statistician’s case study,” with reference to the implied gender inequality.

If you are into baseball, the example on Wikipedia might appeal to you; it illustrates Simpson’s Paradox very nicely. It truly is paradoxical.

In both 1995 and 1996, David Justice had a higher batting average than Derek Jeter (.253 vs .250 in 1995, and .321 vs .314 in 1996).

However, when you combine those two years, Derek Jeter turns out to be the better hitter, with the higher batting average across 1995 and 1996 together (.310 vs .270). How can that be?! Paradoxical, isn’t it?

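Of course, the answer to this paradox lies in the number of at bats behind each average. Jeter had far more at bats in 1996, his better year (582, versus 48 in 1995), whereas Justice had most of his in 1995, his weaker year (411, versus 140 in 1996).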

It is easy to see how one can be fooled by phenomena like these.

For people new to baseball stats: batting average is the ratio of base hits to at bats. At bats exclude plate appearances in which the player was “walked,” was hit by the ball (which allows him a free walk to first base), or hit a sacrifice fly (a ball deliberately hit high to enable a teammate to score). An explanation of how to calculate this number can be found here.
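The arithmetic behind the reversal can be sketched in a few lines of Python. The hits and at-bats counts are the ones listed in the Wikipedia example; the `Season` class and `pooled` helper are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Season:
    hits: int
    at_bats: int

    @property
    def average(self) -> float:
        # Batting average = base hits / at bats
        return self.hits / self.at_bats


def pooled(seasons):
    """Combine seasons by summing hits and at bats (not by averaging averages)."""
    seasons = list(seasons)
    return Season(
        hits=sum(s.hits for s in seasons),
        at_bats=sum(s.at_bats for s in seasons),
    )


# Hits and at bats per season, as listed in the Wikipedia example
jeter = {1995: Season(12, 48), 1996: Season(183, 582)}
justice = {1995: Season(104, 411), 1996: Season(45, 140)}

for year in (1995, 1996):
    print(year,
          f"Jeter {jeter[year].average:.3f}",
          f"Justice {justice[year].average:.3f}")

jeter_total = pooled(jeter.values())
justice_total = pooled(justice.values())
print("combined",
      f"Jeter {jeter_total.average:.3f}",
      f"Justice {justice_total.average:.3f}")
```

Justice leads in each single season, yet Jeter leads once hits and at bats are summed, because each player's combined figure is dominated by the season in which he batted most.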

More recently, I encountered another interesting example of Simpson’s Paradox in the Israeli data that are widely being used to justify the need for a third shot (vaccine), a.k.a. booster shot against COVID-19.

Simpson’s Paradox in Israel’s COVID data

At this moment, health organizations across the globe are considering the desirability of a so-called COVID booster shot for fully vaccinated people. Since Israel was one of the very first countries to achieve a high vaccination rate, its “early” data are a treasure trove and a looking glass into the future for many others. The case for administering booster shots appears to rest almost solely on these Israeli data.

First of all, to clarify: what I am referring to here is the statistical evidence in support of a booster shot for people who are not immunocompromised. For people with compromised immune systems, the booster (3rd) shot was always planned and is merely part of standard protocol. No third shot has been added for this group; they were always supposed to get a third dose and were already scheduled to receive it.

Here are some of the results from those Israeli data that triggered my suspicion and made me dive in a little deeper:

Notice how the efficacy in the far right corner is listed as 67.5%, because there are 16.4/100K severe cases among the unvaccinated and “still” 5.3/100K severe cases among the fully vaccinated. Efficacy is calculated as 1 minus the ratio of those two rates: with the rounded rates shown, 1 - (5.3/16.4) ≈ 0.677, and the listed 67.5% presumably comes from the unrounded rates. This appears (!) to suggest that the added safety from vaccination is rather moderate, certainly lower than the effects touted so far.
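As a minimal sketch, the efficacy calculation looks like this in Python. The function name `vaccine_efficacy` is made up for illustration; the two rates are the rounded per-100K figures quoted above, so the result comes out near 67.7%, and the small gap to the listed 67.5% is presumably due to that rounding.

```python
def vaccine_efficacy(rate_vaccinated: float, rate_unvaccinated: float) -> float:
    """Efficacy = 1 - (severe-case rate among vaccinated / rate among unvaccinated)."""
    if rate_unvaccinated <= 0:
        raise ValueError("rate among unvaccinated must be positive")
    return 1.0 - rate_vaccinated / rate_unvaccinated


# Rounded severe-case rates per 100K, as quoted in the text
overall = vaccine_efficacy(rate_vaccinated=5.3, rate_unvaccinated=16.4)
print(f"Overall efficacy from rounded rates: {overall:.1%}")
```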

Whenever I see something counterintuitive like that, it makes me wonder: why such a small difference?! One possible explanation is that the protection from the vaccine is waning over time. That conclusion is the argument in favor of a booster shot. But could there be more to these data?

To begin with, note how there is confounding by age. The proportion of vaccinated people is higher among the elderly, and a priori you’d expect more “severe” cases in that group as well! In the >50 group 7.9% are unvaccinated; in the <50 group that is 23.3% (the numbers don’t add up horizontally to 100% because some people are partially vaccinated).

Simpson’s Paradox is evident in the third column, where vaccine efficacy is higher for both the <50 and the >50 group, but drops to 67.5% when the two groups are combined (“aggregated”)! The superficial conclusion might be that, since so many of the elderly (and hence early-vaccinated) patients wind up in the hospital with severe symptoms, a booster shot would provide additional protection for them. Beware the quick and easy conclusion!

Similar to the previous baseball example, the >50 group has so many more severe cases, compared to the <50 group, that the (invalid!) operation of averaging averages leads to a wildly misleading estimate. By the same token, the much higher number of at bats in 1996 for Derek Jeter led to the same paradoxical result.
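To see how confounding by age can drag the combined efficacy below that of every age group, here is a small sketch. The counts are HYPOTHETICAL, chosen only to mimic the pattern described above (older people are more often vaccinated and far more often severely ill); they are not the Israeli figures.

```python
def efficacy(cases_vax, pop_vax, cases_unvax, pop_unvax):
    """1 - (severe-case rate among vaccinated / rate among unvaccinated)."""
    return 1 - (cases_vax / pop_vax) / (cases_unvax / pop_unvax)


# HYPOTHETICAL counts for illustration only, not the actual Israeli data
strata = {
    "<50": dict(cases_vax=12, pop_vax=3_000_000,
                cases_unvax=40, pop_unvax=1_000_000),
    ">50": dict(cases_vax=300, pop_vax=2_000_000,
                cases_unvax=180, pop_unvax=200_000),
}

per_group = {name: efficacy(**counts) for name, counts in strata.items()}

# Aggregate by summing cases and populations across the age groups
keys = ("cases_vax", "pop_vax", "cases_unvax", "pop_unvax")
totals = {k: sum(counts[k] for counts in strata.values()) for k in keys}
combined = efficacy(**totals)

for name, value in per_group.items():
    print(f"{name}: {value:.1%}")
print(f"combined: {combined:.1%}")
```

With these made-up counts, efficacy is 90% and about 83% within the two age groups, yet roughly 66% when the groups are pooled. The vaccinated population is dominated by the high-risk older group, while the unvaccinated population is dominated by the low-risk younger one.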

Based on these data, it’s not clear that a booster shot against COVID-19 is actually needed. Just to be clear: if and when it becomes available, I’ll be getting mine, though! From all the evidence I have gathered so far, the odds it will do me any harm still seem orders of magnitude lower than the odds something bad happens if I don’t take it. That seems an easy choice, given the minor inconvenience of likely some mild symptoms after taking the shot.

Conclusion

Simpson’s Paradox can wreak havoc in many ways. Arguably most dangerous are causal interpretations of correlations, especially when the effect gets reversed at different levels of aggregation like the batting averages of David Justice & Derek Jeter. In a more “friendly” (less perverse) world, correlations merely disappear, rather than getting reversed.

Keep in mind that this phenomenon arises from invalid numerical operations. In the database world we sometimes refer to additive versus non-additive measures. An average is a non-additive measure: calculating a plain average of averages, which is at the heart of Simpson’s Paradox, is not a mathematically valid operation unless each average is weighted by the size of the group behind it.
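A short sketch of what “non-additive” means in practice, using Derek Jeter’s hits and at bats from the Wikipedia example. The variable names are made up for illustration. Counts (hits, at bats) are additive and can be summed across years; the averages derived from them cannot.

```python
# (hits, at bats) for Derek Jeter in 1995 and 1996, per the Wikipedia example
seasons = [(12, 48), (183, 582)]

# Non-additive: a plain average of the two yearly averages
mean_of_means = sum(h / n for h, n in seasons) / len(seasons)

# Additive route: sum the counts first, then divide once
total_hits = sum(h for h, _ in seasons)
total_at_bats = sum(n for _, n in seasons)
pooled_mean = total_hits / total_at_bats

# Weighting each yearly average by its at bats recovers the pooled figure
weighted_mean = sum((h / n) * n for h, n in seasons) / total_at_bats

print(f"average of averages: {mean_of_means:.3f}")
print(f"pooled average:      {pooled_mean:.3f}")
print(f"weighted average:    {weighted_mean:.3f}")
```

This is why a reporting layer would typically store the additive counts and compute the ratio only at the level of aggregation being displayed.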

Special credit for this vignette should go both to Jeffrey Morris and to Carl T. Bergstrom, author with Jevin West of the excellent book “Calling Bullshit” (2021) for pointing to some of the tempting misinterpretation lurking in these data. Specifically this post by Prof. Dr. Jeffrey Morris piqued my interest, but there are several other excellent contributions on his site https://www.covid-datascience.com/. Prof. Dr. Carl Bergstrom has been a source of inspiration for critically rethinking what conclusions can and cannot be drawn from data.

One comment

  1. […] Note how common it is to compare percentages, and how often the denominator may be subject to slight, barely noticeable shifts. Statisticians refer to that denominator as the base population, the universe you are generalizing to. When that population drifts (has a different composition), you can hardly generalize to it, and the comparison becomes a proverbial apples to oranges one. This effect is compounded by the fact that the denominator is typically the larger number, and therefore changes in the denominator don’t weigh as heavily on a proportion as changes in the numerator do. However, at the same time the focus of attention is usually on the numerator, so changes in proportion are “mentally” more intuitively attributed to changes in the numerator, not the denominator! Bottom line is that you need to be mindful of shifts in either, one of the reasons behind Simpson’s Paradox as well, that I wrote about in an earlier blog. […]
