Multiple Comparisons Fallacy Summary

(Also known as: multiple comparisons, multiplicity, multiple testing problem, the look-elsewhere effect)

Description: Claiming that unexpected trends that occur through random chance alone in a data set with a large number of variables are meaningful.

In inductive arguments, there is always a chance that the conclusion might be false, despite the truth of the premises. In statistical studies, this uncertainty is often expressed as a “confidence level,” which in any given study or poll is less than 100%. At a 95% confidence level, roughly one out of 20 similar studies can be expected to produce a false conclusion through chance alone. If you make multiple comparisons (either within the same study or across multiple studies), say 20 or more at a 95% confidence level, you are likely to get at least one false conclusion. This becomes a fallacy when that false conclusion is treated as meaningful rather than as the expected product of chance.
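
The arithmetic behind "20 or more comparisons" can be sketched in a few lines. This is only an illustration, and it assumes the comparisons are independent and that each has a 5% chance of a false positive; the function name is made up for this example.

```python
def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests gives a
    false positive, when each test has false-positive rate `alpha`."""
    # Probability that every test avoids a false positive is (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


for n in (1, 5, 20, 100):
    print(n, round(prob_at_least_one_false_positive(n), 3))
```

Under these assumptions, 20 comparisons already give roughly a 64% chance of at least one false conclusion.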

This fallacy can be overcome by proper testing techniques and procedures that are outside the scope of this book.
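
As one hedged illustration of such a technique, the Bonferroni correction (a standard method, though not one named in this entry) divides the significance threshold by the number of comparisons made. The p-values below are invented purely for the example.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it is still significant
    after dividing the threshold `alpha` by the number of comparisons."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # stricter per-comparison threshold
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five comparisons in one study.
p_values = [0.001, 0.012, 0.030, 0.047, 0.200]
print([p <= 0.05 for p in p_values])     # uncorrected: four look significant
print(bonferroni_significant(p_values))  # corrected: only the first survives
```

The correction is deliberately conservative: it lowers the chance of a false alarm at the cost of sometimes missing a real effect.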

Logical Forms:

Out of N studies, A produced result X and B produced result Y.

Tomorrow’s headlines read, “Studies show Y”.

The study’s significance level was X.

The study compared multiple variables until some significant result was found.

Example #1:

100 independent studies were conducted comparing brain tumor rates of those who use cell phones to those who don’t.

90 of the tests showed no significant difference in the rates.

5 of the tests showed that cell phone users were more than twice as likely to develop tumors as those who don’t use cell phones.

5 of the tests showed that cell phone users were half as likely to develop tumors as those who don’t use cell phones.

FunTel Mobile’s new ad reads, “Studies show: Cell phone users are half as likely to develop brain tumors!”

Explanation: Because we did multiple tests, i.e., compared multiple groups, we should expect a few results to look significant through chance alone, in either direction. Such results must be treated as anomalies or tested further, not presented as meaningful while the other results are ignored.
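
A small simulation can make this concrete. The sketch below assumes there is no real effect at all and that each study uses a 5% false-positive threshold (the example does not state one); under those assumptions a study's p-value is uniformly distributed between 0 and 1, so each study is modeled as a single random draw. All names are hypothetical.

```python
import random


def count_chance_findings(num_studies=100, alpha=0.05, rng=None):
    """Count how many of `num_studies` null studies look 'significant'
    purely by chance, modeling each p-value as a uniform random draw."""
    if rng is None:
        rng = random.Random()
    return sum(1 for _ in range(num_studies) if rng.random() < alpha)


rng = random.Random(42)  # fixed seed so the run is repeatable
runs = [count_chance_findings(rng=rng) for _ in range(2000)]
print("average chance findings per 100 studies:", sum(runs) / len(runs))
print("fewest and most in any run:", min(runs), max(runs))
```

The average settles near 5 per 100 studies, but individual runs vary, which is why a handful of striking results among many studies proves little on its own.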

Example #2:

In our study, we looked at 100 individuals who sang right before going to bed, and 100 individuals who did not sing. Here is what we found: Over 90% of the individuals who sang slept on their backs, and fewer than 10% slept on their stomachs or sides. This compares with 50% of those who did not sing sleeping on their backs and 50% sleeping on their stomachs or sides. Therefore, singing has something to do with sleeping position.

Explanation: What this study did not report is that over 500 comparisons were done between the two groups, on everything from quality of sleep to what they ate for breakfast the next day. Most of the comparisons showed nothing meaningful and were discarded, but as expected under the laws of statistics and probability, there were some anomalies, the sleeping position being the most dramatic.
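
To get a rough sense of how many anomalies 500 comparisons should produce, the sketch below assumes independent comparisons, no real differences between the groups, and a 5% threshold per comparison. None of these figures come from the study described; they are assumptions for illustration.

```python
import math


def expected_false_positives(num_comparisons, alpha=0.05):
    """Expected number of chance 'findings' among independent null comparisons."""
    return num_comparisons * alpha


def false_positive_spread(num_comparisons, alpha=0.05):
    """Standard deviation of that count (binomial model)."""
    return math.sqrt(num_comparisons * alpha * (1.0 - alpha))


m = 500
print("expected chance findings:", expected_false_positives(m))
print("typical spread (std dev):", round(false_positive_spread(m), 2))
```

With about 25 chance findings expected, reporting only the most dramatic one gives a badly distorted picture.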

Exception: Only proper testing and accurate representation of the results would lead to non-fallacious conclusions.

Fun Fact: In a group of 23 random people, it is more likely than not that at least two of the people in the group have the same birthday. This is referred to as the birthday paradox, and it is a classic illustration of how multiple comparisons work: 23 people form 253 possible pairs, and each pair is another chance for a match.
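
The birthday figure can be checked directly. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is made up for the example.

```python
from math import comb


def prob_shared_birthday(group_size, days=365):
    """Chance that at least two people in the group share a birthday."""
    prob_all_distinct = 1.0
    for i in range(group_size):
        # The (i+1)-th person must avoid the i birthdays already taken.
        prob_all_distinct *= (days - i) / days
    return 1.0 - prob_all_distinct


print("pairs among 23 people:", comb(23, 2))
print("chance of a shared birthday:", round(prob_shared_birthday(23), 3))
```

With 22 people the chance is still just under one half; the 23rd person tips it over.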


Walsh, J. (1996). True Odds: How Risk Affects Your Everyday Life. Silver Lake Publishing.