The Simpson's
- Noah Pessin

- Mar 25
Hey Everybody,
Another super important topic covered in my Data Science elective this past fall was Simpson’s Paradox (read Studying Studies for the first part of the series). Not Homer, Edward. Edward H. Simpson is credited with first describing Simpson’s Paradox. The paradox is that a trend that shows up when data is aggregated can disappear, or even reverse, when the same data is split into groups. This is normally due to a confounding variable that isn’t considered when the data is aggregated.
One very well-known example of this paradox is the study of gender and graduate admissions at UC Berkeley in 1973. In the aggregated data, men had about a 44% admit rate and women had about a 35% admit rate. However, men tended to apply to departments with higher admission rates, while women tended to apply to more competitive departments. When the data was broken out by department (A to F), admit rates for men and women were very similar, and in some departments women had higher admit rates than men. The confounding variable, which departments each gender applied to and how competitive those departments were, created a gap in the aggregated data that mostly disappeared at the department level.
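To see how this can happen, here is a quick Python sketch with made-up numbers. These are not the real Berkeley figures: the two departments ("Easy" and "Hard"), the applicant counts, and the helper function names are all assumptions picked just to make the effect obvious.

```python
# Hypothetical applicant counts (NOT the real Berkeley data).
# Key: (department, gender) -> (number of applicants, number admitted)
applications = {
    ("Easy", "men"): (800, 480),
    ("Easy", "women"): (200, 130),
    ("Hard", "men"): (200, 40),
    ("Hard", "women"): (800, 200),
}


def admit_rate(rows):
    """Admitted divided by applicants, summed over (applicants, admitted) pairs."""
    applicants = sum(applied for applied, _ in rows)
    admitted = sum(accepted for _, accepted in rows)
    return admitted / applicants


def aggregated_rate(gender):
    """Admit rate for one gender with all departments lumped together."""
    rows = [counts for (_, g), counts in applications.items() if g == gender]
    return admit_rate(rows)


def department_rate(department, gender):
    """Admit rate for one gender inside a single department."""
    return admit_rate([applications[(department, gender)]])


# Aggregated view: men appear to be admitted at a much higher rate.
for gender in ("men", "women"):
    print(f"Overall, {gender}: {aggregated_rate(gender):.0%}")

# Disaggregated view: women have the higher rate in both departments.
for department in ("Easy", "Hard"):
    for gender in ("men", "women"):
        rate = department_rate(department, gender)
        print(f"{department} department, {gender}: {rate:.0%}")
```

In this toy data, women have the higher admit rate in both departments, but most women applied to the "Hard" department, so their overall rate ends up lower than the men's. Nothing changed except how the data was grouped.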
So, data scientists, when you come across or create a data set, look for confounding variables before you aggregate, and check whether the trend still holds within each subgroup, so your study or experiment doesn’t point to the wrong conclusion.
Your Paradox,
Noah
