
The Simpson's

  • Writer: Noah Pessin
  • Mar 25
  • 1 min read

Hey Everybody,


Another super important topic from my Data Science elective this past fall was Simpson’s Paradox (read Studying Studies for the first part of the series). Not Homer, Edward: Edward H. Simpson is credited with first describing Simpson’s Paradox. The paradox describes a trend that shows up when data is aggregated but weakens, disappears, or even reverses when the same data is broken into groups. This usually happens because of a confounding variable, a factor related to both the grouping and the outcome, that gets ignored when the data is aggregated.
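
A compact way to see why aggregating can flip a result is the law of total probability, a standard identity (the symbols below are chosen for illustration). Write $Y$ for the outcome, $g$ for the group being compared, and $Z$ for the confounding variable. Each group's aggregate rate is then a weighted average of its rates within each level of $Z$:

$$
P(Y = 1 \mid g) = \sum_{z} P(Y = 1 \mid g, Z = z)\, P(Z = z \mid g)
$$

The weights $P(Z = z \mid g)$ describe how each group is spread across the levels of $Z$. If two groups are spread very differently, one group can have the higher rate within every level of $Z$ and still end up with the lower aggregate rate, because its weight is concentrated in the levels where rates are low for everyone.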


One very well-known example of this paradox is a UC Berkeley study of graduate admissions by gender. In the aggregated data, men had a 45% admit rate and women had a 30% admit rate. However, men tended to apply to departments with higher admit rates, while women more often applied to competitive departments that admitted a smaller share of applicants. When the data was broken down by department (A through F), admit rates for men and women were very similar, and in some departments women had higher admit rates than men. The confounding variable, which departments each gender applied to and how selective each department was, reversed the pattern once the data was aggregated.
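
The Berkeley reversal is easy to reproduce. The sketch below assumes the counts from R's built-in UCBAdmissions dataset, which covers the six largest departments (A to F); the helper names `aggregate_rates` and `department_rates` are illustrative, not from any library. It pools all departments to get one admit rate per gender, then computes the same rate inside each department.

```python
# Assumed counts, taken from R's UCBAdmissions dataset (six largest departments).
# Each entry is (admitted, applied) for that department and gender.
COUNTS = {
    "A": {"men": (512, 825), "women": (89, 108)},
    "B": {"men": (353, 560), "women": (17, 25)},
    "C": {"men": (120, 325), "women": (202, 593)},
    "D": {"men": (138, 417), "women": (131, 375)},
    "E": {"men": (53, 191), "women": (94, 393)},
    "F": {"men": (22, 373), "women": (24, 341)},
}


def admit_rate(admitted: int, applied: int) -> float:
    """Fraction of applicants who were admitted."""
    return admitted / applied


def aggregate_rates(counts: dict) -> dict:
    """Pool every department together and compute one rate per gender."""
    totals: dict = {}
    for dept in counts.values():
        for gender, (admitted, applied) in dept.items():
            a, n = totals.get(gender, (0, 0))
            totals[gender] = (a + admitted, n + applied)
    return {g: admit_rate(a, n) for g, (a, n) in totals.items()}


def department_rates(counts: dict) -> dict:
    """Compute a separate rate per gender inside each department."""
    return {
        name: {g: admit_rate(a, n) for g, (a, n) in dept.items()}
        for name, dept in counts.items()
    }


if __name__ == "__main__":
    agg = aggregate_rates(COUNTS)
    print(f"Aggregated: men {agg['men']:.0%}, women {agg['women']:.0%}")
    for name, rates in department_rates(COUNTS).items():
        # Label which gender has the higher admit rate in this department.
        higher = "women" if rates["women"] > rates["men"] else "men"
        print(f"Dept {name}: men {rates['men']:.0%}, "
              f"women {rates['women']:.0%} ({higher} higher)")
```

With these counts, the pooled rates come out to roughly 45% for men and 30% for women, while women have the higher admit rate in departments A, B, D, and F.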


So, data scientists, when you come across or create a data set, look for any and all confounding variables so your study or experiment doesn’t point to the wrong conclusion.


Your Paradox,

Noah

 
 
 


©2024 by My Site. Proudly created with Wix.com
