Deviate with Style

  • Writer: Noah Pessin
  • May 5

Hey Everybody,


The third topic I’d like to discuss from my fall Data Science elective is standard deviation and z-scores. You’ve probably heard at least one of these terms before, but what do they actually mean?


Standard deviation measures how spread out a dataset is from its mean. It’s closely tied to the bell curve, the shape of a normal distribution, where each standard deviation away from the mean marks off a set percentage of the data. According to the bell curve, about 34% of the data falls within 1 standard deviation of the mean on each side (so 68% total). Then about 13.5% falls on each side between 1 and 2 standard deviations from the mean, 2.35% on each side between 2 and 3, and the last 0.15% on each side beyond 3. In the bell curve, the standard deviation tells you how far each of these sections is from the mean.
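If you have Python handy, you can check these percentages against the normal curve itself. This is a small sketch using the standard library’s `statistics.NormalDist` (Python 3.8+); the `band_share` helper is just an illustrative name. It prints about 34.13%, 13.59%, 2.14% and 0.13% per side. The 34%, 13.5%, 2.35% and 0.15% figures above are the rounded numbers from the 68-95-99.7 rule, which is why the outer two bands come out slightly different.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

def band_share(lo: float, hi: float) -> float:
    """Fraction of the curve between lo and hi standard deviations above the mean."""
    return z.cdf(hi) - z.cdf(lo)

bands = {
    "0 to 1 SD": band_share(0, 1),
    "1 to 2 SD": band_share(1, 2),
    "2 to 3 SD": band_share(2, 3),
    "beyond 3 SD": 1 - z.cdf(3),  # everything past 3 standard deviations
}

for label, share in bands.items():
    # The curve is symmetric, so each band appears once on each side of the mean.
    print(f"{label}: {share:.2%} on each side")
```

Adding the four bands together gives 50%, one full half of the bell.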

So how do you calculate standard deviation?

Here is the formula for standard deviation:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$

A lot’s going on here. The σ (sigma, the little zero with a tail on top) that everything is equal to is the standard deviation. X represents each data point, and μ (mu, the u with a tail) is the mean of the dataset. The Σ (the E-looking symbol) means that you add up the expression next to it, (X − μ)², for every data point. Lastly, n is the number of data points in your set: you divide the sum by n and then take the square root.
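Here is the same formula written out step by step in Python, as a sketch. The `scores` list is made-up example data and `std_dev` is just an illustrative name; `statistics.pstdev` is the standard library’s population standard deviation, which also divides by n, so the two should agree.

```python
import math
from statistics import pstdev
from typing import Sequence

def std_dev(data: Sequence[float]) -> float:
    """Population standard deviation: sqrt(sum((X - mu)^2) / n)."""
    n = len(data)
    mu = sum(data) / n                                # the mean, μ
    squared_diffs = sum((x - mu) ** 2 for x in data)  # Σ (X − μ)²
    return math.sqrt(squared_diffs / n)               # divide by n, then take the square root

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set with mean 5
print(std_dev(scores))  # 2.0
print(pstdev(scores))   # 2.0, matching the built-in population version
```

Dividing by n − 1 instead of n gives the *sample* standard deviation, which is what `statistics.stdev` computes.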


Now, what does the z-score have to do with it?


A z-score tells you how many standard deviations a value sits above or below the mean. You can use it to find percentiles and probabilities: by going to the website z-table.com, you can match any z-score with a specific percentage that represents a percentile.

To calculate a z-score, use this equation, where x is a specific value in the dataset, μ is the mean, and σ is the standard deviation (hint hint):

$$z = \frac{x - \mu}{\sigma}$$
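A standard z-table lists the area of the normal curve to the left of each z-score, which is the normal distribution’s cumulative distribution function. So a short sketch can stand in for the lookup. This assumes the data are roughly normally distributed; `z_score` and `percentile` are illustrative names, the numbers are hypothetical, and `NormalDist().cdf` comes from Python’s standard library.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x sits above (+) or below (-) the mean."""
    return (x - mu) / sigma

def percentile(z: float) -> float:
    """Share of a standard normal distribution at or below z (a z-table lookup)."""
    return NormalDist().cdf(z)

z = z_score(100, 80, 10)   # hypothetical value 100, mean 80, standard deviation 10
print(z)                   # 2.0
print(round(percentile(z), 4))  # 0.9772
```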


Here’s one example of this process. I got a 91 on my math test, and the class average was 88 according to my teacher. I asked everybody for their scores and, using the standard deviation formula above, found that σ = 5. Plugging into the z-score equation gives z = (91 − 88) / 5 = 0.6. Looking up 0.6 on the z-table gives 0.7257, or 72.57%, which means I scored higher than about 72.6% of my classmates (assuming the scores roughly follow a bell curve). I can also find the percentiles for scores of 92 and 90 (z = 0.8 and z = 0.4, which give 0.7881 and 0.6554) and subtract one from the other: 0.7881 − 0.6554 = 0.1327. So there is about a 13.27% chance that someone in the class scored between 90 and 92, right around my 91.
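The whole math-test example can be reproduced in a few lines as a sketch, assuming (as the z-table does) that the class’s scores follow a normal curve with mean 88 and standard deviation 5. `NormalDist(mu, sigma).cdf(x)` does the z-score conversion and the table lookup in one step.

```python
from statistics import NormalDist

mean, sigma = 88, 5                               # class average and standard deviation
class_scores = NormalDist(mu=mean, sigma=sigma)   # assumes a normal (bell-shaped) spread

my_score = 91
z = (my_score - mean) / sigma
print(z)                                        # 0.6
print(round(class_scores.cdf(my_score), 4))     # 0.7257, the z-table value for z = 0.6

# Chance of a score between 90 and 92: the difference of the two percentiles
between = class_scores.cdf(92) - class_scores.cdf(90)
print(round(between, 4))                        # 0.1327
```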


It may sound confusing, but try this process out yourself and you’ll be a pro in no time.


Z you later!

Noah
