Standard deviation formula
The formula for standard deviation is:
\[ \sigma = \sqrt{\dfrac{\sum(x_i-\mu)^2}{N}}\]
Where:
\(\sigma\) is the standard deviation
\(\sum\) is the sum
\(x_i\) is an individual number in the data set
\( \mu\) is the mean of the data set
\(N\) is the total number of values in the data set
So, in words: find how far each data point is from the mean, square each of these distances, add the squares together, divide by the total number of data points, and take the square root of the result.
The variance of a set of data is equal to the standard deviation squared, \(\sigma^2\).
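The formula can be translated into code almost term by term. Below is a minimal Python sketch; the function name `population_std` and the small data set passed to it are made up for illustration and are not part of the worked examples in this article.

```python
import math


def population_std(values):
    """Standard deviation with N in the denominator, as in the formula above."""
    n = len(values)                                        # N
    mu = sum(values) / n                                   # the mean
    squared_deviations = [(x - mu) ** 2 for x in values]   # (x_i - mu)^2
    variance = sum(squared_deviations) / n                 # sigma^2
    return math.sqrt(variance)                             # sigma


# Illustrative data only: the mean is 5 and the squared deviations sum to 32.
sigma = population_std([2, 4, 4, 4, 5, 5, 7, 9])
print(sigma)       # 2.0
print(sigma ** 2)  # 4.0, the variance
```

Note that the variance appears as an intermediate step: the standard deviation is simply its square root.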
Standard deviation graph
The concept of standard deviation is useful because it helps us predict how many of the values in a data set will lie within a certain distance of the mean. To make these predictions, we assume that the values in our data set follow a normal distribution. This means that they are distributed around the mean in a bell-shaped curve, as below.
Standard deviation graph. Image: M W Toews, CC BY 2.5
The \(x\)-axis shows the distance from the mean, measured in standard deviations; the mean is at \(0\). The \(y\)-axis shows the probability density: the area under the curve between two points on the \(x\)-axis gives the proportion of values in the data set that fall between them. This graph, therefore, tells us that \(68.2\%\) of the points in a normally distributed data set fall between \(-1\) standard deviation and \(+1\) standard deviation of the mean, \(\mu\).
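If you want to check this percentage yourself, the proportion of a normal distribution lying within \(k\) standard deviations of the mean can be computed with the error function. The sketch below uses Python's standard `math.erf`; the helper name `fraction_within` is our own. The result is about \(68.27\%\); the \(68.2\%\) read off the graph comes from adding its two rounded \(34.1\%\) bands.

```python
import math


def fraction_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


print(round(fraction_within(1) * 100, 1))  # 68.3
```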
How do you calculate standard deviation?
In this section, we will look at an example of how to calculate the standard deviation of a data set. Let's say you measured the heights of your classmates in cm and recorded the results. Here's your data:
165, 187, 172, 166, 178, 175, 185, 163, 176, 183, 186, 179
From this data we can already determine \(N\), the number of data points. In this case, \(N = 12\). Now we need to calculate the mean, \(\mu\). To do that we simply add all the values together and divide by the total number of data points, \(N\).
\[ \begin{align} \mu &= \frac{165 + 187+172+166+178+175+185+163+176+183+186+179}{12} \\ &= 176.25. \end{align} \]
Now we have to find
\[ \sum(x_i-\mu)^2.\]
For this we can construct a table:
\(x_i\) | \(x_i - \mu\) | \((x_i-\mu)^2\) |
165 | -11.25 | 126.5625 |
187 | 10.75 | 115.5625 |
172 | -4.25 | 18.0625 |
166 | -10.25 | 105.0625 |
178 | 1.75 | 3.0625 |
175 | -1.25 | 1.5625 |
185 | 8.75 | 76.5625 |
163 | -13.25 | 175.5625 |
176 | -0.25 | 0.0625 |
183 | 6.75 | 45.5625 |
186 | 9.75 | 95.0625 |
179 | 2.75 | 7.5625 |
For the standard deviation equation, we need the sum of the squared differences, which we find by adding all the values in the last column. This gives \(770.25\).
\[ \sum(x_i-\mu)^2 = 770.25.\]
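The table and its total can be reproduced in a few lines of Python, which is a handy way to check the arithmetic. This is only a sketch: the variable names (`heights`, `rows`, `sum_of_squares`) are our own choices.

```python
heights = [165, 187, 172, 166, 178, 175, 185, 163, 176, 183, 186, 179]

mu = sum(heights) / len(heights)  # the mean, 176.25

# One row per data point: x_i, x_i - mu, (x_i - mu)^2
rows = [(x, x - mu, (x - mu) ** 2) for x in heights]
for x, deviation, squared in rows:
    print(f"{x} | {deviation:+.2f} | {squared:.4f}")

# Add up the last column of the table.
sum_of_squares = sum(squared for _, _, squared in rows)
print(sum_of_squares)  # 770.25
```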
We now have all the values we need to plug into the equation and get the standard deviation for this data set.
\[ \begin{align} \sigma &= \sqrt{\dfrac{\sum(x_i-\mu)^2}{N}} \\ &= \sqrt{\frac{770.25}{12}} \\ &\approx 8.012. \end{align}\]
This means that the values in the data set typically lie about \(8.012\,\text{cm}\) away from the mean. As seen on the normal distribution graph above, \(68.2\%\) of normally distributed data points lie between \(-1\) standard deviation and \(+1\) standard deviation of the mean. In this case, the mean is \(176.25\,\text{cm}\) and the standard deviation is \(8.012\,\text{cm}\). Therefore, \( \mu - \sigma = 168.24\,\text{cm}\) and \( \mu + \sigma = 184.26\,\text{cm}\), meaning that, if the heights follow a normal distribution, we would expect about \(68.2\%\) of values to lie between \(168.24\,\text{cm}\) and \(184.26\,\text{cm}\).
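It is worth checking how the twelve heights actually compare with this range. The sketch below uses `statistics.fmean` and `statistics.pstdev` from Python's standard library (`pstdev` divides by \(N\), like the formula in this article); the other names are our own. For this particular data set, 6 of the 12 values fall inside the range. The \(68.2\%\) figure describes an ideal normal distribution, so a small data set need not match it exactly.

```python
import statistics

heights = [165, 187, 172, 166, 178, 175, 185, 163, 176, 183, 186, 179]

mu = statistics.fmean(heights)      # mean
sigma = statistics.pstdev(heights)  # standard deviation, dividing by N

lower, upper = mu - sigma, mu + sigma
inside = [x for x in heights if lower <= x <= upper]

print(round(lower, 2), round(upper, 2))  # 168.24 184.26
print(len(inside), "of", len(heights))   # 6 of 12
```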
The ages of five workers (in years) in an office were recorded. Find the standard deviation of the ages: 44, 35, 27, 56, 52.
We have 5 data points, so \(N=5\). Now we can find the mean, \(\mu\).
\[ \mu = \frac{44+35+27+56+52}{5} = 42.8\]
We now have to find
\[ \sum(x_i-\mu)^2.\]
For this, we can construct a table like the one above.
\(x_i\) | \(x_i - \mu\) | \((x_i-\mu)^2\) |
44 | 1.2 | 1.44 |
35 | -7.8 | 60.84 |
27 | -15.8 | 249.64 |
56 | 13.2 | 174.24 |
52 | 9.2 | 84.64 |
To find
\[ \sum(x_i-\mu)^2,\]
we can simply add all the numbers in the last column. This gives
\[ \sum(x_i-\mu)^2 = 570.8\]
We can now plug everything into the standard deviation equation.
\[ \begin{align} \sigma &= \sqrt{\dfrac{\sum(x_i-\mu)^2}{N}} \\ &= \sqrt{\frac{570.8}{5}} \\ &\approx 10.68. \end{align}\]
So the standard deviation is \(10.68\) years.
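As a quick check, Python's standard library can compute this result directly. The sketch below uses `statistics.pstdev`, which divides by \(N\) just like the formula used here. The related function `statistics.stdev` divides by \(N-1\) instead and would give a different answer.

```python
import statistics

ages = [44, 35, 27, 56, 52]

mu = statistics.fmean(ages)      # mean of the ages
sigma = statistics.pstdev(ages)  # standard deviation, dividing by N

print(round(mu, 1), round(sigma, 2))  # 42.8 10.68
```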
Standard Deviation - Key takeaways
- Standard deviation is a measure of dispersion, or how far away the values in a data set are from the mean.
- The symbol for standard deviation is sigma, \(\sigma\).
- The equation for standard deviation is \[ \sigma = \sqrt{\dfrac{\sum(x_i-\mu)^2}{N}} \]
- The variance is equal to \(\sigma^2\).
- Standard deviation is especially informative for data sets that follow a normal distribution.
- The graph for a normal distribution is bell-shaped.
- In a data set that follows a normal distribution, \(68.2\%\) of values fall within \(\pm \sigma\) of the mean.
Images
Standard deviation graph: https://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg