This process is called taking a sample mean. In this article you will find the definition of a sample mean, how to calculate it, its standard deviation and variance, the sampling distribution of the mean, and examples.
Definition of Sample Means
The mean of a set of numbers is just the average, that is, the sum of all the elements in the set divided by the number of elements in the set.
The sample mean is the average of the values obtained in the sample.
Because different samples usually contain different values, two samples taken from the same population will most likely have different means.
Calculation of Sample Means
The sample mean is denoted by \(\overline{x}\), and is calculated by adding up all the values obtained from the sample and dividing by the total sample size \(n\). The process is the same as averaging a data set. Therefore, the formula is \[\overline{x}=\frac{x_1+\ldots+x_n}{n},\]
where \(\overline{x}\) is the sample mean, \(x_i\) is each element in the sample and \(n\) is the sample size.
Let's go back to the San Francisco example. Suppose you asked \(5\) of your acquaintances how much they spend on public transport per week, and they said \(\$20\), \(\$25\), \(\$27\), \(\$43\), and \(\$50\). So, the sample mean is calculated by:
\[\overline{x}=\frac{20+25+27+43+50}{5}=\frac{165}{5}=33.\]
Therefore, for this sample, the average amount spent on public transportation in a week is \(\$33\).
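As a quick check, the same arithmetic can be written in a few lines of Python. This is a minimal sketch; `sample_mean` is a hypothetical helper name chosen for illustration, not a library function.

```python
def sample_mean(values):
    """Return x-bar = (x_1 + ... + x_n) / n for a non-empty sample."""
    if not values:
        raise ValueError("the sample must contain at least one value")
    return sum(values) / len(values)

# Weekly public-transport spending (in dollars) reported by the 5 acquaintances
spending = [20, 25, 27, 43, 50]
print(sample_mean(spending))  # 33.0
```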
Standard Deviation and Variance of the Sample Mean
Since the variance is the square of the standard deviation, it is enough to find the standard deviation; squaring it gives the variance. To calculate it, two cases must be considered:
1. You know the population standard deviation.
2. You do not know the population standard deviation.
The following section shows how to calculate this value for each case.
The Mean and Standard Deviation Formula for Sample Means
The mean of the sample mean, denoted by \(\mu_\overline{x}\), equals the population mean; that is, if \(\mu\) is the population mean, then \[\mu_\overline{x}=\mu.\]
To calculate the standard deviation of the sample mean, also called the standard error of the mean (SEM) and denoted by \(\sigma_\overline{x}\), the two cases above must be considered. Let's explore them in turn.
Calculating the Sample Mean Standard Deviation using the Population Standard Deviation
If the sample of size \(n\) is drawn from a population whose standard deviation \(\sigma\) is known, then the standard deviation of the sample mean will be given by \[\sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}.\]
A sample of \(81\) people is taken from a population with standard deviation \(45\). What is the standard deviation of the sample mean?
Solution:
Using the formula stated before, the standard deviation of the sample mean is \[\sigma_\overline{x}=\frac{45}{\sqrt{81}}=\frac{45}{9}=5.\]
Note that to calculate this, you do not need to know anything about the sample besides its size.
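A minimal Python sketch of the formula \(\sigma_\overline{x}=\sigma/\sqrt{n}\), assuming the population standard deviation \(\sigma\) is known. The helper name `standard_error_known_sigma` is hypothetical.

```python
import math

def standard_error_known_sigma(sigma, n):
    """Standard deviation of the sample mean, sigma / sqrt(n), for a known sigma."""
    if n <= 0:
        raise ValueError("the sample size n must be positive")
    return sigma / math.sqrt(n)

print(standard_error_known_sigma(45, 81))  # 5.0
```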
Calculating the Sample Mean Standard Deviation without using the Population Standard Deviation
Sometimes, when you want to estimate the mean of a population, the only information you have is the data from your sample. Fortunately, if the sample is large enough (as a rule of thumb, \(n>30\)), the standard deviation of the sample mean can be approximated using the sample standard deviation. Thus, for a sample of size \(n\), the standard deviation of the sample mean is \[\sigma_\overline{x}\approx\frac{s}{\sqrt{n}},\] where \(s\) is the sample standard deviation (see the article Standard Deviation for more information), calculated by:
\[s=\sqrt{\frac{(x_1-\overline{x})^2+\ldots+(x_n-\overline{x})^2}{n-1}},\]
where \(x_i\) is each element in the sample and \(\overline{x}\) is the sample mean.
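The sketch below computes \(s\) with the \(n-1\) denominator and the approximation \(s/\sqrt{n}\), and cross-checks \(s\) against Python's `statistics.stdev`, which also divides by \(n-1\). The helper names are hypothetical. The five San Francisco values are reused only to exercise the formulas; with \(n=5\) the sample is far below the size at which the approximation is reliable.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

def estimated_standard_error(values):
    """Approximate sigma_x-bar by s / sqrt(n) when sigma is unknown."""
    return sample_sd(values) / math.sqrt(len(values))

spending = [20, 25, 27, 43, 50]
s = sample_sd(spending)
print(round(s, 2))                                   # 12.83
print(math.isclose(s, statistics.stdev(spending)))   # True
print(round(estimated_standard_error(spending), 2))  # 5.74
```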
❗❗ The sample standard deviation measures the dispersion of the data within a single sample, while the standard deviation of the sample mean measures the dispersion of the means of different samples.
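To see this difference concretely, the following simulation draws many samples from a hypothetical normal population (mean \(33\), standard deviation \(12\); both values are assumptions chosen for illustration) and compares the spread inside one sample with the spread of the sample means. With \(n=36\), the second should be close to \(12/\sqrt{36}=2\).

```python
import random
import statistics

# Hypothetical population parameters (illustrative assumptions)
MU, SIGMA = 33.0, 12.0
N = 36              # size of each sample
NUM_SAMPLES = 5000  # number of samples drawn

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [[rng.gauss(MU, SIGMA) for _ in range(N)] for _ in range(NUM_SAMPLES)]

# Dispersion of the data within a single sample: should be near SIGMA
within = statistics.stdev(samples[0])

# Dispersion between the means of different samples: should be near SIGMA / sqrt(N)
sample_means = [statistics.fmean(sample) for sample in samples]
between = statistics.stdev(sample_means)

print(f"SD within one sample:   {within:.2f}")
print(f"SD of the sample means: {between:.2f} (theory: {SIGMA / N ** 0.5:.2f})")
```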
Sampling Distribution of the Mean
Recall the definition of a sampling distribution.
The distribution of the sample mean (or sampling distribution of the mean) is the distribution of the means of all possible samples of a fixed size drawn from a population.
If \(\overline{x}\) is the sample mean of a sample of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\), then the sampling distribution of \(\overline{x}\) has mean and standard deviation given by \[\mu_\overline{x}=\mu\,\text{ and }\,\sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}.\]
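Furthermore, if the population is normally distributed, the sampling distribution of \(\overline{x}\) is also normal. If the population is not normal but the sample size is large enough, the Central Limit Theorem guarantees that the sampling distribution of \(\overline{x}\) is approximately normal; a common rule of thumb is that \(n\geq 30\) is enough.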
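The Central Limit Theorem can be illustrated, though not proved, with a simulation. The sketch below assumes a strongly skewed exponential population with \(\mu=\sigma=1\), takes 10,000 samples of size \(n=30\), and checks that the sample means have a mean close to \(\mu\), a standard deviation close to \(\sigma/\sqrt{n}\), and that roughly 95% of their \(z\)-scores fall within \(\pm1.96\), as they would for a normal distribution. The population, sample counts, and seed are our own choices.

```python
import math
import random
import statistics

# Hypothetical skewed population: exponential with rate 1,
# whose mean and standard deviation are both 1.
MU, SIGMA = 1.0, 1.0
N = 30               # sample size
NUM_SAMPLES = 10_000

rng = random.Random(42)  # fixed seed for reproducibility
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(N))
         for _ in range(NUM_SAMPLES)]

se = SIGMA / math.sqrt(N)  # theoretical sigma_x-bar
# Share of standardized means inside (-1.96, 1.96); about 0.95 for a normal distribution
share_within = sum(abs((m - MU) / se) < 1.96 for m in means) / NUM_SAMPLES

print(f"mean of sample means:  {statistics.fmean(means):.3f} (mu = {MU})")
print(f"SD of sample means:    {statistics.stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share with |z| < 1.96: {share_within:.3f}")
```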
When the sampling distribution is normal, you can calculate probabilities using the standard normal distribution table. To do so, convert the sample mean \(\overline{x}\) into a \(z\)-score using the formula
\[z=\frac{\overline{x}-\mu_\overline{x}}{\sigma_\overline{x}}=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}.\]
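The conversion is a one-line function in Python. This is a sketch with a hypothetical helper `z_score`; the numbers in the usage line (\(\overline{x}=105\), \(\mu=100\), \(\sigma=15\), \(n=36\)) are made up for illustration, giving \(\sigma_\overline{x}=15/6=2.5\) and \(z=5/2.5=2\).

```python
import math

def z_score(x_bar, mu, sigma, n):
    """Standardize a sample mean: z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # sigma_x-bar
    return (x_bar - mu) / standard_error

print(z_score(105, 100, 15, 36))  # 2.0
```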
You may be wondering, what happens when the population distribution is not normal and the sample size is small? Unfortunately, for those cases, there is no general way to obtain the shape of the sampling distribution.
Let's see an example of a graph of a sampling distribution of the mean.
Going back to the example of public transportation in San Francisco, suppose you had managed to survey thousands of people, split them into groups of \(10\), computed the mean of each group, and plotted the results in the following graph.
Figure 1. Relative frequency histogram of 360 sample means for the public transport example
This graph approximates the sampling distribution of the mean. Based on the graph, you can estimate that about \(\$37\) per week is spent on public transportation in San Francisco on average.
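The grouping procedure described above can be sketched as follows. The survey responses here are simulated from a uniform distribution between \(\$10\) and \(\$60\) purely as a hypothetical stand-in, since the data behind Figure 1 are not given, and `group_means` is a hypothetical helper. Dividing 3,600 responses into groups of 10 yields 360 sample means, the same count as in Figure 1.

```python
import random
import statistics
from collections import Counter

def group_means(values, group_size):
    """Average consecutive groups of `group_size` values.

    Values left over at the end that do not fill a whole group are dropped.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    usable = len(values) - len(values) % group_size
    return [statistics.fmean(values[i:i + group_size])
            for i in range(0, usable, group_size)]

# Hypothetical stand-in for the survey: 3,600 weekly spending amounts in dollars
rng = random.Random(1)
responses = [rng.uniform(10, 60) for _ in range(3600)]

means = group_means(responses, 10)
print(len(means))  # 360

# Relative frequency of the sample means in $5-wide bins (a text version of a histogram)
counts = Counter(5 * int(m // 5) for m in means)
for left in sorted(counts):
    print(f"[{left}, {left + 5}): {counts[left] / len(means):.3f}")
```

Because every group has the same size, the mean of the 360 group means equals the mean of all 3,600 responses, up to floating-point rounding.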
Examples of Sample Means
Let's see an example of how to calculate probabilities.
Assume that human body temperature is distributed with a mean of \(98.6\, °F\) and a standard deviation of \(2\, °F\). If a sample of \(49\) people is taken at random, calculate the following probabilities:
(a) the average temperature of the sample is less than \(98\), that is, \(P(\overline{x}<98)\).
(b) the average temperature of the sample is greater than \(99\), that is, \(P(\overline{x}>99)\).
(c) the average temperature is between \(98\) and \(99\), that is, \(P(98<\overline{x}<99)\).
Solution:
1. Since the sample size is \(n=49>30\), you can assume the sampling distribution is approximately normal.
2. Calculate the mean and the standard deviation of the sample mean. Using the formulas above, \(\mu_\overline{x}=98.6\) and \(\sigma_\overline{x}=2/\sqrt{49}=2/7\).
3. Convert the values into \(z\)-scores and use the standard normal table (see the article Standard Normal Distribution for more information). For (a) you'll have:
\[\begin{align} P(\overline{x}<98) &=P\left(z<\frac{98-98.6}{\frac{2}{7}}\right) \\ &= P(z<-2.1) \\ &=0.0179. \end{align}\]
For (b) you'll have:
\[\begin{align} P(\overline{x}>99) &=P\left(z>\frac{99-98.6}{\frac{2}{7}}\right) \\ &= P(z>1.4) \\ &=1-P(z<1.4) \\ &=1-0.9192 \\ &= 0.0808. \end{align}\]
Finally, for (c):
\[\begin{align} P(98<\overline{x}<99) &=P(\overline{x}<99)-P(\overline{x}<98) \\ &= P(z<1.4)-P(z<-2.1) \\ &= 0.9192-0.0179 \\ &=0.9013. \end{align}\]
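The table lookups can be replaced by Python's standard-library `statistics.NormalDist` (available since Python 3.8), which models the sampling distribution directly with mean \(98.6\) and standard deviation \(2/7\). This is a sketch for checking the arithmetic. Part (c) comes out as \(0.9014\) rather than \(0.9013\) because the table values are rounded to four decimals before subtracting.

```python
from statistics import NormalDist

MU, SIGMA, N = 98.6, 2.0, 49
# Sampling distribution of x-bar: mean MU, standard deviation SIGMA / sqrt(N) = 2/7
sampling_dist = NormalDist(mu=MU, sigma=SIGMA / N ** 0.5)

p_a = sampling_dist.cdf(98)                          # P(x_bar < 98)
p_b = 1 - sampling_dist.cdf(99)                      # P(x_bar > 99)
p_c = sampling_dist.cdf(99) - sampling_dist.cdf(98)  # P(98 < x_bar < 99)

print(f"(a) {p_a:.4f}")  # 0.0179
print(f"(b) {p_b:.4f}")  # 0.0808
print(f"(c) {p_c:.4f}")  # 0.9014
```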
Sample Mean - Key takeaways
- The sample mean allows you to estimate the population mean.
- The sample mean \(\overline{x}\) is calculated as an average, that is, \[\overline{x}=\frac{x_1+\ldots+x_n}{n},\] where \(x_i\) is each element in the sample and \(n\) is the sample size.
- The sampling distribution of the mean \(\overline{x}\) has mean and standard deviation given by \[\mu_\overline{x}=\mu\,\text{ and }\,\sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}.\]
- By the Central Limit Theorem, when the sample size is large (as a rule of thumb, greater than \(30\)), the sampling distribution of the mean is approximately normal.
How do we ensure our content is accurate and trustworthy?
At StudySmarter, we have created a learning platform that serves millions of students. Meet the people who work hard to deliver fact-based content and make sure it is verified.
Content Creation Process:
Lily Hulatt is a Digital Content Specialist with over three years of experience in content strategy and curriculum design. She gained her PhD in English Literature from Durham University in 2022, taught in Durham University’s English Studies Department, and has contributed to a number of publications. Lily specialises in English Literature, English Language, History, and Philosophy.
Content Quality Monitored by:
Gabriel Freitas is an AI Engineer with solid experience in software development, machine learning algorithms, and generative AI, including large language model (LLM) applications. A graduate in Electrical Engineering from the University of São Paulo, he is currently pursuing an MSc in Computer Engineering at the University of Campinas, specializing in machine learning topics. Gabriel has a strong background in software engineering and has worked on projects involving computer vision, embedded AI, and LLM applications.