In this article, you'll find the definition of sampling distributions, types of sampling distributions, the formulas, the mean and the standard deviation of sampling distributions, and examples of application.
Introduction to Sampling Distributions
Coming back to the example above, let's say you randomly select \(100\) senior students and calculate the average GPA from this sample. This average GPA would probably not be the same as the mean GPA of all senior students in Atlanta. It could be lower or higher, but it would most likely not be exactly equal to the population mean.
If you select a second sample of \(100\) senior students, the average GPA for this sample would most likely differ from the mean of your first one. Thus, different random samples produce different mean values. Despite this variety, when you obtain many sample means and plot them on a graph, the center of the plotted means provides an estimate of the mean of the entire population. This process is how a sampling distribution of the mean is built.
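The repeated-sampling idea can be sketched with a short simulation. The population below is entirely hypothetical (20,000 made-up GPA values), and the names `population`, `sample_mean`, and `sample_means` are introduced only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# ASSUMPTION: a made-up population of 20,000 GPAs, clipped to the 0.0-4.0 scale.
population = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in range(20_000)]
population_mean = statistics.fmean(population)

def sample_mean(pop, n, rng):
    """Mean of one simple random sample of size n, drawn without replacement."""
    return statistics.fmean(rng.sample(pop, n))

# Draw 2,000 samples of 100 students each and record every sample mean.
sample_means = [sample_mean(population, 100, rng) for _ in range(2_000)]

print(f"population mean:        {population_mean:.3f}")
print(f"first two sample means: {sample_means[0]:.3f}, {sample_means[1]:.3f}")
print(f"mean of sample means:   {statistics.fmean(sample_means):.3f}")
```

Individual sample means differ from one another, while their average should land very close to the population mean.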
Definition of Sampling Distributions
A value that is calculated from the data in a sample is called a statistic. Statistics allow you to estimate characteristics of an entire population. As you saw in the example above, different random samples can give different values for a statistic; this variation is called sampling variability (or sampling error). Sampling variability can be reduced by increasing the sample size.
The distribution formed by all the possible values of a sample statistic, obtained from every possible sample of a given size, is called the sampling distribution.
Conditions for Sampling Distributions
To ensure that the sampling distribution truly represents the entire population, you must make sure that these two conditions are met:
- Randomization condition: the most important condition for creating a sampling distribution is that your data come from randomly selected samples.
- Independence (\(10\%\) condition): the sampled values must be independent of one another. When sampling without replacement, this condition is treated as satisfied if the sample size is no larger than \(10\%\) of the entire population.
Let's go back to the average GPA example. For the randomization condition, as long as you are not choosing from a biased list, such as a list of the students with the highest GPAs in Atlanta, choosing any \(100\) students at random is enough to satisfy this condition.
On the other hand, for the independence condition, it is reasonable to assume that there are more than \(10\,000\) senior students in Atlanta, and \(10\%\) of \(10\,000\) is \(1\,000\). Any sample size no larger than \(1\,000\) satisfies this condition, so samples of size \(100\) are acceptable.
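The \(10\%\) check is simple enough to write as a one-line helper. This is only a sketch; the function name `satisfies_ten_percent_condition` is made up for this example.

```python
def satisfies_ten_percent_condition(sample_size, population_size):
    """True when the sample is no larger than 10% of the population."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 10 * sample_size <= population_size

print(satisfies_ten_percent_condition(100, 10_000))    # sample of 100 students
print(satisfies_ten_percent_condition(1_500, 10_000))  # a sample that is too large
```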
Types of Sampling Distributions
There are three types of sampling distributions:
- Sampling distribution of proportions
- Sampling distribution of means
- T-distribution
Sampling Distribution of Proportions
This distribution is used to estimate a population proportion. For each sample, you calculate the proportion of successes, that is, the fraction of individuals for which a specific event occurs. The mean of these sample proportions across all samples is an estimate of the proportion of success in the entire population.
Sampling Distribution of Means
This distribution is built by calculating the mean of each sample drawn from a selected population. The average of all the sample means is then an estimate of the mean of the entire population.
T-distribution
This distribution is used when the sample size is small or the population standard deviation is unknown. It is used to estimate the mean of the population and appears in other statistical procedures such as confidence intervals, linear regression, and tests of statistical differences. Since this distribution uses \(t\)-scores to calculate probabilities, it is out of the scope of this article.
Formula for Sampling Distributions
The sample proportion, denoted by \(\widehat{p}\), is calculated by counting how many successes are in the sample (success means that an individual possesses the characteristic of interest) and dividing that count by the total sample size \(n\):
\[\widehat{p}=\frac{\text{number of successes in the sample}}{n}.\]
The sample mean, denoted by \(\overline{x}\), is calculated by adding up all the values obtained from the sample and dividing by the total sample size \(n\). The idea is the same as finding the average for a set of data. The formula is
\[\overline{x}=\frac{x_1+x_2+\dots+x_n}{n},\]
where \(\overline{x}\) is the sample mean, \(x_i\) is each one of the values of the sample, and \(n\) is the sample size.
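As a small illustration of both formulas, the sketch below computes \(\widehat{p}\) and \(\overline{x}\) for a made-up sample of ten students. The data and the variable names (`gpas`, `on_honor_roll`) are hypothetical.

```python
# ASSUMPTION: a hypothetical sample of 10 students. Each student has a GPA
# and a flag saying whether they have the characteristic of interest.
gpas = [3.1, 2.8, 3.6, 3.9, 2.5, 3.3, 3.0, 3.7, 2.9, 3.2]
on_honor_roll = [False, False, True, True, False, False, False, True, False, False]

n = len(gpas)
p_hat = sum(on_honor_roll) / n  # number of successes divided by sample size
x_bar = sum(gpas) / n           # sum of the values divided by sample size

print(f"sample proportion p-hat = {p_hat:.2f}")
print(f"sample mean x-bar       = {x_bar:.2f}")
```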
Mean and Standard Deviation of Sampling Distributions
All probability distributions have characteristics that distinguish them. Sampling distributions are no exception: knowing the mean and standard deviation tells you a lot about the center and spread of the distribution.
Mean and Standard Deviation of the Sample Proportion
Let \(p\) be the proportion of success in a population and \(\widehat{p}\) the sample proportion, that is, the proportion of success in a random sample of size \(n\). Then the sampling distribution of \(\widehat{p}\) has mean and standard deviation given by \[\mu_{\widehat{p}}=p\,\text{ and }\, \sigma_{\widehat{p}}=\sqrt{\frac{p(1-p)}{n}}.\]
Moreover, if \[np\geq 10\,\text{ and }\, n(1-p)\geq 10,\] then the sampling distribution of \(\widehat{p}\) is approximately normal.
A random sample is selected from a population that has a proportion of successes \(p=0.72\). Calculate the mean and standard deviation of the sampling distribution of \(\widehat{p}\) with sample size \(n=20\).
Solution:
Using the formulas stated before, the mean is equal to the proportion of success of the population, so \[\mu_{\widehat{p}}=0.72,\] while the standard deviation is given by \[\sigma_{\widehat{p}} =\sqrt{\frac{0.72(0.28)}{20}}\approx 0.100.\]
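The same calculation can be sketched in a few lines of code. The helper name `proportion_sampling_distribution` is an assumption made for this example; it also reports whether the rule of thumb \(np\geq 10\) and \(n(1-p)\geq 10\) holds.

```python
import math

def proportion_sampling_distribution(p, n):
    """Return (mean, standard deviation, normal_ok) for the sample proportion."""
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    # Rule of thumb from the text: np >= 10 and n(1 - p) >= 10.
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, normal_ok

mean, sd, normal_ok = proportion_sampling_distribution(p=0.72, n=20)
print(f"mean = {mean}, sd = {sd:.3f}, normal approximation reasonable: {normal_ok}")
```

Note that with \(n=20\) the value \(n(1-p)=5.6\) is below \(10\), so the mean and standard deviation formulas still apply but a normal approximation would not be reliable for this particular sample size.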
Mean and Standard Deviation of the Sample Mean
Let \(\mu\) be the mean and \(\sigma\) the standard deviation of the population. Let \(\overline{x}\) be the sample mean of a random sample of size \(n\). Then the sampling distribution of \(\overline{x}\) has mean and standard deviation given by \[\mu_{\overline{x}}=\mu\,\text{ and }\, \sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}.\]
The standard deviation of the sampling distribution of means is also known as the standard error of the mean (SEM).
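If the sample size \(n\) is large enough, the Central Limit Theorem tells you that the sampling distribution of \(\overline{x}\) is approximately normal, whatever the shape of the population distribution (provided it has a finite standard deviation). A common rule of thumb is that \(n\geq 30\) is enough.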
A random sample is selected from a population with mean \(\mu=80\) and standard deviation \(\sigma=5\). Calculate the mean and standard deviation of the sampling distribution of \(\overline{x}\) with sample size \(n=35\).
Solution:
Using the formulas stated before, the mean of the sampling distribution of \(\overline{x}\) is equal to the mean of the population, so \[\mu_{\overline{x}}=80.\] The standard deviation of the sample mean is
\[\sigma_{\overline{x}}=\frac{5}{\sqrt{35}}\approx 0.845.\]
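A minimal sketch of the standard error calculation follows. The function name `standard_error_of_mean` is made up here, and the loop uses illustrative sample sizes to show how the spread shrinks as \(n\) grows.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

sem = standard_error_of_mean(sigma=5, n=35)
print(f"SEM for n = 35: {sem:.3f}")

# Quadrupling the sample size halves the standard error.
for n in (35, 140, 560):
    print(n, round(standard_error_of_mean(5, n), 3))
```

Because \(n\) sits under a square root, halving the standard error takes four times as many observations.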
Examples of Sampling Distributions
Let's see an example using sampling distributions.
A restaurant stated \(30\%\) of their customers like pineapple on their pizza. If there are \(100\) customers on a given day, what is the probability that at least \(40\%\) of these customers will buy a pizza with pineapple?
Solution:
(1) Note that \(p=0.30\), \((1-p)=0.70\) and the sample size is \(n=100\). Thus, the mean \(\mu_\widehat{p}=0.30\) and the standard deviation \[\sigma_{\widehat{p}}=\sqrt{\frac{(0.30)(0.70)}{100}}\approx 0.046.\]
(2) Since \(np=100(0.30)=30>10\) and \(n(1-p)=100(0.70)=70>10\), the sampling distribution of \(\widehat{p}\) is approximately normal, and you can use this to calculate the probability.
(3) Converting \(\widehat{p}\) into a \(z\)-score (see the article \(z\)-scores for more details), you will have
\[\begin{align} P(\widehat{p}\geq 0.40) &= P\left(z\geq\frac{0.40-0.30}{0.046}\right) \\ &=P(z\geq 2.17) \\ & =1-P(z<2.17) \\ &= 1-0.9850 \\ &=0.015.\end{align}\]
Thus, the probability that at least \(40\%\) of these customers ask for a pizza with pineapple is \(0.015\).
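The table lookup can be cross-checked with Python's standard library, which provides `statistics.NormalDist`. This is a sketch under the same normal approximation; the variable names are illustrative. It uses the unrounded standard deviation, so the result differs slightly from the hand calculation in the later decimal places.

```python
import math
from statistics import NormalDist

p, n = 0.30, 100
sd = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat

# Model p-hat as Normal(p, sd) and take the upper tail starting at 0.40.
p_hat_dist = NormalDist(mu=p, sigma=sd)
prob_at_least_40 = 1 - p_hat_dist.cdf(0.40)
z = (0.40 - p) / sd

print(f"z-score: {z:.2f}")
print(f"P(p-hat >= 0.40) is about {prob_at_least_40:.3f}")
```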
Let's see one extra example.
A company claims that the average lifetime of their lightbulbs is \(2\,000\) hours with a standard deviation of \(300\) hours. What is the probability that a random sample of \(50\) lightbulbs has an average lifetime of less than \(1\,900\) hours?
Solution:
(1) Since the sample size is \(n=50\), according to the Central Limit Theorem, the sampling distribution of the mean \(\overline{x}\) is approximately normal with mean \(\mu_{\overline{x}}=2\,000\) and standard deviation \[\sigma_{\overline{x}}=\frac{300}{\sqrt{50}} \approx 42.426. \]
(2) Converting the \(\overline{x}\) into \(z\)-scores and using the standard normal table (see the article Standard Normal Distribution for more information), you will have
\[\begin{align} P(\overline{x}<1\,900) &=P\left(z<\frac{1\,900-2\,000}{42.426}\right) \\ &=P(z<-2.36) \\ &= 0.0091. \end{align}\]
Thus, the probability that a sample of \(n=50\) lightbulbs has an average lifetime of less than \(1\,900\) hours is \(0.0091\).
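One way to sanity-check a result like this is to simulate many samples and count how often the sample mean falls below \(1\,900\) hours. The sketch below does that and compares the count with the normal-theory value. It assumes that individual lifetimes are normally distributed, which the problem statement does not say, and it uses the unrounded standard error, so expect both numbers to be close to, but not exactly equal to, the table-based answer.

```python
import math
import random
from statistics import NormalDist, fmean

mu, sigma, n = 2_000, 300, 50

# Normal-theory answer: x-bar is approximately Normal(mu, sigma / sqrt(n)).
theoretical = NormalDist(mu, sigma / math.sqrt(n)).cdf(1_900)

# Simulation check. ASSUMPTION: individual lifetimes are normally
# distributed; this is chosen only to make the simulation concrete.
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
below = sum(
    fmean(rng.gauss(mu, sigma) for _ in range(n)) < 1_900
    for _ in range(trials)
)
simulated = below / trials

print(f"normal-theory probability: {theoretical:.4f}")
print(f"simulated probability:     {simulated:.4f}")
```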
Sampling Distribution - Key takeaways
- A sampling distribution shows every possible value of a statistic that can be obtained from every possible sample of a given size from the population.
- The sampling distribution of the proportion \(\widehat{p}\) has mean and standard deviation \[\mu_{\widehat{p}}=p\, \text{ and } \,\sigma_{\widehat{p}}=\sqrt{\frac{p(1-p)}{n}}.\]
- When \(np\geq 10\) and \(n(1-p)\geq 10,\) the sampling distribution of the proportion \(\widehat{p}\) is approximately normal.
- The sampling distribution of the mean \(\overline{x}\) has mean and standard deviation \[\mu_{\overline{x}}=\mu\,\text{ and }\, \sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}.\]
- When \(n\geq 30\) (a common rule of thumb), the Central Limit Theorem implies that the sampling distribution of the mean \(\overline{x}\) is approximately normal.