In this article, you will find the definition of sample proportions, the symbol, formulas for sample proportions, their importance, and examples of application.
Definition of Sample Proportions
In the example above, the percentages obtained represent the percentage of people in a group who have a characteristic of interest, in this case, fear of heights. This type of percentage corresponds to a proportion.
A sample proportion is the proportion of individuals in a sample who have a particular characteristic of interest.
Symbol of the Sample Proportion
While the population proportion is denoted by \(p\), the sample proportion is denoted by \(\widehat{p}\). It is calculated by counting the successes in the sample (a success means that an individual possesses the characteristic of interest) and dividing that count by the total sample size \(n\):
\[\widehat{p}=\frac{\text{number of successes in the sample}}{n}.\]
Usually, the sample proportion \(\widehat{p}\) is different from the population proportion \(p\).
Understanding the Sample Proportion
Let's say you have a bag of \(40\) gummies, where \(20\) of them are sour and \(20\) are sweet. Let's assign a number to each gummy to make them easier to identify.
Sour gummies | \(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,\)\(11, 12, 13, 14, 15, 16, 17, 18, 19, 20\) |
Sweet gummies | \(21, 22, 23, 24, 25, 26, 27, 28, 29, 30,\)\(31, 32, 33, 34, 35, 36, 37, 38, 39, 40\) |
Table 1. data example, sample proportions.
Suppose that you don't know the actual proportion of each flavor in the bag, and you are interested in how many sweet gummies are in the bag. You decide to take a small sample of size \(4\), and you end up choosing the gummies \(1, 13, 14, 29\). Here, success means the gummy is sweet, and only gummy \(29\) is sweet, so the sample proportion is:
\[\widehat{p}=\frac{1}{4}=0.25\]
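The same calculation can be sketched in Python. This is only an illustration: the name `sample_proportion` and the set `SWEET` are assumptions made here, with gummies \(21\) to \(40\) treated as sweet, as in Table 1.

```python
# Assumption for illustration: gummies numbered 21-40 are the sweet ones (Table 1).
SWEET = set(range(21, 41))

def sample_proportion(sample, successes=SWEET):
    """Return the fraction of items in `sample` that count as successes."""
    return sum(1 for item in sample if item in successes) / len(sample)

p_hat = sample_proportion([1, 13, 14, 29])
print(p_hat)  # 0.25
```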
Let's take more samples and see what happens.
Sample | Gummies selected | \(\widehat{p}\) | Sample | Gummies selected | \(\widehat{p}\) |
\(1\) | \(1, 13, 14, 29\) | \(0.25\) | \(7\) | \(3, 26, 27, 38\) | \(0.75\) |
\(2\) | \(11, 12, 13, 14\) | \(0\) | \(8\) | \(4, 26, 38, 39\) | \(0.75\) |
\(3\) | \(1, 2, 26, 37\) | \(0.5\) | \(9\) | \(15, 26, 27, 38\) | \(0.75\) |
\(4\) | \(2, 14, 26, 38\) | \(0.5\) | \(10\) | \(5, 26, 37, 39\) | \(0.75\) |
\(5\) | \(2, 13, 15, 28\) | \(0.25\) | \(11\) | \(26, 27, 28, 29\) | \(1\) |
\(6\) | \(3, 4, 15, 36\) | \(0.25\) | \(12\) | \(26, 37, 38, 40\) | \(1\) |
Table 2. data example, sample proportions.
As you can see, different samples can give you different sample proportions.
Figure 1. Histogram with the frequency of sample proportions of sweet gummies
By plotting the frequencies of each sample proportion, it is easier to see the behavior of the sample proportion \(\widehat{p}\).
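As a rough sketch, the frequencies behind such a histogram can be tallied from the twelve samples in Table 2. The variable names are illustrative assumptions, and the text bars only stand in for a real plot.

```python
from collections import Counter

# Assumption for illustration: gummies numbered 21-40 are the sweet ones (Table 1).
SWEET = set(range(21, 41))

# The twelve samples listed in Table 2.
samples = [
    [1, 13, 14, 29], [11, 12, 13, 14], [1, 2, 26, 37], [2, 14, 26, 38],
    [2, 13, 15, 28], [3, 4, 15, 36], [3, 26, 27, 38], [4, 26, 38, 39],
    [15, 26, 27, 38], [5, 26, 37, 39], [26, 27, 28, 29], [26, 37, 38, 40],
]

# Sample proportion of sweet gummies for each sample.
proportions = [sum(g in SWEET for g in s) / len(s) for s in samples]

# Count how often each value of the sample proportion occurs.
frequencies = Counter(proportions)
for value in sorted(frequencies):
    print(f"{value:.2f}: {'#' * frequencies[value]}")
```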
Importance of Sample Proportions
When you want to know what proportion of individuals or objects in an entire population possess a particular characteristic of interest, it is sometimes very time-consuming or even impossible to collect all the data.
The idea behind taking sample proportions is that based on this information, you can infer what the proportion of the entire population would be. For this, you'll need to know the sampling distribution of the proportion.
Going back to the gummies example, the graph in Figure 1 approximates the distribution of the sample proportion \(\widehat{p}\). If you want to get the actual graph of the distribution of the sample proportion \(\widehat{p}\), you would have to consider all possible samples of gummies of size \(4\)!
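For a bag this small, the full distribution can be obtained by counting rather than by listing every sample. The sketch below assumes a sample is a set of \(4\) distinct gummies drawn without replacement, and counts how many such sets contain \(k\) sweet gummies; the variable names are illustrative.

```python
from math import comb

N_SOUR, N_SWEET, SAMPLE_SIZE = 20, 20, 4

# Number of distinct samples of size 4 from 40 gummies.
total = comb(N_SOUR + N_SWEET, SAMPLE_SIZE)

# Probability of each possible sample proportion k/4 (k sweet gummies in the sample).
distribution = {}
for k in range(SAMPLE_SIZE + 1):
    ways = comb(N_SWEET, k) * comb(N_SOUR, SAMPLE_SIZE - k)
    distribution[k / SAMPLE_SIZE] = ways / total

print(total)  # 91390
for p_hat, prob in distribution.items():
    print(f"p_hat = {p_hat:.2f}: probability {prob:.4f}")
```

Under these assumptions there are \(91{,}390\) possible samples, which is why a handful of samples such as those in Table 2 gives only a rough picture of the distribution.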
Conditions for the Sampling Distribution of Proportions
For the sampling distribution of the proportion \(\widehat{p}\) to truly estimate the population proportion \(p\), you must make sure that the following conditions are satisfied:
1. Randomization condition: the most important condition for creating a sampling distribution is that your data come from randomly selected samples.
2. Independence (\(10\%\) condition): the sampled values must be independent of one another. When sampling without replacement, this is reasonable as long as the sample size is no larger than \(10\%\) of the entire population.
Again, for the gummies example, you can choose the gummies randomly (for instance, by taking a gummy without looking into the bag, or by writing the numbers \(1\) to \(40\) on pieces of paper and drawing them at random). The chosen sample size also satisfies the \(10\%\) condition, because \(4\) is exactly \(10\%\) of the \(40\) gummies in the bag.
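The \(10\%\) condition is easy to check mechanically. The helper below is a minimal sketch; the function name `within_ten_percent` is an assumption made for this illustration.

```python
def within_ten_percent(sample_size, population_size):
    """True when the sample is no larger than 10% of the population."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 10 * sample_size <= population_size

print(within_ten_percent(4, 40))  # True: 4 is exactly 10% of 40
print(within_ten_percent(5, 40))  # False: 5 is more than 10% of 40
```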
The Mean and Standard Deviation Formula for Sample Proportions
Let \(p\) be the proportion of success in a population and \(\widehat{p}\) the sample proportion, that is, the proportion of success in a random sample of size \(n\). Then, the mean and the standard deviation of the sampling distribution of \(\widehat{p}\) can be calculated by
\[\mu_{\widehat{p}}=p\,\text{ and }\, \sigma_{\widehat{p}}=\sqrt{\frac{p(1-p)}{n}}.\]
Furthermore, when \(np\geq 10\) and \(n(1-p)\geq 10\), the distribution of the sample proportion \(\widehat{p}\) is approximately normal.
Thus, when the normality condition is satisfied, you can convert any sample proportion \(\widehat{p}\) into a \(z\)-score (see the article \(z\)-score for more information) using the formula
\[ z=\frac{\widehat{p}-\mu_{\widehat{p}}}{\sigma_{\widehat{p}}}=\frac{\widehat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}.\]
The proportion of success in a population is \(p=0.35\). Find the mean and the standard deviation of the sample proportion \(\widehat{p}\) obtained from random samples of size \(n=70\).
Solution:
Using the formulas stated above, the mean is equal to
\[\mu_{\widehat{p}}=0.35,\]
and the standard deviation
\[\sigma_{\widehat{p}}=\sqrt{\frac{(0.35)(0.65)}{70}}\approx 0.057.\]
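The formulas for the mean, standard deviation, and \(z\)-score translate directly into code. The following is a sketch only, and the function names `sampling_mean_sd` and `z_score` are assumptions made for this illustration.

```python
from math import sqrt

def sampling_mean_sd(p, n):
    """Mean and standard deviation of the sampling distribution of p-hat."""
    return p, sqrt(p * (1 - p) / n)

def z_score(p_hat, p, n):
    """Standardize a sample proportion using the formulas above."""
    mean, sd = sampling_mean_sd(p, n)
    return (p_hat - mean) / sd

mean, sd = sampling_mean_sd(0.35, 70)
print(mean, round(sd, 3))  # 0.35 0.057
```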
Examples of Sample Proportions
Let's see an example of how to calculate probabilities of the distribution of a sample proportion.
A company claims that only \(10\%\) of the products they manufacture are defective. A quality inspector took a random sample of size \(200\).
(a) What is the probability that at most \(12\%\) of them are defective?
(b) What is the probability that between \(9\%\) and \(11\%\) of them are defective?
Solution:
(1) Since
\[np=200(0.10)=20>10\]
and
\[n(1-p)=200(0.90)=180>10,\]
the sampling distribution of \(\widehat{p}\) is approximately normal. So you can use the properties of the normal distribution (see the article Normal Distribution for more information) to calculate these probabilities.
(2) Let's calculate the mean and standard deviation of the sample proportion \(\widehat{p}\). Using the formulas given before,
\[\mu_{\widehat{p}}=0.10\]
and
\[\sigma_{\widehat{p}}=\sqrt{\frac{(0.10)(0.90)}{200}}\approx 0.021.\]
(3) Converting the values into \(z\)-scores: for (a) you'll have
\[ \begin{align}P(\widehat{p}<0.12) &= P\left(z<\frac{0.12-0.10}{0.021}\right) \\ &= P(z<0.95) \\ &=0.8289. \end{align} \]
And for (b) you'll have
\[ \begin{align}P(0.09<\widehat{p}<0.11) &= P\left(\frac{0.09-0.10}{0.021}<z<\frac{0.11-0.10}{0.021}\right) \\&= P(-0.48<z<0.48) \\ &= P(z<0.48)-P(z<-0.48) \\ &=0.6844-0.3156 \\ &=0.3688.\end{align} \]
Thus, the probability that at most \(12\%\) of them are defective is \(0.8289\), and the probability that there are \(9\%\) to \(11\%\) defectives is \(0.3688\).
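These probabilities can also be checked numerically with the normal approximation, for instance using `NormalDist` from Python's standard `statistics` module. This is a sketch rather than the method used above: it keeps the standard deviation and \(z\)-scores unrounded, so the results come out slightly lower than the table-based values.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.10, 200

# Standard deviation of the sampling distribution, kept unrounded.
sd = sqrt(p * (1 - p) / n)

# Normal approximation to the sampling distribution of p-hat.
dist = NormalDist(mu=p, sigma=sd)

prob_a = dist.cdf(0.12)                   # (a) at most 12% defective
prob_b = dist.cdf(0.11) - dist.cdf(0.09)  # (b) between 9% and 11% defective

print(f"(a) {prob_a:.4f}")
print(f"(b) {prob_b:.4f}")
```

The small gap from \(0.8289\) and \(0.3688\) comes only from rounding \(\sigma_{\widehat{p}}\) to \(0.021\) and the \(z\)-scores to two decimals in the hand calculation.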
Sample Proportion - Key takeaways
- The aim of taking sample proportions is to estimate the population proportion.
- The sample proportion is denoted by \(\widehat{p}\).
- The formula for calculating the mean and standard deviation of the sampling distribution of the proportion \(\widehat{p}\) is given by\[\mu_{\widehat{p}}=p\,\]and\[ \sigma_{\widehat{p}}=\sqrt{\frac{p(1-p)}{n}}.\]
- When \(np\geq 10\) and \(n(1-p)\geq 10\), the sampling distribution of the proportion \(\widehat{p}\) is approximately normal.