Hypothesis Test for the Difference of Two Population Proportions
Let's start by listing what you know from the example at the start of this article.
Population | Population Proportion | Sample Size | Sample Proportion |
Full-time employees of corporations in your country. | \(p_1 = \) proportion of all full-time employees who put aside at least twelve percent of their earnings in savings. | \(n_1 = 1300\) | \(\hat{p}_1 = 0.40\) |
Part-time employees of corporations in your country. | \(p_2 = \) proportion of all part-time employees who put aside at least twelve percent of their earnings in savings. | \(n_2 = 290\) | \(\hat{p}_2 = 0.38\) |
It is clear looking at the table that the sample sizes are very different, and their sample proportions are different as well. However, it will be very rare for you to find an example where the sample proportions are the same. Why might the sample proportions be different, even if you might eventually be able to conclude that the proportion of people who put aside at least twelve percent of their earnings is the same between part-time and full-time employees?
Differences that occur between two samples just by chance are called sampling variability.
One of the main questions that a hypothesis test for two population proportions tries to answer is whether the difference in your sample proportions happens because of sampling variability or because of an actual difference in the populations.
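Sampling variability is easy to see in a small simulation. The sketch below is only an illustration and is not part of the savings example: it assumes both groups share the same true proportion (the value `0.40`, the seed, and the helper name `sample_proportion` are all assumptions made here) and draws two samples of the sizes shown in the table.

```python
import random

def sample_proportion(true_p, n, rng):
    """Draw n independent yes/no responses and return the share of 'yes'."""
    successes = sum(rng.random() < true_p for _ in range(n))
    return successes / n

rng = random.Random(0)   # fixed seed so the run is repeatable
true_p = 0.40            # assumed common population proportion

p1_hat = sample_proportion(true_p, 1300, rng)
p2_hat = sample_proportion(true_p, 290, rng)

# The two sample proportions usually differ even though true_p is shared.
print(p1_hat, p2_hat, p1_hat - p2_hat)
```

Changing the seed gives a different pair of sample proportions each time, which is exactly the chance variation the hypothesis test has to account for.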
Comparing Two Population Proportions with Dependent Samples
One of the assumptions you will need is that your samples are independent.
Two samples are independent if picking members for one sample doesn't influence how members of the second sample are picked.
In the example involving employees, picking a person who is a full-time employee doesn't influence who you pick as a part-time employee, so they are independent. That is very different from dependent samples.
Two samples are dependent if picking members for one sample automatically determines the members of the second sample.
If you were doing a study on twins then picking a twin for one sample would automatically put the other twin in the second sample. Twins are a common example of dependent samples. This is called matched-pair data, and it requires a different form of hypothesis testing than you will see here.
Forming Your Hypothesis
There are many ways that \(p_1\) can be different from \(p_2\). It might be that \(p_1 < p_2\), or that \(p_1>p_2\). Rather than trying to list all of the ways they are different and doing a hypothesis test for each, you can look at the difference between the two population proportions. In fact, a hypothesis test for two population proportions is often called a hypothesis test for the difference between two population proportions for this very reason!
In this kind of hypothesis test, your null hypothesis will almost always be that the two population proportions are the same. If you state that in terms of their difference you get:
\[ H_0:\; p_1 - p_2 = 0.\]
Then there are three varieties of alternative hypotheses outlined in the next table.
Question | Alternative hypothesis | Test Type |
Is \(p_1\) different from \(p_2\)? | \(H_a:\; p_1 - p_2 \ne 0\) | Two-tailed test. |
Is \(p_1\) smaller than \(p_2\)? | \(H_a:\; p_1 - p_2 < 0\) | Left-tailed test. |
Is \(p_1\) larger than \(p_2\)? | \(H_a:\; p_1 - p_2 > 0\) | Right-tailed test. |
Let's go back to the example from the start of this article.
Your goal here is to figure out if full-time employees and part-time employees have different saving habits, so the hypotheses would be:
\[ \begin{align} &H_0:\; p_1 -p_2 = 0 \\ & H_a: \; p_1-p_2 \ne 0, \end{align} \]
and it would be a two-tailed test.
Next, let's look at the test statistic for this type of hypothesis test.
Significance Test Statistic for Two Population Proportions
It is important that your samples are independent, or the test statistic will be different from the one shown here. The mean of the sampling distribution of \(\hat{p}_1 - \hat{p}_2\) is
\[ \mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2.\]
Because the samples are independent, the standard deviation is
\[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} }.\]
For the savings example, you have that \(n_1 = 1300\), \(n_2 = 290\), \(\hat{p}_1 = 0.40\), and \(\hat{p}_2 = 0.38\). The population proportions \(p_1\) and \(p_2\) are unknown, so the sample proportions are used as estimates. Estimating the mean of the sampling distribution of \(\hat{p}_1 - \hat{p}_2 \) gives you:
\[\begin{align} \mu_{\hat{p}_1 - \hat{p}_2} &= p_1 - p_2 \\ &\approx 0.40 - 0.38 \\ &= 0.02 \end{align}\]
The estimated standard deviation for \(\hat{p}_1 - \hat{p}_2 \) is:
\[ \begin{align} \sigma_{\hat{p}_1 - \hat{p}_2} &= \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} } \\ &\approx \sqrt{ \frac{0.40(1-0.40)}{1300} + \frac{0.38(1-0.38)}{290} } \\ &= \sqrt{\frac{0.24}{1300} + \frac{0.2356}{290} } \\ &\approx 0.03158 \end{align} \]
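The arithmetic above can be reproduced in a few lines. This is a minimal sketch that plugs the sample proportions in as estimates of the unknown \(p_1\) and \(p_2\); the variable names are chosen here for illustration.

```python
from math import sqrt

# Savings example: sample sizes and sample proportions from the table.
n1, n2 = 1300, 290
p1_hat, p2_hat = 0.40, 0.38

# Estimated mean of the sampling distribution of p1_hat - p2_hat.
mean_diff = p1_hat - p2_hat

# Estimated standard deviation (unpooled), valid for independent samples.
sd_diff = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

print(round(mean_diff, 2), round(sd_diff, 5))
```

Note that most of the variability comes from the smaller part-time sample: its term under the square root is more than four times the full-time term.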
So far you have only assumed that the samples are independent. For the next part, you will need to assume that the sample sizes are large enough. If they are, you can use the Central Limit Theorem to get that your sampling distribution \(\hat{p}_1 - \hat{p}_2 \) is approximately normal.
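How do you know if your samples are large enough? If all four of the following conditions are satisfied, then your samples are large enough for the sampling distribution of \(\hat{p}_1 - \hat{p}_2 \) to be approximately normal:
- \(n_1\hat{p}_1 \ge 10\)
- \(n_1(1-\hat{p}_1) \ge 10\)
- \(n_2\hat{p}_2 \ge 10\)
- \(n_2(1-\hat{p}_2) \ge 10\)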
It isn't too hard to check that the sample sizes in the savings example are large enough for the sampling distribution to be approximately normal.
The last condition to use this type of hypothesis test is that your sample is less than \(10\%\) of the overall population. In this case, the sample size is certainly less than \(10\%\) of all of the people in your country, so this condition is satisfied as well.
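The large-sample check can be written as a short helper. The sketch below assumes the usual rule that each sample needs at least \(10\) successes and at least \(10\) failures; the function name `large_enough` and the `threshold` parameter are illustrative choices, not part of the article.

```python
def large_enough(n1, p1_hat, n2, p2_hat, threshold=10):
    """Return True if all four success/failure counts reach the threshold."""
    counts = [
        n1 * p1_hat,        # successes in sample 1
        n1 * (1 - p1_hat),  # failures in sample 1
        n2 * p2_hat,        # successes in sample 2
        n2 * (1 - p2_hat),  # failures in sample 2
    ]
    return all(count >= threshold for count in counts)

# Savings example: the counts are 520, 780, 110.2 and 179.8.
print(large_enough(1300, 0.40, 290, 0.38))
```

For the savings example, every count is far above \(10\), so the normal approximation is reasonable.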
Z-test for Difference in Population Proportions
When doing a hypothesis test for the difference in population proportions, a \(z\)-test is used. To do this, you will need to calculate the test statistic, which uses the difference in the two proportions. To make calculations a little easier, it is helpful to find:
\[ \begin{align}\hat{p}_c &= \frac{\text{number of successes in the two samples} }{\text{total of the two sample sizes}} \\ &= \frac{n_1\hat{p_1} + n_2\hat{p_2} }{n_1 + n_2} \end{align}\]
Combining counts to get an overall proportion is called pooling, and \(\hat{p}_c\) is called the pooled (or combined) proportion.
Going back again to the savings example, \(n_1 = 1300\), \(n_2 = 290\), \(\hat{p}_1 = 0.40\), and \(\hat{p}_2 = 0.38\), which means that:
\[ \begin{align}\hat{p}_c &= \frac{n_1\hat{p}_1 + n_2\hat{p}_2 }{n_1 + n_2} \\ &= \frac{1300(0.40)+ 290(0.38) }{1300+ 290} \\ &= \frac{630.2}{1590} \\ & \approx 0.3964 \end{align}\]
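As a quick check of the pooling step, here is a minimal sketch of the same formula. The function name `pooled_proportion` is an illustrative choice.

```python
def pooled_proportion(n1, p1_hat, n2, p2_hat):
    """Total successes in both samples divided by the total sample size."""
    successes = n1 * p1_hat + n2 * p2_hat
    return successes / (n1 + n2)

p_c = pooled_proportion(1300, 0.40, 290, 0.38)
print(round(p_c, 4))
```

The pooled value always lies between the two sample proportions and sits closer to the proportion from the larger sample, which is why it is near \(0.40\) here.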
As long as your null hypothesis is \(H_0:\; p_1 -p_2 = 0 \), the test statistic can be calculated using the formula:
\[ z = \frac{\hat{p_1} - \hat{p_2} }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } }\]
Calculating the test statistic for the savings example:
\[ \begin{align} z &= \frac{\hat{p}_1 - \hat{p}_2 }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } } \\ &= \frac{0.40 - 0.38 }{\sqrt{ \dfrac{0.3964 (1-0.3964 ) }{1300} +\dfrac{0.3964 (1-0.3964 ) }{290} } } \\ & \approx 0.63,\end{align} \]
rounded to \(2\) decimal places.
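The test statistic can be computed directly from the sample sizes and sample proportions. The sketch below assumes the null hypothesis \(H_0:\; p_1 - p_2 = 0\), so the pooled proportion is used in the denominator; the function name `two_proportion_z` is an illustrative choice.

```python
from math import sqrt

def two_proportion_z(n1, p1_hat, n2, p2_hat):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p_c = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    # Same denominator as the formula above, with p_c(1 - p_c) factored out.
    standard_error = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

z = two_proportion_z(1300, 0.40, 290, 0.38)
print(round(z, 2))
```

Swapping the order of the two samples only flips the sign of \(z\), so the two-tailed conclusion is unaffected.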
Let's finish up the hypothesis test for the savings example. No significance level was given, so you will need to consider the Type I and Type II error consequences. See Errors in Hypothesis Testing for more information and examples. In this example, a Type I error would be deciding that the savings proportions are not the same for the two groups when in fact they are the same.
A Type II error would be failing to detect a difference in the population proportions of the two groups when in fact they are not the same. Neither error is very serious here (unlike in a medical trial, where the consequences of an error matter much more), so choosing a significance level of \(\alpha = 0.05\) would be fine.
Remember that this is a two-tailed test! So the \(P\)-value is twice the area under the \(z\)-curve and to the right of the \(z\)-value. In other words:
\[ \begin{align} P\text{-value} &= 2(\text{area under curve to the right of }0.63) \\ &= 2\cdot P(z>0.63) \\ &= 2(0.2643) \\ &\approx 0.529 \end{align} \]
The \(P\)-value is greater than the significance level of \(\alpha = 0.05\), so you will fail to reject the null hypothesis.
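If you prefer software to a standard normal table, the tail area and the decision can be sketched as follows. This uses `NormalDist` from Python's standard `statistics` module; the rounded value `z = 0.63` and the decision strings are taken from or modeled on the savings example.

```python
from statistics import NormalDist

z = 0.63        # test statistic from the savings example
alpha = 0.05    # chosen significance level

# Area under the standard normal curve to the right of z.
upper_tail = 1 - NormalDist().cdf(z)

# Two-tailed test: double the upper-tail area.
p_value = 2 * upper_tail

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 3), decision)
```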
Remember that you never say things like "the null hypothesis is true". For a reminder on why, see the article Hypothesis Testing.
Communicating your conclusion can be the most challenging part of doing a hypothesis test. What does it mean to fail to reject the null hypothesis?
Solution:
The original goal was to find out if there is a difference in savings habits between full-time and part-time employees at corporations in your country. The null hypothesis is that there is no difference in the savings habits between the two groups. In failing to reject the null hypothesis, what you are saying is that there is no convincing evidence that there is a difference in savings habits between full-time and part-time employees.
Why was there a difference in the sample proportions then? It might have been from sampling variability. All you can say from the sample proportions is that you are not convinced there is a difference between the two population proportions.
Hypothesis Testing of Two Population Proportions Example
Let's look at another example of hypothesis testing for the difference in two population proportions.
Many bulldog owners report that their pet snores, and in fact, their bulldog snores more frequently as it gets older.
Sleeping bulldog puppy.
You have decided to do a test to see if this is actually true or maybe just a matter of perception. So you break down bulldogs into two groups, those under three years of age and those over three years of age, and choose a random sample of \(700\) bulldog owners to ask them about their dog's snoring. From the survey responses (not everyone responds to surveys), you create the following table:
Population | Population Proportion | Sample Size | Sample Proportion |
Bulldogs under the age of \(3\). | \(p_1 = \) proportion of bulldogs under the age of \(3\) who snore more than five times a week. | \(n_1 = 300\) | \(\hat{p}_1 = 0.26\) |
Bulldogs over the age of \(3\). | \(p_2 = \) proportion of bulldogs over the age of \(3\) who snore more than five times a week. | \(n_2 = 291\) | \(\hat{p}_2 = 0.392\) |
Before going any further, let's check to make sure that the conditions for doing a hypothesis test for two population proportions are satisfied. First, the samples are independent since a bulldog can't be both under \(3\) years old and over \(3\) years old at the same time. In addition, there are certainly far more than \(591\) people worldwide that own bulldogs, so the number of bulldog owners sampled is less than \(10\%\) of the overall population of people who own bulldogs. Also,
\(n_1\hat{p}_1 = 300(0.26)=78 \ge 10\),
\(n_2\hat{p}_2 = 291(0.392) \approx 114.1 \ge 10\),
\(n_1(1-\hat{p}_1) = 300(1-0.26) = 222 \ge 10\), and
\(n_2(1-\hat{p}_2) = 291(1-0.392) \approx 176.9 \ge 10\),
so all of the conditions for applying the test are met.
The next step is deciding on the null and alternative hypotheses. The null hypothesis would be:
\[ H_0: \; p_2-p_1 = 0\]
or in other words that there is no difference in snoring between the two groups. The alternative hypothesis would be that there is a difference in the snoring rates of the two groups, so:
\[H_a:\; p_2-p_1 \ne 0\]
Calculating the pooled success rate (sometimes called the combined success rate):
\[ \begin{align}\hat{p}_c &= \frac{n_1\hat{p_1} + n_2\hat{p_2} }{n_1 + n_2} \\ &= \frac{300(0.26)+291(0.392)}{300+291} \\ &\approx 0.325 . \end{align}\]
Then the test statistic is:
\[\begin{align} z &= \frac{\hat{p_2} - \hat{p_1} }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } } \\ &= \frac{ 0.392 - 0.26 }{\sqrt{ \dfrac{0.325 (1-0.325) }{300} +\dfrac{0.325 (1-0.325) }{291} } } \\ &\approx 3.425 \end{align}\]
Notice that here the null hypothesis is written with \(p_2-p_1\) simply for the convenience of having \(\hat{p}_2 - \hat{p}_1 \) be positive. It doesn't matter which order you choose for the null hypothesis, as long as you are consistent throughout your work and you make sure your \(z\) calculation matches.
Remember that this is a two-tailed test! So the \(P\)-value is twice the area under the \(z\)-curve and to the right of the \(z\)-value. In other words:
\[ \begin{align} P\text{-value} &= 2(\text{area under curve to the right of }3.425) \\ &= 2\cdot P(z>3.425) \\ &\approx 2(0.0003) \\ &= 0.0006, \end{align} \]
where the value of \(P(z>3.425)\) can be found using a standard normal table or calculator.
The \(P\)-value is less than the significance level of \(\alpha = 0.05\), so you can reject the null hypothesis and conclude that there is a difference in bulldog snoring based on age.
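The whole two-tailed calculation for the bulldog example can be gathered into one function. This is a sketch under the assumptions already checked above (independent samples, large enough counts); the function name `two_sided_test` and its argument order are illustrative, with the older group passed first so the difference matches \(\hat{p}_2 - \hat{p}_1\).

```python
from math import sqrt
from statistics import NormalDist

def two_sided_test(n_a, p_a_hat, n_b, p_b_hat):
    """Return (z, P-value) for H0: p_a - p_b = 0 against a two-sided alternative."""
    p_c = (n_a * p_a_hat + n_b * p_b_hat) / (n_a + n_b)
    standard_error = sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))
    z = (p_a_hat - p_b_hat) / standard_error
    # Doubling the tail beyond |z| covers both directions of difference.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Bulldogs over 3 first, then bulldogs under 3.
z, p_value = two_sided_test(291, 0.392, 300, 0.26)
print(round(z, 3), p_value < 0.05)
```

Because the pooled proportion is not rounded before it is used, the result can differ from the hand calculation in the last decimal place.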
Would your conclusion have been any different if the alternative hypothesis had been:
\[H_a:\; p_2-p_1 > 0?\]
Solution:
The main change would have been in calculating the \(P\)-value. Since it would be a one-tailed test, in this case, the calculation would be:
\[ \begin{align} P\text{-value} &= \text{area under curve to the right of }3.425 \\ &= P(z>3.425) \\ &\approx 0.0003 \end{align} \]
At the \(\alpha = 0.05\) significance level, you would still reject the null hypothesis and conclude that bulldogs over the age of \(3\) do snore more than bulldogs under the age of \(3\).
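The only thing that changes between the three kinds of alternative hypothesis is which tail area you use. The sketch below makes that explicit; the function name `p_value` and the labels `"two-sided"`, `"greater"`, and `"less"` are naming choices made here, not terms from the article.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value for a z statistic under the chosen alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":   # H_a: difference != 0
        return 2 * (1 - cdf(abs(z)))
    if alternative == "greater":     # H_a: difference > 0 (right-tailed)
        return 1 - cdf(z)
    if alternative == "less":        # H_a: difference < 0 (left-tailed)
        return cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

z = 3.425  # bulldog example, using p2_hat - p1_hat
for alternative in ("two-sided", "greater", "less"):
    print(alternative, round(p_value(z, alternative), 4))
```

For the bulldog data, the right-tailed \(P\)-value is half of the two-tailed one, while the left-tailed \(P\)-value is close to \(1\), since the observed difference points in the opposite direction.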
Hypothesis Test of Two Population Proportions - Key takeaways
- Two samples are independent if picking members for one sample doesn't influence how members of the second sample are picked.
- Two samples are dependent if picking members for one sample automatically determines the members of the second sample.
- For a hypothesis test for two population proportions, the null hypothesis will almost always be that the two population proportions are the same.
- The conditions for applying a hypothesis test for the difference of two population proportions are:
  - The samples are independent.
  - Each sample is less than \(10\%\) of its overall population.
  - \(n_1\hat{p}_1 \ge 10\), \(n_2\hat{p}_2 \ge 10\), \(n_1(1-\hat{p}_1) \ge 10\), and \(n_2(1-\hat{p}_2) \ge 10\), where \(n_1\) is the size of the first sample, \(n_2\) is the size of the second sample, \(\hat{p}_1\) is the proportion of successes in the first sample, and \(\hat{p}_2\) is the proportion of successes in the second sample.
- The pooled proportion formula is \[ \begin{align}\hat{p}_c &= \frac{\text{number of successes in the two samples} }{\text{total of the two sample sizes}} \\ &= \frac{n_1\hat{p_1} + n_2\hat{p_2} }{n_1 + n_2}. \end{align}\]
- The formula for the test statistic is \[ z = \frac{\hat{p_1} - \hat{p_2} }{\sqrt{ \dfrac{\hat{p}_c (1-\hat{p}_c) }{n_1} +\dfrac{\hat{p}_c (1-\hat{p}_c) }{n_2} } }\]