This article discusses what a confidence interval is, how to interpret it, the main types of confidence intervals, such as those for a population mean and for a proportion, and worked examples. In statistics, a confidence interval is abbreviated \(CI\).
Introduction to Confidence Intervals
Let's start by looking at the terminology behind this important concept in Statistics.
A confidence interval is a range of likely values to estimate a population parameter.
The main reason to use an interval estimate, through a confidence interval, rather than a point estimate (a single statistic) is that sample results vary from sample to sample.
Suppose you would like to estimate the percentage of students who eat cupcakes during break in a school. You can imagine that if you collected data from three samples, each sample in a different week, the three samples would likely be different. The results, and the percentages of the samples, would very likely be different too.
So, you need some measure of how much you can expect those results to change if you were to repeat your study. This expectation of variations in your statistic from sample to sample is measured by the margin of error.
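The sample-to-sample variation described above can be illustrated with a small simulation. This is only a sketch: the true share of cupcake eaters (35%), the sample size (150) and the helper name `sample_percentage` are assumptions made for the illustration.

```python
import random

def sample_percentage(true_share, sample_size, rng):
    """Percentage of 'yes' answers in one simulated random sample."""
    yes = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return 100 * yes / sample_size

# Assumed setup: 35% of all students eat cupcakes, 150 students per sample.
rng = random.Random(0)  # fixed seed so the run is reproducible
weekly_percentages = [sample_percentage(0.35, 150, rng) for _ in range(3)]
print(weekly_percentages)  # one percentage per weekly sample
```

Each run of `sample_percentage` stands for one week's sample, and the printed percentages usually differ from one another even though the underlying share never changes.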
The margin of error represents a certain number of standard deviations of your statistic you add and subtract to have a certain confidence in your results.
Let's go back to the previous example.
Imagine the first sample was of \(150\) students and the percentage of cupcake eaters was \(35\%\), with a margin of error of, say, \(1.5\%\). This would mean the actual percentage of students who eat cupcakes during breaks in the entire school population is expected to be \(35\% \pm 1.5\%\) (that is, between \(33.5\%\) and \(36.5\%\)).
Here, you are using your sample to estimate a range of values, a confidence interval, that is likely to contain the true value of the unknown parameter you’re interested in. How likely this is gives you that certain confidence in your results, and it is called the confidence level.
The confidence level is the percentage of confidence intervals that would contain the actual value of the population parameter you’re interested in if you repeated the sample collection over and over.
Without further ado, let's see how to build a confidence interval.
Confidence Interval Formula
The terminology presented in the previous section actually gives you a clue to the elements needed to build a confidence interval. For example, the formula for the confidence interval for the mean is:
\[CI=\overline{x}\pm z \frac{\sigma_s}{\sqrt{n}} \]
Here we can identify:
\(\overline{x}\): The sample mean.
\(z\): The critical value that corresponds to the chosen confidence level.
\(\sigma_s\): The standard deviation of the sample.
\(n\): The sample size.
If you want to know more about samples, the sample mean, and the sample standard deviation, check our article named Sample Mean.
With these elements, you can build a confidence interval.
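As a minimal sketch of how these elements fit together, the formula can be written as a short Python function. The function names and the sample values (mean 50, standard deviation 2, 100 observations) are assumptions chosen for illustration, and the critical value is taken as the two-sided value from the standard normal distribution.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Return (lower, upper) for sample_mean ± z * sample_sd / sqrt(n)."""
    margin = z_critical(confidence) * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumed sample: mean 50, standard deviation 2, 100 observations.
lower, upper = mean_confidence_interval(50, 2, 100, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```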
The confidence level is set by you, and it determines the critical value \(z\). The confidence level is the percentage of intervals that would contain the true value if you repeated your experiment many times.
Let us propose an easy experiment. You measure the height of a sample of students in a college. The shortest student measures \(1.5m\), and the tallest \(1.87m\). Let us say you want a confidence level of \(95\%\). If the elements used to calculate the confidence interval are chosen correctly, you expect intervals built this way to contain the mean height of all the students in the college, including those outside the sample, \(95\%\) of the time.
Let us suppose we have measurements of the weight of coins of the same value. Some coins weigh slightly more than others. The coins weigh \(50g\) on average and deviate from that weight by \(0g\) to \(2g\). If the weights follow a normal distribution, the data will look like the figure below:
Fig. 1. Normal distribution.
You choose an interval in which you know \(66.3\%\) of the coins lie. That is, \(66.3\%\) of the coin weight deviations will be there. You can see the interval below. The interval extends below and above the mean \(m\) in this case.
However, if this is just a sample of a large population then the mean and the interval might be different for the whole set of coins circulating in the market.
If you repeat the experiment with another sample of coins, its mean will usually differ a little from the mean of the original sample. This is where a confidence interval comes in.
For example, the narrower the confidence interval, the closer the sample mean is expected to be to the mean of the total population, and the closer the means of the old sample and the new sample are expected to be to each other.
The confidence interval gets narrower as the sample size increases.
Types of Confidence Intervals
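Confidence intervals can be built for several different population parameters.
The types of confidence intervals you will see below are:
- Confidence interval for the population mean.
- Confidence interval for the difference of two means.
- Confidence interval for the population proportion.
- Confidence interval for the difference of two proportions.
- Confidence interval for the slope of a regression model.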
Confidence Interval for Population Mean
Let us say you take a sample \(a\) of a whole population \(A\). This sample \(a\) has a mean \(\overline{x_a}\). If the sample is large enough and is selected at random, then the parameters of the sample will resemble those of the whole population. The better the sampling method is, the more closely the mean of the sample will resemble the mean of the whole population.
In this case, the confidence interval is a range \([x_1, x_2]\), computed from the sample \(a\), that is expected to contain the population mean with a chosen confidence level.
So let us say you have a mean \(\overline{x_a}\) and a confidence interval with a \(90\%\) confidence level around that mean. The interval goes from the value \(x_1\) to the value \(x_2\). In this case, you can be \(90\%\) confident that the mean of the population \(A\) is inside this range.
This has another implication: if you take another sample, its mean is also likely to be close to this range. Let us work through a numerical example.
Let us say we have some data which follows a normal distribution. Its mean is \(0\) and its standard deviation is \(1\). This data is a sample of a larger population. The sample is large, here \(2000\) observations.
Let us say you want the confidence interval for the mean to have a confidence level of \(95\%\). A \(95\%\) interval leaves \(2.5\%\) in each tail of the distribution, so to retrieve the value of \(z\) you need to go to the z-score table and look for the cumulative probability closest to \(0.975\). The value for this confidence level is \(z=1.96\).
If we plug this into the formula you saw in the previous section:
\[CI=0 \pm 1.96 \frac{1}{\sqrt{2000}}=0 \pm 0.044 \]
Then we can say with \(95\%\) confidence that the mean of the whole population is \(0\) with a \(\pm 0.044\) margin of error.
Table 1. Cumulative probabilities of the standard normal distribution. For a confidence level of \(95\%\), the value \(z=1.96\) is read from the row \(1.9\) and the column \(0.06\), where the cumulative probability is \(0.9750\).
z | 0 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 |
0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 |
0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 |
0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 |
0.3 | 0.6179 | ... | ... | ... | ... | ... | ... |
0.4 | 0.6554 | ... | ... | ... | ... | ... | ... |
0.5 | 0.6915 | ... | ... | ... | ... | ... | ... |
0.6 | 0.7257 | ... | ... | ... | ... | ... | ... |
0.7 | ... | ... | ... | ... | ... | ... | ... |
0.8 | ... | ... | ... | ... | ... | ... | ... |
0.9 | ... | ... | ... | ... | ... | ... | ... |
1.0 | ... | ... | ... | ... | ... | ... | ... |
1.1 | ... | ... | ... | ... | ... | ... | ... |
1.2 | ... | ... | ... | ... | ... | ... | ... |
1.3 | ... | ... | ... | ... | ... | ... | ... |
1.4 | ... | ... | ... | ... | ... | ... | ... |
1.5 | ... | ... | ... | ... | 0.9382 | 0.9394 | 0.9406 |
1.6 | ... | ... | ... | ... | 0.9495 | 0.9505 | 0.9515 |
1.7 | ... | ... | ... | ... | ... | ... | ... |
1.8 | ... | ... | ... | ... | ... | ... | ... |
1.9 | ... | ... | ... | ... | 0.9738 | 0.9744 | 0.9750 |
The size of the sample affects the confidence interval in the previous example. If the sample had only \(1000\) observations, the result would be \(0\pm0.062\).
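The arithmetic of this example can be checked with a few lines of Python. This is a sketch that assumes a two-sided \(95\%\) interval, for which the critical value from the standard normal distribution is about \(1.96\); the function name `margin_of_error` is an assumption made for the illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(sd, n, confidence):
    """Margin of error z * sd / sqrt(n) for a two-sided confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z * sd / math.sqrt(n)

# Standard deviation 1, as in the example, for both sample sizes.
for n in (2000, 1000):
    print(n, round(margin_of_error(1, n, 0.95), 3))
```

Halving the sample size from \(2000\) to \(1000\) makes the margin of error larger, so the interval around the mean becomes wider.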
The confidence level is the long-run proportion of intervals, built in this way from repeated samples, that contain the true parameter value.
Confidence Interval for the Difference of Two Means
Let us say you have two samples from two populations, such as the weights of a grade \(8\) class in England and of a grade \(9\) class in Scotland. You want to find the difference between the means of both.
This could seem easy: calculate the mean weight of the class in England \(w_E\) and subtract it from the mean weight of the class in Scotland \(w_S\). However, the samples are random, so their means may not match the means of all grade \(8\) pupils in England and all grade \(9\) pupils in Scotland. There is uncertainty about the result.
In this case there is a formula to calculate the confidence interval. The confidence interval for the difference between the means of two different populations is defined as:
\[CI_p=(\overline{x_1} - \overline{x_2})\pm t\sqrt{\frac{s_p^2}{n_1}+\frac{s_p^2}{n_2}}\]
\(\overline{x_1}{,}\overline{x_2}\): The sample means.
\(s_p^2\): The pooled variance.
\(n_1, n_2\): The sizes of sample \(1\) and sample \(2\).
\(t\): The \(t\) critical value for the chosen confidence level, with \(n_1+n_2-2\) degrees of freedom.
The pooled standard deviation \(s_p\), which is the square root of the pooled variance, is calculated as follows:
\[s_p=\sqrt{\dfrac{(n_1 - 1)\cdot s^2_1+(n_2- 1)\cdot s_2^2 }{n_1 + n_2 -2}}\]
\(s_1^2{,}s_2^2\): The variances of the samples.
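A minimal sketch of this calculation in Python follows. The class data (means, standard deviations and ten pupils per class) are hypothetical. The \(t\) critical value is passed in by hand because the Python standard library has no \(t\) distribution; \(2.101\) is the tabulated value for \(95\%\) confidence and \(18\) degrees of freedom.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples with standard deviations s1, s2."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def two_mean_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Interval for mean1 - mean2 using a pooled variance and a given t value."""
    sp = pooled_sd(s1, n1, s2, n2)
    margin = t_crit * math.sqrt(sp**2 / n1 + sp**2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Hypothetical classes: mean weights 52 and 49, standard deviation 4 in both,
# 10 pupils each, so 10 + 10 - 2 = 18 degrees of freedom.
lower, upper = two_mean_interval(52.0, 4.0, 10, 49.0, 4.0, 10, t_crit=2.101)
print(round(lower, 2), round(upper, 2))
```

With these made-up numbers the interval includes \(0\), so the data would not rule out that the two population means are equal.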
Confidence Interval for Population Proportion
You have seen what happens with the confidence interval in a normal distribution. In this type of distribution, the values are continuous. However, there are other types of distributions, like the binomial distribution. In this case, the values are the result of a Bernoulli-type experiment, in which there are only two possible outcomes.
With these distributions, we can examine a yes-or-no question.
Let us say you want to poll people about a presidential candidate.
Pollsters carry out a random survey, calling people's houses. The surveyed households cover different socio-economic backgrounds and places, making the study as random as possible. In the survey, \(67\%\) of people confirm their vote for candidate \(A\).
However, there is a problem: you have uncertainty. The people who answered are only part of the total population, so the real percentage might be different.
Let us say the people who ran the survey report a confidence level of \(90\%\) with a margin of error of \(\pm 6.7\%\). The real value could then be anywhere between \(60.3\%\) and \(73.7\%\).
In these cases, the confidence interval around the reported proportion of \(67\%\) is important because it tells you something. It tells a story in which this candidate, even in the worst case, can win with more than \(50\%\) of the votes. But what if the margin of error is larger, for example because fewer people were surveyed? If the interval were wide enough for its lower end to drop below \(50\%\), the candidate might lose even though \(67\%\) of the people surveyed said they would vote for them.
This is why the confidence interval for proportions is very important. Given a sample proportion, it tells you how well that proportion represents the whole population.
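The confidence interval for a population proportion is:
\[CI=\hat{p}\pm Z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\]
\(\hat{p}\): is the sample proportion or percentage.
\(Z\): is the critical value for the confidence level, read from a table like the one you used before.
\(n\): is the sample size.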
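The survey example can be sketched in Python as follows. The text does not state how many people were surveyed, so the sample size \(n=133\) is an assumption, chosen only because it gives a margin of error close to the \(\pm 6.7\%\) quoted for a \(90\%\) confidence level. The function name is also an assumption.

```python
import math
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Return (lower, upper) for p_hat ± Z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 67% support in the survey; n = 133 is an assumed sample size (not in the text).
lower, upper = proportion_interval(0.67, 133, confidence=0.90)
print(round(lower, 3), round(upper, 3))
```

Under this assumption the lower end of the interval stays above \(0.5\), which matches the story in which the candidate wins even in the worst case.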
Confidence Interval for the Difference of Two Proportions
Just as you can build a confidence interval for the difference of two means from two samples of two populations, you can also build one for proportions. In this case, you have two proportions obtained from samples.
The two samples survey the same question in populations \(A\) and \(B\); however, their results, \(\hat{p_1}\) and \(\hat{p_2}\), are different. In this case, the confidence interval for the difference of two proportions is given by the next equation:
\[(\hat{p_1}-\hat{p_2})\pm Z \sqrt{\dfrac{\hat{p_1}(1-\hat{p_1})}{n_1}+\dfrac{\hat{p_2}(1-\hat{p_2})}{n_2} }\]
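A short Python sketch of this equation follows. The two survey results (\(60\%\) of \(400\) people and \(50\%\) of \(400\) people) are hypothetical values chosen for illustration, as is the function name.

```python
import math
from statistics import NormalDist

def two_proportion_interval(p1, n1, p2, n2, confidence=0.95):
    """Return (lower, upper) for the difference p1 - p2 of two sample proportions."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical surveys: 60% of 400 people in A, 50% of 400 people in B.
lower, upper = two_proportion_interval(0.60, 400, 0.50, 400)
print(round(lower, 3), round(upper, 3))
```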
Confidence Interval for the Slope of a Regression Model
If you suspect that there might be a linear relationship between two variables, then you can construct a confidence interval for the slope of a regression model. Remember that you can use linear regression or the least-squares regression technique to create the line that best fits the data.
To learn more about this confidence interval, read our article on Confidence Intervals for the Slope of a Regression Model.
Suppose you have collected data over the last 20 years about average voter age. If you think that the average voter age has decreased over the last 20 years, you could make a confidence interval for the slope of your linear regression model to see if there has been a linear relationship between time and average voter age.
To learn how to draw conclusions from this type of confidence interval, read our article on Justifying Claims Based on the Confidence Interval for the Slope of a Regression Model.
Confidence Interval Interpretation
Again, a confidence interval is an interval with the likely values of a population parameter based on one or several random samples, with a \(c\%\) confidence level.
What the confidence level says is that the method used to create a particular confidence interval is successful in capturing the value of the actual population parameter approximately \(c\%\) of the time.
Beware: a confidence level of \(X\%\) does not mean the probability of the parameter being between the limits of the confidence interval is \(X\%\).
Again, a confidence level of \(c\%\) concerns the method used to produce the confidence interval.
So, the interpretation you should make of a confidence interval is that you can be \(c\%\) confident that the actual value of the parameter is included in the calculated interval.
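The idea that the confidence level describes the method, not one particular interval, can be illustrated with a simulation. This is a sketch under assumed values: a normal population with mean \(175\) and standard deviation \(10\), samples of \(50\), and \(2000\) repetitions. It counts how often the interval built from each sample contains the true mean.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage(true_mean, true_sd, n, confidence, trials, seed=0):
    """Share of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        margin = z * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / trials

# Assumed population and settings, chosen only for illustration.
share = coverage(true_mean=175, true_sd=10, n=50, confidence=0.95, trials=2000)
print(share)  # share of intervals that captured the true mean
```

The printed share is typically close to the chosen confidence level: each individual interval either contains the true mean or it does not, and the confidence level describes how often the method succeeds.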
The most common confidence levels are of \(90\%\), \(95\%\) and \(99\%\).
Suppose that a \(95\%\) confidence interval states that the population mean is greater than \(150\) and less than \(200\). How would you interpret this statement?
Incorrect: “This means there is a \(95\%\) chance that the population mean falls between \(150\) and \(200\).”
Correct: “This means you can be \(95\%\) confident that the true value of the population mean is between \(150\) and \(200\).”
Considerations on the margin of error, confidence level and sample size
You might think that by aiming for a narrow interval to estimate your parameter, you get closer to knowing its true value since it is more precise. After all, it is more convenient and precise to know that you are meeting a friend in neighborhood \(X\) than in city \(Y\).
But in confidence intervals you should think the other way around: for the same sample, the smaller the width of the interval, the less sure you are that the true value of the parameter is in that interval. Although precision decreases, it is much safer to assume that the parameter is in city \(Y\) rather than in neighborhood \(X\), because the city contains other neighborhoods in which the parameter may be present.
This means that the higher the confidence level is, the wider the confidence interval.
A confidence interval with \(99\%\) confidence level is wider than one with a \(95\%\) confidence level, which is wider than one with a \(90\%\) confidence level, regarding the same situation.
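The effect of the confidence level on the width can be seen in the critical values themselves. The sketch below computes the two-sided critical value \(z\) from the standard normal distribution for the three common confidence levels; the variable names are assumptions made for the illustration.

```python
from statistics import NormalDist

# Two-sided critical value for each confidence level.
z_values = {
    level: NormalDist().inv_cdf(1 - (1 - level) / 2)
    for level in (0.90, 0.95, 0.99)
}
for level, z in z_values.items():
    print(f"{level:.0%}: z = {z:.3f}")
```

Because the margin of error is the critical value multiplied by the standard error, a larger \(z\) gives a wider interval for the same data.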
Another thing you might notice in the formulas presented is that the sample size also affects the margin of error. In all the situations presented, the sample size \(n\) appears in the denominator of the standard error. Thus, the larger the sample size, the narrower the confidence interval (because the standard error is smaller).
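A quick numerical sketch shows how strongly the sample size matters. It assumes a standard deviation of \(1\) and the usual two-sided \(95\%\) critical value of \(1.96\); the function name and the sample sizes are illustrative assumptions.

```python
import math

def interval_width(z, sd, n):
    """Full width of the interval: twice the margin of error z * sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)

# Assumed values: z = 1.96 (95%, two-sided) and standard deviation 1.
widths = {n: interval_width(1.96, 1.0, n) for n in (100, 400, 1600)}
for n, width in widths.items():
    print(n, round(width, 3))
```

Because \(n\) sits under a square root, the sample has to be four times larger to cut the width of the interval in half.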
Example of Confidence Intervals
Let's end this article with two examples where we calculate the confidence interval of a mean and the confidence interval of a proportion.
Let us assume you have the height data for students in several colleges. The sample mean of the heights is \(\overline{x}=1.5m\) and the standard deviation is equal to \(1\). In this case, we want to know the confidence interval for the mean if the sample has a size of \(3000\) individuals.
Using the formula for the \(CI\) you have:
\[CI=1.5 \pm Z \frac{1}{\sqrt{3000}} \]
Let us say you want a \(95\%\) confidence level, for which the two-sided critical value is \(Z=1.96\):
\[CI=1.5m \pm 1.96 \frac{1}{\sqrt{3000}} \]
\[CI=1.5m \pm 0.036 \]
Let us say you want to calculate the confidence interval of a proportion. This proportion is \(62\%\). We again want a confidence level of \(95\%\), so \(Z=1.96\). The sample in this case was a poll of \(6734\) people. The margin of error is:
\[Z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\]
If you substitute the values:
\[1.96\sqrt{\dfrac{0.62(1-0.62)}{6734}}=0.0116\]
So the confidence interval is \(62\% \pm 1.2\%\).
Confidence Intervals – Key takeaways
- A confidence interval is a range of likely values to estimate a population parameter.
- The margin of error represents a certain number of standard deviations of your statistic you add and subtract to have a certain confidence in your results.
- The confidence level is the percentage of confidence intervals that would contain the actual value of the population parameter you’re interested in if you repeated the sample collection over and over.
- The most frequent confidence levels are of \(90\%\), \(95\%\) and \(99\%\).
- The general form of a confidence interval is sample statistic ± margin of error, where margin of error = critical value × standard error.
- Specific sample statistics have specific confidence intervals, but they all follow the same form.
- The interpretation you should make of a confidence interval is that you can be \(c\%\) confident that the actual value of the parameter is included in the calculated interval.