Figure 1. The chi-square distribution is useful for finding a relationship between two things, like clothing prices at different stores.
In this article, you will learn about a new type of distribution to answer questions such as these – the chi-square distribution. You will study the chi-square distribution formula, the properties of a chi-square distribution, chi-square distribution tables, and work through several chi-square distribution examples. You will also be introduced to the major applications of the chi-square distribution, including:
the chi-square test for goodness of fit – which tells you if data fit a certain distribution, like in the lottery number example.
the chi-square test for homogeneity – which tells you if two populations have the same distribution, like in the clothing example.
the chi-square test for independence – which tells you whether two categorical variables are related, like in the coffee machine example.
each of which has an article of its own.
Chi-Square Distribution Definition
What happens when you square a normally distributed random variable? You know the probability distribution of the random variable itself, but what does it tell you about the distribution of the squared random variable? That question led to the discovery of the chi-square distribution, and it turns out to be useful in a wide variety of contexts.
A Chi-Square \( (\chi^{2}) \) Distribution is a continuous probability distribution of the sum of squared, independent, standard normal random variables that is widely used in hypothesis tests.
The chi-square distribution is the basis for three chi-square tests:
the chi-square test for goodness of fit – allowing you to compare observed probability distributions to expected distributions,
the chi-square test for independence and the chi-square test for homogeneity –
the chi-square test for independence allows you to test the independence of categorical variables, and
the chi-square test for homogeneity allows you to test whether two populations follow the same distribution of a categorical variable, and
the test of a single variance – allowing you to draw inferences about a population's variance.
The basic shape of a chi-square distribution is determined by its degrees of freedom, denoted by \(k\).
The degrees of freedom, \(k\), are the number of values that are free to vary.
Let's take a look at an example.
Say you have \(4\) numbers that add up to \(1\):
\[ X_{1} + X_{2} + X_{3} + X_{4} = 1 \]
How many of the \(X\) values are free to vary?
Solution:
The answer is \(3\) because if you know \(3\) of the numbers, then you can solve for the \(4^{th}\) one:
\[ X_{4} = 1 - (X_{1} + X_{2} + X_{3}) \]
So, this example has \(3\) degrees of freedom.
In practice, the degrees of freedom, \(k\), are often one less than the number of observations or categories being compared.
The following graph illustrates examples of chi-square distributions with differing values of \(k\).
Figure 2. A comparison of Chi-Square Distributions with varying degrees of freedom.
Because very few real-world observations follow a chi-square distribution, the main purpose of a chi-square distribution is hypothesis testing.
The Chi-Square Distribution's Relationship to the Standard Normal Distribution
The reason a chi-square distribution is useful for hypothesis testing is because of how closely it is related to the standard normal distribution: a normal distribution whose mean is \(0\) and variance is \(1\). Let's walk through this relationship.
Say you take a random sample of a standard normal distribution, \(Z\). If you square all the values in your sample, you now have the chi-square distribution with one degree of freedom, or \( k = 1 \). So, mathematically, you represent this as:
\[ \chi_{1}^{2} = Z^{2} \]
Now, say you want to take random samples from \(2\) independent standard normal distributions, \( Z_{1} \) and \( Z_{2} \). If you square each sample and add them together every time you sample a pair of values, you have the chi-square distribution with two degrees of freedom, or \( k = 2 \). You represent this mathematically as:
\[ \chi_{2}^{2} = (Z_{1})^{2} + (Z_{2})^{2} \]
Continuing with this pattern, in general, if you take random samples from \(k\) independent standard normal distributions and then square and sum those values, you get a chi-square distribution with \(k\) degrees of freedom. Again, this is represented mathematically as:
\[ \chi_{k}^{2} = (Z_{1})^{2} + (Z_{2})^{2} + \ldots + (Z_{k})^{2} \]
In summary, a common use of a chi-square distribution is to describe the sum of squared, independent, standard normal random variables. So, if the \( Z_{i} \) are independent standard normal random variables, then:
\[ \sum_{i=1}^{k} Z_{i}^{2} \sim \chi^{2}_{k} \]
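You can check this relationship numerically. Below is a minimal simulation sketch, assuming NumPy is available; the sample size and the choice \( k = 3 \) are arbitrary:

```python
# Minimal simulation sketch (assumes NumPy; the values of k and
# n_samples are arbitrary): sum k squared standard normal draws and
# compare the sample moments with the chi-square values k and 2k.
import numpy as np

rng = np.random.default_rng(seed=0)
k, n_samples = 3, 100_000

# Each row holds one draw of k independent standard normal values.
z = rng.standard_normal((n_samples, k))

# Square and sum across each row: the sums follow chi-square with k df.
chi_sq_samples = (z ** 2).sum(axis=1)

print(chi_sq_samples.mean())  # close to k = 3
print(chi_sq_samples.var())   # close to 2k = 6
```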
Chi-Square Distribution Formula
Chi-square tests are hypothesis tests whose test statistics follow a chi-square distribution under the null hypothesis. The first, and most widely used, chi-square test to be discovered was Pearson's chi-square test.
Pearson's chi-square formula (also known as Pearson's chi-square statistic, or test statistic) is
\[ \chi^{2} = \sum \frac{(O-E)^{2}}{E} \]
where,
\( \chi^{2} \) is the chi-square test statistic
\( \sum \) is the summation operator
\( O \) is the observed frequency
\( E \) is the expected frequency
If you take many samples from a population and calculate Pearson's chi-square test statistic for each sample, the test statistic will follow a chi-square distribution, provided the null hypothesis is true.
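As a concrete illustration of the formula, here is a minimal sketch; the observed and expected counts are hypothetical values for demonstration, not data from this article:

```python
# Hedged sketch of Pearson's formula; the counts below are hypothetical.
import numpy as np

observed = np.array([18, 22, 20, 40])  # observed frequencies, O
expected = np.array([25, 25, 25, 25])  # expected frequencies, E

# Sum of (O - E)^2 / E over all categories.
chi_square = ((observed - expected) ** 2 / expected).sum()
print(chi_square)  # 12.32 for these made-up counts
```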
Mean of a Chi-Square Distribution
The mean of a chi-square distribution is the degrees of freedom:\[ \mu \left[ \chi^{2} \right] = k. \]
Variance of a Chi-Square Distribution
The variance of a chi-square distribution is twice the degrees of freedom:\[ \sigma^{2} \left[ \chi^{2} \right] = 2k. \]
Mode of a Chi-Square Distribution
The mode of a chi-square distribution is the degrees of freedom minus two (when \( k \geq 2 \)):\[ \text{mode} \left[ \chi^{2} \right] = k - 2, \text{ if } k \geq 2 \]
Standard Deviation of a Chi-Square Distribution
The standard deviation of a chi-square distribution is the square-root of twice the degrees of freedom:
\[ \sigma \left[ \chi^{2} \right] = \sqrt{2k} \]
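You can check these formulas directly against SciPy's chi-square distribution; a minimal sketch, assuming SciPy is available (the choice \( k = 10 \) is arbitrary):

```python
# Sketch checking the moment formulas against scipy.stats.chi2,
# which is parameterised by the degrees of freedom k.
from scipy.stats import chi2

k = 10
mean, var = chi2.stats(k, moments="mv")

print(mean)        # k = 10
print(var)         # 2k = 20
print(var ** 0.5)  # sqrt(2k), approximately 4.472
# The density peaks at the mode k - 2 = 8 (valid for k >= 2).
```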
Properties of a Chi-Square Distribution
The chi-square distribution has several properties that make it easy to work with and well-suited for hypothesis testing:
A chi-square distribution is a continuous distribution.
A chi-square distribution is defined by a single parameter: the degrees of freedom, \(k\).
The sum of independent chi-square random variables is also a chi-square random variable, with the degrees of freedom of the sum being the sum of the degrees of freedom (see the simulation sketch after this list):\[ \chi^{2}_{k_{1}} + \chi^{2}_{k_{2}} \sim \chi^{2}_{k_{1} + k_{2}} \]
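This additive property is easy to check by simulation; a minimal sketch, assuming NumPy is available and using the arbitrary choices \( k_{1} = 3 \) and \( k_{2} = 5 \):

```python
# Simulation sketch of the additive property: sums of independent
# chi-square draws with k1 and k2 degrees of freedom behave like
# chi-square draws with k1 + k2 degrees of freedom.
import numpy as np

rng = np.random.default_rng(seed=1)
k1, k2, n = 3, 5, 100_000

total = rng.chisquare(k1, n) + rng.chisquare(k2, n)
print(total.mean())  # close to k1 + k2 = 8
print(total.var())   # close to 2(k1 + k2) = 16
```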
Range of a Chi-Square Distribution
A chi-square distribution is never negative. This is easiest to see in the ratio-of-variances formula. Since the numerator is never negative and the denominator is positive, the ratio can never be negative. In other words:
\[ \frac{(n-1)s^{2}}{\sigma^{2}} \geq 0 \]
Symmetry of a Chi-Square Distribution
The constraint that a chi-square distributed random variable can never be negative means that a chi-square distribution cannot be symmetrical; it is a non-symmetric distribution. However, a chi-square distribution becomes increasingly symmetrical as \(k\) increases.
Shape of a Chi-Square Distribution
The shape of a chi-square distribution depends on the degrees of freedom, \(k\). As the value of \(k\) increases, the chi-square distribution more closely resembles the bell-curve of a normal distribution. This is because, while a chi-square distribution can never be negative, it goes all the way to infinity in the positive direction. In statistical terms, you say that a chi-square distribution is skewed to the right because the right tail is longer than the left.
The skewness of a chi-square distribution is equal to:
\[ \text{Skewness} \left[ \chi^{2} \right] = \sqrt{\frac{8}{k}} \]
This means the mean of a chi-square distribution is greater than the median and the mode. As \(k\) gets increasingly large, the number under the square root gets closer and closer to zero, so the skewness of the distribution approaches zero as \(k\) approaches infinity.
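As a quick check, this formula matches the skewness that SciPy reports for its chi-square distribution; a minimal sketch, assuming SciPy is available:

```python
# Sketch comparing the skewness formula sqrt(8/k) with SciPy's value.
from scipy.stats import chi2

for k in (2, 10, 50):
    print(k, float(chi2.stats(k, moments="s")), (8 / k) ** 0.5)
# The two values agree, and the skewness shrinks toward 0 as k grows.
```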
When a Chi-Square Distribution has one or two Degrees of Freedom
When a chi-square distribution has only one or two degrees of freedom ( \( k = 1 \) or \( k = 2 \) ) it is shaped like a backwards "J".
Figure 3. Graphs of a Chi-Square Distribution when it has one and two degrees of freedom.
Because of this shape, there is a high probability that \( \chi^{2} \) is close to zero.
When a Chi-Square Distribution has three or more Degrees of Freedom
When a chi-square distribution has three or more degrees of freedom ( \( k \geq 3 \) ), it takes on a bump-shape that has a peak in the middle that more closely resembles a normal distribution. This means there is a low probability that \( \chi^{2} \) is either very close to or very far from zero.
When \(k\) is only slightly larger than \(2\), the chi-square distribution has a much longer right tail than left tail; that is, it is strongly right-skewed.
Figure 4. Graphs of a Chi-Square Distribution when it has three and five degrees of freedom.
Remember that the mean of a chi-square distribution is the degrees of freedom, and notice that the peak is always to the left of the mean. Also notice that the left tail ends at zero, but the right tail goes on forever. Because the distribution is bounded at zero on the left but unbounded on the right, the peak can never truly be in the middle; it must always sit left of center.
As the degrees of freedom, \(k\), get larger and larger, the skew of the distribution gets smaller and smaller. As the degrees of freedom approach infinity, the distribution approaches a normal distribution.
Figure 5. A Chi-Square Distribution that has \(90\) degrees of freedom. At this point, you can use a normal distribution as a good approximation of the chi-square distribution.
In fact, when \( k \geq 90 \), you can consider a normal distribution as a good approximation of the chi-square distribution.
Chi-Square Distribution Tables
Nowadays, many calculators and any statistical software can calculate chi-square probabilities. But before such software was ubiquitous, people needed an easy way to look up these values. That's why they created the chi-square table. The chi-square distribution table is a reference tool you can use to find chi-square critical values.
A chi-square critical value is a threshold for statistical significance for hypothesis tests. It also defines confidence intervals for certain parameters.
Below is an example of a chi-square distribution table for \(1-5\) degrees of freedom.
Percentage Points of the Chi-Square Distribution

| Degrees of Freedom (k) | 0.99 | 0.95 | 0.90 | 0.75 | 0.50 | 0.25 | 0.10 | 0.05 | 0.01 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.000 | 0.004 | 0.016 | 0.102 | 0.455 | 1.32 | 2.71 | 3.84 | 6.63 |
| 2 | 0.020 | 0.103 | 0.211 | 0.575 | 1.386 | 2.77 | 4.61 | 5.99 | 9.21 |
| 3 | 0.115 | 0.352 | 0.584 | 1.212 | 2.366 | 4.11 | 6.25 | 7.81 | 11.34 |
| 4 | 0.297 | 0.711 | 1.064 | 1.923 | 3.357 | 5.39 | 7.78 | 9.49 | 13.28 |
| 5 | 0.554 | 1.145 | 1.610 | 2.675 | 4.351 | 6.63 | 9.24 | 11.07 | 15.09 |

The column headings \(0.99\) through \(0.01\) give the probability of a larger value of \( \chi^{2} \); that is, the significance level \( (\alpha) \).

Table 1. Percentage points of the chi-square distribution for \(1\) to \(5\) degrees of freedom.
The leftmost column tells you the degrees of freedom, \(k\). The column headings give the probability that \( \chi^{2} \) takes a larger value than the one in the cell; so, if you want the value below which a probability of \(0.9\) lies, you look in the column labelled \(0.1\). The number in each cell is the critical value: the point with the column's probability remaining to its right.
Let's walk through an example.
Say you have a chi-square distribution with \(6\) degrees of freedom, and you want to know the value below which \(5\%\) of the probability lies. How can you use a chi-square distribution table to find it?
Solution:
- You are given \(6\) degrees of freedom, so choose the row that matches: row \(6\).
- You want the value below which \(5\%\) of the probability lies, so you choose the column that is the complement of \(5\% = 0.05\). This is column \(0.95\).
- With the row and column specified, identify the critical value in the cell. The critical value is \(1.635\).
- This means that your chi-square distribution accumulates \(5\%\) of its probability below \(1.635\); equivalently, \(95\%\) of the probability lies at or above \(1.635\).
- Mathematically, you write:\[ P(\chi_{6}^{2} \geq 1.635) = 0.95 \]
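If you have statistical software handy, the same lookup is a single function call. Below is a minimal sketch, assuming SciPy is available:

```python
# Sketch reproducing the table lookup with SciPy's percent-point
# function (the inverse CDF).
from scipy.stats import chi2

critical_value = chi2.ppf(0.05, df=6)  # value with 5% probability below it
print(critical_value)                  # approximately 1.635, as in the table

print(chi2.sf(critical_value, df=6))   # 95% of the probability lies above
```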
These tables are most important for hypothesis testing. If your test statistic is greater than the number in the appropriate cell, that means you have found evidence to reject the null hypothesis. See the article on Chi-Square Tests for more information.
Applications of the Chi-Square Distribution
What are some common applications of the chi-square distribution? Well, the chi-square distribution appears in many statistical tests and theories. Below are some of the most common.
Population Variance Inferences
A major motivation for the chi-square distribution is drawing inferences about the population standard deviation \( (\sigma) \) or variance \( (\sigma^{2}) \) from a relatively small sample. Using a chi-square distribution, you can test the hypothesis that a population's variance is equal to a specific value by using the test of a single variance. Alternatively, you can calculate confidence intervals for the population's variance.
A large union strives to make sure that all employees who are at the same level of seniority get paid similar salaries. Their goal: a standard deviation in hourly salary that is less than \($3\).
- To test if they have achieved their goal, the union randomly samples \(30\) employees who are at the same level of seniority. They find that the standard deviation of their sample is \($2.95\), which is just slightly less than their goal of \($3\).
Is this enough evidence to conclude that the true standard deviation of all employees who are at the same level of seniority is less than \($3\)?
Solution:
To find out, the union should use the test of a single variance to determine if the standard deviation is significantly less than \($3\), using the test statistic:\[ \chi^{2} = \frac{(n-1)s^{2}}{\sigma^{2}} \]
Then, by comparing a chi-square test statistic to the appropriate chi-square distribution, the union can decide if it is appropriate to reject the null hypothesis.
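Below is a hedged sketch of how the union's numbers could be plugged in, assuming SciPy is available and a left-tailed test of \( H_{0}: \sigma = 3 \) against \( H_{a}: \sigma < 3 \):

```python
# Sketch of the test of a single variance for the union example,
# assuming a left-tailed test of H0: sigma = 3 vs Ha: sigma < 3.
from scipy.stats import chi2

n, s, sigma = 30, 2.95, 3.0

test_statistic = (n - 1) * s**2 / sigma**2    # about 28.04
p_value = chi2.cdf(test_statistic, df=n - 1)  # left-tail probability

print(test_statistic, p_value)
```

A large p-value here would mean the sample does not provide convincing evidence that the true standard deviation is below \($3\).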
Pearson's Chi-Square Test
Pearson's chi-square tests are some of the most common applications of chi-square distributions. You use these tests to determine if your data are significantly different from what you expect. The two types of Pearson's chi-square tests are:
the chi-square goodness of fit test and
the chi-square test for independence.
Say a t-shirt company wants to know if all colors of their t-shirts are equally popular. To find out, they record the number of sales per shirt color for a week. This data is represented in the table below:
Sales per Shirt Color

| Color | Frequency |
| --- | --- |
| Black | 80 |
| Blue | 90 |
| Gray | 70 |
| Red | 60 |
| White | 100 |

Table 2. Sales per shirt color data.
Since the company sold \(400\) t-shirts, \(80\) sales per color would mean that the colors were equally popular. Based on the numbers in the table, you know that the company did not sell \(80\) of each color of t-shirt. However, this is only a one-week sample, so you should expect that the numbers won't be equal due to chance.
But does this sample provide enough evidence to conclude that the frequency of t-shirt sales truly differs between colors?
Solution:
This is where a chi-square goodness of fit test comes in: it tests whether the observed frequencies are significantly different from equal frequencies.
By comparing the Pearson chi-square test statistic to the appropriate chi-square distribution, the company can determine the probability of sales counts this uneven arising by chance alone.
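Below is a minimal sketch of that comparison, assuming SciPy is available; scipy.stats.chisquare defaults to equal expected frequencies, which matches the company's hypothesis:

```python
# Sketch of the goodness-of-fit test for the t-shirt data; five colors
# give 5 - 1 = 4 degrees of freedom.
from scipy.stats import chisquare

observed = [80, 90, 70, 60, 100]  # sales per color, from Table 2

# chisquare defaults to equal expected frequencies (80 per color here).
statistic, p_value = chisquare(observed)

print(statistic)  # 12.5 for these counts
print(p_value)    # a small p-value suggests the colors differ in popularity
```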
F Distribution Definition
Chi-square distributions are also integral in defining the \(F\) distribution, a distribution used in the Analysis of Variance (ANOVA).
How do you use chi-square distributions to define an \(F\) distribution?
Solution:
- Say you take random samples from a chi-square distribution.
- Next, you divide the sample by the degrees of freedom of the chi-square distribution.
- Repeat steps \( 1-2 \) with a different chi-square distribution.
- If you take the ratios of the values from these two distributions, you will get an \(F\) distribution (written out in the formula below).
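Written out, this recipe gives the standard definition of an \(F\)-distributed random variable as the ratio of two independent chi-square random variables, each divided by its own degrees of freedom:\[ F_{k_{1}, k_{2}} = \frac{\chi^{2}_{k_{1}} / k_{1}}{\chi^{2}_{k_{2}} / k_{2}} \]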
Chi-Square Distribution Examples
Now, let's work through some examples!
Square and add up \(15\) independent standard normal random variables. What distribution does this sum follow?
Solution:
A squared standard normal random variable follows a chi-square distribution with \(1\) degree of freedom. The sum of independent chi-square random variables also follows a chi-square distribution, with the degrees of freedom of the sum being the sum of the individual degrees of freedom. Let's follow this process.
- Let \( Z_{i} \) be independent standard normal random variables, so that each squared value satisfies:\[ Z_{i}^{2} \sim \chi_{1}^{2} \]
- Then:\[ Z_{1}^{2} + Z_{2}^{2} = \chi_{1}^{2} + \chi_{1}^{2} \sim \chi_{2}^{2} .\]
- So if you have the sum of \(15 Z_{i}^{2} \), you have:\[ \sum_{i = 1}^{15} Z_{i}^{2} = \sum_{i = 1}^{15} \chi_{1}^{2} \sim \chi_{15}^{2}. \]
- The sum of \(15\) squared standard normal random variables follows a chi-square distribution with \(15\) degrees of freedom.
Building on the previous example:
What are the
- mean,
- variance,
- standard deviation, and
- skewness
of the distribution from the previous example?
Solution:
- The mean of a chi-square distribution is equal to the degrees of freedom:\[ \mu \left[ \chi_{15}^{2} \right] = k = 15 \]
- The variance of a chi-square distribution is two times the degrees of freedom:\[ \sigma^{2} \left[ \chi_{15}^{2} \right] = 2(15) = 30 \]
- The standard deviation is the square root of the variance:\[ \begin{align}\sigma \left[\chi_{15}^{2} \right] &= \sqrt{ \sigma^{2} \left[ \chi_{15}^{2} \right]} \\&= \sqrt{30} \approx 5.477.\end{align} \]
- The skewness has a formula, too:\[ \text{Skewness} \left[ \chi_{15}^{2} \right] = \sqrt{\frac{8}{15}} \approx 0.73 \]
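If SciPy is available, a single call confirms all four answers; a minimal sketch:

```python
# Sketch confirming the answers above with scipy.stats.chi2 for k = 15.
from scipy.stats import chi2

mean, var, skew = chi2.stats(15, moments="mvs")

print(mean)        # 15
print(var)         # 30
print(var ** 0.5)  # approximately 5.477
print(skew)        # approximately 0.73
```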
Here is an example using a chi-square table.
Using a chi-square table, find the \(90\%\), \(95\%\), and \(99\%\) critical values for a chi-square distribution with \(8\) degrees of freedom.
Solution:
All you have to do for this question is read the table.
- There are \(8\) degrees of freedom; find the row corresponding to \(8\) degrees of freedom.
- To find the critical values, find the columns for \(0.1\), \(0.05\), and \(0.01\).
- \(0.1\) corresponds to the critical value for \(90\%\).
- \(0.05\) corresponds to the critical value for \(95\%\).
- \(0.01\) corresponds to the critical value for \(99\%\).
- Then read the numbers in the cells.
- The critical values are:
- \(13.36\),
- \(15.51\), and
- \(20.09\).
- The results are highlighted in the table below:
Table 3. Chi-square distribution example, with the critical values for \(8\) degrees of freedom highlighted.

Percentage Points of the Chi-Square Distribution

| Degrees of Freedom (k) | 0.99 | 0.95 | 0.90 | 0.75 | 0.50 | 0.25 | 0.10 | 0.05 | 0.01 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6 | 0.872 | 1.635 | 2.204 | 3.455 | 5.348 | 7.84 | 10.64 | 12.59 | 16.81 |
| 7 | 1.239 | 2.167 | 2.833 | 4.255 | 6.346 | 9.04 | 12.02 | 14.07 | 18.48 |
| 8 | 1.647 | 2.733 | 3.490 | 5.071 | 7.344 | 10.22 | **13.36** | **15.51** | **20.09** |
| 9 | 2.088 | 3.325 | 4.168 | 5.899 | 8.343 | 11.39 | 14.68 | 16.92 | 21.67 |

The column headings give the probability of a larger value of \( \chi^{2} \); that is, the significance level \( (\alpha) \).
Chi-Square Distribution – Key takeaways