In statistics, there are constraints as well. The Chi Squared Tests use degrees of freedom to describe how free a test is based on the constraints placed on it. Read on to figure out how free the Chi Squared Test really is!
Degrees of freedom meaning
Many tests use degrees of freedom, but here you will see degrees of freedom as it relates to Chi Squared Tests. In general, the degrees of freedom measure how many values in your data are still free to vary once the quantities you have calculated from the sample are fixed. The more quantities you have calculated using your sample, the less freedom you have to make choices with your data. Of course, there is a more formal way to describe these constraints as well.
A constraint, also called a restriction, is a requirement placed on the data by the model for the data.
Let's look at an example to see what that means in practice.
Suppose you are doing an experiment where you roll a four-sided die \(200\) times. Then the sample size is \(n=200\). One constraint is that the frequencies for the four faces must add up to the sample size of \(200\).
The number of constraints will also depend on the number of parameters you need to describe a distribution, and whether or not you know what these parameters are.
Next, let's look at how the constraints relate to degrees of freedom.
Degrees of freedom formula
For most cases, the formula
degrees of freedom = number of observed frequencies - number of constraints
can be used. If you go back to the example with the four-sided die above, there was one constraint. The number of observed frequencies is \(4\) (the number of sides on the die). So the degrees of freedom would be \(4-1 = 3\).
There is a more general formula for the degrees of freedom:
degrees of freedom = number of cells (after combining) - number of constraints.
You are probably wondering what a cell is and why you might combine it. Let's look at an example.
You send out a survey to \(200\) people asking how many pets people have. You get back the following table of responses.
Table 1. Responses from pet ownership survey.
Pets | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) | \(>4\) |
Expected | \(60\) | \(72\) | \(31\) | \(20\) | \(7\) | \(10\) |
However, suppose the model you are using for this survey is only a good approximation if none of the expected values falls below \(15\). The last two columns of data (known as cells) are both below that, so you could combine them to get the table below.
Table 2. Responses from pet ownership survey with combined cells.
Pets | \(0\) | \(1\) | \(2\) | \(3\) | \(>3\) |
Expected | \(60\) | \(72\) | \(31\) | \(20\) | \(17\) |
Then there are \(5\) cells, and one constraint (that the total of the expected values is \(200\)). So the degrees of freedom is \(5 - 1= 4\).
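The combining step can be sketched in a few lines of Python. This is only an illustration: the function names `combine_small_cells` and `degrees_of_freedom` are made up for this example, and the rule used (merge any cell that is too small into its left-hand neighbour) is one simple way of combining adjoining cells, not the only one.

```python
def combine_small_cells(expected, threshold):
    """Merge cells whose expected value is below `threshold` into a neighbour.

    Assumption for this sketch: work from right to left and merge a small
    cell into the cell on its left, so only adjoining cells are combined.
    """
    cells = list(expected)
    i = len(cells) - 1
    while i > 0:
        if cells[i] < threshold:
            cells[i - 1] += cells[i]   # fold the small cell into its left neighbour
            del cells[i]
        i -= 1
    # If the first cell is still too small, merge it into the cell on its right.
    if len(cells) > 1 and cells[0] < threshold:
        cells[1] += cells[0]
        del cells[0]
    return cells


def degrees_of_freedom(cells, constraints=1):
    """degrees of freedom = number of cells (after combining) - number of constraints."""
    return len(cells) - constraints


expected = [60, 72, 31, 20, 7, 10]           # expected values from Table 1
combined = combine_small_cells(expected, 15)
print(combined)                               # the cells of Table 2
print(degrees_of_freedom(combined))           # 5 cells - 1 constraint
```

Applied to Table 1 with a threshold of \(15\), this reproduces the cells in Table 2 and gives \(4\) degrees of freedom. The total stays at \(200\), because combining cells only moves counts around.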
You will usually only combine adjoining cells in your tables of data. Next, let's look at the official definition of degrees of freedom with the Chi-Squared distribution.
Degrees of freedom definition
If you have a table of observed and expected frequencies and want to approximate the distribution of the test statistic \(X^2\), you would use the \(\chi^2\) family of distributions. This is written as
\[\begin{align} X^2 &= \sum \frac{(O_t - E_t)^2}{E_t} \\ &= \sum \frac{O_t ^2}{E_t} -N \\ & \sim \chi^2, \end{align}\]
where \(O_t\) is the observed frequency, \(E_t\) is the expected frequency, and \(N\) is the total number of observations. Remember that the Chi-Squared tests are only a good approximation if none of the expected frequencies is below \(5\).
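As a quick check, here is a small Python sketch that calculates \(X^2\) both ways. The two forms agree because the observed and expected frequencies both add up to \(N\). The observed counts below are invented for illustration (they are not from a real experiment), and the function names are made up for this example. The expected counts assume a fair four-sided die rolled \(200\) times.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_shortcut(observed, expected):
    """Equivalent form: sum of O^2 / E over all cells, minus the total N."""
    n = sum(observed)
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - n


observed = [44, 58, 51, 47]   # hypothetical counts for faces 1 to 4
expected = [50, 50, 50, 50]   # fair die: 200 rolls / 4 faces

print(round(chi_squared_statistic(observed, expected), 3))
print(round(chi_squared_shortcut(observed, expected), 3))
```

Both lines print the same value, about \(2.2\), for these made-up counts. The shortcut form only works when the observed and expected totals are equal.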
For a reminder of this test and how to use it, see Chi Squared Tests.
The \(\chi^2\) distributions are actually a family of distributions that depend on the degrees of freedom. The degrees of freedom for this kind of distribution are written using the variable \(\nu\). Since you may need to combine cells when using \(\chi^2\) distributions, you would use the definition below.
For the \(\chi^2\) distribution, the number of degrees of freedom, \(\nu\), is given by
\[ \nu = \text{number of cells after combining}-1.\]
There will be cases where cells won't be combined, and in that case, you can simplify things a bit. If you go back to the four-sided die example, there are \(4\) possibilities that could come up on the die, and each of these is a cell with its own expected value. None of the cells needs to be combined, so for this example \(\nu = 4 - 1 = 3\) when you use a Chi-Squared distribution to model it.
To be sure you know how many degrees of freedom you have when using the Chi-Squared distribution, it is written as a subscript: \(\chi^2_\nu \).
Degrees of freedom table
Once you know that you are using a Chi-Squared distribution with \(\nu\) degrees of freedom, you will need to use a degrees of freedom table so that you can do hypothesis tests. Here is a section out of a Chi-Squared table.
Table 3. Chi-Squared table.
degrees of freedom | \(0.99\) | \(0.95\) | \(0.9\) | \(0.1\) | \(0.05\) | \(0.01\) |
\(2\) | \(0.020\) | \(0.103\) | \(0.211\) | \(4.605\) | \(5.991\) | \(9.210\) |
\(3\) | \(0.115\) | \(0.352\) | \(0.584\) | \(6.251\) | \(7.815\) | \(11.345\) |
\(4\) | \(0.297\) | \(0.711\) | \(1.064\) | \(7.779\) | \(9.488\) | \(13.277\) |
The first column of the table contains the degrees of freedom, and the first row of the table contains the areas to the right of the critical value.
The notation for a critical value of \(\chi^2_\nu\) which is exceeded with probability \(a\%\) is \(\chi^2_\nu(a\%)\) or \(\chi^2_\nu(a/100)\).
Let's take an example using the Chi-Squared table.
Find the critical value for \(\chi^2_3(0.01)\).
Solution:
The notation for \(\chi^2_3(0.01)\) tells you that there are \(3\) degrees of freedom and you are interested in the \(0.01\) column of the table. Looking at the intersection of the row and column in the table above, you get \(11.345\). So
\[\chi^2_3(0.01) = 11.345 . \]
There is a second use for the table, as demonstrated in the next example.
Find the smallest value of \(y\) such that \(P(\chi^2_3 > y) = 0.95\).
Solution:
Remember that the significance level is the probability that the distribution exceeds the critical value. So asking for the smallest value \(y\) where \(P(\chi^2_3 > y) = 0.95\) is the same as asking what \(\chi^2_3(0.95)\) is. Using the Chi-Squared table you can see that \(\chi^2_3(0.95) =0.352 \), so \(y=0.352\).
Of course, a table can't list all of the possible values. If you need a value which is not in the table, there are many different statistics packages or calculators that can give you Chi-Squared table values.
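If you don't have a statistics package to hand, the sketch below shows one way to get these values using only Python's standard library. It is an illustration rather than a replacement for a proper package: the function names `chi2_sf` and `chi2_critical` are made up for this example, the formula used only works for whole-number degrees of freedom, and the critical value is found by a simple bisection search.

```python
import math


def chi2_sf(x, df):
    """P(chi^2_df > x) for a positive whole number of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k = 2
        q = math.exp(-half)               # exact tail area for 2 degrees of freedom
    else:
        k = 1
        q = math.erfc(math.sqrt(half))    # exact tail area for 1 degree of freedom
    # Step up two degrees of freedom at a time until df is reached.
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def chi2_critical(p, df, tol=1e-10):
    """Value c with P(chi^2_df > c) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:            # widen the bracket until it contains c
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > p:
            lo = mid                      # tail area still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.01, 3), 3))   # the first worked example
print(round(chi2_critical(0.95, 3), 3))   # the second worked example
```

The two printed values match the worked examples above, \(11.345\) and \(0.352\). The same function can be used for probabilities or degrees of freedom that are not listed in the table.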
Degrees of freedom t-test
The degrees of freedom in a \(t\)-test are calculated differently depending on whether or not you are using paired samples. For more information on these topics, see the articles T-distribution and Paired t-test.
Degrees of Freedom - Key takeaways
- A constraint, also called a restriction, is a requirement placed on the data by the model for the data.
- In most cases, degrees of freedom = number of observed frequencies - number of constraints.
- A more general formula for degrees of freedom is: degrees of freedom = number of cells (after combining) - number of constraints.
- For the \(\chi^2\) distribution, the number of degrees of freedom, \(\nu\), is given by
\[ \nu = \text{number of cells after combining}-1.\]