Definition of Spearman's rank correlation coefficient
Remember that a product moment correlation coefficient (PMCC) is used to measure a linear correlation between two variables.
See the articles Correlation and Product Moment Correlation Coefficient for more details.
But what if your data isn't linearly correlated, or can't even be measured on a continuous scale? In that case, you can use the Spearman's rank correlation coefficient. In fact, you might use the Spearman's rank correlation coefficient as an approximation to the product moment correlation coefficient even if the data is linearly correlated simply because Spearman's rank correlation coefficient is a simpler calculation.
For more details, see Comparing Spearman's Rank and Product Moment Correlation Coefficient.
In general, you would use Spearman's rank correlation coefficient if:
- one or both of your data sets are from a population which is not normally distributed;
- the relationship between the data sets is non-linear; or
- one or both of the data sets is already represented as a ranking.
Values of the Spearman's rank correlation coefficient range between \(-1\) and \(1\).
A Spearman's rank correlation coefficient of:
- \(1\) means the rankings are in perfect agreement;
- \(0\) means there is no relationship between the rankings; and
- \(-1\) means the rankings are in reverse order.
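As a quick check of these three cases, here is a minimal Python sketch. It uses the rank-difference formula that appears later in this article, and the function name and the three sets of rankings are made up purely for illustration.

```python
def spearman_no_ties(rank_x, rank_y):
    """Spearman's rank coefficient via the rank-difference formula.

    Assumes both lists rank the same items and contain no tied ranks.
    """
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical rankings of five items (illustrative only).
base = [1, 2, 3, 4, 5]
same_order = [1, 2, 3, 4, 5]
reverse_order = [5, 4, 3, 2, 1]
unrelated = [2, 5, 3, 1, 4]

print(spearman_no_ties(base, same_order))     # perfect agreement
print(spearman_no_ties(base, reverse_order))  # reverse order
print(spearman_no_ties(base, unrelated))      # no overall relationship
```

With these particular lists the three results are \(1\), \(-1\) and \(0\) respectively.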
Often the Spearman's rank correlation coefficient will not be exactly \(1\), \(0\) or \(-1\). Generally, when you do a hypothesis test using the Spearman's rank correlation coefficient, you are testing whether or not there is a relationship between the rankings.
See Testing for Zero Correlation for more details on this type of hypothesis test.
Spearman's rank graph
When checking whether there might be a correlation using Spearman's rank, it can help to graph the data. Remember, you are not looking to see if the data in the graph makes a line; you are looking to see if the rankings are the same.
In the graph below, you can see the rankings that two judges gave at a competition. The rankings that Judge A gave the competitors are noted by circles, while the rankings that Judge B gave are noted by crosses.
Fig. 1 - Plot of rankings given by two different judges.
For example, Judge A gave the first competitor a ranking of \(1\), while Judge B gave the competitor a ranking of \(2\). While the data plotted does not form a line, it does appear that both judges gave approximately the same ranking to all of the competitors, and in three cases, they gave exactly the same ranking. So you could expect the Spearman's rank correlation coefficient for the rankings here to be closer to \(1\) than to \(0\).
Spearman's rank correlation coefficient formula
Using the formula for the Spearman's rank correlation coefficient requires the data sets to be ranked. It doesn't matter how you rank them (for example, best to worst or worst to best) as long as you rank both sets the same way. Before looking at the formula, let's look at an example of organising the rankings.
Two coffee tasters were asked to rank \(8\) brands of coffee in order of preference. Their order preferences for the brands are given in the table below.
Table 1. Coffee preferences by the taster.
Coffee Brand | A | B | C | D | E | F | G | H |
Taster \(x\) | \(4\) | \(5\) | \(2\) | \(8\) | \(1\) | \(3\) | \(7\) | \(6\) |
Taster \(y\) | \(4\) | \(6\) | \(1\) | \(7\) | \(3\) | \(2\) | \(5\) | \(8\) |
Each coffee is given a preference number by the taster. As long as taster \(x\) and taster \(y\) both use \(1\) to mean the same thing on the scale, then you will be able to compare the rankings. If you don't know that taster \(x\) and taster \(y\) used \(1\) to mean the coffee they prefer the most, you won't be able to tell what the correlation coefficient means even though you will be able to calculate it.
To calculate the correlation coefficient, you will need the following values:
\[ S_{xy} = \sum x_iy_i - \frac{1}{n}\sum x_i \sum y_i; \]
\[ S_{xx} = \sum x_i^2 - \frac{1}{n} \left(\sum x_i\right)^2;\]
and
\[S_{yy} = \sum y_i^2 - \frac{1}{n} \left(\sum y_i\right)^2.\]
Then the Spearman's rank correlation coefficient can be found using the formula
\[ r_s = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} .\]
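The three sums and the final formula translate directly into code. The following is a minimal Python sketch, not a definitive implementation: the function name `spearman_from_ranks` is an assumption made for illustration, and the input lists are the rankings from Table 1.

```python
from math import sqrt


def spearman_from_ranks(x, y):
    """Apply the S_xy / sqrt(S_xx * S_yy) formula to two lists of ranks."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two rank lists of equal length (at least 2)")
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Rankings from Table 1 (coffee brands A to H).
taster_x = [4, 5, 2, 8, 1, 3, 7, 6]
taster_y = [4, 6, 1, 7, 3, 2, 5, 8]

r_s = spearman_from_ranks(taster_x, taster_y)
print(round(r_s, 4))  # 0.8095
```

For this data \(S_{xy} = 34\) and \(S_{xx} = S_{yy} = 42\), so \(r_s = 34/42 \approx 0.81\).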
You might find an example where the same score is given to more than one data point. This is called a tied rank.
A tied rank occurs when two or more data values in one of the data sets are the same.
Let's look at a quick example.
Suppose a coffee taster was instead asked to give the coffee a letter grade depending on how much they liked it. For the coffees they tasted, they gave scores of: A, C, F, D, B, C, C, C. Notice that of the eight coffees listed, four of them have a score of C! So if you tried to make a ranking table you would get:
Table 2. Possible ranking table
Rank | \(1\) | \(2\) | | | | | \(7\) | \(8\) |
Grade | A | B | C | C | C | C | D | F |
But what do you do with the four coffees that each scored a C? Do you give them a rank of \(3\), \(4\), \(5\) or \(6\)? It turns out that you give them the average of the ranks since they are tied. Finding the average gives you
\[ \frac{3+4+5+6}{4} = 4.5,\]
so each one would be ranked \(4.5\). The completed ranking table would be:
Table 3. Completed ranking table
Rank | \(1\) | \(2\) | \(4.5\) | \(4.5\) | \(4.5\) | \(4.5\) | \(7\) | \(8\) |
Grade | A | B | C | C | C | C | D | F |
Notice that in the previous example, you are not comparing the ranks of taster \(x\) to the ranks of taster \(y\). You are only comparing the ranks given by a single taster.
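The averaging rule for tied ranks can be sketched in Python as follows. The function name `average_ranks` is hypothetical, and the sketch assumes that a smaller value means a better rank, which holds here because the letter grades sort alphabetically from A (best) to F (worst).

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start+1 .. end+1 are shared, so use their average.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Grades in the order the coffees were tasted.
grades = ["A", "C", "F", "D", "B", "C", "C", "C"]
ranks = average_ranks(grades)
print(ranks)  # [1.0, 4.5, 8.0, 7.0, 2.0, 4.5, 4.5, 4.5]
```

Each of the four coffees graded C receives the rank \(4.5\), matching Table 3.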
If there are more than two tied ranks, then the formula
\[ r_s = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} \]
needs to be used. However, if there are two or fewer tied ranks, then you can use the following formula instead:
\[ r_s = 1 - \frac{6}{n(n^2-1)} \sum d^2,\]
where \(n\) is the number of pairs of observations and \(d\) is the difference between the two ranks of each observation. The difference formula gives the Spearman's rank correlation coefficient exactly when there are no tied ranks, and a good approximation when there are only one or two tied ranks.
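To see how the two formulas behave with a tie, here is a small Python sketch that evaluates both on the same ranks. The function names and the four-item data set are assumptions made up for illustration; they do not come from the coffee example.

```python
from math import sqrt


def rs_full(x, y):
    """S_xy / sqrt(S_xx * S_yy) applied to ranks; valid with or without ties."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


def rs_difference(x, y):
    """1 - 6 * sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks: the second and third items are tied in x (rank 2.5 each).
x_ranks = [1, 2.5, 2.5, 4]
y_ranks = [1, 2, 3, 4]

exact = rs_full(x_ranks, y_ranks)
approx = rs_difference(x_ranks, y_ranks)
print(round(exact, 4), round(approx, 4))
```

With this one tie the two values are close but not identical, which is why the difference formula is described as an approximation when ties are present.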
Spearman's rank table
Once you know the Spearman's rank correlation coefficient, you will often use it to do a hypothesis test. While you can use technology to find the critical value, it is helpful to be able to read a Spearman's rank table. Below is a section from a Spearman's rank table.
Table 4. Spearman's rank table
\(n\)/\(\alpha \) | \(0.1\) | \(0.05\) | \(0.025\) | \(0.01\) |
\(6\) | \(0.657\) | \(0.829\) | \(0.886\) | \(0.943\) |
\(7\) | \(0.571\) | \(0.714\) | \(0.786\) | \(0.893\) |
\(8\) | \(0.524\) | \(0.643\) | \(0.738\) | \(0.833\) |
The first column of the table is the sample size \(n\), and the first row of the table gives you the significance level \(\alpha\). Notice that as the sample size increases, the critical value for a given significance level decreases. Remember that the margin of error depends on the critical value:
margin of error = (critical value)(standard error).
This means that if you increase the sample size, the margin of error will decrease.
Critical value of Spearman's rank correlation coefficient
The critical value of the Spearman's rank correlation coefficient depends on the sample size and the significance level you are using. The critical value can be found using a table or through statistical software. For example, if you are doing a one-tailed test, with a sample size of \(7\), at the \(0.025\) significance level, you would use a table of Spearman's coefficients to see that the critical value is \(0.786\). You can find this critical value in the table above.
In other words, for a sample size of \(7\), a value of \(r_s\) is significant at the \(0.025\) level on a one-tailed test if it is at least \(0.786\) (when testing for a positive correlation) or at most \(-0.786\) (when testing for a negative correlation).
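Reading the table can also be sketched as a simple lookup in Python. The dictionary below copies the three rows of Table 4 and assumes the third column corresponds to the \(0.025\) level, as in standard one-tailed tables. The function names are hypothetical, and treating a coefficient equal to the critical value as significant is an assumption about the convention in use.

```python
# One-tailed critical values copied from Table 4: {n: {alpha: critical value}}.
CRITICAL_VALUES = {
    6: {0.1: 0.657, 0.05: 0.829, 0.025: 0.886, 0.01: 0.943},
    7: {0.1: 0.571, 0.05: 0.714, 0.025: 0.786, 0.01: 0.893},
    8: {0.1: 0.524, 0.05: 0.643, 0.025: 0.738, 0.01: 0.833},
}


def critical_value(n, alpha):
    """Look up the critical value for sample size n and significance level alpha."""
    try:
        return CRITICAL_VALUES[n][alpha]
    except KeyError:
        raise ValueError(f"no table entry for n={n}, alpha={alpha}") from None


def is_significant_positive(r_s, n, alpha):
    """One-tailed test for positive correlation: is r_s at least the critical value?"""
    return r_s >= critical_value(n, alpha)


print(critical_value(7, 0.025))                # 0.786
print(is_significant_positive(0.8, 7, 0.025))  # True
print(is_significant_positive(0.7, 7, 0.025))  # False
```

A test for negative correlation would instead compare \(r_s\) with the negative of the tabulated value.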
Spearman's rank correlation coefficient example
Let's go back to the coffee example and work out what the correlation coefficient is.
Two coffee tasters were asked to rank eight brands of coffee in order of preference, with \(1\) being the coffee they liked the most. Their order preferences for the brands are given in the table below.
Table 5. Coffee preferences by the taster.
Coffee Brand | A | B | C | D | E | F | G | H |
Taster \(x\) | \(4\) | \(5\) | \(2\) | \(8\) | \(1\) | \(3\) | \(7\) | \(6\) |
Taster \(y\) | \(4\) | \(6\) | \(1\) | \(7\) | \(3\) | \(2\) | \(5\) | \(8\) |
Find and interpret the Spearman's rank correlation coefficient.
Solution:
Notice that even though both tasters ranked coffee A as their fourth choice, this is not an example of a tied rank. Tied ranks would happen if one taster gave two coffees the same rank. So it is reasonable to use the simplified formula
\[ r_s = 1 - \frac{6}{n(n^2-1)} \sum d^2 .\]
Here there are eight coffee brands, so \(n=8\). Looking at the summation first,
\[\begin{align} \sum\limits_{i=1}^8 d_i^2 &= (4-4)^2 + (5-6)^2 + (2-1)^2 + (8-7)^2 \\ & \quad + (1-3)^2 + (3-2)^2 + (7-5)^2 + (6-8)^2 \\ &= 0+1+1+1+4+1+4+4 \\ &= 16. \end{align}\]
Then
\[\begin{align} r_s &= 1 - \frac{6}{n(n^2-1)} \sum d^2 \\ &= 1-\frac{6}{8(8^2-1)}(16) \\ &= 1-\frac{6}{8(63)}(16) \\ &\approx 0.81. \end{align}\]
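Since \(r_s \approx 0.81\) is much closer to \(1\) than to \(0\), there is a strong positive correlation between the rankings of the two tasters. In other words, the two tasters largely agree on which coffees they prefer. The value also exceeds the one-tailed critical value of \(0.643\) for \(n=8\) at the \(0.05\) level in Table 4, so the positive correlation is significant at that level.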
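The arithmetic in this solution can be double-checked with a few lines of Python. This is only a sketch of the rank-difference calculation on the Table 5 data; the variable names are chosen for illustration.

```python
brands = ["A", "B", "C", "D", "E", "F", "G", "H"]
taster_x = [4, 5, 2, 8, 1, 3, 7, 6]
taster_y = [4, 6, 1, 7, 3, 2, 5, 8]

# Difference in rank for each brand, then the sum of squared differences.
differences = [x - y for x, y in zip(taster_x, taster_y)]
sum_d2 = sum(d ** 2 for d in differences)

n = len(brands)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

for brand, d in zip(brands, differences):
    print(brand, d, d ** 2)
print("sum of d^2 =", sum_d2)   # 16
print("r_s =", round(r_s, 2))   # 0.81
```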
Spearman's Rank Correlation Coefficient - Key takeaways
- Use the Spearman's rank correlation coefficient if:
  - one or both of your data sets are from a population which is not normally distributed;
  - the relationship between the data sets is non-linear; or
  - one or both of the data sets is already represented as a ranking.
- A Spearman's rank correlation coefficient of:
  - \(1\) means the rankings are in perfect agreement;
  - \(0\) means there is no relationship between the rankings; and
  - \(-1\) means the rankings are in reverse order.
- A tied rank occurs when two or more data values in one of the data sets are the same.
- If there are two or fewer tied ranks, then you can use the formula:
\[ r_s = 1 - \frac{6}{n(n^2-1)} \sum d^2,\]
to approximate the Spearman's rank correlation coefficient, where \(n\) is the number of pairs of observations and \(d\) is the difference between the ranks of each observation.