What is the hypothesis test for correlation coefficient?
When given a sample of bivariate data (data which include two variables), it is possible to calculate how linearly correlated the data are, using a correlation coefficient.
The product moment correlation coefficient (PMCC) describes the extent to which one variable correlates with another, in other words, the strength of the linear correlation between two variables. The PMCC for a sample of data is denoted by r, while the PMCC for a population is denoted by ρ.
The PMCC takes values between -1 and 1 inclusive.
If r = 1, there is a perfect positive linear correlation. All points lie on a straight line with a positive gradient, and the higher one of the variables is, the higher the other.
If r = 0, there is no linear correlation between the variables.
If r = -1, there is a perfect negative linear correlation. All points lie on a straight line with a negative gradient, and the higher one of the variables is, the lower the other.
Correlation is not equivalent to causation, but a PMCC close to 1 or -1 can indicate that there is a higher likelihood that two variables are related.
Bivariate data with no correlation, positive correlation, and negative correlation.
You should be able to calculate the PMCC using a graphical calculator, either by finding the regression line of y on x (the calculator reports r automatically) or by using the formula given in the formula booklet. The closer r is to 1 or -1, the stronger the correlation between the variables, and hence the more closely associated the variables are.
You need to be able to carry out hypothesis tests on a sample of bivariate data to determine whether there is evidence of a linear relationship in the entire population. By calculating the PMCC and comparing it to a critical value, it is possible to decide whether the sample provides sufficient evidence that such a relationship exists.
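For readers without a graphical calculator, the calculation can be sketched in a few lines of Python. This is a minimal sketch that assumes the formula in question is the standard one, r = Sxy / √(Sxx × Syy); the function name pmcc and the two small data sets are made up for illustration only.

```python
import math

def pmcc(xs, ys):
    """Sample product moment correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either variable is constant
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: points lying exactly on a straight line
rising = pmcc([1, 2, 3, 4], [2, 4, 6, 8])    # positive gradient
falling = pmcc([1, 2, 3, 4], [8, 6, 4, 2])   # negative gradient
print(rising, falling)
```

The two made-up data sets lie exactly on straight lines, so the results illustrate the extreme cases r = 1 and r = -1.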
What is the hypothesis test for negative correlation?
To conduct a hypothesis test, a number of keywords must be understood:
Null hypothesis (H₀): the hypothesis assumed to be correct until proven otherwise.
Alternative hypothesis (H₁): the conclusion made if H₀ is rejected.
Hypothesis test: a mathematical procedure to examine a value of a population parameter proposed by the null hypothesis compared to the alternative hypothesis.
Test statistic: a value calculated from the sample and compared against cumulative probability tables, or the normal distribution, in the final part of the significance test.
Critical region: the range of values that lead to the rejection of the null hypothesis.
Significance level: the actual significance level is the probability of rejecting H₀ when it is in fact true.
The null hypothesis is also known as the 'working hypothesis'. It is what we assume to be true for the purpose of the test, or until proven otherwise.
The alternative hypothesis is what is concluded if the null hypothesis is rejected. It also determines whether the test is one-tailed or two-tailed.
A one-tailed test allows for the possibility of an effect in one direction, while a two-tailed test allows for the possibility of an effect in two directions, in other words, both in the positive and the negative directions.
Method: A series of steps must be followed to test for a linear relationship between two variables.
1. Write down the null and alternative hypotheses (H₀ and H₁). The null hypothesis is always ρ = 0, while the alternative hypothesis depends on what is asked in the question. Both hypotheses must be stated in symbols only (not in words).
2. Using a calculator, work out the value of the PMCC of the sample data, r .
3. Use the significance level and sample size to figure out the critical value. This can be found in the PMCC table in the formula booklet.
4. Take the absolute value of the PMCC, |r|, and compare it to the critical value. If |r| is greater than the critical value (and, for a one-tailed test, r has the sign stated in the alternative hypothesis), the null hypothesis should be rejected. Otherwise, the null hypothesis should not be rejected.
5. Write a full conclusion in the context of the question. The conclusion should be stated in full: both in statistical language and in words reflecting the context of the question. A negative correlation means that higher values of one variable are associated with lower values of the other, whereas a positive correlation means that higher values of one variable are associated with higher values of the other.
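The comparison in step 4 can be sketched as a small decision function. This is only an illustration: the function name reject_null, the labels for the alternative hypothesis, and the numbers in the usage lines are hypothetical and not taken from any table.

```python
def reject_null(r, critical_value, alternative):
    """Decide whether H0: rho = 0 is rejected.

    alternative is "positive" (H1: rho > 0), "negative" (H1: rho < 0)
    or "two-sided" (H1: rho != 0). critical_value is the positive value
    read from the PMCC table for the chosen significance level and n.
    """
    if alternative == "positive":
        return r > critical_value
    if alternative == "negative":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'positive', 'negative' or 'two-sided'")

# Hypothetical values, chosen only to show the effect of the direction
print(reject_null(-0.62, 0.5, "negative"))   # True: strong negative r
print(reject_null(-0.62, 0.5, "positive"))   # False: r has the wrong sign
```

The second call shows why the sign of r matters in a one-tailed test: a large |r| in the wrong direction does not support the alternative hypothesis.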
How to interpret results based on the null hypothesis
From the observed results (test statistic), a decision must be made, determining whether to reject the null hypothesis or not.
One-tailed test applied to normal distribution. Image: Repapetilto, CC BY-SA 3.0
Two-tailed test applied to normal distribution. Image: public domain
Both the one-tailed and two-tailed tests are shown at the 5% level of significance. However, the 5% is split between the positive and negative sides in the two-tailed test (2.5% in each tail), and lies solely on the positive side in the one-tailed test.
Under the null hypothesis, the result could lie anywhere on the graph. If the observed result lies in the shaded area, the test statistic is significant at 5%, in other words, we reject H₀. However, H₀ could actually be true and still be rejected. Hence, the significance level, 5%, is the probability that H₀ is rejected even though it is true, in other words, the probability that H₀ is incorrectly rejected. When H₀ is rejected, H₁ (the alternative hypothesis) is used to write the conclusion.
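The meaning of the significance level can be checked with a rough simulation. The sketch below assumes both variables are independent standard normal values, so that H₀: ρ = 0 is true by construction, and uses 0.4973, the one-tailed 5% critical value for n = 12 that appears in the worked example in this article. The function names, trial count, and seed are arbitrary choices.

```python
import math
import random

def pmcc(xs, ys):
    """Sample product moment correlation coefficient r."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def false_rejection_rate(trials=20000, n=12, critical_value=0.4973, seed=1):
    """Fraction of samples in which a true H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Two unrelated variables, so the population correlation is 0
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if pmcc(xs, ys) > critical_value:   # one-tailed test, H1: rho > 0
            rejections += 1
    return rejections / trials

rate = false_rejection_rate()
print(rate)
```

The printed proportion should come out close to 0.05: even when there is no correlation in the population, roughly 5% of samples fall in the critical region.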
We can define the null and alternative hypotheses for one-tailed and two-tailed tests:
For a one-tailed test: H₀: ρ = 0 and H₁: ρ > 0 (or H₁: ρ < 0)
For a two-tailed test: H₀: ρ = 0 and H₁: ρ ≠ 0
Let us look at an example of testing for correlation.
12 students sat two biology tests: one was theoretical and the other was practical. The results are shown in the table.
Score in theoretical test, t | 5 | 9 | 7 | 11 | 20 | 4 | 6 | 17 | 12 | 10 | 15 | 16 |
Score in practical test, p | 6 | 8 | 9 | 13 | 20 | 9 | 8 | 17 | 14 | 8 | 17 | 18 |
a) Find the product moment correlation coefficient for this data, to 3 significant figures.
b) A teacher claims that students who do well in the theoretical test tend to do well in the practical test. Test this claim at the 0.05 level of significance, clearly stating your hypotheses.
a) Using a calculator, we find the PMCC (enter the data into two lists and calculate the regression line; the PMCC will appear). r = 0.935 to 3 significant figures.
b) We are testing for a positive correlation, since the claim is that a higher score in the theoretical test is associated with a higher score in the practical test. We will now use the five steps we previously looked at.
1. State the null and alternative hypotheses. H₀: ρ = 0 and H₁: ρ > 0
2. Calculate the PMCC. From part a), r = 0.935
3. Figure out the critical value from the sample size and significance level. The sample size, n, is 12. The significance level is 5%. The hypothesis is one-tailed since we are only testing for positive correlation. Using the table from the formula booklet, the critical value is shown to be cv = 0.4973
4. The absolute value of the PMCC is 0.935, which is larger than 0.4973. Since the PMCC is larger than the critical value at the 5% level of significance, we can reach a conclusion.
5. Since the PMCC is larger than the critical value, we choose to reject the null hypothesis. We can conclude that there is significant evidence to support the claim that students who do well in the theoretical biology test also tend to do well in the practical biology test.
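The numbers in this example can be reproduced without a calculator. The sketch below uses the summary-statistics form of the formula, Stp = Σtp − (Σt)(Σp)/n and similarly for Stt and Spp, which is assumed here to match the formula booklet; the variable names are illustrative.

```python
import math

# Scores from the table: theoretical test (t) and practical test (p)
t = [5, 9, 7, 11, 20, 4, 6, 17, 12, 10, 15, 16]
p = [6, 8, 9, 13, 20, 9, 8, 17, 14, 8, 17, 18]
n = len(t)

# Summary statistics
s_tt = sum(x * x for x in t) - sum(t) ** 2 / n
s_pp = sum(y * y for y in p) - sum(p) ** 2 / n
s_tp = sum(x * y for x, y in zip(t, p)) - sum(t) * sum(p) / n

r = s_tp / math.sqrt(s_tt * s_pp)

# Critical value from the text: n = 12, 5% level, one-tailed
critical_value = 0.4973
reject_h0 = r > critical_value   # H1: rho > 0, so no absolute value needed

print(s_tt, s_pp, s_tp)          # 290.0 256.25 255.0
print(round(r, 3), reject_h0)    # 0.935 True
```

This agrees with the calculator value in part a) and with the decision to reject the null hypothesis in step 5.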
Let us look at a second example.
A tetrahedral die (four faces) is rolled 40 times and 6 'ones' are observed. Is there any evidence at the 10% level that the probability of a score of 1 is less than a quarter?
If the probability of a 'one' is a quarter, the expected number of 'ones' in 40 rolls is 40 × 0.25 = 10. The question asks whether the observed result (test statistic 6) is unusually low.
We now follow the same series of steps.
1. State the null and alternative hypotheses. Here the parameter is p, the probability of rolling a 'one', rather than a correlation coefficient. H₀: p = 0.25 and H₁: p < 0.25
2. We cannot calculate the PMCC since we are only given data for the frequency of 'ones', so the test statistic is the observed number of 'ones'.
3. A one-tailed test is required (p < 0.25) at the 10% significance level. We model the situation with a binomial distribution in which X is the number of 'ones', so X ~ B(40, 0.25), and we then use the cumulative binomial tables. The observed value is X = 6. To 4 decimal places, P(X ≤ 6) = 0.0962.
4. Since 0.0962, or 9.62%, is less than 10%, the observed result lies in the critical region.
5. We reject H₀ and accept the alternative hypothesis. We conclude that there is evidence to show that the probability of rolling a 'one' is less than a quarter.
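The cumulative probability in step 3 can also be computed directly instead of being read from tables. A minimal sketch, with binomial_cdf as an illustrative helper name:

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# 40 rolls, probability 0.25 of a 'one' under H0, 6 'ones' observed
p_value = binomial_cdf(6, 40, 0.25)
reject_h0 = p_value < 0.10       # 10% significance level, one-tailed

print(round(p_value, 4), reject_h0)   # 0.0962 True
```

The probability is only just below 10%, so the same data would not lead to rejection at the 5% level.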
Hypothesis Test for Correlation - Key takeaways
- The Product Moment Correlation Coefficient (PMCC), or r, is a measure of how strongly related 2 variables are. It ranges between -1 and 1, indicating the strength of a correlation.
- The closer r is to 1 or -1 the stronger the (positive or negative) correlation between two variables.
- The null hypothesis is the hypothesis that is assumed to be correct until proven otherwise. It states that there is no correlation between the variables.
- The alternative hypothesis is that which is accepted when the null hypothesis is rejected. It can be either one-tailed (looking at one outcome) or two-tailed (looking at both outcomes – positive and negative).
- If the significance level is 5%, this means that there is a 5% chance of incorrectly rejecting the null hypothesis when it is in fact true.
Images
One-tailed test: https://en.wikipedia.org/w/index.php?curid=35569621