Variables on a scatter graph
Each point on a scatter graph has coordinates (x, y) that give the values of the two variables. If the two data sets are connected, a line of best fit can be drawn to show the relationship between them visually. This relationship is referred to as correlation.
The variables of a scatter graph are either independent or dependent. The independent variable is the one that is not affected by the other variable, and it is plotted on the x-axis. The dependent variable is the one that is affected by the independent variable, and it is plotted on the y-axis.
The scatter graph below shows the relationship between the grades that students get in mathematics and in physical sciences. Each point represents one student: the x-coordinate is their mathematics result and the y-coordinate is their physical sciences result.
Scatter graph showing the relationship between Physical Science and Mathematics grades
What is correlation in scatter graphs?
Correlation is the relationship between two data sets or variables, which we aim to identify by drawing a scatter graph. It is linked to the correlation coefficient r, which measures the strength and direction of the linear relationship between the two variables and always lies between -1 and +1.
It is important to understand that a correlation only exists when the two variables tend to change together. A correlation on its own does not show that one variable causes the other.
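As a rough illustration of how r can be calculated, the sketch below implements the standard Pearson formula in plain Python. The function name `pearson_r` and the grade values are made up for the example and are not taken from the graph in this article.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical grades for six students (illustrative data only)
maths = [55, 62, 70, 74, 81, 90]
physics = [50, 60, 68, 70, 85, 88]

r = pearson_r(maths, physics)
print(round(r, 3))  # a value close to +1 for this made-up data
```

A value of r near +1 or -1 suggests that the points lie close to a straight line, while a value near 0 suggests little or no linear relationship.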
The three types of correlation situations are:
Positive correlation
A positive correlation is when one variable increases as the other variable increases. Scatter graphs with a positive correlation have a positive slope. A perfect positive correlation is expressed as r = +1, which means that all of the points lie exactly on a straight line with a positive slope, so the two variables always move in the same direction.
The more time that you spend exercising, the more calories you will burn.
Illustration of Scatter Graph with positive correlation
Negative correlation
A negative correlation is when one variable decreases as the other variable increases. Scatter graphs with a negative correlation have a negative slope. A perfect negative correlation is expressed as r = -1, which means that all of the points lie exactly on a straight line with a negative slope, so the two variables always move in opposite directions.
The more time a person spends practising Maths, the less confused they will be with its topics.
Illustration of Scatter Graph with a negative correlation
No correlation
No correlation is when no clear relationship exists between the two variables. It corresponds to a correlation coefficient equal to, or very close to, 0.
The amount of tea you drink has no effect on your knowledge of scatter graphs.
Scatter graph with no correlation
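The three situations above can be told apart numerically. The sign of r always matches the sign of the sum of the products of the deviations from the means, so the sketch below checks only that sum. The function name `direction`, the tolerance `tol` and all three data sets are hypothetical, and real data will rarely give a sum of exactly zero.

```python
def direction(xs, ys, tol=1e-9):
    """Return 'positive', 'negative' or 'none' from the sign of the co-variation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Positive when x and y tend to sit above (or below) their means together
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy > tol:
        return "positive"
    if sxy < -tol:
        return "negative"
    return "none"

# Made-up data for each of the three situations
exercise_minutes = [10, 20, 30, 40]
calories_burned = [80, 150, 240, 310]

practice_hours = [1, 2, 3, 4]
confusion_score = [9, 7, 4, 2]

cups_of_tea = [1, 2, 3, 4]
quiz_score = [5, 3, 3, 5]

print(direction(exercise_minutes, calories_burned))  # positive
print(direction(practice_hours, confusion_score))    # negative
print(direction(cups_of_tea, quiz_score))            # none
```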
Strength of the Correlation
The strength of a correlation depends on how closely the data points are aligned, in either a positive or a negative direction, and is measured by the value of the correlation coefficient (r). Correlations are described as:
Strong correlation
A strong correlation is when the data points of the scatter graph lie close to a straight line. A strong positive correlation has an r value close to +1, whilst a strong negative correlation has an r value close to -1.
Scatter Graph with strong negative correlation
Weak or moderate correlation
A weak or moderate correlation is when the data points of the scatter graph are more spread out. Weak correlations have r values closer to 0.
Scatter Graph with weak positive correlation
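A value of r can be turned into a verbal description along these lines. The cut-offs used below (0.7 and 0.3) are an assumption chosen for illustration, because this article gives no exact boundaries and different textbooks use different ones. The function name `describe_correlation` is also hypothetical.

```python
def describe_correlation(r, strong=0.7, moderate=0.3):
    """Describe r in words. The default cut-offs are illustrative assumptions."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {sign}"

print(describe_correlation(-0.92))  # strong negative
print(describe_correlation(0.5))    # moderate positive
print(describe_correlation(0.25))   # weak positive
```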
What is the regression line in scatter graphs?
The regression line is a straight line drawn through a scatter graph to summarise the relationship between the two variables. It allows you to make predictions about data points that have not been measured, which are most reliable within the range of the data you collected. When drawn by eye, it should pass through the middle of the points on the scatter graph, following their overall direction, with roughly the same number of points on either side of the line.
Illustration of a regression line on a scatter graph
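When the line is calculated rather than drawn by eye, one common technique is the least-squares fit, which picks the slope and intercept that make the squared vertical distances from the points to the line as small as possible. This is one possible method and not necessarily the one used for the illustration above. The sketch below assumes hypothetical names (`fit_line`, `predict`) and made-up data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so no line can be fitted")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Read a predicted y value off the fitted line."""
    return slope * x + intercept

# Made-up data: hours studied and the grade achieved
hours = [1, 2, 3, 4, 5]
grades = [52, 58, 61, 70, 74]

slope, intercept = fit_line(hours, grades)
estimate = predict(3.5, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(estimate, 2))  # 5.6 46.2 65.8
```

Here 3.5 hours lies inside the range of the collected data, which is where a prediction from the line is most trustworthy.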
How are scatter graphs described?
Scatter graphs are described or interpreted with the following information: correlation, strength and outliers.
Outliers are points on the scatter graph that do not fit the pattern of the data set.
The graph below shows an example of outliers on a scatter graph (red points).
Illustration of outliers on a scatter graph
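Outliers are usually spotted by eye, but a simple numerical check can help. The sketch below fits a least-squares line and flags any point whose vertical distance from that line is more than `k` times the typical distance. The rule, the default `k = 2`, the function names and the data are all assumptions made for illustration, not a standard taken from this article.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so no line can be fitted")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, k=2.0):
    """Return the points lying more than k typical distances from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    # Residual = actual y minus the y value predicted by the line
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Root-mean-square residual, used here as the "typical" distance
    spread = sqrt(sum(res ** 2 for res in residuals) / len(residuals))
    if spread == 0:
        return []  # every point lies exactly on the line
    return [(x, y) for x, y, res in zip(xs, ys, residuals) if abs(res) > k * spread]

# Made-up data: hours studied and grades, with one unusually low grade
hours = [1, 2, 3, 4, 5, 6, 7, 8]
grades = [52, 55, 61, 64, 70, 40, 79, 83]

outliers = flag_outliers(hours, grades)
print(outliers)  # [(6, 40)]
```

Note that an outlier also pulls the fitted line towards itself, so a check like this is only a rough guide and the graph should still be inspected visually.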
Scatter Graph example mapping the relationship between study hours and grades
The following scatter graph shows the relationship between the number of hours a student spends studying Mathematics and the grades that they achieve.
This scatter graph could be interpreted or described in the following way: The direction of the points indicates a positive correlation, meaning that the more hours a student studies, the higher their grades tend to be. The points are closely aligned, indicating a strong correlation, so students who study more are very likely to achieve higher grades. Lastly, it contains outliers, which could be due to the students' natural understanding, motivation or general interest in the subject, i.e. some students will still get lower grades when they study for more hours because they dislike Maths.
How to draw a scatter graph
A scatter graph is drawn using the following steps:
Step 1: Decide on the two variables that you will be comparing.
Step 2: Collect and tabulate data for these variables. The x (independent) variable is tabulated in the second row and the y (dependent) variable in the third row.
Step 3: Use your collected data to plot the points.
Step 4: Draw your regression line.
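Steps 1 to 3 can be sketched in code as well as on paper. The example below tabulates some made-up data, pairs the values into (x, y) points and places each point on a small text grid. The function name `text_scatter` and the grid size are arbitrary choices for illustration. In practice a plotting library would normally be used, and step 4 (the regression line) is not drawn here.

```python
def text_scatter(points, width=20, height=10):
    """Plot (x, y) points as '*' characters on a width-by-height text grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    if min_x == max_x or min_y == max_y:
        raise ValueError("both variables need more than one distinct value")
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate to a column or row index on the grid
        col = round((x - min_x) / (max_x - min_x) * (width - 1))
        row = round((y - min_y) / (max_y - min_y) * (height - 1))
        # Row 0 of the grid is the top line, so flip the y direction
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)

# Steps 1 and 2: choose the two variables and tabulate the (made-up) data
hours_studied = [1, 2, 3, 4, 5, 6]       # x / independent variable
grades = [50, 55, 63, 66, 74, 80]        # y / dependent variable

# Step 3: pair the values into coordinates and plot them
points = list(zip(hours_studied, grades))
chart = text_scatter(points)
print(chart)
```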
Scatter Graphs - Key takeaways
- A scatter graph can show positive, negative or no correlation.
- A strong positive correlation has an r value close to +1, whilst a strong negative correlation has an r value close to -1.
- A weak or moderate correlation is when the data points of the scatter graph are more spread out. Weak correlations have r values closer to 0.
- The regression line is a straight line drawn through a scatter graph to summarise the relationship between the two variables.
- Scatter graphs are interpreted or described with the following information: correlation, strength and outliers.