Definition for skewed
First, let's look at the definition for skewed.
If a distribution is not symmetric about its centre, it is said to be skewed. The normal distribution is symmetric, so it is the usual reference shape for comparison.
So how skewed your distribution is tells you how asymmetric the distribution is and gives you an idea of where the outliers in the data lie.
Are there any distributions with no skew? Sure there are: symmetric distributions, such as the normal distribution, all have zero skew.
Look at the graph below. It shows a normal distribution and a data distribution graphed together. The normal distribution is symmetric but the data distribution is not, so the data distribution is said to be skewed.
Fig. 1. The normal distribution is not skewed, but the data distribution is skewed.
Most statistical software packages will calculate the skew for you. As a rule of thumb, though, you can estimate the skew with the formula below, known as Pearson's second coefficient of skewness:
\[ \text{skew} = 3\left( \frac{\text{mean} - \text{median}}{\text{standard deviation}} \right) .\]
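If you want to try the rule-of-thumb formula yourself, here is a minimal Python sketch. The function name `pearson_skew` and the two small data sets are made up for illustration, and the sketch assumes the sample standard deviation is the one you want in the denominator.

```python
import statistics

def pearson_skew(data):
    """Rule-of-thumb skew: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation (an assumption)
    return 3 * (mean - median) / sd

# Illustrative data sets, not taken from any real study.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]

print(pearson_skew(symmetric))     # mean equals median, so the skew is zero
print(pearson_skew(right_tailed))  # mean is above the median, so the skew is positive
```

Statistical packages usually report a moment-based skew instead, so their number may differ from this one even though the sign normally agrees.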
Let's take a look at the different kinds of skew.
Positively skewed distribution
A positively skewed distribution is one where the skew is greater than zero. In other words, the mean of the distribution is larger than the median. You might also see this distribution called right skewed. You can see in the graph below that the normal distribution has the mean, median, and mode in the same spot, but the positively skewed distribution has
\[ \text{mode} < \text{median} < \text{mean}.\]
The positively skewed distribution has a larger tail on the right side of the graph, which is the same as in the positive direction on the \(x\)-axis.
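You can check the ordering of mode, median, and mean on a small sample. The data below is invented for illustration, and the ordering shown is typical of right-skewed data rather than something every positively skewed data set is guaranteed to satisfy.

```python
import statistics

# Made-up sample with a long tail to the right.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value
mean = statistics.mean(data)      # pulled to the right by 9 and 15

print(mode, median, mean)
print(mode < median < mean)
```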
Fig. 2. Positively skewed distribution as compared to a normal distribution.
If you can have positive skew, of course, you can have negative skew too!
Negatively skewed distribution
A negatively skewed distribution is one where the skew is less than zero. In other words, the mean of the distribution is smaller than the median. You might also see this distribution called left skewed. You can see in the graph below that the negatively skewed distribution has
\[ \text{mode} > \text{median} > \text{mean}.\]
The negatively skewed distribution has a larger tail on the left side of the graph, which is the same as in the negative direction on the \(x\)-axis.
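One way to see negative skew is to mirror a right-tailed sample so that its long tail points left. The sketch below uses an invented sample and reflects it about the value 8, which is an arbitrary choice made only to keep the numbers positive.

```python
import statistics

# Made-up right-tailed sample, then its mirror image (reflection about 8).
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
left_tailed = [16 - x for x in right_tailed]

mode = statistics.mode(left_tailed)
median = statistics.median(left_tailed)
mean = statistics.mean(left_tailed)  # pulled to the left by the small values

print(mode, median, mean)
print(mode > median > mean)
```

Mirroring flips the order of the three measures, which is why the inequality for negative skew is the reverse of the one for positive skew.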
Fig. 3. Negatively skewed distribution as compared to a normal distribution.
Next up, how do you interpret the skew?
Skewness interpretation
One of the things you can learn from the skew of the distribution is where the outliers in the data set are located. In any skewed distribution, the outliers are data points in the long tail of the distribution. This tells you that:
in a positively skewed distribution, the outliers are in the long tail to the right of the mean; and
in a negatively skewed distribution, the outliers are in the long tail to the left of the mean.
What the skew does not tell you is how many outliers there are!
Skewness is not always a bad thing. In some data sets, it is to be expected.
Suppose you have collected data on the distance a baseball travels when hit during professional baseball games. Most of the data will indicate that the ball travels a distance somewhere between the pitcher and the stadium seating. However, on occasion, a player will bunt the ball, and it will only travel a short distance.
The bunts would be outliers in the short-distance tail, so they pull the mean below the median and the data is negatively skewed. That is expected of this kind of data, and not an indication that something is wrong with the data set.
In addition to skew, you can use kurtosis to get information about a data distribution.
Skewness and kurtosis
Kurtosis is a way to describe the shape of the tails of a data distribution as compared to the centre.
Kurtosis is a measurement of the tails of a data set, not of the peak of the data set!
There are three main kinds of kurtosis.
Mesokurtic distributions have \(\text{kurtosis} = 3\), and they generally have tails similar to a normal distribution.
Leptokurtic distributions have \(\text{kurtosis} > 3\). The prefix is "lepto" which means "thin". Leptokurtic distributions have very long tails on both sides of the distribution, making the centre look very thin and tall. The shape of this kind of distribution indicates there are actually outliers on both sides of the mean!
Remember, the important part of leptokurtic distributions is that they have fat tails, not that they have thin centres. Kurtosis is a description of the tails of a distribution. One example of a leptokurtic distribution is a \(t\)-distribution with few degrees of freedom.
Lastly, there are platykurtic distributions, which have \(\text{kurtosis} < 3\). The prefix is "platy" meaning "broad". Platykurtic distributions have very short tails on both sides of the distribution, making the centre look very short and broad. An example of a platykurtic distribution is the uniform distribution.
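The kurtosis values above can be computed as the fourth central moment divided by the square of the variance. Here is a minimal sketch, assuming the population (divide by \(n\)) version of each moment; the function name `kurtosis` and both data sets are made up for illustration.

```python
import statistics

def kurtosis(data):
    """Fourth central moment divided by the squared variance (not 'excess' kurtosis)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

# Evenly spread values behave like a uniform distribution (short tails).
flat = list(range(1, 101))
# Mostly identical values with two extreme points (very heavy tails).
heavy = [0] * 98 + [-10, 10]

print(kurtosis(flat))   # below 3, so platykurtic
print(kurtosis(heavy))  # above 3, so leptokurtic
```

Be aware that many software packages report excess kurtosis, which is this value minus 3, so a normal distribution would show up as 0 rather than 3.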
In the graph below you can see that each of the distributions is symmetric, meaning they have zero skew. However, the tails of the distributions are different in each case.
Fig. 4. All of these distributions have zero skew, but different kurtosis.
So one thing you can now see is that skew and kurtosis measure different things: a distribution can have zero skew and still have high or low kurtosis!
Skewness - Key takeaways
- If a distribution is not symmetric about its centre, it is said to be skewed. The normal distribution is symmetric, so it has zero skew.
- A positively skewed distribution has the mean of the distribution larger than the median, and a longer tail on the right side of the graph.
- A negatively skewed distribution has the mean of the distribution smaller than the median, and a longer tail on the left side of the graph.
- Kurtosis is a way to describe the shape of the tails of a data distribution as compared to the centre.
- Mesokurtic distributions have \(\text{kurtosis} = 3\), and are similar to the normal distribution.
- Leptokurtic distributions have \(\text{kurtosis} > 3\), and very long tails on both sides of the distribution, making the centre look very thin and tall.
- Platykurtic distributions have \(\text{kurtosis} < 3\), and very short tails on both sides of the distribution, making the centre look very short and broad.