Cumulative distribution function definition
First, let's look at the official definition of a cumulative distribution function for a random variable \(X\).
Let \(X\) be a random variable. The cumulative distribution function, or CDF, \(F(x)\) is defined as
\[ F(x) = P(X \le x).\]
In other words, \(F(x)\) gives the probability that \(X\) takes a value less than or equal to \(x\). The definition is the same whether \(X\) is a continuous or a discrete random variable. The rest of this article, however, focuses on the case where \(X\) is a continuous random variable.
Cumulative distribution function from probability density function
Recall the definition of a probability density function.
The probability density function, or PDF, of a continuous random variable \(X\) is an integrable function \(f_X(x)\) satisfying the following:
- \(f_X(x) \ge 0\) for all real numbers \(x\); and
- \(\displaystyle \int_{-\infty}^{\infty} f_X(x) \, \mathrm{d} x = 1\).
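Then the probability that \(X\) lies between \(a\) and \(b\) is \[ P(a<X<b) = \int_a^b f_X(x) \, \mathrm{d} x .\] For a continuous random variable, including or excluding the endpoints \(a\) and \(b\) does not change this probability.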
So how does this relate to the cumulative distribution function? Notice that the probability \(P(a<X<b)\) appears in the definition above. Since the cumulative distribution function is defined as \( F(x) = P(X \le x)\), you can let \(a \to -\infty\) and take \(b = x\), which rewrites the cumulative distribution function in terms of the probability density function as follows:
Let \(X\) be a continuous random variable. The cumulative distribution function \(F(x)\) is defined as
\[ \begin{align} F(x) &= P(X \le x) \\ &= \int_{-\infty}^x f_X(t) \, \mathrm{d} t ,\end{align}\]
where \(f_X(x)\) is a probability density function for \(X\).
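The integral formula can be checked numerically. The sketch below assumes a hypothetical example PDF, the exponential density \(f(t) = e^{-t}\) for \(t \ge 0\) (zero otherwise), whose CDF is known to be \(1 - e^{-x}\). Because this PDF is zero below \(0\), the integral from \(-\infty\) can start at \(0\) instead. The helper `simpson` is a plain composite Simpson's rule written for this illustration, not a library function.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def pdf(t):
    """Hypothetical example PDF: exponential with rate 1, f(t) = e^(-t) for t >= 0."""
    return math.exp(-t) if t >= 0 else 0.0

def cdf_from_pdf(f, x, lower=0.0):
    """F(x) = integral of f from -infinity to x.
    Assumes f is zero below `lower`, so the integral can start at `lower`."""
    if x <= lower:
        return 0.0
    return simpson(f, lower, x)

# Compare the numerical integral with the known closed form 1 - e^(-x).
for x in (0.5, 1.0, 2.0):
    print(x, round(cdf_from_pdf(pdf, x), 6), round(1 - math.exp(-x), 6))
```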
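Because \(F\) is defined as an integral of \(f_X\), you get from the probability density function to the cumulative distribution function by integration, and, by the Fundamental Theorem of Calculus, from the cumulative distribution function back to the probability density function by differentiation: \(f_X(x) = F'(x)\) wherever \(F\) is differentiable.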
Fig. 1 - changing from a CDF to a PDF.
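Going the other way, differentiating a CDF recovers the PDF. This minimal sketch assumes a hypothetical CDF \(F(x) = x^2\) on \([0,1]\) (0 below, 1 above), whose PDF is \(f(x) = 2x\) on \([0,1]\), and approximates \(F'(x)\) with a central difference. The step size `h` is an illustrative choice.

```python
def cdf(x):
    """Hypothetical example CDF: F(x) = x^2 on [0, 1], 0 below and 1 above."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x * x

def pdf_from_cdf(F, x, h=1e-5):
    """Approximate f(x) = F'(x) with a central difference (valid away from kinks in F)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Each estimate should be close to the true PDF value 2x.
for x in (0.2, 0.5, 0.8):
    print(x, round(pdf_from_cdf(cdf, x), 6), 2 * x)
```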
Next, let's look at the properties of a cumulative distribution function.
Cumulative distribution function properties
You already know some properties of a cumulative distribution function just because it is defined in terms of probability:
- the cumulative distribution function is always at least zero;
- the cumulative distribution function is at most one; and
- \(F(x)\) is the area under the probability density function to the left of \(x\).
This means you can read probabilities of the form \(P(X \le x)\) for a continuous random variable directly from the graph of its cumulative distribution function. Let's take a quick example.
For a continuous random variable \(X\), given the cumulative distribution function as shown in the graph below, find \(P(X \le 3.5)\).
Fig. 2 - graph of a cumulative distribution function
Solution:
Don't be thrown off by the vertical axis being labelled
\[ \int f_X(x)\, \mathrm{d}x.\]
For a probability density function \(f_X(x)\), this integral (taken from \(-\infty\) up to \(x\)) is the same thing as the cumulative distribution function \(F(x)\).
It can help to find the equation of the cumulative distribution function since it is not exactly clear what \(F(3.5)\) is from the picture. Given the graph, you can see that the points \((1,0)\) and \((11,1)\) are the two endpoints of the diagonal line. The equation of that line is then
\[y= \frac{1}{10}x-\frac{1}{10},\]
so you can write down the formula for the cumulative distribution function as
\[ F(x) = \begin{cases} 0 & x < 1 \\ \dfrac{1}{10}x-\dfrac{1}{10} & 1 \le x \le 11 \\ 1 & x > 11 \end{cases}.\]
Now
\[ F(3.5) = \frac{1}{10}(3.5)-\frac{1}{10} = 0.25.\]
In other words, \(P(X \le 3.5) = 0.25\).
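The piecewise formula translates directly into code. This is a minimal sketch of the CDF read off Fig. 2, with the function name `F` chosen for illustration; evaluating it at \(3.5\) reproduces the hand calculation.

```python
def F(x):
    """Piecewise CDF read off Fig. 2: 0 before x = 1, linear from (1, 0) to (11, 1), then 1."""
    if x < 1:
        return 0.0
    if x > 11:
        return 1.0
    return x / 10 - 1 / 10

print(round(F(3.5), 10))  # 0.25
```

The `round` call only hides floating-point noise (\(0.35 - 0.1\) is not exactly \(0.25\) in binary arithmetic).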
The normal distribution is a standard example of a continuous random variable, so let's take a peek at it next.
Cumulative distribution function of the normal distribution
The cumulative distribution function of the normal distribution is the integral of its probability density function, as you would expect. For the standard normal distribution this is
\[ \Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, \mathrm{d} t ,\]
which has no closed form in terms of elementary functions. Below you can see the graph of the standard normal probability density function, followed by the graph of its cumulative distribution function.
Fig. 3 - Graph of the standard normal distribution probability density function
Fig. 4 - Graph of the cumulative distribution function for the standard normal distribution
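Although \(\Phi\) has no elementary closed form, it can be written with the error function as \(\Phi(x) = \tfrac{1}{2}\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)\), which Python's standard library provides as `math.erf`. The sketch below compares that formula with a direct numerical integral of the PDF and with `statistics.NormalDist().cdf`. The lower limit \(-10\) standing in for \(-\infty\) is an assumption; the area below it is negligible.

```python
import math
from statistics import NormalDist

def std_normal_pdf(t):
    """Standard normal PDF: e^(-t^2/2) / sqrt(2*pi)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(x):
    """Phi(x) written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

for x in (-1.0, 0.0, 1.96):
    numeric = simpson(std_normal_pdf, -10.0, x)  # -10 stands in for -infinity
    print(x, round(std_normal_cdf(x), 6), round(numeric, 6), round(NormalDist().cdf(x), 6))
```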
Of course, it always helps to look at more examples!
Cumulative distribution function example
For the first example, let's look at how to determine whether a function is a cumulative distribution function or a probability density function.
Define
\[g(x) = \begin{cases} 0 & x \le 0 \\ \ln x + 1 & x>0\end{cases}.\]
(a) Could \(g(x)\) be a probability density function for a continuous random variable? Explain why or why not.
(b) Could \(g(x)\) be a cumulative distribution function for a continuous random variable? Explain why or why not.
Solution:
(a) For something to be a probability density function, it must always be at least zero. However
\[\begin{align} g\left(\frac{1}{4}\right) &= \ln\left(\frac{1}{4}\right) + 1 \\ &= \ln 1 - \ln 4 + 1 \\ &\approx -0.39 \\ &< 0,\end{align} \]
so it can't be a probability density function.
(b) For something to be a cumulative distribution function, it can't take on any values which are larger than \(1\). However
\[\begin{align} g\left(3\right) &= \ln\left(3\right) + 1 \\ &\approx 2.1 \\ &>1,\end{align} \]
so \(g(x)\) can't be a cumulative distribution function either.
Just because something is written in a piecewise fashion doesn't mean it has anything to do with probability.
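The two checks from the solution can be scripted. This sketch evaluates \(g\) at the two points used above and also scans a grid of sample points on \((0, 10]\); the grid is an illustrative choice.

```python
import math

def g(x):
    """The function g from the example: 0 for x <= 0, ln x + 1 for x > 0."""
    return 0.0 if x <= 0 else math.log(x) + 1

print(round(g(0.25), 2))  # -0.39, negative, so g is not a PDF
print(round(g(3), 2))     # 2.1, larger than 1, so g is not a CDF

xs = [i / 100 for i in range(1, 1001)]  # sample points 0.01, 0.02, ..., 10.0
could_be_pdf = all(g(x) >= 0 for x in xs)
could_be_cdf = all(0 <= g(x) <= 1 for x in xs)
print(could_be_pdf, could_be_cdf)  # False False
```

A grid check like this can only rule a function out: finding one bad value is enough, but passing at finitely many points proves nothing.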
If you know something is a probability density function, you can find the cumulative distribution function.
Suppose \(X\) is a continuous random variable, and the probability density function is
\[f(x) = \begin{cases} k\sin x & 0 \le x \le \pi \\ 0 &\text{otherwise} \end{cases}.\]
(a) Find the value of \(k\) that makes \(f(x)\) a valid probability density function.
(b) Find the associated cumulative distribution function.
(c) Find \(P\left(X \le \dfrac{\pi}{4}\right)\).
Solution:
(a) Remember that for something to be a probability density function, the area under the curve must be equal to one. In other words, you need
\[ \int_{-\infty}^{\infty} f(x) \, \mathrm{d}x = 1.\]
Putting in the function, you get
\[ \begin{align} \int_{-\infty}^{\infty} f(x) \, \mathrm{d}x &= \int_0^\pi k \sin x \, \mathrm{d}x \\ &=\left. -k\cos x \phantom{\frac{}{}} \right|_0^\pi \\ &= -k\cos \pi - (-k\cos 0) \\ &= -k(-1) + k(1)\\ &= 2k. \end{align}\]
So for \(f(x)\) to be a probability density function you need that \(2k = 1\), so \(k = \dfrac{1}{2}\).
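As a numerical cross-check of part (a), this sketch integrates \(\sin x\) over \([0,\pi]\) with a hand-written composite Simpson's rule (an illustrative helper, not a library call) and takes \(k\) as the reciprocal of that area.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

area_without_k = simpson(math.sin, 0.0, math.pi)  # integral of sin x over [0, pi], exactly 2
k = 1 / area_without_k                            # scale so the total area is 1
print(round(area_without_k, 6), round(k, 6))      # 2.0 0.5
```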
(b) From the first part of the problem, you know that
\[f(x) = \begin{cases} \dfrac{1}{2}\sin x & 0 \le x \le \pi \\ 0 &\text{otherwise} \end{cases}.\]
Since \(f(x)=0\) for \(x<0\), no probability accumulates before \(0\), so \(F(x)=0\) for \(x \le 0\). Since all of the area under \(f\) lies in \([0,\pi]\), \(F(x)=1\) for \(x \ge \pi\). All that remains is the part between \(0\) and \(\pi\). If you integrate,
\[\begin{align} F(x) &= \int \dfrac{1}{2}\sin x \, \mathrm{d}x \\ &= -\frac{1}{2}\cos x + C \end{align}\]
where \(C\) is the constant of integration. Since \(F(0)=0\),
\[ \begin{align} 0 = F(0) &= -\frac{1}{2}\cos 0 + C \\ &= -\frac{1}{2} + C, \end{align} \]
so \(C = \dfrac{1}{2}\), and the cumulative distribution function is:
\[ F(x) = \begin{cases} 0 & x < 0 \\ -\dfrac{1}{2}\cos x + \dfrac{1}{2} & 0 \le x\le \pi \\ 1 & x > \pi \end{cases}.\]
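A quick sanity check of this CDF: it should equal \(0\) at \(x = 0\) and \(1\) at \(x = \pi\), and its derivative should reproduce \(f(x) = \tfrac{1}{2}\sin x\) on \((0, \pi)\). The sketch below checks these with a central difference; the step `h` and the sample points are illustrative choices.

```python
import math

def f(x):
    """PDF from the example with k = 1/2."""
    return 0.5 * math.sin(x) if 0 <= x <= math.pi else 0.0

def F(x):
    """CDF from part (b): 0 below 0, 1/2 - (1/2)cos x on [0, pi], 1 above pi."""
    if x < 0:
        return 0.0
    if x > math.pi:
        return 1.0
    return 0.5 - 0.5 * math.cos(x)

print(F(0), F(math.pi))  # 0.0 1.0

h = 1e-6
for x in (0.5, 1.5, 2.5):
    slope = (F(x + h) - F(x - h)) / (2 * h)  # numerical F'(x)
    print(x, round(slope, 6), round(f(x), 6))
```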
(c) To find \(P\left(X \le \dfrac{\pi}{4}\right)\), evaluate \(F\left(\dfrac{\pi}{4}\right) \), giving you
\[ \begin{align} P\left(X \le \dfrac{\pi}{4}\right) &= F\left( \dfrac{\pi}{4}\right) \\ &= -\dfrac{1}{2}\cos \left( \dfrac{\pi}{4}\right) + \dfrac{1}{2} \\ &= -\frac{1}{2}\left(\frac{\sqrt{2}}{2}\right) + \frac{1}{2} \\ &= \frac{1}{2} - \frac{\sqrt{2}}{4} \\ & \approx 0.146. \end{align}\]
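The same probability is the area under the PDF from \(0\) (where its support starts) to \(\pi/4\). This sketch compares the exact value \(\tfrac{1}{2} - \tfrac{\sqrt{2}}{4}\) with a numerical integral computed by an illustrative Simpson's-rule helper.

```python
import math

def f(x):
    """PDF from the example with k = 1/2."""
    return 0.5 * math.sin(x) if 0 <= x <= math.pi else 0.0

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

exact = 0.5 - math.sqrt(2) / 4
numeric = simpson(f, 0.0, math.pi / 4)  # area under the PDF from 0 to pi/4
print(round(exact, 4), round(numeric, 4))  # 0.1464 0.1464
```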
Cumulative Distribution Function - Key takeaways
- For any random variable \(X\), the cumulative distribution function \(F(x)\) is defined as
\[ F(x) = P(X \le x).\]
- For a continuous random variable \(X\) with probability density function \(f_X(x)\), the cumulative distribution function \(F(x)\) is given by
\[ \begin{align} F(x) &= P(X \le x) \\ &= \int_{-\infty}^x f_X(t) \, \mathrm{d} t .\end{align}\]
- The cumulative distribution function is always at least zero and at most one, and \(F(x)\) is the area under the probability density function to the left of \(x\).
- To get a cumulative distribution function from a probability density function, integrate the probability density function. To get a probability density function from a cumulative distribution function, differentiate the cumulative distribution function.