Random variables are used in diverse fields that deal with probability, such as machine learning, health, and forecasting.
Random variables in statistics: Definition and types
A random variable is a variable with a domain (range of possible values) that corresponds to the numerical results of a random statistical experiment (or, more generally, the outcomes of random behavior). It is also known as a stochastic variable.
Let's consider a couple of scenarios in which random variables are used. For example, suppose we select at random from a box containing the set of numbers {1, 2, 3, 4, 5}. Any number in this set can be drawn in a statistical experiment or probability test. If the number 2 is drawn, then the random variable takes on the value 2 for that iteration of the experiment.
Another example where random variables apply is the rolling of a die. For each roll, any number ranging from 1 to 6 can be obtained. The outcome of the die roll, measured as X, is a random variable.
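A random variable can be classified as discrete, continuous, or mixed, and it is represented by a capital letter, for example, X or Y. The set of possible values that a random variable can take on is referred to here as its sample space (more formally, this set is the variable's range or support, while the sample space is the set of outcomes of the underlying experiment).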
Discrete random variables
When a random variable can take only distinct, separated values, it is said to be discrete. The set of values of a discrete random variable must be countable (finite or countably infinite). For example, when rolling a die, the possible outcomes represented by X are the countable values 1, 2, 3, 4, 5, and 6. We cannot, however, roll a die and obtain an outcome of 5.243, for example.
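To make the die example concrete, here is a minimal Python sketch. The function name roll_die, the fixed seed, and the choice of 1000 rolls are assumptions made purely for illustration.

```python
import random


def roll_die(rng: random.Random) -> int:
    """Return one realization of X, the outcome of a fair six-sided die."""
    return rng.randint(1, 6)  # randint includes both endpoints


rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [roll_die(rng) for _ in range(1000)]

# Every observed value belongs to the countable set {1, 2, 3, 4, 5, 6};
# a value such as 5.243 can never appear.
observed_values = sorted(set(rolls))
print(observed_values)
```

However many times the die is rolled, the observed values stay inside the same six whole numbers, which is what makes X discrete.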
Continuous random variables
When a quantity can take on any of the uncountably many values in an interval, it is referred to as continuous. The probabilities associated with continuous data are represented by a continuous random variable. For example, the time it takes to complete a given task within a 30-minute period is considered continuous.
You may be wondering how this range of 30 minutes can be considered infinite and uncountable. This is because the task can be completed at any given instant within the 30-minute range, as measured down to the millisecond, for example, or even more precise units. This is in contrast to countable data, like the count of a number of people, for example, which can only be represented in whole numbers.
Thus, a continuous random variable X described by a curve y = f(x) can take any value in the interval from a to b.
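The idea that a continuous random variable can land anywhere in an interval can also be sketched in Python. Here we assume, only for illustration, that the completion time is uniformly distributed between a = 0 and b = 30 minutes; the function name completion_time and the seed are likewise hypothetical.

```python
import random


def completion_time(rng: random.Random, a: float = 0.0, b: float = 30.0) -> float:
    """Draw one task completion time (in minutes) uniformly from [a, b]."""
    return rng.uniform(a, b)


rng = random.Random(1)  # fixed seed so the run is repeatable
times = [completion_time(rng) for _ in range(5)]

# Each draw is a real number inside the interval, not a whole number of minutes.
for t in times:
    print(f"{t:.6f} minutes")
```

Printing more decimal places would reveal more precision in each draw, which mirrors the point that time can be measured in ever finer units.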
Mixed random variables
When a variable is neither entirely discrete nor continuous but rather has features of both, it is referred to as a mixed random variable.
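Stock market behavior and rainfall models in hydrology exemplify mixed random variables, because they have both discrete and continuous features. Daily rainfall, for instance, is exactly zero on a dry day (a discrete outcome with positive probability) but can be any positive amount on a wet day (a continuous outcome).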
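A simple rainfall model shows how the two features can be combined. The sketch below is an assumption-laden illustration, not a real hydrology model: the probability of a dry day (p_dry = 0.7), the average rainfall on a wet day (mean_wet_mm = 8.0), and the exponential distribution for wet days are all made-up choices.

```python
import random


def daily_rainfall(rng: random.Random, p_dry: float = 0.7, mean_wet_mm: float = 8.0) -> float:
    """Return one day's rainfall in millimetres.

    Discrete part: exactly 0.0 with probability p_dry.
    Continuous part: an exponentially distributed amount otherwise.
    """
    if rng.random() < p_dry:
        return 0.0
    return rng.expovariate(1.0 / mean_wet_mm)


rng = random.Random(2)  # fixed seed so the run is repeatable
samples = [daily_rainfall(rng) for _ in range(10_000)]

# A noticeable share of days is exactly zero, which a purely
# continuous random variable could not produce.
dry_fraction = sum(1 for s in samples if s == 0.0) / len(samples)
print(f"fraction of dry days: {dry_fraction:.3f}")
```

The single value 0 carries a positive probability, while every positive amount is spread over a continuum, so the variable is neither purely discrete nor purely continuous.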
Random variables: The probability of random events formula
When all outcomes are equally likely, the probability of a random event can be calculated with the following formula: $P(X) = \frac{n}{N}$.
Where:
“n” is the number of favorable outcomes, and
“N” is the number of total possible outcomes.
Let's consider an example which uses this formula.
Suppose a box contains 10 red balls, 5 yellow balls, and 15 green balls. If we select a ball at random, what is the probability that we will select a red ball?
Solution:
Let red balls = R = 10,
Yellow balls = Y = 5, and
Green balls = G = 15
Number of total possible outcomes: N = R + Y + G = 10 + 5 + 15 = 30
Since we are considering the probability of selecting a red ball in particular, the number of favorable outcomes is equal to the number of red balls: n = R = 10
Therefore, the probability of selecting a red ball is: $P(X) = \frac{n}{N} = \frac{10}{30} = \frac{1}{3}$, or approximately 0.33.
Note that the example above concerns a discrete random variable. We are measuring countable numbers of balls, and we could not obtain 1.4 red balls, for example.
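The same calculation can be checked with a few lines of Python. This is a minimal sketch that assumes every ball is equally likely to be drawn; the dictionary counts and the function name probability are illustrative names, not part of any library.

```python
from fractions import Fraction

# Number of balls of each colour in the box.
counts = {"red": 10, "yellow": 5, "green": 15}


def probability(color: str, counts: dict) -> Fraction:
    """Return n / N, assuming every ball is equally likely to be drawn."""
    n = counts[color]          # number of favorable outcomes
    N = sum(counts.values())   # number of total possible outcomes
    return Fraction(n, N)


p_red = probability("red", counts)
print(p_red)         # 1/3
print(float(p_red))  # the same probability as a decimal
```

Using Fraction keeps the result exact, so 10/30 is reduced to 1/3 rather than rounded.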
Determination of probability distributions for random variables
The probability distribution of a random variable is a function that describes the likelihood of each value within that random variable's sample space occurring in an experiment.
Probability distributions can be classified by the type of random variable they describe: discrete probability distributions and continuous probability distributions.
Discrete probability distribution
Discrete probability distributions are formed by the probability mass function (PMF). What is the probability that a discrete random variable will be equal to some specific value? This range of probabilities across the sample space is defined by the PMF. Let's take a look at the notation and properties of the probability mass function, which describes the probability distributions of discrete random variables.
For the probability mass function (PMF) of a discrete random variable:
Notation: $P(X = x) = f(x)$
Properties: $0 \le f(x) \le 1$ for every value $x$, and $\sum_{x} f(x) = 1$
By the properties of discrete random variables, we know that the probability of each value must be between 0 and 1, and the probabilities of all values in the sample space must sum to 1.
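These two properties are easy to verify in code. The sketch below uses the PMF of a fair six-sided die as an assumed example; the helper is_valid_pmf is a hypothetical name introduced here.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def is_valid_pmf(pmf: dict) -> bool:
    """Check that every probability is in [0, 1] and that they sum to 1."""
    in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return in_range and sums_to_one


print(pmf[2])             # probability that X equals 2
print(is_valid_pmf(pmf))  # checks both properties at once
```

A table of probabilities that fails either check, for example one whose values add up to less than 1, does not describe a discrete random variable.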
Continuous probability distribution
Continuous probability distributions are formed by the probability density function (PDF). Unlike discrete random variables, directly determining the probabilities of specific values of continuous variables isn't a straightforward process because there are infinitely many values!
For this reason, we may choose to simplify this measurement by "discretizing" the variables. This means that we approximate the continuous variable as taking on discrete quantities, allowing us to work with intervals of values rather than specific values.
To represent the continuous random variable's sample space in terms of the probability associated with its values, we use the probability density function (PDF). Let's take a look at the notation and properties of the PDF.
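For the probability density function (PDF) of a continuous random variable:
Notation: $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
Properties: $f(x) \ge 0$ for every value $x$, $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and $P(X = x) = 0$ for any single value $x$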
From the properties, we know that the area under the PDF curve is equal to 1, and the probability of each distinct value is zero (because a single point has zero width, so there is no area under the curve above it).
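Both properties can be checked numerically. The sketch below assumes a standard normal distribution as the example PDF (any valid density would do) and uses a simple midpoint rule; the helper name integrate and the integration limits of -8 to 8 are illustrative choices.

```python
from statistics import NormalDist

# Assumed example density: the standard normal distribution.
density = NormalDist(mu=0.0, sigma=1.0)


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Area under the whole curve (the tails beyond +/-8 are negligible).
total_area = integrate(density.pdf, -8.0, 8.0)

# "Area" above a single point: the interval has zero width.
point_area = integrate(density.pdf, 0.5, 0.5)

print(f"total area: {total_area:.6f}")
print(f"area at a single point: {point_area}")
```

The total area comes out very close to 1, while the area above a single point is exactly 0 because the interval from 0.5 to 0.5 has no width.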
The measurement of height is a continuous measurement. Let's say we are to predict the height of one student in a class of 30 pupils. We would use a continuous random variable. With what precision can we predict that a certain student's height is exactly 1.68 m and not 1.67 m or 1.69 m or any other very close value?
The easiest and most reasonable way to do this is to discretize the values and predict the student's height in a specified range, say between 1.65 m and 1.70 m.
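To see how a range turns into a probability, here is a short Python sketch. It assumes, purely for illustration, that student heights follow a normal distribution with a mean of 1.68 m and a standard deviation of 0.07 m; neither number comes from real data, and interval_probability is a hypothetical helper.

```python
from statistics import NormalDist

# Hypothetical model of student heights in metres (values are assumptions).
heights = NormalDist(mu=1.68, sigma=0.07)


def interval_probability(dist: NormalDist, low: float, high: float) -> float:
    """Return P(low <= X <= high) as a difference of cumulative probabilities."""
    return dist.cdf(high) - dist.cdf(low)


p_range = interval_probability(heights, 1.65, 1.70)
p_exact = interval_probability(heights, 1.68, 1.68)

print(f"P(1.65 <= X <= 1.70) = {p_range:.3f}")
print(f"P(X = 1.68 exactly)  = {p_exact}")
```

The interval has a sizeable probability, whereas the probability of one exact height is zero, which is why working with ranges is the practical choice for continuous variables.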
Random Variables - Key takeaways
- A random variable (otherwise known as a stochastic variable) is a real-valued function that assigns a numerical value to each outcome of a statistical experiment.
- Types of random variables are: discrete random variables, continuous random variables, and mixed random variables.
- To calculate the probability of a random event, we can use P(X) = n/N, where “n” is the number of favorable outcomes and “N” is the number of total possible outcomes.
- Probability distributions can be classified as discrete probability distributions and continuous probability distributions.