This is where estimator bias comes in. When your estimate is based on past data, for example an average of past observations, you can treat it as an estimator of the population average and then work out whether that estimator is biased or unbiased.
Comparing estimators and finding the variance or standard error of an estimator are explained in the article Quality of Estimators.
Definition of the Bias of an estimator
Say, for example, you wanted to find the mean length of fish in an aquarium. Not only are there a huge number of fish you'd need to measure, but it's also very difficult to catch and measure all the fish.
Instead of measuring every single fish in the population (which is referred to as a census), a better approach would be to take a sample of fish and use that sample to estimate the mean length of the fish. The statistic used to produce such an estimate is called an estimator.
First, however, you need to know what a statistic is.
A statistic, \(T\), is a quantity calculated from \(n\) observations of a random variable \(X\) (i.e. \(X_1,X_2,X_3,\dots,X_n\)). These observations are independent and identically distributed. Crucially, a statistic depends only on the sample values: it must not involve any unknown population parameters. (Here "a statistic" means a single calculated quantity, as opposed to the subject of statistics.)
An estimator is a statistic used to estimate a population parameter. An estimate is the value of the estimator when taken from a sample.
You might also see an estimator called a point estimator, and its value called a point estimate. It is important to be able to recognise which functions of a sample are estimators. Have a look at the following example.
Explain why each of the following functions is or is not an estimator, where \(X_1, X_2,...,X_n\) is a random sample from a population with unknown parameters \(\mu\) and \(\sigma\).
i) \(\dfrac{X_3+X_6}{2}\)
ii) \(\dfrac{\sum(X_i-\mu)^2}{n}\)
Solution:
i) The function
\[\dfrac{X_3+X_6}{2}\]
is an estimator, since it is calculated only from sample observations and contains no unknown parameters.
ii) On the other hand,
\[\dfrac{\sum(X_i-\mu)^2}{n}\]
is not an estimator, since it contains \(\mu\), which is not part of the sample. In fact, this candidate is not even a statistic: \(\mu\) is an unknown population parameter, and you can't use a formula that requires a population parameter to estimate a population parameter.
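A short Python sketch can make the distinction concrete. The function names `candidate_i` and `candidate_ii` and the fish lengths are hypothetical, chosen only for illustration. The first function needs nothing but the sample; the second cannot even be called without also being given \(\mu\), which in practice is unknown.

```python
from typing import Sequence

def candidate_i(sample: Sequence[float]) -> float:
    """(X_3 + X_6) / 2: uses only sample values, so it is a statistic."""
    # Python lists are 0-indexed, so X_3 is sample[2] and X_6 is sample[5].
    return (sample[2] + sample[5]) / 2

def candidate_ii(sample: Sequence[float], mu: float) -> float:
    """sum((X_i - mu)^2) / n: needs the unknown parameter mu as an input."""
    return sum((x - mu) ** 2 for x in sample) / len(sample)

# Hypothetical sample of six fish lengths in cm.
fish_lengths = [12.0, 15.5, 9.5, 14.0, 11.25, 13.5]
print(candidate_i(fish_lengths))  # 11.5
```

The extra `mu` argument in `candidate_ii` is exactly what stops it from being a statistic: with real data you would have no value to pass in.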
Let's take a look at a quick overview.
Overview of estimator bias
Not all statistics are reliable estimators. To judge how well a statistic estimates a parameter, a good first step is to find the expected value of the statistic.
If the expectation of the statistic is different to the parameter that you want to estimate, then this tells you that the statistic is biased.
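You can think of bias as the distance between the centre (mean) of your estimator's sampling distribution and the population parameter: the further the sampling distribution is shifted away from the true value, the larger the bias. Bias is not the same as skewness, which describes how asymmetric a distribution is; a perfectly symmetric sampling distribution can still be biased.
For more information on skew, see the article Skewness.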
Bias of an estimator explanation
You can write the definition of an estimator being biased or unbiased using simple mathematical notation.
If \(\hat{\theta}\) is a statistic used to estimate population parameter \(\theta\), \(\hat{\theta}\) is unbiased when
\[\text {E}(\hat{\theta})=\theta\]
where \(\text{E}\) is the notation for expected value. Any statistic which is not unbiased is called biased.
If \(\hat{\theta}\) is biased, the bias can be found using the following formula:
\[\text{Bias}(\hat{\theta})=\text{E}(\hat{\theta})-\theta.\]
Notice that if \(\text{E}(\hat{\theta})=\theta \) then \(\text{Bias}=0\).
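The definition suggests a simple numerical check: simulate many samples, apply the estimator to each one, average the results to approximate \(\text{E}(\hat{\theta})\), and subtract \(\theta\). The sketch below does this with Python's standard library. The function names `simulate_bias` and `draw_normal`, the normal population with \(\mu=10\) and \(\sigma=2\), and the number of repetitions are assumptions chosen for illustration, and the result only approximates the true bias.

```python
import random
import statistics
from typing import Callable, Sequence

def simulate_bias(
    estimator: Callable[[Sequence[float]], float],
    theta: float,
    draw: Callable[[random.Random], float],
    n: int,
    reps: int = 20_000,
    seed: int = 1,
) -> float:
    """Approximate Bias(theta_hat) = E(theta_hat) - theta by simulation."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]  # n independent draws from X
        estimates.append(estimator(sample))
    # The average over many samples approximates E(theta_hat).
    return statistics.fmean(estimates) - theta

def draw_normal(rng: random.Random) -> float:
    """One observation from an assumed Normal(mu=10, sigma=2) population."""
    return rng.gauss(10, 2)

# Sample mean as an estimator of mu = 10: the simulated bias should be near 0.
print(simulate_bias(statistics.fmean, theta=10, draw=draw_normal, n=5))
```

Because the answer comes from random sampling, it will not be exactly zero even for an unbiased estimator; increasing `reps` shrinks the simulation noise.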
Let's put the definition to use.
Show that the sample mean
\[\bar{X}=\frac{X_1+X_2+\dots+X_n}{n} \]
is an unbiased estimator of \(\mu\); that is, show that \(\text{E}(\bar{X})=\mu\).
Solution:
Keeping in mind that \(\text {E}(aX)=a\text {E}(X)\) and \(\text {E}(X+Y)=\text {E}(X)+\text {E}(Y)\), you have
\[\begin{align}\text {E}(\bar{X})&=\frac{1}{n}\text{E}(X_1+\dots +X_n)\\&=\frac{1}{n}(\text {E}(X_1)+\dots +\text {E}(X_n))\end{align}\]
Since \(\text {E}(X_i)=\mu\) for all \(i\), you have
\[ \begin{align} \text {E} (\bar{X}) &= \frac{\mu +\mu +\dots + \mu}{n} \\ &= \frac{n \mu}{n}\\ &=\mu .\end{align}\]
This shows that \(\text {E}(\bar{X})=\mu\), which means \(\bar{X}\) is an unbiased estimator of parameter \(\mu\). This means that on average, this statistic will give the correct value for the estimated parameter.
This is one reason the sample mean is used so widely, for example when constructing confidence intervals for a population mean.
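Because a fair six-sided die has only six equally likely outcomes, you can check \(\text{E}(\bar{X})=\mu\) exactly by listing every possible sample instead of simulating. The die and the sample size \(n=3\) are assumptions made for illustration; `itertools.product` generates all \(6^n\) equally likely samples and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                       # fair die: each face has probability 1/6
n = 3                                     # sample size (illustrative choice)
mu = Fraction(sum(faces), len(faces))     # population mean, 7/2

samples = list(product(faces, repeat=n))  # all 6**n equally likely samples
# E(X-bar) is the plain average of X-bar over the equally likely samples.
expected_xbar = sum(Fraction(sum(s), n) for s in samples) / len(samples)

print(mu, expected_xbar)  # 7/2 7/2
```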
Estimator Bias example
Not all estimators are unbiased!
You are given
\[T=\frac{X_1+2X_2}{n}\]
as a candidate estimator of the mean, \(t\), of a distribution, where \(n\) is the number of observations in the sample. Find the bias of this statistic.
Solution:
In this problem, the population parameter is the mean, \(t\). So to find the bias, you can use the formula
\[\text{Bias}(T)=\text {E}(T)-t,\]
and since \(\text{E}(X_1)=\text{E}(X_2)=t\), this gives
\[ \begin{align} \text{Bias} (T) &= \text {E} \left(\frac{X_1+2X_2}{n}\right) -t \\&= \frac{\text {E} (X_1)+2\text {E} (X_2)}{n} -t \\&= \frac{3t}{n}-t\\&= \frac{t(3-n)}{n} .\end{align}\]
Therefore the bias of estimator \(T\) is
\[\text{Bias}(T) = \dfrac{t(3-n)}{n}.\]
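A simulation can confirm this result for one particular population. The sketch below assumes an exponential population with mean \(t=5\) and a sample size of \(n=10\), both chosen only for illustration; note that Python's `random.expovariate` takes the rate \(1/t\), not the mean. The function names `t_statistic` and `simulated_bias` are hypothetical.

```python
import random
import statistics

def t_statistic(sample):
    """T = (X_1 + 2 X_2) / n from the example (X_1 is sample[0])."""
    return (sample[0] + 2 * sample[1]) / len(sample)

def simulated_bias(t: float, n: int, reps: int = 50_000, seed: int = 0) -> float:
    """Average T over many exponential samples with mean t, minus t."""
    rng = random.Random(seed)
    values = [
        t_statistic([rng.expovariate(1 / t) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.fmean(values) - t

t, n = 5.0, 10
print(simulated_bias(t, n))  # should be close to the exact value below
print(t * (3 - n) / n)       # exact bias from the formula: -3.5
```

Setting \(n=3\) in the formula gives zero bias, so \(T\) is unbiased only for that one sample size; for \(n>3\) it underestimates a positive mean \(t\).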
Bias of estimator formula
While the sample mean is one example of an unbiased estimator, it is not the only one. Let's apply the bias formula to estimators of the population variance instead.
To find an estimator for the population variance, you may try to use the variance of the sample which would be denoted as
\[V=\frac{\sum\limits_{i=1}^n(X_i-\bar{X})^2}{n}.\]
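However, because this formula measures spread around the sample mean, \(\bar{X}\), rather than around the population mean, \(\mu\), it tends to underestimate the population variance: the observations are, on average, closer to \(\bar{X}\) than to \(\mu\). In fact, \(\text{E}(V)=\dfrac{n-1}{n}\sigma^2\), so \(V\) is a biased estimator of \(\sigma^2\).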
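The key fact here is that \(\bar{x}\) is the value of \(c\) that makes \(\sum(x_i-c)^2\) as small as possible, so squared deviations measured around \(\bar{x}\) never add up to more than those measured around \(\mu\). The Python check below illustrates this on one simulated sample; the normal population with \(\mu=50\), \(\sigma=8\) and the sample size \(n=6\) are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n = 50.0, 8.0, 6            # assumed population and sample size
sample = [rng.gauss(mu, sigma) for _ in range(n)]
xbar = statistics.fmean(sample)

ss_about_xbar = sum((x - xbar) ** 2 for x in sample)
ss_about_mu = sum((x - mu) ** 2 for x in sample)

# Algebraically, sum((x - mu)^2) = sum((x - xbar)^2) + n * (xbar - mu)^2,
# so the sum about xbar can never exceed the sum about mu.
gap = ss_about_mu - ss_about_xbar
print(ss_about_xbar <= ss_about_mu)    # True
```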
Instead, you can use a different statistic: the sample variance. This will give you an unbiased estimator for the population variance, \(\sigma^2\).
An unbiased estimator for the population variance, \(\sigma ^2\), is the sample variance, \(S^2\):
\[S^2=\frac{\sum\limits^n_{i=1} (X_i-\bar{X})^2}{n-1}.\]
This formula isn't always the easiest to use when calculating the sample variance. There are other ways to find \(s^2\).
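These are the equivalent ways that you can calculate the sample variance:
\[\begin{align} s^2 &= \frac{\sum\limits^n_{i=1} (x_i-\bar{x})^2}{n-1} \\&= \frac{\sum\limits_{i=1}^n x_i^2-n\bar{x}^2}{n-1} \\&=\frac{S_{xx}}{n-1} ,\end{align} \]
where \(S_{xx}=\sum\limits_{i=1}^n (x_i-\bar{x})^2\) is the sum of squared deviations from the sample mean.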
In general, \(S^2\) is used to denote the estimator for the population variance, and \(s^2\) is used to denote a particular estimate. It's worth learning the above two equivalent formulas as they are significantly easier to apply than the first one.
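The three expressions can be checked against each other numerically. The data values below are made up for illustration; the third form uses the standard identity \(S_{xx}=\sum x_i^2-\frac{(\sum x_i)^2}{n}\), and Python's `statistics.variance`, which also divides by \(n-1\), serves as a cross-check.

```python
import statistics

x = [4.0, 7.0, 6.0, 3.0, 10.0]   # hypothetical observations
n = len(x)
xbar = sum(x) / n                # sample mean, 6.0

# Form 1: the definition, squared deviations from the sample mean.
s2_definition = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

# Form 2: sum of squares minus n * xbar^2.
s2_shortcut = (sum(xi ** 2 for xi in x) - n * xbar ** 2) / (n - 1)

# Form 3: S_xx / (n - 1), using S_xx = sum(x^2) - (sum(x))^2 / n.
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s2_sxx = s_xx / (n - 1)

# statistics.variance also uses the n - 1 divisor, so it should agree.
print(s2_definition, s2_shortcut, s2_sxx, statistics.variance(x))  # 7.5 7.5 7.5 7.5
```

The shortcut forms are convenient by hand, but in floating-point code they can lose precision when the values are large compared with their spread, which is why numerical software generally prefers working with deviations from the mean.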
Let's take a look at the proof that \(S^2\) is an unbiased estimator of \( \sigma ^2\). In other words, the goal is to show that \(\text {E}(S^2)=\sigma ^2\).
To do this, you need to write the expectation of the sample variance,
\[S^2 = \frac{\sum\limits_{i=1}^n X_i^2-n\bar{X}^2}{n-1}, \]
in terms of \(\sigma\) and \(\mu\). Notice that this uses one of the alternative ways of calculating the sample variance.
First, using the definition of \(\sigma ^2\), you have
\[\begin{align} \sigma ^2 &=\text{Var}(X) \\ &=\text {E}(X^2)-\mu ^2, \end{align} \]
therefore \(\text{E}(X^2)=\sigma ^2 +\mu ^2.\)
You also know that \(\text{Var}(\bar{X})=\dfrac{\sigma ^2}{n}\) and \(\text{E}(\bar{X})=\mu\), so you can write \(\text{Var}(\bar{X})\) as
\[\begin{align} \text{Var}(\bar{X}) &= \frac{\sigma ^2}{n} \\ &=\text {E}(\bar{X} ^2)-\mu ^2, \end{align}\]
so
\[\text {E}(\bar{X}^2)=\frac{\sigma ^2}{n}+\mu ^2.\]
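Both identities, \(\text{E}(X^2)=\sigma^2+\mu^2\) and \(\text{E}(\bar{X}^2)=\frac{\sigma^2}{n}+\mu^2\), can be checked exactly for a small discrete population. The fair die and the sample size \(n=2\) below are assumptions chosen for illustration, and `Fraction` keeps every value exact, so the comparisons are true equalities rather than approximations.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)                              # fair die
mu = sum(p * x for x in faces)                  # E(X) = 7/2
ex2 = sum(p * x * x for x in faces)             # E(X^2) = 91/6
sigma2 = ex2 - mu ** 2                          # Var(X) = 35/12

n = 2
samples = list(product(faces, repeat=n))        # 36 equally likely samples
e_xbar2 = sum(Fraction(sum(s), n) ** 2 for s in samples) / len(samples)

print(ex2 == sigma2 + mu ** 2)          # True
print(e_xbar2 == sigma2 / n + mu ** 2)  # True
```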
The expectation of the sample variance is given by:
\[\begin{align} \text {E}(S^2) &= \frac{ \text {E}\left(\sum\limits_{i=1}^n X_i^2-n\bar{X}^2\right)}{n-1} \\&= \frac{ \text {E}\left(\sum\limits_{i=1}^n X_i^2\right)-\text {E}(n\bar{X}^2)}{n-1} .\end{align} \]
Since each \(X_i\) has the same distribution as \(X\),
\[\begin{align} \text {E}\left(\sum\limits_{i=1}^n X_i^2\right)&=\sum\limits_{i=1}^n \text {E}(X_i^2)\\ &=n\text {E}(X^2), \end{align}\]
you have
\[\begin{align} \text {E}(S^2) &= \frac{ n\text {E}(X^2)-\text {E}(n\bar{X}^2)}{n-1} \\ &= \frac{n(\sigma ^2 +\mu ^2)-n\left(\dfrac{\sigma ^2}{n} +\mu ^2\right)}{n-1}\\ &=\frac{n\sigma^2 +n\mu ^2 -\sigma ^2 -n\mu ^2 }{n-1} \\&= \frac{(n-1)\sigma ^2}{n-1} \\ &=\sigma^2 . \end{align} \]
Since \(\text {E}(S^2)=\sigma ^2\), you have shown that \(S^2\) is an unbiased estimator of the population variance, \(\sigma ^2\).
While you may not need to memorise the proof, it is always good to read and understand the steps to ensure you have a good understanding of the topic.
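The result can also be confirmed by brute force. Taking every possible sample of size \(n=3\) from a fair die (an illustrative choice), the sketch below averages \(S^2\) and the divide-by-\(n\) variance \(V\) over all \(6^3=216\) equally likely samples using exact fractions; the helper name `ss` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mu = Fraction(sum(faces), 6)                     # 7/2
sigma2 = sum((x - mu) ** 2 for x in faces) / 6   # population variance, 35/12

def ss(sample):
    """Sum of squared deviations from the sample mean (S_xx)."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

n = 3
samples = list(product(faces, repeat=n))          # 216 equally likely samples
e_s2 = sum(ss(s) / (n - 1) for s in samples) / len(samples)  # E(S^2)
e_v = sum(ss(s) / n for s in samples) / len(samples)         # E(V)

print(e_s2, sigma2)               # 35/12 35/12
print(e_v, (n - 1) * sigma2 / n)  # 35/18 35/18
```

Dividing by \(n-1\) gives exactly \(\sigma^2\) on average, while dividing by \(n\) falls short by the factor \(\frac{n-1}{n}\).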
Estimator Bias - Key takeaways
- An estimator is a statistic used to estimate a population parameter. An estimate is the value of the estimator when taken from a sample.
- A statistic, \(T\), is calculated from \(n\) observations of a random variable \(X\) (i.e. \(X_1,X_2,X_3,\dots ,X_n\)). These observations are independent and identically distributed.
- If \(\hat{\theta}\) is a statistic used to estimate population parameter \(\theta\), \(\hat{\theta}\) is unbiased when \(\text {E}(\hat{\theta})=\theta\).
- If \(\hat{\theta}\) is biased, the bias can be quantified using the following formula:\[\text{Bias}(\hat{\theta})=\text {E}(\hat{\theta})-\theta.\]
How we ensure our content is accurate and trustworthy
At StudySmarter, we have created a learning platform that serves millions of students. Meet the people who work hard to deliver fact-based content and make sure it is verified.
Content Creation Process:
Lily Hulatt is a Digital Content Specialist with over three years of experience in content strategy and curriculum design. She gained her PhD in English Literature from Durham University in 2022, taught in Durham University’s English Studies Department, and has contributed to a number of publications. Lily specialises in English Literature, English Language, History, and Philosophy.
Content Quality Monitored by:
Gabriel Freitas is an AI Engineer with solid experience in software development, machine learning algorithms, and generative AI, including large language model (LLM) applications. A graduate in Electrical Engineering from the University of São Paulo, he is currently pursuing an MSc in Computer Engineering at the University of Campinas, specializing in machine learning topics. Gabriel has a strong background in software engineering and has worked on projects involving computer vision, embedded AI, and LLM applications.