Bias

The bias of an estimate is the difference between the expectation value of the point estimate and the value of the parameter.

$$\text{bias}_F(\hat{\theta}, \theta) = \langle\hat{\theta}\rangle - \theta = \int\mathrm{d}x\,\hat{\theta}\,f(x) - T(F).$$

Note that the expectation value of $\hat{\theta}$ is computed over the (unknown) generative distribution whose PDF is $f(x)$.

Bias of the plug-in estimate for the mean

We often want a small bias because we want to choose estimates that give us back the parameters we expect. Let’s first investigate the bias of the plug-in estimate of the mean. As a reminder, the plug-in estimate is

$$\hat{\mu} = \bar{x},$$

where $\bar{x}$ is the arithmetic mean of the observed data. To compute the bias of the plug-in estimate, we need to compute $\langle\hat{\mu}\rangle$ and compare it to $\mu$.

$$\langle\hat{\mu}\rangle = \langle\bar{x}\rangle = \left\langle\frac{1}{n}\sum_i x_i\right\rangle = \frac{1}{n}\sum_i\langle x_i\rangle = \langle x\rangle = \mu.$$

Because $\langle\hat{\mu}\rangle = \mu$, the bias in the plug-in estimate for the mean is zero. It is said to be unbiased.
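We can check this numerically. The sketch below (using NumPy, with an arbitrarily chosen Normal generative distribution and sample size) draws many replicate data sets and confirms that the plug-in estimates for the mean average out to $\mu$.

```python
import numpy as np

# Hypothetical generative distribution: Normal with mu=10, sigma=2
rng = np.random.default_rng(3252)
mu, sigma, n = 10.0, 2.0, 5

# Draw many replicate data sets, each of size n
samples = rng.normal(mu, sigma, size=(100_000, n))

# Plug-in estimate of the mean for each replicate
mu_hat = samples.mean(axis=1)

# The average of the plug-in estimates is very close to mu
print(mu_hat.mean())
```

The average of `mu_hat` lands within sampling error of $\mu = 10$, consistent with zero bias.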

Bias of the plug-in estimate for the variance

To compute the bias of the plug-in estimate for the variance, first recall that the variance, as the second central moment, is computed as

$$\sigma^2 = \langle x^2\rangle - \langle x\rangle^2.$$

So, the expectation value of the plug-in estimate is

\begin{align}
\langle\hat{\sigma}^2\rangle &= \left\langle\frac{1}{n}\sum_i x_i^2 - \bar{x}^2\right\rangle
= \left\langle\frac{1}{n}\sum_i x_i^2\right\rangle - \langle\bar{x}^2\rangle \\
&= \frac{1}{n}\sum_i\langle x_i^2\rangle - \langle\bar{x}^2\rangle
= \langle x^2\rangle - \langle\bar{x}^2\rangle
= \mu^2 + \sigma^2 - \langle\bar{x}^2\rangle.
\end{align}

We now need to compute $\langle\bar{x}^2\rangle$, which is a little trickier. We will use the fact that the measurements are independent, so $\langle x_i x_j\rangle = \langle x_i\rangle\langle x_j\rangle$ for $i \ne j$.

\begin{align}
\langle\bar{x}^2\rangle &= \left\langle\left(\frac{1}{n}\sum_i x_i\right)^2\right\rangle
= \frac{1}{n^2}\left\langle\left(\sum_i x_i\right)^2\right\rangle
= \frac{1}{n^2}\left\langle\sum_i x_i^2 + 2\sum_i\sum_{j>i} x_i x_j\right\rangle \\
&= \frac{1}{n^2}\left(\sum_i\langle x_i^2\rangle + 2\sum_i\sum_{j>i}\langle x_i x_j\rangle\right)
= \frac{1}{n^2}\left(n(\sigma^2 + \mu^2) + 2\sum_i\sum_{j>i}\langle x_i\rangle\langle x_j\rangle\right) \\
&= \frac{1}{n^2}\left(n(\sigma^2 + \mu^2) + n(n-1)\langle x\rangle^2\right)
= \frac{1}{n^2}\left(n\sigma^2 + n^2\mu^2\right)
= \frac{\sigma^2}{n} + \mu^2.
\end{align}

Thus, we have

$$\langle\hat{\sigma}^2\rangle = \left(1 - \frac{1}{n}\right)\sigma^2.$$

Therefore, the bias is

$$\text{bias} = \langle\hat{\sigma}^2\rangle - \sigma^2 = -\frac{\sigma^2}{n}.$$
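This negative bias is easy to see in simulation. As a sketch (again using NumPy with an arbitrary Normal generative distribution; note that `np.var` with its default `ddof=0` is exactly the plug-in estimate), the average of the plug-in variance estimates falls short of $\sigma^2$ by about $\sigma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(3252)
mu, sigma, n = 10.0, 2.0, 5

# Many replicate data sets of size n
samples = rng.normal(mu, sigma, size=(100_000, n))

# Plug-in variance estimate for each replicate (np.var defaults to ddof=0)
var_hat = samples.var(axis=1)

# Observed bias vs. the theoretical value -sigma^2 / n
print(var_hat.mean() - sigma**2)  # close to -sigma^2 / n
print(-sigma**2 / n)
```

With $\sigma = 2$ and $n = 5$, the theoretical bias is $-4/5 = -0.8$, and the simulated bias lands within sampling error of that value.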

If $\hat{\sigma}^2$ is the plug-in estimate for the variance, an unbiased estimator would instead be

$$\frac{n}{n-1}\,\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2.$$
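In NumPy, this correction is controlled by the `ddof` ("delta degrees of freedom") argument of `np.var`: `ddof=0` (the default) divides by $n$, giving the plug-in estimate, while `ddof=1` divides by $n-1$, giving the unbiased estimator. A quick sketch with an arbitrary data set:

```python
import numpy as np

rng = np.random.default_rng(3252)
x = rng.normal(10.0, 2.0, size=5)
n = len(x)

# Plug-in (biased) estimate: divide by n (ddof=0, NumPy's default)
var_plugin = np.var(x)

# Unbiased estimate: divide by n - 1 (ddof=1)
var_unbiased = np.var(x, ddof=1)

# The two differ by exactly the factor n / (n - 1)
print(np.isclose(var_unbiased, var_plugin * n / (n - 1)))  # True
```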

Justification of using plug-in estimates

Despite the bias in the plug-in estimate for the variance, we will normally just use plug-in estimates going forward. (We will use the hat, e.g. $\hat{\theta}$, to denote an estimate, which may or may not be a plug-in estimate.) Note that the bootstrap procedures we lay out in what follows do not need to use plug-in estimates, but we will use them for convenience. Why do this? The bias is typically small. We just saw that the biased and unbiased estimators of the variance differ by a factor of $n/(n-1)$, which is negligible for large $n$. In fact, the error due to bias in plug-in estimates tends to be much smaller than the width of the confidence intervals for the parameter estimate, which we will discuss next.