Method of moments

There is another widely used method for obtaining parameter estimates that we have not covered, the method of moments. Parameter estimates from the method of moments do not have the same desirable properties as those from maximum likelihood estimation, so they are less useful in practice for parametric models. They are, however, useful for getting initial guesses for numerical maximum likelihood estimates. For this reason, and because the method of moments is still fairly widely used (though far less than MLE), you should be familiar with it, and I will discuss it here.

I first introduce the method of moments formally, and then demonstrate with examples.

Definition of method of moments

For ease of discussion, we will consider a generative model of a set of \(n\) i.i.d. measurements drawn from a generative distribution with probability density function \(f(x;\theta)\), where \(\theta\) are the parameters. The \(m\)th moment of the distribution is

\[\begin{aligned} \langle x^m \rangle = \int \mathrm{d}x\,x^m f(x;\theta). \end{aligned}\]

In the case where \(x\) is discrete, with \(f(x;\theta)\) now a probability mass function, the \(m\)th moment is

\[\begin{aligned} \langle x^m \rangle = \sum_x\,x^m f(x;\theta). \end{aligned}\]

The moments will in general be functions of the parameters \(\theta\).

Recalling our discussion of plug-in estimates of linear statistical functionals, the plug-in estimate for the \(m\)th moment is

\[\begin{aligned} \widehat{\langle x^m \rangle} = \frac{1}{n}\sum_{i=1}^n\,x_i^m, \end{aligned}\]

where \(x_i\) is one of the \(n\) empirical measurements.
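In code, this is just the arithmetic mean of the \(m\)th power of the data. A minimal sketch with NumPy (the function name is my own, for illustration):

```python
import numpy as np

def plug_in_moment(x, m):
    """Plug-in estimate of the mth moment: the mean of x**m."""
    return np.mean(x**m)
```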

In the method of moments, we equate the plug-in estimates for the moments with the analytical expressions for the moments derived from the model generative distribution, using as many moments as there are parameters, and then solve for \(\theta\) to get our estimates. This is best seen in practice.

Examples of method of moments

We will demonstrate the method of moments with a few examples.

Estimates for an Exponential distribution

For an Exponential distribution, the probability density function is

\[\begin{aligned} f(x;\beta) = \beta\,\mathrm{e}^{-\beta x}. \end{aligned}\]

There is a single parameter, \(\beta\). The first moment of the Exponential distribution is \(\langle x \rangle = 1/\beta\), which may be computed by evaluating the integral

\[\begin{aligned} \int_0^\infty \mathrm{d}x\;x\,f(x;\beta) = \int_0^\infty \mathrm{d}x\;x\,\beta\,\mathrm{e}^{-\beta x} = 1/\beta, \end{aligned}\]

but is more easily looked up in the Distribution Explorer. The plug-in estimate for the first moment is

\[\begin{aligned} \widehat{\langle x \rangle} = \frac{1}{n}\sum_{i=1}^n\,x_i = \bar{x}. \end{aligned}\]

Equating the first moment from the model generative distribution with its plug-in estimate gives \(\beta = 1/\bar{x}\), which is our method-of-moments estimate for the parameter \(\beta\).
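As a quick numerical check of this result, here is a minimal sketch using NumPy. The true parameter value and sample size are made up for illustration; note that NumPy's exponential sampler is parametrized by the scale \(1/\beta\), not the rate \(\beta\).

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

# Synthetic data from an Exponential distribution with rate beta_true;
# NumPy parametrizes by the scale 1/beta.
beta_true = 2.0
x = rng.exponential(scale=1.0 / beta_true, size=1000)

# Method-of-moments estimate: beta = 1 / mean(x)
beta_mm = 1.0 / np.mean(x)
print(beta_mm)  # close to beta_true
```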

Estimates for a Normal distribution

We now turn to the Normal distribution. We will again use the first moment, but since the Normal distribution has two parameters, \(\mu\) and \(\sigma\), we need a second moment to get a second equation. It is often easier to work with the variance and its plug-in estimate rather than the second moment directly. Note that the variance can be expressed in terms of the first and second moments,

\[\begin{aligned} \sigma^2 = \langle x^2 \rangle - \langle x \rangle^2, \end{aligned}\]

so that even though we are comparing the variance and the plug-in estimate for the variance, we are still doing method of moments.

The first moment and variance of a Normal distribution are \(\mu\) and \(\sigma^2\), respectively. Equating them with the plug-in estimates gives our method-of-moments estimates for the parameters, \(\mu = \bar{x}\) and \(\sigma^2 = \hat{\sigma}^2\), where \(\hat{\sigma}^2\) is the plug-in estimate for the variance.
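In code, these are simply the arithmetic mean and the plug-in standard deviation of the data, with no Bessel correction. A minimal sketch, with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(seed=3252)
x = rng.normal(loc=4.0, scale=0.5, size=1000)  # illustrative parameters

# Method-of-moments estimates
mu_mm = np.mean(x)
sigma_mm = np.std(x)  # ddof=0 by default, i.e., the plug-in estimate
```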

Estimates for a Gamma distribution

You may have noticed that for the Exponential and Normal distributions, the method of moments gives the same result as maximum likelihood estimation. This is not generally the case. As a counterexample, consider a Gamma distribution, which has probability density function

\[\begin{aligned} f(x;\alpha, \beta) = \frac{1}{\Gamma(\alpha)}\,\frac{(\beta x)^\alpha}{x}\,\mathrm{e}^{-\beta x}, \end{aligned}\]

with two parameters, \(\alpha\) and \(\beta\). Equating the first moment and variance with their respective plug-in estimates gives

\[\begin{split}\begin{aligned} &\frac{\alpha}{\beta} = \bar{x},\\[1em] &\frac{\alpha}{\beta^2} = \hat{\sigma}^2. \end{aligned}\end{split}\]

Solving for \(\alpha\) and \(\beta\) yields our method-of-moments estimates for the parameters,

\[\begin{split}\begin{aligned} &\alpha = \frac{\bar{x}^2}{\hat{\sigma}^2},\\[1em] &\beta = \frac{\bar{x}}{\hat{\sigma}^2}. \end{aligned}\end{split}\]

This is not the same result as that given by maximum likelihood estimation. In fact, the MLE for the parameters of a Gamma distribution cannot be written analytically, so the method-of-moments estimates make convenient initial guesses for computing the MLE numerically.
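To illustrate using the method-of-moments estimates as initial guesses for numerical maximum likelihood estimation, as mentioned at the beginning of this lesson, here is a sketch with SciPy. The data set, parameter values, and choice of the Powell solver are my own illustrative assumptions; note that SciPy parametrizes the Gamma distribution by the shape \(\alpha\) and scale \(1/\beta\).

```python
import numpy as np
import scipy.optimize
import scipy.stats

rng = np.random.default_rng(seed=3252)
x = rng.gamma(shape=2.0, scale=0.5, size=1000)  # illustrative parameters

# Method-of-moments estimates from the sample mean and plug-in variance
xbar = np.mean(x)
s2 = np.var(x)  # ddof=0 by default, i.e., the plug-in estimate

alpha_mm = xbar**2 / s2
beta_mm = xbar / s2

# Negative log likelihood for the Gamma distribution
def neg_log_like(params, x):
    alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf
    return -np.sum(scipy.stats.gamma.logpdf(x, alpha, scale=1.0 / beta))

# Numerical MLE, initialized at the method-of-moments estimates
res = scipy.optimize.minimize(
    neg_log_like,
    x0=np.array([alpha_mm, beta_mm]),
    args=(x,),
    method="Powell",
)
alpha_mle, beta_mle = res.x
```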