The Olkin-Petkau-Zidek example of MLE fragility
[1]:
import numpy as np
import scipy.stats as st
import scipy.special
import scipy.optimize
In 1981, Olkin, Petkau, and Zidek presented an example in which maximum likelihood estimates can vary wildly with only small changes in the data. Here, we will work through their example.
The problem is stated as follows. Say you are measuring the outcomes of \(N\) Bernoulli trials, but you can only measure a positive result; negative results are not detected in your experiment. You do know, however, that \(N\), while unknown, is the same for all experiments. The numbers of positive results you get from a set of measurements (sorted for convenience) are n = 16, 18, 22, 25, 27. Modeling the generative process with a Binomial distribution, \(n_i \sim \text{Binom}(\theta, N)\;\;\forall i\), we can obtain maximum likelihood estimates for \(\theta\) and \(N\).
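To get a feel for the generative model, here is a minimal simulation sketch. The parameter values below are made up purely for illustration (they are not estimates), and the cell relies on the imports at the top of the notebook.

[ ]:
# Hypothetical parameter values, for illustration only
rng = np.random.default_rng(3252)
N_true = 100
theta_true = 0.2

# Each experiment reports only the number of positive results out of N trials
rng.binomial(N_true, theta_true, size=5)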
Let us first attempt to make some analytical progress on computing the MLE. We start by writing the likelihood and log likelihood. Let \(M\) be the number of measurements made; in this case, \(M = 5\).
\begin{align} &L(\theta, N;\mathbf{n}) = \prod_{i=1}^M \frac{N!}{n_i!(N-n_i)!}\,\theta^{n_i}(1-\theta)^{N-n_i},\\[1em] &\ell(\theta, N;\mathbf{n}) = M\ln N! + \sum_{i=1}^M \left[-\ln n_i! - \ln\left(N-n_i\right)! + n_i \ln \theta + (N-n_i)\ln(1-\theta)\right]. \end{align}
A necessary condition for the MLE is that
\begin{align} \frac{\partial \ell}{\partial \theta} = \sum_{i=1}^M\,\frac{n_i}{\theta} - \frac{N-n_i}{1-\theta} = M\left(\frac{\bar{n}}{\theta} - \frac{N}{1-\theta} + \frac{\bar{n}}{1-\theta}\right)= 0, \end{align}
where \(\bar{n}\) denotes the average of the \(M\) observed \(n\) values. Solving for the MLE value of \(\theta\) yields
\begin{align} \theta^* = \frac{\bar{n}}{N}, \end{align}
which is perhaps not surprising. We are left to find the MLE for \(N\), which is the value of \(N\) for which the log likelihood
\begin{align} \ell = M\ln N! + \sum_{i=1}^M \left[-\ln n_i! - \ln\left(N-n_i\right)! + n_i\,\ln \frac{\bar{n}}{N} + (N-n_i)\ln\left(1-\frac{\bar{n}}{N}\right)\right] \end{align}
is maximal. Dropping the terms that do not depend on \(N\) and dividing by \(M\), we equivalently seek \(N\) such that the quantity
\begin{align} \ln N! - \bar{n}\,\ln N + (N-\bar{n}) \ln\left(1-\frac{\bar{n}}{N}\right) - \frac{1}{M}\sum_{i=1}^M \ln(N-n_i)! \end{align}
is maximal.
Computing the MLE for a discrete parameter can be a bit tricky. We could take the approach of exhaustively enumerating the log likelihood for many values of \(N\) and then selecting the value of \(N\) that gives the largest log likelihood. Here is an implementation of that approach.
[2]:
def mle_exhaustive(n, N_min=None, N_max=1000):
    """Compute MLE for N and theta."""
    # Values of N we will enumerate
    N = np.arange(np.max(n) if N_min is None else N_min, N_max + 1)

    # MLE for theta as a function of N
    theta = np.mean(n) / N

    # Compute log likelihood for each (N, theta) pair
    log_L = [
        np.sum(st.binom.logpmf(n, N_val, theta_val))
        for N_val, theta_val in zip(N, theta)
    ]

    # Find index of MLE of N
    i_mle = np.argmax(log_L)

    # Compute MLE
    N_mle = N[i_mle]
    theta_mle = np.mean(n) / N_mle

    return N_mle, theta_mle
Let’s put our function to use and compute the MLE!
[3]:
# Observed data
n = np.array([16, 18, 22, 25, 27])
mle_exhaustive(n)
[3]:
(99, 0.2181818181818182)
We get \(N^* = 99\) and \(\theta^* = 0.22\).
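As a quick, illustrative sanity check of the analytical result \(\theta^* = \bar{n}/N\), we can fix \(N\) at its MLE of 99 and numerically maximize the log likelihood over \(\theta\); the maximizer should agree with \(\bar{n}/99 \approx 0.218\).

[ ]:
# Sanity check: with N fixed at its MLE, the numerical maximizer of the
# log likelihood over theta should equal nbar / N.
res = scipy.optimize.minimize_scalar(
    lambda theta: -np.sum(st.binom.logpmf(n, 99, theta)),
    bounds=(1e-6, 1 - 1e-6),
    method="bounded",
)

res.x, np.mean(n) / 99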
But now, let’s say that the final measurement has 28 positive results instead of 27. That’s a change of a single positive result in just one of the five experiments. Now let’s compute the MLE for this new data set.
[4]:
# Observed data
n = np.array([16, 18, 22, 25, 28])
mle_exhaustive(n)
[4]:
(191, 0.11413612565445026)
Oof! \(N^*\) almost doubled to 191 and \(\theta^*\) halved to 0.11. With just a tiny change in the data, we can get wildly different estimates for the parameters. This is an example of the fragility of the MLE.
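The source of this fragility is that the log likelihood is very flat in \(N\) near its maximum. As a quick illustrative check, we can evaluate the log likelihood of the original data set at the MLE and at much larger values of \(N\) and see how little it drops.

[ ]:
# Illustrative check of how flat the log likelihood is in N near its maximum
n_orig = np.array([16, 18, 22, 25, 27])

for N_val in [99, 150, 300, 1000]:
    theta_val = np.mean(n_orig) / N_val
    print(N_val, np.sum(st.binom.logpmf(n_orig, N_val, theta_val)))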
To get a handle on how our MLE would change with newly acquired data sets, we will use parametric bootstrap to compute confidence intervals for \(N^*\) and \(\theta^*\) using the original data set with 27 successes for the fifth trial. Computing the bootstrapped confidence intervals is difficult with exhaustive enumeration, which is slow because \(N\) can be very large. Another approach is to treat \(N\) as a continuous variable such that \(N! = \Gamma(N+1)\), and instead search for a maximizer of
\begin{align} h(N) = \ln \Gamma(N+1) - \bar{n}\,\ln N + (N-\bar{n}) \ln\left(1-\frac{\bar{n}}{N}\right) - \frac{1}{M}\sum_{i=1}^M \ln \Gamma(N-n_i+1). \end{align}
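As an aside, since \(\ln\Gamma\) is available as scipy.special.gammaln, one could also maximize \(h(N)\) directly with a bounded scalar optimizer rather than working with its derivative. A minimal sketch of that alternative (not the approach taken below) is:

[ ]:
# Sketch of an alternative: maximize h(N) directly over continuous N.
def h(N, n):
    """N-dependent part of the average log likelihood, with N continuous."""
    nbar = np.mean(n)
    return (
        scipy.special.gammaln(N + 1)
        - nbar * np.log(N)
        + (N - nbar) * np.log(1 - nbar / N)
        - np.mean(scipy.special.gammaln(N - n + 1))
    )

n_orig = np.array([16, 18, 22, 25, 27])
res = scipy.optimize.minimize_scalar(
    lambda N: -h(N, n_orig), bounds=(np.max(n_orig) + 1, 1000), method="bounded"
)

res.x  # should land near the exhaustively computed MLE of 99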
Noting that the derivative of the log of a Gamma function is the polygamma function of order zero (the digamma function),
\begin{align} \frac{\mathrm{d}}{\mathrm{d}x}\,\ln \Gamma(x) = \psi_0(x), \end{align}
we can write
\begin{align} \frac{\mathrm{d}h}{\mathrm{d}N} = \ln\left(1 - \frac{\bar{n}}{N}\right) + \psi_0(N+1) - \frac{1}{M}\sum_{i=1}^M\psi_0(N-n_i+1) = 0. \end{align}
Our strategy to find \(N^*\) is to solve the above root finding problem for continuous \(N\), and then find which integer \(N\) on either side of the root gives the maximal log likelihood. Note that \(\mathrm{d}h/\mathrm{d}N \to 0\) as \(N\to \infty\). A finite root may not exist, meaning that the MLE is \(N\to\infty\) and \(\theta = 0\). In this limit, the number of positive results is Poisson distributed (the \(N\to\infty\), \(\theta \to 0\) limit in which \(N\theta\) remains finite). In this case, the MLE for \(N\theta\) is \(\bar{n}\). We will not include those results in our bootstrap samples.
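For completeness, here is the short calculation behind that last claim. Writing \(\lambda = N\theta\) for the Poisson parameter,
\begin{align} \ell(\lambda;\mathbf{n}) = \sum_{i=1}^M\left(n_i\ln\lambda - \lambda - \ln n_i!\right), \qquad \frac{\partial \ell}{\partial \lambda} = M\left(\frac{\bar{n}}{\lambda} - 1\right) = 0 \;\;\Rightarrow\;\; \lambda^* = \bar{n}. \end{align}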
[5]:
def dh_dN(N, n):
    """Return dh/dN for the root finding problem."""
    sum_term = np.mean(scipy.special.polygamma(0, N - n + 1))

    return np.log(1 - np.mean(n) / N) + scipy.special.polygamma(0, N + 1) - sum_term


def mle(n, N_max=1_000_000):
    """Compute the MLE for the OPZ problem."""
    try:
        root = scipy.optimize.brentq(dh_dN, np.max(n) + 1, N_max, args=(n,))
        return mle_exhaustive(n, N_min=int(root), N_max=int(root) + 1)
    except ValueError:
        # brentq raises a ValueError if dh/dN does not change sign on the
        # bracket, i.e., if no finite root (and hence no finite MLE) exists.
        return (np.nan, np.nan)
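As a quick check (purely illustrative; it just reruns the calculations from above), the root-finding version should reproduce the MLEs we obtained by exhaustive enumeration for both data sets.

[ ]:
# The root-finding MLE should agree with the exhaustive enumeration above
mle(np.array([16, 18, 22, 25, 27])), mle(np.array([16, 18, 22, 25, 28]))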
Let’s now compute the bootstrap replicates.
[6]:
# Observed data
n = np.array([16, 18, 22, 25, 27])
# Get MLE
N, theta = mle(n)
# Parametric bootstrap
rng = np.random.default_rng()
bs_reps = np.array([mle(rng.binomial(N, theta, size=len(n))) for _ in range(100_000)])
Let’s first check what fraction of the bootstrap replicates had \(N\to\infty\).
[7]:
np.sum(np.isnan(bs_reps[:,0])) / len(bs_reps)
[7]:
0.24807
Wow. Nearly 25% of the bootstrap replicates had \(N\to\infty\), such that we cannot even compute an MLE for the Binomial model. Now, let's compute confidence intervals from the replicates for which an MLE exists, first for \(N\).
[8]:
np.percentile(bs_reps[~np.isnan(bs_reps[:, 0]), 0], [2.5, 97.5])
[8]:
array([ 24. , 520.6])
And now for \(\theta\).
[9]:
np.percentile(bs_reps[~np.isnan(bs_reps[:, 1]), 1], [2.5, 97.5])
[9]:
array([0.04166667, 0.85 ])
These results confirm that, from observing only the number of positive results in five sets of experiments, we cannot really pin down the parameters of the model.
Computing environment
[10]:
%load_ext watermark
%watermark -v -p numpy,scipy,jupyterlab
Python implementation: CPython
Python version : 3.12.4
IPython version : 8.25.0
numpy : 1.26.4
scipy : 1.13.1
jupyterlab: 4.0.13