23. Mixture models

Dataset download

[2]:

import warnings
import multiprocessing

import polars as pl
import numpy as np
import scipy.optimize
import scipy.stats as st

import bebi103

import iqplot

import bokeh.io
bokeh.io.output_notebook()

Loading BokehJS ...

In a previous lesson, we explored a Negative-Binomial model for mRNA counts. The data set we used came from this paper from the Elowitz lab and featured smFISH performed in 279 cells for the genes rex1, rest, nanog, and prdm14. The data set may be downloaded here.

Multiple cell types?

While the Negative Binomial model could work for some genes, exploratory data analysis shows an obvious bimodality in the distribution of counts of Rex1 transcripts.

[3]:

df = pl.read_csv(os.path.join(data_path, "singer_transcript_counts.csv"), comment_prefix="#")

p = iqplot.ecdf(
    data=df, q="Rex1", x_axis_label="Rex1 mRNA count", conf_int=True
)

bokeh.io.show(p)

We can define a cell type by its expression profile. The bimodality of the ECDF suggests that there are more than one cell types present, since the ECDF clearly does not follow a Negative Binomial distribution.

A Negative Binomial mixture model

What is the story for a distribution of mRNA counts coming from more than one cell type? To be concrete, let’s say we ahve two cell types, one with one level of bursty expression of the gene of interest, and another with a different level of bursty expression. We would expect the number of mRNA transcripts to be distributed according to a linear combination of negative binomial distributions. Such a linear combination of distributions is called a mixture model. For this mixture model, we can write out the PMF for an individual measurement of \(n\) mRNA transcripts as

\begin{align} f(n\mid \alpha_1, \beta_1, \alpha_2, \beta_2, w) &= w\,\frac{(n + \alpha_1 - 1)!}{n!\,(\alpha_1-1)!}\,\left(\frac{\beta_1}{1+\beta_1}\right)^{\alpha_1}\left(\frac{1}{1+\beta_1}\right)^{n} \\[1em] &\;\;\;\;+ (1-w) \,\frac{(n + \alpha_2 - 1)!}{n!\,(\alpha_2-1)!}\,\left(\frac{\beta_2}{1+\beta_2}\right)^{\alpha_2}\left(\frac{1}{1+\beta_2}\right)^{n}. \end{align}

Where, \(w\) is the fraction of cells that are of type one (burst size and frequency are determined by \(1/\beta_1\) and \(\alpha_1\)), and \(1-w\) is the fraction of cells that are of type two. We can write this likelihood more concisely as

\begin{align} &n_i \sim w \, \text{NegBinom}(\alpha_1, \beta_1) + (1-w)\,\text{NegBinom}(\alpha_2, \beta_2) \; \forall i, \\[1em] &b_1 = 1/\beta_1, \\[1em] &b_2 = 1/\beta_2. \end{align}

(Recall from the last lesson that we are choosing to parametrize the Negative Binomial distributions with the burst size \(b\), related to \(\beta\) as \(b = 1/\beta\), so going forward we will use \(b_1\) and \(b_2\).)

MLE for a mixture model

Mixture models offer some unique challenges in modeling. They are revealed by looking at the log likelihood, which I write again here for a single measurement \(n\) for clarity.

\begin{align} \ell(n\mid \alpha_1, b_1, \alpha_2, b_2, w) = \ln(w\,x_1 + (1-w)x_2), \end{align}

where

\begin{align} x_i = \frac{(n + \alpha_i - 1)!}{n!\,(\alpha_i-1)!}\,\left(\frac{1}{1+b_i}\right)^{\alpha_i}\left(\frac{b_i}{1+b_i}\right)^{n}. \end{align}

The log-sum-exp trick

While the logarithm of a product is conveniently split, we cannot split the logarithm of a sum. If we compute the sum directly, we will get serious underflow errors for parameters for which the terms \(x_1\) or \(x_2\) are small. To compute this is a more numerically stable way, we need to use the log-sum-exp trick. To do this, we re-write the \(x_i\) as \(x_i = \mathrm{e}^{\ln x_i}\).

\begin{align} \ln f(n\mid \alpha_1, b_1, \alpha_2, b_2, w) = \ln(w\,\mathrm{e}^{\ln x_1} + (1-w)\mathrm{e}^{\ln x_2}). \end{align}

For a moment, let us assume that \(x_1 > x_2\). Then, we can factor the log likelihood as

\begin{align} \ln f(n\mid \alpha_1, b_1, \alpha_2, b_2, w) &= \ln\left[\mathrm{e}^{\ln x_1}\left(w + (1-w)\mathrm{e}^{\ln x_2 - \ln x_1}\right)\right] \\[1em] &= \ln x_1 + \ln\left(w + (1-w)\mathrm{e}^{\ln x_2 - \ln x_1}\right). \end{align}

Provided \(w\) is not too close to zero, the argument of the logarithm of the sum is now of order \(w\). So, when computing the log of the sum of weighted probabilities, we should factor out the largest term and then proceed. This is known as the log-sum-exp trick. Conveniently, this is implemented in Scipy as scipy.special.logsumexp(). Be sure to read the docs to understand how this works, and we will use it below in our log-likelihood calculation.

Model identifiability

A nonidentifiable model is a model for which we cannot unambiguously determine the parameter values. That is, two or more parameter sets are observationally equivalent. This is the case for this mixture model because of label switching. In this mixture model, it is arbitrary which \((\alpha, b)\) pair we label as \((\alpha_1, b_1)\) or \((\alpha_2, b_2)\). We can switch the labels, and also change \(w\) to \(1-w\), and we have exactly the same log likelihood. You can imagine that if we have a mixture of more than two distributions, the label switching problem gets worse, as we can have various combinations of label switching.

There is no one way to get around this kind of nonidentifiability (or many other kinds of nonidentifiability, for that matter). Since we are finding a maximum likelihood estimate, and the value of the likelihood is identical upon switching labels, we do not really need to worry about it here, though this could become a problem when computing confidence intervals, since we need to do the MLE calculation many times with different bootstrap sample data sets each time. We will proceed to compute the confidence interval and check to see if the label-switching nonidentifiability presents itself in the results.

The log-likelihood

With these two considerations in mind, let’s code up the log-likelihood.

[4]:

def log_like_mix(alpha1, b1, alpha2, b2, w, n):
    """Log-likeihood of binary Negative Binomial mixture model."""
    # Enforce w as a probability
    if w < 0 or w > 1:
        return -np.inf

    # Physical bounds on parameters
    if alpha1 < 0 or alpha2 < 0 or b1 < 0 or b2 < 0:
        return -np.inf

    # Pieces of log likelihood
    logx1 = st.nbinom.logpmf(n, alpha1, 1 / (1 + b1))
    logx2 = st.nbinom.logpmf(n, alpha2, 1 / (1 + b2))

    # Multipliers for log-sum-exp
    lse_coeffs = np.tile([w, 1 - w], [len(n), 1]).transpose()

    # log-likelihood for each measurement
    log_likes = scipy.special.logsumexp(np.vstack([logx1, logx2]), axis=0, b=lse_coeffs)

    return np.sum(log_likes)

Initial guess

The optimization problem is more difficult now, since we have five parameters. We should provide the best initial guesses we can. To come up with guesses, we note in the ECDF that the second inflection point occurs around a value of the ECDF of 0.2. We thus can guess \(w \approx 0.2\). To get guesses for \(\alpha_1\) and \(b_1\), we take the first 20% of the counts and get an MLE using the single-Negative Binomial model. We do the same for the remaining 80% of counts to get guesses for \(\alpha_2\) and \(b_2\). To do this, we need to borrow our MLE function from a previous lesson.

[5]:

def log_like_iid_nbinom(params, n):
    """Log likelihood for i.i.d. NBinom measurements, parametrized
    by alpha, b=1/beta."""
    alpha, b = params

    if alpha <= 0 or b <= 0:
        return -np.inf

    return np.sum(st.nbinom.logpmf(n, alpha, 1/(1+b)))


def mle_iid_nbinom(n):
    """Perform maximum likelihood estimates for parameters for i.i.d.
    NBinom measurements, parametrized by alpha, b=1/beta"""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")

        res = scipy.optimize.minimize(
            fun=lambda params, n: -log_like_iid_nbinom(params, n),
            x0=np.array([3, 3]),
            args=(n,),
            method='Powell'
        )

    if res.success:
        return res.x
    else:
        raise RuntimeError('Convergence failed with message', res.message)

We can now write a function to get an initial guess of the \(\alpha\)’s and \(b\)’s.

[6]:

def initial_guess_mix(n, w_guess):
    """Generate initial guess for mixture model."""
    n_low = n[n < np.percentile(n, 100 * w_guess)]
    n_high = n[n >= np.percentile(n, 100 *  w_guess)]

    alpha1, b1 = mle_iid_nbinom(n_low)
    alpha2, b2 = mle_iid_nbinom(n_high)

    return alpha1, b1, alpha2, b2

Let’s give it a test run.

[7]:

# Extract the values for Rex1
n = df['Rex1'].to_numpy()

# Guess for w
w_guess = 0.2

# Generate initial guess
initial_guess_mix(n, w_guess)

[7]:

(2.2311645373503657, 8.87458109227595, 6.5393762491360174, 25.47713821741079)

The guesses seem reasonable, so we will proceed using this method of generating initial guesses.

Performing the optimization

Now we’ll code up a function to generate the initial guess, and then perform the optimization. Note that below I have decreased the tolerance down to \(10^{-6}\). I found this to be necessary for this problem; the solver would be off by about 5% if I gave it similar, but different, initial guesses. The lower the tolerance, the closer to zero you require the gradient of the log likelihood to be.

[8]:

def mle_mix(n, w_guess):
    """Obtain MLE estimate for parameters for binary mixture
    of Negative Binomials."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")

        res = scipy.optimize.minimize(
            fun=lambda params, n: -log_like_mix(*params, n),
            x0=[*initial_guess_mix(n, w_guess), w_guess],
            args=(n,),
            method='Powell',
            tol=1e-6,
        )

    if res.success:
        return res.x
    else:
        raise RuntimeError('Convergence failed with message', res.message)

Let’s get our MLE!

[9]:

popt = mle_mix(n, w_guess)

# Check out optimal parameters
print('α1 = {0:f}\nb1 = {1:f}\nα2 = {2:f}\nb2 = {3:f}\nw = {4:f}'.format(*popt))

α1 = 3.497007
b1 = 4.104918
α2 = 5.089626
b2 = 31.810367
w = 0.160422

Comparison to empirical CDF

Let’s now see how the CDFs parametrized by the MLE compare with the ECDF. The theoretical CDF for a mixture model is computed from the weights of each distribution in the mixture. If the CDF from the first Negative Binomial distribution is \(F_1(n;\alpha_1^*,b_1^*)\) and that of the second is \(F_2(n;\alpha_2^*,b_2^*)\), then the theoretical ECDF for the mixture model, parametrized by the MLE, is

\begin{align} F(n;\alpha_1^*, b_1^*, \alpha_2^*, b_2^*, w^*) = w^* F_1(n;\alpha_1^*,b_1^*) + (1-w^*) F_2(n;\alpha_2^*,b_2^*). \end{align}

[10]:

# Unpack parameters
alpha1_mle, b1_mle, alpha2_mle, b2_mle, w_mle = popt

n_theor = np.arange(0, n.max()+1)
cdf_theor = w_mle * st.nbinom.cdf(n_theor, alpha1_mle, 1/(1+b1_mle))
cdf_theor += (1 - w_mle) * st.nbinom.cdf(n_theor, alpha2_mle, 1/(1+b2_mle))

# Weave together to make staircase for discrete distribution
n_plot, cdf_plot = bebi103.viz.cdf_to_staircase(n_theor, cdf_theor)

p.line(n_plot, cdf_plot, line_color='orange', line_width=2)
bokeh.io.show(p)

Alternatively, and in fact better, we can make a predictive ECDF plot. We will not do that here, as predictive ECDFs are covered in future lessons.

We can, for comparison, also add the theoretical CDF for the Negative Binomial (non-mixture) model.

[11]:

alpha_mle, b_mle = mle_iid_nbinom(n)

cdf_theor = st.nbinom.cdf(n_theor, alpha_mle, 1 / (1 + b_mle))

# Weave together to make staircase for discrete distribution
n_plot, cdf_plot = bebi103.viz.cdf_to_staircase(n_theor, cdf_theor)

p.line(n_plot, cdf_plot, line_color='tomato', line_width=2)
bokeh.io.show(p)

We will implement graphical methods of model assessment later in this lesson, but it is clear from the crude plot above that the Negative Binomial model is not commensurate with the data, but the mixture model is.

Confidence interval for the mixture model

Let’s now compute a confidence interval for the mixture model. To use bebi103.bootstrap.draw_bs_reps_mle(), we need to first write a function to generate data out of the mixture model. To do this, we first draw a random number to see which Negative Binomial distribution to draw the count out of (either the one parametrized with \(\alpha_1\) and \(b_1\), or the one parametrized with \(\alpha_2\) and \(b_2\)). We then draw a count out of the distribution we chose.

[12]:

def gen_mix(params, size, rng):
    """Generate data for the mixture model."""
    alpha1, b1, alpha2, b2, w = params

    n = np.empty(size)
    for i in range(size):
        low_cell_type = rng.uniform() < w

        if low_cell_type:
            n[i] = rng.negative_binomial(alpha1, 1/(1+b1))
        else:
            n[i] = rng.negative_binomial(alpha2, 1/(1+b2))

    return n

Now that we have this function in hand, we can generate our bootstrap samples (in parallel). This can take a while.

[13]:

bs_reps = bebi103.bootstrap.draw_bs_reps_mle(
    mle_mix,
    gen_mix,
    n,
    mle_args=(w_guess,),
    gen_args=(),
    size=10_000,
    n_jobs=9,
)

Now that we have the bootstrap samples, we can compute our confidence intervals by taking percentiles. The columns are respectively for \(\alpha_1^*\), \(b_1^*\), \(\alpha_2^*\), \(b_2^*\), and \(w^*\).

[14]:

np.percentile(bs_reps, [2.5, 97.5], axis=0)

[14]:

array([[ 2.00339497,  1.5458858 ,  4.16832771, 25.1146269 ,  0.11619696],
       [ 8.41091976,  8.45923126,  6.45015086, 38.94580183,  0.20824756]])

We can visualize the confidence region with a corner plot.

[27]:

# Package replicates in data frame for plotting
df_res = pl.DataFrame(data=bs_reps, schema=["α1*", "b1*", "α2*", "b2*", "w*"])

p = bebi103.viz.corner(
    samples=df_res,#.loc[inds, :],
    parameters=["α1*", "b1*", "α2*", "b2*", "w*"],
)

bokeh.io.show(p)

The corner plot of the confidence region shows that \(\alpha_1^*\) and \(b_1^*\) are strongly correlated, as are \(\alpha_2^*\) and \(b_2^*\). However, the MLEs of the parameters for the respective Negative Binomials are not correlated with each other.

We note that because \(w\) is always much smaller than 1/2 and unimodal, we are having problems with the label-switching nonidentifiability.

A mixture model for all cells

If we assume a Negative Binomial (non-mixture) model for the mRNA transcript counts, we can independently perform maximum likelihood estimates for the burst frequency and burst size for each gene. However, in the mixture model, we do need to take into account the fact that the mRNA counts for the respective genes are measured in the same cells. In our modeling, that is manifest in the parameter \(w\), which specifies the relative fraction of cell types, being shared among all four genes. The generative PMF for a single set of observations \(\mathbf{n} = \{n_\mathrm{Rex1}, n_\mathrm{Rest}, n_\mathrm{Nanog}, n_\mathrm{Prdm14}\}\) is given by

\begin{align} f(\mathbf{n};\boldsymbol{\alpha}, \mathbf{b}, w) = w\prod_j x_{1j} + (1-w)\prod_j x_{2j}, \end{align}

where

\begin{align} x_{ij} = \frac{(n + \alpha_{ij} - 1)!}{n!\,(\alpha_{ij}-1)!}\,\left(\frac{1}{1+b_{ij}}\right)^{\alpha_{ij}}\left(\frac{b_{ij}}{1+b_{ij}}\right)^{n}, \end{align}

where \(i\) indexes the distribution of the mixture and \(j\) indexes the gene. We can code up the log likelihood of this by modifying our original log_like_mix() function to take in arrays.

[16]:

def log_like_mix_all(alpha1, b1, alpha2, b2, w, n):
    """Log-likelihood of a mixuture model for
    all genes. Here, `alpha1`, `b1`, `alpha2`, and `b2` are
    all length 4 arrays, and `n` is 279 by 4. The
    parameter `w`, however is a scalar, shared by all genes.
    """
    # Fix nonidentifiability be enforcing values of w
    if w < 0 or w > 1.0:
        return -np.inf

    # Physical bounds on parameters
    if np.any(alpha1 < 0) or np.any(alpha2 < 0) or np.any(b1 < 0) or np.any(b2 < 0):
        return -np.inf

    # The sum is over genes
    logx1 = st.nbinom.logpmf(n, alpha1, 1 / (1 + b1)).sum(axis=1)
    logx2 = st.nbinom.logpmf(n, alpha2, 1 / (1 + b2)).sum(axis=1)

    # Multipliers for log-sum-exp
    lse_coeffs = np.tile([w, 1 - w], [len(n), 1]).transpose()

    # log-likelihood for each measurement
    log_likes = scipy.special.logsumexp(np.vstack([logx1, logx2]), axis=0, b=lse_coeffs)

    return np.sum(log_likes)

The initial guess for this model is also a bit more challenging. While we can see immediately from the Rex1 counts that the lower mRNA counts account for about 20% of the cells, this is not obvious in the others. We can, however label a “pseudo cell type” for each row of the data frame; if the Rex1 expression is below the percentile corresponding to our guess for the value of \(w\), we consider this one part of the mixture, and if it is above, it is part of the other.

[17]:

def initial_guess_mix_all(df, w_guess):
    pseudo_cell_type = df["Rex1"] < np.percentile(df["Rex1"], 100*w_guess)

    guess = []

    for gene in df.schema:
        n = df[gene].to_numpy()
        n_low = n[pseudo_cell_type]
        n_high = n[~pseudo_cell_type]

        alpha1, b1 = mle_iid_nbinom(n_low)
        alpha2, b2 = mle_iid_nbinom(n_high)

        guess += [alpha1, b1, alpha2, b2]

    return guess + [w_guess]

For speed later when we use bootstrap methods to compute the confidence intervals, we will compute this initial guess once and use it over and over to save on the four optimizations that are necessary to compute the initial guess.

[18]:

params_0 = initial_guess_mix_all(df, 0.2)

Also for speed, we should extract the counts from the data frame as a Numpy array.

[19]:

n = df.to_numpy()

Now, we can write a function to perform the MLE. First, we need to write a function to compute the negative log likelihood, including unpacking the 17 parameters.

[20]:

def neg_log_like_all(params, n):
    """
    Negative log-likelihood.
    """
    n_genes = n.shape[1]
    alpha1 = params[:n_genes]
    b1 = params[n_genes:2*n_genes]
    alpha2 = params[2*n_genes:3*n_genes]
    b2 = params[3*n_genes:4*n_genes]
    w = params[-1]

    return -log_like_mix_all(alpha1, b1, alpha2, b2, w, n)

Finally, we can perform the optimization. Because we will use bootstrapping in a moment to compute confidence intervals for the parameters, we will encapsulate the MLE calculation in a function.

[21]:

def mle_mix_all(n, params_0):
    """Obtain MLE estimate for parameters for binary mixture
    of Negative Binomials."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")

        res = scipy.optimize.minimize(
            fun=neg_log_like_all,
            x0=params_0,
            args=(n,),
            method='powell',
            tol=1e-6,
        )

    if res.success:
        return res.x
    else:
        raise RuntimeError('Convergence failed with message', res.message)

Let’s do it!

[22]:

mle = mle_mix_all(n, params_0)

# Put in a data frame for ease of reference
df_mle = pl.DataFrame(
    data=np.concatenate((mle, np.array([mle[-1]] * 3))).reshape(5, 4),
    schema=[(col, float) for col in df.schema],
)
df_mle = df_mle.with_columns(parameter=pl.Series(["alpha1", "b1", "alpha2", "b2", "w"]))

# Take a look at the results
df_mle

[22]:

shape: (5, 5)

Rex1	Rest	Nanog	Prdm14	parameter
f64	f64	f64	f64	str
7.182746	8.977211	5.392818	0.956715	"alpha1"
24.600379	9.607683	21.07764	6.468837	"b1"
1.112493	3.56355	1.209395	0.440802	"alpha2"
45.001282	13.736508	18.132925	1.665332	"b2"
0.696519	0.696519	0.696519	0.696519	"w"

The MLEs converged on \(w \approx 0.7\), which corresponds to \(w \approx 0.3\) from before (we encountered label switching in this optimization). We can do a quick graphical assessment of the model.

[23]:

def plot_theor_cdf(gene, p):
    """Add a theoretical CDF to plot"""
    n_theor = np.arange(0, df[gene].max() + 1)
    w, alpha1, alpha2, b1, b2 = [
        df_mle.filter(pl.col("parameter") == param)[gene].item()
        for param in ("w", "alpha1", "alpha2", "b1", "b2")
    ]
    cdf_theor = w * st.nbinom.cdf(n_theor, alpha1, 1 / (1 + b1))
    cdf_theor += (1 - w) * st.nbinom.cdf(n_theor, alpha2, 1 / (1 + b2))

    # Weave together to make staircase for discrete distribution
    n_plot, cdf_plot = bebi103.viz.cdf_to_staircase(n_theor, cdf_theor)

    p.line(n_plot, cdf_plot, color="orange", line_width=2)

    return p


# Make a list of plots
plots = [
    iqplot.ecdf(
        data=df,
        q=gene,
        frame_height=125,
        frame_width=250,
        conf_int=True,
        title=gene,
    )
    for gene in df.schema
]

# Add CDFs
for i, gene in enumerate(df.schema):
    plots[i] = plot_theor_cdf(gene, plots[i])

# Show them as a grid
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

The mixture model seems to fit all four reasonably well, with the theoretical CDF parametrized by the MLE mostly falling within the 95% confidence interval of the ECDF. (We will explore other ways to visualize and assess models in later lessons.) However, we note that the Rex1 data do see some deviations on the tail of the distribution for small counts.

Confidence intervals for the mixture model for all genes

We can attempt to compute confidence regions for the MLEs for this model for all four genes. We again start by writing a function to generate the data from the mixture model.

[24]:

def gen_mix_all(params, size, rng):
    """Generate data for the mixture model."""
    alpha1 = params[:4]
    b1 = params[4:8]
    alpha2 = params[8:12]
    b2 = params[12:16]
    w = params[-1]

    n = np.empty((size, 4))
    for i in range(size):
        low_cell_type = rng.uniform() < w

        if low_cell_type:
            n[i] = rng.negative_binomial(alpha1, 1 / (1 + b1))
        else:
            n[i] = rng.negative_binomial(alpha2, 1 / (1 + b2))

    return n

We can now draw our bootstrap replicates. I will only draw a few replicates for now, though, and the reason will be clear when we see the result.

[25]:

bs_mle_reps = bebi103.bootstrap.draw_bs_reps_mle(
    mle_mix_all,
    gen_mix_all,
    n,
    mle_args=(params_0,),
    gen_args=(),
    size=100,
    n_jobs=3,
)

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/bois/miniconda3/envs/bebi103_build/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/bois/miniconda3/envs/bebi103_build/lib/python3.12/site-packages/multiprocess/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bois/Dropbox/git/bebi103/bebi103/bootstrap.py", line 559, in _draw_bs_reps_mle
    mle_fun(gen_fun(params, *gen_args, size=len(data), rng=rng), *mle_args)
  File "/var/folders/8h/qwnxpqcx6vldhxr71n1582d00000gn/T/ipykernel_41358/3990052399.py", line 18, in mle_mix_all
    raise RuntimeError('Convergence failed with message', res.message)
RuntimeError: ('Convergence failed with message', 'Maximum number of function evaluations has been exceeded.')
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[25], line 1
----> 1 bs_mle_reps = bebi103.bootstrap.draw_bs_reps_mle(
      2     mle_mix_all,
      3     gen_mix_all,
      4     n,
      5     mle_args=(params_0,),
      6     gen_args=(),
      7     size=100,
      8     n_jobs=3,
      9 )

File ~/Dropbox/git/bebi103/bebi103/bootstrap.py:649, in draw_bs_reps_mle(mle_fun, gen_fun, data, mle_args, gen_args, size, n_jobs, progress_bar, rng)
    643 arg_iterable = [
    644     (mle_fun, gen_fun, data, mle_args, gen_args, s, progress_bar, None)
    645     for s in sizes
    646 ]
    648 with multiprocess.Pool(n_jobs) as pool:
--> 649     result = pool.starmap(_draw_bs_reps_mle, arg_iterable)
    651 return np.concatenate(result)

File ~/miniconda3/envs/bebi103_build/lib/python3.12/site-packages/multiprocess/pool.py:375, in Pool.starmap(self, func, iterable, chunksize)
    369 def starmap(self, func, iterable, chunksize=None):
    370     '''
    371     Like `map()` method but the elements of the `iterable` are expected to
    372     be iterables as well and will be unpacked as arguments. Hence
    373     `func` and (a, b) becomes func(a, b).
    374     '''
--> 375     return self._map_async(func, iterable, starmapstar, chunksize).get()

File ~/miniconda3/envs/bebi103_build/lib/python3.12/site-packages/multiprocess/pool.py:774, in ApplyResult.get(self, timeout)
    772     return self._value
    773 else:
--> 774     raise self._value

File ~/miniconda3/envs/bebi103_build/lib/python3.12/site-packages/multiprocess/pool.py:125, in worker()
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File ~/miniconda3/envs/bebi103_build/lib/python3.12/site-packages/multiprocess/pool.py:51, in starmapstar()
     50 def starmapstar(args):
---> 51     return list(itertools.starmap(args[0], args[1]))

File ~/Dropbox/git/bebi103/bebi103/bootstrap.py:559, in _draw_bs_reps_mle()
    554 else:
    555     iterator = range(size)
    557 return np.array(
    558     [
--> 559         mle_fun(gen_fun(params, *gen_args, size=len(data), rng=rng), *mle_args)
    560         for _ in iterator
    561     ]
    562 )

Cell In[21], line 18, in mle_mix_all()
     16     return res.x
     17 else:
---> 18     raise RuntimeError('Convergence failed with message', res.message)

RuntimeError: ('Convergence failed with message', 'Maximum number of function evaluations has been exceeded.')

We got a runtime error that the solver did not converge. This can happen; finding MLEs for a large number of parameters is very difficult and can often lead to this kind of failure. We will therefore not pursue confidence intervals for this more complicated mixture model.

That said, given the difficulty in finding MLEs for complicated models, we can’t be too sure that the single maximum likelihood estimate we got for the nineteen parameters of this model is even a good one. This should raise a reg flag for you. MLEs are very difficult to obtain for high-dimensional models, and when they are obtained, it is very difficult to verify that they do, in fact, give the maximum likelihood estimate.

We will deal with complex models and ways of investigating the failure modes in finding parameter estimates next term. For now, we will proceed by considering only the single-gene case of Rex1.

The expectation-maximization algorithm

I close this lesson by briefly mentioning that the expectation-maximization (EM) algorithm is a commonly used effective algorithm for performing maximum likelihood estimates for mixture models. It alleviates some (but by no means all) of the problems for finding MLEs of complex models. We will not discuss it, but you should be aware of it.

Computing environment

[26]:

%load_ext watermark
%watermark -v -p numpy,polars,scipy,bokeh,iqplot,bebi103,jupyterlab

Python implementation: CPython
Python version       : 3.12.3
IPython version      : 8.25.0

numpy     : 1.26.4
polars    : 1.1.0
scipy     : 1.13.1
bokeh     : 3.4.1
iqplot    : 0.3.7
bebi103   : 0.1.21
jupyterlab: 4.0.13