Homework 9.2: Data transformations and parameter estimation (30 pts)

Data download

We often want to ascertain how tightly two proteins are bound by measuring their dissociation constant, \(K_d\). This is usually done by performing a titration experiment and then computing a maximum likelihood estimate (MLE) of \(K_d\). For example, imagine that two proteins, \(a\) and \(b\), may bind to each other in the reaction

\begin{align} ab \rightleftharpoons a + b \end{align}

with dissociation constant \(K_d\). At equilibrium

\begin{align} K_d = \frac{c_a\,c_b}{c_{ab}}, \end{align}

where \(c_i\) is the concentration of species \(i\). If we add known amounts of \(a\) and \(b\) to a solution such that the total concentration of \(a\) is \(c_a^0\) and the total concentration of \(b\) is \(c_b^0\), we can compute the equilibrium concentrations of all species. Specifically, in addition to the equation above, we have conservation of mass equations,

\begin{align} c_a^0 &= c_a + c_{ab}\\[1em] c_b^0 &= c_b + c_{ab}, \end{align}

fully specifying the problem. We can solve the three equations for \(c_{ab}\) in terms of the known quantities \(c_a^0\) and \(c_b^0\), along with the parameter we are trying to measure, \(K_d\). We get

\begin{align} c_{ab} = \frac{2c_a^0\,c_b^0}{K_d+c_a^0+c_b^0 + \sqrt{\left(K_d+c_a^0+c_b^0\right)^2 - 4c_a^0\,c_b^0}}. \end{align}

The technique, then, is to hold \(c_a^0\) fixed and measure \(c_{ab}\) for various \(c_b^0\). We can then devise a variate-covariate model and obtain an MLE of \(K_d\).
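A minimal Python sketch of the closed-form solution and of a titration sweep at fixed \(c_a^0\). The function name `bound_complex_conc` and the parameter values are our own illustrative choices; we assume \(K_d > 0\) and that all concentrations are in the same units as \(K_d\).

```python
import math


def bound_complex_conc(K_d, ca0, cb0):
    """Equilibrium concentration c_ab of the complex ab.

    Uses the rationalized root from the text,
        c_ab = 2 ca0 cb0 / (S + sqrt(S**2 - 4 ca0 cb0)),  S = K_d + ca0 + cb0,
    which avoids subtracting two nearly equal numbers when binding is weak.
    Assumes K_d > 0 and that all concentrations share the units of K_d.
    """
    S = K_d + ca0 + cb0
    # The discriminant equals (ca0 - cb0)**2 + K_d**2 + 2 K_d (ca0 + cb0) >= 0;
    # clamp tiny negative round-off before taking the square root.
    disc = max(S * S - 4.0 * ca0 * cb0, 0.0)
    return 2.0 * ca0 * cb0 / (S + math.sqrt(disc))


if __name__ == "__main__":
    # Hypothetical titration: hold ca0 fixed and sweep cb0 (arbitrary units).
    ca0, K_d = 2.0, 1.0
    for cb0 in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]:
        frac = bound_complex_conc(K_d, ca0, cb0) / ca0
        print(f"cb0 = {cb0:5.1f}   fraction of a bound = {frac:.3f}")
```

Checking that the returned value satisfies \(K_d\,c_{ab} = (c_a^0 - c_{ab})(c_b^0 - c_{ab})\) is a quick sanity test. The other root of the quadratic, written as \(\left(S - \sqrt{S^2 - 4c_a^0c_b^0}\right)/2\), is algebraically identical but loses precision when \(4c_a^0c_b^0 \ll S^2\).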

In order to do this, though, we need some readout of \(c_{ab}\). For this problem, we will use FRET (fluorescence resonance energy transfer) to monitor how much of \(a\) is bound to \(b\). Specifically, we label \(a\) with a fluorophore, and \(b\) carries an acceptor. When the two are unbound, we get a fluorescence signal per molecule of \(f_0\). When they are bound, the acceptor takes up energy from the excited fluorophore, so we get less fluorescence per molecule, which we will call \(f_q\) (for “quenched”). Let \(f\) be the average fluorescence signal per fluorophore. Then the measured fluorescence signal, \(F\), is

\begin{align} F = c_a^0\,V f = \left(c_a \,f_0 + c_{ab}\, f_q\right)V, \end{align}

where \(V\) is the reaction volume.

As is commonly done by biochemists, we can define a FRET efficiency, \(e\), as

\begin{align} e = 1 - \frac{f}{f_0}. \end{align}

If we measure \(F_0\), the fluorescence when there is no \(b\) in the sample (so that all of \(a\) is unbound and \(F_0 = c_a^0\,V f_0\)), we can compute the FRET efficiency from the measured values \(F\) and \(F_0\):

\begin{align} e = 1 - \frac{c_a^0\,V f}{c_a^0\,Vf_0} = 1 - \frac{F}{F_0}. \end{align}

Substituting in our expressions for \(F\) and \(F_0\), we get

\begin{align} e = 1 - \frac{\left(c_a \,f_0 + c_{ab}\, f_q\right)V}{c_a^0\,V f_0} = 1 - \frac{c_a}{c_a^0} - \frac{c_{ab}}{c_a^0}\,\frac{f_q}{f_0}. \end{align}

Using the fact that \(c_a^0 = c_a + c_{ab}\), this becomes

\begin{align} e = \left(1-\frac{f_q}{f_0}\right)\frac{c_{ab}}{c_a^0}. \end{align}

In other words, the FRET efficiency is proportional to the fraction of \(a\) that is bound, or

\begin{align} e = \alpha \, \frac{c_{ab}}{c_a^0} = \frac{2\alpha\,c_b^0}{K_d+c_a^0+c_b^0 + \sqrt{\left(K_d+c_a^0+c_b^0\right)^2 - 4c_a^0\,c_b^0}}, \end{align}

where \(\alpha = 1 - f_q/f_0\). Biochemists typically treat \(e\) as the variate (with \(c_a^0\) and \(c_b^0\) as covariates) and obtain MLEs for the parameters \(\alpha\) and \(K_d\).
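As a sanity check on the algebra, a short Python sketch can compute \(e\) two ways with arbitrary, made-up parameter values: once as \(1 - F/F_0\) from signals built with the fluorescence equation, and once from the closed-form expression in \(\alpha\) and \(K_d\). The function names `fraction_bound` and `fret_efficiency_model` are our own.

```python
import math


def fraction_bound(ca0, cb0, K_d):
    """c_ab / c_a^0 from the closed-form equilibrium solution (K_d > 0)."""
    S = K_d + ca0 + cb0
    return 2.0 * cb0 / (S + math.sqrt(max(S * S - 4.0 * ca0 * cb0, 0.0)))


def fret_efficiency_model(ca0, cb0, K_d, alpha):
    """Theoretical FRET efficiency e = alpha * c_ab / c_a^0."""
    return alpha * fraction_bound(ca0, cb0, K_d)


# Consistency check of the derivation with arbitrary, made-up values
# (f0V and fqV stand for the products f_0 V and f_q V).
ca0, K_d, f0V, fqV = 2.0, 1.0, 10.0, 3.0
for cb0 in [0.5, 2.0, 8.0]:
    c_ab = ca0 * fraction_bound(ca0, cb0, K_d)
    F = (ca0 - c_ab) * f0V + c_ab * fqV   # fluorescence with b present
    F0 = ca0 * f0V                        # fluorescence with no b
    e_from_signals = 1.0 - F / F0
    e_from_model = fret_efficiency_model(ca0, cb0, K_d, 1.0 - fqV / f0V)
    assert math.isclose(e_from_signals, e_from_model)
```

The model also shows the expected limits: \(e = 0\) with no \(b\), and \(e \to \alpha\) as \(c_b^0\) becomes large and nearly all of \(a\) is bound.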

a) Load in the data for one of these FRET efficiency titration curves. You can download the data set here. These are real data from here on campus, collected by a former student in this class, Emily Blythe. They were never published, but were preliminary experiments for this publication. To get the fluorescence for each measurement, you need to subtract the background fluorescence. Do that, and then also compute the FRET efficiency.
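The layout of the data file is not described here, so the following is only a hypothetical Python sketch of the two computations in part (a). It assumes that each measurement comes with its own background reading and that the titration includes at least one point with no \(b\), whose mean background-subtracted signal serves as \(F_0\). The function name and its arguments are illustrative, not part of the data set.

```python
def subtract_background_and_compute_efficiency(cb0, raw, background):
    """Hypothetical helper for part (a); the real file layout may differ.

    cb0        : total concentration of b for each measurement
    raw        : raw fluorescence readings
    background : background fluorescence paired with each reading

    Returns the background-subtracted signals F and the FRET efficiencies
    e = 1 - F / F_0, where F_0 is the mean background-subtracted signal of
    the measurements with cb0 == 0.
    """
    F = [r - b for r, b in zip(raw, background)]
    zero_b = [f for c, f in zip(cb0, F) if c == 0]
    if not zero_b:
        raise ValueError("need at least one measurement with no b to define F_0")
    F0 = sum(zero_b) / len(zero_b)
    e = [1.0 - f / F0 for f in F]
    return F, e
```

With this choice of \(F_0\), the efficiencies of the zero-\(b\) points themselves scatter around zero, which gives a rough feel for the measurement noise.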

b) One could follow the typical biochemists' approach described above, using the FRET efficiency as the variate in a variate-covariate model to obtain estimates for \(K_d\) and \(\alpha\). Alternatively, one could directly use the measured (background-subtracted) fluorescence and build a variate-covariate model around the equation

\begin{align} F = \left(c_a \,f_0 + c_{ab}\, f_q\right)V, \end{align}

where there are now three parameters, \(K_d\), \(f_0V\), and \(f_qV\), from which \(\alpha\) may be calculated as \(\alpha = 1 - (f_qV)/(f_0V)\). Which of these two approaches is preferred, and why?

c) Provide MLEs for \(\alpha\) and \(K_d\), along with confidence intervals, and display a graphical model assessment.
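A minimal sketch of the MLE step, assuming the background-subtracted fluorescence is Normally distributed about the three-parameter model with a constant, unknown standard deviation \(\sigma\). Under that assumption the MLE of \(K_d\), \(f_0V\), and \(f_qV\) coincides with the least-squares solution, obtained here with `scipy.optimize.least_squares`. Confidence intervals are computed by a parametric bootstrap (refitting data sets simulated from the fitted model), which is one option among several. The data are synthetic, generated inside the sketch from made-up parameter values purely to show the mechanics, and the function names are our own. The three-parameter model is used only as an illustration; the same pattern applies to the FRET-efficiency model.

```python
import numpy as np
import scipy.optimize


def fluorescence_model(params, ca0, cb0):
    """Theoretical F = (c_a f_0 + c_ab f_q) V for an array of cb0 values."""
    K_d, f0V, fqV = params
    S = K_d + ca0 + cb0
    c_ab = 2.0 * ca0 * cb0 / (S + np.sqrt(np.maximum(S**2 - 4.0 * ca0 * cb0, 0.0)))
    return (ca0 - c_ab) * f0V + c_ab * fqV


def mle_fluorescence(ca0, cb0, F, params0):
    """MLE of (K_d, f0V, fqV) and sigma, assuming iid Normal noise on F."""
    res = scipy.optimize.least_squares(
        lambda p: F - fluorescence_model(p, ca0, cb0),
        params0,
        bounds=(0.0, np.inf),  # all three parameters are taken to be nonnegative
    )
    # For iid Normal noise, the MLE of sigma is the RMS of the residuals.
    sigma = np.sqrt(np.mean(res.fun**2))
    return res.x, sigma


def parametric_bootstrap(ca0, cb0, params, sigma, n_reps=200, rng=None):
    """Replicates of (K_d, alpha) from refitting data simulated from the MLE."""
    rng = np.random.default_rng() if rng is None else rng
    F_theor = fluorescence_model(params, ca0, cb0)
    reps = np.empty((n_reps, 2))
    for i in range(n_reps):
        F_rep = rng.normal(F_theor, sigma)
        (K_d, f0V, fqV), _ = mle_fluorescence(ca0, cb0, F_rep, params)
        reps[i] = K_d, 1.0 - fqV / f0V  # alpha = 1 - (f_qV)/(f_0V)
    return reps


# Synthetic demonstration with made-up "true" values (not real data).
rng = np.random.default_rng(3252)
ca0 = 1.0
cb0 = np.array([0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
F = rng.normal(fluorescence_model([0.5, 100.0, 30.0], ca0, cb0), 1.0)

params, sigma = mle_fluorescence(ca0, cb0, F, params0=np.array([1.0, 90.0, 40.0]))
reps = parametric_bootstrap(ca0, cb0, params, sigma, rng=rng)
K_d_ci = np.percentile(reps[:, 0], [2.5, 97.5])  # 95% percentile interval
alpha_ci = np.percentile(reps[:, 1], [2.5, 97.5])
print("K_d MLE:", params[0], "95% CI:", K_d_ci)
print("alpha MLE:", 1.0 - params[2] / params[1], "95% CI:", alpha_ci)
```

For the graphical model assessment, one could, for instance, simulate many data sets from the fitted model at the measured \(c_b^0\) values (as `parametric_bootstrap` already does internally) and plot their percentiles alongside the measured points.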