qPCR analysis

This post does a step-by-step analysis of quantitative PCR data. It includes compensation for amplification efficiency and the incorporation of both single and multiple loading controls. If you like derivations, keep reading; if you’d rather skip the mathematics, go to the final forms. Excel implementations are provided at the end of this post.

Ct values 101

qPCR yields Ct values: the number of amplification cycles needed before a fluorescent signal is detected. Every cycle, the amount of nucleic material doubles, up to a maximum, which is effectively the limit of detection. The relationship between the number of cycles ($n$) and the amount of amplified nucleic material ($C_n$) is:

$$ C_n = Cq \cdot a^n $$

where $a$ is the PCR amplification efficiency (which is $\approx 2$), $n$ is the number of cycles, and $Cq$ is the initial amount of material. With qPCR, we measure the number of cycles $Ct$ at which $C_n$ reaches a specific fixed value $K$:

$$ Cq \cdot a^{Ct} = K $$ $$ Cq = \frac{K}{a^{Ct}}$$

qPCR data is often presented in the form of a Ct value. Effectively, this is expression data on a $log_a$ scale:

$$ Ct = log_a\left(\frac{K}{Cq}\right) = -log_a\left(\frac{Cq}{K}\right) $$
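To make this relationship concrete, here is a minimal Python sketch (with hypothetical Ct values) that recovers the relative starting amount $Cq/K = a^{-Ct}$ from a Ct value:

```python
def relative_quantity(ct, a=2.0):
    """Relative starting amount Cq/K implied by a Ct value: Cq/K = a**(-Ct)."""
    return a ** (-ct)

# A sample that crosses the detection threshold one cycle earlier
# started (at a = 2) with twice the material:
print(relative_quantity(20.0) / relative_quantity(21.0))  # -> 2.0
```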

Accounting for qPCR amplification efficiency

When interpreting Ct values, we need to take some technical points into account. The qPCR reaction efficiency need not be perfect, which means that the number of Ct cycles is overestimated. This in turn causes systematic differences in outcomes, in particular when the data are generated on different plates or with different batches of reagents. Using a calibration setup (see Taqman efficiency calculation), the efficiency can be measured and incorporated into the analysis.

Using the equality $log_y(x) = \frac{log_z(x)}{log_z(y)}$, we change the Ct values to a base-2 logarithm for simplicity:

$$ Ct_{norm} = log_2(a) \cdot Ct = -log_2{\left(\frac{Cq}{K}\right)} $$
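A quick Python sketch of this calibration step (the Ct and efficiency values are hypothetical):

```python
import math

def calibrate(ct, efficiency):
    """Rescale a Ct measured at amplification efficiency a to the ideal
    base-2 scale: Ct_norm = log2(a) * Ct."""
    return math.log2(efficiency) * ct

# With perfect efficiency (a = 2) nothing changes; at a = 1.9 the same
# raw Ct corresponds to fewer effective doublings:
print(calibrate(24.0, 2.0))  # -> 24.0
print(calibrate(24.0, 1.9))  # ~ 22.2
```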

Accounting for sample loading ($- \Delta Ct$)

Furthermore, we need to account for differences in sample content and loading. This is done by selecting a reference or ‘household’ gene that is assumed to have the same concentration in all samples, and reporting results as $\Delta Ct$ relative to that gene. To make the results more intuitive, so that a higher value represents higher expression, the $-\Delta Ct$ is typically reported:

$$ -\Delta Ct_t = log_2(a_r) \cdot Ct_r - log_2(a_t) \cdot Ct_t $$

where $a_r$ and $Ct_r$ refer to the reference (i.e., household), and $a_t$ and $Ct_t$ refer to the target gene values.
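As a sketch, the $-\Delta Ct$ computation in Python (all Ct and efficiency values below are hypothetical):

```python
import math

def neg_delta_ct(ct_target, a_target, ct_ref, a_ref):
    """-ΔCt: calibrated reference Ct minus calibrated target Ct;
    higher values mean higher target expression."""
    return math.log2(a_ref) * ct_ref - math.log2(a_target) * ct_target

# Target gene at Ct 26, reference ('household') gene at Ct 20,
# both with perfect efficiency:
print(neg_delta_ct(26.0, 2.0, 20.0, 2.0))  # -> -6.0
```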

Readout: Changes relative to control ($-\Delta\Delta Ct$)

Now that we have properly calibrated and normalized the values, we can compare treatment to control. Since we are then taking the difference between two $-\Delta Ct$ values, the resulting value is typically referred to as the $-\Delta\Delta Ct$:

$$ -\Delta\Delta Ct = \left(-\Delta Ct_t\right) - \left(-\Delta Ct_m\right) = \Delta Ct_m - \Delta Ct_t $$

If we start from the non-calibrated Ct values of the mock ($Ct_m$), the target ($Ct_t$), and their references ($Ct_s$ for the mock and $Ct_r$ for the target), the full expression for the calibrated $-\Delta\Delta Ct$ is:

$$ -\Delta\Delta Ct = \left[ \log_2 \left( a_r \right) \cdot Ct_r - \log_2\left(a_t\right) \cdot Ct_t \right] - \left[ \log_2 \left( a_s \right) \cdot Ct_s - \log_2 \left( a_m \right) \cdot Ct_m \right] $$
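A Python sketch of the single-reference $-\Delta\Delta Ct$, using hypothetical values:

```python
import math

def neg_ddct(ct_t, a_t, ct_r, a_r, ct_m, a_m, ct_s, a_s):
    """-ΔΔCt from raw Ct values: the target-condition -ΔCt minus the
    mock-condition -ΔCt, each against its own reference gene."""
    ndct_target = math.log2(a_r) * ct_r - math.log2(a_t) * ct_t
    ndct_mock = math.log2(a_s) * ct_s - math.log2(a_m) * ct_m
    return ndct_target - ndct_mock

# Treatment lowers the target Ct by 2 cycles while the references do not
# move, so at a = 2 everywhere: -ΔΔCt = 2 (a ~4-fold increase).
print(neg_ddct(24.0, 2.0, 20.0, 2.0, 26.0, 2.0, 20.0, 2.0))  # -> 2.0
```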

Incorporating multiple reference genes

Selecting a relevant household gene is key to obtaining reliable results in qPCR, but depending on the circumstances, a single household reference may not be enough. Using multiple reference genes is then a good way to obtain more stable normalized Ct values. This involves taking the average value of the references and using it to normalize the data.

$$ -\Delta Ct_t = log_2(a_r) \cdot Ct_r - log_2(a_t) \cdot Ct_t $$

We replace $log_2(a_r) \cdot Ct_r$ with the mean of multiple reference genes. This mean can be computed at the Cq level or at the Ct level, but the two give non-identical results. Averaging at the Cq level (see also Hellemans et al., 2007, 10.1186/gb-2007-8-2-r19) corresponds to

$$ log_2(a_r) \cdot Ct_r \rightarrow -log_2 \left( \frac{1}{n} \sum_{i=1}^{n} a_i^{-Ct_i} \right)$$

but the disadvantage of this approach is that highly expressed genes dominate the normalization: high expression means high absolute values at the Cq level, and hence high variance, so these genes dominate the mean. Since such data are typically exponentially or log-normally distributed, we can use another estimate of average expression, such as the log-average (see Dvinge & Bertone, 2009, 10.1093/bioinformatics/btp578). The latter option gives a very intuitive way to compute normalized Ct data:

$$ log_2(a_r) \cdot Ct_r \rightarrow \frac{1}{n} \sum_{i=1}^{n} log_2\left( a_i \right) \cdot Ct_i $$

where $n$ is the number of reference genes we take the mean of, each with its specific PCR efficiency factor $a_i$ and Ct value $Ct_i$. This solution simply uses the mean normalized Ct value of the reference genes to compute corrected target Ct values. The final $-\Delta Ct_t$ then becomes

$$ -\Delta Ct_t = \left[ \frac{1}{n} \sum_{i=1}^{n} log_2\left( a_i \right) \cdot Ct_i \right] - log_2\left(a_t\right) \cdot Ct_t $$
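A Python sketch contrasting the two reference averages, Cq-level pooling versus the log-average; the gene values are hypothetical:

```python
import math

def ref_mean_cq_level(cts, effs):
    """Reference average at the Cq level: the normalized Ct of the
    arithmetic-mean Cq, -log2( (1/n) * sum_i a_i**(-Ct_i) )."""
    n = len(cts)
    return -math.log2(sum(a ** (-ct) for ct, a in zip(cts, effs)) / n)

def ref_mean_log_avg(cts, effs):
    """Log-average reference: (1/n) * sum_i log2(a_i) * Ct_i."""
    return sum(math.log2(a) * ct for ct, a in zip(cts, effs)) / len(cts)

def neg_delta_ct_multi(ct_target, a_target, ref_cts, ref_effs):
    """-ΔCt against the log-averaged reference genes."""
    return ref_mean_log_avg(ref_cts, ref_effs) - math.log2(a_target) * ct_target

# One highly expressed reference (low Ct) pulls the Cq-level mean
# towards itself, while the log-average weighs all genes equally:
cts, effs = [15.0, 25.0, 25.0], [2.0, 2.0, 2.0]
print(ref_mean_cq_level(cts, effs))  # ~ 16.6, close to the Ct-15 gene
print(ref_mean_log_avg(cts, effs))   # ~ 21.7, the plain mean
print(neg_delta_ct_multi(26.0, 2.0, cts, effs))
```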

$-\Delta (\Delta) Ct$ formulae

In summary, the $-\Delta Ct$ with multiple log-averaged household (reference) genes is:

$$ -\Delta Ct_t = \left[ \frac{1}{n} \sum_{i=1}^{n} log_2\left( a_i \right) \cdot Ct_i \right] - log_2\left(a_t\right) \cdot Ct_t $$

and the $-\Delta\Delta Ct$ (i.e., target vs mock: $\left(-\Delta Ct_t \right) - \left(-\Delta Ct_m \right)$) then follows:

$$ -\Delta\Delta Ct = \left[ \left( \frac{1}{n} \sum_{i=1}^{n} log_2\left( a_i \right) \cdot Ct_i \right) - log_2\left(a_t\right) \cdot Ct_t \right] $$

$$ - \left[ \left( \frac{1}{n} \sum_{j=1}^{n} log_2\left( a_j \right) \cdot Ct_j \right) - log_2\left(a_m\right) \cdot Ct_m \right]$$

where $a_t$, $Ct_t$, $a_i$ and $Ct_i$ are the target and target reference values, and $a_m$, $Ct_m$, $a_j$ and $Ct_j$ are the mock and mock reference values, respectively.
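Putting the summary formula together, a Python sketch with hypothetical values:

```python
import math

def calibrated(ct, a):
    """Ct value rescaled to base 2: log2(a) * Ct."""
    return math.log2(a) * ct

def neg_ddct_multi(ct_t, a_t, target_refs, ct_m, a_m, mock_refs):
    """-ΔΔCt with log-averaged reference genes per condition;
    target_refs / mock_refs are lists of (Ct, efficiency) pairs."""
    ref_t = sum(calibrated(ct, a) for ct, a in target_refs) / len(target_refs)
    ref_m = sum(calibrated(ct, a) for ct, a in mock_refs) / len(mock_refs)
    return (ref_t - calibrated(ct_t, a_t)) - (ref_m - calibrated(ct_m, a_m))

# Two shared reference genes; the target drops from Ct 26 (mock) to Ct 24:
refs = [(20.0, 2.0), (22.0, 2.0)]
print(neg_ddct_multi(24.0, 2.0, refs, 26.0, 2.0, refs))  # -> 2.0
```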

$-\Delta (\Delta) Ct$ analysis in Excel

The implementation in Excel is pretty straightforward if you use log-averaging:

  • Calibrate all of the raw Ct values by multiplying them by the $log_2$ of the PCR amplification efficiency for every gene on every plate:

=LOG({PCR E},2)*{Ct}

  • Compute the mean of the calibrated references (if more than one):

=AVERAGE({calibrated Ct references})

  • Subtract the calibrated gene Ct value from the (average) calibrated reference Ct value to get the $-\Delta Ct$:

={calibrated mean reference Ct}-{calibrated target Ct}

These values can be used to test for statistical significance between sample groups. The difference between a treatment and a mock group is then the $-\Delta\Delta Ct$ value:

{normalized treatment} = {calibrated treatment mean reference Ct} - {calibrated treatment target Ct}

{normalized mock} = {calibrated mock mean reference Ct} - {calibrated mock target Ct}

$-\Delta\Delta Ct$ = {normalized treatment} - {normalized mock}