QTL: Interval Mapping

Jinliang Yang

May 1, 2024

Interval Mapping

Consider a QTL flanked by two markers segregating in an F2 population.

Basics for interval mapping

To fully understand interval mapping, first we need to cover some basics:

  • Conditional probabilities
  • Likelihood inference and maximum likelihood

Conditional probabilities of QTL genotypes

Consider a QTL flanked by two markers segregating in an F2 population.

The recombination frequency between M_1 and Q is 0.05 (about 5 cM), and between Q and M_2 it is 0.20 (about 20 cM). Assume no interference.

  • What are the probabilities of the three QTL genotypes given a marker genotype of M_1M_1M_2M_2?

\begin{align*} & P(QQ|M_1M_1M_2M_2) \\ & P(Qq|M_1M_1M_2M_2) \\ & P(qq|M_1M_1M_2M_2) \\ \end{align*}

Conditional probabilities of QTL genotypes

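By the definition of conditional probability, the joint probability of the QTL and marker genotypes is divided by the probability of the marker genotype:

\begin{align*} P(QQ|M_1M_1M_2M_2) = \frac{P(M_1M_1QQM_2M_2)}{P(M_1M_1M_2M_2)} \end{align*}

An M_1M_1M_2M_2 individual receives two M_1M_2 gametes, each showing no recombination between the two markers. With no interference, c_{12} = c_{1Q} + c_{2Q} - 2c_{1Q}c_{2Q} = 0.23, so

\begin{align*} P(M_1M_1M_2M_2) = P(M_1M_2)^2 = \left(\frac{1-c_{12}}{2}\right)^2 \end{align*}

\begin{align*} P(M_1M_1QQM_2M_2) = P(M_1QM_2)^2 = \left(\frac{(1-c_{1Q})(1-c_{2Q})}{2}\right)^2 \end{align*}

\begin{align*} P(QQ|M_1M_1M_2M_2) = \frac{\left(\frac{(1-c_{1Q})(1-c_{2Q})}{2}\right)^2}{\left(\frac{1-c_{12}}{2}\right)^2} = 0.974 \end{align*}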

Conditional probabilities

The heterozygous QTL genotype is handled the same way:

\begin{align*} P(Qq|M_1M_1M_2M_2) & = \frac{P(M_1M_1QqM_2M_2)}{P(M_1M_1M_2M_2)} \\ P(M_1M_1QqM_2M_2) & = P(M_1QM_2)P(M_1qM_2) + P(M_1qM_2)P(M_1QM_2) \\ & = 2\left(\frac{(1-c_{1Q})(1-c_{2Q})}{2}\right)\left(\frac{c_{1Q}c_{2Q}}{2}\right) = 0.0038 \end{align*}

Here, c_{1Q}=0.05 and c_{2Q}=0.2.

Therefore,

\begin{align*} P(Qq|M_1M_1M_2M_2) = \frac{P(M_1M_1QqM_2M_2)}{P(M_1M_1M_2M_2)} = \frac{0.0038}{0.1482} = 0.026 \end{align*}

Conditional probabilities

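The three conditional probabilities are:

\begin{align*} P(QQ|M_1M_1M_2M_2) & = \frac{\left(\frac{(1-c_{1Q})(1-c_{2Q})}{2}\right)^2}{\left(\frac{1-c_{12}}{2}\right)^2} = 0.974 \\ P(Qq|M_1M_1M_2M_2) & = \frac{2\left(\frac{(1-c_{1Q})(1-c_{2Q})}{2}\right)\left(\frac{c_{1Q}c_{2Q}}{2}\right)}{\left(\frac{1-c_{12}}{2}\right)^2} = 0.026 \\ P(qq|M_1M_1M_2M_2) & = \frac{\left(\frac{c_{1Q}c_{2Q}}{2}\right)^2}{\left(\frac{1-c_{12}}{2}\right)^2} = 1.7 \times 10^{-4} \end{align*}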
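
The three conditional probabilities can be checked numerically. The following is a minimal base R sketch, assuming no interference so that c12 = c1 + c2 - 2*c1*c2; the variable names (`g_MQM`, `g_MqM`, `g_MM`, `p_qtl`) are made up for illustration. Unrounded, the values come out at roughly 0.9742, 0.0256 and 1.7e-4, and they sum to 1.

```r
# Recombination fractions from the example (no interference assumed)
c1  <- 0.05                     # between M1 and Q
c2  <- 0.20                     # between Q and M2
c12 <- c1 + c2 - 2 * c1 * c2    # between M1 and M2 (= 0.23)

# Gamete frequencies produced by the F1
g_MQM <- (1 - c1) * (1 - c2) / 2   # M1-Q-M2: no crossover in either interval
g_MqM <- c1 * c2 / 2               # M1-q-M2: crossover in both intervals
g_MM  <- (1 - c12) / 2             # any gamete carrying both M1 and M2

# Joint probabilities divided by the probability of the marker genotype
p_qtl <- c(QQ = g_MQM^2, Qq = 2 * g_MQM * g_MqM, qq = g_MqM^2) / g_MM^2

print(signif(p_qtl, 3))
print(sum(p_qtl))
```

Because the two kinds of M1M2 gametes (carrying Q or q) add up to the total M1M2 gamete frequency, the three probabilities necessarily sum to 1, which is a handy sanity check.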

Conditional probabilities

If the genotypic values for each of the QTL genotypes were given as below:

Genotype   Value   Probability
QQ         7       P(QQ|M_1M_1M_2M_2) = 0.974
Qq         5       P(Qq|M_1M_1M_2M_2) = 0.026
qq         0       P(qq|M_1M_1M_2M_2) = 1.7 \times 10^{-4}

  • What is the expected value of individuals with the M_1M_1M_2M_2 genotype?

\begin{align*} E(M_1M_1M_2M_2) = 0.974 \times 7 + 0.026 \times 5 + 1.7 \times 10^{-4} \times 0 = 6.95 \end{align*}
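
The expected value is simply a probability-weighted average of the genotypic values. A small base R sketch, assuming no interference and using hypothetical variable names, recomputes the weights from the recombination fractions and gives about 6.95 with unrounded probabilities:

```r
# Recombination fractions between each marker and the QTL
c1 <- 0.05
c2 <- 0.20

nonrec <- (1 - c1) * (1 - c2)   # proportional to P(M1-Q-M2 gamete)
dblrec <- c1 * c2               # proportional to P(M1-q-M2 gamete)

# Unnormalised P(QQ), P(Qq), P(qq) for an M1M1M2M2 individual
prob <- c(QQ = nonrec^2, Qq = 2 * nonrec * dblrec, qq = dblrec^2)
prob <- prob / sum(prob)        # the sum equals (1 - c12)^2 under no interference

# Genotypic values from the table
value <- c(QQ = 7, Qq = 5, qq = 0)

expected <- sum(prob * value)
print(round(expected, 2))
```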

Maximum likelihood

What does it mean to calculate the likelihood of something?

The likelihood function is represented as L(θ|s) = f_θ(s).

This function represents the likelihood of a certain parameter value ( θ ) given a data vector ( s ).

Interpretation of the likelihood function

  • f_θ(s) is the probability density function (or probability mass function for discrete data) with θ set as the parameter and s set as the observations.

  • The value of L(θ|s) is called the likelihood of θ.

Maximum likelihood

Estimation

To find the value of θ with the maximum likelihood, a range of θ values is evaluated against the observed data, and the θ giving the highest likelihood is taken as the maximum likelihood estimate of θ.

Note: we are fixing the data and varying the parameter.

Example: Binomial distribution

You tossed a coin ten times and observed four heads.

What is the maximum likelihood estimator of p, the probability of obtaining a head?

  • Substitute the observation into the binomial probability mass function and vary the value of p, with q = 1 - p.

\begin{align*} & L(p | k) = \binom{n}{k}p^kq^{n-k} \\ & L(p | 4) = \binom{10}{4}p^4q^{10-4} \\ \end{align*}

  • Vary p from 0 to 1, with step size 0.1.
  • Each p is assumed to be the true value, then the likelihood of the data is calculated.

\begin{align*} & L(0 | 4) = 0; L(0.1 | 4) = 0.01; L(0.2 | 4) = 0.09; \ldots \\ & L(0.4 | 4) = 0.25; \ldots \\ & L(0.7 | 4) = 0.04; \ldots ; L(1.0 | 4) = 0 \\ \end{align*}

p=0.4 gives the highest likelihood, so it is our maximum likelihood (ML) estimate of p. This agrees with the analytical result \hat{p} = k/n = 4/10.
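
The grid search over p can be reproduced in a few lines of base R. This is only a sketch: `dbinom()` evaluates the binomial probability mass function, and `optimize()` is used as one possible way to refine the grid answer; the names `p_grid`, `lik` and `fit` are arbitrary.

```r
n <- 10   # number of tosses
k <- 4    # number of heads observed

# Fix the data (k, n) and vary the parameter p in steps of 0.1
p_grid <- seq(0, 1, by = 0.1)
lik <- dbinom(k, size = n, prob = p_grid)
print(data.frame(p = p_grid, likelihood = round(lik, 2)))

# Grid value with the highest likelihood
p_hat_grid <- p_grid[which.max(lik)]
print(p_hat_grid)

# Continuous search on the log-likelihood over the open interval (0, 1)
fit <- optimize(function(p) dbinom(k, size = n, prob = p, log = TRUE),
                interval = c(0, 1), maximum = TRUE)
print(fit$maximum)
```

Working with the log-likelihood does not change where the maximum is, and it is numerically safer once many observations are multiplied together.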

Construction of QTL likelihood functions?

When a major bi-allelic locus is segregating in a population, the distribution of the entire population can be broken into three underlying distributions:

  • The distribution of the QQ individuals,
  • Qq individuals,
  • qq individuals.

The likelihood of the genotypic parameters given phenotypic value z is:

\begin{align*} L(z) & = L(\mu_{QQ}, \mu_{Qq}, \mu_{qq}, \sigma^2 | z) \\ & = P(QQ)f(z, \mu_{QQ}, \sigma^2) + P(Qq)f(z, \mu_{Qq}, \sigma^2) + P(qq)f(z, \mu_{qq}, \sigma^2)\\ \end{align*}

  • Where P(Q_k) equals the probability of a particular genotype

    • e.g. 1/4 in an F2 population for QQ
  • f(z, \mu_k, \sigma^2) is the probability density function for a normally distributed random variable with mean \mu_k and variance \sigma^2.

    • The mean value of QQ = a, Qq = d and qq = -a, measured as deviations from the midpoint between the two homozygotes.

Construction of QTL likelihood functions?

For n random (unrelated) individuals, the overall likelihood is the product of the n individual likelihoods:

\begin{align*} L(z_1, z_2, \ldots, z_n) & = L(\mathbf{z}) = \prod_{j=1}^{n}{L(z_j)}\\ \end{align*}
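
As an illustration of the mixture likelihood and its product over individuals, here is a base R sketch. The helper `mix_lik()`, the simulated F2 sample, and the parameter values (a = 2, d = 0.5, sigma = 1) are all assumptions made up for this example. The product is computed as a sum of logs, since multiplying many small densities directly would underflow.

```r
# Mixture likelihood of each phenotype under three QTL genotype classes.
# prior holds P(QQ), P(Qq), P(qq); the default is the F2 expectation 1/4, 1/2, 1/4.
mix_lik <- function(z, mu, sigma, prior = c(QQ = 0.25, Qq = 0.5, qq = 0.25)) {
  sapply(z, function(zj) sum(prior * dnorm(zj, mean = mu, sd = sigma)))
}

set.seed(1)
# Hypothetical F2 sample of 200 individuals
mu   <- c(QQ = 2, Qq = 0.5, qq = -2)
geno <- sample(names(mu), 200, replace = TRUE, prob = c(0.25, 0.5, 0.25))
z    <- rnorm(200, mean = mu[geno], sd = 1)

# Log of the product of the individual likelihoods
loglik_true  <- sum(log(mix_lik(z, mu = mu, sigma = 1)))
# Same data evaluated at a poor parameter choice (all means equal to 0)
loglik_wrong <- sum(log(mix_lik(z, mu = c(0, 0, 0), sigma = 1)))

print(c(true = loglik_true, wrong = loglik_wrong))
```

Parameter values close to those that generated the data give a higher log-likelihood, which is the basis for estimating the means and variance by maximum likelihood.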

Construction of QTL likelihood functions?

Now, let's return to our conditional probabilities, specifically the probability of a QTL genotype given a marker genotype.

The likelihood of an individual with phenotypic value z given a marker genotype M_i is represented as:

\begin{align*} L(z|M_i) & = P(QQ|M_i)f(z, \mu_{QQ}, \sigma^2) + P(Qq|M_i)f(z, \mu_{Qq}, \sigma^2) + P(qq|M_i)f(z, \mu_{qq}, \sigma^2)\\ \end{align*}


For example, the likelihood for marker genotype MM is:

\begin{align*} L(z|MM) & = P(QQ|MM)f(z, \mu_{QQ}, \sigma^2) + P(Qq|MM)f(z, \mu_{Qq}, \sigma^2) + P(qq|MM)f(z, \mu_{qq}, \sigma^2)\\ \end{align*}

  • The P(Q_k|M_i) parts are a function of the map positions and experimental design. For a single marker at recombination frequency c from the QTL in an F2 population:

\begin{align*} L(z|MM) & = (1-c)^2f(z, \mu_{QQ}, \sigma^2) + 2c(1-c)f(z, \mu_{Qq}, \sigma^2) + c^2f(z, \mu_{qq}, \sigma^2)\\ \end{align*}

  • And the QTL effects enter through the means and variance of the underlying normal distributions f_{\theta}(z) or f(z, \mu_{Q_k}, \sigma^2).
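
The single-marker likelihood L(z|MM) can be written directly as a function of the recombination frequency. In this base R sketch the function name `lik_given_MM`, the argument name `rec` (standing for c), and the genotype means are illustrative assumptions only.

```r
# Likelihood of phenotype z for an MM individual in an F2 population,
# where rec is the recombination frequency between the marker and the QTL
lik_given_MM <- function(z, rec, mu, sigma) {
  w <- c(QQ = (1 - rec)^2,          # P(QQ | MM)
         Qq = 2 * rec * (1 - rec),  # P(Qq | MM)
         qq = rec^2)                # P(qq | MM)
  sum(w * dnorm(z, mean = mu, sd = sigma))
}

# Hypothetical genotype means and residual standard deviation
mu    <- c(QQ = 7, Qq = 5, qq = 0)
sigma <- 1

# The same phenotype evaluated at several marker-QTL distances
rec_grid <- c(0, 0.05, 0.2, 0.5)
lik <- sapply(rec_grid, function(r) lik_given_MM(z = 6.5, rec = r, mu = mu, sigma = sigma))
print(data.frame(rec = rec_grid, likelihood = signif(lik, 4)))
```

At rec = 0 the marker is the QTL, so only the QQ density contributes; at rec = 0.5 the marker is uninformative and the weights fall back to the F2 proportions 1/4, 1/2, 1/4.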

Back to interval mapping

To calculate the likelihoods for an interval, we simply insert the probabilities of a QTL genotype given a marker interval genotype.

For example,

\begin{align*} L(z|M_1M_1M_2M_2) = & \frac{(1-c_1)^2(1-c_2)^2}{(1-c_{12})^2}f(z, \mu_{QQ}, \sigma^2) \\ & + \frac{2c_1c_2(1-c_1)(1-c_2)}{(1-c_{12})^2}f(z, \mu_{Qq}, \sigma^2) \\ & + \frac{c_1^2c_2^2}{(1-c_{12})^2}f(z, \mu_{qq}, \sigma^2)\\ \end{align*}

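  • Here c_1 and c_2 are the recombination frequencies between the QTL and markers M_1 and M_2 (written c_{1Q} and c_{2Q} earlier), and c_{12} is the recombination frequency between the two markers.

  • This likelihood is calculated for each genetic position between the two flanking markers by varying c_1 (and hence c_2).

  • The span of the entire interval ( c_{12} ) is calculated using a mapping function.

  • The values of \mu_{QQ}, \mu_{Qq}, \mu_{qq}, \sigma^2 are estimated at each genetic position.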
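
To see how the weights in this likelihood change as the putative QTL moves through the interval, the sketch below evaluates them on a grid of c_1 values. It assumes no interference, so that c_2 follows from c_12 = c_1 + c_2 - 2*c_1*c_2, and, as a simplification, it holds the genotype means and variance fixed at made-up values instead of re-estimating them at each position. The function name `qtl_weights` is hypothetical.

```r
c12 <- 0.23   # recombination frequency between the flanking markers

# P(QQ), P(Qq), P(qq) given M1M1M2M2 for a QTL at recombination frequency c1 from M1
qtl_weights <- function(c1, c12) {
  c2 <- (c12 - c1) / (1 - 2 * c1)   # solves c12 = c1 + c2 - 2*c1*c2 for c2
  c(QQ = (1 - c1)^2 * (1 - c2)^2,
    Qq = 2 * c1 * c2 * (1 - c1) * (1 - c2),
    qq = c1^2 * c2^2) / (1 - c12)^2
}

# Positions from marker M1 (c1 = 0) to marker M2 (c1 = c12)
c1_grid <- seq(0, c12, length.out = 24)
W <- t(sapply(c1_grid, qtl_weights, c12 = c12))   # one row per position

# Likelihood of one phenotype at each position, with fixed illustrative parameters
mu    <- c(QQ = 7, Qq = 5, qq = 0)
sigma <- 1
lik   <- as.vector(W %*% dnorm(6, mean = mu, sd = sigma))

print(head(cbind(c1 = c1_grid, W, lik = lik)))
```

In a full analysis the means and variance would be re-estimated at every position, typically with an iterative algorithm such as EM, before the likelihood is recorded.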

Back to interval mapping

Step 1: Calculate the likelihood with a QTL

  • Given c_1, c_2, c_{12} and the phenotypic data

Step 2: Calculate the likelihood that no underlying QTL exists (the data arose in the absence of a QTL).

  • That is, there are no underlying QTL genotypes and the distribution of individuals with the M_1M_1M_2M_2 genotype consists of a single distribution.

Step 3: Finally, compute the ratio of the likelihood from Step 1 to the likelihood from Step 2.

  • This is the likelihood ratio used to test for a QTL at that position.

LOD score

The ratio is converted to the famous logarithm of odds (LOD) score:

\begin{align*} LOD = \log_{10}\left(\frac{L_{full}}{L_{reduced}}\right) \end{align*}

  • Where L_{full} is the likelihood of the data under a model with a QTL at the assumed genetic position.
  • L_{reduced} is the likelihood of the data under a model with no QTL.

If the LOD score is 3, for example, the data are 10^3 = 1,000 times more likely under the model with a QTL at the given genetic position than under the model with no QTL.
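
A LOD score can be computed by hand from two fitted models. The base R sketch below makes a strong simplifying assumption, namely that the QTL genotype is observed directly (as if a marker sat exactly on the QTL in a backcross), so the full model is an ordinary regression; the simulated data and the helper `loglik_normal()` are hypothetical.

```r
set.seed(42)
n    <- 100
geno <- rbinom(n, 1, 0.5)             # hypothetical genotype indicator (0/1)
z    <- 10 + 1.0 * geno + rnorm(n)    # phenotype with a QTL effect of 1

# Maximised normal log-likelihood given a vector of residuals
loglik_normal <- function(resid) {
  s2 <- mean(resid^2)                 # ML estimate of the residual variance
  sum(dnorm(resid, mean = 0, sd = sqrt(s2), log = TRUE))
}

resid_full    <- residuals(lm(z ~ geno))   # model with a QTL
resid_reduced <- residuals(lm(z ~ 1))      # model with no QTL

# LOD = log10(L_full / L_reduced) = (logL_full - logL_reduced) / log(10)
lod <- (loglik_normal(resid_full) - loglik_normal(resid_reduced)) / log(10)

# Equivalent closed form for normal models: (n/2) * log10(RSS_reduced / RSS_full)
lod_rss <- (n / 2) * log10(sum(resid_reduced^2) / sum(resid_full^2))

print(c(lod = lod, lod_rss = lod_rss, likelihood_ratio = 10^lod))
```

The two routes give the same number, which shows that for normal models the LOD score is just a rescaled comparison of residual sums of squares.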

Statistical significance

QTL mapping involves a large number of tests, which requires adjustments for multiple testing to keep the experiment-wise error rate low.

Bonferroni correction

  • Assumes all tests are independent, which is not the case in QTL mapping because markers are linked.
  • Overly conservative for QTL mapping.

Statistical significance

Permutation test

A commonly used technique for QTL mapping.

  • The phenotypic data are shuffled relative to the marker data, which breaks any marker-trait association and so establishes the null hypothesis.

  • The test statistic is then calculated at each tested position, and the largest test statistic across the genome is recorded.

  • This is repeated 1,000 or more times in order to establish an empirical distribution of the genome-wide maximum test statistic under the null hypothesis.
  • The test statistics calculated for the real data are compared to this distribution to determine the significance level.
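
The four steps above can be sketched in base R. Everything here is an illustrative assumption: the markers are simulated as unlinked backcross genotypes, marker 5 is given an effect of 0.8, and the per-marker LOD uses the single-marker regression identity LOD = -(n/2) * log10(1 - r^2), where r is the marker-phenotype correlation.

```r
set.seed(2024)
n <- 100   # individuals
m <- 20    # markers

# Hypothetical backcross genotypes (0/1) and a phenotype affected by marker 5
geno <- matrix(rbinom(n * m, 1, 0.5), nrow = n)
z    <- 0.8 * geno[, 5] + rnorm(n)

# LOD score at every marker from the squared marker-phenotype correlation
lod_scan <- function(z, geno) {
  r <- as.vector(cor(z, geno))
  -(length(z) / 2) * log10(1 - r^2)
}

obs <- lod_scan(z, geno)

# Shuffle phenotypes, rescan, and keep the genome-wide maximum each time
n_perm   <- 1000
max_null <- replicate(n_perm, max(lod_scan(sample(z), geno)))

# 5% genome-wide threshold: 95th percentile of the null maxima
threshold <- quantile(max_null, 0.95)
print(threshold)
print(which(obs > threshold))   # markers exceeding the threshold
```

Taking the maximum over all markers within each permutation is what accounts for multiple testing, and it does so without assuming that the tests are independent.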

Simulating a QTL mapping experiment

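library(qtl)
set.seed(12347)
# Five autosomes of length 50, 75, 100, 125, and 60 cM
L <- c(50, 75, 100, 125, 60)
# L/5 + 1 randomly spaced markers per chromosome (about one per 5 cM), no X chromosome
map <- sim.map(L, L/5+1, eq.spacing=FALSE, include.x=FALSE)
# Simulate a backcross with two QTL; each row is (chromosome, position in cM, effect)
a <- 0.7
mymodel <- rbind(c(1, 40, a), c(4, 100, a))
pop <- sim.cross(map, type="bc", n.ind=200, model=mymodel)
# Plot the genetic map of the simulated cross
plotMap(pop)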

Simulating a QTL mapping experiment

Checking phenotypic distribution

hist(pop$pheno$phenotype, main="simulated phenotype",
breaks=50, xlab="Phenotype", col="#cdb79e")

Single-marker analysis

# single-QTL scan by marker regression with the simulated data
out.mr <- scanone(pop, method="mr")
# plot of marker regression results for chromosomes 1 to 5
plot(out.mr, chr=c(1,2,3,4,5), ylab="LOD Score")

Haley-Knott Regression

Haley-Knott regression is a fast version of interval mapping that gives a very good approximation to interval mapping via maximum likelihood.

# QTL genotype probabilities on a 1 cM grid, so the scan covers positions between markers
pop <- calc.genoprob(pop, step=1)
# single-QTL scan using the Haley-Knott regression approach
out.hk <- scanone(pop, method="hk")
# plot of Haley-Knott regression results for chromosomes 1 to 5
plot(out.hk, chr=c(1,2,3,4,5), ylab="LOD score")
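
A permutation threshold for the Haley-Knott scan can be obtained with R/qtl itself. The sketch below repeats the simulation so that it runs on its own; it relies on the `n.perm` argument of `scanone()` and on `summary()` for the resulting permutation object. The choice of 1,000 permutations and the 5% and 10% levels are just examples, and the object name `operm` is arbitrary.

```r
library(qtl)
set.seed(12347)

# Same kind of simulated backcross as in the slides
L   <- c(50, 75, 100, 125, 60)
map <- sim.map(L, L/5+1, eq.spacing=FALSE, include.x=FALSE)
pop <- sim.cross(map, type="bc", n.ind=200,
                 model=rbind(c(1, 40, 0.7), c(4, 100, 0.7)))

# Genotype probabilities on a 1 cM grid, then the Haley-Knott scan
pop    <- calc.genoprob(pop, step=1)
out.hk <- scanone(pop, method="hk")

# Permutation test: shuffle phenotypes 1,000 times, keep the genome-wide maximum LOD
operm <- scanone(pop, method="hk", n.perm=1000, verbose=FALSE)

# Genome-wide LOD thresholds at the 5% and 10% levels
print(summary(operm, alpha=c(0.05, 0.10)))

# Peaks exceeding the 5% threshold, with permutation-based p-values
print(summary(out.hk, perms=operm, alpha=0.05, pvalues=TRUE))
```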

Plot QTL effect

# summary of out.mr
summary(out.mr, threshold=3)
## chr pos lod
## D1M4 1 32 3.77
effectplot(pop, mname1="D1M4", main="Chr1")
