QTL: Single-marker analysis

class: center, middle, inverse, title-slide

# QTL: Single-marker analysis
### Jinliang Yang
### Dec. 11th, 2019

---

# Mapping quantitative trait loci

In past chapters dealing with breeding value and statistics like heritability and genetic correlation, we lumped all the QTL for a trait together into a total, aggregate genotypic value.

A __quantitative trait locus (QTL)__ is a genomic region generating variation for a quantitative trait.
  - All loci affecting a quantitative trait are “quantitative trait loci”.
  
--

Identifying QTL brings in new opportunities for applications:
  - To produce the preferred genotype
  - To understand the molecular mechanism at gene level
  - To locate the direct target for gene editing

---
# QTL mapping or Linkage Mapping

Crosses between inbred lines offer an ideal situation for mapping QTL.
  - The linkage disequilibrium is maximized in an F1 between two inbred lines, and is reduced slowly with subsequent generations of intermating.

--
  
  - This generation of linkage between loci that different between the two inbred lines is exploited for QTL mapping, and is therefore referred to as __linkage mapping__.

---
# Linkage Mapping

Linkage mapping refers to a set of steps taken to associate chromosomal intervals, and ultimately genes (but is very unlikely due to the low mapping resolution), to genetic variation of traits.

These steps include:
  - 1) Create a segregating population
  - 2) Genotype individuals within this population with molecular markers
  - 3) Phenotype the individuals
  - 4) Apply statistical models to associate the markers to the phenotypic variation

The statistical approaches range from simple techniques such as __ANOVA__ to complex __Bayesian models__ capable of zeroing in on QTL-by-QTL interactions.

---
# Single marker analysis

.pull-left[
<div align="center">
<img src="figure21-c2.2.png" height=350>
</div>

c here is the recombination freq.
]

.pull-right[

#### Mean of the __AC__ genotype:
`\begin{align*}
& \frac{1/2(1-c)\times d + 1/2c \times a}{1/2}\\
& = d(1-c) + ca \\
\end{align*}`

#### Mean of the __CC__ genotype:
`\begin{align*}
& \frac{1/2(1-c)\times a + 1/2c \times d}{1/2}\\
& = a(1-c) + cd \\
\end{align*}`

#### Difference between __AC__ and __CC__ value:
`\begin{align*}
& d(1-c) + ca - (a(1-c) + cd)\\
& = (d-a)(1-2c) \\
\end{align*}`

]

---

# BC1 example

Conduct __t-test__ to test the null hypothesis that the mean values of genotype __AC__ and __CC__ are the same.

### A t-test would be

`\begin{align*}
t = \frac{\hat{u}_{AC} - \hat{u}_{CC}}{\sqrt{\frac{\sigma_{AC}^2}{N_{AC}} + \frac{\sigma_{CC}^2}{N_{CC}} } } \\
\end{align*}`

- where `$\sigma^2$` is equal to the within sample variance for each genotype
- `$N$` is equal to the number of individuals in each genotype class.

#### If we get a __p-value < 0.05__: 
  - Reject the null hypothesis that the two genotypes means are the same.
  - Or the means are different. 
  - In the other words, the marker is __linked with a QTL__.

---

# What is the QTL effect ( `$a$` )?

`\begin{align*}
& \hat{u}_{AC} - \hat{u}_{CC} = (d-a)(1-2c) \\
& a = d - \frac{\hat{u}_{AC} - \hat{u}_{CC}}{1-2c} \\
\end{align*}`

Assume no dominance at this locus:

`\begin{align*}
& a = \frac{\hat{u}_{CC} - \hat{u}_{AC}}{1-2c} \\
\end{align*}`

---
# Shortcomings of the single-marker test

### The QTL effect of `$a$`

`\begin{align*}
& a = \frac{\hat{u}_{CC} - \hat{u}_{AC}}{1-2c} \\
\end{align*}`

- It is important to note that these single-marker tests confound __the QTL effect__ and __the recombination frequency__ between the marker and QTL.
  - For this reason, the calculated marker effects and significance do not really tell you how far or how close a marker is to QTL.
  
--

- This test does not tell you if __one QTL__ is controlling the trait, or __two or more linked QTLs__.

---

# Pearl millet example (F2 population)

.pull-left[
<div align="center">
<img src="figure21-c3.1.png" height=300>
</div>

]

---

# Pearl millet example (F2 population)

---

# Pearl millet example (F2 population)

---

# Frequencies in F2

The frequencies of the resulting F2 individuals are simply calculated by multiplying together the frequencies of the gametes united to form those individuals.

Here we use __M__ and __m__ represent molecular marker and __Q__ and __q__ represent QTL.

| Genotype | Value      | Frequency |
| :-------: | :-------: | :--------: | 
| `$MMQQ$`  | a | `$\frac{1}{4}(1-c)^2$` |
| `$MMQq$`  | d | `$\frac{1}{2}c(1-c)$` |
| `$MMqq$`  | -a | `$\frac{1}{4}c^2$` |
| `$MmQQ$`  | a | `$\frac{1}{2}c(1-c)$` |
| `$MmQq$`  | d | `$\frac{1}{2}((1-c)^2+c^2)$` |
| `$Mmqq$`  | -a | `$\frac{1}{2}c(1-c)$` |
| `$mmQQ$`  | a | `$\frac{1}{4}c^2$` |
| `$mmQq$`  | d | `$\frac{1}{2}c(1-c)$` |
| `$mmqq$`  | -a | `$\frac{1}{4}(1-c)^2$` |

---

# Frequencies in F2

- Because we only see the marker genotypes and not the QTL genotypes, the expected means of the different genotypes are only relevant.

- The expected values are calculated simply by taking the __weighted average of the genotypic values__ of the underlying QTL genotypes.

| Genotype  | Expected value |
| :-------: | :-------: | 
| `$MM$`  |  `$a(1-2c) + 2dc(1-c)$` |
| `$Mm$`  |  `$d((1-c)^2 + c^2)$` |
| `$mm$`  |  `$-a(1-2c) + 2dc(1-c)$` |

After some algebra, it can be shown that the differences between the genotypic means in an F2 population are as follows:

`\begin{align*}
& \mu_{MM} - \mu_{mm} = 2a(1-2c) \\
& \mu_{Mm} - \frac{\mu_{MM} - \mu_{mm}}{2} = d(1-2c)^2 \\ 
& \mu_{Mm} - \mu_{mm} = (a+d)(1-2c) \\ 
\end{align*}`

---
# More on single-marker analysis

Markers are tested one by one for their effects on a trait.

This can be done by:

- A __t-test__ contrasting the mean values of two genotypes
- An __F-test__ on all three genotypes to determine if difference in trait values exist among the genotypes.
- __Regression__ of the trait value on the number (dosage) of marker alleles.

---
# Regression approach

The __regression of phenotypic values on allele dosage__ can be represented as:

`\begin{align*}
y_j = \mu + bx_j + e_j
\end{align*}`

- where `$y_j$` is the phenotypic value for individual `$j$`,
- `$\mu$` is the population mean,
- `$b$` is the regression coefficient,
- `$x_j$` represents the number of a particular allele. For example, if `$x_j$` was set to represent the number of M alleles at a marker locus: 
  - `$x_j =0$` for an `$mm$` individual,
  - `$x_j =1$` for an `$Mm$` individual,
  - `$x_j =2$` for an `$MM$` individual.
- `$e_j$` is the residual error for individual `$j$`.

If the regression coefficient is __significantly different from zero__, there is evidence for QTL near this marker.

---
# Simulating a QTL mapping experiment

.pull-left[

```r
library(qtl)
```

```
## Warning: package 'qtl' was built under R version 3.5.2
```

```r
set.seed(12347)
# Five autosomes of cM length 50, 75, 100, 125, 60
L <- c(50, 75, 100, 125, 60)
map <- sim.map(L, L/5+1, eq.spacing=FALSE, include.x=FALSE)

# Simulate a backcross with two QTL
a <- 0.7
mymodel <- rbind(c(1, 40, a),
                 c(4, 100, a))
pop <- sim.cross(map, type="bc", n.ind=200, model=mymodel)
plot.map(pop)
```

<img src="Ch21_2019-c1_files/figure-html/unnamed-chunk-1-1.png" width="70%" style="display: block; margin: auto;" />
]

.pull-left[

### Checking pheno distribution

```r
hist(pop$pheno$phenotype, main="simulated phenotype", 
     breaks=50, xlab="Pheno", col="#cdb79e")
```

<img src="Ch21_2019-c1_files/figure-html/unnamed-chunk-2-1.png" width="80%" style="display: block; margin: auto;" />
]

---
# Simulating a QTL mapping experiment

.pull-left[

### single-marker analysis

```r
# single-QTL scan by marker regression with the simulated data
out.mr <- scanone(pop, method="mr")

# plot of marker regression results for chr 4 and 12
plot(out.mr, chr=c(1,2,3,4,5), ylab="LOD score")
```

<img src="Ch21_2019-c1_files/figure-html/unnamed-chunk-3-1.png" width="80%" style="display: block; margin: auto;" />
]

.pull-left[

```r
# summary of out.mr
summary(out.mr, threshold=3)
```

```
##      chr  pos  lod
## D1M5   1 32.1 3.77
```

```r
effectplot(pop, mname1="D1M4", main="Chr1")
```

<img src="Ch21_2019-c1_files/figure-html/unnamed-chunk-4-1.png" width="80%" style="display: block; margin: auto;" />
]

---
# Limitation of single marker analysis

The problem with above analysis is that QTL effect is completely __confounded with QTL position__.

`\begin{align*}
\mu_{Mm} - \mu_{mm} = (a+d)(1-2c) \\ 
\end{align*}`

For exmaple, the expected difference between two genotypes is the function of `$c$`.

Therefore, we do not know if a significant difference 
  - is due to a small QTL right next to the marker
  - or a large QTL further away from the marker

Also, there is no way to tell if a signficant difference is due to
  - One large QTL
  - several smaller QTLs cluster together

---

# Interval Mapping

Interval mapping is able to more __precisely locate the putative QTL within a marker interval__.

Interval mapping was introduced to remedy the deficiencies of single marker analysis (Lander and Botstein, 1989)

- By taking advantage of information from __flanking markers__,

- It can achieve the __same power with fewer progenies__ compared to single marker analysis (Haley and Knott, 1992)

### Basics for interval mapping

To fully understand interval mapping, first we need to cover some basics:

- Conditional probabilities
- Likehood inference and maximum likelihood