QTL: single-marker analysis

class: center, middle, inverse, title-slide

.title[
# QTL: single-marker analysis
]
.author[
### Jinliang Yang
]
.date[
### April 29, 2024
]

---

# Mapping quantitative trait loci

In past chapters dealing with breeding value and statistics like heritability and genetic correlation, we lumped all the QTLs for a trait together into a total, aggregate genotypic value.

A __quantitative trait locus (QTL)__ is a genomic region generating variation for a quantitative trait.
  - All loci affecting a quantitative trait are “quantitative trait loci”.
  
--

### Identifying QTL brings in new opportunities for applications:

- To produce the preferred genotype
  
  - To understand the molecular mechanism at gene level
  
  - To locate the direct target for gene editing

---
# QTL mapping or linkage mapping

Crosses between individuals that differ genetically for the trait offer an ideal situation for mapping QTL.

.pull-left[
- The __linkage disequilibrium__ is maximized in an F1 between two parental lines
  
- LD is reduced slowly with subsequent generations of intermating.
  - Using F2 or recombinant inbred lines (RIL)

- This generation of linkage between loci that different between the two parental lines is exploited for QTL mapping. Therefore, QTL Mapping is also referred to as __linkage mapping__.
]

.pull-right[
<div align="center">
<img src="qtl.png" height=450>
</div>
]

---
# Linkage Mapping

Linkage mapping refers to a set of steps taken to associate chromosomal intervals, and ultimately genes to genetic variation of traits.

- But is very unlikely to map to gene level due to the low mapping resolution

### Steps for QTL mapping include:
  - 1) Create a segregating population
  - 2) Genotype individuals within this population with molecular markers
  - 3) Phenotype the individuals
  - 4) Apply statistical models to associate the markers to the phenotypic variation

The statistical approaches range from simple techniques such as __ANOVA__ to complex __Bayesian models__ capable of zeroing in on QTL-by-QTL interactions.

---
# Single marker analysis

.pull-left[
<div align="center">
<img src="figure21-c2.2.png" height=400>
</div>

- Here, `\(c\)` is the recombination rate.
]

.pull-right[

#### Mean of the __AC__ genotype:
`\begin{align*}
& \frac{1/2(1-c)\times d + 1/2c \times a}{1/2}\\
& = d(1-c) + ca \\
\end{align*}`

#### Mean of the __CC__ genotype:
`\begin{align*}
& \frac{1/2(1-c)\times a + 1/2c \times d}{1/2}\\
& = a(1-c) + cd \\
\end{align*}`

#### Difference between __AC__ and __CC__ value:
`\begin{align*}
& d(1-c) + ca - (a(1-c) + cd)\\
& = (d-a)(1-2c) \\
\end{align*}`

]

---

# BC1 example

Conduct __ `\(t\)`-test__ to test the null hypothesis that the mean values of genotype __AC__ and __CC__ are the same.

--
### A `\(t\)`-test would be

`\begin{align*}
t = \frac{\hat{u}_{AC} - \hat{u}_{CC}}{\sqrt{\frac{\sigma_{AC}^2}{N_{AC}} + \frac{\sigma_{CC}^2}{N_{CC}} } } \\
\end{align*}`

- where `\(\sigma^2\)` is equal to the within sample variance for each genotype
- `\(N\)` is equal to the number of individuals in each genotype class

#### If we get a __p-value < 0.05__: 
  - Reject the null hypothesis that the two genotypes means are the same.
  
  - Or the means are different; in the other words, the marker is __linked with a QTL__.

---

# What is the QTL effect ( `\(a\)` )?

`\begin{align*}
& \hat{u}_{AC} - \hat{u}_{CC} = (d-a)(1-2c) \\
& a = d - \frac{\hat{u}_{AC} - \hat{u}_{CC}}{1-2c} \\
\end{align*}`

Assume no dominance at this locus:

`\begin{align*}
& a = \frac{\hat{u}_{CC} - \hat{u}_{AC}}{1-2c} \\
\end{align*}`

---
# Shortcomings of the single-marker test

### The QTL effect of `\(a\)`

`\begin{align*}
& a = \frac{\hat{u}_{CC} - \hat{u}_{AC}}{1-2c} \\
\end{align*}`

- It is important to note that these single-marker tests confound __the QTL effect__ and __the recombination frequency__ between the marker and QTL.

- For this reason, the calculated marker effects and significance do not really tell you how far or how close a marker is to a QTL.
  
--

- This test does not tell you if __one QTL__ is controlling the trait, or __two or more linked QTLs__.

---

# Sorghum example (F2 population)

.pull-left[
<div align="center">
<img src="figure21-c3.1.png" height=320>
</div>

]

---

# Sorghum example (F2 population)

---

# Sorghum example (F2 population)

---

# Frequencies in F2

The frequencies of the resulting F2 individuals are simply calculated by multiplying together the frequencies of the gametes united to form those individuals.

Here we use __M__ and __m__ represent molecular marker and __Q__ and __q__ represent QTL.

| Genotype | Value      | Frequency |
| :-------: | :-------: | :--------: | 
| `\(MMQQ\)`  | a | `\(\frac{1}{4}(1-c)^2\)` |
| `\(MMQq\)`  | d | `\(\frac{1}{2}c(1-c)\)` |
| `\(MMqq\)`  | -a | `\(\frac{1}{4}c^2\)` |
| `\(MmQQ\)`  | a | `\(\frac{1}{2}c(1-c)\)` |
| `\(MmQq\)`  | d | `\(\frac{1}{2}((1-c)^2+c^2)\)` |
| `\(Mmqq\)`  | -a | `\(\frac{1}{2}c(1-c)\)` |
| `\(mmQQ\)`  | a | `\(\frac{1}{4}c^2\)` |
| `\(mmQq\)`  | d | `\(\frac{1}{2}c(1-c)\)` |
| `\(mmqq\)`  | -a | `\(\frac{1}{4}(1-c)^2\)` |

---

# Frequencies in F2

- Because we only see the marker genotypes and not the QTL genotypes, the expected means of the different genotypes are only relevant.

- The expected values are calculated simply by taking the __weighted average of the genotypic values__ of the underlying QTL genotypes.

| Genotype  | Expected value |
| :-------: | :-------: | 
| `\(MM\)`  |  `\(a(1-2c) + 2dc(1-c)\)` |
| `\(Mm\)`  |  `\(d((1-c)^2 + c^2)\)` |
| `\(mm\)`  |  `\(-a(1-2c) + 2dc(1-c)\)` |

After some algebra, it can be shown that the differences between the genotypic means in an F2 population are as follows:

`\begin{align*}
& \mu_{MM} - \mu_{mm} = 2a(1-2c) \\
& \mu_{Mm} - \frac{\mu_{MM} - \mu_{mm}}{2} = d(1-2c)^2 \\ 
& \mu_{Mm} - \mu_{mm} = (a+d)(1-2c) \\ 
\end{align*}`

---
# More on single-marker analysis

Markers are tested one by one for their effects on a trait.

This can be done by:

- A __ `\(t\)`-test__ contrasting the mean values of two genotypes.

- An __ `\(F\)`-test__ on all three genotypes to determine if difference in trait values exist among the genotypes.

- __Regression__ of the trait value on the number (dosage) of marker alleles.

---
# Regression approach

The __regression of phenotypic values on allele dosage__ can be represented as:

`\begin{align*}
y_j = \mu + bx_j + e_j
\end{align*}`

- where `\(y_j\)` is the phenotypic value for individual `\(j\)`,
- `\(\mu\)` is the population mean,
- `\(b\)` is the regression coefficient,
- `\(x_j\)` represents the number of a particular allele. For example, if `\(x_j\)` was set to represent the number of __ `\(M\)` allele__ at a marker locus: 
  - `\(x_j =0\)` for an `\(mm\)` individual,
  - `\(x_j =1\)` for an `\(Mm\)` individual,
  - `\(x_j =2\)` for an `\(MM\)` individual.
- `\(e_j\)` is the residual error for individual `\(j\)`.

If the regression coefficient is __significantly different from zero__, there is evidence for QTL near this marker.

---
# Limitation of single marker analysis

The problem with above analysis is that QTL effect is completely __confounded with QTL position__.

`\begin{align*}
\mu_{Mm} - \mu_{mm} = (a+d)(1-2c) \\ 
\end{align*}`

--
For example, the expected difference between two genotypes is the function of `\(c\)`.

Therefore, we do not know if a significant difference is 
  - Due to a small QTL right next to the marker
  
  - Or a large QTL further away from the marker

Also, there is no way to tell if a significant difference is due to
  - One large QTL
  - several smaller QTLs cluster together