Genetic components of variance

class: center, middle, inverse, title-slide

.title[
# Genetic components of variance
]
.author[
### Jinliang Yang
]
.date[
### Mar. 27, 2024
]

---

# Expectation and Variance

### Expectation `$E(X)$`:  
- A measure of the __mean value__ of a variable

`\begin{align*}
E[X] &= \mu_X \\
&= \sum\limits_{i=1}^kf(x_i)Pr(X = x_i)
\end{align*}`

### Variance `$Var(X)$`: 
- A measure of the __spread__ of a variable

`\begin{align*}
Var(X) & = \sigma_X^2 \\
       & = E[X^2] - E[X]^2 \\
\end{align*}`

---
# An example: Tobacco leaves

Number of leaves per plant in a F1 population (N=25) crossed from two inbred lines of tobacco:

|   |   |   |   |   |
|--:|--:|--:|--:|--:|
| 18| 16| 16| 15| 15|
| 15| 14| 13| 16| 16|
| 16| 16| 16| 15| 16|
| 18| 18| 14| 15| 15|
| 15| 17| 16| 16| 16|

Then, made N=25 F2 plants:

|   |   |   |   |   |
|--:|--:|--:|--:|--:|
| 16| 16| 20| 21| 14|
| 20| 14| 13| 18| 17|
| 19| 14| 12| 15| 13|
| 17| 15| 15| 14| 15|
| 14| 17| 16| 18| 13|

---
# An example: Tobacco leaves

### Mean and variance of F1 and F2 populations?

```r
f1 <- as.vector(as.matrix(f1))
f2 <- as.vector(as.matrix(f2))
mean(f1)
```

```
## [1] 15.72
```

```r
mean(f2)
```

```
## [1] 15.84
```

```r
var(f1)
```

```
## [1] 1.46
```

```r
var(f2)
```

```
## [1] 5.973333
```

---
# Tobacco example

```r
library(ggplot2)
df <- rbind(data.frame(number=f1, pop="F1"), data.frame(number=f2, pop="F2"))
ggplot(df, aes(x=pop, y=number, fill=pop)) +
  scale_fill_manual(values=c("#E69F00", "#56B4E9")) +
  geom_violin(trim=FALSE) +
  guides(fill=guide_legend(title="Popl.s")) +
  theme_classic() +
  theme(legend.position=c(0.2, 0.8), axis.text=element_text(size=20),
              axis.title=element_text(size=20))
```

---
# Covariance

To quantify to what extent the two variables **co-vary**.

$$
`\begin{aligned}
Cov(X, Y) & = \sigma_{XY} \\
& = E([X - E(X)][Y - E(Y)]) \\
& = E(XY) - E(X)E(Y) \\
\end{aligned}`
$$

where,

$$
`\begin{aligned}
E(XY) = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j)
\end{aligned}`
$$

### The variance of a sum

`\begin{align*}
& Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y) \\
& \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY} \\
\end{align*}`

---
# Phenotypic variance partitioning

### Phenotypic model: P = G + E

`\begin{align*}
& Var(P) = Var(G+E) \\
& \sigma_{P}^2 = \sigma_{G+E}^2 = \sigma_G^2 + \sigma_E^2 + 2\sigma_{GE}
\end{align*}`

- `$\sigma_{G}^2$` is the variance of the genotypic effects
- `$\sigma_{E}^2$` is the variance of the environmental deviation
- `$\sigma_{GE}$` is the covariance between genotypic effects and environmental deviation. **Normally, we assume `$\sigma_{GE} = 0$`**

Therefore,
`\begin{align*}
& \sigma_{P}^2 = \sigma_G^2 + \sigma_E^2
\end{align*}`

---

# Genetic variance partitioning

The genotypic value could be partitioned into the breeding value and dominance deviation.

### Genotypic model:  `$G = A + D$`

`\begin{align*}
& \sigma_{G}^2 = \sigma_A^2 + \sigma_D^2
\end{align*}`

### Breeding Value:  `$A = \alpha_i + \alpha_j$`

`\begin{align*}
& \sigma_{A}^2 = \sigma_{\alpha_i}^2 + \sigma_{\alpha_j}^2
\end{align*}`

Therefore,

`\begin{align*}
& \sigma_{G}^2 = \sigma_{\alpha_i}^2 + \sigma_{\alpha_j}^2 + \sigma_{\delta_{ij}}^2
\end{align*}`

---
# Epistatic effect for multiple loci

### Genotypic effect model

`\begin{align*}
& G_{ijkl} = \mu + (\alpha_i + \alpha_j + \delta_{ij}) + (\alpha_k + \alpha_l + \delta_{kl}) + I_{ijkl}
\end{align*}`

--
### Genotypic variance

`\begin{align*}
\sigma_{G_{ijkl}}^2 = & (\sigma_{\alpha_i}^2 + \sigma_{\alpha_j}^2 + \sigma_{\delta_{ij}}^2) \\
                    & + (\sigma_{\alpha_k}^2 + \sigma_{\alpha_l}^2 + \sigma_{\delta_{kl}}^2) \\
                    & + \sigma^2_{I_{ijkl}} \\
                    = & (\sigma_{\alpha_i}^2 + \sigma_{\alpha_j}^2 + \sigma_{\alpha_k}^2 + \sigma_{\alpha_l}^2 ) \\
                    & + (\sigma_{\delta_{ij}}^2 + \sigma_{\delta_{kl}}^2) \\
                    & + \sigma^2_{I_{ijkl}}\\
                    = & \sigma_A^2 + \sigma_D^2 + \sigma_I^2 \\
\end{align*}`

---
# Additive and dominance variance

| Genotype  | Freq      | Breeding Value | `$A^2$`  | Dominance Deviation  | `$D^2$` |
| :-------: | :-------: | :-----------: | :-------: | :-------: | :-------: | :-------: |
| `$A_1A_1$`  | `$p^2$`     | `$2q\alpha$`    |    |  `$-2q^2d$`   |   |
| `$A_1A_2$`  | `$2pq$`     | `$(q-p)\alpha$` |   |  `$2pqd$`   |   |
| `$A_2A_2$`  | `$q^2$`     | `$-2p\alpha$`   |   |  `$-2p^2d$`   |   |

#### Variance `$Var(X)$`: **a measure of the spread of a variable**
`\begin{align*}
Var(X) & = \sigma_X^2 = V_A \\
       & = E[X^2] - E[X]^2 \\
\end{align*}`

--
### What is the `$\sigma^2_{A}$` and `$\sigma^2_{D}$`?

Remember that the means of **breeding value** and **dominance deviation** are **0**.

---
# Additive and dominance variance

| Genotype  | Freq      | Breeding Value | `$A^2$`  | Dominance Deviation  | `$D^2$` |
| :-------: | :-------: | :-----------: | :-------: | :-------: | :-------: | :-------: |
| `$A_1A_1$`  | `$p^2$`     | `$2q\alpha$`    | `$(2q\alpha)^2$`  |  `$-2q^2d$`   | `$(-2q^2d)^2$` |
| `$A_1A_2$`  | `$2pq$`     | `$(q-p)\alpha$` | `$(q-p)^2\alpha^2$` |  `$2pqd$`   | `$(2pqd)^2$` |
| `$A_2A_2$`  | `$q^2$`     | `$-2p\alpha$`   | `$(-2p\alpha)^2$` |  `$-2p^2d$`   | `$(-2p^2d)^2$` |

`\begin{align*}
Var(X) & = \sigma_X^2 = V_A \\
       & = E[A^2] - E[A]^2 \\
       & = E[A^2]
\end{align*}`

The __additive genetic variance__ in a HWE population is:

`\begin{align*}
\sigma_A^2 & = p^2(2q\alpha)^2 + 2pq(q-p)^2\alpha^2 + q^2(-2p\alpha)^2 \\
           & = 2pq\alpha^2(2pq + (q-p)^2 + 2pq) \\
           & = 2pq\alpha^2(p+q)^2 \\
           & = 2pq\alpha^2 \\
           & = 2pq(a + d(q-p))^2 \\
\end{align*}`
  
  
---
# Additive and dominance variance

| Genotype  | Freq      | Breeding Value | `$A^2$`  | Dominance Deviation  | `$D^2$` |
| :-------: | :-------: | :-----------: | :-------: | :-------: | :-------: | :-------: |
| `$A_1A_1$`  | `$p^2$`     | `$2q\alpha$`    | `$(2q\alpha)^2$`  |  `$-2q^2d$`   | `$(-2q^2d)^2$` |
| `$A_1A_2$`  | `$2pq$`     | `$(q-p)\alpha$` | `$(q-p)^2\alpha^2$` |  `$2pqd$`   | `$(2pqd)^2$` |
| `$A_2A_2$`  | `$q^2$`     | `$-2p\alpha$`   | `$(-2p\alpha)^2$` |  `$-2p^2d$`   | `$(-2p^2d)^2$` |

Likewise, the variance due to dominance deviations in a HWE population is:

`\begin{align*}
Var(X) & = \sigma_X^2 = V_D \\
       & = E[D^2] - E[D]^2 \\
       & = E[D^2]
\end{align*}`
--

`\begin{align*}
\sigma_D^2 & = p^2(-2q^2d)^2 + 2pq(2pqd)^2 + q^2(-2p^2d)^2 \\
           & = 4p^2q^2d^2(q^2 + 2pq + p^2) \\
           & = 4p^2q^2d^2 \\
           & = (2pqd)^2 \\
\end{align*}`

---
# Graphical Representation

```r
a = 1
*d = 0*a # pure additive
q = seq(0, 1, by=0.01)
p = 1 - q
va <- 2*p*q*(a + d*(q-p))^2
vd <- (2*p*q*d)^2
vg <- va + vd
plot(p, vg, lty=1, lwd=5, type="l", xlab="Allele Freq", ylab="Variance")
lines(p, va, lty=2, lwd=5, col="red")
lines(p, vd, lty=3, lwd=5, col="blue")
```

---
# Graphical Representation

- Genetic variance components are typically maximized at intermediate allele frequencies. 
- Additive genetic variance typically makes up most of the genetic variance except in unusual situations, such as when overdominant gene action is present or allele frequencies are at the extremes.

---

# Variance due to disequilibrium

Consider two loci that affect a trait. The genotypic value across both loci is

### G'' = G' + G

`\begin{align*}
& Var(G'') = Var(G'+G) \\
& \sigma_{G''}^2 = \sigma_{G'}^2 + \sigma_G^2 + 2\sigma_{G'G} \\
& \sigma_{G''}^2 = \sigma_{G'}^2 + \sigma_G^2 + 2E[(G' - \mu_{G'})(G - \mu_G)]
\end{align*}`

- #### When the loci are independent, the covariance term equals to zero.

- #### When loci in LD, what do you think the covariance term will be?
  - i.e., positive or negative?

---

# Variance due to disequilibrium

`\begin{align*}
& \sigma_{G''}^2 = \sigma_{G'}^2 + \sigma_G^2 + 2E[(G' - \mu_{G'})(G - \mu_G)] \\
& \sigma_{G''}^2 = \sigma_{G'}^2 + \sigma_G^2 + 2E(G' - \mu_{G'}) \times E(G - \mu_G)
\end{align*}`

### Negative covariance
Two loci are linked in repulsion phase. => reduce the genetic variance.

### Positive covariance
Two loci are in coupling linkage, i.e., assortative mating.

---

# Variance due to GxE interaction

Here, we have two genotypes (or varieties) A and B, tested in two different locations.

---

# Variance due to GxE interaction

Often time, different genotypes react to different environments differently.

---

# Variance due to GxE interaction

In this case,

`\begin{align*}
& P = G + E + G \times E \\
\end{align*}`

--
Or,
`\begin{align*}
& P = G + E + I_{GE} \\
\end{align*}`
where `$I_{GE}$` is the __interaction effect__ between genotype and environment.

And the variance is,

`\begin{align*}
& \sigma_P^2 = \sigma^2_G + \sigma^2_E + 2Cov(G, E) + \sigma^2_{I_{GE}} \\
\end{align*}`

- The covariance between genetic and non-genetic effects is often zero except in special circumstances.

- The variance due to interaction between genotype and environment is considered a non-genetic variance and is typically pooled in with `$\sigma_E^2$`.