Phenotypic and Genotypic variances

class: center, middle, inverse, title-slide

# Phenotypic and Genotypic variances
### Jinliang Yang
### March 22nd, 2022

---

# Phenotypic variance

A population can be characterized by its __allele__ and __genotype frequencies__.

Phenotype in a population can be characterized in terms of its __mean__ and __variance__.

- Phenotypic values are due to genetic ( `\(G\)` ) and environmental ( `\(E\)` ) effects

- Now, we will partition the phenotypic variance into different causal components.

---
# Mean, variance and covariance

### Expectation: a measure of the __mean value__ of a variable

`\begin{align*}
E[X] &= \mu_X = \sum f_iX_i \\
\end{align*}`

where `\(f_i\)` is the frequency of the value `\(X_i\)`.

### Variance: a measure of the __spread__ of a variable

The __variance__ is defined as the mean of the squared deviations of a random variable from the population mean.

`\begin{align*}
Var(X) = E[(X - \mu)^2] & = E(X^2) - [E(X)]^2 \\
& = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`
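
As a minimal sketch of the two definitions above, the mean and variance can be computed directly from a list of values and their frequencies. The function name and the example values are hypothetical, chosen only for illustration.

```python
def mean_and_variance(values, freqs):
    """Mean and variance of a discrete variable from values X_i and frequencies f_i."""
    assert abs(sum(freqs) - 1.0) < 1e-12, "frequencies must sum to 1"
    # E[X] = sum of f_i * X_i
    mu = sum(f * x for f, x in zip(freqs, values))
    # Var(X) = sum of f_i * X_i^2 minus mu^2
    var = sum(f * x**2 for f, x in zip(freqs, values)) - mu**2
    return mu, var

# Hypothetical example: values 0, 1, 2 with frequencies 0.25, 0.5, 0.25
mu, var = mean_and_variance([0, 1, 2], [0.25, 0.5, 0.25])
print(mu, var)
```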

---
# Mean, variance and covariance

### Covariance: to quantify to what extent the two variables **co-vary**.

The __covariance__ is a measure of the joint variation between __two variables__.

`\begin{align*}
Cov(X, Y) &  = E([X - E(X)][Y - E(Y)]) \\
& = E(XY) - E(X)E(Y) \\
\end{align*}`

where,

`\begin{align*}
E(XY) = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j)
\end{align*}`

### The variance of a sum

When a variable is the sum of two or more components, its variance can be partitioned into the variances of the components and the covariances among the components.
`\begin{align*}
& Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y) \\
\end{align*}`
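
The identity above can be checked numerically on a small joint distribution. This is only a sketch: the joint probabilities below are made up for illustration, and `expect` is a hypothetical helper.

```python
# Hypothetical joint distribution Pr(X = x, Y = y); the probabilities sum to 1
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] under the joint distribution."""
    return sum(pr * g(x, y) for (x, y), pr in joint.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
var_x = expect(lambda x, y: x**2) - ex**2
var_y = expect(lambda x, y: y**2) - ey**2
cov_xy = expect(lambda x, y: x * y) - ex * ey        # E(XY) - E(X)E(Y)
var_sum = expect(lambda x, y: (x + y)**2) - (ex + ey)**2

# Both sides of Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y)
print(var_sum, var_x + var_y + 2 * cov_xy)
```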

---

# Phenotypic model: P = G + E

Phenotypic values are due to genetic and environmental effects.

`\begin{align*}
& Var(P) = Var(G+E) \\
& V_P = V_G + V_E + 2Cov_{G \times E}
\end{align*}`

- `\(V_G\)` and `\(V_E\)` are the __variances__ of the genotypic and environmental effects
- `\(Cov_{G \times E}\)` is the __covariance__ between genotypic effects and environmental effects. 
  - Normally, we assume `\(Cov_{G \times E} =0\)`

Breeders conduct performance trials in multiple environments.
 - As such, the genotype-by-environment interaction variance, `\(V_{G \times E}\)`, should also be considered.

---
# Additive and dominance variance

- Genotypic model:  `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.

`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
\end{align*}`

------------------

Now, let's calculate `\(Cov_{A \times D}\)`:

`\begin{align*}
Cov(X, Y) &  = E(XY) - E(X)E(Y) \\
E(XY) & = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j)
\end{align*}`

| Genotype  | Freq      | Breeding Value |  Dominance Deviation  | `\(AD\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: |
| `\(A_1A_1\)`  | `\(p^2\)`     | `\(2q\alpha\)`    | `\(-2q^2d\)`   |   |
| `\(A_1A_2\)`  | `\(2pq\)`     | `\((q-p)\alpha\)` | `\(2pqd\)`   |   |
| `\(A_2A_2\)`  | `\(q^2\)`     | `\(-2p\alpha\)`   | `\(-2p^2d\)`   |    |

---
# Additive and dominance variance

- Genotypic model:  `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.

`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
\end{align*}`

------------------
Now, let's calculate `\(Cov_{A \times D}\)`:
`\begin{align*}
Cov(X, Y) &  = E(XY) - E(X)E(Y) \\
E(XY) & = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j)
\end{align*}`

| Genotype  | Freq      | Breeding Value |  Dominance Deviation  | `\(AD\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: |
| `\(A_1A_1\)`  | `\(p^2\)`     | `\(2q\alpha\)`    | `\(-2q^2d\)`   | `\(2q\alpha \times (-2q^2d)\)`  |
| `\(A_1A_2\)`  | `\(2pq\)`     | `\((q-p)\alpha\)` | `\(2pqd\)`   |  `\((q-p)\alpha \times 2pqd\)` |
| `\(A_2A_2\)`  | `\(q^2\)`     | `\(-2p\alpha\)`   | `\(-2p^2d\)`   | `\(-2p\alpha \times (-2p^2d)\)`   |

---
# Additive and dominance variance

- Genotypic model:  `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.

`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
& Cov_{A \times D} = 0 \\
\end{align*}`

| Genotype  | Freq      | Breeding Value |  Dominance Deviation  | `\(AD\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: |
| `\(A_1A_1\)`  | `\(p^2\)`     | `\(2q\alpha\)`    | `\(-2q^2d\)`   | `\(2q\alpha \times (-2q^2d)\)`  |
| `\(A_1A_2\)`  | `\(2pq\)`     | `\((q-p)\alpha\)` | `\(2pqd\)`   |  `\((q-p)\alpha \times 2pqd\)` |
| `\(A_2A_2\)`  | `\(q^2\)`     | `\(-2p\alpha\)`   | `\(-2p^2d\)`   | `\(-2p\alpha \times (-2p^2d)\)`   |

`\begin{align*}
E(XY) & = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j) \\
E(AD) &= p^2\times (2q\alpha)\times (-2q^2d) + 2pq \times (q-p)\alpha \times 2pqd + q^2 \times (-2p\alpha) \times (-2p^2d) \\
      &= 4\alpha d p^2q^2(-q + q -p +p) \\
      &=0 \\
Cov_{A \times D} &= E(AD) - E(A)E(D) = 0, \text{ because } E(A) = E(D) = 0
\end{align*}`
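
A quick numerical check of this result, as a sketch under Hardy-Weinberg proportions. The parameter values `p = 0.3`, `a = 2`, `d = 1` are arbitrary, `one_locus_table` is a hypothetical helper, and `alpha` is taken to be the average effect `a + d(q - p)`.

```python
def one_locus_table(p, a, d):
    """(frequency, breeding value, dominance deviation) for each genotype."""
    q = 1.0 - p
    alpha = a + d * (q - p)  # average effect of an allele substitution
    return [
        (p**2,      2 * q * alpha,   -2 * q**2 * d),   # A1A1
        (2 * p * q, (q - p) * alpha,  2 * p * q * d),  # A1A2
        (q**2,     -2 * p * alpha,   -2 * p**2 * d),   # A2A2
    ]

rows = one_locus_table(p=0.3, a=2.0, d=1.0)  # arbitrary illustration values
mean_A = sum(f * A for f, A, D in rows)
mean_D = sum(f * D for f, A, D in rows)
# Cov(A, D) = E(AD) - E(A)E(D)
cov_AD = sum(f * A * D for f, A, D in rows) - mean_A * mean_D
print(mean_A, mean_D, cov_AD)  # all three are zero up to rounding error
```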

---
# Additive and dominance variance

- Genotypic model:  `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.

`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
& Cov_{A \times D} = 0 \\
& V_G = V_A + V_D \\
\end{align*}`

--
------------------
Now, let's calculate `\(V_A\)` and `\(V_D\)`:

### Variance: a measure of the __spread__ of a variable

`\begin{align*}
Var(X) = E[(X - \mu)^2] & = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`

---
# `\(V_A\)`

`\begin{align*}
Var(X) = E[(X - \mu)^2] & = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`

| Genotype  | Freq      | Breeding Value | `\(A^2\)`  | Dominance Deviation  | `\(D^2\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: | :-------: |
| `\(A_1A_1\)`  | `\(p^2\)`     | `\(2q\alpha\)`    | `\((2q\alpha)^2\)`  |  `\(-2q^2d\)`   |  |
| `\(A_1A_2\)`  | `\(2pq\)`     | `\((q-p)\alpha\)` | `\((q-p)^2\alpha^2\)` |  `\(2pqd\)`   |  |
| `\(A_2A_2\)`  | `\(q^2\)`     | `\(-2p\alpha\)`   | `\((-2p\alpha)^2\)` |  `\(-2p^2d\)`   | |

These breeding values have a mean of zero, and their variance is the sum of the products of the genotype frequencies and the squared breeding values:

`\begin{align*}
V_A & = p^2(2q\alpha)^2 + 2pq(q-p)^2\alpha^2 + q^2(-2p\alpha)^2 \\
           & = 2pq\alpha^2(2pq + (q-p)^2 + 2pq) \\
           & = 2pq\alpha^2(p+q)^2 \\
           & = 2pq\alpha^2 \\
           & = 2pq(a + d(q-p))^2 \\
\end{align*}`
  
  
---
# `\(V_D\)`

`\begin{align*}
Var(X) = E[(X - \mu)^2] & = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`

| Genotype  | Freq      | Breeding Value | `\(A^2\)`  | Dominance Deviation  | `\(D^2\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: | :-------: |
| `\(A_1A_1\)`  | `\(p^2\)`     | `\(2q\alpha\)`    | `\((2q\alpha)^2\)`  |  `\(-2q^2d\)`   | `\((-2q^2d)^2\)` |
| `\(A_1A_2\)`  | `\(2pq\)`     | `\((q-p)\alpha\)` | `\((q-p)^2\alpha^2\)` |  `\(2pqd\)`   | `\((2pqd)^2\)` |
| `\(A_2A_2\)`  | `\(q^2\)`     | `\(-2p\alpha\)`   | `\((-2p\alpha)^2\)` |  `\(-2p^2d\)`   | `\((-2p^2d)^2\)` |

Likewise, the variance due to dominance deviations is the sum of the products of the genotype frequencies and the squared dominance deviations:

`\begin{align*}
V_D & = p^2(-2q^2d)^2 + 2pq(2pqd)^2 + q^2(-2p^2d)^2 \\
           & = 4p^2q^2d^2(q^2 + 2pq + p^2) \\
           & = 4p^2q^2d^2 \\
           & = (2pqd)^2 \\
\end{align*}`
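
The closed forms for `\(V_A\)` and `\(V_D\)` can be compared with the variance of the genotypic values themselves. This sketch assumes the usual coding in which `\(A_1A_1\)`, `\(A_1A_2\)` and `\(A_2A_2\)` have genotypic values `a`, `d` and `-a`; the function name and the parameter values are hypothetical.

```python
def variance_components(p, a, d):
    """Return (V_A, V_D, V_G) for one locus under Hardy-Weinberg proportions."""
    q = 1.0 - p
    alpha = a + d * (q - p)              # average effect of an allele substitution
    V_A = 2 * p * q * alpha**2           # additive variance
    V_D = (2 * p * q * d) ** 2           # dominance variance
    # Total genotypic variance computed directly from genotypic values
    # (assumed coding: A1A1 = a, A1A2 = d, A2A2 = -a)
    freqs = (p**2, 2 * p * q, q**2)
    values = (a, d, -a)
    mean_G = sum(f * g for f, g in zip(freqs, values))
    V_G = sum(f * g**2 for f, g in zip(freqs, values)) - mean_G**2
    return V_A, V_D, V_G

V_A, V_D, V_G = variance_components(p=0.3, a=2.0, d=1.0)  # arbitrary values
print(V_A, V_D, V_G)  # V_A + V_D matches V_G up to rounding error
```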

---

# Understanding `\(V_A\)` and `\(V_D\)`

`\begin{align*}
V_A & = 2pq\alpha^2 \\
           & = 2pq(a + d(q-p))^2 \\
\end{align*}`

`\(V_A\)` is a function of
- Allele frequencies, which vary __among populations__.

- And genotypic values, which are intrinsic properties of genotypes in a one-locus model

Because `\(d\)` contributes to `\(V_A\)` through `\(\alpha\)`, a nonzero `\(V_A\)` does not imply that the alleles act in a purely additive manner.
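
To illustrate the point, the sketch below evaluates `\(V_A\)` at a few allele frequencies for a locus with complete dominance. The values `a = d = 1` and the function name are hypothetical, picked only for illustration.

```python
def additive_variance(p, a, d):
    """V_A = 2pq * alpha^2 with alpha = a + d(q - p)."""
    q = 1.0 - p
    alpha = a + d * (q - p)
    return 2 * p * q * alpha**2

# Hypothetical locus with complete dominance of A1 (d = a = 1)
for p in (0.1, 0.5, 0.9):
    print(f"p = {p:.1f}  V_A = {additive_variance(p, a=1.0, d=1.0):.4f}")
```

Even with complete dominance, `\(V_A\)` is positive at every intermediate allele frequency, and it changes with `\(p\)` although `a` and `d` are held fixed.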

---

# Understanding `\(V_A\)` and `\(V_D\)`

`\begin{align*}
V_D & = (2pqd)^2 \\
& = 4p^2q^2d^2
\end{align*}`

`\(V_D\)` is a function of
- Allele frequencies, which vary __among populations__.
- The level of dominance ( `\(d\)` )

- `\(V_D\)` is zero when dominance is absent ( `\(d=0\)` ) 
  
  - In this situation, the intralocus variance comprises only `\(V_A\)`

---

# Covariance between relatives

In humans, close relatives, such as a parent and its offspring, have a higher degree of __resemblance__ than more distant relatives, such as an uncle and a niece.

#### Genetics
The __covariance between relatives__ measures the degree of genetic resemblance between related individuals in a population.
- By definition, the covariance between unrelated individuals is zero.

#### Nongenetics

Nongenetic factors can contribute to the degree of resemblance between relatives.
- In plants, we assume that nongenetic effects among relatives are uncorrelated.
- This assumption is met through the __randomization procedure__ in the experimental designs used in plant breeding.

---

# General framework for genetic covariance

If allele `\(A_i\)` carried by individual X is IBD to allele `\(A_k\)` in Y, then the covariance due to this allele is:

`\begin{align*}
Cov(\alpha_i, \alpha_k) & =  E[(\alpha_i - \mu_{\alpha})(\alpha_k - \mu_{\alpha})] \\
& = E[(\alpha_i - \mu_{\alpha})^2] \\
& = Var(\alpha_i) \\
& = V_{\alpha_i}
\end{align*}`

Because `\(\alpha_i = \alpha_k\)` if alleles `\(A_i\)` and `\(A_k\)` are IBD.

---
# Additive genetic covariance

Alleles in individuals `\(X (A_iA_j)\)` and `\(Y (A_kA_l)\)` can be IBD through four possible events:
  
`\begin{align*}
& A_i \equiv A_k \\
& A_i \equiv A_l \\
& A_j \equiv A_k \\
& A_j \equiv A_l \\
\end{align*}`

The probability of each of these four events is equal to `\(f_{XY}\)`, __the coefficient of coancestry__ between `\(X\)` and `\(Y\)`.

---
# Additive genetic covariance

Therefore, the covariance due to additive genetic effects (__breeding values__):

`\begin{align*}
Cov_\alpha(X, Y) = & P(x_i \equiv y_k)Cov(\alpha_i, \alpha_k) + P(x_i \equiv y_l)Cov(\alpha_i, \alpha_l) \\
 & + P(x_j \equiv y_k)Cov(\alpha_j, \alpha_k) + P(x_j \equiv y_l)Cov(\alpha_j, \alpha_l) \\
 = & 4f_{XY} V_{\alpha_i} \\
 = & 2f_{XY} V_A \\
\end{align*}`

Because `\(V_A = V(\alpha_i + \alpha_j) = 2V_{\alpha_i}\)` and `\(\alpha_i = \alpha_k\)` when alleles `\(i\)` and `\(k\)` are IBD.

---

# Covariance due to dominance deviations

For dominance deviations to covary, both alleles at the locus must be IBD:

`\begin{align*}
& A_i \equiv A_k,  A_j \equiv A_l\\
& A_i \equiv A_l,  A_j \equiv A_k \\
\end{align*}`

Therefore,

`\begin{align*}
Cov_\delta(X, Y) = & P(x_i \equiv y_k,  x_j \equiv y_l)Cov(\delta_{ij}, \delta_{kl}) + P(x_i \equiv y_l,  x_j \equiv y_k)Cov(\delta_{ij}, \delta_{kl}) \\
 = & (P(x_i \equiv y_k,  x_j \equiv y_l) + P(x_i \equiv y_l,  x_j \equiv y_k))Cov(\delta_{ij}, \delta_{kl}) \\
 = & \Delta_{XY}V_D \\
\end{align*}`

---
# Genetic covariances for general relatives

`\begin{align*}
& Cov_\alpha(X, Y) = 2f_{XY} V_A \\
& Cov_\delta(X, Y) = \Delta_{XY} V_D \\
\end{align*}`

The genetic covariance between relatives is now:

`\begin{align*}
Cov_G(X, Y) = 2f_{XY}V_A + \Delta_{XY}V_D \\
\end{align*}`

### Simplify it:

`\begin{align*}
Cov_G = rV_A + uV_D
\end{align*}`

Where,
`\begin{align*}
& r = 2f_{XY} \\
& u = \Delta_{XY} \\
\end{align*}`

---

# Genetic covariances for general relatives

`\begin{align*}
Cov_G = rV_A + uV_D
\end{align*}`

Where,
`\begin{align*}
& r = 2f_{XY} \\
& u = \Delta_{XY} \\
\end{align*}`

Note that `\(u\)` is normally zero unless the two individuals are related through __both of their respective parents__, as with full sibs and double first cousins.

|  |   Relationship     | Coancestry | r  | u  |
| :-------: | :-------: | :-----------: | :-----------: | :-------: |
| First degree  | Parent:offspring   |  1/4  | 1/2    |  0  |
|                | Full sibs     |   1/4    | 1/2 |  __1/4__ |
| Second degree  | Half sibs    |    1/8     | 1/4 |  0 |
|                | Grandparent:grandoffspring | 1/8 | 1/4 | 0 |
| Third degree   | Great-grandparent:great-grandoffspring |  1/16   | 1/8 |  0 |
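
The table can be turned into a small lookup that evaluates `\(Cov_G = rV_A + uV_D\)`. This is a sketch only: the dictionary keys, the function name and the example variance values are hypothetical, and the coefficients are copied from the table.

```python
from fractions import Fraction as Fr

# (coancestry f_XY, Delta_XY) for each relationship, copied from the table
RELATIVES = {
    "parent-offspring": (Fr(1, 4), Fr(0)),
    "half sibs": (Fr(1, 8), Fr(0)),
    "full sibs": (Fr(1, 4), Fr(1, 4)),
    "grandparent-grandoffspring": (Fr(1, 8), Fr(0)),
    "great-grandparent-great-grandoffspring": (Fr(1, 16), Fr(0)),
}

def genetic_covariance(relationship, V_A, V_D):
    """Cov_G = r*V_A + u*V_D with r = 2*f_XY and u = Delta_XY."""
    f_xy, delta_xy = RELATIVES[relationship]
    r, u = 2 * f_xy, delta_xy
    return r * V_A + u * V_D

# Hypothetical variance components: V_A = 4, V_D = 8
for name in RELATIVES:
    print(name, genetic_covariance(name, V_A=4, V_D=8))
```

Only the full-sib covariance picks up a contribution from `\(V_D\)`, because it is the only relationship in the table with a nonzero `\(u\)`.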

---

# Parent-offspring

### From Breeding value

- Parent genotypic value: `\(G = A + D\)`.
- Offspring (expected value, half the breeding value of the parent): `\(G= \frac{1}{2}A\)`

Now, let's compute the covariance between a parent and its offspring.

`\begin{align*}
Cov(P, O) & = Cov(A + D, \frac{1}{2}A) \\
& = \frac{1}{2}Cov(A, A) + \frac{1}{2}Cov(A, D) \\
& = \frac{1}{2}V_A \\
\end{align*}`

Because `\(Cov(A, D) = 0\)`.
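
The same result can be checked by exact enumeration rather than algebra. The sketch below assumes random mating, Hardy-Weinberg proportions in the parents, and genotypic values `a`, `d`, `-a` for `\(A_1A_1\)`, `\(A_1A_2\)`, `\(A_2A_2\)`; the function name and parameter values are hypothetical.

```python
def parent_offspring_cov(p, a, d):
    """Exact Cov(parent, offspring) of genotypic values at one locus."""
    q = 1.0 - p
    value = {2: a, 1: d, 0: -a}                  # value by number of A1 alleles
    parent_freq = {2: p**2, 1: 2 * p * q, 0: q**2}
    e_p = e_o = e_po = 0.0
    for n_parent, f in parent_freq.items():
        t = n_parent / 2                          # Pr(this parent transmits A1)
        for g1, pr1 in ((1, t), (0, 1 - t)):      # allele from this parent
            for g2, pr2 in ((1, p), (0, q)):      # allele from a random mate
                w = f * pr1 * pr2
                e_p += w * value[n_parent]
                e_o += w * value[g1 + g2]
                e_po += w * value[n_parent] * value[g1 + g2]
    return e_po - e_p * e_o

p, a, d = 0.3, 2.0, 1.0                           # arbitrary illustration values
half_V_A = 0.5 * 2 * p * (1 - p) * (a + d * ((1 - p) - p)) ** 2
print(parent_offspring_cov(p, a, d), half_V_A)
```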

---

# Variance of testcross

Suppose individuals from one population are crossed to a common tester from another population.

The variance among the resulting testcrosses (__`\(V_{testcross}\)`__) is obtained from the frequencies and testcross means.

| Genotype  | Freq with inbreeding     | Testcross mean | 
| :-------: | :-------: | :-----------: |
| `\(A_1A_1\)`  | `\(p^2 + pqF\)`     | `\(q\alpha_T\)`    | 
| `\(A_1A_2\)`  | `\(2pq (1-F)\)`     | `\(1/2(q-p)\alpha_T\)` | 
| `\(A_2A_2\)`  | `\(q^2 + pqF\)`     | `\(- p\alpha_T\)`   |

Here `\(F\)` denotes the inbreeding coefficient.

The variance among testcrosses is then obtained as the sum of the products of the genotype frequencies and the squared testcross means, which are expressed as deviations from the overall testcross mean.

`\begin{align*}
V_{testcross} & = (p^2 + pqF)(q\alpha_T)^2 + 2pq (1-F)(1/2(q-p)\alpha_T)^2 + (q^2 + pqF)(- p\alpha_T)^2\\
& = \frac{1}{2}(1+F)pq\alpha_T^2 \\
& = \frac{1}{2}(1+F)pq(a + d (q_T - p_T))^2 \\
& = \frac{1}{2}(1+F)V_{\alpha_i^T} \\
\end{align*}`

---

# Variance of testcross

`\begin{align*}
V_{testcross} & = \frac{1}{2}(1+F)pq(a + d (q_T - p_T))^2 \\
& = \frac{1}{2}(1+F)V_{\alpha_i^T} \\
\end{align*}`

`\(V_{testcross}\)` is therefore a function of the allele frequencies
- in the population ( `\(p\)` and `\(q\)` ) being testcrossed
- and in the tester ( `\(p_T\)` and `\(q_T\)` )

It is useful in predicting the `\(V_{testcross}\)` at different __selfing generations__ and in determining the __appropriate generation for testcrossing__ in hybrid breeding programs.
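
A rough sketch of how `\(V_{testcross}\)` grows with selfing, based on the factor `\(\frac{1}{2}(1+F)\)` above. It assumes that generation 0 is non-inbred (`F = 0`), that each selfing generation halves the remaining heterozygosity so that `F = 1 - (1/2)^n`, and that allele frequencies are unchanged by selfing; the function names are hypothetical.

```python
def inbreeding_after_selfing(n):
    """F after n generations of selfing, assuming F = 0 in generation 0."""
    return 1.0 - 0.5**n

def relative_testcross_variance(n):
    """V_testcross as a fraction of its value among fully inbred lines (F = 1)."""
    return 0.5 * (1.0 + inbreeding_after_selfing(n))

for n in range(5):
    print(n, relative_testcross_variance(n))
```

Under these assumptions, half of the maximum testcross variance is already expressed among non-inbred individuals, and each further selfing generation adds a smaller increment.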