Calculate genetic variances

class: center, middle, inverse, title-slide

# Calculate genetic variances
### Jinliang Yang
### March 31, 2022

---

# How to estimate genetic variances

### Basic steps

#### 1. Relatives are developed through some sort of mating design.

#### 2. The progeny are evaluated in a set of environments.

#### 3. Variance components are estimated from the mean squares in the __analysis of variance__.

#### 4. The variance components are interpreted in terms of the covariances between relatives.

---

# A commercial soybean breeding program

- Suppose that 100 S2 families in __Summer1__ are allowed to open pollinate. 
- The open-pollinated seeds within each S2 family are bulked to form a half-sib family.
- The parents of the half-sib families have an inbreeding coefficient of `$F=1/2$`.

-------------

Next summer:
- The 100 half-sib families (i.e., `$n=100$`) are evaluated for their grain yield in a __randomized complete block design__ with two replications (i.e., `$r=2$`).
- The experiment is grown in three environments (i.e., `$e=3$`).
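
The design above implies `$n \times r \times e$` plots in total and `$r \times e$` plots per family. A minimal sketch of such a balanced layout follows; the labels (`HS1`, `E1`, ...) are hypothetical placeholders and are not taken from the course data file.

```r
# Design constants from the slide
n <- 100  # half-sib families
r <- 2    # replications per environment
e <- 3    # environments

# One row per plot; all labels are made up for illustration
layout <- expand.grid(fam = paste0("HS", seq_len(n)),
                      rep = seq_len(r),
                      Env = paste0("E", seq_len(e)))

n_plots      <- nrow(layout)      # total number of plots
plots_per_hs <- table(layout$fam) # observations per family (r * e each)
n_plots
range(plots_per_hs)
```

Each family being observed `r * e` times is what later allows a family mean to be treated as an average over `$re$` plots.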

---

# Plot the phenotypic data

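```r
# Read the yield data (wide format: one yield column per environment)
d <- read.csv("data/soybean_half-sib_yield.csv")
#install.packages("tidyr")
library(tidyr)
# Reshape to long format: columns 2 to 4 become Env/yield pairs
df <- gather(d, key="Env", value="yield", 2:4)
```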
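
To make the reshaping step explicit, here is a small base-R sketch of the same wide-to-long conversion on a toy table. The column names (`fam`, `Env1` to `Env3`, `rep`) and all yield values are assumptions made up for illustration; the real file may be organized differently.

```r
# Toy wide table: one row per family x rep, one yield column per environment
wide <- data.frame(fam  = c("HS1", "HS1", "HS2", "HS2"),
                   Env1 = c(50.1, 52.3, 44.0, 45.2),
                   Env2 = c(48.7, 49.9, 41.5, 43.1),
                   Env3 = c(55.0, 53.8, 47.2, 46.6),
                   rep  = c(1, 2, 1, 2))

env_cols <- names(wide)[2:4]  # the columns to stack

# Stack the environment columns: identifiers are repeated once per environment,
# and the environment name is recorded in a new Env column
long <- data.frame(
  fam   = rep(wide$fam, times = length(env_cols)),
  rep   = rep(wide$rep, times = length(env_cols)),
  Env   = rep(env_cols, each = nrow(wide)),
  yield = unlist(wide[env_cols], use.names = FALSE)
)
long
```

The long table has one row per plot, which is the shape that `lm()` and `ggplot()` expect.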

--
### Using `ggplot2`

```r
library(ggplot2)
# Histogram and density of yield per environment, colored by replication
# (factor() keeps rep discrete in case it was read in as a number)
ggplot(df, aes(x=yield, color=factor(rep), fill=factor(rep))) +
    geom_histogram(aes(y=after_stat(density)), position="identity", alpha=0.5)+
    geom_density(alpha=0.6)+
    facet_wrap(~ Env)+
    #scale_color_manual(values=c("#56B4E9", "#fe6f5e"))+
    labs(title="", x="Yield", y="Density", fill="rep")+
    theme_classic() +
    guides(color="none") +
    theme(plot.title = element_text(size=20, face = "bold"), 
          axis.text=element_text(size=16, face="bold"),
          strip.text = element_text(size = 16, face = "bold"),
          axis.title=element_text(size=18, face="bold"))
```

---

# How to estimate genetic variances

### Basic steps

#### 1. Relatives are developed through some sort of mating design.
- Half-sib design with the S2 families

#### 2. The progeny are evaluated in a set of environments.
- RCB design with 2 reps in 3 environments

#### 3. Variance components are estimated from the mean squares in the __analysis of variance__.

#### 4. The variance components are interpreted in terms of the covariances between relatives.

---
# Half-sib design

`\begin{align*}
p_{ijr} = \mu + f_i + l_j + f_i \times l_j + b_{r} + e_{ijr}
\end{align*}`

- where `$p_{ijr}$` is the phenotypic value of the progeny of the `$i$`th father (half-sib family) evaluated in the `$j$`th environment and the `$r$`th replication,
- `$f_i$` is the effect of the `$i$`th father,
- `$l_{j}$` is the effect of the `$j$`th environment (or location),
- `$f_i \times l_j$` is the interaction effect of the `$i$`th father with the `$j$`th environment (or location),
- `$b_{r}$` is the effect of the `$r$`th replication,
- and `$e_{ijr}$` is the residual error. The `$e_{ijr}$` have expectation equal to zero.

### Fit a linear model

```r
fit <- lm(yield ~ fam + Env + fam:Env + rep, data=df)

summary(aov(fit))
```
---

# The general framework

ANOVA table for one type of progeny (one-factor design)

| Source        |    df     |  Observed MS      |  E(MS) |
| :------:      | :-------: | :--------------------:|:------: | 
| Environment   | `$e-1$`       |   |  |   
| Rep        | `$r-1$`    |   |  | 
| Progeny       | `$n-1$`       | `$MS_{progeny}$`  | `$V_e + rV_{PE} + reV_{progeny}$`       | 
| Progeny x E   | `$(n-1)(e-1)$`   |  `$MS_{PE}$` | `$V_e + rV_{PE}$`       | 
| pooled error  | `$(n-1)(r-1)e$`   |  `$MS_{error}$` | `$V_e$`       |

`\begin{align*}
V_{progeny} = \frac{MS_{progeny} - MS_{PE}}{re}
\end{align*}`

- `$MS_{error}$`: the mean squares for the pooled error
- `$MS_{PE}$`: mean squares for progeny `$\times$` environment interaction
- `$MS_{progeny}$`: mean squares for progeny
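
Setting each observed mean square equal to its expectation in the table gives moment estimates of all three components. The sketch below wraps this in a hypothetical helper, `var_components()`; the `$V_{PE}$` line is derived from the E(MS) column rather than stated on the slide, and the mean squares passed in are made-up numbers.

```r
# Method-of-moments estimates from the expected mean squares:
#   E(MS_error)   = V_e
#   E(MS_PE)      = V_e + r*V_PE
#   E(MS_progeny) = V_e + r*V_PE + r*e*V_progeny
var_components <- function(ms_progeny, ms_pe, ms_error, r, e) {
  c(V_e       = ms_error,
    V_PE      = (ms_pe - ms_error) / r,
    V_progeny = (ms_progeny - ms_pe) / (r * e))
}

# Illustrative mean squares (made up, not from the course data)
vc <- var_components(ms_progeny = 120, ms_pe = 20, ms_error = 10, r = 2, e = 3)
vc

# Sanity check: plugging the estimates back in recovers MS_progeny
ms_back <- unname(vc["V_e"] + 2 * vc["V_PE"] + 2 * 3 * vc["V_progeny"])
ms_back
```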

---
# The Soybean half-sib example

#### 3. Variance components are estimated from the mean squares in the __analysis of variance__.

ANOVA table for half-sib families.

| Source        |    df     |  Observed MS      |  E(MS) |
| :------:      | :-------: | :--------------------:|:------: | 
| Environment   | `$e-1=2$`       |   |  |   
| Rep/E        | `$r-1=1$`    |   |  | 
| __HS Families__       | `$n-1=95$`       | `$MS_{progeny}=255.84$`  | `$V_e + rV_{PE} + reV_{progeny}$`       | 
| __HS F x E__   | `$(n-1)(e-1)=190$`   |  `$MS_{PE}=7.77$` | `$V_e + rV_{PE}$`       | 
| pooled error  | `$(n-1)(r-1)e=287$`   |  `$MS_{error}=6.67$` | `$V_e$`       |

`\begin{align*}
V_{progeny} = \frac{MS_{progeny} - MS_{PE}}{re}
\end{align*}`

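```r
out <- summary(aov(fit))[[1]]
# aov lists main effects before interactions, so the rows are
# fam, Env, rep, fam:Env, Residuals; column 3 holds the mean squares
vprogeny <- (out[1,3] - out[4,3])/(2*3)  # (MS_progeny - MS_PE)/(r*e)
```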
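
Extracting mean squares by row position depends on the order in which R lists the model terms. A possible alternative is to look them up by name, sketched here on a small simulated data set. The simulated families, effects, and object names (`sim`, `out_sim`, `ms`) are hypothetical; only the model formula mirrors the one used above.

```r
set.seed(1)
# Simulated balanced data: 10 families x 3 environments x 2 reps (made up)
sim <- expand.grid(fam = factor(paste0("HS", 1:10)),
                   Env = factor(paste0("E", 1:3)),
                   rep = factor(1:2))
fam_eff   <- rnorm(10, sd = 5)  # random family effects
sim$yield <- 50 + fam_eff[as.integer(sim$fam)] + rnorm(nrow(sim), sd = 2)

out_sim <- summary(aov(yield ~ fam + Env + fam:Env + rep, data = sim))[[1]]

# Row names are padded with spaces, so trim them before matching by name
rownames(out_sim) <- trimws(rownames(out_sim))
ms <- setNames(out_sim[["Mean Sq"]], rownames(out_sim))

r <- nlevels(sim$rep)
e <- nlevels(sim$Env)
vprogeny_sim <- unname((ms["fam"] - ms["fam:Env"]) / (r * e))

rownames(out_sim)  # term order as reported by aov
vprogeny_sim
```

Name-based lookup keeps working if terms are added to or reordered in the formula, whereas `out[1,3]` and `out[4,3]` would silently point at different rows.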

---
# The Soybean half-sib example

#### 4. The variance components are interpreted in terms of the covariances between relatives.

- Half-sibs: `$V_{progeny} = \frac{1+F}{4} V_A$`
  - `$V_A  = \frac{4}{1+F} V_{progeny}$`

Therefore,
`\begin{align*}
V_{A} = \frac{4}{1+F} \times 41.3 \approx 110
\end{align*}`

```r
Va = 4/(1+1/2)*vprogeny
```
  
---
# The variance of a progeny mean `$V_{\bar{Y}}$`

`$V_{\bar{Y}}$` measures the sampling variation in the mean of a single group of individuals (e.g., half-sibs in this example).

- In this example, soybean yield is not measured for each plant in a plot.
- Rather, each plot is harvested by machine and the total yield from all the plants in each plot is recorded.

- The variance of the measurement for an individual plot is `$V_e + rV_{PE}$`, which is estimated by `$MS_{PE}$`.

Therefore, 
`\begin{align*}
V_{\bar{Y}} = \frac{MS_{PE}}{re} = \frac{V_e}{re} + \frac{V_{PE}}{e}
\end{align*}`

The variance of a progeny mean is equal to the variance of an individual plot divided by the number of observations for each group of progeny.
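
As a numerical check, both sides of the identity above can be evaluated with the mean squares from the soybean ANOVA table (`$MS_{PE}=7.77$`, `$MS_{error}=6.67$`). The `$V_{PE}$` estimate used here is derived from the E(MS) column and is an added step, not a value reported on the slides.

```r
# Mean squares from the soybean half-sib ANOVA table
ms_pe    <- 7.77
ms_error <- 6.67
r <- 2  # replications
e <- 3  # environments

# Components implied by the expected mean squares
V_e  <- ms_error
V_PE <- (ms_pe - ms_error) / r

# Two equivalent routes to the variance of a progeny mean
v_ybar_ms    <- ms_pe / (r * e)
v_ybar_parts <- V_e / (r * e) + V_PE / e
c(from_MS = v_ybar_ms, from_components = v_ybar_parts)
```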

---
# The Soybean half-sib example

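Using `$MS_{PE}=7.77$` from the ANOVA table for half-sib families, with `$r=2$` and `$e=3$`:

```r
# Variance of a progeny mean: MS_PE/(r*e)
Vp = 7.77/(2*3)
```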

---

# Narrow-sense heritability `$h^2$`

In this case, since individual-plant measurements are unavailable, the exact narrow-sense heritability ( `$h^2$` ) cannot be estimated.

But the `$h^2$` on a progeny-mean basis can be estimated as

`\begin{align*}
h^2_{HS} & = \frac{V_{progeny}}{V_{progeny} + V_{\bar{Y}}} \\
\end{align*}`

```r
h2 = vprogeny/(vprogeny + Vp)
```
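
Because `$V_{progeny}$` and `$V_{\bar{Y}}$` share the divisor `$re$`, the ratio simplifies to `$1 - MS_{PE}/MS_{progeny}$`. The self-contained sketch below, which uses the mean squares from the soybean ANOVA table and illustrative variable names, computes the estimate both ways; the result is about 0.97.

```r
# Mean squares from the soybean half-sib ANOVA table
ms_progeny <- 255.84
ms_pe      <- 7.77
r <- 2  # replications
e <- 3  # environments

v_progeny <- (ms_progeny - ms_pe) / (r * e)  # variance among half-sib families
v_ybar    <- ms_pe / (r * e)                 # variance of a progeny mean

h2_components <- v_progeny / (v_progeny + v_ybar)
h2_ratio      <- 1 - ms_pe / ms_progeny      # algebraically the same quantity
c(h2_components, h2_ratio)
```

The shortcut makes clear that the estimate depends only on how large the family mean square is relative to the family-by-environment mean square.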

---
# Another example for inbred lines

<div align="center">
<img src="uav.png" height=280>
</div>

> Rodene et al., The Plant Phenome Journal, 2022.

- The inbred lines ( `$n=230$` ) are assumed to be a random sample of genotypes from the population.
- Two environments (with or without N treatment, `$e=2$`)
- Each with two replications ( `$r=2$` )

---
# Another example for inbred lines

`\begin{align*}
p_{ijk} = \mu + g_i + t_j + g_i \times t_j +r_k + e_{ijk}
\end{align*}`

- where `$p_{ijk}$` is the phenotype value of the `$i$`th genotype evaluated in the `$j$`th treatment with the `$k$`th rep,
- `$g_i$` is the effect of the `$i$`th genotype,
- `$t_{j}$` is the effect of the `$j$`th treatment (or environment),
- `$g_i \times t_j$` is the interaction effect of the `$i$`th genotype with the `$j$`th treatment,
- `$r_{k}$` is the effect of the `$k$`th rep,
- and `$e_{ijk}$` is the residual error. The `$e_{ijk}$` have expectation equal to zero.

```r
cc <- read.csv("data/ppj220030-sup-0002-tables1.csv")
table(cc$date)  # number of records per date

### add replication information: rows below 3000 are Rep1, the rest Rep2
cc$Rep <- "Rep2"
cc$Rep[cc$Row < 3000] <- "Rep1"

# keep the July 6 measurements only
j6 <- subset(cc, date == "July6")

fit <- lm(Canopy_Coverage ~ Genotype + Treatment + Genotype:Treatment + Rep, 
          data=j6)
summary(aov(fit))
```

---
# `$h^2$` for Canopy Coverage

| Source        |    df     |  Observed MS      |  E(MS) |
| :------:      | :-------: | :--------------------:|:------: | 
| Environment   | `$e-1=1$`       |   |  |   
| Replications/E        | `$r-1=1$`    |   |  | 
| Inbred lines       | `$n-1=232$`       | `$MS_{progeny}=275$`  | `$V_e + rV_{G \times E} + reV_{progeny}$`       | 
| Inbreds x E   | `$(n-1)(e-1)=224$`   |  `$MS_{PE}=31$` | `$V_e + rV_{G \times E}$`       | 
| pooled error  | `$(n-1)(r-1)e=419$`   |  `$MS_{error}=32$` | `$V_e$`       |

- Inbred lines: `$V_{progeny} = V_A$`
  - `$V_{progeny} = V_A = \frac{MS_{progeny} - MS_{PE}}{re} = \frac{275 -31}{2 \times 2} = 61$`

The `$h^2$` on an entry-mean (line-mean) basis can be estimated as

`\begin{align*}
h^2 & = \frac{V_{A}}{V_{A} + V_{\bar{Y}}} \\
    & = \frac{V_{A}}{V_{A} + V_{G \times E}/e + V_{e}/(re)} \\
    & = \frac{61}{61 + 31/4} \approx 0.89 \\
\end{align*}`
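
The arithmetic above can be reproduced from the rounded mean squares in the table. In this sketch the moment estimate of `$V_{G \times E}$` comes out slightly negative because `$MS_{PE}$` (31) is smaller than `$MS_{error}$` (32). Truncating it at zero is a common convention but an assumption here, not something the slides state, so both versions are shown.

```r
# Rounded mean squares from the canopy coverage ANOVA table
ms_lines <- 275
ms_pe    <- 31
ms_error <- 32
r <- 2  # replications
e <- 2  # environments (with or without N)

V_A  <- (ms_lines - ms_pe) / (r * e)  # variance among inbred lines
V_GE <- (ms_pe - ms_error) / r        # moment estimate, negative here
V_e  <- ms_error

# Estimate as on the slide: the noise term is MS_PE/(r*e)
h2_ms <- V_A / (V_A + ms_pe / (r * e))

# Variant with the negative V_GE truncated at zero (assumed convention)
h2_trunc <- V_A / (V_A + max(V_GE, 0) / e + V_e / (r * e))

round(c(V_A = V_A, V_GE = V_GE, h2_ms = h2_ms, h2_trunc = h2_trunc), 3)
```

The two estimates differ only in the third decimal place, so the conclusion of a high entry-mean heritability does not hinge on how the negative component is handled.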