class: center, middle, inverse, title-slide

# Phenotypic and Genotypic variances
### Jinliang Yang
### March 22nd, 2022

---

# Phenotypic variance

<div align="center">
<img src="height.png" height=200>
</div>

A population can be characterized by its __allele__ and __genotype frequencies__. A phenotype in a population can be characterized in terms of its __mean__ and __variance__.

- Phenotypic values are due to genetic ( `\(G\)` ) and environmental ( `\(E\)` ) effects

--

- Now, we will partition the phenotypic variance into different causal components.

---

# Mean, variance and covariance

### Expectation: a measure of the __mean value__ of a variable

`\begin{align*}
E[X] &= \mu_X = \sum f_iX_i \\
\end{align*}`

where `\(f_i\)` is the frequency of the value `\(X_i\)`.

--

### Variance: a measure of the __spread__ of a variable

The __variance__ is defined as the mean of the squared deviations of a random variable from the population mean.

`\begin{align*}
E(X_i - \mu)^2 & = E(X^2) - E(X)^2 \\
& = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`

---

# Mean, variance and covariance

### Covariance: a measure of the extent to which two variables **co-vary**

The __covariance__ is a measure of the joint variation between __two variables__.

`\begin{align*}
Cov(X, Y) & = E([X - E(X)][Y - E(Y)]) \\
& = E(XY) - E(X)E(Y) \\
\end{align*}`

where

`\begin{align*}
E(XY) = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j)
\end{align*}`

--

### The variance of a sum

When a variable is the sum of two or more components, its variance can be partitioned into the variances of the components and the covariances among the components.

`\begin{align*}
& Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y) \\
\end{align*}`

---

# Phenotypic model: P = G + E

Phenotypic values are due to genetic and environmental effects.
`\begin{align*}
& Var(P) = Var(G+E) \\
& V_P = V_G + V_E + 2Cov_{G \times E}
\end{align*}`

- `\(V_G\)` and `\(V_E\)` are the __variances__ of the genotypic and environmental effects.
- `\(Cov_{G \times E}\)` is the __covariance__ between genotypic effects and environmental effects.
- Normally, we assume `\(Cov_{G \times E} =0\)`

--

Breeders conduct performance trials in multiple environments.
- As such, the genotype-by-environment interaction variance, `\(V_{G \times E}\)`, should be considered.

---

# Additive and dominance variance

- Genotypic model: `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.

`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
\end{align*}`

--

------------------

Now, let's calculate `\(Cov_{A \times D}\)`:

`\begin{align*}
Cov(X, Y) & = E(XY) - E(X)E(Y) \\
E(XY) & = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j)
\end{align*}`

--

| Genotype | Freq | Breeding Value | Dominance Deviation | `\(AD\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: |
| `\(A_1A_1\)` | `\(p^2\)` | `\(2q\alpha\)` | `\(-2q^2d\)` | |
| `\(A_1A_2\)` | `\(2pq\)` | `\((q-p)\alpha\)` | `\(2pqd\)` | |
| `\(A_2A_2\)` | `\(q^2\)` | `\(-2p\alpha\)` | `\(-2p^2d\)` | |

---

# Additive and dominance variance

- Genotypic model: `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.
`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
\end{align*}`

------------------

Now, let's calculate `\(Cov_{A \times D}\)`:

`\begin{align*}
Cov(X, Y) & = E(XY) - E(X)E(Y) \\
E(XY) & = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j)
\end{align*}`

| Genotype | Freq | Breeding Value | Dominance Deviation | `\(AD\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: |
| `\(A_1A_1\)` | `\(p^2\)` | `\(2q\alpha\)` | `\(-2q^2d\)` | `\(2q\alpha \times (-2q^2d)\)` |
| `\(A_1A_2\)` | `\(2pq\)` | `\((q-p)\alpha\)` | `\(2pqd\)` | `\((q-p)\alpha \times 2pqd\)` |
| `\(A_2A_2\)` | `\(q^2\)` | `\(-2p\alpha\)` | `\(-2p^2d\)` | `\(-2p\alpha \times (-2p^2d)\)` |

---

# Additive and dominance variance

- Genotypic model: `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.

`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
& Cov_{A \times D} = 0 \\
\end{align*}`

| Genotype | Freq | Breeding Value | Dominance Deviation | `\(AD\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: |
| `\(A_1A_1\)` | `\(p^2\)` | `\(2q\alpha\)` | `\(-2q^2d\)` | `\(2q\alpha \times (-2q^2d)\)` |
| `\(A_1A_2\)` | `\(2pq\)` | `\((q-p)\alpha\)` | `\(2pqd\)` | `\((q-p)\alpha \times 2pqd\)` |
| `\(A_2A_2\)` | `\(q^2\)` | `\(-2p\alpha\)` | `\(-2p^2d\)` | `\(-2p\alpha \times (-2p^2d)\)` |

`\begin{align*}
E(XY) & = \sum_i \sum_j x_i y_j Pr(X = x_i, Y = y_j) \\
E(AD) &= p^2\times (2q\alpha)\times (-2q^2d) + 2pq \times (q-p)\alpha \times 2pqd + q^2 \times (-2p\alpha) \times (-2p^2d) \\
&= 4\alpha d p^2q^2(-q + q -p +p) \\
&=0
\end{align*}`

Because the breeding values and the dominance deviations each have a mean of zero, `\(E(A)E(D) = 0\)` and therefore `\(Cov_{A \times D} = E(AD) = 0\)`.

---

# Additive and dominance variance

- Genotypic model: `\(G = A + D\)`
- The genotypic value could be partitioned into the breeding value and dominance deviation.
`\begin{align*}
& V_G = V_A + V_D + 2Cov_{A \times D} \\
& Cov_{A \times D} = 0 \\
& V_G = V_A + V_D \\
\end{align*}`

--

------------------

Now, let's calculate `\(V_A\)` and `\(V_D\)`:

### Variance: a measure of the __spread__ of a variable

`\begin{align*}
E(X_i - \mu)^2 & = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`

---

# `\(V_A\)`

`\begin{align*}
E(X_i - \mu)^2 & = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`

| Genotype | Freq | Breeding Value | `\(A^2\)` | Dominance Deviation | `\(D^2\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: | :-------: |
| `\(A_1A_1\)` | `\(p^2\)` | `\(2q\alpha\)` | `\((2q\alpha)^2\)` | `\(-2q^2d\)` | |
| `\(A_1A_2\)` | `\(2pq\)` | `\((q-p)\alpha\)` | `\((q-p)^2\alpha^2\)` | `\(2pqd\)` | |
| `\(A_2A_2\)` | `\(q^2\)` | `\(-2p\alpha\)` | `\((-2p\alpha)^2\)` | `\(-2p^2d\)` | |

--

These breeding values have a mean of zero, so their variance is the sum of the products of the genotype frequencies and the squared breeding values:

`\begin{align*}
V_A & = p^2(2q\alpha)^2 + 2pq(q-p)^2\alpha^2 + q^2(-2p\alpha)^2 \\
& = 2pq\alpha^2(2pq + (q-p)^2 + 2pq) \\
& = 2pq\alpha^2(p+q)^2 \\
& = 2pq\alpha^2 \\
& = 2pq(a + d(q-p))^2 \\
\end{align*}`

---

# `\(V_D\)`

`\begin{align*}
E(X_i - \mu)^2 & = \sum{f_iX_i^2} - \mu^2 \\
\end{align*}`

| Genotype | Freq | Breeding Value | `\(A^2\)` | Dominance Deviation | `\(D^2\)` |
| :-------: | :-------: | :-----------: | :-------: | :-------: | :-------: |
| `\(A_1A_1\)` | `\(p^2\)` | `\(2q\alpha\)` | `\((2q\alpha)^2\)` | `\(-2q^2d\)` | `\((-2q^2d)^2\)` |
| `\(A_1A_2\)` | `\(2pq\)` | `\((q-p)\alpha\)` | `\((q-p)^2\alpha^2\)` | `\(2pqd\)` | `\((2pqd)^2\)` |
| `\(A_2A_2\)` | `\(q^2\)` | `\(-2p\alpha\)` | `\((-2p\alpha)^2\)` | `\(-2p^2d\)` | `\((-2p^2d)^2\)` |

Likewise, the dominance deviations have a mean of zero, so the variance due to dominance deviations is the sum of the products of the genotype frequencies and the squared dominance deviations:
`\begin{align*}
V_D & = p^2(-2q^2d)^2 + 2pq(2pqd)^2 + q^2(-2p^2d)^2 \\
& = 4p^2q^2d^2(q^2 + 2pq + p^2) \\
& = 4p^2q^2d^2 \\
& = (2pqd)^2 \\
\end{align*}`

---

# Understanding `\(V_A\)` and `\(V_D\)`

`\begin{align*}
V_A & = 2pq\alpha^2 \\
& = 2pq(a + d(q-p))^2 \\
\end{align*}`

`\(V_A\)` is a function of

- Allele frequencies, which vary __among populations__.
- Genotypic values ( `\(a\)` and `\(d\)` ), which are intrinsic properties of genotypes in a one-locus model.

Because `\(d\)` contributes to `\(V_A\)`, a nonzero `\(V_A\)` does not imply that the alleles act in a purely additive manner.

---

# Understanding `\(V_A\)` and `\(V_D\)`

`\begin{align*}
V_D & = (2pqd)^2 \\
& = 4p^2q^2d^2
\end{align*}`

`\(V_D\)` is a function of

- Allele frequencies, which vary __among populations__.
- The level of dominance ( `\(d\)` )
  - `\(V_D\)` is zero when dominance is absent ( `\(d=0\)` )
  - In this situation, the intralocus variance comprises only `\(V_A\)`

---

# Covariance between relatives

In humans, close relatives such as a parent and its offspring have a higher degree of __resemblance__ than more distant relatives such as an uncle and a niece.

--

#### Genetics

The __covariance between relatives__ measures the degree of genetic resemblance between related individuals in a population.

- By definition, the covariance between unrelated individuals is zero.

--

#### Nongenetics

Nongenetic factors can also contribute to the degree of resemblance between relatives.

- In plants, we assume that nongenetic effects among relatives are uncorrelated.
- This assumption is met through the __randomization procedure__ in the experimental designs used in plant breeding.
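---

# Understanding `\(V_A\)` and `\(V_D\)`

The closed forms `\(V_A = 2pq\alpha^2\)` and `\(V_D = (2pqd)^2\)`, together with `\(Cov_{A \times D} = 0\)`, can be checked numerically. The following is a minimal Python sketch and not part of the original slides: the function name `variance_components` and the example values of `p`, `a`, and `d` are hypothetical. It assumes Hardy-Weinberg genotype frequencies and genotypic values of `+a`, `d`, and `-a` for `\(A_1A_1\)`, `\(A_1A_2\)`, and `\(A_2A_2\)`.

```python
def variance_components(p, a, d):
    """Return (V_A, V_D, Cov_AD, V_G) for one locus with two alleles.

    Assumes Hardy-Weinberg frequencies and genotypic values +a, d, -a
    for A1A1, A1A2, A2A2 (an assumption consistent with
    alpha = a + d(q - p)).
    """
    q = 1.0 - p
    alpha = a + d * (q - p)  # average effect of an allele substitution
    freqs = [p * p, 2 * p * q, q * q]
    breeding = [2 * q * alpha, (q - p) * alpha, -2 * p * alpha]
    dominance = [-2 * q * q * d, 2 * p * q * d, -2 * p * p * d]
    genotypic = [a, d, -a]

    # A and D both have mean zero, so E(A^2), E(D^2) and E(AD)
    # are already the variances and the covariance.
    v_a = sum(f * x * x for f, x in zip(freqs, breeding))
    v_d = sum(f * y * y for f, y in zip(freqs, dominance))
    cov_ad = sum(f * x * y for f, x, y in zip(freqs, breeding, dominance))

    # Genotypic variance computed directly from the genotypic values.
    mean_g = sum(f * g for f, g in zip(freqs, genotypic))
    v_g = sum(f * (g - mean_g) ** 2 for f, g in zip(freqs, genotypic))
    return v_a, v_d, cov_ad, v_g


p, a, d = 0.3, 2.0, 1.0  # hypothetical example values
q = 1.0 - p
v_a, v_d, cov_ad, v_g = variance_components(p, a, d)
print("V_A:", v_a, "closed form:", 2 * p * q * (a + d * (q - p)) ** 2)
print("V_D:", v_d, "closed form:", (2 * p * q * d) ** 2)
print("Cov(A, D):", cov_ad)
print("V_G:", v_g, "V_A + V_D:", v_a + v_d)
```

Changing `p` while holding `a` and `d` fixed shows how both components depend on the allele frequencies of the population.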
---

# General framework for genetic covariance

<div align="center">
<img src="fig6.3.png" height=150>
</div>

If allele `\(A_i\)` carried by individual `\(X\)` is identical by descent (IBD) to allele `\(A_k\)` in `\(Y\)`, then the covariance due to this allele is:

`\begin{align*}
Cov(\alpha_i, \alpha_k) & = E[(\alpha_i - \mu_{\alpha})(\alpha_k - \mu_{\alpha})] \\
& = E[(\alpha_i - \mu_{\alpha})^2] \\
& = Var(\alpha_i) \\
& = V_{\alpha_i}
\end{align*}`

This holds because `\(\alpha_i = \alpha_k\)` if alleles `\(A_i\)` and `\(A_k\)` are IBD.

---

# Additive genetic covariance

<div align="center">
<img src="fig6.3.png" height=150>
</div>

Alleles in individuals `\(X (A_iA_j)\)` and `\(Y (A_kA_l)\)` can be IBD through four possible events:

`\begin{align*}
& A_i \equiv A_k \\
& A_i \equiv A_l \\
& A_j \equiv A_k \\
& A_j \equiv A_l \\
\end{align*}`

The probability of each of these four events is equal to `\(f_{XY}\)`, __the coefficient of coancestry__ between `\(X\)` and `\(Y\)`.

---

# Additive genetic covariance

<div align="center">
<img src="fig6.3.png" height=150>
</div>

Therefore, the covariance due to additive genetic effects (__breeding values__) is:

`\begin{align*}
Cov_\alpha(X, Y) = & P(x_i \equiv y_k)Cov(\alpha_i, \alpha_k) + P(x_i \equiv y_l)Cov(\alpha_i, \alpha_l) \\
& + P(x_j \equiv y_k)Cov(\alpha_j, \alpha_k) + P(x_j \equiv y_l)Cov(\alpha_j, \alpha_l) \\
= & 4f_{XY} V_{\alpha_i} \\
= & 2f_{XY} V_A \\
\end{align*}`

This follows because `\(V_A = V(\alpha_i + \alpha_j) = 2V_{\alpha_i}\)` and `\(\alpha_i = \alpha_k\)` when alleles `\(i\)` and `\(k\)` are IBD.
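
As a short worked check of the step `\(V_A = 2V_{\alpha_i}\)`, consider the one-locus, two-allele case. This sketch assumes that alleles `\(A_1\)` and `\(A_2\)` have frequencies `\(p\)` and `\(q\)` and average effects `\(q\alpha\)` and `\(-p\alpha\)` (half the homozygote breeding values tabulated earlier), and that the two alleles within an individual are uncorrelated, as under random mating. The allele effects have a mean of zero, so:

`\begin{align*}
V_{\alpha_i} & = p(q\alpha)^2 + q(-p\alpha)^2 \\
& = pq\alpha^2(q + p) \\
& = pq\alpha^2 \\
2V_{\alpha_i} & = 2pq\alpha^2 = V_A \\
\end{align*}`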
---

# Covariance due to dominance deviations

<div align="center">
<img src="fig6.3.png" height=150>
</div>

For dominance deviations to covary, both alleles of `\(X\)` must be IBD to the two alleles of `\(Y\)`. This can happen in two ways:

`\begin{align*}
& A_i \equiv A_k, A_j \equiv A_l\\
& A_i \equiv A_l, A_j \equiv A_k \\
\end{align*}`

--

Therefore,

`\begin{align*}
Cov_\delta(X, Y) = & P(x_i \equiv y_k, x_j \equiv y_l)Cov(\delta_{ij}, \delta_{kl}) + P(x_i \equiv y_l, x_j \equiv y_k)Cov(\delta_{ij}, \delta_{kl}) \\
= & (P(x_i \equiv y_k, x_j \equiv y_l) + P(x_i \equiv y_l, x_j \equiv y_k))Cov(\delta_{ij}, \delta_{kl}) \\
= & \Delta_{XY}V_D \\
\end{align*}`

where `\(\Delta_{XY}\)` is the probability that both alleles of `\(X\)` are IBD to the two alleles of `\(Y\)`.

---

# Genetic covariances for general relatives

`\begin{align*}
& Cov_\alpha(X, Y) = 2f_{XY} V_A \\
& Cov_\delta(X, Y) = \Delta_{XY} V_D \\
\end{align*}`

--

The genetic covariance between relatives is then:

`\begin{align*}
Cov_G(X, Y) = 2f_{XY}V_A + \Delta_{XY}V_D \\
\end{align*}`

--

### Simplify it:

`\begin{align*}
Cov_G = rV_A + uV_D
\end{align*}`

where

`\begin{align*}
& r = 2f_{XY} \\
& u = \Delta_{XY} \\
\end{align*}`

---

# Genetic covariances for general relatives

`\begin{align*}
Cov_G = rV_A + uV_D
\end{align*}`

where

`\begin{align*}
& r = 2f_{XY} \\
& u = \Delta_{XY} \\
\end{align*}`

Note that `\(u\)` is normally zero unless the two relatives can share alleles IBD through __both of their respective parents__, for example, full sibs and double first cousins.

--

| | Relationship | Coancestry | r | u |
| :-------: | :-------: | :-----------: | :-----------: | :-------: |
| First degree | Parent:offspring | 1/4 | 1/2 | 0 |
| | Full sibs | 1/4 | 1/2 | __1/4__ |
| Second degree | Half sibs | 1/8 | 1/4 | 0 |
| | Grandparent:grandoffspring | 1/8 | 1/4 | 0 |
| Third degree | Great-grandparent:great-grandoffspring | 1/16 | 1/8 | 0 |

---

# Parent-offspring

### From Breeding value

- Parent genotypic value: `\(G = A + D\)`.
- Offspring: the expected genotypic value is half the breeding value of the parent, `\(G= \frac{1}{2}A\)`.

--

Now, let's compute the covariance between a parent and its offspring.
`\begin{align*}
Cov(P, O) & = Cov(A + D, \frac{1}{2}A) \\
& = \frac{1}{2}Cov(A, A) + \frac{1}{2}Cov(A, D) \\
& = \frac{1}{2}V_A \\
\end{align*}`

This follows because `\(Cov(A, A) = V_A\)` and `\(Cov(A, D) = 0\)`.

---

# Variance of testcross

Suppose individuals from one population are crossed to a common tester from another population. The variance among the resulting testcrosses ( `\(V_{testcross}\)` ) is obtained from the genotype frequencies and the testcross means.

--

| Genotype | Freq with inbreeding | Testcross mean |
| :-------: | :-------: | :-----------: |
| `\(A_1A_1\)` | `\(p^2 + pqF\)` | `\(q\alpha_T\)` |
| `\(A_1A_2\)` | `\(2pq (1-F)\)` | `\(1/2(q-p)\alpha_T\)` |
| `\(A_2A_2\)` | `\(q^2 + pqF\)` | `\(- p\alpha_T\)` |

Here `\(F\)` denotes the inbreeding coefficient and `\(\alpha_T = a + d(q_T - p_T)\)` is the average effect of an allele substitution evaluated at the allele frequencies of the tester.

--

The variance among testcrosses is then obtained as the sum of the products of the genotype frequencies and the squared testcross means.

`\begin{align*}
V_{testcross} & = (p^2 + pqF)(q\alpha_T)^2 + 2pq (1-F)(1/2(q-p)\alpha_T)^2 + (q^2 + pqF)(- p\alpha_T)^2\\
& = \frac{1}{2}(1+F)pq\alpha_T^2 \\
& = \frac{1}{2}(1+F)pq(a + d (q_T - p_T))^2 \\
& = \frac{1}{2}(1+F)V_{\alpha_i^T} \\
\end{align*}`

---

# Variance of testcross

`\begin{align*}
V_{testcross} & = \frac{1}{2}(1+F)pq(a + d (q_T - p_T))^2 \\
& = \frac{1}{2}(1+F)V_{\alpha_i^T} \\
\end{align*}`

`\(V_{testcross}\)` is therefore a function of the allele frequencies

- in the population ( `\(p\)` and `\(q\)` ) being testcrossed
- and in the tester ( `\(p_T\)` and `\(q_T\)` )

--

This result is useful for predicting `\(V_{testcross}\)` at different __selfing generations__ and for determining the __appropriate generation for testcrossing__ in hybrid breeding programs.