Rice data

Download the Rice Diversity Panel data RiceDiversity.44K.MSU6.Genotypes_PLINK.zip from http://ricediversity.org/data/sets/44kgwas/.

Phenotype data

Simulate QTLs controlling flowering time

Simulate a phenotype from the genotype data, given the number of QTLs and the heritability.

Apply the QTL simulation function with chosen values of the heritability h2 and the number of QTLs nqtl.

## ### [simpheno], read in [ 1000 ] SNPs for [ 413 ] plants and simulated [ 2 ] QTLs
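
The simulation code itself is not shown here, so the following is only a minimal Python sketch of one way such a simulation could work. The function name `simulate_phenotype`, the equal per-allele QTL effects, the uniform sampling of QTL positions, and the toy genotype table are all assumptions made for illustration; they are not the `simpheno` function reported in the output above.

```python
import random
import statistics

def simulate_phenotype(geno, nqtl=2, h2=0.5, seed=1):
    """geno: list of per-plant lists of allele counts (0, 1 or 2)."""
    if not 0 < h2 <= 1:
        raise ValueError("h2 must lie in (0, 1]")
    rng = random.Random(seed)
    n_snps = len(geno[0])
    # Assumption: QTLs are drawn uniformly at random from the SNPs.
    qtl_idx = rng.sample(range(n_snps), nqtl)
    # Assumption: each QTL adds one phenotypic unit per copy of the counted allele.
    g = [float(sum(plant[j] for j in qtl_idx)) for plant in geno]
    var_g = statistics.pvariance(g)
    if var_g == 0:
        raise ValueError("the sampled QTLs do not segregate")
    # Residual variance chosen so that var(G) / (var(G) + var(E)) equals h2.
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    pheno = [gi + rng.gauss(0.0, sd_e) for gi in g]
    return pheno, qtl_idx

# Toy genotypes standing in for the rice panel: 413 plants x 1000 SNPs.
rng = random.Random(0)
geno = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(1000)]
        for _ in range(413)]
pheno, qtl_idx = simulate_phenotype(geno, nqtl=2, h2=0.6)
print(len(pheno), sorted(qtl_idx))
```

The residual variance is set from the expected, not the realized, noise, so the heritability observed in any single simulated sample will scatter around the requested `h2`.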

Plot the phenotypic distribution:

Average effect of the markers

\(Y = G + E\)

\(Y = \sum_{i=1}^n{G_i} + E\) for a population with \(n\) markers

\(G = A + D\)

If we find \(A_i\) for the \(i\)th marker, then

\(\hat{Y} = \sum_{i=1}^n{A_i}\)

Single locus

  • A1A1 -> 2
  • A1A2 -> 1
  • A2A2 -> 0

Next we wrap it into a function to calculate the average effect for all the markers
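
As a rough illustration, the average effect at a marker can be taken as the least-squares slope of the phenotype on the allele count (2, 1, 0). The Python sketch below does this for one marker and then for every marker; the function names and the toy data are hypothetical, and the original implementation may differ.

```python
def average_effect(counts, pheno):
    """Least-squares slope of phenotype on allele count (2, 1, 0) at one marker."""
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(pheno) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    if sxx == 0:
        return float("nan")  # monomorphic marker: the slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, pheno))
    return sxy / sxx

def average_effects(geno, pheno):
    """Apply average_effect to every marker (column) of a plants x markers table."""
    return [average_effect(col, pheno) for col in zip(*geno)]

# Toy data: marker 0 tracks the phenotype, marker 1 is monomorphic.
geno = [[2, 1], [1, 1], [0, 1], [2, 1], [0, 1]]
pheno = [10.0, 8.0, 6.0, 10.0, 6.0]
effects = average_effects(geno, pheno)
print(effects)
```

Returning `nan` for monomorphic markers is a design choice of this sketch: it keeps the output aligned with the marker order instead of silently dropping columns.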

Additive and dominance variance

Breeder’s equation

\[\begin{align*} R & = ih^2\sigma_P/L \\ & = ih\sigma_A/L \\ \end{align*}\]
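
The step between the two forms of \(R\) uses the definition \(h^2 = \sigma_A^2/\sigma_P^2\), so that \(h = \sigma_A/\sigma_P\). Written out:

\[\begin{align*} h^2\sigma_P & = \frac{\sigma_A^2}{\sigma_P^2}\sigma_P \\ & = \frac{\sigma_A}{\sigma_P}\sigma_A \\ & = h\sigma_A \end{align*}\]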

Because \(G = A + D\), then

\[\begin{align*} \sigma_G^2 & = \sigma_A^2 + \sigma_D^2 + 2\sigma_{A, D} \end{align*}\]

And \(\sigma_{A, D}=0\) in a population in Hardy-Weinberg equilibrium (HWE); therefore,

\[\begin{align*} \sigma_G^2 & = \sigma_A^2 + \sigma_D^2 \end{align*}\]


| Genotype | Freq | Breeding Value | \(A^2\) | Dominance Deviation | \(D^2\) |
|---|---|---|---|---|---|
| \(A_1A_1\) | \(p^2\) | \(2q\alpha\) | \((2q\alpha)^2\) | \(-2q^2d\) | \((-2q^2d)^2\) |
| \(A_1A_2\) | \(2pq\) | \((q-p)\alpha\) | \((q-p)^2\alpha^2\) | \(2pqd\) | \((2pqd)^2\) |
| \(A_2A_2\) | \(q^2\) | \(-2p\alpha\) | \((-2p\alpha)^2\) | \(-2p^2d\) | \((-2p^2d)^2\) |

The additive and dominance genetic variances in a HWE population are:

\[\begin{align*} \sigma_A^2 & = 2pq(a + d(q-p))^2 \\ \sigma_D^2 & = (2pqd)^2 \\ \end{align*}\]
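
The closed forms can be checked numerically by weighting the squared breeding values and dominance deviations by the genotype frequencies. The Python sketch below does so with \(\alpha = a + d(q-p)\); the function name and the example values of \(p\), \(a\), and \(d\) are arbitrary choices for illustration.

```python
def hwe_variances(p, a, d):
    """Additive variance, dominance variance and their covariance under HWE."""
    q = 1.0 - p
    alpha = a + d * (q - p)  # average effect of allele substitution
    freqs = [p * p, 2 * p * q, q * q]                      # A1A1, A1A2, A2A2
    breeding = [2 * q * alpha, (q - p) * alpha, -2 * p * alpha]
    dominance = [-2 * q * q * d, 2 * p * q * d, -2 * p * p * d]
    # Both columns have mean zero, so the variances are frequency-weighted squares.
    var_a = sum(f * b * b for f, b in zip(freqs, breeding))
    var_d = sum(f * x * x for f, x in zip(freqs, dominance))
    cov_ad = sum(f * b * x for f, b, x in zip(freqs, breeding, dominance))
    return var_a, var_d, cov_ad

p, a, d = 0.3, 2.0, 1.0
q = 1.0 - p
var_a, var_d, cov_ad = hwe_variances(p, a, d)
print(var_a, 2 * p * q * (a + d * (q - p)) ** 2)  # table sum vs closed form
print(var_d, (2 * p * q * d) ** 2)
print(cov_ad)  # zero up to floating-point rounding
```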

A sib analysis

A swine breeder collected a set of data from a sib design:

  • 2 sires + 34 dams
  • Each pair of parents has four offspring
  • Body weight of the newborns was measured

Tasks:

  • Calculate the estimate of heritability
  • Comment on any possible bias in the estimated heritability

Get the ANOVA table

\(Y = S + D + E\)

\(V_P = V_S + V_D + V_E\)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## sire          1   6331    6331  17.978 3.73e-05 ***
## dam          33  24216     734   2.084  0.00141 ** 
## Residuals   163  57405     352                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Variance components in a sib design

| Source | df | Sums of Squares | MS | E(MS) |
|---|---|---|---|---|
| Sires | \(s-1\) | \(dn\sum\limits_{i=1}^s(\bar{p}_i - \bar{p})^2\) | \(MS_s\) | \(\sigma_w^2 + n\sigma_d^2 + dn\sigma_s^2\) |
| Dams (Sires) | \(s(d-1)\) | \(n\sum\limits_{i=1}^s\sum\limits_{j=1}^d(\bar{p}_{ij} - \bar{p}_i)^2\) | \(MS_d\) | \(\sigma_w^2 + n\sigma_d^2\) |
| Sibs (Dams) | \(sd(n-1)\) | \(\sum\limits_{i=1}^s\sum\limits_{j=1}^d\sum\limits_{k=1}^n(p_{ijk} - \bar{p}_{ij})^2\) | \(MS_w\) | \(\sigma_w^2\) |
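
Equating each observed mean square to its expectation and solving from the bottom row upward gives the variance components. The Python sketch below assumes a balanced design with \(d\) dams per sire and \(n\) offspring per dam; the function name and the input numbers are illustrative only and are not the swine data.

```python
def sib_variance_components(ms_s, ms_d, ms_w, d, n):
    """Solve the E(MS) equations of a balanced nested sib design."""
    var_w = ms_w                      # MS_w = var_w
    var_d = (ms_d - ms_w) / n         # MS_d = var_w + n * var_d
    var_s = (ms_s - ms_d) / (d * n)   # MS_s = var_w + n * var_d + d * n * var_s
    return var_s, var_d, var_w

# Illustrative mean squares (made up), with d = 3 dams per sire and n = 10 sibs per dam.
var_s, var_d, var_w = sib_variance_components(
    ms_s=170.0, ms_d=50.0, ms_w=20.0, d=3, n=10)
print(var_s, var_d, var_w)
```

With unbalanced families the coefficients \(n\) and \(dn\) are no longer simple counts, so this sketch should not be applied as-is to unequal family sizes.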

Summary of the variance components

A key concept in the ANOVA is that the variance component between families (groups) is equal to the covariance among members of the same family (group).

| Source | Observational component | Covariance and causal components estimated |
|---|---|---|
| Sires | \(\sigma_s^2 = Cov(HS)\) | \(=\frac{1}{4}\sigma_A^2\) |
| Dams | \(\sigma_d^2 = Cov(FS) - Cov(HS)\) | \(=\frac{1}{4}\sigma_A^2 + \frac{1}{4}\sigma_D^2\) |
| Progeny | \(\sigma_w^2 = V_P - Cov(FS)\) | \(= \frac{1}{2}\sigma_A^2 +\frac{3}{4}\sigma_D^2 + \sigma_E^2\) |
| Total | \(\sigma_T^2 = V_P = \sigma_s^2 + \sigma_d^2 + \sigma_w^2\) | \(=\sigma_A^2 + \sigma_D^2 + \sigma_E^2\) |
| Sires + Dams | \(\sigma_s^2 + \sigma_d^2 = Cov(FS)\) | \(=\frac{1}{2}\sigma_A^2 + \frac{1}{4}\sigma_D^2\) |

\(\sigma_s^2 = Cov(HS) = 1/4 \sigma_A^2\)

\(h^2 = V_A/V_P\)

## [1] 0.3462044

Vd

\(\sigma_d^2 = Cov(FS) - Cov(HS) = 1/4 \sigma_A^2 + 1/4 \sigma_D^2\)
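
Because \(4\sigma_d^2 = \sigma_A^2 + \sigma_D^2\), a heritability estimate built from the dam component is inflated by the dominance variance, whereas \(4\sigma_s^2 = \sigma_A^2\) is not. The Python sketch below contrasts the two estimates; the function name and the variance components are made-up values for illustration, not the swine results.

```python
def sib_heritabilities(var_s, var_d, var_w):
    """Sire-based and dam-based heritability estimates from sib variance components."""
    v_p = var_s + var_d + var_w   # total phenotypic variance
    h2_sire = 4 * var_s / v_p     # estimates var_A / V_P
    h2_dam = 4 * var_d / v_p      # estimates (var_A + var_D) / V_P
    return h2_sire, h2_dam

# Made-up components in which the dam component exceeds the sire component.
h2_sire, h2_dam = sib_heritabilities(var_s=3.0, var_d=4.0, var_w=20.0)
print(round(h2_sire, 4), round(h2_dam, 4))  # 0.4444 0.5926
```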