A Linear Regression Perspective

G = A + D

  • A represents the breeding value (i.e., A = \(\alpha_i + \alpha_j\) for a genotype carrying alleles \(i\) and \(j\))
  • D represents the dominance deviation

Breaking \(A\) down further:

\[\begin{align*} G = & \alpha_1N_1 + \alpha_2N_2 + \delta \end{align*}\]

where,
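- \(\alpha_i\) is the average effect of allele \(i\), and \(\alpha = \alpha_1 - \alpha_2\) is the average effect of allele substitution
- \(N_i\) is the number of copies of allele \(i\) carried by the genotype
- \(N_i \in \{0, 1, 2\}\) for a bi-allelic locus, with \(N_1 + N_2 = 2\)
- \(\delta\) is the dominance deviation, i.e., \(D\)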

Therefore,

\[\begin{align*} G = & \alpha_1N_1 + \alpha_2N_2 + \delta \\ = & \alpha_1N_1 + \alpha_2(2 - N_1) + \delta \\ = & 2\alpha_2 + (\alpha_1 - \alpha_2)N_1 + \delta \\ = & (2\alpha_2 + \delta) + \alpha N_1 \end{align*}\]

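Note that the intercept and slope terms together form the breeding value, and the residual is the dominance deviation:

\[\begin{align*} A = & 2\alpha_2 + \alpha N_1 \\ D = & G - A = \delta \end{align*}\]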
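The regression view can be checked numerically. The sketch below is only an illustration, not the original course code: the allele frequency `p`, the genotypic values `a` and `d`, and Hardy-Weinberg genotype frequencies are all assumed. It regresses the mean-centred genotypic value on \(N_1\), weighting each genotype by its frequency, so that the slope estimates \(\alpha\), the fitted values are the breeding values, and the residuals are the dominance deviations.

```r
# Assumed illustrative values (not given in the text)
p <- 1/3   # frequency of allele 1
q <- 1 - p # frequency of allele 2
a <- 1     # genotypic value of the N1 = 2 homozygote (the other homozygote is -a)
d <- -1    # genotypic value of the heterozygote

dat <- data.frame(
  N1   = 0:2,
  G    = c(-a, d, a),
  freq = c(q^2, 2 * p * q, p^2)  # Hardy-Weinberg frequencies (assumed)
)

# Centre the genotypic values on the population mean
dat$gv <- dat$G - weighted.mean(dat$G, dat$freq)

# Frequency-weighted regression of genotypic value on allele count
fit   <- lm(gv ~ N1, data = dat, weights = freq)
alpha <- unname(coef(fit)["N1"])  # slope: average effect of allele substitution

dat$bv <- unname(fitted(fit))     # breeding value A = 2 * alpha_2 + alpha * N1
dat$dd <- dat$gv - dat$bv         # dominance deviation D = G - A

print(dat[, c("N1", "gv", "bv", "dd")])
```

With these assumed inputs the slope is \(2/3\), and the `gv`, `bv`, and `dd` columns agree with the output listed under "Apply the function".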

R functions

Built-in Functions

  • Almost everything in R is done through functions.
  • Here I’m only referring to the numeric and character functions that are commonly used when creating or recoding variables.
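As a brief illustration of the kind of built-in functions meant here, the sketch below applies a few base R numeric and character functions to made-up vectors. The genotype labels and variable names are hypothetical.

```r
# Numeric functions applied element-wise to a vector
x    <- c(4, 9, 16)
root <- sqrt(x)             # 2 3 4
logs <- log10(c(1, 10, 100)) # 0 1 2

# Character functions on hypothetical genotype labels
geno   <- c("AA", "Aa", "aa", "Aa")
first  <- substr(geno, 1, 1) # first allele of each genotype
upper  <- toupper(geno)      # all labels in upper case

# Recoding: count copies of allele "A" by deleting every "a" and
# measuring what is left
n_A <- nchar(gsub("a", "", geno))

# Recoding with ifelse(): one copy of "A" means heterozygous
zygosity <- ifelse(n_A == 1, "het", "hom")

print(data.frame(geno, n_A, zygosity))
```

All of these functions are vectorised, so a whole column can be recoded in one call without an explicit loop.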

Apply the function

##   N1         gv         bv         dd
## 1  0 -0.2222222 -0.4444444  0.2222222
## 2  1 -0.2222222  0.2222222 -0.4444444
## 3  2  1.7777778  0.8888889  0.8888889

Rice data

Download the Rice Diversity Panel data RiceDiversity.44K.MSU6.Genotypes_PLINK.zip from http://ricediversity.org/data/sets/44kgwas/.
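The download can also be scripted from R. This is a minimal sketch: it assumes the zip file is served directly under the directory URL above (the combined `zip_url` is an assumption, not a link given in the text), and the helper `fetch_rice()` and the `data` folder are hypothetical names.

```r
base_url <- "http://ricediversity.org/data/sets/44kgwas/"
zip_name <- "RiceDiversity.44K.MSU6.Genotypes_PLINK.zip"
zip_url  <- paste0(base_url, zip_name)  # assumed direct link

# Hypothetical helper: download the archive once, then unpack it
fetch_rice <- function(dest_dir = "data") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, zip_name)
  if (!file.exists(zip_path)) {
    download.file(zip_url, destfile = zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = dest_dir)  # returns the extracted file paths
}
```

Calling `fetch_rice()` would then return the paths of the extracted files. If the direct link does not resolve, download the archive manually from the page above and place it in `dest_dir`.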