Genomic Selection

Jinliang Yang

April 22, 2025

The Linear Mixed Model

$y = Xb + Zu + e$, where,

  • y is a vector of observed phenotypes
  • X is the design or incidence matrix
  • b is the vector of the fixed effects to be estimated
  • Z is the incidence matrix for random effects
  • u is the vector of the random effects to be predicted
  • e is the vector of residuals.

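$$\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left(0, \begin{bmatrix} G\sigma^2_u & 0 \\ 0 & R\sigma^2_e \end{bmatrix}\right)$$

Or, $u \sim N(0, G\sigma^2_u)$ and $e \sim N(0, R\sigma^2_e)$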

The covariance matrix using pedigree

f_XY     Morex  Robust  Stander  Excel
Morex    1      1/2     11/32    7/16
Robust          1       43/64    27/32
Stander                 1        91/128
Excel                            1

$$V(u) = A V_A = \begin{bmatrix} 2 & 1 & 11/16 & 7/8 \\ 1 & 2 & 43/32 & 27/16 \\ 11/16 & 43/32 & 2 & 91/64 \\ 7/8 & 27/16 & 91/64 & 2 \end{bmatrix} V_A$$

Where the elements of A, the additive relationship matrix, are equal to twice the $f_{XY}$ among inbreds.
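
As a minimal sketch of this relationship, the coancestry values from the pedigree table can be entered in R and doubled to obtain A. The object names `cultivars`, `f`, and `A` are illustrative and do not come from the slides; the diagonal of 1 assumes fully inbred cultivars, as in the table.

```r
# Coefficients of coancestry (f_XY) among the four barley cultivars,
# entered as the upper triangle of the pedigree table and then mirrored
cultivars <- c("Morex", "Robust", "Stander", "Excel")
f <- diag(4)
dimnames(f) <- list(cultivars, cultivars)
f["Morex", "Robust"]   <- 1/2
f["Morex", "Stander"]  <- 11/32
f["Morex", "Excel"]    <- 7/16
f["Robust", "Stander"] <- 43/64
f["Robust", "Excel"]   <- 27/32
f["Stander", "Excel"]  <- 91/128
f[lower.tri(f)] <- t(f)[lower.tri(f)]   # fill the lower triangle by symmetry

# Additive relationship matrix: A = 2 * f
A <- 2 * f
A
```

Multiplying `A` by an estimate of $V_A$ would then give the covariance matrix of u.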


The genetic covariance between general relatives is

$$\mathrm{Cov}(X, Y) = 2 f_{XY} V_A + \Delta_{XY} V_D$$

Pedigree-based relationship

Parental contributions to progeny may differ from those expected from pedigrees.

  • For example, recombinant inbred lines (RILs) derived from an F2 population
  • Each RIL is expected to have a parental contribution of 50% from each parent, but the realized contribution, and hence $f_{XY}$, deviates from its expected value.

Genomic relationship

Marker similarity

The expected marker similarity between X and Y is equal to:

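$$S_{XY} = f_{XY} + (1 - f_{XY})\theta_{XY}$$

  • Where $\theta_{XY}$ is the probability that a marker allele drawn at random from X and a marker allele drawn at random from Y are identical by state (IBS), given that they are not identical by descent (IBD).
  • If $f_{XY} = 0$, $S_{XY} = \theta_{XY}$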

  • $\theta_{XY}$ can be estimated from unrelated individuals as the average nucleotide diversity, $\theta$

Genomic relationship

Rearrange the marker similarity equation and replace $\theta_{XY}$ with $\theta$:

$$f_{XY} = \frac{S_{XY} - \theta_{XY}}{1 - \theta_{XY}} = \frac{S_{XY} - \theta}{1 - \theta}$$

Marker-based $f_{XY}$: the probability of identity by descent (IBD) between two individuals, estimated after accounting for the frequency of marker alleles that are identical by state (IBS).

Example:

  1. Two unrelated individuals A and B: $\theta_{AB} = 0.5$
  2. A and C are related: $S_{AC} = 0.6$
  3. Therefore, $f_{AC} = \frac{S_{AC} - \theta}{1 - \theta} = \frac{0.6 - 0.5}{1 - 0.5} = 0.2$
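
The same arithmetic can be sketched in R. The helper name `marker_f` is hypothetical and introduced only for illustration; it applies the rearranged marker similarity equation directly.

```r
# Marker-based coefficient of coancestry:
# f_XY = (S_XY - theta) / (1 - theta)
marker_f <- function(S, theta) {
  (S - theta) / (1 - theta)
}

theta <- 0.5   # marker similarity between the unrelated individuals A and B
S_AC  <- 0.6   # observed marker similarity between A and C
f_AC  <- marker_f(S_AC, theta)
f_AC
```

When the observed similarity equals `theta`, the function returns 0, which matches the case of unrelated individuals.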

GBLUP

  • Step 1: Calculate coefficients of coancestry from genome-wide markers instead of pedigree records

  • Step 2: Replace the additive relationship matrix (A matrix) with the genomic relationship matrix (G matrix)

    • The G matrix can also be described as a realized relationship matrix because it captures the realized rather than the expected relatedness.
  • Step 3: Fit the linear mixed model, i.e., the GBLUP model.
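
Steps 1 and 2 can be sketched as follows, assuming inbred lines scored as 1 / -1 at each SNP. The marker matrix `M`, the line and SNP names, and the value `theta <- 0.5` are made up for illustration, and the simple-matching similarity used here is only one of several ways to build a G matrix; GBLUP software may use a different estimator.

```r
# Hypothetical marker scores for 4 inbreds x 6 SNPs, coded 1 / -1
M <- matrix(c( 1,  1,  1,  1, -1, -1,
               1,  1, -1,  1, -1,  1,
              -1,  1, -1, -1,  1,  1,
              -1, -1, -1,  1,  1,  1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("line", 1:4), paste0("SNP", 1:6)))

# Step 1: marker similarity S_XY = proportion of SNPs with matching alleles.
# For 1 / -1 coding, M M' / n_snp equals 2 * S - 1.
n_snp <- ncol(M)
S <- (tcrossprod(M) / n_snp + 1) / 2
theta <- 0.5                      # assumed similarity among unrelated lines
f <- (S - theta) / (1 - theta)    # marker-based coefficient of coancestry

# Step 2: genomic relationship matrix, the marker-based analogue of A = 2f
G <- 2 * f
G
```

Off-diagonal elements of `G` can be negative when two lines are less similar than the assumed `theta`. Step 3 would then use `G` in place of A when fitting the mixed model.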

  • Empirical results in plants have shown that GBLUP is superior to pedigree-based BLUP.

    • For maize yield, the predictive ability ranged from 0.70 to 0.80 with GBLUP vs. 0.66 to 0.79 with BLUP (Bernardo, 1994)

    • Recent results with high-density markers: predictive ability 0.10 to 0.25 higher with GBLUP (Albrecht et al., 2014; Schrag et al., 2019)

Genomic Selection

Genomic selection is a procedure that utilizes a large set of random markers in marker-based selection.

  • GS can be considered as marker-based selection without QTL mapping!

Prediction:

Obtaining genome-enabled prediction of genotypic value

Selection:

Selection is conducted on the basis of such predicted values.

Meuwissen et al. (2001) published the landmark paper outlining the application of BLUP of allele effects to breeding, specifically in the context of animal breeding.

Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819-1829.

Genomic Selection Procedure

Heffner, E.L., M.E. Sorrells, and J.L. Jannink. 2009. Genomic selection for crop improvement. Crop Sci. 49:1-12.

A GS experimental design

Bernardo 2020. textbook

  • Suppose $N = 150$ F3 families are developed from the cross between two maize inbreds.

  • These F3 families are evaluated for their testcross performance and are genotyped with $N_M = 384$ random SNP markers, for example by:

    • Pooled amplicon sequencing
    • Genotyping-by-sequencing
    • SNP array
    • Whole genome sequencing
  • The yield trials are conducted in the same set of environments and the data are assumed balanced so that the only fixed effect is the overall mean.

A GS example

The performance of the testcrosses on an entry-mean basis can be modeled as:

$$y = \mathbf{1}\mu + Zm + e$$

where,

  • y is a vector of testcross phenotypic means
  • 1 is an N×1 vector with all elements equal to 1
  • μ is the fixed effect of the overall mean
  • Z is the incidence matrix for random effects of SNP genotype
  • m is the vector of the random effects for each of the SNP markers
  • e is the vector of residuals

About Z matrix

  • The Z matrix is coded by marker genotype: first homozygote = 1, heterozygote = 0, second homozygote = -1.
  • The SNP effect is defined as the effect associated with the first homozygote, coded as 1.

Note that fitting marker effects as random instead of fixed does not use up degrees of freedom.

  • The number of marker loci ($N_M = 384$) can therefore exceed the population size ($N = 150$).

The covariance matrix of m can be modeled as (Meuwissen et al., 2001):

$$V(m) = I V_{M_i} = I (V_G / N_M)$$

  • where I is an identity matrix
  • $V_{M_i}$ is the variance due to each marker locus
  • $V_G$ is the type of genetic variance expressed among the progeny being evaluated
  • $N_M$ is the number of markers

Ridge regression-BLUP

$$y = \mathbf{1}\mu + Zm + e$$

$$V(m) = I V_{M_i} = I (V_G / N_M)$$

This convenient way to calculate the effects of genome-wide markers by BLUP is called ridge regression BLUP, or RR-BLUP.

Two assumptions in RR-BLUP

  • (1) Each random marker is assumed to account for an equal amount of the genetic variance

    • This does not mean that the predicted marker effects are equal
    • It simply means that the marker effects have the same underlying genetic variance
  • (2) Epistasis is ignored for convenience in the prediction
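
A minimal sketch of RR-BLUP on simulated data, assuming the design described earlier (150 families, 384 markers), an overall mean as the only fixed effect, R = I, and an assumed heritability of 0.5. All object names, the simulated genotypes, and the simulated effects are hypothetical; the point is only that one common shrinkage value, the ratio $V_e / V_{M_i}$, is applied to every marker, yet the predicted effects still differ from marker to marker.

```r
set.seed(1)
n_lines   <- 150   # N, number of F3 families
n_markers <- 384   # N_M, number of SNP markers

# Simulated marker matrix coded 1 / 0 / -1 and simulated phenotypes.
# With sd = 1/16, the simulated V_G is about 1, the same as V_e.
Z <- matrix(sample(c(1, 0, -1), n_lines * n_markers, replace = TRUE),
            nrow = n_lines)
m_true <- rnorm(n_markers, sd = 1/16)
y <- 5 + Z %*% m_true + rnorm(n_lines)

# Common shrinkage parameter: lambda = Ve / V_Mi = (1 - H2) * N_M / H2
H2 <- 0.5          # assumed heritability
lambda <- (1 - H2) * n_markers / H2

# Mixed model equations with X = 1 (overall mean) and R = I
X   <- matrix(1, nrow = n_lines, ncol = 1)
lhs <- rbind(cbind(crossprod(X),    crossprod(X, Z)),
             cbind(crossprod(Z, X), crossprod(Z) + diag(lambda, n_markers)))
rhs <- rbind(crossprod(X, y), crossprod(Z, y))
sol <- solve(lhs, rhs)

mu_hat <- sol[1]    # estimate of the overall mean
m_hat  <- sol[-1]   # one predicted effect per marker, although N_M > N
length(m_hat)
```

Adding `lambda` to the diagonal of Z'Z is what makes the system solvable when there are more markers than families.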

The Barley Example

Materials

Four related barley cultivars:

  • Morex, Robust, Excel, Stander

Experimental design

  • Morex (M), Robust (R), and Stander (S) in 18 environments (set 1)
  • Robust (R), Excel (E), and Stander (S) in 9 environments (set 2)

Set   Environments  Inbred  Grain yield  SNP1  SNP2  SNP3
Set1  18            M       4.45         C     A     C
Set1  18            R       4.61         C     G     C
Set1  18            S       5.27         T     A     A
Set2  9             R       5.00         C     G     C
Set2  9             E       5.82         T     G     C
Set2  9             S       5.79         T     A     A

The Linear Mixed Model

$$y = Xb + Zm + e$$

  • y is a vector of observed grain yield phenotype
  • X is the design or incidence matrix
  • b is the vector of the fixed effects due to sets of yield trials
  • e is the vector of residuals.

  • Z is the incidence matrix for the SNP markers

  • m is the vector of the random effects due to SNPs

$$V(m) = I V_{M_i} = I (V_G / N_M)$$

SNP Coding

We will use biallelic SNPs and consider Morex as our reference (its alleles are coded as 1):

  • 1: if an individual is homozygous for the allele carried by Morex
  • 0: if heterozygous
  • -1: if homozygous for the allele not carried by Morex
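
One way to apply this coding rule in R is sketched below, using the SNP alleles listed in the barley data table. The data frame `geno` and the helper objects `snps` and `ref` are illustrative names, not part of the original slides.

```r
# SNP alleles of the inbreds in the order of the data table:
# M, R, S (set 1), then R, E, S (set 2)
geno <- data.frame(
  inbred = c("M", "R", "S", "R", "E", "S"),
  SNP1   = c("C", "C", "T", "C", "T", "T"),
  SNP2   = c("A", "G", "A", "G", "G", "A"),
  SNP3   = c("C", "C", "A", "C", "C", "A")
)

snps <- c("SNP1", "SNP2", "SNP3")
ref  <- geno[geno$inbred == "M", snps][1, ]   # alleles carried by Morex

# 1 if the inbred carries the Morex allele, -1 otherwise
# (all entries are inbreds, so no heterozygotes coded 0 occur here)
Z <- sapply(snps, function(s) ifelse(geno[[s]] == ref[[s]], 1, -1))
Z
```

The resulting 6 x 3 matrix has one row per observation and one column per SNP.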

Z Matrix

Get Z matrix into R

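# Enter the coded genotypes column by column (one column per SNP);
# rows follow the data table: M, R, S (set 1), then R, E, S (set 2)
Z <- matrix(c( 1,  1, -1,  1, -1, -1,    # SNP1
               1, -1,  1, -1, -1,  1,    # SNP2
               1,  1, -1,  1,  1, -1),   # SNP3
            byrow = FALSE, nrow = 6)
Z
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1   -1    1
## [3,]   -1    1   -1
## [4,]    1   -1    1
## [5,]   -1   -1    1
## [6,]   -1    1   -1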

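Solve the MME

$$\begin{bmatrix} \hat{b} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} V_e / V_u \end{bmatrix}^{-1} \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$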

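In our case,

$$\begin{bmatrix} \hat{b} \\ \hat{m} \end{bmatrix} = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + I V_e / V_{M_i} \end{bmatrix}^{-1} \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$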

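Here, we suppose the heritability is $H^2 = 0.5$. With $N_M = 3$ SNPs,

$$H^2 = \frac{V_G}{V_G + V_e}, \qquad V_{M_i} = \frac{V_G}{N_M}, \qquad \frac{V_e}{V_{M_i}} = \frac{(1 - H^2) \times N_M}{H^2} = 3$$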

Solve the MME

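# Grain yield means for M, R, S (set 1) and R, E, S (set 2)
y <- matrix(c(4.45, 4.61, 5.27, 5.00, 5.82, 5.79), byrow=FALSE, nrow=6)
# Fixed-effect design matrix: column 1 = set 1, column 2 = set 2
X <- matrix(c(1,1,1,0,0,0, 0,0,0,1,1,1), byrow=FALSE, nrow=6)
# Residual covariance structure: diagonal elements are 1 / number of
# environments (18 for set 1, 9 for set 2)
R <- diag(c(rep(1/18, 3), rep(1/9, 3)))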

solve_mme <- function(X, y, R, Z, H2, nmarker){
  Rinv <- solve(R)                      # invert R once and reuse it
  a11 <- t(X) %*% Rinv %*% X
  a12 <- t(X) %*% Rinv %*% Z
  a21 <- t(Z) %*% Rinv %*% X
  a22 <- t(Z) %*% Rinv %*% Z
  lambda <- (1 - H2) * nmarker / H2     # Ve / V_Mi
  a22_2 <- diag(nmarker) * lambda       # I * Ve / V_Mi
  lhs <- rbind(cbind(a11, a12), cbind(a21, a22 + a22_2))
  rhs <- rbind(t(X) %*% Rinv %*% y, t(Z) %*% Rinv %*% y)
  eff <- solve(lhs) %*% rhs             # fixed effects first, then SNP effects
  return(eff)
}
eff <- solve_mme(X, y, R, Z, H2=0.5, nmarker=3)

The marker effects

eff
##             [,1]
## [1,]  4.93475643
## [2,]  5.41898437
## [3,] -0.34842360
## [4,] -0.06523449
## [5,] -0.06061120

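The first two rows of eff are the estimated fixed effects of the two sets of yield trials ($\hat{b}$), and rows 3 to 5 are the predicted effects of SNP1 to SNP3 ($\hat{m}$).

The effect of SNP1, $\hat{m}_1 = -0.35$, indicates that at SNP1 the allele carried by Morex leads to a lower trait value.

If the genotype at SNP1 changes from CC to CT, the predicted yield would increase by 0.35; a change from CC to TT would increase it by 0.70.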
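
To connect the marker effects back to prediction and selection, the sketch below sums the predicted SNP effects over the coded genotype of each cultivar. The effect values are copied from the eff output; the object names `m_hat`, `Z_cult`, and `g_hat` are illustrative, and the predictions are deviations from the set mean rather than absolute yields.

```r
# Predicted SNP effects copied from rows 3 to 5 of eff
m_hat <- c(SNP1 = -0.34842360, SNP2 = -0.06523449, SNP3 = -0.06061120)

# Coded genotypes (1 = Morex allele, -1 = other allele) of the four cultivars
Z_cult <- rbind(Morex   = c( 1,  1,  1),
                Robust  = c( 1, -1,  1),
                Stander = c(-1,  1, -1),
                Excel   = c(-1, -1,  1))

# Genome-enabled prediction of genotypic value (deviation from the set mean)
g_hat <- drop(Z_cult %*% m_hat)
sort(g_hat, decreasing = TRUE)
```

Selection would then be based on the ranking of `g_hat`; with only three SNPs this is a toy illustration rather than a usable prediction.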
