Processing math: 100%
+ - 0:00:00
Notes for current slide
Notes for next slide

Breeding value interpretation

Jinliang Yang

March 25, 2024

1 / 32

Breeding value and dominance deviation

Genotype Value as deviated from M Breeding Value Dominance Deviation
A1A1 2q(αqd) 2qα 2q2d
A1A2 (qp)α+2pqd (qp)α 2pqd
A2A2 2p(α+pd) 2pα 2p2d
2 / 32

Breeding value and dominance deviation

Genotype Value as deviated from M Breeding Value Dominance Deviation
A1A1 2q(αqd) 2qα 2q2d
A1A2 (qp)α+2pqd (qp)α 2pqd
A2A2 2p(α+pd) 2pα 2p2d

G = A + D

  • The mean of the BV is zero

  • It follows that the mean dominance deviation is zero.

3 / 32

A Linear Regression Perspective

G = A + D

  • A repsents the breeding value (i.e., A = αi+αj)
  • D represents the dominance deviation
4 / 32

A Linear Regression Perspective

G = A + D

  • A repsents the breeding value (i.e., A = αi+αj)
  • D represents the dominance deviation

Further breakdown A:

G=α1N1+α2N2+δ

5 / 32

A Linear Regression Perspective

G = A + D

  • A repsents the breeding value (i.e., A = αi+αj)
  • D represents the dominance deviation

Further breakdown A:

G=α1N1+α2N2+δ

  • αi is the average effect of allele i and α=α1α2
  • Ni is the number of allele i carried by the genotype
  • N{0,1,2} for a bi-allelic locus and N1+N2=2
6 / 32

A Linear Regression Perspective

G = A + D

  • A repsents the breeding value (i.e., A = αi+αj)
  • D represents the dominance deviation

Further breakdown A:

G=α1N1+α2N2+δ

  • αi is the average effect of allele i and α=α1α2
  • Ni is the number of allele i carried by the genotype
  • N{0,1,2} for a bi-allelic locus and N1+N2=2

Therefore,

G=α1N1+α2N2+δ=α1N1+α2(2N1)+δ=2α2+(α1α2)N1+δ=(2α2+δ)+αN1

7 / 32

Graphical Representation

Genotype Value as deviated from M Breeding Value Dominance Deviation
A1A1 2q(αqd) 2qα 2q2d
A1A2 (qp)α+2pqd (qp)α 2pqd
A2A2 2p(α+pd) 2pα 2p2d

An R function to calculate genotypic values

gfunction <- function(a=1, d=0, p=1/2){
# a: additive effect
# d: dominance effect
# p: allele frequency for the A1 allele
q = 1-p
# allele sub effect
alpha <- a + d*(q-p)
a1a1 <- 2*alpha*q
a1a2 <- (q-p)*alpha
a2a2 <- -2*p*alpha
# population mean
M <- a*(p-q) + 2*d*p*q
# return a data.frame with genotype values and breeding values
return(data.frame(N1=c(0,1,2), gv=c(-a-M,d-M,a-M), bv=c(a2a2, a1a2, a1a1)))
}
8 / 32

Graphical Representation

Apply the R function

out <- gfunction(a=1, d=-1, p=1/3)
#out
out$dd <- out$gv - out$bv
out
## N1 gv bv dd
## 1 0 -0.2222222 -0.4444444 0.2222222
## 2 1 -0.2222222 0.2222222 -0.4444444
## 3 2 1.7777778 0.8888889 0.8888889
9 / 32

Plot the results

plot(out$N1, out$gv, xlab="Genotype", ylab="", ylim=c(-1, 2), cex.lab=1.5, xaxt="n", pch=17, cex=2, col="red");
# add the axis and labels
axis(1, at=c(0, 1, 2), labels=c("A2A2", "A1A2", "A1A1"));
# add y-axis title on the right side
mtext("Breeding Value", side = 4, line = 1, cex=1.5, col="blue");
# add y-axis title on the left side
mtext("Genotypic Value", side = 2, line = 2, cex=1.5, col="red")
# add breeding values
points(out$N1, out$bv, cex=2, col="blue")
# join the points by a line
lines(out$N1, out$bv, lwd=2, col="blue")

10 / 32

Graphical Representation

The relationship between genotypic values, breeding values and dominance deviations.

G=2α2+αN1+δ

11 / 32

Graphical Representation

The relationship between genotypic values, breeding values and dominance deviations.

G=2α2+αN1+δ

12 / 32

Graphical Representation

The relationship between genotypic values, breeding values and dominance deviations.

G=2α2+αN1+δ

  • The slope is the average effect of allele substitution.

    • As A1 is substituted by A2, the breeding value increaes at a rate equal to α.
  • Dominance can be seen as residuals from the fitted regression line.

    • Dominance deviation are the differences between the genotypic values and breeding values.
13 / 32

Predicting BV: a real-world data

# read phenotype and genotype data
flowering <- read.table("https://jyanglab.com/slides/flowering.csv",
sep=",", header=TRUE)
dim(flowering)
## [1] 413 7
head(flowering)
## FID IID Flowering_time snp1 snp2 snp3 type
## 1 F1 1 85.08333 2 2 1 training
## 2 F2 3 101.50000 2 2 2 training
## 3 F3 4 106.50000 2 2 2 training
## 4 F4 5 99.50000 2 2 2 training
## 5 F5 6 95.08333 1 1 1 training
## 6 F6 7 111.00000 1 1 1 training
table(flowering$type)
##
## test training
## 63 350
f <- subset(flowering, type == "training")
14 / 32

Visualize the phenotypic data

library(ggplot2)
p <- ggplot(f, aes(x=Flowering_time, fill=as.factor(snp1))) +
geom_histogram(position="dodge")+
# Use brewer color palettes
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
guides(fill=guide_legend(title="Genotype")) +
theme_classic() +
theme(legend.position=c(0.8, 0.8), axis.text=element_text(size=20),
axis.title=element_text(size=20))
p

15 / 32

A real-world data

Let's find out the breeding value for each individual.

16 / 32

A real-world data

Let's find out the breeding value for each individual.

Training

  • Step1: For each SNP, find out a,d,p, and M

  • Step2: Compute average effect for A1 and A2

17 / 32

A real-world data

Let's find out the breeding value for each individual.

Training

  • Step1: For each SNP, find out a,d,p, and M

  • Step2: Compute average effect for A1 and A2

Prediction

  • Step3: Sum up the BV of all SNPs

18 / 32

Training

Step1: For each SNP, find out a, d, p, and M

The number of copies of the A1 allele:

  • A1A1 -> 2
  • A1A2 -> 1
  • A2A2 -> 0
A1A1 <- mean(subset(f, snp1 == 2)$Flowering_time, na.rm=TRUE)
A1A2 <- mean(subset(f, snp1 == 1)$Flowering_time, na.rm=TRUE)
A2A2 <- mean(subset(f, snp1 == 0)$Flowering_time, na.rm=TRUE)
A1A1
## [1] 99.13419
A1A2
## [1] 96.27799
A2A2
## [1] 92.94753
19 / 32

Training

Step1: For each SNP, find out a, d, p, and M

### mid point
m <- (A1A1 + A2A2)/2
a <- A1A1 - m
d <- A1A2 - m
20 / 32

Training

Step1: For each SNP, find out a,d, p, and M

Allele freq for A1 (Note it is in 2 coding)

table(f$snp1)
##
## 0 1 2
## 10 124 216
21 / 32

Training

Step1: For each SNP, find out a,d, p, and M

Allele freq for A1 (Note it is in 2 coding)

table(f$snp1)
##
## 0 1 2
## 10 124 216
getp <- function(df=f$snp1){
t <- as.data.frame(table(df))
c11 <- subset(t, df==2)$Freq
c12 <- subset(t, df==1)$Freq
c22 <- subset(t, df==0)$Freq
return((2*c11+c12)/(2*c11+2*c12 +2*c22))
}
p <- getp(df=f$snp1)
p
## [1] 0.7942857
q <- 1 -p
22 / 32

Get population mean

Step1: For each SNP, find out a,d, p, and M

M1 <- mean(f$Flowering_time, na.rm=T)
M1
## [1] 97.96743
M2 <- a*(p-q) + 2*d*p*q
M2
## [1] 1.898136
M1 - M2
## [1] 96.0693
m
## [1] 96.04086
23 / 32

Training

Step2: Average effect for A1 and A2

Genotype Value as deviated from M Breeding Value Dominance Deviation
A1A1 2q(αqd) 2qα 2q2d
A1A2 (qp)α+2pqd (qp)α 2pqd
A2A2 2p(α+pd) 2pα 2p2d
24 / 32

Training

Step2: Average effect for A1 and A2

Genotype Value as deviated from M Breeding Value Dominance Deviation
A1A1 2q(αqd) 2qα 2q2d
A1A2 (qp)α+2pqd (qp)α 2pqd
A2A2 2p(α+pd) 2pα 2p2d
# allele sub effect
alpha <- a + d*(q-p)
a1a1 <- 2*alpha*q
a1a2 <- (q-p)*alpha
a2a2 <- -2*p*alpha
25 / 32

Training

Step2: Average effect for A1 and A2

Genotype Value as deviated from M Breeding Value Dominance Deviation
A1A1 2q(αqd) 2qα 2q2d
A1A2 (qp)α+2pqd (qp)α 2pqd
A2A2 2p(α+pd) 2pα 2p2d
# allele sub effect
alpha <- a + d*(q-p)
a1a1 <- 2*alpha*q
a1a2 <- (q-p)*alpha
a2a2 <- -2*p*alpha

Apply the previous gfunction to get BV for snp1

out2 <- gfunction(a=a, d=d, p=p)
out2
## N1 gv bv
## 1 0 -4.991466 -4.692268
## 2 1 -1.661012 -1.738502
## 3 2 1.195194 1.215264
26 / 32

Repeat the process for snp2 and snp3

bv <- function(f, snpid="snp1", trait="Flowering_time"){
A1A1 <- mean(f[f[, snpid]==0, trait], na.rm=TRUE)
A1A2 <- mean(f[f[, snpid]==1, trait], na.rm=TRUE)
A2A2 <- mean(f[f[, snpid]==2, trait], na.rm=TRUE)
m <- (A1A1 + A2A2)/2
a <- A1A1 - m
d <- A1A2 - m
p <- getp(df=f[, snpid])
q <- 1 -p
alpha <- a + d*(q-p)
a1a1 <- 2*alpha*q
a1a2 <- (q-p)*alpha
a2a2 <- -2*p*alpha
M <- a*(p-q) + 2*d*p*q
out <- data.frame(N1=c(0,1,2), gv=c(-a-M,d-M,a-M), bv=c(a2a2, a1a2, a1a1))
return(out)
}
bv(f, snpid="snp1", trait="Flowering_time")
## N1 gv bv
## 1 0 4.836486 5.135684
## 2 1 1.980280 1.902789
## 3 2 -1.350175 -1.330105
27 / 32

Prediction

Step3: Sum up the BV of all SNPs

BV is the sum of the average effects of the alleles an individual carries.

BV=ki=12j=1αij

Where summation occurs across the number of loci ( k ) and the two alleles present at each locus.

28 / 32

Prediction

Step3: Sum up the BV of all SNPs

BV is the sum of the average effects of the alleles an individual carries.

BV=ki=12j=1αij

Where summation occurs across the number of loci ( k ) and the two alleles present at each locus.

b1 <- bv(f, snpid="snp1", trait="Flowering_time")
b2 <- bv(f, snpid="snp2", trait="Flowering_time")
b3 <- bv(f, snpid="snp3", trait="Flowering_time")
b3
## N1 gv bv
## 1 0 1.153707 2.405023
## 2 1 1.225328 0.640331
## 3 2 -1.397850 -1.124361
29 / 32

Prediction

Step3: Sum up the BV of all SNPs

t <- subset(flowering, type=="test")
dim(t)
## [1] 63 7
head(t, 4)
## FID IID Flowering_time snp1 snp2 snp3 type
## 351 F351 52 NA 2 2 0 test
## 352 F352 54 118.8333 1 2 2 test
## 353 F353 62 NA 2 1 2 test
## 354 F354 64 NA 2 2 2 test
30 / 32

Prediction

Step3: Sum up the BV of all SNPs

t <- subset(flowering, type=="test")
dim(t)
## [1] 63 7
head(t, 4)
## FID IID Flowering_time snp1 snp2 snp3 type
## 351 F351 52 NA 2 2 0 test
## 352 F352 54 118.8333 1 2 2 test
## 353 F353 62 NA 2 1 2 test
## 354 F354 64 NA 2 2 2 test

BV for individual F356:

b1[b1$N1 ==2, ]$bv + b2[b2$N1 ==2, ]$bv + b3[b3$N1 ==2, ]$bv + M1
## [1] 95.48984
31 / 32

Prediction

Step3: Sum up the BV of all SNPs

mygs <- function(t, b1, b2, b3){
t$yhat <- -9
for(i in 1:nrow(t)){
t$yhat[i] <- b1[b1$N1 == t$snp1[i], ]$bv + b2[b2$N1 == t$snp2[i], ]$bv + b3[b3$N1 == t$snp3[i], ]$bv
}
return(t)
}
out <- mygs(t, b1, b2, b3)
out <- subset(out, !is.na(Flowering_time))
cor.test(out$Flowering_time, out$yhat+M1)
##
## Pearson's product-moment correlation
##
## data: out$Flowering_time and out$yhat + M1
## t = -1.1976, df = 55, p-value = 0.2362
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4032380 0.1055306
## sample estimates:
## cor
## -0.1594208
32 / 32

Breeding value and dominance deviation

Genotype Value as deviated from M Breeding Value Dominance Deviation
A1A1 2q(αqd) 2qα 2q2d
A1A2 (qp)α+2pqd (qp)α 2pqd
A2A2 2p(α+pd) 2pα 2p2d
2 / 32
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow