class: center, middle, inverse, title-slide .title[ # Breeding value interpretation ] .author[ ### Jinliang Yang ] .date[ ### March 25, 2024 ] --- # Breeding value and dominance deviation | Genotype | Value as deviated from `\(M\)` | Breeding Value | Dominance Deviation | | :-------: | :-------: | :-----------: | :-----------: | :-------: | :-------: | | `\(A_1A_1\)` | `\(2q(\alpha - qd)\)` | `\(2q\alpha\)` | `\(-2q^2d\)` | | `\(A_1A_2\)` | `\((q-p)\alpha + 2pqd\)` | `\((q-p)\alpha\)` | `\(2pqd\)` | | `\(A_2A_2\)` | `\(-2p(\alpha + pd)\)` | `\(-2p\alpha\)` | `\(-2p^2d\)` | -- ### G = A + D - The mean of the BV is **zero** - It follows that the mean dominance deviation is **zero**. --- # A Linear Regression Perspective ### G = A + D - **A** repsents the breeding value (i.e., A = `\(\alpha_i + \alpha_j\)`) - **D** represents the dominance deviation -- ### Further breakdown __A__: `\begin{align*} G = & \alpha_1N_1 + \alpha_2N_2 + \delta \end{align*}` -- - `\(\alpha_i\)` is the average effect of allele `\(i\)` and `\(\alpha = \alpha_1 - \alpha_2\)` - `\(N_i\)` is the number of allele `\(i\)` carried by the genotype - `\(N \in \{0, 1, 2\}\)` for a bi-allelic locus and `\(N_1 + N_2 = 2\)` -- Therefore, `\begin{align*} G = & \alpha_1N_1 + \alpha_2N_2 + \delta = \alpha_1N_1 + \alpha_2(2 - N_1) + \delta \\ = & 2\alpha_2 + (\alpha_1 - \alpha_2)N_1 + \delta \\ = & (2\alpha_2 + \delta) + \alpha N_1 \end{align*}` --- # Graphical Representation | Genotype | Value as deviated from `\(M\)` | Breeding Value | Dominance Deviation | | :-------: | :-------: | :-----------: | :-----------: | :-------: | :-------: | | `\(A_1A_1\)` | `\(2q(\alpha - qd)\)` | `\(2q\alpha\)` | `\(-2q^2d\)` | | `\(A_1A_2\)` | `\((q-p)\alpha + 2pqd\)` | `\((q-p)\alpha\)` | `\(2pqd\)` | | `\(A_2A_2\)` | `\(-2p(\alpha + pd)\)` | `\(-2p\alpha\)` | `\(-2p^2d\)` | An R function to calculate genotypic values ```r gfunction <- function(a=1, d=0, p=1/2){ # a: additive effect # d: dominance effect # p: allele frequency for the A1 allele q = 1-p # allele sub effect alpha <- a + d*(q-p) a1a1 <- 2*alpha*q a1a2 <- (q-p)*alpha a2a2 <- -2*p*alpha # population mean M <- a*(p-q) + 2*d*p*q # return a data.frame with genotype values and breeding values return(data.frame(N1=c(0,1,2), gv=c(-a-M,d-M,a-M), bv=c(a2a2, a1a2, a1a1))) } ``` --- # Graphical Representation ### Apply the R function ```r out <- gfunction(a=1, d=-1, p=1/3) #out out$dd <- out$gv - out$bv out ``` ``` ## N1 gv bv dd ## 1 0 -0.2222222 -0.4444444 0.2222222 ## 2 1 -0.2222222 0.2222222 -0.4444444 ## 3 2 1.7777778 0.8888889 0.8888889 ``` --- # Plot the results ```r plot(out$N1, out$gv, xlab="Genotype", ylab="", ylim=c(-1, 2), cex.lab=1.5, xaxt="n", pch=17, cex=2, col="red"); # add the axis and labels axis(1, at=c(0, 1, 2), labels=c("A2A2", "A1A2", "A1A1")); # add y-axis title on the right side mtext("Breeding Value", side = 4, line = 1, cex=1.5, col="blue"); # add y-axis title on the left side mtext("Genotypic Value", side = 2, line = 2, cex=1.5, col="red") # add breeding values points(out$N1, out$bv, cex=2, col="blue") # join the points by a line lines(out$N1, out$bv, lwd=2, col="blue") ``` <img src="week10_c1_files/figure-html/unnamed-chunk-3-1.png" width="40%" style="display: block; margin: auto;" /> --- # Graphical Representation The relationship between genotypic values, breeding values and dominance deviations. `\begin{align*} G = & 2\alpha_2 + \alpha N_1 + \delta \end{align*}` -- .pull-left[ <div align="center"> <img src="linear.png" height=350> </div> ] -- .pull-right[ - The __slope__ is the average effect of allele substitution. - As `\(A_1\)` is substituted by `\(A_2\)`, the breeding value increaes at a rate equal to `\(\alpha\)`. - Dominance can be seen as __residuals__ from the fitted regression line. - Dominance deviation are the differences between the genotypic values and breeding values. ] --- # Predicting BV: a real-world data ```r # read phenotype and genotype data flowering <- read.table("https://jyanglab.com/slides/flowering.csv", sep=",", header=TRUE) dim(flowering) ``` ``` ## [1] 413 7 ``` ```r head(flowering) ``` ``` ## FID IID Flowering_time snp1 snp2 snp3 type ## 1 F1 1 85.08333 2 2 1 training ## 2 F2 3 101.50000 2 2 2 training ## 3 F3 4 106.50000 2 2 2 training ## 4 F4 5 99.50000 2 2 2 training ## 5 F5 6 95.08333 1 1 1 training ## 6 F6 7 111.00000 1 1 1 training ``` ```r table(flowering$type) ``` ``` ## ## test training ## 63 350 ``` ```r f <- subset(flowering, type == "training") ``` --- # Visualize the phenotypic data ```r library(ggplot2) p <- ggplot(f, aes(x=Flowering_time, fill=as.factor(snp1))) + geom_histogram(position="dodge")+ # Use brewer color palettes scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9")) + guides(fill=guide_legend(title="Genotype")) + theme_classic() + theme(legend.position=c(0.8, 0.8), axis.text=element_text(size=20), axis.title=element_text(size=20)) p ``` <img src="week10_c1_files/figure-html/unnamed-chunk-5-1.png" width="40%" style="display: block; margin: auto;" /> --- # A real-world data Let's find out the breeding value for each individual. -- ### Training - #### Step1: For each SNP, find out `\(a, d, p\)`, and `\(M\)` - #### Step2: Compute average effect for A1 and A2 -- ### Prediction - #### Step3: Sum up the BV of all SNPs --- # Training #### Step1: For each SNP, find out __ `\(a\)`__, __ `\(d\)`__, `\(p\)`, and `\(M\)` The number of copies of the A1 allele: - A1A1 -> 2 - A1A2 -> 1 - A2A2 -> 0 ```r A1A1 <- mean(subset(f, snp1 == 2)$Flowering_time, na.rm=TRUE) A1A2 <- mean(subset(f, snp1 == 1)$Flowering_time, na.rm=TRUE) A2A2 <- mean(subset(f, snp1 == 0)$Flowering_time, na.rm=TRUE) A1A1 ``` ``` ## [1] 99.13419 ``` ```r A1A2 ``` ``` ## [1] 96.27799 ``` ```r A2A2 ``` ``` ## [1] 92.94753 ``` --- # Training #### Step1: For each SNP, find out __ `\(a\)`__, __ `\(d\)`__, `\(p\)`, and `\(M\)` ```r ### mid point m <- (A1A1 + A2A2)/2 a <- A1A1 - m d <- A1A2 - m ``` --- # Training #### Step1: For each SNP, find out `\(a, d\)`, __ `\(p\)`__, and `\(M\)` Allele freq for __A1__ (Note it is in __`2`__ coding) ```r table(f$snp1) ``` ``` ## ## 0 1 2 ## 10 124 216 ``` -- ```r getp <- function(df=f$snp1){ t <- as.data.frame(table(df)) c11 <- subset(t, df==2)$Freq c12 <- subset(t, df==1)$Freq c22 <- subset(t, df==0)$Freq return((2*c11+c12)/(2*c11+2*c12 +2*c22)) } p <- getp(df=f$snp1) p ``` ``` ## [1] 0.7942857 ``` ```r q <- 1 -p ``` --- # Get population mean #### Step1: For each SNP, find out `\(a, d\)`, `\(p\)`, and __ `\(M\)`__ ```r M1 <- mean(f$Flowering_time, na.rm=T) M1 ``` ``` ## [1] 97.96743 ``` ```r M2 <- a*(p-q) + 2*d*p*q M2 ``` ``` ## [1] 1.898136 ``` ```r M1 - M2 ``` ``` ## [1] 96.0693 ``` ```r m ``` ``` ## [1] 96.04086 ``` --- # Training #### Step2: Average effect for A1 and A2 | Genotype | Value as deviated from `\(M\)` | Breeding Value | Dominance Deviation | | :-------: | :-------: | :-----------: | :-----------: | :-------: | :-------: | | `\(A_1A_1\)` | `\(2q(\alpha - qd)\)` | `\(2q\alpha\)` | `\(-2q^2d\)` | | `\(A_1A_2\)` | `\((q-p)\alpha + 2pqd\)` | `\((q-p)\alpha\)` | `\(2pqd\)` | | `\(A_2A_2\)` | `\(-2p(\alpha + pd)\)` | `\(-2p\alpha\)` | `\(-2p^2d\)` | -- ```r # allele sub effect alpha <- a + d*(q-p) a1a1 <- 2*alpha*q a1a2 <- (q-p)*alpha a2a2 <- -2*p*alpha ``` -- Apply the previous `gfunction` to get BV for snp1 ```r out2 <- gfunction(a=a, d=d, p=p) out2 ``` ``` ## N1 gv bv ## 1 0 -4.991466 -4.692268 ## 2 1 -1.661012 -1.738502 ## 3 2 1.195194 1.215264 ``` --- ### Repeat the process for snp2 and snp3 ```r bv <- function(f, snpid="snp1", trait="Flowering_time"){ A1A1 <- mean(f[f[, snpid]==0, trait], na.rm=TRUE) A1A2 <- mean(f[f[, snpid]==1, trait], na.rm=TRUE) A2A2 <- mean(f[f[, snpid]==2, trait], na.rm=TRUE) m <- (A1A1 + A2A2)/2 a <- A1A1 - m d <- A1A2 - m p <- getp(df=f[, snpid]) q <- 1 -p alpha <- a + d*(q-p) a1a1 <- 2*alpha*q a1a2 <- (q-p)*alpha a2a2 <- -2*p*alpha M <- a*(p-q) + 2*d*p*q out <- data.frame(N1=c(0,1,2), gv=c(-a-M,d-M,a-M), bv=c(a2a2, a1a2, a1a1)) return(out) } bv(f, snpid="snp1", trait="Flowering_time") ``` ``` ## N1 gv bv ## 1 0 4.836486 5.135684 ## 2 1 1.980280 1.902789 ## 3 2 -1.350175 -1.330105 ``` --- # Prediction ### Step3: Sum up the BV of all SNPs BV is the sum of the average effects of the alleles an individual carries. `\begin{align*} BV = \sum_{i=1}^k\sum_{j=1}^2\alpha_{ij} \end{align*}` Where summation occurs across the number of loci ( `\(k\)` ) and the two alleles present at each locus. -- ```r b1 <- bv(f, snpid="snp1", trait="Flowering_time") b2 <- bv(f, snpid="snp2", trait="Flowering_time") b3 <- bv(f, snpid="snp3", trait="Flowering_time") b3 ``` ``` ## N1 gv bv ## 1 0 1.153707 2.405023 ## 2 1 1.225328 0.640331 ## 3 2 -1.397850 -1.124361 ``` --- # Prediction ### Step3: Sum up the BV of all SNPs ```r t <- subset(flowering, type=="test") dim(t) ``` ``` ## [1] 63 7 ``` ```r head(t, 4) ``` ``` ## FID IID Flowering_time snp1 snp2 snp3 type ## 351 F351 52 NA 2 2 0 test ## 352 F352 54 118.8333 1 2 2 test ## 353 F353 62 NA 2 1 2 test ## 354 F354 64 NA 2 2 2 test ``` -- BV for individual `F356`: ```r b1[b1$N1 ==2, ]$bv + b2[b2$N1 ==2, ]$bv + b3[b3$N1 ==2, ]$bv + M1 ``` ``` ## [1] 95.48984 ``` --- # Prediction ### Step3: Sum up the BV of all SNPs ```r mygs <- function(t, b1, b2, b3){ t$yhat <- -9 for(i in 1:nrow(t)){ t$yhat[i] <- b1[b1$N1 == t$snp1[i], ]$bv + b2[b2$N1 == t$snp2[i], ]$bv + b3[b3$N1 == t$snp3[i], ]$bv } return(t) } out <- mygs(t, b1, b2, b3) out <- subset(out, !is.na(Flowering_time)) cor.test(out$Flowering_time, out$yhat+M1) ``` ``` ## ## Pearson's product-moment correlation ## ## data: out$Flowering_time and out$yhat + M1 ## t = -1.1976, df = 55, p-value = 0.2362 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## -0.4032380 0.1055306 ## sample estimates: ## cor ## -0.1594208 ```