class: center, middle, inverse, title-slide # Phenotypic values and variances ### Jinliang Yang ### AprilĀ 1, 2025 --- # Select for shorter plants <div align="center"> <img src="height.png" height=200> </div> - Tom collected a population of 20 maize landraces to select short plants. - Then phenotypically characterized the 20 landraces and selected two of them as the founder lines to make an F2 population. - After phenotyping and genotyping, Tom needs to determine which individual plants be more valuable to be selected and crossed. -- ```r geno <- read.table("data/geno.txt", header=FALSE) dim(geno) head(geno) names(geno) <- c("chr", "pos", "ref", "alt", paste0("plant", 1:20)) ``` --- # Create an F2 population - To make the F2 population, Tom chose `plant1` and `plant18` as the parents. - `plant1` and `plant18` were selfed five generations to be pure inbreds. ```r ### Just sample 100 markers from this Mt chr set.seed(12579) markers <- sample(1:nrow(geno), size=100) f <- geno[sort(markers), c("chr", "pos", "ref", "alt", "plant1", "plant18")] # select just one haplotype f$plant1 <- gsub("/.*", "", f$plant1) f$plant18 <- gsub("/.*", "", f$plant18) # recoding to use -1,0,1 f[f==0] <- -1 # simulate the recombination rate f$cM <- f$pos/5000 ``` --- # Create an F2 population - To make the F2 population, Tom chose `plant1` and `plant18` as the parents. - `plant1` and `plant18` were selfed five generations to be pure inbreds. ```r install.packages("devtools") library(devtools) install_github("lian0090/simuPoisson") library(simuPoisson) ``` -- Then, simulate an F2 population with 200 individuals and 100 SNP markers. ```r pgeno <- t(f[, c("plant1", "plant18")]) pgeno <- apply(pgeno, 2, as.numeric) f2 <- simuPoisson(pgeno, f$chr, f$cM, 200) f2 <- as.data.frame(f2) names(f2) <- paste0(f$chr, "_", f$pos) write.table(f2, "data/f2_geno.csv", sep=",", quote=FALSE) ``` --- # Compute genotype frq and allele frq ```r f2 <- read.csv("data/f2_geno.csv") table(f2[,1]) ``` Note that in this F2 population, the SNP coding is `-1, 0, 1` for `\(A_1A1\)`, `\(A_1A_2\)`, and `\(A_2A_2\)`. ```r table(f2[,6]) ``` --- # Frequencies #### Observed allele frequency ```r # For A1 allele p <- (52*2+92)/((52+92+56)*2) # For A2 allele q <- (56*2+92)/((52+92+56)*2) ``` -- #### Observed genotype frequency ```r # For A1A1 genotype A1A1 <- 52/(52+92+56) # For A1A2 genotype A1A2 <- 92/(52+92+56) # For A2A2 genotype A2A2 <- 56/(52+92+56) ``` --- # Frequencies | Genotype | Freq. | Value | | :-------: |: ------- :| :-------: | | `\(A_1A_1\)` | `\(p^2\)` | `\(+a\)` | | `\(A_1A_2\)` | `\(2pq\)` | `\(d\)` | | `\(A_2A_2\)` | `\(q^2\)` | `\(-a\)` | -- ### Predicted genotype frequency ```r p^2 2*p*q q^2 chisq.test(rbind(c(A1A1, A1A2, A2A2), c(p^2, 2*p*q, q^2))) ``` --- # Phenotype <div align="center"> <img src="height.png" height=200> </div> From this F2 population, Tom measured the plant height for each individual plant. Phenotype in a population can be characterized in terms of its __mean__ and __variance__. --- # Phenotype ```r pheno <- read.csv("data/f2_pheno.csv") hist(pheno$height, main="Plant Height", xlab="Value (inch)", breaks=20) ``` -- ### Combine genotype and phenotype files ```r gp <- cbind(pheno, f2) ``` | Genotype | Freq. | Value | | :-------: |: ------- :| :-------: | | `\(A_1A_1\)` | `\(p^2\)` | `\(+a\)` | | `\(A_1A_2\)` | `\(2pq\)` | `\(d\)` | | `\(A_2A_2\)` | `\(q^2\)` | `\(-a\)` | --- # Genotypic value `\(P = G + E\)` `\(G = A + D\)` Let's find out `\(a\)` and `\(d\)` at a specific Marker `Mt_24242`: -- ```r ggplot(gp, aes(x=as.factor(Mt_24242), y=height, color=as.factor(Mt_24242))) + geom_boxplot() + geom_jitter(color="black", size=1, alpha=0.9) + scale_color_manual(values=c("#E69F00", "#56B4E9", "#fe6f5e"))+ labs(title="Mt_24242", y="Plant Height", x = "Genotype")+ theme_classic() + guides(color=FALSE) + theme(plot.title = element_text(size=20, face = "bold"), axis.text=element_text(size=16, face="bold"), strip.text.y = element_text(size = 16, face = "bold"), axis.title=element_text(size=18, face="bold"), ) ``` --- # Genotypic value Let's find out `\(a\)` and `\(d\)` at a specific Marker `Mt_24242`: ```r u <- mean(gp$height) # population mean # A1A1 h1 <- mean(subset(gp, Mt_24242 == -1)$height) # A1A2 h12 <- mean(subset(gp, Mt_24242 == 0)$height) # A2A2 h2 <- mean(subset(gp, Mt_24242 == 1)$height) ``` -- ```r a <- (h2 - h1)/2 midpoint <- h1+a d <- h12 - midpoint ``` --- # Allele Substitution Effect The average effect of A1 and A2: ```r alpha <- a + d*(q - p) alpha1 <- q*alpha alpha2 <- -p*alpha ``` -- ### Breeding value The __Breeding value__ associated with Marker `Mt_24242` is defined as: the sum of `\(\alpha_i\)` and `\(\alpha_j\)`. - Breeding value is the value of an individual as a parent! `\begin{align*} BV_{ij} = \mu + \alpha_i + \alpha_j \end{align*}` ```r bv1 = u+alpha1 + alpha1 bv2 = u+alpha2 + alpha2 bv12 = u+alpha1 + alpha2 ``` --- # Genotypic value and breeding value ```r plot(c(0, 1, 2), c(h1, h12, h2), xlab="Genotype",ylab="", cex.lab=1.5, xaxt="n", pch=17, cex=2, col="red", ylim=c(60, 95)); axis(1, at=c(0, 1, 2), labels=c("A1A1", "A1A2", "A2A2")); mtext("Breeding Value", side = 4, line = 1, cex=1.5, col="blue"); mtext("Genotypic Value", side = 2, line = 2, cex=1.5, col="red") points(c(0, 1, 2), c(bv2, bv12, bv1), cex=2, col="blue") lines(c(0, 1, 2), c(bv2, bv12, bv1), lwd=2, col="blue") ``` --- # Additive and dominance variance P = A + D + E ### Phenotypic variance ```r Vp <- var(gp$height) ``` --- # Additive genetic variance: `\(V_A\)` These breeding values have a mean of zero, and their variance is the sum of the products of the genotype frequencies and the squared breeding values: `\begin{align*} V_A & = p^2(2q\alpha)^2 + 2pq(q-p)^2\alpha^2 + q^2(-2p\alpha)^2 \\ & = 2pq\alpha^2(2pq + (q-p)^2 + 2pq) \\ & = 2pq\alpha^2(p+q)^2 \\ & = 2pq\alpha^2 \\ & = 2pq(a + d(q-p))^2 \\ \end{align*}` -- ```r Va <- 2*p*q*(a + d*(q - p))^2 ``` --- # Dominance genetic variance: `\(V_D\)` The variance due to dominance deviations is: the sum of the products of the genotype frequencies and the squared dominance deviation values. `\begin{align*} V_D & = p^2(-2q^2d)^2 + 2pq(2pqd)^2 + q^2(-2p^2d)^2 \\ & = 4p^2q^2d^2(q^2 + 2pq + p^2) \\ & = 4p^2q^2d^2 \\ & = (2pqd)^2 \\ \end{align*}` -- ```r Vd <- (2*p*q*d)^2 ``` --- # Additive and dominance variance `\(G = A + D\)` ```r Vg <- Va + Vd ``` -- ### `\(H^2\)` and `\(h^2\)` due to this SNP marker ```r h2 <- Va/Vp H2 <- Vg/Vp ```