class: center, middle, inverse, title-slide .title[ # QTL and GWAS ] .author[ ### Jinliang Yang ] .date[ ### May 3, 2024 ] --- # QTL or Linkage Mapping Linkage mapping uses statistical techniques to localize chromosomal regions that might contain genes contributing to phenotypic variation in a complex trait of interest. #### Steps for QTL mapping include: - 1) Create a segregating population - __mating design__ - 2) Genotype individuals within this population with molecular markers - __genotyping by sequencing__ - 3) Phenotype the individuals - __field experimental design__ - 4) Apply statistical models to associate the markers to the phenotypic variation - __single-marker analysis__ or __interval mapping__ --- # Recombinant inbred lines (RILs) .pull-left[ <div align="center"> <img src="ril.jpg" height=400> </div> > Mauricio et al., 2001 ] -- #### a. Select the parental lines with extreme phenotypes - The density of hairs that occur on a plant leaf -- #### b. Selfing an F1 to form a population of F2 individuals -- #### c. Each F2 is selfed for six additional generations - Forming a population of recombinant inbred lines (RILs) - Each RIL is homozygous for a section of a parental chromosome --- # Simulating an F2 population ```r library(qtl) set.seed(12347) # Five autosomes of cM length 50, 75, 100, 125, 60 L <- c(50, 75, 100, 125, 60) map <- sim.map(L, n.mar=L/5+1, eq.spacing=FALSE, include.x=FALSE) # Simulate a F2 population with 250 individuals pop <- sim.cross(map, type="f2", n.ind=250, model = rbind(c(1,45,1,1),c(5,20,0.5,-0.5))) plot.map(pop) ``` <img src="w15_c3_files/figure-html/unnamed-chunk-1-1.png" width="50%" style="display: block; margin: auto;" /> --- # Simulating a QTL mapping experiment #### Checking phenotypic distribution ```r hist(pop$pheno$phenotype, main="simulated phenotype", breaks=50, xlab="Phenotype", col="#cdb79e") ``` <img src="w15_c3_files/figure-html/unnamed-chunk-2-1.png" width="50%" style="display: block; margin: auto;" /> --- # Single-marker analysis ```r # single-QTL scan by marker regression with the simulated data out.mr <- scanone(pop, method="mr") # plot of marker regression results for chr 4 and 12 plot(out.mr, chr=c(1,2,3,4,5), ylab="LOD Score") ``` <img src="w15_c3_files/figure-html/unnamed-chunk-3-1.png" width="50%" style="display: block; margin: auto;" /> --- # Haley-knott Regression This is a version of interval mapping which is a very good approximation to interval mapping via maximum likelihood. ```r # single-QTL scan using Haley-knott Regression approach out.hk <- scanone(pop, method="hk") # plot of marker regression results for chr 4 and 12 plot(out.hk, chr=c(1,2,3,4,5), ylab="LOD score") ``` <img src="w15_c3_files/figure-html/unnamed-chunk-4-1.png" width="50%" style="display: block; margin: auto;" /> --- # Plot QTL effect ```r # summary of out.mr summary(out.mr, threshold=3) ``` ``` ## chr pos lod ## D1M10 1 42.3 21.7 ## D5M6 5 21.0 4.8 ``` ```r effectplot(pop, mname1="D1M10", main="Chr1") ``` <img src="w15_c3_files/figure-html/unnamed-chunk-5-1.png" width="40%" style="display: block; margin: auto;" /> --- # Plot QTL effect ```r # summary of out.mr summary(out.mr, threshold=3) ``` ``` ## chr pos lod ## D1M10 1 42.3 21.7 ## D5M6 5 21.0 4.8 ``` ```r effectplot(pop, mname1="D5M6", main="Chr5") ``` <img src="w15_c3_files/figure-html/unnamed-chunk-6-1.png" width="40%" style="display: block; margin: auto;" /> --- # QTL recap .pull-left[ <div align="center"> <img src="ril2.png" height=400> </div> > Candela and Hake, 2008 ] .pull-right[ ### Traditional bi-parental QTL: - Newly created recombination events (very few) - Low mapping resolution (Mb level, ~ hundreds of genes) - Limited number of QTLs - Can be used for marker assistant selection (MAS) ] --- # GWAS: A brief overview .pull-left[ <div align="center"> <img src="gwas1.png" height=400> </div> ] .pull-right[ ### GWAS in a nutshell: - Utilize existing populations - Leverage historical recombination events - Achieve high mapping resolution but need high density marker ] --- # Nested association mapping (NAM) <div align="center"> <img src="nam.png" height=450> </div> > Yu et al., 2008 --- # SNP projection for NAM RILs <div align="center"> <img src="snp.png" height=300> </div> - More than 15 years ago - No reference genome available and the genotyping is challenging - It is a cost effective method to identify trait-associated genes in high resolution --- # A diversity panel for GWAS .pull-left[ <div align="center"> <img src="diversity.png" height=300> </div> > Liu et al., 2003 - NSS, SS, TS, popcorn, sweet corn, mixed group ] -- .pull-right[ <div align="center"> <img src="founders.png" height=190> </div> - With diverse genetic backgrounds repsenting different subpopulations - Individuals with known or unknown predigrees or ancestral origins - Representative sampling of alleles across the genome to capture genetic variation ] --- # Population structure and relatedness <div align="center"> <img src="pop.png" height=330> </div> -- __Subpopulations (Q) __: ancestral origins and subgroups due to population subdivision __Genetic relatedness (K)__: the degree of genetic similarity within a subpopulation --- # Mixed linear model for GWAS <div align="center"> <img src="mlm.png" height=330> </div> > Yu et al., 2006 -- - __False Discovery Rate (FDR)__: the proportion of __false positive__ results among all statistical significant associations identified in a GWAS - __GWAS Power__: the ability of a GWAS to detect __true positive__ genetic associations with phenotypic traits --- # Other strategies for association analysis .pull-left[ ### Extreme Phenotype GWAS (or XP-GWAS) <div align="center"> <img src="krn.png" height=330> </div> > J. Yang et al., 2015 ] -- .pull-right[ #### Pros: - Increased power due to variants likely have large effect size - Cost effective for genotyping and no need for additional phenotyping #### Cons: - Limited generalizability: only for a given trait - Collecting large sample size can be challenging - Extreme phenotype selection may introduce bias if differs systematically ] --- # Other strategies for association analysis .pull-left[ ### Mediation Annalysis <div align="center"> <img src="med.png" height=200> </div> > Z. Yang et al., 2022 __dSNP__: directly associated SNP __iSNP__: indirectly associated SNP __mediator__: gene mediating via iSNP ] -- .pull-right[ #### Pros: - Establish a assumed causal chain from SNP to gene expression to phenotype - Mapping down to the gene (mediator) #### Cons: - Population-wide Omics data is required - Currently only powerful for large effect mediators ] --- # Multi-Omics data integration <div align="center"> <img src="multiomics.png" height=400> </div>