Genome-wide association study

Jinliang Yang

May 6, 2024

GWAS: 15 years on

The very first GWAS, published in 2005, studied age-related macular degeneration.

Klein et al., 2005, Science

GWAS: 15 years on

  • More than 4,300 papers have reported on 4,500 GWAS

  • Over 55,000 unique loci for nearly 5,000 diseases and traits

User-friendly data portals to query GWAS results

  • GWAS Catalog: a searchable database of SNP-trait associations

  • PhenoScanner: a curated database holding publicly available GWAS results

  • GTEx (Genotype-Tissue Expression) eQTL Browser: a resource for studying human gene expression and regulation and their relationship to genetic variation

  • ENCODE (Encyclopedia of DNA Elements): a catalog of functional elements, including elements that act at the protein and RNA levels and regulatory elements

Loos, 2020, Nature Communications

The driving forces for GWAS

  • The decreasing cost of genome-wide genotyping

    • Now >20 times less expensive than 15 years ago
  • The number of variants tested has increased

    • In human studies, from ~500k variants in early GWAS to nearly 10 million in the latest GWAS
  • More refined phenotypes

    • Such as imaging-derived traits and multi-Omics outcomes (e.g., gene expression as a trait for GWAS)
  • Advanced statistical analyses and sophisticated modeling

    • Multivariate GWAS to identify loci that affect multiple traits simultaneously
    • Integrate intermediate Omics data to conduct causal inference

      Z. Yang et al., 2022

Validation of GWAS results

Identifying GWAS loci is only the first step of a long journey

Translation of genetic loci into new biological insights

  • Integrate multi-Omics data

  • Targeted molecular experiments are critical to establish the role of the prioritized genes.

Implementing the knowledge in breeding practice

  • Marker-assisted selection (using large-effect markers only)

  • Genomic selection (using all genome-wide markers)

Steps for conducting GWAS

Uffelmann et al., 2021

Testing for associations

A quantitative trait is sometimes controlled jointly by

  • major QTLs with large effects
  • minor QTLs with small effects

Preventing shrinkage

If markers that correspond to the major QTLs are known

  • Then these markers can be treated as having fixed effects

    • This prevents their estimates from being shrunk toward zero
  • The remaining markers can be treated as having random effects

    • Their effects can still be estimated through RR-BLUP or other approaches.
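
The effect of treating a major-QTL marker as fixed can be seen in a small simulation. This is an illustrative sketch under stated assumptions, not the method of any particular software: genotypes and effects are simulated, `solve_mme` is a hypothetical helper that solves Henderson's mixed-model equations, and the variance ratio $\lambda = V_e/V_M$ is set to an arbitrary value instead of being estimated. The large-effect marker is fitted once as one of many random (RR-BLUP) marker effects and once as a fixed effect.

```python
import numpy as np


def solve_mme(y, X, Z, lam):
    """Solve Henderson's mixed-model equations for y = Xb + Zm + e.

    Assumes Var(m) = I * V_M and Var(e) = I * V_e with lam = V_e / V_M.
    Only the random effects m receive the ridge penalty lam.
    """
    n_fixed = X.shape[1]
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.eye(Z.shape[1])],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n_fixed], sol[n_fixed:]


rng = np.random.default_rng(1)
n_lines, n_markers = 300, 400

# Simulated marker genotypes (0/1/2 allele counts), centered per marker.
M = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
M -= M.mean(axis=0)

# Hypothetical architecture: marker 0 is a major QTL, the rest are minor.
effects = rng.normal(0.0, 0.05, n_markers)
true_major = 2.0
effects[0] = true_major
y = 10.0 + M @ effects + rng.normal(0.0, 1.0, n_lines)

lam = 1000.0  # assumed V_e / V_M, not estimated here
ones = np.ones((n_lines, 1))

# (a) All markers random: the major marker is shrunk like all the others.
_, m_hat = solve_mme(y, ones, M, lam)
b_random = m_hat[0]

# (b) Major marker fixed, remaining markers random (RR-BLUP background).
X = np.column_stack([ones, M[:, 0]])
b_hat, _ = solve_mme(y, X, M[:, 1:], lam)
b_fixed = b_hat[1]

print(f"true {true_major:.2f}  random {b_random:.2f}  fixed {b_fixed:.2f}")
```

With the marker random, its estimate is pulled toward zero together with all the minor effects; as a fixed effect it is not penalized, so its estimate stays close to the simulated value.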

Linear Mixed Model (LMM)

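$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{m} + \mathbf{e}$$

$$V(\mathbf{m}) = \mathbf{I}\,V_{M_i} = \mathbf{I}\,(V_G / n_M)$$

where $V_G$ is the genetic variance and $n_M$ is the number of markers, so every marker effect has the same variance.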
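
Because every marker effect has the same variance $V_{M_i} = V_G/n_M$, the ridge parameter used in RR-BLUP follows in one line. A worked sketch, assuming heritability $h^2 = V_G/(V_G + V_e)$ with residual variance $V_e$; the numbers in the last line are illustrative only:

```latex
\lambda = \frac{V_e}{V_{M_i}}
        = \frac{V_e}{V_G / n_M}
        = n_M \, \frac{V_e}{V_G}
        = n_M \, \frac{1 - h^2}{h^2}

% Illustrative only: n_M = 10{,}000 markers and h^2 = 0.5
\lambda = 10{,}000 \times \frac{1 - 0.5}{0.5} = 10{,}000
```

More markers or a lower heritability therefore means heavier shrinkage of each individual marker effect.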

Fit a marker as a fixed effect

With a small modification, the LMM above can be used for association mapping:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{w}_i m_i + \mathbf{Z}\mathbf{m} + \mathbf{e}$$

  • where $m_i$ is the fixed effect of the $i$th SNP marker
  • $\mathbf{w}_i$ is the incidence vector for that SNP marker
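
A marker-by-marker scan under this model can be sketched with generalized least squares (GLS). Assumptions: genotypes are simulated and centered, $\mathbf{Z}$ is taken to be the full marker matrix (so the tested marker also stays in the background, a simplification), the variance components $V_M$ and $V_e$ are treated as known rather than estimated, and `marker_z_test` is a hypothetical helper name.

```python
import math
import numpy as np

rng = np.random.default_rng(7)
n_lines, n_markers = 200, 300

# Simulated genotypes (0/1/2), centered per marker; these form Z.
Z = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
Z -= Z.mean(axis=0)

# Hypothetical trait: marker 10 has a large effect, all others are small.
effects = rng.normal(0.0, 0.05, n_markers)
effects[10] = 1.0
y = 5.0 + Z @ effects + rng.normal(0.0, 1.0, n_lines)

# Assumed known variances: V(m) = I * V_M and V(e) = I * V_e,
# so Var(y) = V = Z Z' V_M + I V_e.
V_M, V_e = 0.0025, 1.0
V_inv = np.linalg.inv(V_M * (Z @ Z.T) + V_e * np.eye(n_lines))
ones = np.ones(n_lines)


def marker_z_test(w):
    """GLS estimate, z statistic and two-sided p-value for a fixed m_i."""
    X = np.column_stack([ones, w])
    C = np.linalg.inv(X.T @ V_inv @ X)   # Var(b_hat) when V is known
    b = C @ (X.T @ V_inv @ y)
    z = b[1] / math.sqrt(C[1, 1])
    return b[1], z, math.erfc(abs(z) / math.sqrt(2.0))


results = [marker_z_test(Z[:, i]) for i in range(n_markers)]
p_values = np.array([p for _, _, p in results])
print("most significant marker:", int(np.argmin(p_values)))
```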

G model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{w}_i m_i + \mathbf{Z}\mathbf{m} + \mathbf{e}$$

  • where $m_i$ is the fixed effect of the $i$th SNP marker
  • $\mathbf{w}_i$ is the incidence vector for that SNP marker

The G model uses the marker effects in $\mathbf{Z}\mathbf{m}$ to account for variation due to QTL on the background chromosomes.

Bernardo, 2013

  • This is similar to composite interval mapping in QTL analysis, which uses background markers as cofactors

  • The disadvantage of this type of approach is the uncertainty in how many background markers should be included.

    • If too few, the background variation will be underestimated

    • If too many, the model will be overfitted.

K model for GWAS

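$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{w}_i m_i + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

In this LMM, the covariance matrix of $\mathbf{u}$ becomes $\mathbf{A}V_A$

  • where $\mathbf{A}$ is the additive relationship matrix, also called the kinship ($\mathbf{K}$) matrix

  • and $V_A$ is the portion of the additive variance that is not accounted for by $m_i$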
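
When $\mathbf{A}$ is built from marker data rather than pedigree records, one common estimator (swapped in here; the slides do not prescribe one) is the genomic relationship matrix of VanRaden (2008). A minimal sketch, assuming allele counts coded 0/1/2 and allele frequencies estimated from the same set of lines; `vanraden_kinship` is a hypothetical helper name.

```python
import numpy as np


def vanraden_kinship(M):
    """Additive genomic relationship matrix (VanRaden 2008, method 1).

    M: n_lines x n_markers array of allele counts (0, 1, 2).
    """
    p = M.mean(axis=0) / 2.0              # allele frequency per marker
    Z = M - 2.0 * p                       # center each marker at 2p
    scale = 2.0 * np.sum(p * (1.0 - p))   # makes K comparable to pedigree A
    return Z @ Z.T / scale


rng = np.random.default_rng(3)
M = rng.binomial(2, 0.3, size=(50, 1000)).astype(float)
K = vanraden_kinship(M)
print(K.shape, round(float(np.mean(np.diag(K))), 2))
```

Because each marker is centered at its sample mean, the entries of K sum to zero. For these simulated Hardy-Weinberg genotypes the diagonal averages close to 1; fully inbred lines would give diagonal values near 2.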

In practice, $V_A$ has to be estimated by an iterative procedure (e.g., REML).
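
REML algorithms iterate to estimate $V_A$ in practice. As a simpler stand-in, the sketch below maximizes the ML profile likelihood over the ratio $\delta = V_e/V_A$ on a grid, after rotating the data by the eigenvectors of $\mathbf{K}$ (the device used by EMMA, Kang et al., 2008). $\delta$ is estimated once without any marker and then reused while each marker is tested as the fixed effect $m_i$, a common approximation. The kinship estimator, the simulated data, and the helper `gls_fit` are all assumptions for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(11)
n_lines, n_markers = 200, 500
M = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)

# Marker-based kinship K (VanRaden-style; an assumed choice of estimator).
p = M.mean(axis=0) / 2.0
Zc = M - 2.0 * p
K = Zc @ Zc.T / (2.0 * np.sum(p * (1.0 - p)))

# Hypothetical trait: polygenic background plus one large-effect marker (25).
effects = rng.normal(0.0, 0.05, n_markers)
effects[25] = 1.0
y = 3.0 + Zc @ effects + rng.normal(0.0, 1.0, n_lines)

# Rotate by the eigenvectors of K: with delta = V_e / V_A,
# Var(U'y) = V_A * diag(s + delta), a diagonal matrix.
s, U = np.linalg.eigh(K)
s = np.clip(s, 0.0, None)            # drop tiny negative round-off
y_t = U.T @ y
ones_t = U.T @ np.ones(n_lines)
M_t = U.T @ M


def gls_fit(X_t, delta):
    """GLS fit in the rotated space: (b, Var(b), ML estimate of V_A, -loglik)."""
    w = 1.0 / (s + delta)
    XtW = X_t.T * w                   # X' D^-1 with D = diag(s + delta)
    A = XtW @ X_t
    b = np.linalg.solve(A, XtW @ y_t)
    r = y_t - X_t @ b
    n, k = X_t.shape
    wrss = np.sum(w * r * r)
    neg_ll = 0.5 * (n * math.log(wrss / n) - np.sum(np.log(w)))  # constants dropped
    cov_b = (wrss / (n - k)) * np.linalg.inv(A)
    return b, cov_b, wrss / n, neg_ll


# Step 1: estimate delta under the null model (intercept only) by a grid
# search over log(delta); REML software would iterate to convergence instead.
X0_t = ones_t[:, None]
grid = np.exp(np.linspace(-5.0, 5.0, 201))
neg_lls = [gls_fit(X0_t, d)[3] for d in grid]
delta_hat = grid[int(np.argmin(neg_lls))]
V_A_hat = gls_fit(X0_t, delta_hat)[2]
print(f"V_A = {V_A_hat:.3f}, V_e = {delta_hat * V_A_hat:.3f}")

# Step 2: hold delta fixed and test each marker as the fixed effect m_i.
p_values = np.empty(n_markers)
for i in range(n_markers):
    X_t = np.column_stack([ones_t, M_t[:, i]])
    b, cov_b, _, _ = gls_fit(X_t, delta_hat)
    z = b[1] / math.sqrt(cov_b[1, 1])
    p_values[i] = math.erfc(abs(z) / math.sqrt(2.0))
print("most significant marker:", int(np.argmin(p_values)))
```

Rotating once by the eigenvectors of K turns every marker test into a cheap weighted least-squares fit, which is why this trick is popular for genome-wide scans.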

Multiple marker model

With multiple SNP markers

  • $\mathbf{w}_i$ becomes an incidence matrix $\mathbf{W}$

  • $m_i$ becomes a vector $\mathbf{m}$

Multiple marker GWAS model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{W}\mathbf{m} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

A two-step approach

  • First, markers are fitted one at a time (a single marker per model)

    • The significance of each marker effect is then tested with a z-test
  • Second, the markers found significant in the single-marker analyses are included in a multiple-marker model.

    • A standard model-selection procedure, such as backward elimination, may be used to determine which markers to keep in the final multiple-marker model.
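
The two steps can be sketched as follows. Assumptions: simulated, unrelated lines, so the covariance of $\mathbf{y}$ is taken as $\sigma^2\mathbf{I}$ (with a K model one would pass the inverse of $\mathbf{V} = \mathbf{K}V_A + \mathbf{I}V_e$ instead); a Bonferroni threshold stands in for whatever significance level a study would choose; and `wald_tests` is a hypothetical helper that returns GLS estimates, z statistics, and two-sided p-values.

```python
import math
import numpy as np


def wald_tests(y, X, V_inv):
    """GLS fit assuming Var(y) = sigma^2 * V, V known and sigma^2 estimated."""
    A = X.T @ V_inv @ X
    b = np.linalg.solve(A, X.T @ V_inv @ y)
    r = y - X @ b
    sigma2 = (r @ V_inv @ r) / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A)))
    z = b / se
    p = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return b, z, p


rng = np.random.default_rng(5)
n_lines, n_markers = 400, 200
M = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
M -= M.mean(axis=0)

causal = [5, 50, 120]  # hypothetical QTL positions
y = 2.0 + M[:, causal] @ np.array([1.0, -0.8, 0.7]) + rng.normal(0.0, 1.0, n_lines)

V_inv = np.eye(n_lines)        # unrelated lines assumed (no Zu term)
ones = np.ones((n_lines, 1))
alpha = 0.05 / n_markers       # Bonferroni threshold (an assumed choice)

# Step 1: one marker per model, z-test on its fixed effect.
hits = []
for i in range(n_markers):
    _, _, p = wald_tests(y, np.column_stack([ones, M[:, i]]), V_inv)
    if p[1] < alpha:
        hits.append(i)

# Step 2: backward elimination in the multiple-marker model y = Xb + Wm + e.
selected = list(hits)
while selected:
    W = M[:, selected]
    _, _, p = wald_tests(y, np.column_stack([ones, W]), V_inv)
    worst = int(np.argmax(p[1:]))    # skip the intercept
    if p[1 + worst] < alpha:
        break                        # every remaining marker is significant
    selected.pop(worst)

print("step 1 hits:", hits)
print("final model:", selected)
```

Step 2 mainly removes markers that were significant in step 1 only because they happen to be correlated with a true QTL; once that QTL is in the joint model, their own effects are no longer significant.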

Multiple sub-populations

In a breeding context, we define each germplasm group or heterotic pattern as a subpopulation of the larger pool of inbreds, hybrids, or clones.

  • In maize, heterotic groups such as Iowa Stiff Stalk Synthetic (BSSS) vs. non-BSSS, or dent vs. flint germplasm

  • Barley inbreds comprise six-row and two-row types

Population structure

  • Separate analysis: one subpopulation at a time

  • Joint analysis: accounts for the differences between the subpopulations

QK model for multiple sub-populations

K model

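$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{w}_i m_i + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

QK model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Q}\mathbf{v} + \mathbf{w}_i m_i + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

  • In this model, the effects of the different subpopulations are captured by $\mathbf{Q}\mathbf{v}$

  • The relatedness among lines within each subpopulation is specified by the covariance matrix of $\mathbf{u}$.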
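
Why the $\mathbf{Q}\mathbf{v}$ term matters can be shown with a deliberately simple simulation: two subpopulations differ in trait mean and in the allele frequency of a marker that has no effect. The sketch fits only the Q part of the QK model (lines are simulated as unrelated, so the $\mathbf{Z}\mathbf{u}$ term is dropped), with Q as a 0/1 subpopulation indicator rather than PC scores; the data and the helper `marker_z` are hypothetical.

```python
import math
import numpy as np


def marker_z(y, X):
    """OLS z statistic for the last column of X (residual variance estimated)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    r = y - X @ b
    sigma2 = (r @ r) / (len(y) - X.shape[1])
    return b[-1] / math.sqrt(sigma2 * XtX_inv[-1, -1])


rng = np.random.default_rng(2)
n_per_group = 150
group = np.repeat([0, 1], n_per_group)      # two subpopulations
n = group.size

# A marker with NO effect whose allele frequency differs between groups.
freq = np.where(group == 0, 0.1, 0.9)
w = rng.binomial(2, freq).astype(float)

# The trait mean differs between subpopulations.
y = 10.0 + 2.0 * group + rng.normal(0.0, 1.0, n)

ones = np.ones(n)
z_naive = marker_z(y, np.column_stack([ones, w]))
Q = group.astype(float)[:, None]            # subpopulation membership
z_q = marker_z(y, np.column_stack([ones, Q, w]))
print(f"|z| without Q: {abs(z_naive):.1f}, with Q: {abs(z_q):.1f}")
```

Without Q the null marker looks highly significant purely because its allele frequency tracks the subpopulation; with Q its z statistic falls back toward the null range.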

Use PCA method to construct Q matrix

Price et al., 2006

Proposed applying principal component analysis (PCA) to marker genotypes and using the PC scores as the Q matrix.

What is in the Q matrix:

  1. The columns in Q correspond to different principal components (PC axes)

  2. The rows in Q contain the PC scores of the lines in y

How many PCs?

  • The first PC captures the largest amount of variation, the second PC the second-largest, and so on.

  • There is no fixed rule, but knowing the number of germplasm groups helps.
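
A minimal sketch of building Q from PCA, assuming two simulated germplasm groups and genotypes coded 0/1/2. Each marker is centered before the singular value decomposition; EIGENSTRAT (Price et al., 2006) additionally scales each marker by its expected standard deviation, which this sketch omits. Keeping two PCs is an arbitrary choice here.

```python
import numpy as np

rng = np.random.default_rng(4)
n_per_group, n_markers = 60, 500

# Two hypothetical germplasm groups with different allele frequencies.
p0 = rng.uniform(0.1, 0.9, n_markers)
p1 = np.clip(p0 + rng.normal(0.0, 0.2, n_markers), 0.05, 0.95)
M = np.vstack([
    rng.binomial(2, p0, size=(n_per_group, n_markers)),
    rng.binomial(2, p1, size=(n_per_group, n_markers)),
]).astype(float)

# Center each marker (column); after the SVD, U * S holds the PC scores.
Mc = M - M.mean(axis=0)
U, S, Vt = np.linalg.svd(Mc, full_matrices=False)
var_explained = S**2 / np.sum(S**2)   # share of marker variation per PC

n_pcs = 2                             # an assumed choice
Q = U[:, :n_pcs] * S[:n_pcs]          # rows: lines in y; columns: PC axes

print("share of variance, PCs 1-5:", np.round(var_explained[:5], 3))
print("mean PC1 score by group:",
      round(float(Q[:n_per_group, 0].mean()), 2),
      round(float(Q[n_per_group:, 0].mean()), 2))
```

The variance shares are what one would inspect when deciding how many PCs to keep; with two groups, PC1 alone separates them.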

GWAS methods summary

K model

Accounts for relatedness using either pedigree records or marker data.

  • Mainly the A matrix (additive relationships only)

  • An AD matrix (additive plus dominance relationships) might be better

QK model

Accounts for the effects of subpopulations and the relatedness within each subpopulation

  • A K matrix estimated from markers is better than one estimated from pedigrees

    Stich et al., 2008; Yu et al., 2006

G model or QG model

Uses RR-BLUP marker effects to account for variation due to QTL on the background chromosomes.
