+ - 0:00:00
Notes for current slide
Notes for next slide

GWAS in practice

Jinliang Yang

April 26, 2022

1 / 12

GWAS practice using rice data

Zhao et al., 2011 Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa

Materials and methods

  • Materials: 413 diverse accessions (inbreds) of rice from 82 countries
  • Genotype: 44,100 SNPs using 44 K SNP array
  • Phenotype: One location in two years. Two reps per year in a randomized complete block design.
    • 34 traits
2 / 12

GWAS practice using rice data

Zhao et al., 2011 Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa

Materials and methods

  • Materials: 413 diverse accessions (inbreds) of rice from 82 countries
  • Genotype: 44,100 SNPs using 44 K SNP array
  • Phenotype: One location in two years. Two reps per year in a randomized complete block design.
    • 34 traits

Five sub-populations

  • Indica
  • Aus
  • Temperate Japonica
  • Tropical Japonica
  • Aromatic
3 / 12

Geographic distribution of germplasm

### read in the data from step one
df <- read.csv("http://ricediversity.org/data/sets/44kgwas/RiceDiversity.44K.germplasm.csv", skip=1)
df$Latitude <- as.numeric(as.character(df$Latitude))
range(df$Latitude, na.rm = T)
## [1] -38.4161 55.7500
df$Longitude <- as.numeric(as.character(df$Longitude))
range(df$Longitude, na.rm = T)
## [1] -102.5528 179.4144
4 / 12

Geographic distribution of germplasm

library(ggmap)
##lowerleftlon, lowerleftlat, upperrightlon, upperrightlat
myloc <- c(-105, -40, 170, 56)
mymap <- get_map(location=myloc, source="stamen", crop=FALSE, color="bw")
ggmap(mymap) +
geom_point(aes(x = Longitude, y = Latitude), data = df,
alpha = .9, size = 1, col="red")
5 / 12

Geographic distribution of germplasm

Color code the subpopulatios

table(df$Sub.population)
##lowerleftlon, lowerleftlat, upperrightlon, upperrightlat
myloc <- c(-105, -40, 170, 56)
mymap <- get_map(location=myloc, source="stamen", crop=FALSE, color="bw")
ggmap(mymap) +
geom_point(aes(x = Longitude, y = Latitude, color=Sub.population), data = df,
alpha = .9, size = 2)
6 / 12

Geographic distribution of germplasm

Weight the size of the dots

Count the number of the accessions for each country.

library(plyr)
c <- ddply(df, .(Country.of.origin), nrow)
c <- subset(c, Country.of.origin != "")
df2 <- merge(c, df[, c("Country.of.origin", "Latitude", "Longitude")], by="Country.of.origin")
df2 <- df2[!duplicated(df2$Country.of.origin), ]
mymap <- get_map(location=myloc, source="stamen", crop=FALSE, color="bw")
ggmap(mymap) +
geom_point(aes(x = Longitude, y = Latitude, color=Country.of.origin), data = df2,
alpha = .9, size = df2$V1/3) +
theme(legend.position = "none")
head(df2[order(df2$V1, decreasing = T),])
7 / 12

Genotypic data

  • data/RiceDiversity_44K_Genotypes_PLINK/
    • sativas413.fam
    • sativas413.map
    • sativas413.ped
  • Or download data from: zipped data

.fam: A text file with no header line, and one line per sample with the following six fields:

  • Family ID ('FID')
  • Within-family ID ('IID'; cannot be '0')
  • Within-family ID of father ('0' if father isn't in dataset)
  • Within-family ID of mother ('0' if mother isn't in dataset)
  • Sex code ('1' = male, '2' = female, '0' = unknown)
  • Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)
8 / 12

Genotypic data

  • data/RiceDiversity_44K_Genotypes_PLINK/
    • sativas413.fam
    • sativas413.map
    • sativas413.ped
  • Or download data from: zipped data

.map: A text file with no header file, and one line per variant with the following 3-4 fields:

  • Chromosome code. PLINK 1.9 also permits contig names here, but most older programs do not.
  • Variant identifier
  • Position in morgans or centimorgans (optional; also safe to use dummy value of '0')
  • Base-pair coordinate
9 / 12

Genotypic data

  • data/RiceDiversity_44K_Genotypes_PLINK/
    • sativas413.fam
    • sativas413.map
    • sativas413.ped
  • Or download data from: zipped data

.ped: Contains no header line, and one line per sample with 6+2V fields where V is the number of variants.

  • The first six fields are the same as those in a .fam file.
  • The seventh and eighth fields are allele calls for the first variant in the .map file ('0' = no call);
  • the 9th and 10th are allele calls for the second variant; and so on.
10 / 12

Genotypic data manipulation

cp -r data/RiceDiversity_44K_Genotypes_PLINK largedata/
module load plink/1.90
# convert it to binary file
cd largedata/RiceDiversity_44K_Genotypes_PLINK
plink --file sativas413 --make-bed --out binary_sativas413

Calculate MAF and missingness

plink v1.9

  • --freq: writes a minor allele frequency report to plink.frq
  • --missing: produces sample-based and variant-based missing data reports.
plink -bfile binary_sativas413 --freq --missing --out sativas413
# copy results back to cache folder!
cp largedata/RiceDiversity_44K_Genotypes_PLINK/sativas413.frq cache/
cp largedata/RiceDiversity_44K_Genotypes_PLINK/sativas413.lmiss cache/
11 / 12

Visualize MAF and locus missing rate

# install.packages("data.table")
library("data.table")
maf <- fread("cache/sativas413.frq", header=TRUE)
lmiss <- fread("cache/sativas413.lmiss", header=TRUE)
pdf("graphs/maf_lmiss.pdf", width = 10, height=5)
par(mfrow=c(1,2))
hist(maf$MAF, breaks=50, col="#cdb79e", main="MAF (SNP = 36,901)", xlab="Minor Allele Freq")
#abline(v=0.01, lty=2, col="black", lwd=3)
abline(v=0.05, lty=2, col="red", lwd=3)
hist(lmiss$F_MISS, breaks=35, col="#cdb79e", main="Missingness (SNP = 36,901)", xlab="Missing Rate")
#abline(v=0.6, lty=2, col="red", lwd=3)
#abline(v=0.05, lty=2, col="red", lwd=3)
dev.off()
12 / 12

GWAS practice using rice data

Zhao et al., 2011 Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa

Materials and methods

  • Materials: 413 diverse accessions (inbreds) of rice from 82 countries
  • Genotype: 44,100 SNPs using 44 K SNP array
  • Phenotype: One location in two years. Two reps per year in a randomized complete block design.
    • 34 traits
2 / 12
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow