Population Genomics Module 3

Jinliang Yang

Nov. 3, 2022


Syllabus

Module 1: Introduction and popgen terminology

  • Introduction of population genomics
  • Basic principles of evolutionary processes

Module 2: Diversity measurement

  • Heterozygosity and diversity
  • Population differentiation ( $F_{ST}$ )

Module 3: Scan for direct and linked selection

  • Direct selection
    • The effects of selection on the mutations themselves
  • Linked selection
    • SNPs themselves have no effect on fitness ( s=0 ) but are affected by selection occurring nearby.

Neutral theory of molecular evolution

"The neutral theory asserts that the great majority of evolutionary changes at the molecular level are caused NOT by Darwinian selection but by random drift of selectively neutral or nearly neutral mutants."

Motoo Kimura (木村 資生), 1983

  • Studied at Iowa State with Jay Lush, then at the University of Wisconsin with James Crow

Core ideas of neutral theory of molecular evolution:

  • Most mutations are not advantageous

    • Selectively (or effectively) neutral if $|s| < 1/(2N_e)$
  • Most changes that are fixed over time are selectively neutral (fixed by drift)

    • Drift rather than selection predominates

Neutral Theory

What the neutral theory does not claim

  • Does NOT claim natural selection is unimportant in evolution

    • In fact, most morphological adaptations are the result of natural selection
  • It does NOT deny that most mutations are (slightly) deleterious (it claims most of the variation we see is neutral)

    • Most of the deleterious mutations have been eliminated

    • Rare mutations have been fixed

Selection counteracts drift

  • $s > 1/(2N_e)$

$$\Pr(\text{fix}) = \frac{1 - e^{-2s}}{1 - e^{-4N_e s}}$$
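
As a minimal sketch, the fixation probability of a new mutation, (1 - exp(-2s)) / (1 - exp(-4 Ne s)), can be evaluated numerically. The function name `pr_fix`, its arguments, and the example values are illustrative assumptions and not part of the slides; the sketch assumes the census size equals $N_e$ and uses the neutral limit $1/(2N_e)$ when $s = 0$.

```r
# Fixation probability of a single new mutation with selection coefficient s
# in a population of effective size Ne (assumes census size N equals Ne).
pr_fix <- function(s, Ne) {
  # Neutral limit: a new mutation starts at frequency 1 / (2 * Ne)
  ifelse(s == 0, 1 / (2 * Ne), (1 - exp(-2 * s)) / (1 - exp(-4 * Ne * s)))
}

Ne <- 1000                                    # illustrative value
s_values <- c(-0.001, 0, 1 / (2 * Ne), 0.01)  # illustrative values
data.frame(s = s_values, Ne_s = Ne * s_values, pr_fix = pr_fix(s_values, Ne))
```

The table makes it easy to see how quickly the fixation probability departs from the neutral value once $s$ exceeds $1/(2N_e)$.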

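# Simulate allele-frequency trajectories under drift.
# wright_fisher() is not defined on this slide; it is assumed to be available
# from earlier course material and to return a vector of allele frequencies.
set.seed(12347)
Ne <- 20; A1 <- 1; t <- 4 * Ne
frq <- wright_fisher(N = Ne, A1 = A1, t = t)
plot(frq, type = "l", ylim = c(0, 1), col = 3, xlab = "Generations", ylab = "Freq")
for (u in 1:100) {
  frq <- wright_fisher(N = Ne, A1 = A1, t = t)
  # Draw one named colour at random for each replicate trajectory
  randomcolor <- sample(colors(), 1)
  lines(frq, type = "l", lwd = 3, col = randomcolor)
}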
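
The slide does not show the body of `wright_fisher()`. A minimal sketch of what such a function might look like is given below, under loudly stated assumptions: `N` is taken to be the number of diploid individuals (so 2N gene copies), `A1` the starting number of copies of the allele, and `t` the number of generations. The name `wright_fisher_sketch` is hypothetical, and this is not the author's implementation.

```r
# Hypothetical stand-in for wright_fisher(); the original function is not shown.
# N  : number of diploid individuals, giving 2N gene copies (assumption)
# A1 : starting number of copies of allele A1 (assumption)
# t  : number of generations to simulate
wright_fisher_sketch <- function(N, A1, t) {
  freq <- numeric(t + 1)
  freq[1] <- A1 / (2 * N)
  for (g in seq_len(t)) {
    # Binomial sampling of 2N gene copies from the previous generation
    freq[g + 1] <- rbinom(1, size = 2 * N, prob = freq[g]) / (2 * N)
  }
  freq
}

set.seed(1)
traj <- wright_fisher_sketch(N = 20, A1 = 1, t = 80)
range(traj)
```

Because a new mutation starts at frequency 1/(2N), most trajectories generated this way are lost within a few generations, which is the behaviour the plot is meant to illustrate.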


Expected allele frequencies distribution

On timescales shorter than those required for mutations to fix, selection will change the mean frequency of alleles in a population.

For new mutations, the density of polymorphisms found at frequency $q$ is

$$f(q) = \frac{2\mu}{q(1-q)} \cdot \frac{1 - e^{-4N_e s(1-q)}}{1 - e^{-4N_e s}}$$

Wright, 1969

  • Where $\mu$ is the mutation rate.
  • $s$ is the fitness effect.
    • Advantageous mutations have $s>0$ and deleterious mutations have $s<0$

Types of selection

Purifying selection: $N_e s < -1$

  • Deleterious mutations are eliminated

Positive selection: $N_e s > 1$

  • Opposite of purifying
  • Favorable mutations are selected

Effectively neutral: $-1 < N_e s < 1$

To find loci that are under selection we test for departures from the neutral theory

The expected frequency spectra

$$f(q) = \frac{2\mu}{q(1-q)} \cdot \frac{1 - e^{-4N_e s(1-q)}}{1 - e^{-4N_e s}}$$

# Expected frequency spectrum in units of the mutation rate (mu is set to 1)
f <- function(q, ns) {
  2 / (q * (1 - q)) * (1 - exp(-4 * ns * (1 - q))) / (1 - exp(-4 * ns))
}
q <- seq(from = 0.01, to = 0.99, by = 0.01)
# ns = 0.01 stands in for the neutral case, because f is undefined at ns = 0
plot(q, f(q, ns = 0.01), type = "l", lty = 1, lwd = 3, xlab = "q", ylab = "No. of polymorphic sites", cex.lab = 2)
# Deleterious mutations in red, advantageous mutations in blue
lines(q, f(q, ns = -50), lty = 1, lwd = 3, col = "red")
lines(q, f(q, ns = -5), lty = 2, lwd = 3, col = "red")
lines(q, f(q, ns = 5), lty = 1, lwd = 3, col = "blue")
lines(q, f(q, ns = 50), lty = 2, lwd = 3, col = "blue")
# Legend labels follow the same order as the colours and line types
legend(0.6, 200, title = "Ne*s", legend = c("-50", "-5", "0", "5", "50"),
       col = c("red", "red", "black", "blue", "blue"),
       lty = c(1, 2, 1, 1, 2), cex = 2, lwd = 3)

The expected distribution of f(q)

  • Deleterious alleles are shifted toward lower frequencies

    • most strongly deleterious mutations are immediately removed from the population
  • Advantageous alleles are shifted toward higher frequencies

    • most strongly advantageous mutations fix very rapidly.

Signature of negative selection

Site frequency spectrum (SFS)

  • Comparison of expected and observed is uneven

  • The rare alleles are at lower frequency than expected

  • Evidence of negative selection (or purifying selection)

  • However, confounded by population demographics (e.g., bottleneck effect)

Signature of positive/balancing selection

Site frequency spectrum (SFS)

  • Comparison of expected and observed is too even

  • The most common allele is more common than expected

  • Evidence of positive selection or balancing selection

  • However, confounded by population demographics (e.g., population expansion)

Diversity measurement

We now consider several statistics summarizing sequence diversity that use information about the frequency of derived alleles

  • These capture more information about our sequencing data.

Fu and Li (1993) defined a statistic, $\epsilon_1$, based on the number of derived singletons in a sample.

$$\epsilon_1 = S_1$$

  • Where $S_1$ is the number of segregating sites with derived alleles found on only one haplotype.

If we don't know the ancestral state, we can also define a statistic, $\eta_1$, based on all singletons in a sample

$$\eta_1 = S_1 \frac{n-1}{n}$$

  • Where $S_1$ here counts all singletons, whether derived or ancestral.

Diversity measurement

A second summary statistic of diversity that uses ancestral state information is $\theta_H$:

$$\theta_H = \frac{\sum_{i=1}^{n-1} i^2 S_i}{n(n-1)/2}$$

  • Where $S_i$ is again the number of segregating sites where $i$ haplotypes carry the derived allele (Fay and Wu, 2000).

Summary of the θ statistics

All of these statistics ($\epsilon_1$, $\eta_1$, $\theta_H$) are estimators of $\theta$

  • at mutation-drift equilibrium
  • under an infinite sites mutational model

Specifically,

$$E(\epsilon_1) = E(\eta_1) = E(\theta_H) = \theta$$

These relationships arise because we know the expected shape of the allele frequency distribution under our standard neutral assumptions.
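
To make the shared expectation concrete, here is a minimal sketch that computes several estimators of θ from an unfolded site frequency spectrum. The function name `theta_estimators` and the input layout (`sfs[i]` is the number of sites where `i` of `n` haplotypes carry the derived allele) are assumptions made for illustration. The check uses the standard neutral expectation E(S_i) = θ/i, for which every estimator should return the same value.

```r
# Theta estimators computed from an unfolded site frequency spectrum.
# sfs[i] is the number of segregating sites at which i of the n sampled
# haplotypes carry the derived allele (i = 1, ..., n - 1; assumes n > 2).
theta_estimators <- function(sfs) {
  n <- length(sfs) + 1
  i <- seq_len(n - 1)
  c(
    epsilon1 = sfs[1],                                # derived singletons
    eta1     = (sfs[1] + sfs[n - 1]) * (n - 1) / n,   # all singletons
    theta_H  = sum(i^2 * sfs) / (n * (n - 1) / 2),    # Fay and Wu's theta_H
    theta_W  = sum(sfs) / sum(1 / i),                 # Watterson's theta
    theta_pi = sum(i * (n - i) * sfs) / choose(n, 2)  # pairwise diversity
  )
}

# Expected neutral spectrum for theta = 5 and n = 10 haplotypes
neutral_sfs <- 5 / seq_len(9)
theta_estimators(neutral_sfs)
```

With an observed spectrum in place of `neutral_sfs`, disagreement among these values is what the tests later in the module are built to detect.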


Detecting selection using the SFS

The effects of positive selection

Hahn, 2020

  • After the sweep ends, new mutations start to accumulate.

  • These new mutations are by definition singletons

    • each derived allele has a single origin in the sample.

The SFS can be skewed toward an excess of low-frequency polymorphisms relative to the neutral spectrum.

Detecting selection using the SFS

The effects of balancing selection

Here we consider a simple scenario with a single biallelic site that has been under balancing selection for a long time.

  • Variation within each allelic class has been able to build up and reach equilibrium

Bitarello et al., 2018

  • Neutral mutations have accumulated both within and between allelic classes

  • Overall variation is higher

  • SNPs at intermediate frequency show a distinctive "bump" in the SFS.

Detecting selection using SFS

A straightforward way would be to test for a difference between two SFSs.

  • However, linkage among sites means that SNPs at a locus are not independent, which violates the assumptions made by almost all such tests.

Instead, we use θ to detect deviations.

  • $\theta_\pi$: pairwise nucleotide diversity.

  • $\theta_W$: Watterson's $\theta$, using the total number of segregating sites

  • $\epsilon_1 = S_1$: the number of derived singletons in a sample.

  • $\eta_1$: based on all singletons in a sample.

Under the standard neutral model, all of these estimators have the same expectation, so test statistics built from the difference between any two of them are expected to have a mean of 0.

Tajima's D and related tests

Tajima (1989) constructed the first test to detect differences in the SFS.

His statistic, $D$, was defined as:

$$D = \frac{\theta_\pi - \theta_W}{\sqrt{\mathrm{Var}(\theta_\pi - \theta_W)}}$$

Fu and Li (1993) created similar statistics. These are known as Fu and Li's $D$, $F$, $D^*$, and $F^*$.

$$D = \frac{\theta_W - \epsilon_1}{\sqrt{\mathrm{Var}(\theta_W - \epsilon_1)}}$$

$$F = \frac{\theta_\pi - \epsilon_1}{\sqrt{\mathrm{Var}(\theta_\pi - \epsilon_1)}}$$

$$D^* = \frac{\theta_W - \eta_1}{\sqrt{\mathrm{Var}(\theta_W - \eta_1)}}$$

$$F^* = \frac{\theta_\pi - \eta_1}{\sqrt{\mathrm{Var}(\theta_\pi - \eta_1)}}$$

These statistics were originally designed to fit a normal distribution; however, none of them fits a parametric distribution very well.

Calculation

  • Only variable sites at each locus are needed

  • The number of invariant sites does not figure into any calculations.

Interpreting values of the test statistics

Tajima's $D$, Fu and Li's $D$, $F$, $D^*$, $F^*$:

$$D = \frac{\theta_\pi - \theta_W}{\sqrt{\mathrm{Var}(\theta_\pi - \theta_W)}}$$

  • After a sweep, all SNPs are at low frequency, so $\theta_\pi$ will be much lower than expected.

  • Statistics based on counts of segregating sites (like $\theta_W$) will be much closer to their expected values.


  • All negative when there has been a sweep

Balancing selection leads to an excess of intermediate-frequency neutral variation surrounding a selected site.

In such cases, $\theta_\pi$ will be greater than $\theta_W$ and the other statistics.

  • All positive when there is balancing selection


  • Are usually significant when the values are > +2 or < -2

    • The exact thresholds depend on sample size, number of SNPs, etc.
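
The sign behaviour described above can be checked with a small sketch of Tajima's D computed from a site frequency spectrum. The function name `tajima_d`, the input layout, and the two toy spectra are assumptions for illustration; the variance constants follow Tajima (1989). The sketch does not compute significance thresholds.

```r
# Tajima's D from a site frequency spectrum.
# sfs[i] = number of segregating sites where i of n haplotypes carry the derived allele.
tajima_d <- function(sfs) {
  n <- length(sfs) + 1
  i <- seq_len(n - 1)
  S <- sum(sfs)                                      # number of segregating sites
  theta_pi <- sum(i * (n - i) * sfs) / choose(n, 2)  # pairwise diversity
  a1 <- sum(1 / i)
  a2 <- sum(1 / i^2)
  theta_W <- S / a1                                  # Watterson's theta
  # Variance constants from Tajima (1989)
  b1 <- (n + 1) / (3 * (n - 1))
  b2 <- 2 * (n^2 + n + 3) / (9 * n * (n - 1))
  c1 <- b1 - 1 / a1
  c2 <- b2 - (n + 2) / (a1 * n) + a2 / a1^2
  e1 <- c1 / a1
  e2 <- c2 / (a1^2 + a2)
  (theta_pi - theta_W) / sqrt(e1 * S + e2 * S * (S - 1))
}

n <- 10
singleton_excess <- c(30, rep(0, n - 2))               # every SNP is a derived singleton
intermediate_excess <- c(0, 0, 0, 0, 30, 0, 0, 0, 0)   # every SNP at frequency 5/10
c(sweep_like = tajima_d(singleton_excess),
  balancing_like = tajima_d(intermediate_excess))
```

The first toy spectrum mimics the excess of rare variants after a sweep and gives a negative value; the second mimics an excess of intermediate-frequency variants and gives a positive value.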

The power of the SFS

The time window for detecting positive selection is limited.

  • Too early during the sweep

    • The signal will not be strong enough
  • Too late after the sweep

    • Both levels and frequencies of variants will have returned to normal

Power is also determined by the distance between the studied loci and the selected site.

  • This is because of the effect of recombination.
  • Move far enough away and there will be no signal of selection at all.

Direct Selection


The accumulation of sequence divergence

Nucleotide substitution rate ( k )

The variable k is defined as the substitution rate of new alleles

  • The rate at which alleles are fixed over long periods of time.
  • It determines how quickly two sequences are expected to diverge over time.

Sequence divergence ( d )

We define d as the genetic distance between two orthologous sequences.

  • We generally calculate d by taking a single sequence from each species and counting the number of positions that differ between them, divided by the total number of aligned nucleotides.

The accumulation of sequence divergence

The contribution of the rate of substitution ( k ) to the expected amount of divergence ( d ) can be seen in the following equation:

$$E(d) = k \cdot 2t + \theta_{Anc}$$

  • Where $k$ represents the allele substitution rate.

  • $t$ is the time since the species split

    • We use $2t$ because substitutions can occur on both branches of the phylogenetic tree.
  • $\theta_{Anc}$: average amount of nucleotide variation expected between two sequences in the ancestor.

    • Because at the time of speciation, differences have already accumulated between the two lineages.

If divergence levels are assumed to be much greater than the expected levels of polymorphism in the ancestral species, this simplifies to

$$E(d) = k \cdot 2t$$
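
A short worked example of the divergence relationship, E(d) = k * 2t plus ancestral polymorphism, is sketched below. The function name `expected_divergence` and all numeric values are illustrative assumptions, not values from the slides.

```r
# Expected divergence between two orthologous sequences.
# k: substitution rate per site per generation; t: generations since the split;
# theta_anc: ancestral polymorphism (0 gives the simplified form).
expected_divergence <- function(k, t, theta_anc = 0) k * 2 * t + theta_anc

k <- 1e-8   # illustrative substitution rate
t <- 5e6    # illustrative split time in generations
expected_divergence(k, t)                    # simplified form, k * 2t
expected_divergence(k, t, theta_anc = 0.01)  # including ancestral polymorphism
```

Comparing the two calls shows when the simplification is reasonable: the ancestral term matters only when it is not small relative to k * 2t.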


What affects k?

Two quantities determine the rate of substitution ( k ).

  • The probability of fixation of any mutation ( u ).

  • The total number of mutations that arise and can possibly be fixed.


Fixation rate of new mutation

Neutral mutation ( u0 )

If a mutation has no effect on fitness, the probability of fixing is equal to its current frequency.


New mutations always begin at frequency $1/(2N)$, therefore,

$$u_0 = \frac{1}{2N}$$

Advantageous mutations ( ua )

For new, advantageous mutations ( $s>0$ ) and large effective population sizes, the probability of fixation is

$$u_a \approx 2s_a$$

according to Haldane 1927; Fisher 1930; Wright 1931.

  • $s_a$ is the selective advantage of the new allele in a heterozygote, and $2s_a$ is the advantage in a homozygote.

Fixation rate of new mutation

Deleterious mutations ( ud )

For new, deleterious mutations ( $s<0$ ) that don't have large effects, the probability of fixation is (Kimura 1957):

$$u_d \approx \frac{2s_d}{1 - e^{-4Ns_d}}$$

  • Here $s_d$ is the deleterious effect of the new allele in a heterozygote and $2s_d$ is the effect in a homozygote.

Fixation rate of new mutation

Probability of fixation, relative to a neutral allele, of new, selected mutations:

$$\frac{u}{u_0} \approx \frac{2s}{1 - e^{-4N_e s}} \Big/ \frac{1}{2N_e} = \frac{4N_e s}{1 - e^{-4N_e s}}$$

# Fixation probability relative to a neutral mutation, as a function of Ne*s
ns <- seq(from = -1, to = 1, by = 0.01)
# The ratio is undefined (0/0) at ns = 0, where its limit is 1
plot(ns, 4 * ns / (1 - exp(-4 * ns)), xlab = "Nes", ylab = "u/u0")
abline(v = 0, lty = 2, lwd = 2)

  • Nes=0, neutral mutations

  • Nes>0, slightly advantageous mutations are not that much more likely to fix than neutral mutations

  • Nes<0, slightly deleterious mutations have some probability of fixing
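
To put numbers on these three cases, the relative fixation probability 4 Ne s / (1 - exp(-4 Ne s)) can be tabulated at a few values of Ne*s. This is a minimal sketch; the helper name `rel_fix` and the chosen grid are assumptions for illustration.

```r
# Fixation probability relative to a neutral mutation (u / u0) as a function of Ne*s.
# The value at Ne*s = 0 is set to the neutral limit of 1.
rel_fix <- function(ns) ifelse(ns == 0, 1, 4 * ns / (1 - exp(-4 * ns)))

ns <- c(-1, -0.5, 0, 0.5, 1)   # illustrative grid of Ne*s values
round(data.frame(Ne_s = ns, u_over_u0 = rel_fix(ns)), 3)
```

Within this range the ratio stays within roughly an order of magnitude of 1 in both directions, which is why such mutations are treated as nearly neutral.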


What affects k?

Two quantities determine the rate of substitution ( k ).

The probability of fixation of any mutation ( u ).

$$u_0 = \frac{1}{2N_e}, \qquad u_a \approx 2s_a, \qquad u_d \approx \frac{2s_d}{1 - e^{-4N_e s_d}}$$

The total number of mutations that arise and can possibly be fixed.


The total number of mutations

If the probability of a mutation at a nucleotide in each generation is ν, then in a population of N diploid individuals, there will be 2Nν new mutations per generation at a single site.

  • with f0 representing the fraction of neutral mutations.
    • 2Nνf0 will be neutral
  • The remaining will be advantageous ( fa fraction) and deleterious ( fd fraction).

    • 2Nνfa new advantageous mutations
    • 2Nνfd new deleterious mutations

If advantageous and deleterious mutations have no contribution, then the substitution rate is a function of only the total number of neutral mutations that arise and the probability that each of them fixes.

$$k = (2N\nu f_0)\frac{1}{2N} = \nu f_0$$

Advantageous and deleterious mutations

The rate of substitution for advantageous mutations:

$$k = (2N_e \nu f_a)\, 2s_a = 4N_e \nu f_a s_a$$

The rate of substitution for deleterious mutations:

$$k = (2N_e \nu f_d) \times \frac{2s_d}{1 - e^{-4N_e s_d}} = \frac{4N_e \nu f_d s_d}{1 - e^{-4N_e s_d}}$$

  • The effective population size ( Ne ) plays an important role in the rate of substitution of selected mutations.

  • More advantageous mutations will fix in larger populations than in smaller populations.

  • More deleterious mutations will fix in smaller populations relative to larger populations.
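
The dependence on effective population size can be sketched numerically with the two rate expressions, 4 Ne ν f_a s_a for advantageous mutations and 4 Ne ν f_d s_d / (1 - exp(-4 Ne s_d)) for deleterious ones. The function names and all parameter values below are illustrative assumptions; note that the advantageous-rate expression relies on the 2s_a approximation, which is meant for large populations.

```r
# Substitution rates of selected mutations as a function of Ne.
# nu: mutation rate; f_a, f_d: fractions of advantageous / deleterious mutations;
# s_a > 0 and s_d < 0 are selection coefficients. All values are illustrative.
k_adv <- function(Ne, nu, f_a, s_a) 4 * Ne * nu * f_a * s_a
k_del <- function(Ne, nu, f_d, s_d) {
  4 * Ne * nu * f_d * s_d / (1 - exp(-4 * Ne * s_d))
}

Ne <- c(1e2, 1e3, 1e4)
data.frame(
  Ne    = Ne,
  k_adv = k_adv(Ne, nu = 1e-8, f_a = 0.01, s_a = 0.001),
  k_del = k_del(Ne, nu = 1e-8, f_d = 0.70, s_d = -0.001)
)
```

Reading down the table, `k_adv` rises with Ne while `k_del` falls toward zero, matching the two statements about larger and smaller populations.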


Detecting direct selection using divergence

In coding regions, we measure divergence that is due to nonsynonymous and synonymous changes.

  • dN as the number of nonsynonymous differences per nonsynonymous site

  • dS as the number of synonymous differences per synonymous site

Note that natural selection has a profound effect on the number of nonsynonymous mutations that are fixed.

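$$E(d_N) = k \cdot 2t = 2t\left(\nu f_0 + 4N_e \nu f_a s_a + \frac{4N_e \nu f_d s_d}{1 - e^{-4N_e s_d}}\right) = \nu \cdot 2t\left(f_0 + 4N_e f_a s_a + \frac{4N_e f_d s_d}{1 - e^{-4N_e s_d}}\right)$$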

The total nonsynonymous divergence in a region is due to all three types of mutations, therefore, our expression for dN includes all three terms.


Detecting selection using divergence

$$E(d_N) = \nu \cdot 2t\left(f_0 + 4N_e f_a s_a + \frac{4N_e f_d s_d}{1 - e^{-4N_e s_d}}\right)$$

  • A higher underlying mutation rate, ν, and longer divergence times, t, will increase the amount of divergence

  • The proportion of advantageous mutations fixed will be a function of the frequency at which they arise and their average selective effect

  • The deleterious mutations can also contribute to divergence if selection is weak enough


Synonymous mutations

Here we assume all synonymous changes are neutral.

  • That is, f0=1 and fa=fd=0

The total expected amount of synonymous divergence between two sequences is:

$$E(d_S) = \nu \cdot 2t$$

For neutral mutations, the substitution rate is simply equal to the mutation rate.


The ratio of nonsynonymous to synonymous divergence

Because both $\nu$ and $t$ will be approximately the same for nonsynonymous and synonymous sites in the same gene, dividing the above equations gives

$$\frac{E(d_N)}{E(d_S)} = f_0 + 4N_e f_a s_a + \frac{4N_e f_d s_d}{1 - e^{-4N_e s_d}}$$

  • Relative to synonymous divergence, the level of nonsynonymous divergence is again due to the fractions of mutations that are neutral, advantageous, and deleterious.

  • Note that here, f0 represents only the nonsynonymous mutations.
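
As a minimal sketch, the expected ratio f0 + 4 Ne f_a s_a + 4 Ne f_d s_d / (1 - exp(-4 Ne s_d)) can be evaluated for a few hypothetical mixtures of nonsynonymous mutations. The function name `expected_dnds` and every parameter value are assumptions chosen only to illustrate how the three terms combine; the sketch requires a nonzero `s_d`.

```r
# Expected dN/dS given the fractions of neutral (f0), advantageous (f_a) and
# deleterious (f_d) nonsynonymous mutations, with s_a > 0 and s_d < 0.
expected_dnds <- function(Ne, f0, f_a, s_a, f_d, s_d) {
  f0 + 4 * Ne * f_a * s_a + 4 * Ne * f_d * s_d / (1 - exp(-4 * Ne * s_d))
}

# Illustrative values only
purifying <- expected_dnds(Ne = 1e4, f0 = 0.2, f_a = 0,     s_a = 0,     f_d = 0.8,   s_d = -0.01)
mixed     <- expected_dnds(Ne = 1e4, f0 = 0.2, f_a = 0.001, s_a = 0.005, f_d = 0.799, s_d = -0.01)
positive  <- expected_dnds(Ne = 1e4, f0 = 0.2, f_a = 0.005, s_a = 0.005, f_d = 0.795, s_d = -0.01)
c(purifying = purifying, mixed = mixed, positive = positive)
```

In these toy cases a small fraction of advantageous mutations is enough to raise the ratio substantially, because each one is far more likely to fix than a neutral mutation.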


Some general guidelines

dN/dS<<1 The vast majority of nonsynonymous mutations are deleterious, and negative (purifying) selection is predominant.

dN/dS<1 The majority of nonsynonymous mutations are deleterious, but there may be some unknown fraction of advantageous mutations.

dN/dS=1 This situation can occur in two cases:

  • First, there is no selection and all nonsynonymous mutations are neutral.
  • Second, there are many neutral and advantageous mutations (as well as deleterious mutations), so that the terms in the ratio happen to sum to 1.

dN/dS>1 There are many advantageous nonsynonymous mutations and positive selection is predominant, but there are still many deleterious mutations.


πN/πS

Within a species, by analogy with the logic of the comparison of dN and dS, we can compare the average number of nonsynonymous differences per nonsynonymous site ( πN ) to the average number of synonymous differences per synonymous site ( πS ).

  • Combining the methods for calculating π
  • With the methods for calculating nonsynonymous and synonymous changes.

Interpretation of the ratio

  • Values of πN/πS below 1 are again evidence for the predominance of purifying selection, and the vast majority of all coding loci show πN/πS<1
  • However, interpretation of πN/πS>1 is different.
  • Since positive selection will rapidly fix advantageous mutations, these adaptive changes will rarely be found in studies of polymorphism

  • Instead, balancing selection will result in πN/πS>1

    • heterozygote advantage (heterosis)

    • Therefore, dN/dS>1 is taken as strong evidence of positive selection

    • πN/πS>1 is a very strict criterion for detecting balancing selection.

    • Single sites under very strong selection will never contribute enough to values of πN to push πN/πS greater than 1.

The tb1 gene example


Join my lab as a PhD or a PostDoc

Who we are looking for

  • A master's degree or PhD in plant genetics, genomics, or a related field
  • Familiarity with at least one coding language, R or Python
  • Able to work independently and as a team player
  • Excellent communication skills

What we can provide

  • A stimulating and supportive international research environment
  • Access to state-of-the-art research infrastructure
  • Living in a lovely city with a low cost of living and high quality of life

Learn more about us

https://jyanglab.com
