"The neutral theory asserts that the great majority of evolutionary changes at the molecular level are caused NOT by Darwinian selection but by random drift of selectively neutral or nearly neutral mutants."
Motoo Kimura (木村 資生), 1983
- Iowa State with Jay Lush and then University of Wisconsin with James Crow
Does NOT claim natural selection is unimportant in evolution
It does NOT deny that most mutations are (slightly) deleterious (it claims most of the variation we see is neutral)
Most of the deleterious mutations have been eliminated
Rare mutations have been fixed
$$\Pr(\text{fix}) = \frac{1 - e^{-2s}}{1 - e^{-4N_e s}}$$
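As a rough numerical illustration of the fixation probability above, the following R sketch evaluates it for a new mutation that starts as a single copy, assuming the census size equals N_e. The helper name `p_fix` and all parameter values are illustrative assumptions, not part of the original slides.

```r
# Fixation probability of a new mutation (initial frequency 1/(2N)),
# assuming the census size N equals the effective size Ne.
# p_fix is a hypothetical helper name used only for this sketch.
p_fix <- function(s, Ne) {
  # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio,
  # and expm1 keeps precision when s or Ne * s is close to zero
  expm1(-2 * s) / expm1(-4 * Ne * s)
}

Ne <- 1000                            # illustrative effective population size
s  <- c(-0.001, 1e-9, 0.001, 0.01)    # s = 0 itself would give 0/0
data.frame(s = s, Ne_s = Ne * s, p_fix = p_fix(s, Ne))
```

For s very close to zero the value approaches the neutral expectation 1/(2N), which is a quick sanity check on the formula.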
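```r
# wright_fisher(N, A1, t) is assumed to be defined elsewhere in the course
# material; it should return the allele frequency in each generation.
set.seed(12347)
Ne <- 20; A1 <- 1; t <- 4 * Ne

# The first trajectory sets up the axes
frq <- wright_fisher(N = Ne, A1 = A1, t = t)
plot(frq, type = "l", ylim = c(0, 1), col = 3,
     xlab = "Generations", ylab = "Freq")

# Overlay 100 more replicates, each in a randomly chosen color
for (u in 1:100) {
  frq <- wright_fisher(N = Ne, A1 = A1, t = t)
  randomcolor <- sample(colors(), 1)   # draw directly from the named colors
  lines(frq, type = "l", lwd = 3, col = randomcolor)
}
```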
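The simulation above calls `wright_fisher()`, which is not defined in this section. A minimal sketch of what such a function might look like is given here, assuming `N` is the number of diploid individuals, `A1` is the starting number of copies of the allele, and `t` is the number of generations; the course's own version may differ.

```r
# Hypothetical stand-in for wright_fisher(): pure drift, no selection or mutation.
# N  = number of diploid individuals (2N gene copies)
# A1 = initial number of copies of the focal allele
# t  = number of generations to simulate
wright_fisher <- function(N, A1, t) {
  p <- numeric(t + 1)
  p[1] <- A1 / (2 * N)                  # starting frequency
  for (g in seq_len(t)) {
    # Binomial sampling of 2N gene copies from the previous generation
    p[g + 1] <- rbinom(1, size = 2 * N, prob = p[g]) / (2 * N)
  }
  p                                      # frequencies for generations 0..t
}

set.seed(1)
frq <- wright_fisher(N = 20, A1 = 1, t = 80)
```

Once the frequency hits 0 or 1 it stays there, so most trajectories started from a single copy are lost within a few generations.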
On timescales shorter than those required for mutations to fix, selection will change the mean frequency of alleles in a population.
For new mutations, the density of polymorphisms found at frequency q is
$$f(q) = \frac{2\mu}{q(1-q)} \cdot \frac{1 - e^{-4N_e s(1-q)}}{1 - e^{-4N_e s}}$$
Wright, 1969
To find loci that are under selection we test for departures from the neutral theory
```r
# Expected frequency spectrum for new mutations (with mu set to 1)
f <- function(q, ns) {
  frq <- 2 / (q * (1 - q)) * (1 - exp(-4 * ns * (1 - q))) / (1 - exp(-4 * ns))
  return(frq)
}
q <- seq(from = 0.01, to = 0.99, by = 0.01)

## Plotting: ns = 0.01 stands in for the neutral case, since ns = 0 gives 0/0
plot(q, f(q, ns = 0.01), type = "l", lty = 1, lwd = 3,
     xlab = "Allele frequency (q)", ylab = "No. of polymorphic sites", cex.lab = 2)
lines(q, f(q, ns = -50), type = "l", lty = 1, lwd = 3, col = "red")
lines(q, f(q, ns = -5),  type = "l", lty = 2, lwd = 3, col = "red")
lines(q, f(q, ns = 5),   type = "l", lty = 1, lwd = 3, col = "blue")
lines(q, f(q, ns = 50),  type = "l", lty = 2, lwd = 3, col = "blue")
# Legend entries listed in the same order as the colors and line types
legend(0.6, 200, title = "Ne*s", legend = c("-50", "-5", "0", "5", "50"),
       col = c("red", "red", "black", "blue", "blue"),
       lty = c(1, 2, 1, 1, 2), cex = 2, lwd = 3)
```
Comparison of expected and observed is uneven
The rare alleles are at lower frequencies than expected
Evidence of negative selection (or purifying selection)
However, confounded by population demographics (i.e., bottleneck effect)
Comparison of expected and observed is too even
The most common allele is more common than expected
Evidence of positive selection or balancing selection
However, confounded by population demographics (i.e., population expansion)
We now consider several statistics summarizing sequencing diversity that use information about the frequency of derived alleles
Fu and Li (1993) defined a statistic, ϵ1, based on the number of derived singletons in a sample.
$$\epsilon_1 = S_1$$
If we don't know the ancestral state, we can also define a statistic, η1, based on all singletons in a sample
$$\eta_1 = S_1^{*}\,\frac{n-1}{n}$$
A second summary statistic of diversity that uses ancestral state information is θH:
$$\theta_H = \frac{\sum_{i=1}^{n-1} i^2 S_i}{n(n-1)/2}$$
All of these statistics --- ϵ1,η1,θH --- are estimators of θ
Specifically, under the standard neutral model,
$$E(\epsilon_1) = E(\eta_1) = E(\theta_H) = \theta$$
These relationships arise because we know the expected shape of the allele frequency distribution under our standard neutral assumptions.
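To make these equalities concrete, the sketch below plugs the neutral expectation for the unfolded SFS into each estimator. It assumes the standard result E(S_i) = θ/i, which is not stated explicitly in the slides; the sample size and the value of θ are arbitrary illustrative choices, and S*_1 is taken to be the sites where either allele is a singleton.

```r
# Expected unfolded SFS under the standard neutral model: E(S_i) = theta / i
# (assumed here; n and theta are arbitrary illustrative values).
n     <- 10            # number of sampled chromosomes
theta <- 5
i     <- 1:(n - 1)
S     <- theta / i     # expected number of sites with i derived copies

eps1    <- S[1]                                # derived singletons
eta1    <- (S[1] + S[n - 1]) * (n - 1) / n     # all singletons, rescaled
theta_H <- sum(i^2 * S) / (n * (n - 1) / 2)    # weights high-frequency derived alleles

c(eps1 = eps1, eta1 = eta1, theta_H = theta_H)
```

All three values coincide with the θ that generated the spectrum, so any difference between them in real data reflects a departure of the observed SFS from the neutral shape (or sampling noise).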
Hahn, 2020
After a sweep ends, new mutations start to accumulate.
These new mutations begin, by definition, as singletons.
The SFS can therefore be skewed toward an excess of low-frequency polymorphisms relative to the neutral spectrum.
Here we consider a simple scenario with a single biallelic site that has been under balancing selection for a long time.
Bitarello et al., 2018
Neutral mutations have accumulated both within and between allelic classes
Overall variation is higher
SNPs at intermediate frequency show a distinctive "bump" in the SFS.
A straightforward approach is to test for a difference between two summaries of the SFS.
θπ: pairwise nucleotide diversity.
θW: Watterson's θ, using total number of segregating sites
ϵ1=S1: the number of derived singletons in a sample.
Under the standard neutral model, all of these test statistics are expected to have a mean of 0.
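Tajima (1989) constructed the first test to detect a difference between the observed SFS and its neutral expectation, by comparing two estimators of θ.
His statistic, D, was defined as:
$$D = \frac{\theta_\pi - \theta_W}{\sqrt{\mathrm{Var}(\theta_\pi - \theta_W)}}$$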
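As a hedged illustration of how D can be computed from an SFS, the sketch below uses the variance constants from Tajima (1989). The function name `tajima_d` and the toy spectra are assumptions made for this example; the input is a vector of site counts S_1, ..., S_{n-1}.

```r
# Tajima's D from a site frequency spectrum (hypothetical helper, not from the slides).
# sfs[i] = number of segregating sites where one allele appears i times, i = 1..n-1
tajima_d <- function(sfs) {
  n <- length(sfs) + 1                  # sample size implied by the spectrum
  i <- seq_len(n - 1)
  S <- sum(sfs)                         # total number of segregating sites

  theta_pi <- sum(i * (n - i) * sfs) / choose(n, 2)   # mean pairwise differences
  a1 <- sum(1 / i)
  a2 <- sum(1 / i^2)
  theta_w <- S / a1                                    # Watterson's estimator

  # Constants for Var(theta_pi - theta_w), following Tajima (1989)
  b1 <- (n + 1) / (3 * (n - 1))
  b2 <- 2 * (n^2 + n + 3) / (9 * n * (n - 1))
  c1 <- b1 - 1 / a1
  c2 <- b2 - (n + 2) / (a1 * n) + a2 / a1^2
  e1 <- c1 / a1
  e2 <- c2 / (a1^2 + a2)

  (theta_pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
}

# Toy spectra for n = 10 chromosomes and 20 segregating sites
d_rare <- tajima_d(c(20, rep(0, 8)))             # every site is a singleton
d_mid  <- tajima_d(c(0, 0, 0, 0, 20, 0, 0, 0, 0)) # every site at frequency 5/10
c(all_singletons = d_rare, all_intermediate = d_mid)
```

The singleton-only spectrum gives a negative D and the intermediate-frequency spectrum a positive one, while a spectrum proportional to 1/i returns a value of essentially zero.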
Fu and Li (1993) created similar statistics. These are known as Fu and Li's D, F, D∗, and F∗.
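$$D = \frac{\theta_W - \epsilon_1}{\sqrt{\mathrm{Var}(\theta_W - \epsilon_1)}}$$
$$F = \frac{\theta_\pi - \epsilon_1}{\sqrt{\mathrm{Var}(\theta_\pi - \epsilon_1)}}$$
$$D^{*} = \frac{\theta_W - \eta_1}{\sqrt{\mathrm{Var}(\theta_W - \eta_1)}}$$
$$F^{*} = \frac{\theta_\pi - \eta_1}{\sqrt{\mathrm{Var}(\theta_\pi - \eta_1)}}$$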
Although originally designed to fit a normal distribution, none of these test statistics fits a parametric distribution very well.
Only the variable sites at each locus are needed.
The number of invariant sites does not figure into any of the calculations.
Tajima's D, Fu and Li's D,F,D∗,F∗:
$$D = \frac{\theta_\pi - \theta_W}{\sqrt{\mathrm{Var}(\theta_\pi - \theta_W)}}$$
After a sweep, all SNPs are at low frequency, so θπ will be much lower than expected, while statistics based on counts of segregating sites (like θW) will be much closer to their expected values.
Balancing selection leads to an excess of intermediate-frequency neutral variation surrounding a selected site.
In such cases, θπ will be greater than θW and the other statistics.
All negative when there has been a sweep
All positive when there is balancing selection
The statistics are usually considered significant when their values are greater than +2 or less than −2
The time window for positive selection is limited.
Power is also determined by the distance between the studied loci and the selected site.
The variable k is defined as the substitution rate of new alleles
We define d as the genetic distance between two orthologous sequences.
The contribution of the rate of substitution ( k ) to the expected amount of divergence ( d ) can be seen in the following equation:
$$E(d) = 2tk + \theta_{Anc}$$
Where k represents the allele substitution rate.
t is the time since the species split
θAnc: average amount of nucleotide variation expected between two sequences in the ancestor.
If divergence levels are assumed to be much greater than the expected levels of polymorphism in the ancestral species, this simplifies to
$$E(d) = 2tk$$
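A small worked example may help show when dropping the ancestral term is reasonable. All three parameter values below are hypothetical and chosen only for illustration.

```r
# Expected divergence with and without the ancestral polymorphism term.
# All values are hypothetical, chosen only to illustrate the arithmetic.
k         <- 1e-8     # substitutions per site per generation
t         <- 1e6      # generations since the two species split
theta_anc <- 0.001    # expected pairwise diversity in the ancestor

d_full   <- 2 * t * k + theta_anc   # E(d) = 2tk + theta_Anc
d_approx <- 2 * t * k               # E(d) = 2tk

c(full = d_full, approx = d_approx, ancestral_share = theta_anc / d_full)
```

With these values the ancestral term contributes only a few percent of the total; for very recently diverged species (small t) it would make up a much larger share and the simplification would be less appropriate.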
Two quantities determine the rate of substitution ( k ).
The probability of fixation of any mutation ( u ).
The total number of mutations that arise and can possibly be fixed.
If a mutation has no effect on fitness, the probability of fixing is equal to its current frequency.
New mutations always begin at frequency 1/(2N); therefore,
$$u_0 = \frac{1}{2N}$$
For new, advantageous mutations ( s>0 ) and large effective population sizes, the probability of fixation is
$$u_a \approx 2 s_a$$
according to Haldane 1927; Fisher 1930; Wright 1931.
For new, deleterious mutations ( s<0 ) that don't have large effects, the probability of fixation is (Kimura 1957):
$$u_d \approx \frac{2 s_d}{1 - e^{-4N s_d}}$$
Probability of fixation, relative to a neutral allele, of new, selected mutations:
$$\frac{u}{u_0} \approx \frac{2s}{1 - e^{-4N_e s}} \Big/ \frac{1}{2N_e} = \frac{4N_e s}{1 - e^{-4N_e s}}$$
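```r
ns <- seq(from = -1, to = 1, by = 0.01)
# Relative fixation probability u/u0; the value at ns = 0 is undefined (0/0)
# and is simply not drawn
plot(ns, 4 * ns / (1 - exp(-4 * ns)), xlab = "Ns", ylab = "u / u0")
abline(v = 0, lty = 2, lwd = 2)
```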
Nes=0, neutral mutations
Nes>0, slightly advantageous mutations are not that much more likely to fix than neutral mutations
Nes<0, slightly deleterious mutations have some probability of fixing
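To put numbers on these three cases, the sketch below evaluates the relative fixation probability at a few values of N_e s. The helper name `rel_fix` and the grid of values are illustrative assumptions; the neutral limit at N_e s = 0 is set to 1 by hand because the formula itself is 0/0 there.

```r
# Relative fixation probability u / u0 = 4*Ne*s / (1 - exp(-4*Ne*s)).
# rel_fix is a hypothetical helper name; ns stands for the product Ne * s.
rel_fix <- function(ns) {
  # -4*ns / expm1(-4*ns) equals 4*ns / (1 - exp(-4*ns));
  # the neutral case ns = 0 is handled explicitly
  ifelse(ns == 0, 1, -4 * ns / expm1(-4 * ns))
}

ns <- c(-5, -1, -0.1, 0, 0.1, 1, 5)
data.frame(Ne_s = ns, u_over_u0 = rel_fix(ns))
```

At N_e s = 1 the ratio is only a little above 4, and at N_e s = −1 it is small but clearly nonzero, which matches the statements above about slightly advantageous and slightly deleterious mutations.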
Two quantities determine the rate of substitution ( k ).
$$u_0 = \frac{1}{2N_e}, \qquad u_a \approx 2 s_a, \qquad u_d \approx \frac{2 s_d}{1 - e^{-4N_e s_d}}$$
If the probability of a mutation at a nucleotide in each generation is ν, then in a population of N diploid individuals, there will be 2Nν new mutations per generation at a single site.
Let f0 represent the fraction of these mutations that are neutral.
The remaining mutations will be advantageous (fraction fa) or deleterious (fraction fd).
If advantageous and deleterious mutations have no contribution, then the substitution rate is a function of only the total number of neutral mutations that arise and the probability that each of them fixes.
$$k = (2N\nu f_0)\,\frac{1}{2N} = \nu f_0$$
The rate of substitution for advantageous mutations:
$$k = (2N_e\nu f_a)\,2 s_a = 4N_e\nu f_a s_a$$
The rate of substitution for deleterious mutations:
$$k = (2N_e\nu f_d) \times \frac{2 s_d}{1 - e^{-4N_e s_d}} = \frac{4N_e\nu f_d s_d}{1 - e^{-4N_e s_d}}$$
The effective population size ( Ne ) plays an important role in the rate of substitution of selected mutations.
More advantageous mutations will fix in larger populations than in smaller populations.
More deleterious mutations will fix in smaller populations than in larger populations.
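The dependence on N_e can be checked directly from the three rate expressions. In the sketch below the function names and every parameter value (mutation rate, fractions, selection coefficients) are hypothetical, and the advantageous rate relies on the 2s approximation, which is least accurate at the smaller population size.

```r
# Substitution rates for the three mutation classes (hypothetical helper names).
k_neutral <- function(nu, f0) nu * f0
k_adv     <- function(Ne, nu, fa, sa) 4 * Ne * nu * fa * sa
k_del     <- function(Ne, nu, fd, sd) {
  4 * Ne * nu * fd * sd / (1 - exp(-4 * Ne * sd))
}

nu <- 1e-8                 # mutation rate per site per generation (illustrative)
Ne <- c(1e3, 1e5)          # a small and a large population

rates <- data.frame(
  Ne           = Ne,
  neutral      = k_neutral(nu, f0 = 0.20),
  advantageous = k_adv(Ne, nu, fa = 0.01, sa = 1e-3),
  deleterious  = k_del(Ne, nu, fd = 0.79, sd = -1e-4)
)
rates
```

The neutral rate is identical in both rows, the advantageous rate rises with N_e, and the deleterious rate collapses toward zero in the larger population.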
In coding regions, we measure the divergence that is due to nonsynonymous and synonymous changes.
dN is the number of nonsynonymous differences per nonsynonymous site
dS is the number of synonymous differences per synonymous site
Note that natural selection has a profound effect on the number of nonsynonymous mutations that are fixed.
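$$E(d_N) = 2tk = 2t\left(\nu f_0 + 4N_e\nu f_a s_a + \frac{4N_e\nu f_d s_d}{1 - e^{-4N_e s_d}}\right) = 2t\nu\left(f_0 + 4N_e f_a s_a + \frac{4N_e f_d s_d}{1 - e^{-4N_e s_d}}\right)$$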
The total nonsynonymous divergence in a region is due to all three types of mutations, therefore, our expression for dN includes all three terms.
$$E(d_N) = 2t\nu\left(f_0 + 4N_e f_a s_a + \frac{4N_e f_d s_d}{1 - e^{-4N_e s_d}}\right)$$
A higher underlying mutation rate, ν, and longer divergence times, t, will increase the amount of divergence
The proportion of advantageous mutations fixed will be a function of the frequency at which they arise and their average selective effect
The deleterious mutations can also contribute to divergence if selection is weak enough
Here we assume all synonymous changes are neutral.
The total expected amount of synonymous divergence between two sequences is:
$$E(d_S) = 2t\nu$$
For neutral mutations, the substitution rate is simply equal to the mutation rate.
Because both ν and t will be approximately the same for nonsynonymous and synonymous sites in the same gene, dividing the above equations gives
$$\frac{E(d_N)}{E(d_S)} = f_0 + 4N_e f_a s_a + \frac{4N_e f_d s_d}{1 - e^{-4N_e s_d}}$$
Relative to synonymous divergence, the level of nonsynonymous divergence is again due to the fractions of mutations that are neutral, advantageous, and deleterious.
Note that here, f0 represents only the nonsynonymous mutations.
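The ratio can be explored numerically by varying the mix of mutation classes. In this sketch the function name `dn_ds` and all parameter values are hypothetical; the fractions f0, fa, and fd are chosen to sum to 1 in each scenario.

```r
# Expected dN/dS from the fractions of neutral, advantageous, and deleterious
# nonsynonymous mutations (hypothetical helper name and parameter values).
dn_ds <- function(Ne, f0, fa, sa, fd, sd) {
  f0 + 4 * Ne * fa * sa + 4 * Ne * fd * sd / (1 - exp(-4 * Ne * sd))
}

Ne <- 1e4
# Scenario 1: no advantageous mutations, most mutations deleterious
r_purifying <- dn_ds(Ne, f0 = 0.20, fa = 0.00, sa = 1e-3, fd = 0.80, sd = -1e-3)
# Scenario 2: a small advantageous fraction hidden under purifying selection
r_mixed     <- dn_ds(Ne, f0 = 0.20, fa = 0.01, sa = 1e-3, fd = 0.79, sd = -1e-3)
# Scenario 3: enough advantageous mutations to push the ratio above 1
r_positive  <- dn_ds(Ne, f0 = 0.20, fa = 0.05, sa = 1e-3, fd = 0.75, sd = -1e-3)

c(purifying = r_purifying, mixed = r_mixed, positive = r_positive)
```

In the second scenario the ratio stays below 1 even though some substitutions are adaptive, and in the third it exceeds 1 even though three quarters of mutations are deleterious.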
dN/dS<<1 The vast majority of nonsynonymous mutations are deleterious, and negative (purifying) selection is predominant.
dN/dS<1 The majority of nonsynonymous mutations are deleterious, but there may be some unknown fraction of advantageous mutations.
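dN/dS=1 This situation can occur in two cases: either all nonsynonymous mutations are neutral (f0 = 1), or the excess of substitutions from advantageous mutations exactly offsets the deficit caused by deleterious ones.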
dN/dS>1 There are many advantageous nonsynonymous mutations and positive selection is predominant, but there are still many deleterious mutations.
Within a species, by analogy with the comparison of dN and dS, we can compare the average number of nonsynonymous differences per nonsynonymous site ( πN ) to the average number of synonymous differences per synonymous site ( πS ).
Since positive selection rapidly fixes advantageous mutations, these adaptive changes will rarely be found in studies of polymorphism
Instead, balancing selection can result in πN/πS>1
For example, heterozygote advantage (heterosis)
Therefore, dN/dS>1 is taken as strong evidence of positive selection
πN/πS>1 is a very strict criterion for detecting balancing selection.
Single sites under very strong selection will never contribute enough to values of πN to push πN/πS greater than 1.