class: center, middle, inverse, title-slide

# Linked Selection
### Jinliang Yang
### Feb. 22nd, 2022

---

# Linked selection

SNPs themselves have no effect on fitness ( `\(s=0\)` ) but are affected by selection occurring nearby.

--

## The effect of direct selection on linked loci

- It can either __raise__ or __lower__ the linked neutral diversity
- Understanding whether levels of neutral variation go up or down helps build intuition about the mode of selection

--

- Positive selection
- Negative selection
- Balancing selection

---

## Positive selection on linked neutral variation

Positive selection will lower levels of diversity in the nearby region! => __selective sweep__

--

### Phases of selective sweep:

.pull-left[
<div align="center">
<img src="fig8.1a.png" height=100>
</div>
]

.pull-right[
An advantageous mutation arises (shown as a star)
]

---

## Positive selection on linked neutral variation

Positive selection will lower levels of diversity in the nearby region! => __selective sweep__

### Phases of selective sweep:

.pull-left[
<div align="center">
<img src="fig8.1b.png" height=110>
</div>
]

.pull-right[
The advantageous allele rises in frequency
]

<br>

- During this process, variation that is completely linked to the other haplotypes is swept aside: the __hitchhiking__ effect.
- The hitchhikers (the neutral polymorphisms that happen to reside on the lucky haplotype) will also rise in frequency.

---

## Positive selection on linked neutral variation

Positive selection will lower levels of diversity in the nearby region! => __selective sweep__

### Phases of selective sweep:

.pull-left[
<div align="center">
<img src="fig8.1c.png" height=120>
</div>
]

.pull-right[
The advantageous allele has fixed.
]

<br>

- After complete fixation, almost no variation remains in the region.
- The longer a mutation takes to fix, the more time there is for associated polymorphisms to accumulate.
---

# Factors affecting the reduction in diversity

<div align="center">
<img src="fig8.1.png" height=120>
</div>

- The strength of selection ( `\(s\)` ) or fitness effect
- The rate of recombination ( `\(c\)` )
- The time ( `\(T\)` ) since the sweep ended

--

<div align="center">
<img src="fig8.2.jpg" height=200>
</div>

Selective sweeps generate __a valley in levels of polymorphism__ surrounding the location of the advantageous allele.

---

# Factors affecting the reduction in diversity

- The strength of selection ( `\(s\)` ) or fitness effect
- The time ( `\(T\)` ) since the sweep ended

`\begin{align*}
T_{fix} \approx \frac{2 \ln{(2N_e)}}{s}
\end{align*}`

- The strength of selection, together with the population size, determines how quickly the advantageous allele fixes. (Nei 1973)
- The more quickly it fixes, the lower the level of diversity and the deeper the valley.

--

.pull-left[
<div align="center">
<img src="fig8.2.jpg" height=200>
</div>

(Macpherson et al. 2007)
]

.pull-right[
Sweeps 1 and 2 had similar selective advantages, but sweep 2 ended longer ago than sweep 1.
]

---

# Factors affecting the reduction in diversity

- The strength of selection ( `\(s\)` ) or fitness effect
- The rate of recombination ( `\(c\)` )
    - Recombination allows nearby neutral SNPs to escape the hitchhiking effect.
    - It __negatively affects the width__ of the swept region.

--

The size of the region will be determined by (Barton 1998):

`\begin{align*}
s/c
\end{align*}`

--

.pull-left[
<div align="center">
<img src="fig8.2.jpg" height=200>
</div>

(Macpherson et al. 2007)
]

.pull-right[
<br>

- Sweeps 1 and 3 ended at the same time
- but sweep 3 had a smaller `\(s\)`.
]

---

# Types of sweeps

Not all sweeps have _completely fixed_ the advantageous allele.

<div align="center">
<img src="fig8.1b.png" height=110>
</div>

### Partial sweeps

A __partial sweep__ is observed when the advantageous allele is sampled on its way to fixation,

- or when a change in conditions has altered the selection coefficient associated with the relevant allele.
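---

# Worked example: fixation time and sweep scale

The approximation `\(T_{fix} \approx 2\ln(2N_e)/s\)` and the ratio `\(s/c\)` from the preceding slides can be evaluated numerically. The sketch below is only an illustration: the function names, the parameter values, and the choice to express `\(c\)` per base pair per generation are assumptions, not values from the cited studies.

```python
import math


def fixation_time(n_e: float, s: float) -> float:
    """Approximate generations to fixation: 2 * ln(2 * N_e) / s."""
    if n_e <= 0 or s <= 0:
        raise ValueError("n_e and s must be positive")
    return 2.0 * math.log(2.0 * n_e) / s


def sweep_scale(s: float, c: float) -> float:
    """Ratio s / c, which sets the scale of the swept region.

    c is assumed here to be a recombination rate per base pair per
    generation; the result is a scale, not an exact width.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    return s / c


if __name__ == "__main__":
    # Hypothetical values: N_e = 10,000 and c = 1e-8 per bp per generation.
    for s in (0.001, 0.01, 0.1):
        print(s, round(fixation_time(10_000, s)), sweep_scale(s, 1e-8))
```

Under this approximation, a tenfold increase in `\(s\)` shortens the fixation time tenfold and widens the scale `\(s/c\)` tenfold, which matches the intuition of a deeper and wider valley for stronger sweeps.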
---

# Types of sweeps

Not all sweeps are the result of _a single advantageous mutation_ arising once and fixing.

<div align="center">
<img src="soft.png" height=110>
</div>

### Hard and soft sweeps

The scenario in which the adaptive allele has __only a single origin__ is called a __hard sweep__.

- A hard sweep occurs when a single new mutation arises and is immediately favored by selection
- It drags along a single haplotype

---

# Types of sweeps

Not all sweeps are the result of _a single advantageous mutation_ arising once and fixing.

<div align="center">
<img src="soft.png" height=110>
</div>

### Hard and soft sweeps

__Soft sweeps__ occur when multiple copies of the advantageous allele are fixed.

- Mutations to the same advantageous state occur on different backgrounds (multiple-origins soft sweep).
- Selection acts on __standing variation__ that was previously neutral or deleterious (single-origin soft sweep)
- Less signal than with a hard sweep

---

# Balancing selection on linked sites

<div align="center">
<img src="bs.png" height=150>
</div>

Balancing selection acts to maintain two or more alleles at intermediate frequency in a population.

- Balancing selection acts to counter the effect of drift
- It __increases__ levels of linked neutral variation.

---

# Balancing selection on linked sites

<div align="center">
<img src="adh.png" height=300>
</div>

- The _Adh_ gene in _D. melanogaster_ showed a signature of __long-term balancing selection__ (Kreitman and Hudson 1991)

---

# Negative selection on linked sites

<div align="center">
<img src="bgs.png" height=100>
</div>

Selection against deleterious alleles can also reduce linked neutral variation, a process called __background selection__.

- Deleterious variants are removed from the population.
- Linked neutral polymorphisms are removed along with them.
### Genomic signature

__Decreased levels of genetic variation__ relative to predictions of neutral evolution (Charlesworth et al., 1993)

---

# Modes of selection on linked sites

<div align="center">
<img src="sigs.png" height=450>
</div>

> Cutter and Payseur, 2013

---

# Detecting selection using the SFS

## The effects of positive selection

.pull-left[
<div align="center">
<img src="sfs1.png" height=300>
</div>

> Hahn, 2020
]

.pull-right[
- After the sweep ends, new mutations start to accumulate.
- These new mutations are by definition __singletons__
- Each derived allele is carried by only one sequence in the sample.
]

The SFS is therefore skewed toward an excess of low-frequency polymorphisms relative to the neutral spectrum.

---

# Detecting selection using the SFS

## The effects of balancing selection

Here we consider a simple scenario with a single biallelic site that has been under balancing selection for a long time.

- Variation within each allelic class has been able to __build up__ and __reach equilibrium__

.pull-left[
<div align="center">
<img src="sfs3.png" height=200>
</div>

> Bitarello et al., 2018
]

.pull-right[
- Neutral mutations have accumulated both within and between allelic classes
- Overall variation is higher
- SNPs at intermediate frequency show __a distinctive "bump"__ in the SFS.
]

---

# Detecting selection using SFS

A straightforward way would be to test for a difference between two SFSs.

- However, linkage among sites means that __SNPs at a locus are not independent__, which violates the assumptions made by almost all such tests.

--

### Instead, we use estimators of `\(\theta\)` to detect deviations.

- `\(\pi\)`: pairwise nucleotide diversity.

--

- `\(\theta_W\)`: Watterson's `\(\theta\)`, using the total number of segregating sites

--

- `\(\epsilon_1 = S_1\)`: the number of derived singletons in a sample.
- `\(\eta_1\)`: based on all singletons in a sample.

--

Under the standard neutral model, all of these estimators have the same expectation, so test statistics built from their differences are expected to have a mean of approximately 0.
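---

# Worked example: estimators of `\(\theta\)` from the SFS

The estimators listed on the previous slide can all be computed from an unfolded SFS. The sketch below is a minimal illustration, assuming the SFS is given as a list whose entry `i - 1` counts the sites where the derived allele is carried by `i` of the `n` sampled sequences. The function names and the toy spectrum are hypothetical, and the rescaling of `\(\eta_1\)` by `\((n-1)/n\)` is the standard neutral expectation rather than something stated in these slides.

```python
from math import comb


def theta_pi(sfs, n):
    """Pairwise nucleotide diversity: mean number of pairwise differences."""
    return sum(i * (n - i) * k for i, k in enumerate(sfs, start=1)) / comb(n, 2)


def theta_w(sfs, n):
    """Watterson's theta: segregating sites divided by a_1 = sum_{i<n} 1/i."""
    a1 = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a1


def theta_derived_singletons(sfs):
    """Estimator based on derived singletons (epsilon_1 = S_1 in the slides)."""
    return float(sfs[0])


def theta_all_singletons(sfs, n):
    """Estimator based on all singletons (eta_1), rescaled by (n - 1) / n."""
    if n < 3:
        raise ValueError("n must be at least 3")
    eta1 = sfs[0] + sfs[n - 2]  # derived singletons plus ancestral singletons
    return eta1 * (n - 1) / n


if __name__ == "__main__":
    # Toy spectrum with counts proportional to 1/i, the neutral expectation.
    n, sfs = 5, [12, 6, 4, 3]
    print(theta_pi(sfs, n), theta_w(sfs, n),
          theta_derived_singletons(sfs), theta_all_singletons(sfs, n))
```

For a spectrum shaped like the neutral expectation, all four estimators agree; an excess of singletons pulls `\(\pi\)` below the others, while an excess of intermediate-frequency SNPs pushes it above them.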
---

# Tajima's D and related tests

Tajima (1989) constructed the first test to detect a difference between estimators of `\(\theta\)` based on the SFS. His statistic, `\(D\)`, was defined as:

`\begin{align*}
D = \frac{\pi - \theta_W}{\sqrt{Var(\pi - \theta_W)}}
\end{align*}`

--

Fu and Li (1993) created similar statistics. These are known as Fu and Li's `\(D\)`, `\(F\)`, `\(D^*\)`, and `\(F^*\)`.

`\begin{align*}
D = \frac{\theta_W - \epsilon_1}{\sqrt{Var(\theta_W - \epsilon_1)}}
\end{align*}`

`\begin{align*}
F = \frac{\pi - \epsilon_1}{\sqrt{Var(\pi - \epsilon_1)}}
\end{align*}`

`\begin{align*}
D^* = \frac{\theta_W - \eta_1}{\sqrt{Var(\theta_W - \eta_1)}}
\end{align*}`

`\begin{align*}
F^* = \frac{\pi - \eta_1}{\sqrt{Var(\pi - \eta_1)}}
\end{align*}`

---

# Tajima's D and related tests

Tajima (1989) constructed the first test to detect a difference between estimators of `\(\theta\)` based on the SFS. His statistic, `\(D\)`, was defined as:

`\begin{align*}
D = \frac{\pi - \theta_W}{\sqrt{Var(\pi - \theta_W)}}
\end{align*}`

Although originally designed to fit a normal distribution, none of these test statistics fits a parametric distribution very well.

### Calculation

- Only variable sites at each locus are needed
- The number of invariant sites does not figure into any calculations.

---

# Interpreting values of the test statistics

Tajima's `\(D\)`, Fu and Li's `\(D, F, D^*, F^*\)`:

`\begin{align*}
D = \frac{\pi - \theta_W}{\sqrt{Var(\pi - \theta_W)}}
\end{align*}`

After a sweep, all SNPs are at low frequency, so `\(\pi\)` will be much lower than expected, while statistics based on counts of segregating sites (like `\(\theta_W\)`) will be much closer to their expected values.

--

------

- All __negative__ when there has been a sweep

---

# Interpreting values of the test statistics

Tajima's `\(D\)`, Fu and Li's `\(D, F, D^*, F^*\)`:

`\begin{align*}
D = \frac{\pi - \theta_W}{\sqrt{Var(\pi - \theta_W)}}
\end{align*}`

Balancing selection leads to an excess of intermediate-frequency neutral variation surrounding a selected site.
In such a case, `\(\pi\)` will be greater than `\(\theta_W\)` and the other statistics.

--

------

- All __negative__ when there has been a sweep
- All __positive__ when there is balancing selection

--

- Values are usually __significant__ when `\(> +2\)` or `\(< -2\)`
- The exact thresholds depend on sample size, number of SNPs, etc.

---

# The power of the SFS

The time window for detecting positive selection is limited.

- Too early during the sweep
    - The signal will not be strong enough

--

- Too late after the sweep
    - Both levels and frequencies of variants will have returned to normal

--

Power is also determined by the distance between the studied loci and the location of the selected site.

- Because of the effect of recombination.
- Move far enough away and there will be no signal of selection at all.
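---

# Worked example: computing Tajima's `\(D\)`

The sign rules above can be checked on toy spectra. The sketch below computes Tajima's `\(D\)` from an unfolded SFS using the variance constants published in Tajima (1989); the function name, the input layout (entry `i - 1` counts sites with `i` derived copies among `n` sequences), and the toy spectra are assumptions made for illustration.

```python
from math import comb, sqrt


def tajimas_d(sfs, n):
    """Tajima's D from an unfolded SFS with n - 1 frequency classes."""
    if n < 4:
        # For n = 2 or 3 the variance term below is zero.
        raise ValueError("n must be at least 4")
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    seg = sum(sfs)  # number of segregating sites, S
    if seg == 0:
        raise ValueError("no segregating sites")

    # The two estimators of theta that D compares.
    pi = sum(i * (n - i) * k for i, k in enumerate(sfs, start=1)) / comb(n, 2)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = seg / a1

    # Constants for the estimated variance of (pi - theta_W).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - theta_w) / sqrt(e1 * seg + e2 * seg * (seg - 1))


if __name__ == "__main__":
    print(tajimas_d([12, 6, 4, 3], 5))   # neutral-shaped spectrum
    print(tajimas_d([10, 0, 0, 0], 5))   # only singletons, as after a sweep
    print(tajimas_d([0, 5, 5, 0], 5))    # only intermediate frequencies
```

Only segregating sites enter the calculation, consistent with the note that invariant sites are not needed. In practice, significance is better judged against simulations than against a fixed threshold, because the thresholds depend on sample size and the number of SNPs.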