Population structure and Fst

class: center, middle, inverse, title-slide

.title[
# Population structure and Fst
]
.author[
### Jinliang Yang
]
.date[
### Feb. 14, 2024
]

---

# Population subdivision

.pull-left[
- ### Selection
- ### Migration
- ### Mutation
- ### Drift (inbreeding)
]

.pull-right[
<div align="center">
<img src="pops.png" width=400>
</div>
]

- ### Cryptic population structure

Unknown (cryptic) population structure can appear as a __deficiency of heterozygotes__
- Just one more way to get reduced heterozygosity and increased homozygosity in your sample

---

# Population subdivision

| Genotype | `\(A_1A_1\)`   | `\(A_1A_2\)`    | `\(A_2A_2\)` 
| :-------: | : ------ : | :-------: | :-------: | 
| sub-pop1  | 64    | 32    | 4    | 
| sub-pop2  | 4     | 32    | 64    | 
| Total     | 68    | 64    | 68    |

The frequencies of the `\(A_1\)` allele in sub-pop1 and sub-pop2:
`\begin{align*}
p_1 & = \frac{2 \times 64 + 32}{2 \times 100} = 0.8 \\
p_2 & = \frac{2 \times 4 + 32}{2 \times 100} = 0.2 \\
\end{align*}`

The frequency of the `\(A_1\)` allele in the combined population:

`\begin{align*}
\bar{p} & = \frac{2 \times 68 + 64}{2 \times 200} = 0.5 \\
\end{align*}`

The expected value of heterozygotes for the combined population is:

`\begin{align*}
2\bar{p}\bar{q} \times n = 2 \times 0.5 \times 0.5 \times 200 = 100 \\ 
\end{align*}`

---

# Wahlund Effect

The _perceived deficiency_ of heterozygotes due to treating two different populations as one --- the __Wahlund effect__. 
> Wahlund, 1928

---

# Wahlund Effect

.pull-left[
<div align="center">
<img src="wahlund.png" height=400>
</div>
]

.pull-right[

- The blue dashed horizontal segment is the `\(H_{exp} = 2 \bar{p}\bar{q}\)`

- The red dashed horizontal segment is the `\(H_{obs}\)` 
  - mean value of the `\(H_1\)` and `\(H_2\)`

- Because the curve is concave downward, `\(H_{obs}\)` is always less than `\(H_{exp}\)`

]

Even if subpopulations are __partially isolated__, the Wahlund effect is held true.

---

# Heterozygosity to describe Populations

In a subdivided population, the overall deviation from HWE heterozygosity:

- Factors acting within subpopulations
- Simply due to the subdivision itself (the Wahlund effect)

### Wright's F-statistics

Assuming HWE, Wright’s F describes the deviation from expected heterozygosity

`\begin{align*}
F\text{-}stat = \frac{H_e - H_o}{H_e}
\end{align*}`

- Expected Heterozygosity ( `\(H_e\)` ) = estimate of diversity

- Observed Heterozygosity ( `\(H_o\)` )  = current characterization of the sample

- This measures the proportionate reduction in heterozygosity compared to HWE
  - If `\(H_o < H_e\)`, what does it mean and how can we use this?

---

# Heirarchical F-statistics

.pull-left[
<div align="center">
<img src="pops.png" width=300>
</div>
]

.pull-right[
<div align="center">
<img src="wahlund.png" height=280>
</div>
]

- `\(H_I\)` = mean observed heterozygosities over all subpopulations = `\(\frac{H_1 + H_2}{2}\)`

- `\(H_S\)` = mean expected heterozygosity within random mating subpopulations = `\(2p_iq_i\)`

- `\(H_T\)` = expected heterozygosity in a random mating total population = `\(2 \bar{p} \bar{q}\)`

---

# Heirarchical F-statistics

.pull-left[
<div align="center">
<img src="wahlund.png" height=280>
</div>
]

.pull-right[
- `\(H_I\)`, the average of the observed heterozygosities

- `\(H_S\)`, the average of the expected heterozygosities
]

The average deviation in heterozygosity within subpopulations is

`\begin{align*}
F_{IS} = \frac{H_S - H_I}{H_S}
\end{align*}`

- The subscript `\(IS\)` here indicates deviation among __individuals__ relative to their __subpopulation__

---

# Heirarchical F-statistics

.pull-left[
<div align="center">
<img src="wahlund.png" height=280>
</div>
]

.pull-right[
- `\(H_I\)`, the average of the observed heterozygosities

- `\(H_S\)`, the average of the expected heterozygosities

- `\(H_T\)`, the expected heterozygosity in the toal population
]

The deviation in heterozygosity due to subdivision alone is

`\begin{align*}
F_{ST} = \frac{H_T - H_S}{H_T}
\end{align*}`

- The subscript `\(ST\)` here indicates deviation among __subpopulations__ relative to the __total population__

---

# Heirarchical F-statistics

.pull-left[
<div align="center">
<img src="wahlund.png" height=280>
</div>
]

.pull-right[
- `\(H_I\)`, the average of the observed heterozygosities

- `\(H_S\)`, the average of the expected heterozygosities

- `\(H_T\)`, the expected heterozygosity in the toal population
]

Finally, the overall deviation in heterozygosity in the total pouplation is

`\begin{align*}
F_{IT} = \frac{H_T - H_I}{H_T}
\end{align*}`

- The subscript `\(IT\)` here indicates deviation among __individuals__ relative to the __total population__

---

# Summary of Heirarchical F-statistics

### `\(F_{ST}\)`

- Measures how different the subpopulations are.

- It can be interpreted as the proportion of the total heterozygosity that is due to differences in allele frequencies among subpopulations

- `\(F_{ST}\)` can be calculated from allele frequencies lone

### `\(F_{IS}\)` and `\(F_{IT}\)`

- `\(F_{IS}\)` is due to factors (selection, drift, mutation) acting within subpopulations

- `\(F_{IT}\)` depends on the values of `\(F_{ST}\)` and `\(F_{IS}\)`

- `\(F_{IS}\)` and `\(F_{IT}\)` require observed genotype frequencies

---

# An example for F-statistics calculation

Heterozygosity indicies (over individuals `\(H_{I}\)`, subpopulations `\(H_{S}\)`, and total population `\(H_{T}\)`)

- `\(H_I\)` based on __observed__ heterozygosities in __subpopulations__:

`\begin{align*}
H_I & = (H_{osb1} \times N_1 + H_{obs2} \times N_2) \times \frac{1}{N_{total}} \\
& = (0.32 \times 100 + 0.32 \times 100) \times \frac{1}{200} = 0.32 \\
\end{align*}`

---

# An example for F-statistics calculation

The frequencies of the `\(A_1\)` allele in sub-pop1, sub-pop2, and combined population:
`\begin{align*}
p_1  = 0.8, p_2 = 0.2, \bar{p} = 0.5 \\
\end{align*}`

- `\(H_S\)` based on __expected__ heterozygosities in __subpopulations__:

`\begin{align*}
H_S & = (H_{exp1} \times N_1 + H_{exp2} \times N_2) \times \frac{1}{N_{total}} \\
& = (2 \times 0.8 \times 0.2 \times 100 + 2 \times 0.8 \times 0.2 \times 100) \times \frac{1}{200} = 0.32 \\
\end{align*}`

- `\(H_T\)` based on __expected__ heterozygosities in __total populations__:

`\begin{align*}
H_T & = 2 \bar{p} \bar{q} \\
& = 2 \times 0.5 \times 0.5 = 0.5 \\
\end{align*}`

---

# An example for F-statistics calculation

`\begin{align*}
H_I & = 0.32 \\
H_S & = 0.32 \\
H_T & = 0.5 \\
\end{align*}`

--------------

`\begin{align*}
H_{IS} & = \frac{H_S - H_I}{H_S}  = \frac{0.32 - 0.32}{0.32} =0 \\
\end{align*}`

`\begin{align*}
H_{ST} & = \frac{H_T - H_S}{H_T}  = \frac{0.5 - 0.32}{0.5} = 0.36 \\
\end{align*}`

`\begin{align*}
H_{IT} & = \frac{H_T - H_I}{H_T}  = \frac{0.5 - 0.32}{0.5} = 0.36 \\
\end{align*}`

- There is no evidence of inbreeding within each subpopulations

- But clear evidence of population subdivision

- And population subdivision accounts for all the genetic variation.

---
# More on the Hierarchical F-statistics

### `\(F_{IS}\)` - Inbreeding index

- Reduction in heterozygosity of an individual due to non-random mating in a subpopulation

### `\(F_{ST}\)`  - Fixation index

- Reduction in heterozygosity of a subpopulation relative to the total population due to drift;

- Measure of genetic differentiation

---
# More on the Hierarchical F-statistics

### `\(F_{IT}\)` - Overall fixation index

- Mean reduction in heterozygosity of an individual relative to the total population

- Combined effects of non-random mating ( `\(F_{IS}\)` ) with drift ( `\(F_{ST}\)` )

### Relationship of `\(F_{IS}\)`, `\(F_{ST}\)`, and `\(F_{IT}\)` 
However, it is NOT simple additive relationship

They are related by the expression
`\begin{align*}
(1-F_{IT}) = (1-F_{IS})(1-F_{ST})
\end{align*}`

It can be rewritten as

`\begin{align*}
F_{IT} = F_{IS} + F_{ST} - F_{IS}F_{ST}
\end{align*}`

---

# Of course there are issues...

`\(F_{ST}\)` must be adjusted to account for the numbers of individuals in each sub-population (sampling error)

- Unbiased estimators, such as `\(\hat{F_{ST}}\)`
  > Nei and Chesser, 1983

CANNOT compare `\(F_{ST}\)` between studies or samples if they are not based upon the exact same loci!!

- Like `\(D'\)` you can express `\(F_{ST}\)` as `\(F_{ST}'\)`
> Hedrick, 2005

- Jost’s D uses number of effective alleles instead of heterozygosity
> Jost, 2008