class: center, middle, inverse, title-slide # Wright-Fisher Simulation ### Jinliang Yang ### Jan. 27th, 2022 --- # Syncing a fork (from the web UI) 1. Click the __Fork__ button for the Git Repo `https://github.com/jyanglab/2022-agro932-lab` 2. And then clone to your own system `git clone git@github.com:YOURID/2022-agro932-lab.git` -- ### If you have __Forked__ it before: 1. On GitHub, navigate to the main page of the forked repository that you want to sync with the upstream repository. 2. Select the __Fetch upstream__ drop-down. 3. Review the details about the commits from the upstream repository, then click __Fetch and merge__. --- # Unix Commands - __`cd`__: change the working directory - __`mkdir`__: make directories - __`pwd`__: print name of current working directory - __`ls`__: list directory contents - __`chmod`__: change the access permissions to files and directories - __`head`__: output the first part of files - __`tail`__: output the last part of files - __`more`__ and __`less`__: display contents of large files page by page or scroll line by line up and down - __`wc`__: print line, word, and byte counts for each each file - __`grep`__: print lines matching a pattern - __`|`__: pipe, i.e., __`ls -la | head -n 100 >> new_file`__ --- # Unix Commands: how to get help If you don’t know how to use a command, i.e., you don’t know about its parameters and return type etc, then make use of __`–-help`__ command. ```bash less --help ``` -- ### How to use help command Simply type __`help`__, which is used for listing all possible commands that are pre-installed. ```bash help help cd ``` --- # Unix Commands: how to get help ### The __`man`__ command or man pages __`man`__ is used when you want to get a detailed manual of a command. ```bash man mkdir man bash ``` --- # Getting to know slurm [Slurm](https://slurm.schedmd.com/overview.html) is job managing system: you submit jobs via batch scripts. ```bash sinfo # view information about Slurm nodes and partitions. sinfo --help sinfo | head -n 10 ``` - Here we see our __`PARTITION`__ and __`TIME LIMIT`__, and all of their inferior but still useful friends. - Note that there is a column of __`STATE`__, which indicates the state of the machine. --- # Slurm Commands A better way of looking at what’s going on on each machine is with __`squeue`__, which is the job queue. ```bash squeue --help # view information about jobs located in the Slurm scheduling queue. squeue squeue | wc -l # how many jobs are running squeue | grep "agro932" ``` - This shows each __job ID__ (very important), partition the job is running on, name of person running the job. - Also note __`TIME`__ which is how long a job has been running. -- We can use __`squeue`__ to find the job ID, allowing us to cancel a job with __`scancel`__. Let’s kill a job: ```bash squeue | grep "USERID" scancel JOBID ``` --- # Warnings!!! __Do not run anything on the headnode__ - Except cluster management tools (squeue, sbatch, etc), compilation tasks (but usually ask CSE help for big apps), or downloading files. - If you run anything on the headnode, you will disgrace your lab. Instead, you can get a `bash` terminal using `srun`: ```bash srun --licenses=common --nodes=1 --mem 1G --ntasks=1 --time=1:00:00 --pty bash ### get a quick job srun --licenses=common --qos=short --nodes=1 --ntasks=1 --mem 1G --time 1:00:00 --pty bash ``` -- __Monitor your disk space__, as it can fill up quickly. - You will see the disk usage every time you login ```bash hcc-du ``` --- # Conduct the Wright-Fisher Simulation ### Clone your GitHub Repo ```bash cd $COMMON mkdir courses # clone the github repo git clone git@github.com:YOURID/2022-agro932-lab.git # if you already have it, then sync it git pull ``` -- ### Load R Console ```bash module load R R ``` --- # Wright-Fisher Simulation - Consider a single locus with two alleles, `\(A_1\)` and `\(A_2\)` - In generation `\(t\)` there are `\(x\)` individuals carrying allele `\(A_1\)`, which is at frequency `\(p_t = x/2N\)`. - This implies that there are `\(2N - x\)` individuals carrying allele `\(A_2\)`, which is at frequency `\(q_t = 1- p_t= 1- x/2N\)`. - The sampling of alleles for the next generations is equivalent to sampling from a binomial distribution with parameters size = `\(2N\)` and prob= `\(x/2N\)`. - Therefore, the mean and variance of `\(p\)` in the next generation for the Wright-Fisher model are: `\begin{align*} E(p_{t+1}) &= p_t \\ Var(p_{t+1}) &= p_tq_t/2N \end{align*}` --- # Wright-Fisher Simulation - Consider a single locus with two alleles, `\(A_1\)` and `\(A_2\)` ```r # create two scalars A1 <- 100 # number of A1 allele (the minor allele) in the first generation N <- 1000 # number of diploid individuals in the population ``` -- - In generation `\(t\)` there are `\(x\)` individuals carrying allele `\(A_1\)`, which is at frequency `\(p_t = x/2N\)`. ```r # create allele frequency p <- A1/(2*N) ``` -- - This implies that there are `\(2N - x\)` individuals carrying allele `\(A_2\)`, which is at frequency `\(q_t = 1- p_t= 1- x/2N\)`. ```r # create A2 allele frequency q <- 1 -p ``` --- # Wright-Fisher Simulation - The sampling of alleles for the next generations is equivalent to sampling from a binomial distribution with parameters size = `\(2N\)` and prob= `\(x/2N\)`. ```r # sample from a binomial distribution with parameters size =2N and prob=p A1 <- rbinom(1, 2*N, p) p <- A1/(2*N) ``` --- # Wright-Fisher Simulation ### Using [R looping](https://www.statmethods.net/) - Consider a single locus with two alleles, `\(A_1\)` and `\(A_2\)` ```r # create two scalars A1 <- 100 # number of A1 allele (the minor allele) in the first generation N <- 1000 # number of diploid individuals in the population p <- A1/(2*N) ``` -- - In generation `\(t\)` there are `\(x\)` individuals carrying allele `\(A_1\)`, which is at frequency `\(p_t = x/2N\)`. ```r ### make a numeric vector to hold the results freq <- as.numeric(); t=1000 ### Use for loop to run over t generations for (i in 1:t){ A1 <- rbinom(1, 2*N, p) p <- A1/(2*N) freq[i] <- p } ``` --- # Wright-Fisher Simulation ### Visualize the results ```r #pdf("graphs/sim1.pdf", widht=5, height=5) plot(freq, type="l", ylim=c(0,1), col="red", xlab="time", ylab=expression(p(A[1])), main="Wright-Fisher Simulation") ``` ![](data:image/png;base64,#w2lab_files/figure-html/unnamed-chunk-17-1.png)<!-- --> ```r #dev.off() ``` --- # Wright-Fisher Simulation ### Create [R function](https://www.statmethods.net/) ```r wright_fisher <- function(N=1000, A1=100, t=1000){ p <- A1/(2*N) ### make a numeric vector to hold the results freq <- as.numeric(); t=1000 ### Use for loop to run over t generations for (i in 1:t){ A1 <- rbinom(1, 2*N, p) p <- A1/(2*N) freq[i] <- p } return(freq) } ``` ### Use the function ```r frq <- wright_fisher(N=1000, A1=100, t=1000) ``` --- # Wright-Fisher Simulation Then apply the `wright_fisher` R function to conduct __100__ simulations ```r set.seed(1234) frq <- wright_fisher(N=1000, A1=100, t=1000) plot(frq, type="l", ylim=c(0,1), col=3, xlab="Generations",ylab=expression(p(A[1]))) for(u in 1:100){ frq <- wright_fisher(N=1000, A1=100, t=1000) random <- sample(1:1000,1,replace=F) randomcolor <- colors()[random] lines(frq,type="l",col=(randomcolor)) } ``` ![](data:image/png;base64,#w2lab_files/figure-html/unnamed-chunk-20-1.png)<!-- --> --- # Wright-Fisher Simulation Then apply the `wright_fisher` R function to conduct __1,000__ simulations using `crane`. ```r set.seed(1234) frq <- wright_fisher(N=1000, A1=100, t=1000) pdf("graphs/sim1000.pdf", width=8, height=8) plot(frq, type="l", ylim=c(0,1), col=3, xlab="Generations", ylab=expression(p(A[1]))) for(u in 1:1000){ frq <- wright_fisher(N=1000, A1=100, t=1000) random <- sample(1:1000,1,replace=F) randomcolor <- colors()[random] lines(frq,type="l",col=(randomcolor)) } dev.off() ``` -- Conduct version control and then get the results ```bash git add --all git commit -m "sim 1000 fig" git push ```