Hands-on Workshop: Introduction to Bioinformatics

Daily Questions

Questions for discussion

Topic 2. What is one task for which you’d rather use an R script than a shell script? Why? What is one task for which you’d rather use a shell script than an R script? Why?

Topic 3 Q1. Compare the two .html files. What kinds of differences do you see in the files? Why do you think these differences are found (hint: think about the types of data you are analyzing)?

Topic 3 Q2. Try different filtering options for the GBS data (see http://prinseq.sourceforge.net/manual.html for options) and plot QC graphs. Discuss which options you would choose to implement if these were your data, and why.

Topic 4. What are two ways that could be used to evaluate which aligner is best?

Topic 5. You’re trying to create a very stringent set of SNPs for measuring population structure in a PCA. Based on the site information GATK produces, what filters would you use? Include the actual VCF abbreviations.

Topic 6/7. For a site that is invariant in both populations (i.e. a locus with no variation), what is Fst?

Topic 6/7. If you have a dataset of 100 samples and 100,000 SNPs, what is the maximum number of PC axes? PCs can also be called eigenvectors. HINT: Here’s an explanation of PCAs