R for Biologists Workshop
The goal of the workshop is to help biologists get acquainted with R, which will, in turn, help them with their analysis. The workshop includes three sessions designed to span three weeks. There are no prerequisites for the workshop in terms of skills, but some familiarity with Unix is helpful. The workshop uses RStudio, provided through a preconfigured virtual machine hosted on Jetstream. You can also do the activities on your home computer if you install R yourself.
Getting Started on Jetstream
If you would like to use the Jetstream instance set up for this workshop, simply do an image search for "NCGAS" and click the RStudio image that appears. Once the image is up and running, copy its IP address and add :8787 to the end. For example, if the IP given for my image is 22.214.171.124, the link to RStudio would be 22.214.171.124:8787. The username for all images is guest_user and the password is learningR. If you have trouble logging in, try rebooting the image.
Direct link to Jetstream image
Materials for the workshop are available in the zip file below. It contains the full textbook, individual notes files for each chapter, and the files required to complete the activities.
R Materials Zip File
The goal of this section is to get you acquainted with R, both the environment and the language. We’ll discuss data type manipulation, the structure of commands, how to get help and more information, how to load packages, and how to use the environment. The hope is to make using R more intuitive – the instructors will not be going through a specific analysis or demo. They will focus on reading and making sense of the language (this is very helpful for new users or anyone currently copying, pasting, and hoping). This day will cover Chapters 1 and 2 of the materials provided.
Requirements: There are no requirements for this section. Basic Unix (how variables work, cat, pwd, etc.) is helpful, as there will be references to similar bash commands, but we won’t be using the command line at all.
Lab: There will be a self-guided activity to practice your skills after the initial workshop material. This will give you practice using R and working with sequence data/vectors and give you time to ask us questions.
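As a rough illustration of the kind of vector work this day involves, here is a minimal sketch in base R. The sequence is made-up example data and the variable names are only illustrative; it is not taken from the workshop materials.

```r
# A short DNA sequence stored as a single character string (made-up example data)
dna <- "ATGCGCGATTACGCTAGC"

# Split the string into a character vector with one base per element
bases <- strsplit(dna, split = "")[[1]]

class(bases)   # the data type of the vector
length(bases)  # the number of bases

# A logical vector marking G or C positions; the mean of a logical
# vector is the proportion of TRUE values, here the GC content
is_gc <- bases %in% c("G", "C")
gc_content <- mean(is_gc)

# table() counts how often each base occurs
base_counts <- table(bases)

# Getting help on a function: ?strsplit or help("strsplit")
# Loading an installed package: library(stats)
```

Running each line on its own in the RStudio console and inspecting the result with class() or str() is a good way to practice reading what a command returns.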
Day 2: Introduction to Visualization
We will build on the basic data types and syntax of R to explore visualization of geographical data. The two main families of plotting will be introduced (plot style and ggplot style) with examples of how to plot various types of data on geographical maps. This is a useful skill for ecologists and geneticists alike. This day will cover Chapters 3 and 4 of the materials provided.
Requirements: This is a lab based on the material covered in Day 1, so familiarity with that material is very, very useful (Day 1 material will be available online).
Lab: After walking through geographical mapping together, a self-guided activity will extend the same plotting syntax types to a different kind of data: ordination plots (PCA, PCoA, nMDS) for exploring various data you may have. Microbiome, ecological, and population genetics data are common examples.
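To give a feel for an ordination plot in both plotting families, here is a minimal sketch. It assumes R's built-in iris data as a stand-in for your own table and uses prcomp() for the PCA; the ggplot2 part runs only if that package happens to be installed. None of this is taken from the workshop materials.

```r
# Built-in example data: four measurements on 150 flowers (a stand-in for your own table)
measurements <- iris[, 1:4]
groups <- iris$Species

# PCA on centered, unit-variance columns
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])
scores$group <- groups

# Percent of variance explained by each component, used in the axis labels
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

pdf(NULL)  # discard graphical output; remove this line to see the plots in RStudio

# "plot style": base graphics, one call to draw and one to add a legend
plot(scores$PC1, scores$PC2, col = as.integer(scores$group), pch = 19,
     xlab = paste0("PC1 (", pct[1], "%)"),
     ylab = paste0("PC2 (", pct[2], "%)"))
legend("topright", legend = levels(groups),
       col = seq_along(levels(groups)), pch = 19)

# "ggplot style": build the plot from layers (skipped if ggplot2 is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(scores, ggplot2::aes(x = PC1, y = PC2, colour = group)) +
    ggplot2::geom_point()
  print(p)
}

invisible(dev.off())
```

The same scores table feeds both styles, which makes it easy to compare how each family expresses colour, labels, and legends.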
Day 3: Making your own scripts and functions
The goal of this section is to go into more depth on how to read, understand, and troubleshoot R code by introducing classes and functions. Classes and functions are a large part of R, and therefore a large part of understanding the syntax and behavior of the language. We will walk through creating your own function for summarizing tables of data (both ecological and genetic data sets are available for use). This day will cover Chapters 5 and 6 of the materials provided.
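A function for summarizing a table might look something like the sketch below. The name summarize_table and the choice of summary statistics are hypothetical, and the built-in iris data stands in for the workshop's ecological or genetic tables.

```r
# Hypothetical helper: summarize every numeric column of a data frame
summarize_table <- function(df) {
  stopifnot(is.data.frame(df))
  # Keep only the columns whose values are numeric
  numeric_cols <- df[sapply(df, is.numeric)]
  data.frame(
    column    = names(numeric_cols),
    mean      = sapply(numeric_cols, mean, na.rm = TRUE),
    sd        = sapply(numeric_cols, sd, na.rm = TRUE),
    n_missing = sapply(numeric_cols, function(x) sum(is.na(x))),
    row.names = NULL
  )
}

class(iris)          # the class of the whole table
sapply(iris, class)  # the class of each column

result <- summarize_table(iris)
result
```

Checking classes first explains why the Species column is dropped: it is a factor, so is.numeric() returns FALSE for it.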
Requirements: This material assumes basic usage of R covered in the previous two days, or a moderate familiarity with R basics.
Lab: A self-guided walkthrough building on Day 1's Lab, where you will create a function to graph a sliding window plot for GC content. This activity is meant to practice building functions, but this particular example can easily be applied to visualize the variation across any continuous data, such as ecological measures through time or population variation over a genome.
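One possible shape for such a function is sketched below. The name gc_sliding_window, its window and step arguments, and the example sequence are all assumptions made for illustration; the lab's own solution may differ.

```r
# Hypothetical helper: GC fraction in windows of `window` bases,
# moving `step` bases between consecutive windows
gc_sliding_window <- function(sequence, window = 10, step = 1) {
  bases <- strsplit(toupper(sequence), split = "")[[1]]
  stopifnot(window >= 1, step >= 1, length(bases) >= window)
  # Start position of every window that fits entirely inside the sequence
  starts <- seq(1, length(bases) - window + 1, by = step)
  gc <- sapply(starts, function(s) {
    mean(bases[s:(s + window - 1)] %in% c("G", "C"))
  })
  data.frame(start = starts, gc = gc)
}

# Made-up sequence: an AT-rich stretch followed by a GC-rich stretch
example_seq <- paste0(strrep("AT", 10), strrep("GC", 10))
windows <- gc_sliding_window(example_seq, window = 10, step = 5)

pdf(NULL)  # discard graphical output; remove this line to see the plot in RStudio
plot(windows$start, windows$gc, type = "l", ylim = c(0, 1),
     xlab = "Window start position", ylab = "GC fraction")
invisible(dev.off())
```

Replacing the %in% test with any other per-position measurement turns the same function into a sliding window over other continuous data.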