Installing Software - R Packages
June 07 2019
NOTE: This is an excerpt from our R workshop materials. If you want to learn more, be sure to check out the full NCGAS R for Biologists Textbook, found here. Also, the online course version is available here.
Packages are things you can download that add functionality to R. They contain functions written by other people, packaged up nicely so you can load them and then call the functions within. Packages can also include additional classes, such as "fasta" or "sam", that define kinds of objects that are really handy to work with intuitively, rather than shoehorning them into the above data types.
In RStudio, installing and loading packages is easy. In the right-hand frame there is a set of tabs, one of which is labeled Packages. It lists all the installed packages, which you can load or unload by ticking or unticking them. The code that does this manually will be echoed in the console, for your interest.
If you want to install a package, you can click "Install Package" and follow the prompts. This only works if you want code from the repositories (CRAN, Bioconductor). You can install packages from git, but this is much less common. If you want custom downloaded code, you will have to do it the command line way!
To download a package:
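As a minimal sketch, assuming the package lives on CRAN (`ggplot2` is only an illustrative choice), the console command is `install.packages()`. The `install_if_missing` wrapper below is a hypothetical helper, not part of R; it skips anything that is already installed so nothing is downloaded twice.

```r
# Basic form: download and install one package from CRAN.
# Left commented out so that pasting this block does not start a download.
# install.packages("ggplot2")

# Hypothetical helper: install only the packages that are not already present.
install_if_missing <- function(pkgs, ...) {
  # requireNamespace() returns FALSE (instead of an error) for an absent package
  is_present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!is_present]
  if (length(to_install) > 0) {
    install.packages(to_install, ...)
  }
  # Return the names that needed installing, without printing them
  invisible(to_install)
}

# "stats" and "utils" ship with R, so nothing is downloaded here.
already_there <- install_if_missing(c("stats", "utils"))
```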
Note: You will see a LOT of red text scrolling through your terminal - this is perfectly normal, even though it may look like an error. This is just the installation occurring! Unless you see something that says "package not found" or "package load failed" or similar just before it gives you your prompt (">") back, the package is likely installing just fine ^_^.
Load a package:
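A small sketch of loading with `library()`. It uses `tools`, a package that ships with R, only so the lines run on any installation; swap in whatever package you actually installed.

```r
# Attach a package for the current session.
library(tools)

# Once attached, the package's functions can be called by name.
ext <- file_ext("reads.fasta")

# A single function can also be called without attaching the package,
# using the package::function form.
ext_again <- tools::file_ext("reads.fasta")
```

Both calls return the file extension; the second form is handy when you only need one function from a package.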
However, if you are doing this on the clusters, the lack of root access will make things fail if we don’t direct the program to a location we have access to! In this example, I use my home directory on Indiana University’s cluster - but the file path will have to be a different, existing path on your cluster!
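A sketch of what that can look like, assuming a personal library at `~/R/library` (a hypothetical path; use one that exists on your own cluster). It relies on the `lib` argument of `install.packages()` and the `lib.loc` argument of `library()`. The `do_install` flag is only there so the block can be pasted without starting a download, and `ggplot2` is again just an illustrative package.

```r
# Hypothetical personal library location; replace with a path you can write to.
my_lib <- path.expand("~/R/library")

# Create the directory if it is not there yet (no error if it already exists).
dir.create(my_lib, recursive = TRUE, showWarnings = FALSE)

# Flip to TRUE to actually download and install.
do_install <- FALSE

if (do_install) {
  # Install into the personal directory instead of the system-wide library.
  install.packages("ggplot2", lib = my_lib,
                   repos = "https://cloud.r-project.org")

  # Load from that directory by telling library() where to look.
  library(ggplot2, lib.loc = my_lib)
}
```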
Packages only need to be installed once, but they need to be loaded every time you have a new R instance. This is much easier if you are working on your own machine, or in RStudio.
When you are in the terminal or working with R on a cluster, you can (and should) define an environment variable called R_LIBS_USER. This is often defined as "~/R/library" but you can put your libraries anywhere. This path becomes your default location for installing packages when you don’t have permission to write to the system-wide library (which is VERY common on servers). It also becomes the location where R will look for packages if they aren’t in the system-wide installation.
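One way to check this from inside R is sketched below. The environment variable R reads is `R_LIBS_USER`, and `.libPaths()` shows (or sets) the directories R searches. The `add_user_lib` helper is hypothetical, and the demo uses a throwaway directory under `tempdir()` rather than a real library.

```r
# In ~/.Renviron (read by R at startup) the variable can be set with a line like:
#   R_LIBS_USER=~/R/library

# Inside a running session, check what R currently sees.
Sys.getenv("R_LIBS_USER")   # the personal library setting
.libPaths()                 # the directories R searches, in order

# Hypothetical helper: put a directory at the front of the search path.
add_user_lib <- function(path) {
  path <- path.expand(path)
  # .libPaths() silently drops directories that do not exist, so create it first
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(path, .libPaths()))
  invisible(.libPaths())
}

# Demonstrated on a throwaway directory.
demo_lib <- file.path(tempdir(), "R", "library")
paths <- add_user_lib(demo_lib)
```

By default `install.packages()` installs into the first entry of `.libPaths()`, so putting your own directory first means later installs land there.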
CRAN versus Bioconductor
I mentioned CRAN and Bioconductor above – what are they?
Basically, CRAN is an awesome general-use repository of about 7000 packages. Its purpose is to hold the software the open community writes, so it contains a bunch of useful stuff. Bioconductor is very similar, in that it is a giant repo of about 1000 packages, but it is only for genomics content. Because it has the general mission to make genomic analysis easier, it is managed differently than CRAN. I assume this is why they are separate (see: here).
So what’s different about them? Bioconductor has more requirements than CRAN does, such as 1) high-quality documentation with 2) at least one vignette on how to run a genomic analysis, demonstrating 3) the package's contribution to the better analysis of genomic data. It has a heavier focus on teaching computation and analysis to the community it serves. It also heavily encourages the use of its other packages, data structures, etc. to establish common input/output formatting between packages and prevent "reinvention of the wheel". There is a really cool network analysis of the resulting differences between CRAN and Bioconductor found here.
Packages from both repositories are loaded the same way, but the installation is a bit different. See bioconductor.org for more information (the installation procedure is currently changing, which is why I’m not writing it out here!).
Using Bioconductor on Carbonate
If you load the Bioconductor module, it will automatically load the correct R version. However, it does not add the R library (all non-Bioconductor packages) into R_LIBS (main location variable for R packages) or R_LIBS_USER (your personal installation directory variable). You will need to do the following to fix this issue:
module load bioconductor/3.6
That will load R's main packages and Bioconductor into R_LIBS and you can still use R_LIBS_USER for your individual installations.
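To confirm what the module actually put into those variables, something like the following can be run from R once the module is loaded. This is only a sketch: `split_lib_var` is a hypothetical helper, and what it prints depends entirely on your cluster's module setup.

```r
# Hypothetical helper: split a library variable into its individual directories.
split_lib_var <- function(name) {
  value <- Sys.getenv(name)
  if (!nzchar(value)) {
    return(character(0))   # variable is unset or empty
  }
  # Entries are separated by ":" on Linux/macOS and ";" on Windows
  strsplit(value, .Platform$path.sep, fixed = TRUE)[[1]]
}

# Directories the module placed in R_LIBS, and your personal library setting.
main_libs <- split_lib_var("R_LIBS")
user_libs <- split_lib_var("R_LIBS_USER")

# Flag any listed directory that does not actually exist.
dir.exists(path.expand(c(main_libs, user_libs)))
```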