Running Anvi’o on Jetstream
August 17 2018
Anvi’o is an open-source tool that allows some really fun visualization of whole-genome and metagenomic data. A metagenome is the collection of DNA from an environmental sample that is sequenced directly, with no prior culturing steps and, in this case, no sequence-specific amplification such as 16S sequencing. The software was written for whole-genome sequencing data, both bacterial genomes and metagenomes.
If you have started working with these datasets, Anvi’o can be used to visualize all the fun stuff the data is trying to tell you. For example, here is an Anvi’o figure from a published research paper. The figure shows two isolates, “EQPAC1” and “MIT9314”, from a set of metagenome samples collected from the Mediterranean Sea, Atlantic Ocean, Red Sea, Indian Ocean, Pacific Ocean, and Southern Ocean. Each circle represents the genes within an isolate from the samples, and the outer circle (green and red) represents the gene collections found in all the genomes. In this figure, green represents the environmental core and red represents the hypervariable regions across all the samples.
Here is another figure from the Anvi’o paper directly explaining one of the graphs.
These are just two examples of the figures you can plot. You can also add phylogenetic trees, reconstruct metagenome-assembled genomes (MAGs), and recover 16S sequences from MAGs. The project is under active development, so updates and new features are added regularly.
If you are analyzing a large number of metagenomes and are worried about memory limitations on your local computer, we have installed Anvi’o on Jetstream, a free cloud computing platform. This blog post is mostly about how to run Anvi’o on Jetstream. If you haven’t come across Jetstream before, it is a cloud computing platform that hosts a library of pre-configured VMs. Here are some posts to familiarize yourself with it: https://ncgas.org/training.php (look under the section "Jetstream Information").
In this blog post, we focus only on running Anvi’o on Jetstream, not on the workflows that generate these diagrams. Merenlab's blog is great for that: it is easy to follow, and since the lab wrote the program, it is also a better place to ask Anvi’o-related questions.
Access to Jetstream
We have several blog posts on Jetstream. Here are the steps to get started, with links to the tutorials:
- Follow the blog post to get started with an account in Jetstream
- Once you have an account, spin up the Anvi’o image
- Optional - If you have a really large dataset, use Globus to transfer it from your laptop to the Jetstream image
- Optional - There is also an option to save this dataset in a volume instead of the image directly
Launching the Anvi'o image
When you select “Launch”, a pop-up window opens requesting the resources you will need. In this box:
- First, don’t forget to change the name from Anvi'o to a better description of the image. It’s no fun to have three Anvi'o images up, all labeled Anvi'o, and have to guess which one hosts which data.
- The next parameter to be careful with is Instance Size. There is no easy way to guess the size of the instance you will need, so make this decision based on the size of your data.
- Most Anvi’o commands allow multithreading (--num-threads n), so I prefer to choose an instance size larger than medium.
- Once you have launched this instance, wait a bit until it becomes active, as shown below.
- SSH into the IP address (using PuTTY, Cygwin, or Terminal), or use the Web Desktop or Web Shell on the right to log in to the instance (in the above figure).
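Once you are logged in, a quick way to pick a value for --num-threads is to check how many cores the instance actually has. The snippet below is a minimal sketch, assuming a Linux instance where nproc or getconf is available. Leaving one core free is just one common convention, not an Anvi’o recommendation, and the variable names are ours.

```shell
# Count the CPU cores available on this instance.
# nproc is part of GNU coreutils; getconf is used as a fallback.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Leave one core free for the system when more than one is available.
if [ "$cores" -gt 1 ]; then
    threads=$((cores - 1))
else
    threads=1
fi

echo "Instance has $cores cores; suggested --num-threads value: $threads"
```

You can then pass the suggested number to any Anvi’o command that accepts --num-threads.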
In the shell, type “anvi-profile --version” to confirm that the installed version of Anvi'o is the latest, or the version you want to work with. If it is not the latest, send us an email at email@example.com and we will update it.
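If you want the version check to fail gracefully, for example when you are not sure you are on the right image, it can be wrapped in a small guard. This is only a sketch: the anvio_status variable and the hint message are our own additions, not something Anvi’o provides.

```shell
# Print the installed Anvi'o version, or a hint if the command is not on the PATH.
if command -v anvi-profile >/dev/null 2>&1; then
    anvi-profile --version
    anvio_status="found"      # hypothetical variable, used only in this sketch
else
    echo "anvi-profile not found; check that you are logged in to the Anvi'o image" >&2
    anvio_status="missing"
fi
```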
If you are interested in using an older version, here are the steps to take
Here is a link to all the Anvi’o versions built in conda.
- Now, let's run the test suite on Anvi'o, to make sure everything looks good. In the terminal run the command
NOTE: I recommend running the '
Once the test is done, you will see the following line in the terminal:
“* The server is now listening in port number "8081"”. When you are finished, press CTRL+C to terminate the server.
If you are working in the Web Desktop, you will also notice that a web browser pops up with the Anvi’o interactive diagram.
If you are working in the Web Shell or from PuTTY/Terminal, open Google Chrome and go to the URL
http://IP-ADDRESS-OF-VM:8081/app/index.html. This should display the Anvi’o interactive diagram from 'anvi-self-test'.
You can now share this URL with collaborators or make changes to the image and save it for your presentation/paper.
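To avoid typos when sharing the address, the URL pattern above can be assembled in the shell. This is a minimal sketch: anvio_url is a hypothetical helper of ours (not part of Anvi’o), and 203.0.113.10 is a documentation-only placeholder that you should replace with the IP address of your own instance.

```shell
# Build the address of the Anvi'o interactive interface for a given VM.
# anvio_url is a hypothetical helper, not part of Anvi'o.
anvio_url() {
    ip="$1"
    port="${2:-8081}"   # 8081 is the port reported in the server message
    printf 'http://%s:%s/app/index.html\n' "$ip" "$port"
}

# 203.0.113.10 is a placeholder; substitute your instance's IP address.
anvio_url 203.0.113.10
# prints: http://203.0.113.10:8081/app/index.html
```

The optional second argument is there in case the server reports a different port number in your terminal.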