Getting Started on Jetstream
October 18 2017
Quick guide to get started on Jetstream
Before we start, here are a couple of reasons for “Why Jetstream?”
Jetstream is a cloud computing resource that provides access to preconfigured virtual machines (VMs) with root access. Using VMs helps professionals without a computer science background transition to the command line, to installing software, and on to running analyses in a Linux environment (as found on most HPC clusters). For more on Jetstream, go here.
Preconfigured VMs mean the software you would like to work with may already be installed - and if not, it is much easier to install software with root access!
Let’s start by getting on Jetstream.
Getting Started – Signing Up
1. Register for an XSEDE account here. Jetstream is part of the XSEDE national cyberinfrastructure and requires a user name. This only takes a couple of minutes and is the first step in accessing all the XSEDE machines, such as PSC’s Bridges, our go-to for very high memory jobs. You will only have to do this once.
2. Use the XSEDE credentials to log into Jetstream
Step 1 - Select login
Step 2 - Select Continue
Step 3 - Enter username and password
Sign in, and you should now be logged in to your Jetstream account.
3. If you don’t have a Jetstream allocation, here are your options:
- Request to join the NCGAS allocation by emailing firstname.lastname@example.org.
This form of allocation is preferable when you would like to test Jetstream to see if it meets your computational needs. You can also get started on your project while waiting for your own allocation to be approved (see the next two options). Do note that our allocation is meant for trial and initial use; if you begin to use significant computational time, we may contact you to remind you to get your own allocation.
- Request for a trial access allocation from Jetstream.
No formal allocation proposal is required, and you will be given access within a day with a minimal amount of computational time.
- Make a formal allocation request to XSEDE. This can take a bit of time to get approved, which is why this is typically done after you have a trial allocation in place.
Once you have allocations, they will appear on your home page on Jetstream, as shown below. In this example, the account has two allocations (red) and two instances currently running (yellow). When an allocation runs out, you will have to renew it, so check occasionally to make sure you aren’t getting too low!
4. Add an SSH key to your Jetstream account. This allows you to sign into VMs easily.
For more information on SSH keys and why they are useful, see this link.
Step 1 - Click on username on the top-right end of your screen, then select Settings
Step 2 - Select “Show more” under Advanced
Step 3 - Under SSH configuration, select “+” to add a public key
Step 4 - Add your key as shown below, then click Confirm
Now when you launch your instance/VM, this key will be added to it. You won’t need to remember a password 😊
Getting Started – Launch an Instance
Jetstream offers a set of preconfigured VMs, such as Bio-Linux, Galaxy, and R, with a set of software packages already installed. New preconfigured VMs are added regularly, so be sure to search the repository before toiling through installing software unnecessarily.
The basic steps to launch an instance are listed here.
Below is an example of how to launch an instance that is preconfigured with the bcbio-nextgen toolkit (Documentation) already installed.
1. Select projects, then click on create new project
In the case below, there are already two project directories.
2. Fill in Project name and a description. Select create
The project should now show up under your projects. For example, if I name the project “test”, you will see a project called “test”, as shown below.
3. Click on the project “test”
4. Click on “new”
5. Click on “Instance” to start a VM
Note the Volume option, which lets you store the input data and outputs from your Jetstream instance. More information can be found here.
6. To launch the “bcbio-nextgen toolkit” image, click on “Show all”, then type “bcbio” in the search box.
The list shown below includes entries such as R with GCC and R with Intel compilers; these are preconfigured Jetstream images with software (R) already installed.
You can also search directly by name - here I use bcbio. Click on the instance to advance to the next step.
7. Fill out the following details:
- Instance name - you can keep the default, but if you spin up two or three instances of the same image and they all have the same name, you should be able to see the problem with that.
- Image version - images are updated to fix bugs or to add software updates, so always choose the latest version unless you have a specific reason not to.
- Allocation source - this only matters if you have more than one allocation; otherwise ignore it.
- Provider - either is fine. You may have a bit easier time getting direct help from us if you select IU, however.
- Instance size - the default size for the bcbio image is medium, so you can spin up an instance that is medium or larger.
How do you select a size? Consider three things:
- Size of your data: if your input data is 100 GB, pick an instance with at least 120 GB of disk.
- Memory required by the software: for high-memory jobs, like assembly, pick 60 GB.
- Parallelization: look at the number of vCPUs.
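The sizing rules above can be sketched as a small helper. This is only an illustration under stated assumptions: the size table is hypothetical (it is not the actual Jetstream size list, so check the launch dialog for the real values), `pick_size` is a made-up name, and the 20% disk headroom is an assumption inferred from the 100 GB to 120 GB example.

```python
from typing import NamedTuple, Optional, Sequence


class Size(NamedTuple):
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int


# Hypothetical size table for illustration only; NOT the real Jetstream catalog.
EXAMPLE_SIZES = (
    Size("small", 2, 4, 20),
    Size("medium", 6, 16, 60),
    Size("large", 10, 30, 120),
    Size("xlarge", 24, 60, 240),
)


def pick_size(
    input_gb: float,
    ram_gb: float,
    vcpus: int,
    headroom_pct: int = 20,
    sizes: Sequence[Size] = EXAMPLE_SIZES,
) -> Optional[Size]:
    """Return the smallest size that meets the disk, memory, and vCPU needs."""
    # Assumed rule of thumb: disk = input data plus headroom (100 GB -> 120 GB).
    disk_needed_gb = input_gb * (100 + headroom_pct) / 100
    # Try the smallest sizes first so we do not burn allocation unnecessarily.
    for size in sorted(sizes, key=lambda s: (s.vcpus, s.ram_gb, s.disk_gb)):
        if (
            size.disk_gb >= disk_needed_gb
            and size.ram_gb >= ram_gb
            and size.vcpus >= vcpus
        ):
            return size
    return None  # nothing fits; consider attaching a volume or a larger size


if __name__ == "__main__":
    # 100 GB of input, an assembly-style job needing 60 GB of RAM, 8 threads.
    choice = pick_size(input_gb=100, ram_gb=60, vcpus=8)
    print(choice.name if choice else "no size in the table fits")
```

Picking the smallest size that fits matters because larger instances consume your allocation faster.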
Once you have made the changes to match your requirements, click “Launch Instance”.
8. Once the instance is launched, you can use the Web Desktop (graphical user interface, in red) or the Web Shell (command line, in black), both of which can be found on the right-hand side of the screen. The beta versions are the updated versions.
If you would like to access the VM using SSH, here are the instructions.
Getting Started – Installing Software
With root access, you can install Python packages with pip:
sudo pip install "package-name"
You can also install R packages with the default install.packages("name"), without designating an R_LIBS or other location! Most default installation instructions now work when you have root access!
(But remember: sudo/root access gives you the power to customize your VM, and with great power comes great responsibility.)
Another great benefit of using a VM is that if you make changes to your VM that you don’t like and can’t undo, you can try rebooting (restart). If that doesn’t work, stop (shut down) => start. Still didn’t work? Delete (kill the VM and delete all data inside) => set up a new VM (new machine) with the same image. Easy, right?
But deleting VMs and restarting them over and over again means using up more of your allocation to redo things you already did (setup, loading data, etc.).
When you are given access to Jetstream, you are given a set allocation of service units (SUs), charged at 1 SU per vCPU core-hour (the use of one virtual CPU core for one hour). See here for more information. If you do use up your allocation, you can request more resources; feel free to email us for help when this happens!
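To get a feel for how quickly an allocation is used up, here is a minimal sketch of the arithmetic, based only on the 1 SU per vCPU core-hour rule stated above. The function names are made up for illustration, and the sketch assumes every counted hour is charged; how stopped or suspended instances are charged is not covered here.

```python
def service_units(vcpus: int, hours: float) -> float:
    """SUs consumed by one instance: 1 SU per vCPU core-hour."""
    if vcpus < 0 or hours < 0:
        raise ValueError("vcpus and hours must be non-negative")
    return vcpus * hours


def hours_left(balance_su: float, vcpus: int) -> float:
    """Hours an instance with this many vCPUs can run on the remaining SUs."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return balance_su / vcpus


if __name__ == "__main__":
    # A 6-vCPU instance left running for a full day.
    print(service_units(vcpus=6, hours=24))
    # How long a 10-vCPU instance could run on a 1000 SU balance.
    print(hours_left(balance_su=1000, vcpus=10))
```

For example, a 6-vCPU instance running for 24 hours uses 6 x 24 = 144 SUs, which is why rebuilding a VM and repeating long setup steps adds up.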
That should get you started using Jetstream! We’ll discuss how to move files onto Jetstream and IU from your home computer using Globus in the next post. In the meantime, for more information and guides for Jetstream go here.