Getting Started on Globus
November 02 2017
File transfer using Globus on Your Home Computer, Clusters, and Jetstream
Globus on Home Computer
Globus on HPC Clusters
Globus on Jetstream VMs
File transfer to a Jetstream instance can be done in several ways. For small files, such as a simple two-line text file, copying and pasting the content is probably just as quick as transferring the file. For larger or multiple files, such as a directory (folder) or a bigger file, you can transfer them using:
- scp or rsync (if you are comfortable with the command line),
- WinSCP or FileZilla (Windows users), or
- Cyberduck (Windows and macOS).
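For the command-line route, here is a minimal sketch of an rsync wrapper. The function name push_dir and the example username, address, and paths are hypothetical placeholders, not values from this tutorial; substitute your own.

```shell
# push_dir SRC_DIR USER@HOST:DEST_DIR
# Hypothetical helper: copies a directory to a remote machine with rsync.
push_dir() {
    # -a keeps permissions and timestamps, -v lists files as they are sent,
    # -z compresses data in transit, --partial keeps partly transferred
    # files so a rerun does not have to start them from scratch,
    # --progress shows per-file progress.
    rsync -avz --partial --progress "$1" "$2"
}

# Example call (placeholder user, address, and paths; not run here):
# push_dir ./raw_reads/ username@203.0.113.10:/home/username/raw_reads/
```

If the connection drops, running the same call again should skip files that already arrived intact.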
However, when you have really large files, such as raw sequence data, it can be a bit tedious to transfer files via command line, not to mention problematic if your internet connection drops out. In these cases, Globus is a favorable alternative. Globus allows you to transfer gigabytes or terabytes of data securely and quickly.
To transfer files between two points, you need to make sure both points have active Globus endpoints. Endpoints are the two locations you want to transfer data to and from, and each must have Globus set up according to its endpoint type. Setting up Globus on your own computer is free (see below), but cluster, academic, or commercial endpoints require a paid subscription. For more information about subscription types and their features, see the subscription information here.
Setting up Globus endpoint on your personal computer
It is likely you will need to transfer files between your preferred cluster and your home computer at some point. To set up Globus on your home computer, first register for an account with Globus Connect by going here and clicking sign in. If your university or company does not have a subscription, you can set up an account using Google by following the steps here.
You have access to Globus Plus features with an XSEDE account (required for Jetstream; if you don’t have one, go here and fill out the form. It does not take long. See here for more detail on getting started on XSEDE and Jetstream). Globus Plus allows you to transfer files between two personal computers or between a Jetstream VM and a personal computer. Log in to your Globus account and select “Accounts”.
Go to subsection “Globus Plus”. Under “Select another organization” select option XSEDE Plus Sponsor.
Under the Subgroups tab, find a link to XSEDE Global Plus Users. Select Join group.
Once your request to join the group is approved (this usually takes about an hour), you will receive an email at the primary identity (email account) listed under Accounts. As soon as you are accepted, you will be able to transfer between two Globus personal endpoints (i.e. your computer and your Jetstream VM)!
More information and tutorials on Globus setup are available here.
Globus endpoints on HPC clusters
If your company/institution has a Globus endpoint installed (this is a paid subscription), you should be able to search for it, or consult your local documentation. For example, if you have access to IU clusters, you will have automatic access to the /N/dc2/scratch/username directory, and if you enter the path, you can also access your /N/dc2/project directories. Files here can be transferred to other Globus endpoints or your personal computer.
On the Globus transfer page, look for the endpoint named iu#dc2. Follow these steps:
1. Log in at https://www.globus.org/
You should see this page after logging in; if not, click on “Transfer”.
2. Click on endpoint and you should see something similar to this. In this example, there are three endpoints set up: IU’s file system (iu#dc2), an endpoint on my bcbio instance created with the steps in the Jetstream section (bcbio_js), and my laptop (my laptop). Now I can transfer files between any of these three endpoints. If you don’t see an endpoint you just set up, search for it in the tab highlighted below.
Setting up globus endpoint on Jetstream VMs
These are basically the instructions for setting up Globus on any Linux machine. However, an example on Jetstream with screen captures is provided to assist newer users.
You should already have an account on Globus; if not, make one as described in the first section of this tutorial. In the following instructions, lines in red are commands, and the portions in purple indicate where you need to fill in your own information. For the first few commands, I have also attached screenshots of how the screen looks when the command has run successfully, so you can be confident you are doing it right.
Log into the VM (terminal in WebDesktop, Webshell, or ssh, whichever you prefer). If you are new to the command line, “Welcome to the Dark Side!”. You can access the WebDesktop (red) and Webshell (black) with the following links when you click on an active image in your Jetstream Atmosphere Portal. In this case, you will want to use the Web Shell.
NOTE: If the beta is still listed as such, please use that. The old version does not have support for copy and paste. If there is no longer a beta listed, it is likely that the new system has fully replaced the old and you can just use the only webshell listed.
You can also run these commands via PuTTY (Windows) or the ssh command in a terminal (Linux/Mac). Log in to your VM using your username, its IP address (found under the Jetstream instance details), your key, and port 22. This can make things a bit easier if you are comfortable using terminals. More information is available here.
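If you log in over ssh often, one option is to store the username, IP address, key, and port in your ssh configuration. The sketch below prints a Host entry that you could append to ~/.ssh/config. The function name make_ssh_host and the example alias, address, username, and key path are hypothetical placeholders.

```shell
# make_ssh_host ALIAS IP_ADDRESS USERNAME KEYFILE
# Hypothetical helper: prints an OpenSSH client config entry to stdout.
make_ssh_host() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User %s\n' "$3"
    printf '    Port 22\n'
    printf '    IdentityFile %s\n' "$4"
}

# Example (placeholder values; not run here):
# make_ssh_host jetstream 203.0.113.10 username ~/.ssh/id_rsa >> ~/.ssh/config
# ssh jetstream
```

After appending the entry, `ssh jetstream` would stand in for the longer command with `-i`, `-p`, and the address spelled out.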
If you are working on the Webshell, the main oddity you will run into is copying and pasting between your computer and the Webshell. Here is the trick, on the WebShell:
Windows (shift+ctrl+alt)/Mac (shift+ctrl+cmd) -> This brings up a sidebar with a box that serves as a common clipboard.
- To paste from your computer to the VM, paste the text into this box. Then exit the sidebar with shift+ctrl+cmd, and paste into the shell with shift+ctrl+alt/shift+ctrl+v (Mac).
- To paste from the VM to your computer, highlight the text, bring up the sidebar, make sure the text is in the common clipboard, then paste as normal.
- To paste from the VM to the VM, just highlight the text and paste as normal (nothing crosses between computers).
Once you are in a terminal of your choosing, follow these steps:
1. In your home directory, or wherever you install files, download Globus Connect using this command:
wget https://s3.amazonaws.com/connect.globusonline.org/linux/stable/globusconnectpersonal-latest.tgz
2. Extract the downloaded Globus Connect file:
tar xzvf globusconnectpersonal-latest.tgz
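The archive unpacks into a directory whose name carries the version number, so the exact name varies. A small sketch for locating it, assuming the directory name starts with globusconnectpersonal- (the function name find_gcp_dir is a hypothetical helper):

```shell
# find_gcp_dir PARENT_DIR
# Hypothetical helper: prints the extracted Globus Connect Personal
# directory under PARENT_DIR. The trailing slash in the pattern matches
# directories only, so the downloaded .tgz file is ignored. If several
# versions are present, the last one in sorted order is printed.
find_gcp_dir() {
    ls -d "$1"/globusconnectpersonal-*/ 2>/dev/null | sort | tail -n 1
}

# Example (not run here):
# cd "$(find_gcp_dir "$HOME")"
```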
3. Make a new ssh-key for the instance:
ssh-keygen -t rsa -b 2048 -f js
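This command creates a private key (“js”) and a public key (“js.pub”) in the current working directory. Type in a passphrase when prompted; it acts as a second layer of protection, since anyone using the key also needs the passphrase.
Once done, ssh-keygen prints the key fingerprint and a randomart pattern, as shown below.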
4. Move “js” and “js.pub” to ~/.ssh
mv js js.pub ~/.ssh/
chmod 600 ~/.ssh/js ~/.ssh/js.pub # this command changes the permissions of the key pair.
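The permissions matter because the ssh client refuses to use a private key that other users can read. A minimal sketch of a check you could run after the chmod (check_key_perms is a hypothetical helper name):

```shell
# check_key_perms KEYFILE
# Hypothetical helper: succeeds only if KEYFILE is readable (and optionally
# writable) by its owner alone, i.e. mode 600 or 400.
check_key_perms() {
    # The first ten characters of `ls -l` output are the file type and mode.
    perms=$(ls -l "$1" | cut -c1-10)
    case "$perms" in
        -rw-------|-r--------)
            echo "ok: $1 is private ($perms)" ;;
        *)
            echo "too open: $1 is $perms; run: chmod 600 $1" >&2
            return 1 ;;
    esac
}

# Example (not run here):
# check_key_perms ~/.ssh/js
```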
5. Add the new (public) key to your Globus account online at https://www.globus.org/app/transfer
- Click on Account
- In Accounts, click on “manage SSH and X.509 keys”.
IF YOU DO NOT SEE THE ABOVE OPTION: Look at the linked identities under Account. If you do not see an account ending in “globusid.org”, you will need to get one. A Globus ID is required to add keys. It is possible to log in with a Gmail or institutional email address without having a Globus ID.
To get a Globus ID, follow this link.
Once you have signed up, link the new Globus ID to your main account (listed under globus.org, Accounts).
The option “manage SSH and X.509 keys” should show up now.
- Then select “add a new key”
- Fill in the information, then copy and paste the public key (js.pub) into the body
- Click on “Add Key”
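A common slip at this point is pasting the private key instead of the public one. The sketch below prints a file only if it looks like an RSA public key, which is what the ssh-keygen -t rsa command above produces (show_public_key is a hypothetical helper name):

```shell
# show_public_key FILE
# Hypothetical helper: prints FILE only if its first line starts with
# "ssh-rsa ", the prefix OpenSSH uses for RSA public keys.
show_public_key() {
    case "$(head -n 1 "$1")" in
        "ssh-rsa "*)
            cat "$1" ;;
        *)
            echo "refusing: $1 does not look like an RSA public key" >&2
            return 1 ;;
    esac
}

# Example (not run here):
# show_public_key ~/.ssh/js.pub
```

The printed line is what goes into the body of the “add a new key” form.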
6. Back on the Jetstream instance, go to the Globus Connect directory in the terminal. In our case, we installed it in the home directory (the directory name includes the version number):
cd ~/globusconnectpersonal-*/
7. Log into globusonline.org with your ssh key from your instance (you need to use your username):
ssh -i ~/.ssh/js firstname.lastname@example.org
8. Now set up an endpoint on globusonline.org for your instance
endpoint-add -n name_of_the_endpoint --gc
This command will give you an Endpoint ID and a Setup ID for the instance, which you WILL NEED in the next set of commands. Copy and paste them into a text editor if needed. Exit the ssh session with "exit", then run the following in the Globus install location:
./globusconnect -setup Setup_ID
./globusconnect -start &
ssh -i ~/.ssh/js email@example.com endpoint-activate Endpoint_ID
The Globus Connect endpoint is now set up, and files can be transferred between Jetstream and your local computer or any other location with a Globus endpoint already set up (e.g. /N/dc2/scratch/username, if you have access to the IU cluster).
9. Verify that the Globus Connect endpoint has been set up:
ssh -i ~/.ssh/js firstname.lastname@example.org endpoint-search --scope my-endpoints
10. Transfer files using globus connect online https://www.globus.org/app/transfer
11. To stop the Globus Connect connection on the instance, run the following from the Globus install location:
./globusconnect -stop
DONE! You are all set! If you have any questions mail us at email@example.com