Running Canu on Carbonate
February 02 2018
Running the Canu assembler on Carbonate
Canu is an assembler derived from the Celera Assembler and designed for long reads from PacBio and Oxford Nanopore (MinION) sequencing technologies. For more information on Canu, see http://canu.readthedocs.io/en/latest/quick-start.html
Running Canu on Carbonate
Canu is already installed on Carbonate and is available as a module. To add the Canu scripts and libraries to your path, use the following command:
$ module load canu
The location of the scripts on the server, the version of Canu, its license, its prerequisites (it has none), and other information can be displayed with:
$ module display canu
Currently only Canu version 1.6, the latest release at the time of writing, is available as a module.
Canu command tweaking on Carbonate
The main reason for this blog post is to show the command line options that must be included to run Canu on Carbonate. Canu was developed to work with several different job schedulers, so users can launch a Canu command from the login node without writing a job submission script. Canu identifies the job scheduler used on the cluster and runs correction, trimming, and assembly as a series of jobs. It writes the user's input into prebuilt job submission scripts, which means you will not get an angry email about running jobs on the login node :). Canu also requests resources in the job scripts after estimating both the resources available on the cluster and those needed for the job. In short, there is no more predicting how many cores or how much memory you need: just submit the command and Canu will do the work for you. It’s all a perfect solution, except…
The built-in job script requests memory using the -mem flag instead of the -vmem flag that Carbonate job submissions require. If you don’t override the -mem flag in the job scripts, the submitted jobs get only the default of 16 GB of memory, which is a problem unless you are working with a really small genome. So the user has to state in the command how much vmem the Canu jobs need. Correction and trimming do not require as much memory as the assembly step, but unfortunately the solution I have here requests the same memory (-vmem flag) for all three steps. The drawback is that, because largemem will be requested even for steps that may not need that much memory, you will have to sit back and wait longer for the jobs to run.
Canu command on Carbonate
$ module load canu
Fill in the values between <> with your own information.
The options you MUST include on Carbonate are “gridOptions="-l vmem=400gb,walltime=10:00:00" useGrid=true”, which request 400 GB of memory and a walltime of 10 hours for every job.
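To make the required options concrete, here is a minimal sketch of how a full invocation might look. The prefix, output directory, genome size, and read file are hypothetical placeholders standing in for the <> fields, and -pacbio-raw assumes raw PacBio reads (use the matching Canu read-type flag for other data). Only the gridOptions and useGrid values come from this post. The sketch defaults to a dry run that prints the command instead of launching it.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace them with your own.
prefix="my_assembly"      # -p: prefix for output file names
outdir="canu_out"         # -d: output directory
genome_size="4.8m"        # estimated genome size
reads="reads.fastq"       # raw PacBio reads (assumed input type)

# Options this post says are required on Carbonate.
grid_opts="-l vmem=400gb,walltime=10:00:00"

# With DRY_RUN=1 (the default here) the command is only printed,
# without its shell quoting; set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Run `module load canu` first so that canu is on your PATH.
run canu -p "$prefix" -d "$outdir" "genomeSize=$genome_size" \
    "gridOptions=$grid_opts" useGrid=true \
    -pacbio-raw "$reads"
```

Passing gridOptions as a single quoted argument matters: the scheduler flags contain a space, and without the quotes the shell would split them into separate arguments.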
The useGrid=true option tells the Canu script to use the available job scheduler to submit a series of jobs. If you use useGrid=false instead, make sure to run the above command inside a job submission script (see http://ncgas.org/Blog_Posts/Job%20Management.php) that requests an entire node with all of its processors; otherwise it will run on the login node, which is not okay.
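For the useGrid=false route, the following sketch writes a job submission script and prints how to submit it. It assumes a TORQUE/PBS-style scheduler, inferred from the "-l vmem=...,walltime=..." syntax used in this post. The core count (ppn=24), memory, walltime, job name, and all Canu arguments are illustrative assumptions, so check the actual node size on Carbonate before using them.

```shell
#!/bin/sh
# Write a hypothetical TORQUE/PBS job script that runs canu without the grid.
job_script="canu_nogrid.pbs"

cat > "$job_script" <<'EOF'
#!/bin/bash
# Assumed resources: one whole node; adjust ppn, vmem and walltime to the
# actual node size and to your data.
#PBS -l nodes=1:ppn=24,vmem=200gb,walltime=24:00:00
#PBS -N canu_nogrid

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
module load canu

# Placeholder prefix, directory, genome size and reads; replace with your own.
canu -p my_assembly -d canu_out genomeSize=4.8m \
     useGrid=false -pacbio-raw reads.fastq
EOF

echo "Wrote $job_script; submit it with: qsub $job_script"
```

The heredoc delimiter is quoted ('EOF') so that $PBS_O_WORKDIR is written literally into the job script and expanded only when the job runs.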
For more options to add to the command line above, see http://canu.readthedocs.io/en/latest/command-reference.html.
If Canu does not complete successfully, here is where to look to understand why it failed. In the output directory given with the -d flag in the canu command, there will be a file called canu.out. Open this file; towards the end there will be an explanation of the step at which the job failed or the reason why. For example, a canu.out file might contain the line “Exceeded job memory limit” or “Exceed job walltime”. These errors are self-explanatory: restart the job with more resources.
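A quick way to run that check from the command line is sketched below. The function name find_resource_errors and the directory name canu_out are hypothetical, and the two search patterns are simply the messages quoted in this post, so other failure messages will not be matched and still need a manual read of the log.

```shell
#!/bin/sh
# Print lines in <outdir>/canu.out that match the two resource errors
# quoted in this post, prefixed with their line numbers.
# Returns 0 if a match is found, 1 if not, 2 if canu.out is missing.
find_resource_errors() {
    log="$1/canu.out"
    if [ ! -f "$log" ]; then
        echo "No canu.out found in $1" >&2
        return 2
    fi
    grep -n -E 'Exceeded job memory limit|Exceed job walltime' "$log"
}

# Example (canu_out is a placeholder for your -d directory):
#   tail -n 20 canu_out/canu.out      # read the end of the log first
#   find_resource_errors canu_out
```

If the function reports a memory or walltime error, raise the vmem or walltime value in gridOptions and restart the job.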