Transcriptome Pipeline Documentation

Transcriptome Pipeline Welcome Basket

Resources

  • Presentation on how Trinity works, and an introduction to the pipeline (from MDIBL Environmental Genomics 2018)
  • Presentation focused more on the pipeline (from PAG 2018)

Start up

  • Get the pipeline project folder
  • The pipeline is currently set up for the SLURM (v3 and under) and TORQUE (all versions) job schedulers, with the current configuration targeting the IU Carbonate cluster and PSC Bridges. If you want to run it on your own hardware, we can help you convert the scripts to run on your machine.
  • READ THE READMEs. There are READMEs in every folder.
  • There is a setup script you must run before moving your data into the input_files folder. If you move your data first, it won't hurt anything, but setup will take a VERY long time. Please see the README for setup instructions. This script replaces the previous commands for setting up email and the working directory, and it also handles setup for single or double (default) stranded data.
  • Place your input in the input_sequences folder. These should be trimmed, quality-controlled sequences; the pipeline does not handle that step because it depends heavily on context and project. Concatenate all your left reads into one file called left.fq, and all your right reads into one file called right.fq.
  • In each assembler folder, run each set of steps. Run files with the same prefix number (e.g. 1a and 1b) can be run concurrently: simply submit both with qsub. See the README in each folder for more information.
  • For a description, documentation, and license for each program, go here. The documentation and license information are also listed in each respective folder's README. On Carbonate, you can also use “module display $NAME”, where $NAME is the name of the module listed in the run files.
  • You can now get citations for all software by using the -c flag with the start up script.
  • After you have finished the assemblies, run the ./Combine.sh script. This combines the assemblies from the different kmers, labels them with the kmer and the assembler, and outputs them to the final_assemblies folder.
  • After all assemblers are done and there is an <ASSEMBLER>.fa file for each in the final_assemblies folder, run final_assemblies/Combine.sh first, then run RunEvigenes. See the README for details on the output and next steps.
  • Downstream quality control (through Quast and BUSCO), differential expression (through Trinity wrappers), and annotation (Trinotate) are now implemented. These are all in separate directories within the final_assemblies folder and have (you guessed it) READMEs.
  • If you have questions, feel free to email us at help@ncgas.org.
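
For the input step in the list above, the following is a minimal sketch of one way to build left.fq and right.fq from per-sample files. The function name combine_reads and the *_R1.fq / *_R2.fq naming are assumptions for illustration, not part of the pipeline; adjust the patterns to match your own trimmed reads.

```shell
#!/usr/bin/env bash
# Sketch only: merge per-sample reads into the two files the pipeline expects.
# The *_R1.fq / *_R2.fq file naming is an assumption; change the globs as needed.

combine_reads() {
    local src_dir="$1"   # folder holding trimmed, quality-controlled reads
    local dest_dir="$2"  # the pipeline's input folder

    # Globs expand in sorted order, so left and right files
    # are concatenated in the same sample order.
    cat "$src_dir"/*_R1.fq > "$dest_dir/left.fq" || return 1
    cat "$src_dir"/*_R2.fq > "$dest_dir/right.fq" || return 1

    # Paired files should contain the same number of lines (4 per read).
    local left_lines right_lines
    left_lines=$(wc -l < "$dest_dir/left.fq")
    right_lines=$(wc -l < "$dest_dir/right.fq")
    if [ "$left_lines" -ne "$right_lines" ]; then
        echo "left.fq and right.fq differ: $left_lines vs $right_lines lines" >&2
        return 1
    fi
}
```

It could be called as combine_reads trimmed_reads input_sequences, where trimmed_reads is a hypothetical folder holding your quality-controlled files. Keeping the left and right files in the same sample order matters because paired reads are matched by position.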
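
Before the combine step in the list above, it can help to confirm that every assembler has actually produced its <ASSEMBLER>.fa file. The sketch below is one way to do that; the function name check_assemblies is hypothetical, and the assembler names are whatever you pass in, since the exact file names depend on which assemblers you ran.

```shell
#!/usr/bin/env bash
# Sketch only: verify that each expected <ASSEMBLER>.fa exists and is non-empty.
# Assembler names are supplied by the caller; none are hard-coded here.

check_assemblies() {
    local dir="$1"   # folder that should hold the <ASSEMBLER>.fa files
    shift
    local missing=0
    local name
    for name in "$@"; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$dir/$name.fa" ]; then
            echo "missing or empty: $dir/$name.fa" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_assemblies final_assemblies Trinity returns success only if final_assemblies/Trinity.fa is present and non-empty (the name Trinity.fa is an assumption based on the <ASSEMBLER>.fa pattern). List every assembler you ran as additional arguments.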