Trinity

Trinity, developed at the Broad Institute, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.

Installed on blacklight.

In this document, you will find instructions for running a Trinity job on blacklight.

Running a Trinity job

Trinity jobs are submitted to blacklight's batch queues to run. Only very small test runs with a small number of cores and a short run time can be run interactively. Production runs of any size must be submitted as batch jobs.

To run a Trinity job, follow these steps. Details for each step follow.

    1. Choose which version of Trinity to use. Multiple versions of Trinity are installed. It is important to know which version you are using, as the command line options and the default settings can change between versions. It is also good practice to use the same version throughout your project.

      There is always a default version of Trinity defined, but it changes as new versions are added and older versions are deleted. For this reason, you should never just load the default version; it may change without notice. Always load the specific version that you want.

    2. Create a job script. The script will contain commands to
      1. Combine stdout and stderr
      2. Load the appropriate modules
      3. Set the stack size to unlimited
      4. Copy your input files to $SCRATCH
      5. Move to your $SCRATCH directory
      6. Run Trinity. In the Trinity command line, you must redirect the Trinity output to a file.
      7. Copy your Trinity output file from $SCRATCH

    3. Submit your job with the qsub command.

  1. Choose which version of Trinity to use

    Multiple versions of Trinity are installed. It is important to know which version you are using, as the command line options and the default settings can change between versions.

    The module command can tell you what versions are installed. When you have chosen a version, use the "module load" command to set up the correct environment to run that version of Trinity. For more information, see the documentation on the module command.

    • See what versions are available

      To see what versions of Trinity are available, type

      module available trinity

      This example shows four versions installed, from r2012-01-25 to r2012-06-08:

      tg-login1:~> module avail trinity
      -------------------------- /usr/local/opt/modulefiles --------------------------
      trinity/r2012-01-25 trinity/r2012-05-18
      trinity/r2012-03-17 trinity/r2012-06-08
      tg-login1:~>
      
    • See the options for a specific version

      You can see the available options and defaults for a given version. First load the specific Trinity module you are interested in with the module load command, using the full module name shown by the module available trinity command. For example, to see the options for version r2012-03-17, type

      module load trinity/r2012-03-17

      To see the Trinity command line options, now type

      Trinity.pl

    • Read the release notes for a specific version

      You can read the release notes for any version in the Release.Notes file, found in the top-level directory for that version (/usr/local/packages/trinity/version). For example, the release notes for version r2012-06-08 are in the file /usr/local/packages/trinity/r2012-06-08/Release.Notes.

  2. Create a job script

    Create a job script that does all the necessary setup and then runs Trinity. See the blacklight document for extensive details on the structure of blacklight job scripts, including the necessary PBS directives.

    Your script should contain commands to

      1. Combine stdout and stderr

        In a batch job, the messages and errors that an interactive job would display on your screen are instead written to two files, stdout and stderr, respectively.

        Combine stdout and stderr with the PBS directive -j oe, which merges both into one file and makes debugging easier. To choose the name of that combined file, add a -o directive. Put these lines into your batch script:

        #PBS -j oe
        #PBS -o my-PBS-output-file

        For more information on PBS directives in batch jobs, see the blacklight document.

      2. Load the appropriate modules

        Use module load to define the correct environment to run a specific version of Trinity. Give the full module name, as listed by the module available trinity command. For example:

        module load trinity/r2011-11-26
        module load samtools/0.1.18

        If you plan to do any post-assembly analyses using RSEM, you must load the RSEM module with the command

        module load rsem/1.2.11

        The RSEM module is no longer automatically loaded when you load the trinity module.

      3. Set the stack size to unlimited

        You must set the stack size to unlimited in your batch script before running Trinity, or the job will fail.

        If your job script uses bash, include the line

        ulimit -s unlimited

        If your job script uses csh, include the line

        limit stacksize unlimited

      4. Copy your input files to $SCRATCH

        Your $SCRATCH directory on blacklight is intended as working space for your running jobs. Copy all of the files that your job needs to $SCRATCH with

        cp inputfile $SCRATCH

      5. Move to your $SCRATCH directory

        Move to your $SCRATCH directory before starting Trinity with

        cd $SCRATCH
        
      6. Run Trinity

        Some typical command lines are given below. We recommend that you look at the complete list of options available, given at http://trinityrnaseq.sourceforge.net.

        It is important that the output produced by Trinity be redirected into a file. By default, Trinity writes its output to stdout. This can cause trouble on blacklight, because the stdout and stderr files are each limited to 20 MB. If either file exceeds this limit, the job will be killed.

        The Trinity package writes a lot of information to stdout and stderr and often exceeds these limits. To prevent the system from killing your job, redirect the Trinity output to a different file with the ">" operator. Here is a command line showing Trinity output redirected to a file called my-trinity-output.out.

        Trinity.pl command-line-options > my-trinity-output.out

         

        Typical Trinity command lines

        Variables in brackets should be replaced with the desired options or names of your input files. Do not include the brackets themselves in the command line.

        Strand-specific library (the preferred library method, typical of the dUTP/UDG sequencing method)

        Other methods of strand-specific library generation may require FR orientation. See the Trinity website for a full explanation.

        Trinity.pl --seqtype fq --JM 100G --left <yourreads1.fq> --right <yourreads2.fq> --output <dirnameforoutput> --SS_lib_type RF --min_contig_length <contiglengthmincutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > my-trinity-output.out

        Non-strand-specific library

        Trinity.pl --seqtype fq --JM 100G --left <yourreads1.fq> --right <yourreads2.fq> --output <dirnameforoutput> --min_contig_length <contiglengthmincutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > my-trinity-output.out

      7. Copy your Trinity output file from $SCRATCH

        Although files in $SCRATCH should remain available for 21 days, it is good practice to copy your Trinity output file back to your home directory before the job ends. Copy it back with

        cp my-trinity-output.out $HOME

  3. Submit your job with the qsub command

    qsub my-job-script

    For a detailed description of the qsub command and its options, see the blacklight document.
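The steps above can be collected into a single job script. The sketch below writes such a script from the login node with a here-document; typing the same lines into an editor works equally well. Everything specific in it is an assumption for illustration: the script name trinity-job.pbs, the module version trinity/r2012-06-08 (taken from the module avail listing), the read files reads_1.fq and reads_2.fq, the output directory trinity_out, and the --min_contig_length value of 200. It uses the non-strand-specific command line and, unlike the command lines above, also sends stderr to the log with 2>&1, since both streams count against the 20 MB limit. Resource requests (cores, walltime, queue) are left as a comment because their exact form is given in the blacklight document.

```shell
#!/bin/bash
# Write a hypothetical job script, trinity-job.pbs, following steps 2.1 to 2.7.
# Module versions, file names, and the output directory are placeholders.
cat > trinity-job.pbs <<'EOF'
#!/bin/bash
# Ask PBS to run this script with bash (csh users: use "limit stacksize unlimited")
#PBS -S /bin/bash
# 1. Combine stdout and stderr into one named PBS output file
#PBS -j oe
#PBS -o my-PBS-output-file
# Add the resource directives (cores, walltime, queue) from the blacklight document here.

# 2. Load specific module versions, never the default.
# If "module" is undefined in batch jobs, initialize it as the blacklight document describes.
module load trinity/r2012-06-08
module load samtools/0.1.18

# 3. Trinity fails unless the stack size is unlimited
ulimit -s unlimited

# 4. Copy inputs to $SCRATCH (assumes they are in the directory the job starts in)
cp reads_1.fq reads_2.fq "$SCRATCH"

# 5. Work in $SCRATCH
cd "$SCRATCH"

# 6. Run Trinity, sending both stdout and stderr to a file
Trinity.pl --seqtype fq --JM 100G --left reads_1.fq --right reads_2.fq \
  --output trinity_out --min_contig_length 200 \
  --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > my-trinity-output.out 2>&1

# 7. Copy the log back; the assembly itself is written under trinity_out/
cp my-trinity-output.out "$HOME"
EOF
```

The generated script is then submitted as in step 3, with qsub trinity-job.pbs.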

Big Assemblies

If you are performing an assembly of 200 million reads or more, we recommend that you use our big assemblies script, which is available for download as one of our example files. If you have 200 to 600 million reads, you can probably use 64 cores. For an assembly of more than 600 million reads, you should use 96 cores. In fact, this approach is probably useful for any assembly in which you use 64 or more cores. If your assembly has fewer than 200 million reads, you will need to decrease your number of threads accordingly.
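Applying these guidelines requires a read count. The sketch below is a minimal helper, assuming uncompressed FASTQ files with the usual four lines per record; the function names count_fastq_reads and suggest_cores are hypothetical. It counts every read in the files you pass (both mates of a pair count separately), which is an assumption about how the thresholds above are meant. The cutoffs are the ones stated in this section.

```shell
#!/bin/bash
# count_fastq_reads FILE...: total reads in uncompressed FASTQ files,
# assuming exactly 4 lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
  awk 'END { printf "%d\n", NR / 4 }' "$@"
}

# suggest_cores N: core count suggested in this section for N reads.
suggest_cores() {
  local reads=$1
  if (( reads > 600000000 )); then
    echo 96
  elif (( reads >= 200000000 )); then
    echo 64
  else
    # Below 200 million reads the guideline only says to scale down.
    echo "fewer than 64"
  fi
}
```

For example, suggest_cores "$(count_fastq_reads reads_1.fq reads_2.fq)" prints the suggestion for a hypothetical pair of read files. Gzipped reads would need to be decompressed first (for example with zcat) before counting.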

Last Updated on Wednesday, 19 March 2014 07:05  

More on Trinity

Examples

  • Example files which show how to run Trinity in one run (for very small datasets) or in four stages (for larger datasets).


For technical questions:
Call the PSC hotline: 412-268-6350 / 800-221-1641 or mail to remarks@psc.edu.
