Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing, communications and informatics.

Bowtie

Bowtie is an ultrafast, memory-efficient short read aligner.

Installed on blacklight, biou

Other resources that may be helpful include:

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009, 10:R25.

Website: http://bowtie-bio.sourceforge.net/index.shtml

Running Bowtie

1) Make Bowtie available for use
a) blacklight:
The Bowtie program is made available through the module command. To load the Bowtie module, enter:

module load bowtie

b) biou:
The bowtie programs are available through the Galaxy instance on biou.

To make the bowtie programs available on the command line, csh users should enter the following command:

% source /packages/bin/SETUP_BIO_SOFTWARE

To make the bowtie programs available on the command line, bash users should enter the following command:

% source /packages/bin/SETUP_BIO_SOFTWARE
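
After loading the module or sourcing the setup script, it can be worth confirming that the programs are actually on your PATH. The following is a minimal sketch for sh/bash users (csh users would need a different form); check_tool is our own helper name, not part of Bowtie.

```shell
# check_tool is a hypothetical helper name, not part of Bowtie.
# It reports whether a program can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found (load the module or source the setup script first)"
        return 1
    fi
}

# Check both programs used in the steps below; keep going if one is missing.
for tool in bowtie-build bowtie; do
    check_tool "$tool" || true
done
```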

2) General Usage:

Running bowtie is generally a two-step process. First, build the bowtie index with the bowtie-build command. Then map the reads to that index with the bowtie command.
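
The two steps can be sketched as follows. The file names reference.fa, reads.fastq, my_index and aligned.sam are placeholders chosen for illustration, and the run wrapper with its DRY_RUN variable is our own addition so that the commands can be previewed before anything is executed.

```shell
# Placeholder names for illustration; substitute your own files.
REF=reference.fa
READS=reads.fastq
INDEX=my_index

# With DRY_RUN=1 (the default in this sketch) commands are printed,
# not executed. Set DRY_RUN=0 to actually run them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: build the index from the FASTA reference.
run bowtie-build -f "$REF" "$INDEX"

# Step 2: map the FASTQ reads to the index, writing SAM output.
run bowtie -q -S "$INDEX" "$READS" aligned.sam
```

The options used in each command are described in the listings that follow.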

bowtie-build [options]* <reference_in> <ebwt_outfile_base>

reference_in comma-separated list of files with ref sequences
ebwt_outfile_base write Ebwt data to files with this dir/basename

Options

-f reference files are Fasta (default)
-c reference sequences given on cmd line (as <seq_in>)
-C/--color build a colorspace index
-a/--noauto disable automatic -p/--bmax/--dcv memory-fitting
-p/--packed use packed strings internally; slower, uses less mem
--bmax <int> max bucket sz for blockwise suffix-array builder
--bmaxdivn <int> max bucket sz as divisor of ref len (default: 4)
--dcv <int> diff-cover period for blockwise (default: 1024)
--nodc disable diff-cover (algorithm becomes quadratic)
-r/--noref don't build .3/.4.ebwt (packed reference) portion
-3/--justref just build .3/.4.ebwt (packed reference) portion
-o/--offrate <int> SA is sampled every 2^offRate BWT chars (default: 5)
-t/--ftabchars <int> # of chars consumed in initial lookup (default: 10)
--ntoa convert Ns in reference to As
--seed <int> seed for random number generator
-q/--quiet suppress verbose output (bowtie-build is verbose by default)
-h/--help print detailed description of tool and its options
--usage print this usage message
--version print version information and quit
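
bowtie-build writes the index as a set of files that share the basename given as <ebwt_outfile_base>. With Bowtie 1 and default options these carry the suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. A rough way to confirm that a build finished is to check that all six exist and are non-empty. The sketch below assumes those default suffixes (an index built with -r/--noref will not have the .3/.4 files); index_complete is our own helper name.

```shell
# index_complete is a hypothetical helper, not part of Bowtie.
# It checks that all six files of a Bowtie 1 index exist and are
# non-empty for the given basename. The function body runs in a
# subshell so its variables do not leak into the caller.
index_complete() (
    base=$1
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "$base.$suffix.ebwt" ]; then
            echo "missing or empty: $base.$suffix.ebwt"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# Example: check the index basename used in the PBS script on this page.
if index_complete Bowtieidx >/dev/null; then
    echo "Bowtieidx looks complete"
else
    echo "Bowtieidx is incomplete or has not been built here"
fi
```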

bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

<m1> Comma-separated list of files containing upstream mates (or the sequences themselves, if -c is set) paired with mates in <m2>
<m2> Comma-separated list of files containing downstream mates (or the sequences themselves if -c is set) paired with mates in <m1>
<r> Comma-separated list of files containing Crossbow-style reads. Can be a mixture of paired and unpaired. Specify "-" for stdin.
<s> Comma-separated list of files containing unpaired reads, or the sequences themselves, if -c is set. Specify "-" for stdin.
<hit> File to write hits to (default: stdout)

Input:

-q query input files are FASTQ .fq/.fastq (default)
-f query input files are (multi-)FASTA .fa/.mfa
-r query input files are raw one-sequence-per-line
-c query sequences given on cmd line (as <mates>, <singles>)
-C reads and index are in colorspace
-Q/--quals <file> QV file(s) corresponding to CSFASTA inputs; use with -f -C
--Q1/--Q2 <file> same as -Q, but for mate files 1 and 2 respectively
-s/--skip <int> skip the first <int> reads/pairs in the input
-u/--qupto <int> stop after first <int> reads/pairs (excl. skipped reads)
-5/--trim5 <int> trim <int> bases from 5' (left) end of reads
-3/--trim3 <int> trim <int> bases from 3' (right) end of reads
--phred33-quals input quals are Phred+33 (default)
--phred64-quals input quals are Phred+64 (same as --solexa1.3-quals)
--solexa-quals input quals are from GA Pipeline ver. < 1.3
--solexa1.3-quals input quals are from GA pipeline ver. >= 1.3
--integer-quals qualities are given as space-separated integers (not ASCII)

Alignment:

-v <int> report end-to-end hits w/ <=v mismatches; ignore qualities
or
-n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
-e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
-l/--seedlen <int> seed length for -n (default: 28)
--nomaqround disable Maq-like quality rounding for -n (nearest 10 <= 30)
-I/--minins <int> minimum insert size for paired-end alignment (default: 0)
-X/--maxins <int> maximum insert size for paired-end alignment (default: 250)
--fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
--nofw/--norc do not align to forward/reverse-complement reference strand
--maxbts <int> max # backtracks for -n 2/3 (default: 125, 800 for --best)
--pairtries <int> max # attempts to find mate for anchor hit (default: 100)
-y/--tryhard try hard to find valid alignments, at the expense of speed
--chunkmbs <int> max megabytes of RAM for best-first search frames (def: 64)

Reporting:

-k <int> report up to <int> good alignments per read (default: 1)
-a/--all report all alignments per read (much slower than low -k)
-m <int> suppress all alignments if > <int> exist (def: no limit)
-M <int> like -m, but reports 1 random hit (MAPQ=0); requires --best
--best hits guaranteed best stratum; ties broken by quality
--strata hits in sub-optimal strata aren't reported (requires --best)

Output:

-t/--time print wall-clock time taken by search phases
-B/--offbase <int> leftmost ref offset = <int> in bowtie output (default: 0)
--quiet print nothing but the alignments
--refout write alignments to files refXXXXX.map, 1 map per reference
--refidx refer to ref. seqs by 0-based index rather than name
--al <fname> write aligned reads/pairs to file(s) <fname>
--un <fname> write unaligned reads/pairs to file(s) <fname>
--max <fname> write reads/pairs over -m limit to file(s) <fname>
--suppress <cols> suppresses given columns (comma-delim'ed) in default output
--fullref write entire ref name (default: only up to 1st space)

Colorspace:

--snpphred <int> Phred penalty for SNP when decoding colorspace (def: 30)
or
--snpfrac <dec> approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
--col-cseq print aligned colorspace seqs as colors, not decoded bases
--col-cqual print original colorspace quals, not decoded quals
--col-keepends keep nucleotides at extreme ends of decoded alignment

SAM:

-S/--sam write hits in SAM format
--mapq <int> default mapping quality (MAPQ) to print for SAM alignments
--sam-nohead suppress header lines (starting with @) for SAM output
--sam-nosq suppress @SQ header lines for SAM output
--sam-RG <text> add <text> (usually "lab=value") to @RG line of SAM header

Performance:

-o/--offrate <int> override offrate of index; must be >= index's offrate
-p/--threads <int> number of alignment threads to launch (default: 1)
--mm use memory-mapped I/O for index; many 'bowtie's can share
--shmem use shared mem for index; many 'bowtie's can share

Other:

--seed <int> seed for random number generator
--verbose verbose output (for debugging)
--version print version information and quit
-h/--help print this usage message

Example PBS script (blacklight):

#!/bin/bash
#PBS -q batch
#PBS -j oe
#PBS -l ncpus=16
#PBS -l walltime=24:00:00
#PBS -N Bowtie
#
# ---------------
# Bowtie Setup
# ---------------
source /usr/share/modules/init/bash
module load bowtie/1.0.0
module load samtools/0.1.18
THREADS=16
#
set -x
cd $SCRATCH
#---------------------------------------------------------
# WFILE1 and WFILE2 should point to your fastq read files
# REFFILE should point to the reference file to be indexed
#---------------------------------------------------------
WFILE1=SRR189815_1.fastq
WFILE2=SRR189815_2.fastq
REFFILE=human_g1k_v37.fasta
# ---------------
# Build Bowtie Index
# ---------------
bowtie-build -f $REFFILE Bowtieidx
#
# ---------------
# RUN Bowtie
# ---------------
bowtie -p $THREADS -X 1000 -q --phred33-quals --fr --chunkmbs 1024 \
    --best -S -t Bowtieidx -1 $WFILE1 -2 $WFILE2 Bowtie.sam
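
Because a job like the one above may wait in the queue and then run for hours, a quick check that the input files exist before the first bowtie-build call can save a failed run. This is a minimal sketch; require_file is our own helper name, not part of PBS or Bowtie.

```shell
# require_file is a hypothetical helper for use near the top of a job
# script. It fails if the named file is missing or empty.
require_file() {
    if [ ! -s "$1" ]; then
        echo "ERROR: input file missing or empty: $1" >&2
        return 1
    fi
}

# In the job script, after WFILE1, WFILE2 and REFFILE are set:
#   require_file "$WFILE1" || exit 1
#   require_file "$WFILE2" || exit 1
#   require_file "$REFFILE" || exit 1
```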
