Runs the STAR aligner in two-pass mode for an array of samples together. In two-pass mode, the samples are first aligned to the provided reference genome, and STAR creates a list of the splice junctions identified in each sample. In STAR's built-in two-pass mode, those splice junctions are used to improve the reference genome, and the samples are realigned to this enhanced genome. This function instead pools the splice junctions of all the samples to make one single enhanced genome, to which all the samples are aligned. If you do not want to pool the splice junctions of different samples, you can run the two-pass mode by using the STAR component or the Align function and adding the two-pass mode option (see the STAR manual) as a parameter.

You can control which splice junctions are included by using the lowBound and highBound parameters. They work exactly as defined in the CSVFilter component and are applied to each individual splice junctions file. For example, lowBound="UniqueMapping=5" discards all splice junctions that do not have at least 5 uniquely mapped reads overlapping the junction. Columns available in the splice junctions file are "Chromosome,Start,End,Strand,IntronMotif,Annotated,UniqueMapping,MultiMapping,MaxOverhang". Only splice junctions from canonical chromosomes (1-22, X, Y) are kept.

It is recommended to supply your own initial genome, because any parameters related to genome generation are used for both the first-pass and the second-pass genome, which matters if you need to rerun the component.
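
The filtering and pooling described above can be sketched in Python. This is only an illustration, not the component's actual implementation: the function names (`parse_bounds`, `keep_junction`, `pool_junctions`) are hypothetical, and the sketch assumes that bounds are inclusive, that several `column=value` pairs can be separated by commas, and that chromosomes are named without a `chr` prefix.

```python
# Assumption: canonical chromosomes are named "1".."22", "X", "Y" (no "chr" prefix).
CANONICAL = {str(n) for n in range(1, 23)} | {"X", "Y"}


def parse_bounds(spec):
    """Turn a string such as 'UniqueMapping=5' into {'UniqueMapping': 5.0}."""
    bounds = {}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        column, _, value = item.partition("=")
        bounds[column.strip()] = float(value)
    return bounds


def keep_junction(row, low, high):
    """Apply the chromosome restriction and the (assumed inclusive) bounds to one row."""
    if row["Chromosome"] not in CANONICAL:
        return False
    if any(float(row[col]) < limit for col, limit in low.items()):
        return False
    if any(float(row[col]) > limit for col, limit in high.items()):
        return False
    return True


def pool_junctions(samples, low_bound="", high_bound=""):
    """Filter each sample's junctions separately, then pool the survivors.

    `samples` is a list of per-sample row lists; each row is a dict keyed by
    the column names of the splice junctions file.
    """
    low, high = parse_bounds(low_bound), parse_bounds(high_bound)
    pooled = set()
    for rows in samples:
        for row in rows:
            if keep_junction(row, low, high):
                pooled.add((row["Chromosome"], int(row["Start"]),
                            int(row["End"]), row["Strand"]))
    return sorted(pooled)
```

A junction that passes the filter in at least one sample ends up in the pooled set once, regardless of how many samples contain it.
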
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | Alignment |
Authors | Alejandra Cervera (alejandra.cervera@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | STAR |
Source files | component.xml function.scala |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
reference | BinaryFile | Mandatory | Reference genome. |
reads | Array<BinaryFile> | Mandatory | FASTA or FASTQ file containing reads for the alignment. |
mates | Array<BinaryFile> | Optional | FASTA or FASTQ file containing mates. Required for paired end data. |
genome | BinaryFolder | Optional | A STAR genome for first pass. |
annotation | BinaryFile | Optional | Genome annotation. A GTF file will work by default. For a GFF3 file add to genomeParameters: "--sjdbGTFtagExonParentTranscript Parent" |
parameterFile | TextFile | Optional | This file overrides the default STAR parameters, but is itself overridden by the command line. Use parametersDefault from the STAR source as a template. It is needed unless you have specified every single parameter (even default ones) as a parameter string. |
custom | TextFile | Optional | Custom, sample-specific parameters (such as read groups), written as they should be added to the STAR call (second pass only). The file must have two columns: Key and Custom. The keys should match the input keys. |
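
As an illustration of the custom input, the following Python sketch writes a two-column table with the Key and Custom headers. The tab delimiter, the sample keys, and the helper name `write_custom_table` are assumptions made for this example; `--outSAMattrRGline` is the STAR option for setting read group lines.

```python
import csv
import io


def write_custom_table(per_sample_params, handle):
    """Write a Key/Custom table, one row per sample key.

    Assumption: the file is tab-delimited. Each Custom value is written
    exactly as it should appear in the STAR call.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Key", "Custom"])
    for key, params in per_sample_params.items():
        writer.writerow([key, params])


# Hypothetical sample keys; they should match the keys of the reads array.
buffer = io.StringIO()
write_custom_table(
    {
        "sample1": "--outSAMattrRGline ID:sample1 SM:sample1",
        "sample2": "--outSAMattrRGline ID:sample2 SM:sample2",
    },
    buffer,
)
custom_text = buffer.getvalue()
```
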
Name | Type | Description |
---|---|---|
folder | Array<BinaryFolder> | All files created by STAR in the output folder. |
alignments | Array<BAM> | (Sorted) alignment. A coordinate sorted file will be indexed, i.e. there is a .bai file. |
spliceJunctions | CSV | Splice junctions. This CSV file is created by adding a header to the STAR output ("Chromosome\tStart\tEnd\tStrand\tIntronMotif\tAnnotated\tUniqueMapping\tMultiMapping\tMaxOverhang"). |
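
A minimal sketch of how such a header can be prepended to STAR's tab-separated junction output and the result read back by column name. The helper names (`add_header`, `read_junctions`) and the example row are made up for illustration; this is not the component's own code.

```python
import csv
import io

# Column names as listed in the spliceJunctions description.
HEADER = ["Chromosome", "Start", "End", "Strand", "IntronMotif",
          "Annotated", "UniqueMapping", "MultiMapping", "MaxOverhang"]


def add_header(sj_text):
    """Prepend the tab-separated header line to headerless junction text."""
    return "\t".join(HEADER) + "\n" + sj_text


def read_junctions(csv_text):
    """Parse the headed, tab-separated text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text), delimiter="\t"))


# Hypothetical single junction row with nine tab-separated fields.
example = "1\t14830\t14969\t2\t2\t1\t3\t12\t38\n"
rows = read_junctions(add_header(example))
```

With the header in place, columns such as UniqueMapping can be addressed by name, which is what the lowBound and highBound parameters rely on.
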
Name | Type | Default | Description |
---|---|---|---|
alignParameters | string | "" | Parameters passed on to STAR on the second alignment step. |
execute | string | "changed" | Change it to "once" if you do not want the first-pass and genome-generation steps to be re-executed when you change a parameter (such as threads). |
genomeLoad | string | "LoadAndRemove" | The options are LoadAndKeep, LoadAndRemove, Remove, LoadAndExit and NoSharedMemory. LoadAndRemove works for parallel STAR instances and, if everything goes well, should free the memory after the last STAR instance exits. |
genomeParameters | string | "" | Parameters passed on to STAR in any of the two possible generating genome steps. |
highBound | string | "" | Same as lowBound but for max values (instead of min). |
lowBound | string | "" | For subsetting the splice junctions files used for generating the second pass genome. Define column name and threshold, ex: UniqueMapping=5. |
mainAlignmentType | string | "" | Depending on the parameters, more than one alignment may be produced (ex. sortedByCoord or toTranscriptome). The string defined here selects which alignment is linked to the alignments output of this component. Alignments that are not selected are still available in the folder output. |
memory | int | 10000 | Memory passed to STAR call. |
readFilesCommand | string | "" | Used when input reads are compressed, ex. zcat or acat |
threads | int | 1 | Number of threads passed to STAR. |
useEncodeParams | boolean | true | Sets the parameters used for the ENCODE project as specified in the STAR manual (maximum and minimum intron size and maximum number of multiple alignments). |
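
To make the role of these parameters more concrete, the sketch below assembles a second-pass STAR command line in Python. The mapping from component parameters to STAR options is an assumption made for illustration, as is the function name `second_pass_command`. The STAR options themselves (`--runThreadN`, `--genomeLoad`, `--readFilesCommand`, and so on) are real, and the three ENCODE values are those listed in the ENCODE options section of the STAR manual; whether the component uses exactly these values is not confirmed here.

```python
import shlex


def second_pass_command(genome_dir, reads, mates=None, out_prefix="out/",
                        threads=1, genome_load="LoadAndRemove",
                        read_files_command="", align_parameters="",
                        use_encode_params=True):
    """Build an argument list for one STAR alignment run (illustrative only)."""
    cmd = ["STAR", "--runMode", "alignReads",
           "--genomeDir", genome_dir,
           "--readFilesIn", reads]
    if mates:
        cmd.append(mates)  # the second file is only given for paired-end data
    cmd += ["--runThreadN", str(threads),
            "--genomeLoad", genome_load,
            "--outFileNamePrefix", out_prefix]
    if read_files_command:
        cmd += ["--readFilesCommand", read_files_command]
    if use_encode_params:
        # Assumed values, taken from the STAR manual's ENCODE options.
        cmd += ["--alignIntronMin", "20",
                "--alignIntronMax", "1000000",
                "--outFilterMultimapNmax", "20"]
    # Free-form parameters go last so they are visible at the end of the call.
    cmd += shlex.split(align_parameters)
    return cmd


cmd = second_pass_command("genome2", "r1.fq.gz", mates="r2.fq.gz", threads=4,
                          genome_load="NoSharedMemory",
                          read_files_command="zcat",
                          align_parameters="--outSAMtype BAM SortedByCoordinate")
```
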
Test case | Parameters | IN reference | IN reads | IN mates | IN genome | IN annotation | IN parameterFile | IN custom | OUT folder | OUT alignments | OUT spliceJunctions |
---|---|---|---|---|---|---|---|---|---|---|---|
case1 | properties (genomeLoad=NoSharedMemory) | reference | reads | mates | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) |
case2 | properties (genomeLoad=NoSharedMemory) | reference | reads | mates | (missing) | (missing) | (missing) | custom | (missing) | (missing) | (missing) |