This function reduces the reads in a BAM file using the Genome Analysis Toolkit (GATK) 2 and outputs a substantially smaller, compressed BAM file. The output file can still be used for variant calling (at least with GATK) and for visualization. Note that GATK 1 does not include this functionality.
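The following is a minimal sketch of the kind of command such a wrapper might run. It assumes the component calls the GATK 2 `ReduceReads` walker with GATK's standard `-T`, `-R`, `-I` and `-o` arguments. The helper `reduce_reads_command` is hypothetical and is not taken from the component's `function.scala`. It only illustrates how the `gatk` and `memory` parameters, with their defaults, could become a Java command line.

```python
import os
import shlex


def reduce_reads_command(reference, bam, output, gatk_dir="", memory="4g"):
    """Build a GATK 2 ReduceReads command line (hypothetical helper).

    An empty gatk_dir falls back to the GATK_HOME environment variable,
    and memory is passed to Java as the -Xmx maximum heap size.
    """
    if not gatk_dir:
        gatk_dir = os.environ.get("GATK_HOME", "")
        if not gatk_dir:
            # Added for illustration: fail early instead of running a bad path.
            raise ValueError("gatk is empty and GATK_HOME is not set")
    jar = os.path.join(gatk_dir, "GenomeAnalysisTK.jar")
    return [
        "java", f"-Xmx{memory}", "-jar", jar,
        "-T", "ReduceReads",  # GATK 2 walker that compresses reads
        "-R", reference,      # reference FASTA
        "-I", bam,            # input BAM
        "-o", output,         # reduced output BAM
    ]


if __name__ == "__main__":
    cmd = reduce_reads_command("ref.fasta", "sample.bam", "sample.reduced.bam",
                               gatk_dir="/opt/gatk")
    print(shlex.join(cmd))
```

Building the command as an argument list rather than a single string avoids shell-quoting problems with paths that contain spaces; `shlex.join` is used only to display it.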
Property | Value |
---|---|
Version | 1.0 |
Bundle | sequencing |
Categories | Alignment |
Authors | Rony Lindell (rony.lindell@helsinki.fi) |
Source files | component.xml, function.scala |

Inputs:

Name | Type | Mandatory | Description |
---|---|---|---|
reference | FASTA | Mandatory | The reference genome in FASTA format. |
bam | BAM | Mandatory | Input BAM file. |

Outputs:

Name | Type | Description |
---|---|---|
alignment | BAM | Reduced alignment BAM file. |

Parameters:

Name | Type | Default | Description |
---|---|---|---|
cleanup | boolean | false | If true, removes the input alignment files by replacing them with empty files. Use this to save disk space when the original alignments are no longer needed. |
downsample | int | -1 | Target coverage for variable regions: their coverage is downsampled to this value. Use -1 to keep GATK's default downsampling value; 0 turns downsampling off. If memory problems occur, reducing the value (e.g. to 40) may help. |
gatk | string | "" | Path to the GATK directory containing the 'GenomeAnalysisTK.jar' file. If an empty string is given (default), the GATK_HOME environment variable is assumed to point to the directory where GenomeAnalysisTK.jar is located. |
memory | string | "4g" | Amount of Java heap memory allocated to GATK, given for example as "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5 GB). |
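The parameter rules above can be sketched as small helpers. This is an illustration, not the component's code: all helper names are hypothetical, and passing the downsampling target through a `-ds` argument (short form of `--downsample_coverage`) is an assumption about the GATK 2 ReduceReads options that should be checked against the GATK version in use. The memory parser also shows why 2.5 GB is written as "2560m": Java's `-Xmx` takes an integer with a unit suffix, so 2.5 GB becomes 2.5 × 1024 = 2560 megabytes.

```python
import re

# Heap sizes in the format accepted by the memory parameter, e.g. "4g", "2560m".
_MEMORY_RE = re.compile(r"^(\d+)([mg])$")


def memory_to_megabytes(memory):
    """Convert a heap size such as "4g" or "2560m" to megabytes (hypothetical helper)."""
    match = _MEMORY_RE.match(memory.strip().lower())
    if match is None:
        # Fractional values such as "2.5g" are rejected: -Xmx needs an integer.
        raise ValueError(f"expected a value like '4g' or '2560m', got {memory!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 1024 if unit == "g" else amount


def downsample_args(downsample):
    """Map the downsample parameter to extra ReduceReads arguments.

    -1 passes no flag, so GATK keeps its own default; 0 is passed through and
    turns downsampling off; a positive value becomes the target coverage.
    The "-ds" flag name is an assumption (see the lead-in).
    """
    if downsample == -1:
        return []
    if downsample < 0:
        raise ValueError("downsample must be -1, 0 or a positive integer")
    return ["-ds", str(downsample)]


def cleanup_input(bam_path):
    """Truncate the input BAM to zero bytes, as cleanup=true describes."""
    with open(bam_path, "w"):
        pass  # opening in "w" mode truncates the file


if __name__ == "__main__":
    print(memory_to_megabytes("2560m") / 1024)  # 2.5
    print(downsample_args(-1), downsample_args(40))  # [] ['-ds', '40']
```

Treating -1 as "pass nothing" keeps the choice of default with GATK itself, so the wrapper does not need to hard-code GATK's default downsampling value.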

Test cases:

Test case | Parameters | IN reference | IN bam | OUT alignment |
---|---|---|---|---|
case1 | properties | reference | bam | (missing) |
# Default run using less memory, |