This function reduces the reads in a BAM file using the Genome Analysis Toolkit (GATK) 2 and outputs a much smaller BAM file. The output file can still be used for variant calling (at least with GATK) and can also be visualized. Note that GATK 1 does not include this functionality.

Complete documentation:

Version: 1.0
Bundle: sequencing
Categories: Alignment
Authors: Rony Lindell (rony.lindell@helsinki.fi)
Source files: component.xml, function.scala


Inputs

Name       Type   Mandatory  Description
reference  FASTA  Mandatory  The reference FASTA file.
bam        BAM    Mandatory  Input BAM file.


Outputs

Name       Type  Description
alignment  BAM   Reduced alignment BAM file.


Parameters

Name        Type     Default  Description
cleanup     boolean  false    Removes the input alignment file by replacing it with an empty file. Use this to save disk space when the original alignment is no longer needed.
downsample  int      -1       Downsamples the coverage of variable regions to the given value. Use -1 to apply GATK's default downsampling value; 0 turns downsampling off. If memory problems occur, lowering the value (e.g. to 40) may help.
gatk        string   ""       Path to the GATK directory containing 'GenomeAnalysisTK.jar'. If the empty string is given (default), the GATK_HOME environment variable is assumed to point to the GATK directory where GenomeAnalysisTK.jar is located.
memory      string   "4g"     Amount of Java heap memory allocated to the GATK process, e.g. "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5 GB).
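The parameters above mostly control how GATK is launched. The Python sketch below shows one plausible mapping, assuming the function invokes GATK 2's ReduceReads walker (`-T ReduceReads`) and passes `downsample` through as ReduceReads' `-ds` (`--downsample_coverage`) argument, with -1 meaning "omit the flag and keep GATK's default". The actual logic lives in function.scala and may differ; the helper names `resolve_gatk_jar`, `build_reduce_reads_command` and `cleanup_input` are hypothetical.

```python
import os
import re
from pathlib import Path
from typing import List, Mapping, Optional

# Accepts heap sizes in the documented format, e.g. "4g" or "2560m" (not "2,5g").
_MEMORY_RE = re.compile(r"^[1-9][0-9]*[kKmMgG]$")


def resolve_gatk_jar(gatk: str = "", env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical: locate GenomeAnalysisTK.jar from `gatk` or GATK_HOME."""
    env = os.environ if env is None else env
    # An empty `gatk` parameter falls back to the GATK_HOME environment variable.
    gatk_dir = gatk or env.get("GATK_HOME", "")
    if not gatk_dir:
        raise ValueError("gatk is empty and GATK_HOME is not set")
    return Path(gatk_dir) / "GenomeAnalysisTK.jar"


def build_reduce_reads_command(reference: str, bam: str, output: str,
                               downsample: int = -1, gatk: str = "",
                               memory: str = "4g",
                               env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical: assemble a GATK 2 ReduceReads command line."""
    if not _MEMORY_RE.match(memory):
        raise ValueError(f"memory must look like '4g' or '2560m', got {memory!r}")
    if downsample < -1:
        raise ValueError("downsample must be -1 (GATK default), 0 (off) or positive")
    cmd = [
        "java", f"-Xmx{memory}", "-jar", str(resolve_gatk_jar(gatk, env)),
        "-T", "ReduceReads",
        "-R", reference,
        "-I", bam,
        "-o", output,
    ]
    # -1 leaves GATK's own default in place, so no flag is emitted.
    # 0 is passed through unchanged and disables downsampling.
    if downsample != -1:
        cmd += ["-ds", str(downsample)]
    return cmd


def cleanup_input(bam: str) -> None:
    """Hypothetical: mimic `cleanup` by truncating the input file to zero bytes."""
    with open(bam, "w"):
        pass


if __name__ == "__main__":
    cmd = build_reduce_reads_command(
        "ref.fasta", "in.bam", "out.bam",
        downsample=40, memory="2560m", env={"GATK_HOME": "/opt/gatk"},
    )
    print(" ".join(cmd))
```

Keeping the command as a list rather than a single string means it could be handed directly to `subprocess.run(cmd, check=True)` without shell quoting issues.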

Test cases

Test case  Parameters  IN
case1      properties  reference bam (missing)

# Default run using less memory,

