Calls genomic sites of variation using the specified caller. Implemented callers (see the 'caller' parameter) are samtools, gatk and varscan.
The caller should be specified using the 'caller' parameter, together with the path to the software unless environment variables are used (GATK_HOME and VARSCAN_HOME pointing to the installation directories). The Samtools executables "samtools", "bcftools" and "vcfutils.pl" must be in PATH, and vcfutils.pl must be made executable.
The 'options' parameter can be used to pass additional options in the software-specific format. For the samtools caller, the options are added only to the mpileup command. The following options are hard-coded:
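The fallback from an explicit jar path to an environment variable can be sketched as follows. This is only an illustration of the rule described for the 'gatk' and 'varscan' parameters, not the component's actual code; the function name `resolve_jar` and its arguments are hypothetical, while the jar file names are the ones given in the parameter descriptions.

```python
import os


def resolve_jar(param_value, env_var, jar_name, env=None):
    """Return the jar path to use for a caller (hypothetical helper).

    If the parameter is a non-empty string it is used as-is; otherwise the
    jar is looked up in the directory named by the environment variable.
    """
    env = os.environ if env is None else env
    if param_value:
        return param_value
    home = env.get(env_var)
    if not home:
        raise ValueError(f"neither an explicit path nor {env_var} is set")
    return os.path.join(home, jar_name)


# Empty 'gatk' parameter: fall back to GATK_HOME/GenomeAnalysisTK.jar.
gatk_jar = resolve_jar("", "GATK_HOME", "GenomeAnalysisTK.jar",
                       env={"GATK_HOME": "/opt/gatk"})
```

Passing the environment as an argument keeps the sketch easy to test; in real use the default `os.environ` would apply.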
Here is a list of caller specific options that should be considered depending on the data:
The output is always an array with one or more files. Here follows a description of the files specific for the caller:
Complete documentation:
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | VariationAnalysis |
Authors | Rony Lindell (rony.lindell@helsinki.fi), Riku Louhimo (Riku.Louhimo@Helsinki.FI) |
Source files | component.xml function.scala |
Name | Type | Mandatory | Description |
---|---|---|---|
reference | FASTA | Mandatory | The reference genome together with possible auxiliary files. |
bam1 | BAM | Optional | Input BAM file of a normal or tumor sample. When using GATK, the BAM index file (.bai) should be located in the same directory. |
bam2 | BAM | Optional | Tumor BAM file to call from in VarScan comparison calling. The variants from 'bam1' and 'bam2' will be compared in order to separate germline and somatic variants. [varscan only] |
bams | BAMList | Optional | File containing newline-separated paths to BAM files. When this input is used, GATK usage is forced and multi-sample variant calling with the UnifiedGenotyper is executed. One multi-sample VCF file will be produced. The extension of the file must be .list. [gatk only] |
intervals | BED | Optional | File with genomic intervals to operate on. Only calls hitting these areas will be output. Note: This can be used to mask out introns in exome sequencing by providing a BED file containing all exonic regions. |
dbsnp | VCF | Optional | File with known SNPs, usually from the latest dbSNP distribution. The file is used in GATK to improve calling and for annotation. The file must be pointed to by the key 'vcf'. [gatk only] |
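A 'bams' input file as described above (newline-separated paths, .list extension) could be produced with a few lines of Python. This is a minimal sketch; the helper name `write_bam_list` and the example paths are hypothetical.

```python
from pathlib import Path


def write_bam_list(bam_paths, list_path):
    """Write BAM paths one per line to a .list file (hypothetical helper)."""
    list_path = Path(list_path)
    # The component description requires the .list extension.
    if list_path.suffix != ".list":
        raise ValueError("the 'bams' file must have the .list extension")
    list_path.write_text("\n".join(str(p) for p in bam_paths) + "\n")
    return list_path
```

For example, `write_bam_list(["/data/s1.bam", "/data/s2.bam"], "samples.list")` writes a two-line file that can be given as the 'bams' input.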
Name | Type | Description |
---|---|---|
snp | VCF | Output SNP calls. For callers that write indels into the same file (GATK, Samtools), this file contains both SNPs and indels. |
indel | VCF | Output indel calls. VarScan writes snps and indels into separate files. |
metrics | TextFile | Calling metrics. GATK will produce a file with some information about the calling results. |
Name | Type | Default | Description |
---|---|---|---|
callIndels | boolean | true | If false, indel calling will be skipped and only snps are called. |
caller | string | "gatk" | Software/algorithm to use to call variants. Possible values: {samtools, gatk, varscan}. |
exclude | string | "none" | Genomic region from which to exclude reads (i.e., skip) in the same format as 'region'. [gatk only] |
gatk | string | "" | Path to GATK jar file, e.g. "/opt/gatk/GenomeAnalysisTK.jar". If empty string is given (default), GATK_HOME environment variable is assumed to point to the GATK directory where GenomeAnalysisTK.jar is located. |
memory | string | "4g" | The amount of Java heap memory allocated to GATK, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5 gigabytes). For optimal performance this should be a multiple of the number of threads used (see 'threads'). [gatk only] |
options | string | "" | This string is added to the command and can include any number of options in the software-specific format, e.g. "-q 1" to skip zero-quality alignments or "-stand_emit_conf 10.0" to emit calls with quality higher than 10. See the software-specific documentation for more information. |
region | string | "all" | Genomic region from which to select reads, e.g. "chr1" or "chr2:1-20000". The region can also be a comma separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A more extensive list of regions can be defined as a file using the 'intervals' input. [gatk only] |
samOptions4VS | string | "" | This string is added to the samtools command when VarScan is used and can include any number of options in the software-specific format. If the string is empty, "-q 1" is applied to skip zero-quality alignments. See the software-specific documentation for more information. |
threads | int | 1 | Number of threads allocated. Preferably allocate k*threads gigabytes of memory, where k is an integer, e.g. 1*4 = 4 GB of memory for 4 threads, or 2*8 = 16 GB for 8 threads (see 'memory'). [gatk only] |
variantsOnly | boolean | true | If true, only variant sites are called. If false, all confident sites are called, including those that match the reference allele. Setting this to false may be appropriate for normal samples in normal-tumor comparisons. |
varscan | string | "" | Path to the VarScan jar file (typically VarScan.v2.X.Y.jar varying with the version). If empty string is given (default), VARSCAN_HOME environment variable is assumed to point to the VarScan directory, where VarScan.jar is the program file or a link pointing to it. |
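The 'region' syntax described above (a contig name, optionally followed by ":start-end", with several regions separated by commas) can be checked before launching a run. The parser below is only a sketch of that format under the assumption that contig names contain no colons, commas or whitespace; `parse_regions` is a hypothetical helper, not part of the component.

```python
import re

# One region: a contig name, optionally followed by ":start-end".
_REGION = re.compile(r"^([^:,\s]+)(?::(\d+)-(\d+))?$")


def parse_regions(region):
    """Split a 'region' value into (contig, start, end) tuples.

    Returns an empty list for the default "all"; start and end are None
    when a whole contig is selected.
    """
    if region == "all":
        return []
    parsed = []
    for part in region.split(","):
        match = _REGION.match(part.strip())
        if match is None:
            raise ValueError(f"malformed region: {part!r}")
        contig, start, end = match.groups()
        if start is None:
            parsed.append((contig, None, None))
            continue
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"start exceeds end in region: {part!r}")
        parsed.append((contig, start, end))
    return parsed


regions = parse_regions("chr1,chr2:1-20000,chrX:1-4000")
```

The same check applies to 'exclude', which uses the same format.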
Test case | Parameters | IN reference | IN bam1 | IN bam2 | IN bams | IN intervals | IN dbsnp | OUT snp | OUT indel | OUT metrics | Comment |
---|---|---|---|---|---|---|---|---|---|---|---|
case1_samtools | properties | reference | bam1 | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | # Simple testcase for the samtools caller, |
case2_gatk | properties | reference | bam1 | (missing) | (missing) | (missing) | dbsnp | (missing) | (missing) | (missing) | # Simple testcase for the gatk caller, |
case3_varscan | properties | reference | bam1 | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | # Simple testcase for the varscan germline caller, |
case4_varscan_paired | properties | reference | bam1 | bam2 | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | # Simple testcase for the varscan somatic caller, |
case5_varscan_paired | properties | reference | bam1 | bam2 | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | # Simple testcase for the varscan somatic caller, |