This function validates the input variant file using the Genome Analysis Toolkit (GATK). Malformed files, or files containing incorrect information, produce errors. If a dbSNP file is supplied, the concordance of the dbSNP ID annotations is also checked.
Complete documentation:
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | VariationAnalysis |
Authors | Rony Lindell (rony.lindell@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml function.scala |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
reference | FASTA | Mandatory | The reference fasta file. |
variants | VCF | Mandatory | Input variants. |
dbsnp | VCF | Optional | File with known SNPs, usually from the latest dbSNP distribution. |
Name | Type | Description |
---|---|---|
errors | TextFile | File containing the errors found. |
Name | Type | Default | Description |
---|---|---|---|
gatk | string | "" | Path to the GATK directory containing the 'GenomeAnalysisTK.jar' file. If an empty string is given (default), the GATK_HOME environment variable is assumed to point to the GATK directory where GenomeAnalysisTK.jar is located. |
memory | string | "2g" | The amount of Java heap memory allocated to GATK, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5 gigabytes). |
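
The following is a minimal sketch of how the `gatk` and `memory` parameters might translate into a GATK command line. It is not taken from function.scala. It assumes the component wraps the GATK 3 `ValidateVariants` walker, which this page does not confirm. The helper names `resolve_gatk_jar` and `build_command` and the memory pattern are hypothetical and used only for illustration.

```python
import os
import re
from typing import List, Mapping, Optional

# Accepts values such as "2g" or "2560m". This pattern is an assumption;
# the actual component may pass the string to the JVM unchecked.
_MEMORY_PATTERN = re.compile(r"^[1-9][0-9]*[mg]$", re.IGNORECASE)


def resolve_gatk_jar(gatk: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the expected path of GenomeAnalysisTK.jar.

    An empty `gatk` value falls back to the GATK_HOME environment variable,
    as described for the `gatk` parameter.
    """
    env = os.environ if env is None else env
    directory = gatk or env.get("GATK_HOME", "")
    if not directory:
        raise ValueError("gatk is empty and GATK_HOME is not set")
    return os.path.join(directory, "GenomeAnalysisTK.jar")


def build_command(
    reference: str,
    variants: str,
    dbsnp: Optional[str] = None,
    gatk: str = "",
    memory: str = "2g",
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Assemble a java command line (hypothetical helper, not the component's code)."""
    if not _MEMORY_PATTERN.match(memory):
        raise ValueError("memory must look like '4g' or '2560m', got %r" % memory)
    command = [
        "java", "-Xmx" + memory,          # heap size from the `memory` parameter
        "-jar", resolve_gatk_jar(gatk, env),
        "-T", "ValidateVariants",         # assumed GATK 3 walker
        "-R", reference,
        "-V", variants,
    ]
    if dbsnp:
        # The optional dbSNP input is only passed when it is supplied.
        command += ["--dbsnp", dbsnp]
    return command
```

With the defaults, `build_command("ref.fasta", "in.vcf")` reads the jar location from GATK_HOME and requests a 2 GB heap. Passing `memory="2560m"` requests 2.5 GB instead, since the heap size is written as a whole number followed by a unit.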
Test case | Parameters | IN reference | IN variants | IN dbsnp | OUT errors |
---|---|---|---|---|---|
case1 | properties: memory=1g | reference | variants | dbsnp | (missing) |