Select variants from a VCF file using specific criteria, e.g. type, annotations or genomic intervals.
Complete documentation:
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | VariationAnalysis |
Authors | Rony Lindell (rony.lindell@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml function.scala |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
reference | FASTA | Mandatory | The reference fasta file. |
variants | VCF | Mandatory | Input variants. |
intervals | BED | Optional | Select only variants in the genomic regions specified in this file. File types other than BED can be forced; see the GATK wiki for the allowed formats. |
conc | VCF | Optional | Concordance variants. Output only the variants that were also called in this file. |
disc | VCF | Optional | Discordance variants. Output only the variants that were not called in this file. |
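
The effect of the `conc` and `disc` inputs can be pictured as set operations on variant records. The sketch below is a simplified illustration, not the component's implementation: it assumes that two records count as the same call when their CHROM, POS, REF and ALT fields are equal, which may differ from how GATK actually matches records. The function names `variant_keys` and `select_by_callset` are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption for this sketch: a variant is identified by (CHROM, POS, REF, ALT).
VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Iterator[Tuple[VariantKey, str]]:
    """Yield (key, record) for each data line, skipping blank and '#' header lines."""
    for line in vcf_lines:
        record = line.rstrip("\n")
        if not record.strip() or record.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = record.split("\t")[:5]
        yield (chrom, int(pos), ref, alt), record


def select_by_callset(
    variants: Iterable[str],
    conc: Optional[Iterable[str]] = None,
    disc: Optional[Iterable[str]] = None,
) -> List[str]:
    """Keep records also present in `conc` and absent from `disc`."""
    conc_keys = {k for k, _ in variant_keys(conc)} if conc is not None else None
    disc_keys = {k for k, _ in variant_keys(disc)} if disc is not None else set()
    selected = []
    for key, record in variant_keys(variants):
        if conc_keys is not None and key not in conc_keys:
            continue  # concordance: the record must also be called in `conc`
        if key in disc_keys:
            continue  # discordance: the record must not be called in `disc`
        selected.append(record)
    return selected
```

Passing the same call set as `conc` keeps the shared records, while passing it as `disc` keeps everything else.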
Name | Type | Description |
---|---|---|
calls | VCF | The selected variants. |
Name | Type | Default | Description |
---|---|---|---|
gatk | string | "" | Path to the GATK jar file, e.g. "/opt/gatk/GenomeAnalysisTK.jar". If an empty string is given (default), the GATK_HOME environment variable is assumed to point to the GATK directory that contains GenomeAnalysisTK.jar. |
memory | string | "4g" | The amount of Java heap memory allocated to GATK, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5g). Large data sets combined with a memory-intensive selection may require more. |
options | string | "" | This string is appended to the command and can include any number of options in GATK's own format. Example: options="-sn SAMPLE1 -sn SAMPLE2 -fraction 0.05" |
region | string | "" | Genomic region from which to select, e.g. "chr1" or "chr2:1-20000". The region can also be a comma-separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A more extensive list of regions can be supplied as a file through the 'intervals' input. |
select | string | "" | Selection criteria as a JEXL expression. See the GATK wiki for more information about how to construct the expression. Note that annotation names are case-sensitive. Example: select="QUAL > 30.0 && DP == 10" |
variantType | string | "" | Type of variants to select. A comma-separated list selects multiple types. Values: {INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION}. Example: variantType="INDEL,MNP" selects all indels and multi-nucleotide polymorphisms (MNPs). |
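
To make the parameters concrete, here is a rough sketch of how they could translate into a GATK command line. It is an assumption, not the contents of function.scala: it presumes the GATK 3.x `SelectVariants` walker and its `-R`, `-V`, `-o`, `-L`, `-conc`, `-disc`, `-select` and `-selectType` arguments, and the helper name `build_command` is hypothetical.

```python
import os
import shlex
from typing import List, Optional


def build_command(
    reference: str,
    variants: str,
    calls: str,
    intervals: Optional[str] = None,
    conc: Optional[str] = None,
    disc: Optional[str] = None,
    gatk: str = "",
    memory: str = "4g",
    options: str = "",
    region: str = "",
    select: str = "",
    variantType: str = "",
) -> List[str]:
    """Assemble an argument list for GATK 3.x SelectVariants (assumed layout)."""
    # Empty 'gatk' falls back to $GATK_HOME/GenomeAnalysisTK.jar, as the table describes.
    jar = gatk or os.path.join(os.environ.get("GATK_HOME", ""), "GenomeAnalysisTK.jar")
    cmd = ["java", f"-Xmx{memory}", "-jar", jar, "-T", "SelectVariants",
           "-R", reference, "-V", variants, "-o", calls]
    if intervals:
        cmd += ["-L", intervals]
    # A comma-separated region list becomes one -L argument per region.
    for part in region.split(","):
        if part.strip():
            cmd += ["-L", part.strip()]
    if conc:
        cmd += ["-conc", conc]
    if disc:
        cmd += ["-disc", disc]
    if select:
        cmd += ["-select", select]
    # A comma-separated type list becomes one -selectType argument per type.
    for vtype in variantType.split(","):
        if vtype.strip():
            cmd += ["-selectType", vtype.strip()]
    # Free-form options are split shell-style and appended last.
    cmd += shlex.split(options)
    return cmd
```

Returning a list rather than a single string means the JEXL expression needs no extra shell quoting if the list is later handed to something like `subprocess.run`.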
Test case | Parameters | IN reference | IN variants | IN intervals | IN conc | IN disc | OUT calls |
---|---|---|---|---|---|---|---|
case1 | properties | reference | variants | intervals | (missing) | (missing) | (missing) |
# Selection test using criteria and regions, |