Function VariantSelector

Select variants from a VCF file using specific criteria, e.g. type, annotations or genomic intervals.

Complete documentation:

Version: 1.0
Bundle: sequencing
Categories: VariationAnalysis
Authors: Rony Lindell (rony.lindell@helsinki.fi)
Issue tracker: View/Report issues
Source files: component.xml, function.scala
Usage: Example with default values

Inputs

Name Type Mandatory Description
reference FASTA Mandatory The reference FASTA file.
variants VCF Mandatory Input variants.
intervals BED Optional Select only variants in the genomic regions specified in this file. Files of types other than BED can be forced into this input. See the GATK wiki for the allowed interval formats.
conc VCF Optional Concordance variants. Output only the variants that were also called in this file.
disc VCF Optional Discordance variants. Output only the variants that were not called in this file.
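The component's implementation (function.scala) is not reproduced on this page. As a rough illustration only, assuming the component wraps the GATK 3 SelectVariants tool, the inputs above would map to command-line arguments roughly as in the following Python sketch. The function name `build_input_args` and the output file name are hypothetical; optional inputs are only passed on when they are connected.

```python
from typing import List, Optional


def build_input_args(reference: str, variants: str, output: str,
                     intervals: Optional[str] = None,
                     conc: Optional[str] = None,
                     disc: Optional[str] = None) -> List[str]:
    """Hypothetical mapping of the component's inputs to GATK 3 SelectVariants flags."""
    # Mandatory inputs: reference (-R), input variants (-V) and the output file (-o).
    args = ["-T", "SelectVariants", "-R", reference, "-V", variants, "-o", output]
    # Optional inputs are only added when the corresponding input is connected.
    if intervals is not None:
        args += ["-L", intervals]           # restrict to the regions in the file
    if conc is not None:
        args += ["--concordance", conc]     # keep variants also called in conc
    if disc is not None:
        args += ["--discordance", disc]     # keep variants not called in disc
    return args


print(build_input_args("ref.fa", "in.vcf", "calls.vcf", intervals="targets.bed"))
```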

Outputs

Name Type Description
calls VCF The selected variants.

Parameters

Name Type Default Description
gatk string "" Path to the GATK jar file, e.g. "/opt/gatk/GenomeAnalysisTK.jar". If an empty string is given (the default), the GATK_HOME environment variable is assumed to point to the GATK directory that contains GenomeAnalysisTK.jar.
memory string "4g" The amount of Java heap memory allocated to GATK, given e.g. as "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5g). Huge data sets combined with a memory-intensive selection may require a larger value.
options string "" Additional options appended to the GATK command. Any number of options can be given, in GATK's own command-line format.

Example: options="-sn SAMPLE1 -sn SAMPLE2 -fraction 0.05"
region string "" Genomic region from which to select, e.g. "chr1" or "chr2:1-20000". The region can also be a comma-separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A longer list of regions can be given as a file through the 'intervals' input.
select string "" Selection criteria as a JEXL expression. See the GATK wiki for more information on how to construct the expression. Note that annotation names are case-sensitive.

Example: select="QUAL > 30.0 && DP == 10"
variantType string "" Type of variants to select. Give a comma-separated list to select multiple types. Allowed values: {INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION}

Example: type="INDEL,MNP" selects all indels and multi-nucleotide polymorphisms (MNPs).
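The following Python sketch illustrates how the parameters above could be turned into a GATK 3 command line. It is not the component's actual code: the names `resolve_gatk_jar`, `build_command` and `MEMORY_RE` are hypothetical, the memory-format check is an assumption, and it assumes the GATK 3 SelectVariants flags `-L`, `-select` and `-selectType` (one `-selectType` per requested type). The input and output arguments are omitted here.

```python
import os
import re
import shlex
from typing import List

# Assumed memory format: a number followed by k, m or g, e.g. "4g" or "2560m".
MEMORY_RE = re.compile(r"^\d+[kmgKMG]$")
VARIANT_TYPES = {"INDEL", "SNP", "MIXED", "MNP", "SYMBOLIC", "NO_VARIATION"}


def resolve_gatk_jar(gatk: str) -> str:
    """Use the explicit jar path, or fall back to $GATK_HOME/GenomeAnalysisTK.jar."""
    if gatk:
        return gatk
    home = os.environ.get("GATK_HOME")
    if not home:
        raise ValueError("gatk is empty and GATK_HOME is not set")
    return os.path.join(home, "GenomeAnalysisTK.jar")


def build_command(gatk: str = "", memory: str = "4g", options: str = "",
                  region: str = "", select: str = "",
                  variant_type: str = "") -> List[str]:
    """Hypothetical translation of the component parameters into a GATK 3 call."""
    if not MEMORY_RE.match(memory):
        raise ValueError(f"invalid memory string: {memory!r}")
    cmd = ["java", f"-Xmx{memory}", "-jar", resolve_gatk_jar(gatk),
           "-T", "SelectVariants"]
    # Each comma-separated region becomes its own -L argument.
    for r in filter(None, (part.strip() for part in region.split(","))):
        cmd += ["-L", r]
    if select:
        # The JEXL expression is passed through as a single, unparsed argument.
        cmd += ["-select", select]
    # Each requested variant type is checked and passed as a separate -selectType.
    for t in filter(None, (part.strip() for part in variant_type.split(","))):
        if t not in VARIANT_TYPES:
            raise ValueError(f"unknown variant type: {t!r}")
        cmd += ["-selectType", t]
    # Extra options are split the way a POSIX shell would split them.
    cmd += shlex.split(options)
    return cmd


print(build_command(gatk="/opt/gatk/GenomeAnalysisTK.jar",
                    region="chr1,chr2:1-20000", variant_type="INDEL,MNP",
                    options="-sn SAMPLE1 -sn SAMPLE2 -fraction 0.05"))
```

Passing each region and type as a separate argument avoids having to quote commas, and keeping the JEXL expression as one list element means spaces and `&&` in it reach GATK unchanged.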

Test cases

Test case Parameters reference (IN) variants (IN) intervals (IN) conc (IN) disc (IN) calls (OUT)
case1 properties reference variants intervals (missing) (missing) (missing)

# Selection test using criteria and regions,
memory=1g,
type=SNP,
select=DP == 2 && (QUAL == 32.99 || QUAL == 32.97),
region=1:46615-46620,1:46603,1:46667,1:94780-94790


Generated 2019-02-07 07:42:35 by Anduril 2.0.0