
VariantAnnotator

This component is currently not actively maintained, because ANNOVAR is updated too frequently. A simpler, more maintainable interface is available in the component VCF2AnnotatedCSV. Another alternative is the AnnTools component, which is even easier to maintain but so far largely unused.

This variant annotation component uses ANNOVAR, which may not be redistributed: each user must download it directly from the author ( http://www.openbioinformatics.org/annovar/ ). Installation, however, is as easy as uncompressing the archive. The appropriate database files must be present in the reference directory, otherwise nothing will work.

The quick start guide of ANNOVAR may prove helpful.

ANNOVAR offers three main annotation methods: gene-based, region-based and filter-based annotation.

This component offers four different ways to call ANNOVAR, selected with the annotator parameter. The first (annovar) calls ANNOVAR directly; the others use a configurable, automated sequence of filtering steps. The scripts used by each of the extra methods are explained in ANNOVAR Accessory Programs.

ANNOVAR only outputs a CSV file when using the summarize or table method. For the other methods it produces non-standard VCF files that have extra columns before the standard VCF columns and no header line. This component adds a CSV header with generic field names.

The first fields are called "Annotation1", "Annotation2" etc. and the rest are taken from the VCF specification, i.e. they are called CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO.
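A minimal Python sketch of the header scheme just described, assuming each output row ends with the eight standard VCF columns and that every column before them is an ANNOVAR annotation. The function name `csv_header` is hypothetical and is not taken from the component's main.sh.

```python
from typing import List

# Standard VCF columns in specification order; they follow ANNOVAR's extra columns.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def csv_header(n_total_columns: int) -> List[str]:
    """Return generic column names for a row with n_total_columns fields.

    Assumption: all fields beyond the eight VCF columns are ANNOVAR
    annotations placed *before* the VCF columns.
    """
    n_extra = n_total_columns - len(VCF_COLUMNS)
    if n_extra < 0:
        raise ValueError("row has fewer columns than the standard VCF fields")
    annotations = [f"Annotation{i}" for i in range(1, n_extra + 1)]
    return annotations + VCF_COLUMNS


print(csv_header(10))
```

For a 10-column row this yields `Annotation1` and `Annotation2` followed by the eight VCF column names.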

Check the ANNOVAR website to find out what the column headers mean in each case. The meaning of the extra fields may also depend on the database used, so results must be interpreted carefully.

ANNOVAR's own documentation is mainly found on its website. The help text of each ANNOVAR script can be viewed with "perldoc annotate_variation.pl" (substitute the script name), which avoids reading the Perl files directly.

ANNOVAR provides a script called "variant_reduction.pl" which can perform multiple variant reduction steps (as does the table_annovar.pl script). It needs two special options: the protocols to apply, and how they are applied (the operations). The exact format of these is documented in the script's own help, or explained more verbosely in Variant reduction.
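The protocol and operation options are comma-separated lists that must line up, one operation per protocol. The sketch below assembles such a string for the options parameter. It assumes the `-protocol`/`-operation` spelling used by table_annovar.pl and the operation codes `g` (gene-based), `r` (region-based) and `f` (filter-based); check the script's own help before relying on it. The helper name and the database names in the example are illustrative only.

```python
from typing import Sequence


def protocol_options(protocols: Sequence[str], operations: Sequence[str]) -> str:
    """Build a '-protocol ... -operation ...' option string (hypothetical helper)."""
    if len(protocols) != len(operations):
        raise ValueError("each protocol needs exactly one operation")
    return "-protocol {} -operation {}".format(",".join(protocols), ",".join(operations))


# Illustrative database names; use the ones actually present in the reference directory.
print(protocol_options(["refGene", "cytoBand", "snp138"], ["g", "r", "f"]))
# -protocol refGene,cytoBand,snp138 -operation g,r,f
```

Checking the lengths up front catches the most common mistake, a protocol list and operation list that have drifted out of step.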

Explanations from the ANNOVAR documentation for some gene-based annotations:

exonic: variant overlaps a coding exon
splicing: variant is within 2 bp of a splicing junction
ncRNA: variant overlaps a transcript without coding annotation in the gene definition
UTR5: variant overlaps a 5' untranslated region
UTR3: variant overlaps a 3' untranslated region
intronic: variant overlaps an intron
upstream: variant overlaps the 1-kb region upstream of the transcription start site
downstream: variant overlaps the 1-kb region downstream of the transcription end site
intergenic: variant is in an intergenic region
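To make the distance rules above concrete (2 bp around splice junctions, 1 kb upstream or downstream), here is a simplified Python sketch that classifies a position against a single transcript. It is an illustration under stated assumptions, not ANNOVAR's algorithm: ANNOVAR considers all overlapping transcripts, applies its own precedence between categories and reports ncRNA subtypes. Coordinates are assumed 1-based and inclusive; the function name `classify` is hypothetical.

```python
from typing import List, Optional, Tuple

FLANK_WINDOW = 1000  # "1-kb region" upstream/downstream
SPLICE_WINDOW = 2    # "within 2 bp of a splicing junction"


def classify(pos: int, tx_start: int, tx_end: int, strand: str,
             exons: List[Tuple[int, int]], cds: Optional[Tuple[int, int]]) -> str:
    """Simplified single-transcript gene-based category (hypothetical helper)."""
    if tx_start <= pos <= tx_end:
        if cds is None:
            return "ncRNA"  # transcript without coding annotation
        if any(s <= pos <= e for s, e in exons):
            cds_start, cds_end = cds
            if cds_start <= pos <= cds_end:
                return "exonic"
            before_cds = pos < cds_start
            # On the minus strand the 5' end lies at the higher coordinate.
            if before_cds == (strand == "+"):
                return "UTR5"
            return "UTR3"
        # Intronic position: within SPLICE_WINDOW bases of an exon edge?
        near_junction = any(s - SPLICE_WINDOW <= pos < s or e < pos <= e + SPLICE_WINDOW
                            for s, e in exons)
        return "splicing" if near_junction else "intronic"
    # Outside the transcript: distance to the nearest transcript end.
    left = pos < tx_start
    dist = tx_start - pos if left else pos - tx_end
    if dist > FLANK_WINDOW:
        return "intergenic"
    # Left of a plus-strand transcript (or right of a minus-strand one) is upstream.
    return "upstream" if left == (strand == "+") else "downstream"


exons = [(1000, 1200), (2000, 2300), (4000, 5000)]
print(classify(1201, 1000, 5000, "+", exons, (1100, 4500)))
```

Here position 1201 lies one base into the first intron, so the sketch reports it as splicing.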

Version 0.4
Bundle sequencing
Categories VariationAnalysis
Specialties generic
Authors Lauri Lyly (lauri.lyly@helsinki.fi), Miko Valori (miko.valori@helsinki.fi)
Issue tracker View/Report issues
Source files component.xml main.sh
Usage Example with default values
Deprecated

Use the VCF2AnnotatedCSV (for ANNOVAR) or AnnTools component instead.

Type parameters (generics)

Inputs

Name Type Mandatory Description
variantQuery VCF Mandatory Variants to annotate. They are first converted to ANNOVAR's own format for convenience. All standard VCF fields are retained in the output; the additional FORMAT fields are discarded by ANNOVAR.
reference BinaryFolder Optional Path to the directory containing the reference variant databases and other databases. The databases should be kept up to date by running the download commands documented on the ANNOVAR website. For SBL, the default should be /mnt/csc-gc/resources/annovar/hg19_db. May alternatively be given as a string with the referenceDir parameter.

Outputs

Name Type Description
calls Array<CSV> Annotation-method-specific files (see parameter "annotator"), contained in an Anduril array directory.
log Array<TextFile> Log files produced by the run, i.e. those ending in .log.
raw Array<TextFile> Only used by the summarize annotator: contains the raw outputs from which the summary is formed. Add the --remove option to the options parameter to avoid producing them.

Parameters

Name Type Default Description
annotator string "annovar" Annotation method to use. Values: {annovar, variant_reductor, summarize, table}. Note that for the variant_reductor and table methods you practically always want to pass an option string containing the protocol and operation, as documented in ANNOVAR itself. See details above.
annovar string "" Path to the ANNOVAR installation directory. If an empty string is given (the default), the ANNOVAR_HOME environment variable is assumed to point to the ANNOVAR directory, where all the associated scripts are assumed to reside.
convertFromType string "vcf4old" The first step is conversion from various formats to ANNOVAR's internal format. Specify "none" to skip the conversion. Otherwise, consult the convert2annovar.pl script or the ANNOVAR documentation for the available formats; vcf4 and vcf4old are the most obvious choices.
countAlts boolean true

(Not implemented yet.) When enabled, calculates four additional columns: alt_samples, ref_samples, alt_alleles and called_samples.

These indicate the number of samples presenting a non-reference allele, number of samples homozygous for the reference allele, total number of alternative alleles and number of samples for which a call was present for this variant, respectively.

index string "hg19" Basename of the genome build, e.g. "hg19" for ucsc.hg19. The relevant files are assumed to reside in the directory pointed to by the "reference" input. Note that support for different builds may vary.
options string "defaults" This string is appended to the command and can include any number of options in the software-specific format. The default value "defaults" selects method-specific defaults, which are not necessarily the same as the method's own defaults in ANNOVAR.

For the summarize annotator the default is: "--verdbsnp 135"
This is because otherwise ANNOVAR will use a non-existent database. For the "annovar" annotator, the options string is empty.
Note that for the variant_reductor method you practically always want to pass an option string containing the protocol and operation, as documented in ANNOVAR itself. See details above.
referenceDir string "" Alternative to the reference input: the path to the reference database directory, given as a string.
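countAlts is not implemented, so the following is only a hypothetical Python sketch of how its four columns could be derived from the GT values of one variant's samples. It assumes GT strings such as "0/1" or "1|1" and treats a sample as called only when none of its alleles is missing ("."); the function name `count_alts` is illustrative.

```python
from typing import Dict, Iterable


def count_alts(genotypes: Iterable[str]) -> Dict[str, int]:
    """Hypothetical countAlts columns for one variant from per-sample GT strings."""
    counts = {"alt_samples": 0, "ref_samples": 0, "alt_alleles": 0, "called_samples": 0}
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
        if any(a == "." for a in alleles):
            continue  # no complete call for this sample
        counts["called_samples"] += 1
        n_alt = sum(1 for a in alleles if a != "0")  # any non-reference allele index
        counts["alt_alleles"] += n_alt
        if n_alt > 0:
            counts["alt_samples"] += 1
        else:
            counts["ref_samples"] += 1  # all alleles are reference: homozygous reference
    return counts


print(count_alts(["0/0", "0/1", "1|1", "./.", "0/2"]))
```

In this example four samples are called, one is homozygous reference, three carry a non-reference allele, and four alternative alleles are counted in total.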

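The defaults of the annovar, convertFromType and options parameters interact as described in the parameter table. A Python sketch of that resolution, assuming convert2annovar.pl is invoked as `convert2annovar.pl -format <type> <file>` with its output on standard output. The helper names are hypothetical and main.sh may do this differently. Defaults for the variant_reductor and table annotators are not stated on this page; returning an empty string for them is an assumption of the sketch.

```python
import os
from typing import List, Optional

# options="defaults" replacements stated on this page; other annotators are an assumption.
DEFAULT_OPTIONS = {"summarize": "--verdbsnp 135", "annovar": ""}


def resolve_options(annotator: str, options: str) -> str:
    """Return the option string that would be passed to ANNOVAR."""
    if options != "defaults":
        return options  # user-supplied options are used verbatim
    return DEFAULT_OPTIONS.get(annotator, "")  # assumed empty when undocumented


def annovar_home(annovar: str) -> str:
    """An empty annovar parameter falls back to the ANNOVAR_HOME environment variable."""
    home = annovar or os.environ.get("ANNOVAR_HOME", "")
    if not home:
        raise ValueError("set the annovar parameter or ANNOVAR_HOME")
    return home


def conversion_command(convert_from_type: str, annovar: str,
                       input_vcf: str) -> Optional[List[str]]:
    """argv for the conversion step, or None when convertFromType is "none"."""
    if convert_from_type == "none":
        return None  # input is assumed to be in ANNOVAR format already
    script = os.path.join(annovar_home(annovar), "convert2annovar.pl")
    # Assumption: convert2annovar.pl writes converted records to standard output.
    return [script, "-format", convert_from_type, input_vcf]


print(conversion_command("vcf4old", "/opt/annovar", "variants.vcf"))
print(resolve_options("summarize", "defaults"))
```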
Test cases

Test case Parameters IN variantQuery IN reference OUT calls OUT log OUT raw
case1 properties variantQuery (missing) calls (missing) (missing)
case2_reduce properties variantQuery (missing) calls (missing) (missing)
case3_summarize properties variantQuery (missing) (missing) (missing) (missing)
case4_table properties variantQuery (missing) (missing) (missing) (missing)

case1 properties: annotator=annovar, annovar=/opt/annovar, referenceDir=/mnt/csc-gc/resources/annovar/hg19_db, metadata.timeout=0

case2_reduce properties: annotator=variant_reduction, annovar=/opt/annovar, referenceDir=/mnt/csc-gc/resources/annovar/hg19_db, metadata.timeout=0

case3_summarize properties: annotator=summarize, annovar=/opt/annovar, referenceDir=/mnt/csc-gc/resources/annovar/hg19_db, metadata.timeout=0

case4_table properties: annotator=table, annovar=/opt/annovar, referenceDir=/mnt/csc-gc/resources/annovar/hg19_db, convertFromType=none, metadata.timeout=0


Generated 2019-02-07 07:42:22 by Anduril 2.0.0