This component is not actively maintained at the moment, because ANNOVAR is updated too frequently. A simpler, more maintainable interface is available in the component VCF2AnnotatedCSV. Another alternative is the AnnTools component, which is even easier to maintain but so far largely unused.
This variant annotation component uses ANNOVAR, which may not be redistributed; each user must download it directly from the author ( http://www.openbioinformatics.org/annovar/ ). Installation, however, is as easy as uncompressing the archive. You do need the appropriate database files in the reference directory; otherwise nothing will work.
The quick start guide of ANNOVAR may prove helpful.
ANNOVAR uses three main methods: gene-based, region-based and filter-based annotation.
This component offers four different ways to call ANNOVAR. The first calls it directly; the others use a configurable, automated sequence of filtering steps. You select the method with the annotator parameter. The scripts used by each of the extra methods are explained in ANNOVAR Accessory Programs.
ANNOVAR only outputs a CSV file when using the summarize or table method. For other methods it produces non-standard VCF files that have extra columns before the standard VCF columns. It doesn't output a header. This component adds a CSV header with anonymous field names.
The first fields are called "Annotation1", "Annotation2" etc. and the rest are taken from the VCF specification, i.e. they are called CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO.
Check the ANNOVAR website to find out what the column headers mean in each case. The meaning of the extra fields may also depend on the database used, so interpret the results carefully.
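The header naming described above can be sketched as follows. This is only an illustration, not the component's actual code (which lives in main.sh); the function name `build_header` and the idea of deriving the number of annotation columns from the total column count are assumptions made for the example.

```python
# Standard VCF column names, as listed in the text above.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def build_header(total_columns):
    """Return anonymous names for the leading columns, then the VCF names.

    Hypothetical helper: assumes every column before the standard VCF
    columns is an ANNOVAR annotation column.
    """
    extra = total_columns - len(VCF_COLUMNS)
    if extra < 0:
        raise ValueError("row has fewer columns than the standard VCF fields")
    annotations = ["Annotation%d" % i for i in range(1, extra + 1)]
    return annotations + VCF_COLUMNS


# A row with two annotation columns in front of the eight VCF columns.
print(",".join(build_header(10)))
# prints: Annotation1,Annotation2,CHROM,POS,ID,REF,ALT,QUAL,FILTER,INFO
```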
Documentation for ANNOVAR itself is found mainly on its website. For those who don't want to read the Perl files directly, the help text of each ANNOVAR command can be viewed with "perldoc annotate_variation.pl" (substitute the script name).
ANNOVAR provides a script called "variant_reduction.pl" which can perform multiple variant reduction steps (as does the table_annovar.pl script). It needs two special options: the protocols to apply, and how they are applied. The exact format of these is documented in the script's own help and explained more verbosely in Variant reduction.
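As a rough illustration of how such an option string could be assembled for the options parameter, the sketch below joins protocol/operation pairs into one string. The flag names `-protocol` and `-operation` and the operation codes `g` and `f` are taken from the ANNOVAR documentation as best understood here and should be verified with perldoc for your ANNOVAR version; the helper `build_protocol_options` is hypothetical, and the database names are placeholders that must match files in your reference directory.

```python
def build_protocol_options(steps):
    """Join (protocol, operation) pairs into one ANNOVAR option string.

    `steps` is a list of (protocol_name, operation_code) tuples, applied
    in the order given. Hypothetical helper, not part of the component.
    """
    if not steps:
        raise ValueError("at least one (protocol, operation) pair is required")
    protocols = ",".join(name for name, _ in steps)
    operations = ",".join(code for _, code in steps)
    # Flag names are an assumption; check the script's own help.
    return "-protocol %s -operation %s" % (protocols, operations)


# Illustrative values only: one gene-based step, then one filter-based step.
options = build_protocol_options([("refGene", "g"), ("snp135", "f")])
print(options)
# prints: -protocol refGene,snp135 -operation g,f
```

Keeping each protocol next to its operation in one tuple avoids the two comma-separated lists drifting out of sync when a step is added or removed.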
Explanations from the ANNOVAR documentation for some gene-based annotations:
Value | Explanation |
---|---|
exonic | variant overlaps a coding exon |
splicing | variant is within 2 bp of a splicing junction |
ncRNA | variant overlaps a transcript without coding annotation in the gene definition |
UTR5 | variant overlaps a 5' untranslated region |
UTR3 | variant overlaps a 3' untranslated region |
intronic | variant overlaps an intron |
upstream | variant overlaps 1-kb region upstream of transcription start site |
downstream | variant overlaps 1-kb region downtream of transcription end site |
intergenic | variant is in intergenic region |
Version | 0.4 |
---|---|
Bundle | sequencing |
Categories | VariationAnalysis |
Specialties | generic |
Authors | Lauri Lyly (lauri.lyly@helsinki.fi), Miko Valori (miko.valori@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml main.sh |
Usage | Example with default values |
Deprecated | Use the VCF2AnnotatedCSV (for ANNOVAR) or AnnTools component instead. |
Name | Type | Mandatory | Description |
---|---|---|---|
variantQuery | VCF | Mandatory | Variants to annotate. They are first converted to ANNOVAR's own format for convenience. All standard fields are retained in the output. The additional FORMAT fields are discarded by ANNOVAR. |
reference | BinaryFolder | Optional | Path to directory containing reference variant and other databases. The location should be kept up-to-date by running various download commands as documented on the ANNOVAR website. For SBL, the default should be /mnt/csc-gc/resources/annovar/hg19_db. May optionally be specified as a string parameter. |
Name | Type | Description |
---|---|---|
calls | Array<CSV> | Files specific to the annotation method (parameter "annotator"), contained in an Anduril array directory. |
log | Array<TextFile> | Log files produced by the run i.e. those ending in .log. |
raw | Array<TextFile> | This is only used for the summarize annotator. Contains the raw outputs that are used to form the summary. Specify the --remove option to avoid producing them. |
Name | Type | Default | Description |
---|---|---|---|
annotator | string | "annovar" | Software/algorithm to use to annotate variants. Values: {annovar, variant_reductor, summarize, table}. Notice that for the variant_reductor and table methods you practically always want to pass an option string containing the protocol and operation, as documented in ANNOVAR itself. See details above. |
annovar | string | "" | Path to the ANNOVAR installation directory. If an empty string is given (default), the ANNOVAR_HOME environment variable is assumed to point to the ANNOVAR directory, where all the associated scripts are assumed to reside. |
convertFromType | string | "vcf4old" | The first step is conversion from various formats to ANNOVAR's internal format. Specify "none" if you want to skip the conversion. Otherwise consult the convert2annovar.pl script or ANNOVAR documentation for the available formats. vcf4 and vcf4old are the most obvious ones. |
countAlts | boolean | true | (FIXME: Not implemented yet!) Enable to calculate four additional columns: alt_samples, ref_samples, alt_alleles and called_samples. These indicate the number of samples presenting a non-reference allele, number of samples homozygous for the reference allele, total number of alternative alleles and number of samples for which a call was present for this variant, respectively. |
index | string | "hg19" | Basename of the genome build, e.g. "hg19" for ucsc.hg19. The relevant files are assumed to reside in the directory pointed to by the "reference" input. Note that support for different builds may vary. |
options | string | "defaults" | This string is appended to the command and can include any number of options in the software-specific format. The default value of "defaults" chooses method-specific defaults, which are not necessarily the same as the method's own defaults in ANNOVAR. For the summarize annotator the default is "--verdbsnp 135", because otherwise ANNOVAR would use a non-existent database. For the "annovar" annotator, the options string is empty. Notice that for the variant_reductor method you practically always want to pass an option string containing the protocol and operation, as documented in ANNOVAR itself. See details above. |
referenceDir | string | "" | Path to the reference directory, given as a string. Alternative to supplying the "reference" input. |
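Since countAlts is marked as not implemented, the following is only a sketch of how the four columns it describes could be computed from per-sample genotype strings. It assumes VCF-style genotypes such as "0/1" or "1|1" with "." for a missing allele; the function name `count_alts` and the handling of partially called genotypes are assumptions, not the component's behavior.

```python
import re


def count_alts(genotypes):
    """Count the four countAlts columns from VCF-style genotype strings.

    Hypothetical helper. A sample counts as called if at least one allele
    is not "."; it counts as ref_samples only if every allele is "0".
    """
    counts = {"alt_samples": 0, "ref_samples": 0,
              "alt_alleles": 0, "called_samples": 0}
    for genotype in genotypes:
        alleles = re.split(r"[/|]", genotype)
        called = [a for a in alleles if a != "."]
        if not called:
            continue  # no call for this sample
        counts["called_samples"] += 1
        n_alt = sum(1 for a in called if a != "0")
        counts["alt_alleles"] += n_alt
        if n_alt > 0:
            counts["alt_samples"] += 1
        elif len(called) == len(alleles):
            # Fully called and all-reference: homozygous reference.
            counts["ref_samples"] += 1
    return counts


print(count_alts(["0/0", "0/1", "1/1", "./."]))
# prints: {'alt_samples': 2, 'ref_samples': 1, 'alt_alleles': 3, 'called_samples': 3}
```

Note that the genotypes would have to be read from the original VCF, because the FORMAT fields are discarded by ANNOVAR during conversion.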
Test case | Parameters | IN variantQuery | IN reference | OUT calls | OUT log | OUT raw |
---|---|---|---|---|---|---|
case1 | properties (annotator=annovar) | variantQuery | (missing) | calls | (missing) | (missing) |
case2_reduce | properties (annotator=variant_reduction) | variantQuery | (missing) | calls | (missing) | (missing) |
case3_summarize | properties (annotator=summarize) | variantQuery | (missing) | (missing) | (missing) | (missing) |
case4_table | properties (annotator=table) | variantQuery | (missing) | (missing) | (missing) | (missing) |