Component
VCF2AnnotatedCSV
A quick and dirty wrapper for ANNOVAR. Annotates the variants in a VCF file using the chosen databases and writes the result as a CSV file.
Explanations from the ANNOVAR documentation for some gene-based annotation values:
Value | Explanation |
exonic | variant overlaps a coding exon |
splicing | variant is within 2 bp of a splicing junction |
ncRNA | variant overlaps a transcript without coding annotation in the gene definition |
UTR5 | variant overlaps a 5' untranslated region |
UTR3 | variant overlaps a 3' untranslated region |
intronic | variant overlaps an intron |
upstream | variant overlaps the 1-kb region upstream of the transcription start site |
downstream | variant overlaps the 1-kb region downstream of the transcription end site |
intergenic | variant is in an intergenic region |
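To make the position-based definitions above concrete, here is a simplified Python sketch that classifies a single position against one transcript using the 2-bp splice window and the 1-kb flanks. It is not ANNOVAR's algorithm: the function name `classify` and its exon-list input are hypothetical, and the sketch ignores UTR5, UTR3, ncRNA, overlapping transcripts and ANNOVAR's precedence rules.

```python
from typing import List, Tuple

# Thresholds taken from the table above.
SPLICE_WINDOW = 2   # "within 2 bp of a splicing junction"
FLANK = 1000        # "1-kb region" upstream/downstream

def classify(pos: int, exons: List[Tuple[int, int]], strand: str = "+") -> str:
    """Classify a 1-based position against a single transcript (simplified sketch).

    exons: sorted, non-overlapping, 1-based inclusive (start, end) pairs.
    Hypothetical helper; ignores UTRs, ncRNA and multiple transcripts.
    """
    tx_start, tx_end = exons[0][0], exons[-1][1]
    if tx_start <= pos <= tx_end:
        for start, end in exons:
            if start <= pos <= end:
                return "exonic"
        # Not in an exon, so the position lies in an intron between two exons.
        for i in range(len(exons) - 1):
            intron_start, intron_end = exons[i][1] + 1, exons[i + 1][0] - 1
            if intron_start <= pos <= intron_end:
                # Within 2 bp of either exon/intron boundary counts as splicing.
                if pos - intron_start < SPLICE_WINDOW or intron_end - pos < SPLICE_WINDOW:
                    return "splicing"
                return "intronic"
    # Outside the transcript: on the minus strand, transcription starts at the
    # higher coordinate, so upstream and downstream swap.
    if tx_start - FLANK <= pos < tx_start:
        return "upstream" if strand == "+" else "downstream"
    if tx_end < pos <= tx_end + FLANK:
        return "downstream" if strand == "+" else "upstream"
    return "intergenic"
```

For a plus-strand transcript with exons (100, 200) and (300, 400), position 201 falls in the splice window, 203 is intronic and 1401 is intergenic.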
This component adds four columns: alt_samples, ref_samples, alt_alleles and called_samples.
These give, respectively, the number of samples carrying a non-reference allele, the number of samples homozygous for the reference allele, the total number of alternative alleles, and the number of samples for which a call is present for this variant.
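The four counts can be derived from the GT field of each sample column. The Python sketch below assumes GT is the first FORMAT key (the VCF specification requires this when GT is present) and treats a sample with any missing allele (for example "./." or "./1") as not called; how the component itself handles partial calls is not stated here. The function name `count_genotypes` is hypothetical.

```python
from typing import Dict, List

def count_genotypes(sample_fields: List[str]) -> Dict[str, int]:
    """Compute the four summary counts from the sample columns of one VCF record.

    Hypothetical helper. Assumes GT is the first FORMAT key and that a sample
    with any missing allele ('.') is not called.
    """
    counts = {"alt_samples": 0, "ref_samples": 0, "alt_alleles": 0, "called_samples": 0}
    for field in sample_fields:
        gt = field.split(":", 1)[0]                  # GT is the first FORMAT key
        alleles = gt.replace("|", "/").split("/")    # handle phased and unphased calls
        if any(a == "." for a in alleles):
            continue                                 # assumption: partial/missing call = not called
        counts["called_samples"] += 1
        alt = sum(1 for a in alleles if a != "0")    # any non-zero index is an alternative allele
        counts["alt_alleles"] += alt
        if alt > 0:
            counts["alt_samples"] += 1
        else:
            counts["ref_samples"] += 1               # all alleles are reference: homozygous ref
    return counts
```

For a tab-separated VCF data line, the sample columns start at index 9, so `count_genotypes(line.rstrip("\n").split("\t")[9:])` gives the counts for that variant.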
Inputs
Name | Type | Mandatory | Description |
vcf | VCF | Mandatory | VCF file to be annotated. |
Outputs
Name | Type | Description |
annotated | CSV | CSV file containing the annotated variants. |
Parameters
Name | Type | Default | Description |
annovar_bin | string | "/opt/annovar" | Path to the ANNOVAR home directory. |
annovar_db | string | "/opt/annovar/humandb" | Path to the ANNOVAR database directory. |
buildver | string | "hg18" | Genome build version; either hg19 or hg18. |
operation | string | "g" | Comma-separated list of annotation operations, one for each database listed in the protocol parameter and in the same order. Use "g" for gene-based, "f" for filter-based and "r" for region-based annotation: gene-based annotation is for gene definition files, filter-based for files containing information on specific variants, and region-based for files that contain genomic regions. For the example given under protocol, use "g,f,f,r". ANNOVAR cannot infer which operation applies to each database in protocol, so this parameter must specify it. |
protocol | string | "ensGene" | Comma-separated list of databases to use, e.g. "ensGene,1000g2012apr_all,snp137,cytoBand" to annotate the variants with Ensembl gene definitions, 1000 Genomes allele frequencies, dbSNP identifiers and cytogenetic band locations. |
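A sketch of how these parameters fit together, assuming the wrapper runs ANNOVAR's table_annovar.pl directly on the VCF. That is an assumption: the component's actual invocation is not documented here, and the flags used (-buildver, -out, -protocol, -operation, -vcfinput, -csvout, -remove) should be checked against the usage message of your ANNOVAR release. The function `build_command` is hypothetical; it first checks that operation has exactly one of the three codes described above for each protocol database.

```python
from typing import List

# The three operation codes described for the operation parameter.
VALID_OPERATIONS = {"g", "f", "r"}  # gene-, filter- and region-based

def build_command(vcf_path: str, out_prefix: str,
                  annovar_bin: str = "/opt/annovar",
                  annovar_db: str = "/opt/annovar/humandb",
                  buildver: str = "hg18",
                  protocol: str = "ensGene",
                  operation: str = "g") -> List[str]:
    """Validate protocol/operation pairing and build an argv list (hypothetical sketch)."""
    dbs = protocol.split(",")
    ops = operation.split(",")
    # ANNOVAR cannot infer the operation, so each database needs its own code.
    if len(dbs) != len(ops):
        raise ValueError(f"protocol lists {len(dbs)} databases but operation lists {len(ops)} codes")
    bad = [op for op in ops if op not in VALID_OPERATIONS]
    if bad:
        raise ValueError(f"unknown operation code(s): {bad}")
    # Assumed table_annovar.pl invocation; verify flags against your ANNOVAR version.
    return [
        f"{annovar_bin}/table_annovar.pl", vcf_path, annovar_db,
        "-buildver", buildver,
        "-out", out_prefix,
        "-protocol", protocol,
        "-operation", operation,
        "-vcfinput", "-csvout", "-remove",
    ]
```

Returning an argument list rather than a shell string lets it be passed to `subprocess.run(cmd, check=True)` without shell quoting issues.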
Generated 2019-02-07 07:42:22 by Anduril 2.0.0