Fetches attributes with given filters using BioMart. This component uses the R function getBM to fetch the given attributes for a list of filter values. See the documentation of the R package biomaRt.
There are two modes of operation when the input filter is given: one filter row at a time (batchSize=1) and all filter values in blocks (batchSize>1). In batchSize=1 mode, the result file always contains exactly one row for each filter row. In batchSize>1 mode, there may be several result rows for one filter row, which makes the results difficult to interpret. This happens, for instance, when querying genes associated with Gene Ontology terms. However, when annotating, e.g., a gene or a microarray probe, there is typically exactly one row per filter value. Batch mode is significantly faster than non-batch mode.
In batch mode, the filter must also be included as an attribute so that the results can be mapped back to the input query values. This attribute is set with idAttribute, which defaults to the filter type of the filter.
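As a hedged illustration (a Python sketch, not the component's R implementation), the grouping below shows why the ID attribute matters in batch mode: one query returns rows for many filter values at once, and only the echoed ID column lets each row be traced back to its input value. The function name, the dict-per-row representation, the go_id column and all values are hypothetical.

```python
def map_batch_results(filter_values, rows, id_attribute):
    """Group the rows of one batch query by the idAttribute column.

    filter_values: cleaned input filter values, in input order.
    rows: one dict per result row, as returned by a single batch query.
    id_attribute: the attribute column that echoes the filter value.
    Returns {filter value: [matching rows]}; values without hits map to [].
    """
    grouped = {value: [] for value in filter_values}
    for row in rows:
        key = row.get(id_attribute)
        # Rows whose ID does not match any input value are ignored.
        if key in grouped:
            grouped[key].append(row)
    return grouped


# Hypothetical batch result: GENE_A matches two rows, GENE_C matches none.
rows = [
    {"ensembl_gene_id": "GENE_A", "go_id": "GO_1"},
    {"ensembl_gene_id": "GENE_A", "go_id": "GO_2"},
    {"ensembl_gene_id": "GENE_B", "go_id": "GO_3"},
]
grouped = map_batch_results(["GENE_A", "GENE_B", "GENE_C"], rows, "ensembl_gene_id")
for value, hits in grouped.items():
    print(value, [hit["go_id"] for hit in hits])
# GENE_A ['GO_1', 'GO_2']
# GENE_B ['GO_3']
# GENE_C []
```

Without the ensembl_gene_id column in the result, the three rows could not be assigned to GENE_A or GENE_B, and the absence of hits for GENE_C would go unnoticed.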
Filters for the BioMart query can also be defined with the constantFilters parameter. This must be done in non-batch mode (batchSize=1).
More than one filter can be used, except in batch mode (batchSize>1), which is currently limited to a single filter.
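The constantFilters syntax and its combination with per-row filters can be sketched in Python as follows. This assumes pairs are split at commas and at the first "=", and that a row-level filter of the same type overrides a constant one; both are choices of this sketch, not documented behavior of BiomartAnnotator.r. The function names and example values are hypothetical.

```python
def parse_constant_filters(spec):
    """Parse a constantFilters string such as "a=1,b=2" into {"a": "1", "b": "2"}.

    Assumption: pairs are separated by commas and split at the first "=",
    so filter values cannot contain commas in this sketch.
    """
    filters = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty string and trailing commas
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected filterType=filterValue, got {pair!r}")
        filters[name.strip()] = value.strip()
    return filters


def filters_for_row(filter_types, row_values, constant_filters):
    """Combine one input filter row with the constant filters (non-batch mode).

    Row-level values override constant ones of the same type (an assumption).
    """
    combined = dict(constant_filters)
    combined.update(zip(filter_types, row_values))
    return combined


# Illustrative filter names and values only.
constants = parse_constant_filters("chromosome_name=1, start=1000000, end=2000000")
print(constants)
print(filters_for_row(["ensembl_gene_id"], ["GENE_A"], constants))
```

When no input filter is given, the parsed constant filters alone would form the query, matching the "constant filters only" use described in the parameter table.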
If one filter value produces multiple values for one attribute, those values are collapsed into a comma-separated list. NA and duplicate values are removed from the filter value list before the BioMart database is queried.
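The filter-value cleanup and the collapsing of multiple attribute values can be sketched in Python as below. Which inputs count as NA (None, NaN and the string "NA" here) is an assumption, as are the function names; the uniq flag mirrors the uniq parameter in the parameter table, but the real work happens in the component's R code.

```python
import math


def clean_filter_values(values):
    """Drop NA values and duplicates, keeping first-occurrence order.

    Assumption: None, float NaN and the string "NA" count as NA here.
    """
    seen = set()
    cleaned = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        if value == "NA" or value in seen:
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned


def collapse(values, uniq=True):
    """Join several attribute values into one comma-separated cell.

    With uniq=True, repeated values are kept only once, in first-seen order.
    """
    items = [str(v) for v in values if v is not None]
    if uniq:
        items = list(dict.fromkeys(items))
    return ",".join(items)


print(clean_filter_values(["P1", "NA", None, "P2", "P1"]))  # ['P1', 'P2']
print(collapse(["G1", "G2", "G1"]))                         # G1,G2
print(collapse(["G1", "G2", "G1"], uniq=False))             # G1,G2,G1
```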
Available databases, filters and attributes can be browsed on the BioMart web site. You can select the mart database and use the query tool to choose the settings of interest. The actual keywords appear in the XML output that can be generated from the selections.
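To turn a query built on the BioMart web site into component parameters, the exported XML can be read programmatically. The Python sketch below assumes a single-dataset query in the usual Query/Dataset/Filter/Attribute layout; the sample XML is abbreviated and illustrative, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative example of an exported BioMart XML query
# (the XML declaration and DOCTYPE lines are omitted).
QUERY_XML = """
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="chromosome_name" value="1"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="external_gene_id"/>
  </Dataset>
</Query>
"""


def query_to_parameters(xml_text):
    """Map a single-dataset BioMart XML query onto component parameters."""
    root = ET.fromstring(xml_text.strip())
    dataset = root.find("Dataset")
    attributes = [a.get("name") for a in dataset.findall("Attribute")]
    # Fixed-value filters become filterType=filterValue pairs; filters without
    # a value attribute (e.g. boolean filters) get an empty value and would
    # need manual handling.
    filters = [f"{f.get('name')}={f.get('value', '')}" for f in dataset.findall("Filter")]
    return {
        "dataset": dataset.get("name"),
        "attributes": ",".join(attributes),
        "constantFilters": ",".join(filters),
    }


print(query_to_parameters(QUERY_XML))
```

A filter whose value is itself a comma-separated list (for example, many gene IDs) does not fit the constantFilters syntax; such values would more naturally be supplied through the filter input.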
For convenience, here are some Mart lists current as of 2013-10: all marts, datasets in ensembl, attributes in hsapiens_gene_ensembl, filters in hsapiens_gene_ensembl.
Version | 1.3 |
---|---|
Bundle | microarray |
Categories | Annotation |
Authors | Erkka Valo (erkka.valo@helsinki.fi), Viljami Aittomaki (viljami.aittomaki@helsinki.fi), Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso (Marko.Laakso@Helsinki.FI) |
Requires | libssl-dev (DEB) ; biomaRt (R-bioconductor) ; RCurl (R-package) |
Source files | component.xml BiomartAnnotator.r |

Inputs

Name | Type | Mandatory | Description |
---|---|---|---|
filter | CSV | Optional | A list of filter values |

Outputs

Name | Type | Description |
---|---|---|
annotations | AnnotationTable | Attributes returned from the database for the given filters |
databases | Properties | A properties file which lists the database version and the dataset used for fetching annotations. |

Parameters

Name | Type | Default | Description |
---|---|---|---|
attributes | string | (no default) | A comma-separated list of attributes to fetch. See the biomaRt documentation on how to list the available attributes for a given mart and dataset in R. |
batchSize | int | 1 | If greater than one, enables batch mode, in which all filter values are fetched with one query. This is significantly faster than non-batch mode (=1), but in some instances there may be several result rows for one filter value. If 1, filter values are fetched individually. |
constantFilters | string | "" | A comma-separated list of filterType=filterValue pairs that are common to all input rows when the input filter is given. These can also be used as the only filters, without the input filter. Requires non-batch mode (batchSize=1). |
dataset | string | "hsapiens_gene_ensembl" | Dataset to get annotations from. Different BioMart databases (marts) have their own datasets. See the biomaRt documentation on how to list the available datasets for a mart in R. |
filterColumns | string | "" | Names of the filter columns in the filter file, or an empty string to use the first column(s). |
filterTypes | string | "" | Types of the filter values in the filter input, as a comma-separated list. See the biomaRt documentation on how to list the available filters for a given mart and dataset in R. |
idAttribute | string | "" | In batch mode (batchSize>1), the name of the ID attribute whose values correspond to the filter IDs. If empty, the value of filterTypes is used. Often the filter and the corresponding attribute have identical names, in which case the default (empty) value can be used. Not used in non-batch mode (batchSize=1). |
listLayout | boolean | true | The result format is either lists (true), in which multiple hits are collapsed into a comma-separated list in each column, or a standard CSV file with no collapsed columns (false). |
mart | string | "ensembl" | BioMart database to use. See the biomaRt documentation on how to list the available BioMart databases (marts). |
martHost | string | "www.ensembl.org" | Mart hosting server |
martPath | string | "/biomart/martview" | Mart web service URL within the server |
uniq | boolean | true | Removes duplicates from the values of individual result cells. Different filter entities may still produce references to the same attribute values. |
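The relationship between filterColumns and filterTypes can be sketched in Python. The parameter table does not specify how several column names are separated or exactly how the empty default picks "the first column(s)", so this sketch assumes comma-separated names and one column per filter type; it also assumes tab-delimited input. The function name, the hypothetical_probe_filter type and the example data are hypothetical.

```python
import csv
import io


def select_filter_columns(csv_text, filter_types, filter_columns="", delimiter="\t"):
    """Pair each filter type with an input column and extract the filter rows.

    Assumptions: filterTypes and filterColumns are comma-separated, an empty
    filterColumns selects the first len(filterTypes) columns, and the input
    is tab-delimited (pass another delimiter for other files).
    """
    types = [t.strip() for t in filter_types.split(",") if t.strip()]
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    header = reader.fieldnames or []
    if filter_columns.strip():
        columns = [c.strip() for c in filter_columns.split(",")]
    else:
        columns = header[: len(types)]
    if len(columns) != len(types):
        raise ValueError("need exactly one filter column per filter type")
    missing = [c for c in columns if c not in header]
    if missing:
        raise ValueError(f"columns not found in filter file: {missing}")
    rows = [[row[c] for c in columns] for row in reader]
    return list(zip(types, columns)), rows


filter_csv = "Probe\tGene\nP1\tGENE_A\nP2\tGENE_B\n"
print(select_filter_columns(filter_csv, "ensembl_gene_id", "Gene"))
print(select_filter_columns(filter_csv, "hypothetical_probe_filter"))
```

With an explicit column name the Gene column is used; with the empty default the sketch falls back to the first column, Probe.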

Test cases

Test case | Parameters | IN filter | OUT annotations | OUT databases |
---|---|---|---|---|
case01 | attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated, | filter | annotations | databases |
case02_filtercolumn | attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated, | filter | annotations | databases |
case03_batch | attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated, | filter | annotations | databases |
case04_multifilter | attributes=ensembl_gene_id,external_gene_id, | filter | annotations | databases |
case05_noresults | attributes=seq_region_start_1057,feature_type_name_1057, | filter | annotations | databases |
case06_COSMIC | attributes=accession_number, | filter | annotations | databases |
case07_constants | attributes=ensembl_gene_id,external_gene_id, | filter | annotations | databases |
case08_emptyinput | attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated, | filter | annotations | databases |
case09_layout | attributes=allele,chr_name,chrom_start,chrom_strand,ensembl_gene_stable_id,validated, | filter | annotations | databases |
case10_batch_collapse | attributes=id_mutation, | filter | annotations | databases |
case11_constants_only | attributes=ensembl_gene_id,ensembl_transcript_id,chromosome_name,gene_biotype, | (missing) | annotations | databases |
case12_list_layout_multifilter | attributes=refsnp_id,allele,chr_name,chrom_start,validated, | filter | annotations | databases |