Fetches DNA sequences from the Ensembl database. The sequences to fetch are determined by giving their genomic locations.
The Ensembl Perl API needs to be installed, and the environment variable PERL5LIB must be set to include the installation directories for the modules ensembl, ensembl-variation, ensembl-compara, ensembl-functgenomics and BioPerl.
See installation instructions from the Ensembl homepage.
Version | 1.0 |
---|---|
Bundle | microarray |
Categories | Annotation |
Authors | Ping Chen (ping.chen@helsinki.fi), Marko Laakso (Marko.Laakso@Helsinki.FI), Erkka Valo (erkka.valo@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | Ensembl Perl API |
Source files | component.xml EnsemblAPI.pm EnsemblDNA.pl |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
regions | DNARegion | Mandatory | Genomic locations of the target sequences. The input file should contain ID, strand, chromosome, start and end information for each sequence. The chromosomal start location should be lower than the end location. Sequences can be fetched either from the 1 or -1 strand. If the target sequence is located in the -1 strand, the start site of the target sequence is taken to be the given end site and the end site of the target sequence is taken to be the given start site. |
connection | Properties | Mandatory | Connection parameters of Ensembl database, including host, database, port, user and driver information. |
Name | Type | Description |
---|---|---|
sequences | FASTA | Target sequences. Sequences are returned in 5' to 3' direction in the strand of the target sequence. |
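To illustrate what "5' to 3' direction in the strand of the target sequence" means for the two strands, here is a minimal Python sketch. It is not the component's implementation (the component uses the Ensembl Perl API). The function name `read_region` and the toy chromosome string are hypothetical, and 1-based inclusive coordinates are assumed.

```python
# Complement table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_region(chrom_seq: str, start: int, end: int, strand: int) -> str:
    """Return the region read 5' to 3' on the requested strand.

    Coordinates are assumed to be 1-based and inclusive, with start <= end.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if not 1 <= start <= end <= len(chrom_seq):
        raise ValueError("start and end must satisfy 1 <= start <= end <= length")

    forward = chrom_seq[start - 1:end]
    if strand == 1:
        return forward
    # On the -1 strand the sequence begins at the given end coordinate,
    # so the forward slice is complemented and then reversed.
    return forward.translate(_COMPLEMENT)[::-1]


toy_chromosome = "AACCGGTTACGT"
plus = read_region(toy_chromosome, 2, 5, 1)
minus = read_region(toy_chromosome, 2, 5, -1)
```

For the same coordinates the two strands give different strings: the -1 result is the reverse complement of the +1 result.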
Name | Type | Default | Description |
---|---|---|---|
csvOutput | boolean | false | If true, write the output in CSV format instead of FASTA. |
length | int | 0 | If not 0, fetch a sequence of this many bases downstream from the start site in the strand of the target sequence. Note that the start site is first moved according to the value of the offset parameter. |
mask | boolean | false | A flag that can be used to activate repeat masking. |
offset | int | 0 | Number of base pairs to offset the start site. Negative for upstream, positive for downstream. The offset is applied in the strand of the target sequence. |
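The interaction of offset, length and strand can be worked through with a small Python sketch that computes which genomic coordinates would be fetched. This is only one reading of the parameter descriptions, not the component's code. The function name `fetch_window` is hypothetical. Two points are assumptions: that the shifted start site itself counts as the first base of the length, and that a length of 0 means the sequence runs from the shifted start site to the end site of the region.

```python
from typing import Tuple


def fetch_window(start: int, end: int, strand: int,
                 offset: int = 0, length: int = 0) -> Tuple[int, int]:
    """Return (low, high) genomic coordinates, 1-based inclusive.

    Assumptions: the shifted start site is the first base of `length`,
    and length == 0 means "up to the end site of the region".
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if start > end:
        raise ValueError("start must not exceed end")
    if length < 0:
        raise ValueError("length must be non-negative")

    if strand == 1:
        # Downstream means increasing coordinates on the +1 strand.
        site = start + offset
        low = site
        high = site + length - 1 if length else end
    else:
        # On the -1 strand the start site is the given end coordinate,
        # and downstream means decreasing coordinates.
        site = end - offset
        high = site
        low = site - length + 1 if length else start
    return low, high


# A region at 100..200, shifted 5 bp upstream, fetching 10 bases.
plus_window = fetch_window(100, 200, 1, offset=-5, length=10)
minus_window = fetch_window(100, 200, -1, offset=-5, length=10)
```

Under these assumptions an upstream offset lowers the coordinates on the +1 strand but raises them on the -1 strand, which is why the two windows sit at opposite ends of the region.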
Test case | Parameters | IN regions | IN connection | OUT sequences |
---|---|---|---|---|
case1 | (missing) | regions | connection | sequences |
case2 | properties: offset=-5, | regions | connection | sequences |
case3 | properties: csvOutput = true | regions | connection | sequences |