Creates reference files for gene features and known ncRNAs, and appends columns to an expression matrix of putatively novel miRNAs. The added columns describe each candidate's relative genomic location (e.g. intragenic/intergenic, host gene and transcript) and any neighbouring or overlapping ncRNAs.
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | Novel smallRNA |
Authors | Katherine Icay (katherine.icay@helsinki.fi) |
Requires | biomaRt (R-package) |
Source files | component.xml function.scala |

Inputs

Name | Type | Mandatory | Description |
---|---|---|---|
expression | CSV | Mandatory | Expression matrix of putative novel miRNAs with the first four columns containing the following information: chromosome, start position, end position, and strand. |
gtf | BinaryFile | Mandatory | Ensembl genes GTF file, unformatted. Contains transcript and exon locations. |

Outputs

Name | Type | Description |
---|---|---|
annotated | CSV | Expression matrix of putative novel miRNAs with additional information on relative genomic location, host gene, and neighbouring (or overlapping) known ncRNAs. |

Parameters

Name | Type | Default | Description |
---|---|---|---|
ensembl_dataset | string | "hsapiens_gene_ensembl" | biomaRt dataset parameter (i.e. species) to use. |
ensembl_host | string | "feb2014.archive.ensembl.org" | Host name (URL) of the Ensembl archive version to use (see Ensembl Archives). To identify transcripts reliably, use the same genome build and the same genome version as reference_hairpin. |
knownFeatures | string | "" | Path to a previously formatted, tab-delimited file of known ncRNAs to include in the nearest-neighbour analysis. The file must contain the following columns: chr, start, end, strand, biotype, geneID, Name. This option exists only to speed up the component: when it is not provided, the component must process the gtf file itself, which takes much longer. |
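
The column requirement for `knownFeatures` and the idea of a nearest-neighbour lookup can be sketched in Python as follows. This is only an illustration under stated assumptions, not the component's code: the helper names `load_known_features` and `nearest_feature`, the sample rows, and the distance definition (gap in bases between intervals, 0 for overlap, strand ignored) are hypothetical.

```python
import csv
import io

# Column names required by the knownFeatures parameter description.
REQUIRED = ["chr", "start", "end", "strand", "biotype", "geneID", "Name"]

def load_known_features(handle):
    """Read a tab-delimited knownFeatures file and verify the required columns exist."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("knownFeatures file lacks columns: " + ", ".join(missing))
    features = []
    for row in reader:
        row["start"], row["end"] = int(row["start"]), int(row["end"])
        features.append(row)
    return features

def nearest_feature(chrom, start, end, features):
    """Return (feature, distance) for the closest feature on the same chromosome.

    A distance of 0 means the intervals overlap. Returns (None, None) when
    the chromosome has no known features.
    """
    best, best_dist = None, None
    for f in features:
        if f["chr"] != chrom:
            continue
        # Gap between the two intervals; negative gaps (overlap) clamp to 0.
        dist = max(0, f["start"] - end, start - f["end"])
        if best_dist is None or dist < best_dist:
            best, best_dist = f, dist
    return best, best_dist

# Hypothetical sample file contents.
KNOWN = (
    "chr\tstart\tend\tstrand\tbiotype\tgeneID\tName\n"
    "1\t2000\t2080\t+\tmiRNA\tG1\tmir-example-1\n"
    "1\t8000\t8100\t-\tsnoRNA\tG2\tsno-example-2\n"
)

features = load_known_features(io.StringIO(KNOWN))
hit, distance = nearest_feature("1", 7500, 7521, features)
```

Checking the header up front gives a clear error for a malformed file instead of a failure partway through the analysis.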