Generates an expression table from individual sample expression files.
It was designed for the genes.fpkm_tracking and isoforms.fpkm_tracking files produced by Cufflinks, but it can summarize in one table any group of expression files that share a common id column and contain expression values.

Version | 1.1 |
---|---|
Bundle | sequencing |
Categories | Expression |
Authors | Alejandra Cervera (alejandra.cervera@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml function.scala |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
array | Array<BinaryFile> | Mandatory | The key column is used for naming the samples. The expression files should all have a matching id column with the gene or transcript id and a column with the expression values that are going to be included in the final expression table. |
auxiliary | CSV | Optional | If given, contains one column (see "matchColAux") whose values are matched to a column in the input files (see "matchCol"). |
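The way the inputs combine can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the component's actual code: `build_table`, its argument names, and the choice to represent a missing value as `None` are all assumptions made for the example.

```python
def build_table(samples, ids_col="tracking_id", value_col="FPKM",
                match_col="", keep=None):
    """Merge per-sample rows into one table keyed on ids_col.

    samples: dict mapping a sample name (the array key) to a list of row dicts.
    keep: optional set of values taken from the auxiliary file.
    """
    names = list(samples)
    merged = {}
    for name, rows in samples.items():
        for row in rows:
            # An empty match_col falls back to the first column of the file.
            column = match_col or next(iter(row))
            if keep is not None and row[column] not in keep:
                continue
            merged.setdefault(row[ids_col], {})[name] = float(row[value_col])
    # Assumption: an id absent from a sample is reported as None (NA).
    return {gid: {n: vals.get(n) for n in names} for gid, vals in merged.items()}


samples = {
    "s1": [{"tracking_id": "g1", "FPKM": "2.0"},
           {"tracking_id": "g2", "FPKM": "0.5"}],
    "s2": [{"tracking_id": "g1", "FPKM": "4.0"},
           {"tracking_id": "g3", "FPKM": "1.0"}],
}
table = build_table(samples, keep={"g1", "g2"})
```

Here `g3` is dropped because it is not listed in the hypothetical auxiliary set, and `g2` has no value for sample `s2`.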
Name | Type | Description |
---|---|---|
table | CSV | Expression table that has at least one column with the gene or transcript ids, and expression columns corresponding to several samples. |
log2 | CSV | Expression table that has at least one column with the gene or transcript ids, and expression columns corresponding to several samples with the expression values in log2. |
topHits | CSV | Table with the most highly expressed genes/transcripts in each sample. |
plots | Latex | Density plot, histogram and boxplot of the expression values. |
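As a rough illustration of how the log2 and topHits outputs relate to the main table, the sketch below uses hypothetical helpers `log2_table` and `top_hits`. The pseudocount of 1 added before taking the logarithm is an assumption made so that zero values stay defined; this page does not say how the component handles zeros.

```python
import math


def log2_table(table, pseudocount=1.0):
    """Return a copy of the table with log2(value + pseudocount) per cell."""
    return {
        gid: {s: None if v is None else math.log2(v + pseudocount)
              for s, v in row.items()}
        for gid, row in table.items()
    }


def top_hits(table, n=10):
    """Union of the n most highly expressed ids of every sample."""
    sample_names = sorted({s for row in table.values() for s in row})
    hits = set()
    for s in sample_names:
        ranked = sorted(
            (gid for gid, row in table.items() if row.get(s) is not None),
            key=lambda gid: table[gid][s],
            reverse=True,
        )
        hits.update(ranked[:n])
    return sorted(hits)


table = {
    "g1": {"s1": 3.0, "s2": 0.0},
    "g2": {"s1": 1.0, "s2": 7.0},
    "g3": {"s1": 0.0, "s2": 1.0},
}
logged = log2_table(table)
hits = top_hits(table, n=1)
```

With `n=1` the result still contains two ids, because each sample contributes its own top hit.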
Name | Type | Default | Description |
---|---|---|---|
collapseNumeric | string | "consensus" | If the ids are not unique, the values are collapsed. Options are "median" (take the median of non-NA values), "mean", "sum", "max", "min", "first" (take the first row), "majority" (take the value that is present on the largest number of rows), "consensus" (require that all rows have the same value) and "indicator". |
extraCols | string | "gene_id,gene_short_name" | Other columns that will be included in the expression table, for example the gene id and gene short name corresponding to the transcript or gene referred to in the idsCol. |
filter | string | "FPKM_status=OK" | Filters out expression values based on the value in a different column. For example, in Cufflinks expression files the FPKM status column can be used to decide whether the FPKM value is reliable. The default behavior is to keep only the values that have an "OK" status. If no filtering is desired, set it to "". |
highBound | string | "" | Equivalent to highBound in CSVFilter |
idsCol | string | "tracking_id" | The name of the column that has the gene, transcript or exon id. The default matches the expression files produced by Cufflinks. |
log2transform | boolean | true | Output an expression table with log2-transformed values. Log2 values are needed for generating the stats, so if set to false, the stats are not produced either. |
lowBound | string | "" | Equivalent to lowBound in CSVFilter |
matchCol | string | "" | Column name in the input files that is matched to the "matchColAux" for subsetting the expression table. If empty, the first column of the input files is used. |
matchColAux | string | "" | Column name in "auxiliary" containing values that must match the "matchCol" column in input files. If empty, the first column of "auxiliary" is used. |
numberTopHits | int | 10 | Number of top hits to report for each sample. The top hits of every sample are included, so the final list can be much longer than this number. |
valueCols | string | "FPKM" | The name of the column that contains the expression values. The default is from Cufflinks' expression files. |
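The `filter` and `collapseNumeric` parameters can be pictured with the following sketch. The helper names `passes` and `collapse` are hypothetical, and two details are assumptions rather than documented behaviour: a failed "consensus" returns `None`, and "indicator" is left out because its semantics are not described on this page.

```python
from collections import Counter
from statistics import mean, median


def passes(row, spec="FPKM_status=OK"):
    """Check a row against a 'column=value' spec; "" disables filtering."""
    if not spec:
        return True
    column, _, wanted = spec.partition("=")
    return row.get(column) == wanted


def collapse(values, mode="consensus"):
    """Collapse the values of rows that share an id into a single value."""
    if mode == "first":
        return values[0] if values else None
    vals = [v for v in values if v is not None]  # ignore NA values
    if not vals:
        return None
    simple = {"median": median, "mean": mean, "sum": sum, "max": max, "min": min}
    if mode in simple:
        return simple[mode](vals)
    if mode == "majority":
        return Counter(vals).most_common(1)[0][0]
    if mode == "consensus":
        # Assumption: disagreement between rows yields None (NA).
        return vals[0] if len(set(vals)) == 1 else None
    raise ValueError(f"mode not covered by this sketch: {mode}")


rows = [
    {"tracking_id": "g1", "FPKM": 1.0, "FPKM_status": "OK"},
    {"tracking_id": "g1", "FPKM": 3.0, "FPKM_status": "OK"},
    {"tracking_id": "g1", "FPKM": 9.0, "FPKM_status": "LOWDATA"},
]
kept = [r["FPKM"] for r in rows if passes(r)]
collapsed = collapse(kept, "mean")
```

In this example the LOWDATA row is filtered out first, and the two remaining values for `g1` are then collapsed with "mean".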
Test case | Parameters | IN array | IN auxiliary | OUT table | OUT log2 | OUT topHits | OUT plots |
---|---|---|---|---|---|---|---|
case1 | properties | array | (missing) | (missing) | (missing) | (missing) | (missing) |