

Calculates a score matrix for a dichotomously classified set of samples. Each sample is represented by its class (0 or 1, e.g. cancer or not) and by a gene expression ranking of size G. Each row of the ranks input holds the expression ranks of one gene across the samples. The only requirement for the ranks is that they are comparable; lower ranks are interpreted as more highly expressed. The output is a weighted vote over ordered candidate gene pairs: a matrix A whose element A(i, j) corresponds to the gene pair (i, j). The score is larger in magnitude for gene pairs whose relative expression separates the classes well, that is, pairs whose rank order is consistent across the samples of one class and differs in the other class. If the ranks appear independent of sample class, the score is close to zero. If the rank order depends on sample class, the score approaches 1 or -1, depending on whether the ordered gene pair in the matrix indicates class 0 or class 1, respectively. Based on the article "Merging microarray data from separate breast cancer studies provides a robust prognostic test" by Lei Xu, Aik Choon Tan, Raimond L Winslow and Donald Geman.
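The scoring idea described above can be illustrated with a short Python sketch. It assumes that A(i, j) is the fraction of class 0 samples in which gene i has a lower rank than gene j, minus the same fraction for class 1. That formula is one reading of the description, not the code in rank_score.py. The function name pair_scores and the array layout are hypothetical, NA values are not handled, and both classes are assumed to contain at least one sample.

```python
import numpy as np

def pair_scores(ranks, classes):
    """Signed pair scores (illustrative sketch, not the component's code).

    ranks   -- array of shape (G, S): rank of each gene in each sample,
               lower rank = more expressed
    classes -- length-S sequence of 0/1 class labels
    Returns a (G, G) matrix A with values in [-1, 1].
    """
    ranks = np.asarray(ranks, dtype=float)
    classes = np.asarray(classes)
    # less[i, j, s] is True when gene i has a lower rank than gene j in sample s
    less = ranks[:, None, :] < ranks[None, :, :]
    # Per-class fraction of samples showing the ordering i < j
    p0 = less[:, :, classes == 0].mean(axis=2)
    p1 = less[:, :, classes == 1].mean(axis=2)
    # Positive values indicate class 0, negative values class 1
    return p0 - p1

# Genes 0 and 1 swap order between the classes; gene 2 is always ranked last.
example = pair_scores([[1, 1, 2, 2],
                       [2, 2, 1, 1],
                       [3, 3, 3, 3]], [0, 0, 1, 1])
```

In this toy input the pair (0, 1) scores 1 and the pair (1, 0) scores -1, while the pair (0, 2) scores 0 because its ordering is the same in both classes.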

Version 1.0
Bundle microarray
Authors Lauri Lyly (lauri.lyly@helsinki.fi)
Issue tracker View/Report issues
Source files component.xml rank_score.py
Usage Example with default values


Inputs

Name Type Mandatory Description
inClasses CSV Mandatory Sample classes. The method supports only two classes. Each row gives the class of one sample, either 0 or 1.
ranks CSV Mandatory Gene ranks by sample. The first column lists the gene IDs; the remaining columns are interpreted as samples. Each row holds the ranks of one gene across the samples. A lower rank is interpreted as more highly expressed, though this only affects the sign of the result.
genes CSV Optional Genes of interest; all genes are used if not specified. The names must exactly match those in the first column of the ranks CSV file.
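To make the ranks layout and the optional gene filter concrete, here is a minimal Python sketch that parses a ranks table and keeps only the genes of interest. The function name load_ranks, the tab delimiter, the header row, and the literal "NA" for missing values are all assumptions made for illustration; the component's actual parsing may differ.

```python
import csv
import io

def load_ranks(text, genes=None, delimiter="\t"):
    """Parse a ranks table (hypothetical layout: header row, gene ID first).

    Returns (sample_names, {gene_id: [rank per sample]}).
    If genes is given, only rows whose ID matches exactly are kept.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]                 # every column after the first is a sample
    wanted = None if genes is None else set(genes)
    rows = {}
    for row in reader:
        gene_id, values = row[0], row[1:]
        if wanted is not None and gene_id not in wanted:
            continue                     # exact match against the first column
        # Assumed convention: the string "NA" marks a missing rank
        rows[gene_id] = [float("nan") if v == "NA" else float(v) for v in values]
    return samples, rows

table = "gene\tS1\tS2\nG1\t1\t2\nG2\t2\tNA\nG3\t3\t1\n"
samples, ranks = load_ranks(table, genes=["G1", "G3"])
```

Because matching is exact, a gene listed as "g1" would not select the row "G1", which mirrors the requirement stated for the genes input.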


Outputs

Name Type Description
scores CSV Score matrix
outClasses CSV Class names, in the order that corresponds to the sign of the scores matrix


Parameters

Name Type Default Description
count_na boolean true The score function uses a normalization constant, the number of samples in a given class, to convert the number of samples with a specific gene pair ordering into a probability measure. There are two basic possibilities: count all samples (true), or count only those samples that actually have a value other than NA for one gene or the other (false). This determines whether genes with only a few measurements can ever get a high score: if the NA samples are counted for a gene pair, the normalization constant stays high and the probability stays close to zero. On the other hand, if only one class has samples for a specific gene, that could tilt the results in favor of that class if this parameter is not set to true. The default is true because it is hard to argue that genes without measurements should be considered important when there is, e.g., just one "positive" measurement for them.
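The effect of count_na on the normalization can be sketched for a single gene pair as follows. This is an illustration under stated assumptions, not the component's implementation: the function name pair_score_with_na is hypothetical, NA is represented as NaN, a sample counts as measured only when both genes have a value, and the score is taken as the class 0 fraction minus the class 1 fraction.

```python
import numpy as np

def pair_score_with_na(ri, rj, classes, count_na=True):
    """Signed score for one gene pair with missing ranks (illustrative sketch)."""
    ri = np.asarray(ri, dtype=float)
    rj = np.asarray(rj, dtype=float)
    classes = np.asarray(classes)
    # Assumption: a sample is "measured" only if both genes have a rank
    measured = ~np.isnan(ri) & ~np.isnan(rj)
    less = np.zeros(ri.shape, dtype=bool)
    less[measured] = ri[measured] < rj[measured]
    fractions = []
    for c in (0, 1):
        in_class = classes == c
        # count_na=True: normalize by all samples of the class
        # count_na=False: normalize by the measured samples of the class only
        denom = in_class.sum() if count_na else (in_class & measured).sum()
        fractions.append((less & in_class).sum() / denom if denom else 0.0)
    return fractions[0] - fractions[1]

# Class 0 has a single measured sample for this pair; class 1 is fully measured.
nan = float("nan")
ri = [1, nan, nan, nan, 2, 2, 2, 2]
rj = [2, nan, nan, nan, 1, 1, 1, 1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
with_na = pair_score_with_na(ri, rj, labels, count_na=True)      # 1/4 - 0
without_na = pair_score_with_na(ri, rj, labels, count_na=False)  # 1/1 - 0
```

With count_na set to true the single measurement yields a score of 0.25, whereas with false it yields the maximum score of 1.0, which is the situation the parameter description warns about.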

Test cases

Test case Parameters IN
case1 (missing) inClasses ranks (missing) (missing) (missing)
case2 (missing) inClasses ranks genes (missing) (missing)
case3 properties inClasses ranks (missing) (missing) (missing)
case4 properties inClasses ranks genes (missing) (missing)


Generated 2019-02-08 07:42:10 by Anduril 2.0.0