Biclustering of gene expression data using the BackSPIN algorithm
Version | 1.0 |
---|---|
Bundle | tools |
Categories | Clustering |
Authors | Antti Hakkinen (antti.e.hakkinen@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | python ; installer (bash) |
Source files | component.xml backspin.py |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
in | CSV | Mandatory | The expression file to be clustered. Rows are genes and columns are samples (cells). |
Name | Type | Description |
---|---|---|
rowClusts | CSV | Clustering of the input rows. Rows represent the genes (original rows), columns the clustering depths, and values the cluster labels. |
colClusts | CSV | Clustering of the input columns. Rows represent the cells (original columns), columns the clustering depths, and values the cluster labels. |
permutedInput | CSV | Permuted and, if feature selection is used, filtered copy of the input. |
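
As a rough illustration of how the cluster tables might be consumed, the sketch below groups row names by their cluster label at one clustering depth. The file layout (a header row, names in the first column, one column per depth), the comma delimiter, the helper name `clusters_at_depth`, and the sample data are all assumptions made for the example; check the actual output files before relying on them.

```python
import csv
import io
from collections import defaultdict


def clusters_at_depth(handle, depth, delimiter=","):
    """Group row names by cluster label at one depth (hypothetical helper).

    Assumes the first column holds the gene/cell name and that column
    `depth` (1-based) holds the label for that clustering depth.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row
    groups = defaultdict(list)
    for row in reader:
        groups[row[depth]].append(row[0])
    return dict(groups)


# Made-up data standing in for a rowClusts file with two depths.
example = io.StringIO(
    "gene,level1,level2\n"
    "geneA,0,0\n"
    "geneB,0,1\n"
    "geneC,1,2\n"
)
result = clusters_at_depth(example, depth=2)
```

The same helper would apply to colClusts under the same layout assumptions; pass a different `delimiter` if the files turn out to be tab-separated.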
Name | Type | Default | Description |
---|---|---|---|
feature_fit | boolean | true | Feature selection is performed before BackSPIN. Selection is based on expected noise (a curve fit to the CV-vs-mean plot). |
feature_genes | int | 2000 | Number of genes selected as features. |
first_run_iters | int | 10 | Number of iterations of preparatory SPIN |
first_run_step | float | 0.1 | Controls the decrease rate of the width parameter used in the preparatory SPIN. Smaller values will increase the number of SPIN iterations and result in higher precision in the first step but longer execution time. |
low_thrs | float | 0.2 | If the difference between the average expression of two groups is lower than this threshold, the algorithm uses highly correlated genes to assign the gene to one of the two groups. |
normal_spin | boolean | false | Run normal SPIN instead of BackSPIN. Normal SPIN respects the parameters "runs_iters" and "runs_step". |
normal_spin_axis | string | "both" | Axis to sort in normal SPIN: 0 (or "genes") sorts only genes (rows), 1 (or "cells") sorts only cells (columns), and "both" sorts both. |
numLevels | int | 2 | Depth/Number of levels: The number of nested splits that will be tried by the algorithm |
preprocess | boolean | false | Transform the input data with log2(x+1) and subtract the mean gene expression, as the BackSPIN script always does. |
runs_iters | int | 8 | Number of iterations used for every width parameter. Does not apply to the first run (use "first_run_iters" instead). |
runs_step | float | 0.3 | Controls the decrease rate of the width parameter. Smaller values increase the number of SPIN iterations and give higher precision but longer execution time. Does not apply to the first run (use "first_run_step" instead). |
seed | int | 12345 | Seed for the pseudorandom number generator |
split_limit_c | int | 2 | Minimal number of cells that a group must contain for splitting to be allowed. |
split_limit_g | int | 2 | Minimal number of genes that a group must contain for splitting to be allowed. |
stop_const | float | 1.15 | Minimum score that a breaking point has to reach to be suitable for splitting. |
verbose | boolean | false | Print extra details of what is happening to standard output. |
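
The transform described for the "preprocess" parameter can be sketched as follows. This is a minimal illustration, not the component's code: it assumes rows are genes, that "mean gene expression" means each gene's own mean across cells, and that the mean is taken after the log transform. The function name and the sample matrix are made up for the example.

```python
import math


def preprocess(matrix):
    """Apply log2(x+1), then subtract each row's mean.

    Assumption: each row is one gene, so the row mean is that gene's
    mean (log-transformed) expression across cells.
    """
    centered = []
    for row in matrix:
        logged = [math.log2(x + 1) for x in row]
        mean = sum(logged) / len(logged)
        centered.append([value - mean for value in logged])
    return centered


# Two made-up genes measured in three cells.
centered = preprocess([[0, 1, 3], [7, 7, 7]])
```

After this step every gene has zero mean, so a gene with constant expression (the second row here) carries no signal for the sorting.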
Test case | Parameters | IN in | OUT rowClusts | OUT colClusts | OUT permutedInput |
---|---|---|---|---|---|
case-example | properties (seed=123) | (missing) | rowClusts | colClusts | permutedInput |
case-small | (missing) | in | rowClusts | colClusts | permutedInput |