Clusters data using the PhenoGraph clustering method.
PhenoGraph is a clustering method designed for high-dimensional single-cell data. It works by creating a graph ("network") representing phenotypic similarities between cells and then identifying communities in this graph.
https://github.com/jacoblevine/PhenoGraph
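
The graph-building idea described above can be illustrated with a small, self-contained sketch. This is not the component's or the PhenoGraph library's code: the helper names `knn` and `jaccard_graph` and the toy points are hypothetical, and a brute-force Euclidean neighbor search stands in for whatever search the real implementation uses. It only shows how a k-nearest-neighbor graph can be weighted by the Jaccard overlap of two neighborhoods, so that cells in the same dense group end up strongly connected.

```python
import math


def knn(points, k):
    """Return, for each point, the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute-force search; ties are broken by the lower index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def jaccard_graph(neighbors):
    """Weight each directed edge i -> j by the Jaccard overlap of both neighborhoods."""
    sets = [set(n) for n in neighbors]
    edges = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            shared = sets[i] & sets[j]
            union = sets[i] | sets[j]
            edges[(i, j)] = len(shared) / len(union)
    return edges


# Toy data (made up for illustration): two well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
neighbors = knn(points, k=2)
edges = jaccard_graph(neighbors)
```

In this toy example every edge stays inside one of the two groups, which is the structure a community detection step would then pick up as two clusters.
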
Version | 1.0 |
---|---|
Bundle | tools |
Categories | Clustering |
Authors | Ville Rantanen (ville.rantanen@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | git (DEB) ; setuptools (python3) ; installer (bash) |
Source files | component.xml phenoclus.py |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
in | CSV | Mandatory | Data to cluster. |
Name | Type | Description |
---|---|---|
out | CSV | A CSV with two columns: the column named by idColumn and "clusterId". The column "clusterId" contains the computed cluster index for each sample as an integer. A clusterId of -1 marks the sample as an outlier. |
graph | CSV | A CSV |
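
As a hedged illustration of how the "out" port might be consumed downstream, the sketch below groups row identifiers by "clusterId" and collects outliers (clusterId -1) separately. The function name `group_by_cluster`, the sample rows, and the tab delimiter are assumptions made for this example; adjust the delimiter and the identifier column to match the actual file.

```python
import csv
import io
from collections import defaultdict

# Made-up sample resembling the "out" port when idColumn is empty,
# so the identifier column is called "index". Tab delimiter is an assumption.
sample_out = "index\tclusterId\n1\t0\n2\t0\n3\t1\n4\t-1\n"


def group_by_cluster(handle, id_column="index", delimiter="\t"):
    """Return ({clusterId: [ids]}, [outlier ids]) from an open CSV handle."""
    clusters = defaultdict(list)
    outliers = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        cluster_id = int(row["clusterId"])
        if cluster_id == -1:
            # -1 is the outlier label described for the "out" port.
            outliers.append(row[id_column])
        else:
            clusters[cluster_id].append(row[id_column])
    return dict(clusters), outliers


clusters, outliers = group_by_cluster(io.StringIO(sample_out))
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.
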
Name | Type | Default | Description |
---|---|---|---|
columnsToRemove | string | "" | Comma separated list of names of columns not to be used in clustering. Useful if you want to ignore some attribute in the data while clustering. |
directed | boolean | false | Whether to use a symmetric (default) or asymmetric ("directed") graph. The graph construction process produces a directed graph, which is symmetrized by one of two methods (see prune). |
idColumn | string | "" | Name of a column that contains a unique identifier for each row in the input data. This column is not included in the clustering but is copied to the output. Use an empty string if the input does not have such a column. If this string is empty the output will have a column called "index" which has a running index number for each row of input (starting from 1). |
jaccard | boolean | true | If true, use Jaccard metric between k-neighborhoods to build graph. If false, use a Gaussian kernel. |
k | int | 30 | Number of nearest neighbors to use in first step of graph construction. |
louvainTimeLimit | int | 2000 | Maximum number of seconds to run modularity optimization. If the limit is exceeded, the best result so far is returned. |
metric | string | "euclidean" | Distance metric used to define nearest neighbors. Options include: euclidean, manhattan, correlation, cosine. |
minClusterSize | int | 10 | Cells that end up in a cluster smaller than minClusterSize are considered outliers and are assigned the cluster label -1. |
nJobs | int | -1 | Nearest neighbors and Jaccard coefficients are computed in parallel using nJobs jobs. If nJobs is -1, the number of jobs is determined automatically. |
prune | boolean | false | Whether to symmetrize by taking the average (prune=false) or the product (prune=true) between the graph and its transpose. |
qTol | float | 0.001 | Tolerance (i.e., precision) for monitoring modularity optimization. |
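
The effect of prune and minClusterSize can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the component's implementation: the helpers `symmetrize` and `mark_small_clusters` are hypothetical, the graph is a plain dictionary of directed edge weights, and the numbers are made up. Averaging keeps an edge that exists in only one direction (at half weight), while the product keeps only edges present in both directions.

```python
from collections import Counter


def symmetrize(weights, prune=False):
    """Symmetrize directed weights {(i, j): w}; result is keyed by (min, max)."""
    result = {}
    for (i, j), w in weights.items():
        key = (min(i, j), max(i, j))
        if key in result:
            continue  # the reverse edge was already handled
        back = weights.get((j, i), 0.0)  # weight of the transposed edge, 0 if absent
        value = w * back if prune else (w + back) / 2
        if value > 0:
            result[key] = value
    return result


def mark_small_clusters(labels, min_cluster_size):
    """Relabel members of clusters smaller than min_cluster_size as -1."""
    sizes = Counter(labels)
    return [lab if sizes[lab] >= min_cluster_size else -1 for lab in labels]


# Made-up directed graph: 0 <-> 1 is mutual, 1 -> 2 is one-way.
directed = {(0, 1): 0.5, (1, 0): 0.3, (1, 2): 0.4}
averaged = symmetrize(directed, prune=False)
pruned = symmetrize(directed, prune=True)
relabeled = mark_small_clusters([0, 0, 0, 1, 1, 2], min_cluster_size=2)
```

Here the one-way edge between 1 and 2 survives averaging but is removed by the product, and the single member of cluster 2 is relabeled as an outlier.
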
Test case | Parameters | IN in | OUT out | OUT graph |
---|---|---|---|---|
case1 | properties | in | out | graph |
idColumn=, |