Splits the leaves of a tree into two sets such that the mutual information between the split and the leaf annotations is maximized.
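As an illustration of the idea, here is a minimal sketch of choosing a split by mutual information. It is not the code in treesplitter.py: the function names (`mutual_information`, `leaves_under`, `best_split`), the dictionary-based tree representation, and the toy data are all assumptions made for this example. Each non-root node is tried as a split point, its leaves forming one set and all remaining leaves the other.

```python
from collections import Counter
from math import log2


def mutual_information(split, labels):
    """Mutual information in bits between a two-way split and leaf labels.

    split  : dict mapping leaf -> True/False (which side of the split)
    labels : dict mapping leaf -> annotation value
    """
    n = len(labels)
    joint = Counter((split[leaf], labels[leaf]) for leaf in labels)
    side_counts = Counter(split[leaf] for leaf in labels)
    label_counts = Counter(labels.values())
    return sum(
        (c / n) * log2(c * n / (side_counts[s] * label_counts[l]))
        for (s, l), c in joint.items()
    )


def leaves_under(children, node):
    """Return the set of leaves in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= leaves_under(children, kid)
    return found


def best_split(children, labels):
    """Try every non-root node as the split point and keep the highest MI."""
    best_node, best_mi = None, -1.0
    for kids in children.values():
        for node in kids:
            inside = leaves_under(children, node)
            split = {leaf: leaf in inside for leaf in labels}
            mi = mutual_information(split, labels)
            if mi > best_mi:  # ties keep the first node encountered
                best_node, best_mi = node, mi
    return best_node, best_mi


# Hypothetical toy tree: root -> a, b; a -> l1, l2; b -> l3, l4
children = {"root": ["a", "b"], "a": ["l1", "l2"], "b": ["l3", "l4"]}
labels = {"l1": "x", "l2": "x", "l3": "y", "l4": "y"}
node, mi = best_split(children, labels)
print(node, round(mi, 3))
```

In this toy tree, splitting at `a` (or equivalently at `b`) separates the two annotation values perfectly, so the split carries one bit of information about the annotation. Splitting at a single leaf carries less.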
Version | 1.02 |
---|---|
Bundle | tools |
Categories | Graph |
Specialties | generic |
Authors | Mikko Kivelä (bolozna@gmail.com) |
Issue tracker | View/Report issues |
Requires | Python ; networkx (python) |
Source files | component.xml treesplitter.py |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
tree | InGraph (generic) | Mandatory | The tree |
nodeAnnotations | AnnotationTable | Optional | --- |
Name | Type | Description |
---|---|---|
newTree | OutGraph (generic) | The tree with the splits. |
splits | SetList | The split category for each node. |
Name | Type | Default | Description |
---|---|---|---|
MIThreshold | float | 0.2 | Mutual information threshold to be used when the split method is iterative. |
allowMissingLeafs | boolean | false | By default, every leaf of the tree must have a row in the nodeAnnotations table. If this parameter is set to true, leaves may be missing from nodeAnnotations, and all annotations for those leaves are treated as missing data. |
annotationColumnTypes | string | "guess" | The type of the columns in the nodeAnnotations data, e.g. 'str', 'int', 'float' or 'bool'. |
data_label | string | (no default) | The label of the leaf data that is used to decide the split. Multiple data labels can be provided as a comma-separated list. |
inputTreeDataType | string | "graphML" | Only 'graphML' for now. |
localMI | boolean | false | If true, the random node can only be selected from the subtree spanned by the parent of the split node. Currently only supported for splitMethod=single. |
node_label | string | "label" | The node label in the input tree, which is matched to the nodeAnnotations node label. |
outputTreeDataType | string | "graphML" | Only 'graphML' for now. |
pThreshold | float | 0.01 | P-value threshold to be used when the split method is iterative. |
randomizations | int | 0 | The number of times the data is shuffled in order to calculate the p-value for the mutual information. If 0, no randomizations are done. |
splitMethod | string | "single" | How to make the split. Currently supported: single or iterative. |
useMissingData | boolean | true | If true, missing data (NA values) is also used when calculating the MI, so missing data can be enriched in branches of the tree. If false, the random process that selects a node uniformly at random, and which is used to define the MI, is modified so that a node with missing data cannot be selected. |
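The `randomizations` and `useMissingData` parameters can be pictured with a generic permutation test. The sketch below is an assumption about how such a p-value might be computed, not the component's actual implementation: the function names, the `seed` argument, and the use of `None` for NA values are all hypothetical. The annotation values are shuffled among the leaves, and the p-value is the fraction of shuffles whose MI is at least as large as the observed one.

```python
import random
from collections import Counter
from math import log2


def mutual_information(sides, labels):
    """MI in bits between split sides and labels (two parallel lists)."""
    n = len(labels)
    joint = Counter(zip(sides, labels))
    side_counts = Counter(sides)
    label_counts = Counter(labels)
    return sum(
        (c / n) * log2(c * n / (side_counts[s] * label_counts[l]))
        for (s, l), c in joint.items()
    )


def drop_missing(sides, labels):
    """Remove leaves whose label is None (assumed stand-in for NA)."""
    kept = [(s, l) for s, l in zip(sides, labels) if l is not None]
    return [s for s, _ in kept], [l for _, l in kept]


def permutation_p_value(sides, labels, randomizations, use_missing=True, seed=0):
    """Fraction of label shuffles with MI >= the observed MI.

    Returns None when randomizations is 0, mirroring "no randomizations
    are done". With use_missing=False, leaves with missing labels are
    excluded before anything is computed (one possible reading of the
    useMissingData=false behaviour).
    """
    if randomizations == 0:
        return None
    if not use_missing:
        sides, labels = drop_missing(sides, labels)
    rng = random.Random(seed)
    observed = mutual_information(sides, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(randomizations):
        rng.shuffle(shuffled)
        # small tolerance so floating-point noise does not hide exact ties
        if mutual_information(sides, shuffled) >= observed - 1e-12:
            hits += 1
    return hits / randomizations


# Hypothetical data: a split that matches the labels perfectly
sides = [True] * 10 + [False] * 10
labels = ["x"] * 10 + ["y"] * 10
p = permutation_p_value(sides, labels, randomizations=200)
print(p)
```

With `use_missing=True`, `None` is simply treated as one more annotation value, so a branch that collects the missing leaves can still score a high MI. How the component combines the resulting p-value with pThreshold and MIThreshold in the iterative method is not specified here.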
Test case | Parameters | IN tree | IN nodeAnnotations | OUT newTree | OUT splits |
---|---|---|---|---|---|
case1 | properties: data_label=Data | tree | nodeAnnotations | (missing) | splits |
case2_missing_data | properties: data_label=Data, | tree | nodeAnnotations | (missing) | splits |
case3_multiple_data_labels | properties: data_label=Data,Data2 | tree | nodeAnnotations | (missing) | splits |
case4_iterative_splitting | properties: data_label=Data, | tree | nodeAnnotations | (missing) | splits |
case5_missing_rows_in_annotation_file | properties: data_label=Data, | tree | nodeAnnotations | (missing) | splits |
case6_local_mi | properties: data_label=Data, | tree | nodeAnnotations | (missing) | splits |