
TreeSplitter

Splits the leaves of a tree into two sets such that the mutual information between the split and the leaf annotations is maximized.
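The idea can be illustrated with a minimal Python sketch. It is not the code in treesplitter.py: it assumes that each candidate split is "leaves below a given node versus all other leaves", and it uses a plain child-list dictionary instead of a networkx graph. The names mutual_information, leaves_under and best_split are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(split, annotations):
    """Mutual information (in bits) between two equal-length label sequences."""
    n = len(split)
    joint = Counter(zip(split, annotations))
    split_counts = Counter(split)
    annot_counts = Counter(annotations)
    mi = 0.0
    for (s, a), count in joint.items():
        # p(s, a) * log2(p(s, a) / (p(s) * p(a))), written with raw counts
        mi += (count / n) * log2(count * n / (split_counts[s] * annot_counts[a]))
    return mi


def leaves_under(children, node):
    """Set of leaves in the subtree rooted at node (children: node -> child list)."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    leaves = set()
    for kid in kids:
        leaves |= leaves_under(children, kid)
    return leaves


def best_split(children, root, annotation):
    """Assumed split rule: pick the non-root node whose subtree gives the highest MI."""
    all_leaves = sorted(leaves_under(children, root))
    labels = [annotation[leaf] for leaf in all_leaves]
    candidates = sorted({kid for kids in children.values() for kid in kids})
    best_node, best_mi = None, -1.0
    for node in candidates:
        inside = leaves_under(children, node)
        split = [leaf in inside for leaf in all_leaves]
        mi = mutual_information(split, labels)
        if mi > best_mi:  # ties keep the first candidate in sorted order
            best_node, best_mi = node, mi
    return best_node, best_mi


# Hypothetical toy tree: root -> A, B; A -> l1, l2; B -> l3, l4
children = {"root": ["A", "B"], "A": ["l1", "l2"], "B": ["l3", "l4"]}
annotation = {"l1": "a", "l2": "a", "l3": "b", "l4": "b"}
print(best_split(children, "root", annotation))  # ('A', 1.0)
```

In this toy tree, cutting at A separates the two annotation values perfectly, so the split carries one full bit of information about the annotation.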

Version 1.02
Bundle tools
Categories Graph
Specialties generic
Authors Mikko Kivelä (bolozna@gmail.com)
Requires Python; networkx (python)
Source files component.xml treesplitter.py

Type parameters (generics)

Inputs

Name Type Mandatory Description
tree InGraph (generic) Mandatory The tree
nodeAnnotations AnnotationTable Optional ---

Outputs

Name Type Description
newTree OutGraph (generic) The tree with the splits.
splits SetList The split category for each node.

Parameters

Name Type Default Description
MIThreshold float 0.2 Mutual information threshold to be used when the split method is iterative.
allowMissingLeafs boolean false By default, every leaf of the tree must have a row in the nodeAnnotations table. If this parameter is set to true, leaves may be missing from nodeAnnotations, and all annotations for those leaves are treated as missing data.
annotationColumnTypes string "guess" The type of the columns in the nodeAnnotations data, e.g. 'str', 'int', 'float' or 'bool'.
data_label string (no default) The label of the leaf data that is used for deciding the split. Multiple data labels can be provided as a comma-separated list.
inputTreeDataType string "graphML" Only 'graphML' for now.
localMI boolean false If true, the random node can only be selected from the subtree spanned by the parent of the split node. Currently only supported for splitMethod=single.
node_label string "label" The node label in the input tree, which is matched to the nodeAnnotations node label.
outputTreeDataType string "graphML" Only 'graphML' for now.
pThreshold float 0.01 P-value threshold to be used when the split method is iterative.
randomizations int 0 The number of times the data is shuffled in order to calculate the p-value for the mutual information. If 0, no randomizations are done.
splitMethod string "single" How to make the split. Currently supported: single or iterative.
useMissingData boolean true If true, missing data (NA values) is also used when calculating the MI, so missing data can itself be enriched in branches of the tree. If false, the random process that selects a node uniformly at random, which is used to define the MI, is modified so that a node with missing data cannot be selected.
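The randomizations and useMissingData parameters can be illustrated with a short Python sketch. It makes two assumptions that this page does not confirm: that the p-value is the fraction of shuffled annotation sets whose MI is at least as large as the observed MI, and that excluding nodes with missing data amounts to dropping those leaves before the MI is computed. The function names, and the use of None for NA, are hypothetical.

```python
import random
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length label sequences."""
    n = len(xs)
    joint, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (cx[x] * cy[y])) for (x, y), c in joint.items())


def permutation_p_value(split, annotations, randomizations, seed=0):
    """Empirical p-value from shuffling annotations; None if randomizations is 0."""
    if randomizations == 0:
        return None  # mirrors "If 0, no randomizations are done"
    rng = random.Random(seed)
    observed = mutual_information(split, annotations)
    shuffled = list(annotations)
    hits = 0
    for _ in range(randomizations):
        rng.shuffle(shuffled)
        # small tolerance so that exact ties count despite floating-point noise
        if mutual_information(split, shuffled) >= observed - 1e-12:
            hits += 1
    return hits / randomizations


def drop_missing(split, annotations):
    """Assumed useMissingData=false behaviour: ignore leaves annotated with None."""
    kept = [(s, a) for s, a in zip(split, annotations) if a is not None]
    return [s for s, _ in kept], [a for _, a in kept]


# Hypothetical data: eight leaves, perfectly separated by the split
split = [0, 0, 0, 0, 1, 1, 1, 1]
annotations = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(permutation_p_value(split, annotations, randomizations=1000))

# With one NA leaf excluded, the MI is computed over the remaining leaves only
s, a = drop_missing([0, 0, 1, 1], ["a", None, "b", "b"])
print(mutual_information(s, a))
```

A larger number of randomizations gives a finer-grained p-value estimate at the cost of one MI evaluation per shuffle.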

Test cases

Test case Parameters IN tree IN nodeAnnotations OUT newTree OUT splits
case1 properties tree nodeAnnotations (missing) splits

data_label=Data

case2_missing_data properties tree nodeAnnotations (missing) splits

data_label=Data,
useMissingData=false

case3_multiple_data_labels properties tree nodeAnnotations (missing) splits

data_label=Data,Data2

case4_iterative_splitting properties tree nodeAnnotations (missing) splits

data_label=Data,
splitMethod=iterative,
pThreshold=1.0

case5_missing_rows_in_annotation_file properties tree nodeAnnotations (missing) splits

data_label=Data,
allowMissingLeafs=true,
useMissingData=false

case6_local_mi properties tree nodeAnnotations (missing) splits

data_label=Data,
localMI=true


Generated 2019-02-08 07:42:20 by Anduril 2.0.0