Divides the rows of a matrix into two sets while balancing the occurrence of the unique labels found in a class column. Rows not placed in the balanced set are saved to the remainder output.
Useful for dividing data into training and testing sets. The remainder can be balanced further into evaluation and test sets by running the same tool again.
Version | 1.0 |
---|---|
Bundle | tools |
Categories | Classification |
Authors | Ville Rantanen (ville.rantanen@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | R |
Source files | component.xml SampleBalancer.r |
Usage | Example with default values |

Inputs

Name | Type | Mandatory | Description |
---|---|---|---|
in | CSV | Mandatory | Expression matrix. |

Outputs

Name | Type | Description |
---|---|---|
balanced | CSV | Table with balanced classes |
remainder | CSV | The rest of the data |

Parameters

Name | Type | Default | Description |
---|---|---|---|
classCol | string | (no default) | Name of the column with class information |
ratio | float | 0.5 | Fraction of samples assigned to the training set. Defaults to 0.5 (half). |
seed | int | 20151208 | Seed for randomization |
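
The way the three parameters interact can be illustrated with a minimal sketch. It is written in Python for illustration only; the component itself is implemented in R (SampleBalancer.r), and its exact behaviour may differ. The function name `balanced_split`, the toy data, and the assumption that `ratio` is applied to each class separately and rounded down are all hypothetical and not taken from the component source.

```python
import random
from collections import defaultdict


def balanced_split(rows, class_col, ratio=0.5, seed=20151208):
    """Split rows into (balanced, remainder), sampling each class separately.

    Assumption: `ratio` is applied per class and the count is rounded down;
    the real component may round or balance differently.
    """
    rng = random.Random(seed)  # local generator, so the split is reproducible
    by_class = defaultdict(list)
    for index, row in enumerate(rows):
        by_class[row[class_col]].append(index)

    chosen = set()
    for label in sorted(by_class):  # fixed label order keeps results stable
        indices = by_class[label]
        take = int(len(indices) * ratio)
        chosen.update(rng.sample(indices, take))

    # Preserve the original row order in both outputs.
    balanced = [row for i, row in enumerate(rows) if i in chosen]
    remainder = [row for i, row in enumerate(rows) if i not in chosen]
    return balanced, remainder


# Toy data: 6 rows of class "A" and 4 rows of class "B".
rows = [{"Sample": f"s{i}", "Class": "A" if i < 6 else "B"} for i in range(10)]
balanced, remainder = balanced_split(rows, class_col="Class", ratio=0.5)
# Chaining: split the remainder again, e.g. into evaluation and test sets.
evaluation, test = balanced_split(remainder, class_col="Class", ratio=0.5)
```

Under these assumptions, the toy data yields a balanced set with three "A" rows and two "B" rows, and calling the function again with the same seed returns the same split.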

Test cases

Test case | Parameters | IN in | OUT balanced | OUT remainder |
---|---|---|---|---|
case1 | properties: classCol=Class | in | balanced | remainder |