Clusters data using the clustering methods implemented in Weka. Note that Weka treats columns as variables and rows as samples. For some clustering methods the component gives no warning about incorrect parameters, because Weka itself does not report them (even though it should).
Version | 1.0 |
---|---|
Bundle | tools |
Categories | Clustering |
Authors | Viljami Aittomaki (viljami.aittomaki@helsinki.fi), Sirkku Karinen (sirkku.karinen@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | weka.jar (jar) ; csbl-javatools.jar (jar) |
Source files | component.xml |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
in | CSV | Mandatory | Data to cluster. |
Name | Type | Description |
---|---|---|
out | CSV | A CSV with three columns. The first column is the one named by the parameter idColumn, or "index" if idColumn is an empty string (see the parameter idColumn for details). The column "clusterId" contains the computed cluster index for each sample as an integer, starting from 1. The column "clusterProbs" contains the probabilities of the sample belonging to each cluster as a comma-separated list. |
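
The output layout described above can be read back with a few lines of Python. This is only a minimal sketch, not part of the component: the function name `read_clusters`, the tab delimiter, the sample values, and the assumption that the probabilities in "clusterProbs" are listed in cluster order are all illustrative and should be checked against a real output file.

```python
import csv
import io


def read_clusters(handle, id_column="", delimiter="\t"):
    """Parse the component's output into a list of per-sample dicts.

    The delimiter is an assumption; adjust it to match the actual file.
    """
    # The first column is named by idColumn, or "index" when idColumn is empty.
    first = id_column if id_column else "index"
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        probs = [float(p) for p in record["clusterProbs"].split(",")]
        rows.append({
            "id": record[first],
            "cluster": int(record["clusterId"]),  # cluster indices start from 1
            "probs": probs,
        })
    return rows


# Hypothetical output for three samples and two clusters (values invented).
sample = (
    'id\tclusterId\tclusterProbs\n'
    's1\t1\t"0.9,0.1"\n'
    's2\t2\t"0.2,0.8"\n'
    's3\t1\t"0.75,0.25"\n'
)
clusters = read_clusters(io.StringIO(sample), id_column="id")
for row in clusters:
    # Assumes probabilities are ordered by cluster index, so subtract 1.
    print(row["id"], row["cluster"], row["probs"][row["cluster"] - 1])
```

When idColumn was left empty, call `read_clusters(handle)` without `id_column` and the "index" column is used as the identifier instead.
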
Name | Type | Default | Description |
---|---|---|---|
columnsToRemove | string | "" | Comma-separated list of the names of columns to exclude from clustering. Useful if you want to ignore some attributes in the data while clustering. |
idColumn | string | "" | Name of a column that contains a unique identifier for each row in the input data. This column is not included in the clustering but is copied to the output. Use an empty string if the input has no such column; the output will then have a column called "index" containing a running index number for each input row (starting from 1). |
method | string | (no default) | Clustering method to use. This should be the name of the corresponding class in Weka without the package prefix (for example SimpleKMeans). See the Weka API for the available methods (i.e. classes). |
wekaParameters | string | "" | Parameters passed to the clustering method. See the Weka API for possible values. |
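
To make the effect of idColumn and columnsToRemove concrete, the following sketch computes which input columns would remain as clustering variables. It is not the component's code: the helper name `clustering_columns`, the example header, and the trimming of whitespace around names are assumptions made for illustration.

```python
def clustering_columns(header, id_column="", columns_to_remove=""):
    """Return the column names left for clustering, in their original order."""
    # Split the comma-separated list; trimming whitespace is an assumption.
    excluded = {name.strip() for name in columns_to_remove.split(",") if name.strip()}
    # The identifier column is copied to the output but not clustered.
    if id_column:
        excluded.add(id_column)
    return [name for name in header if name not in excluded]


# Hypothetical input header: an identifier, a column to ignore, two variables.
header = ["id", "batch", "geneA", "geneB"]
used = clustering_columns(header, id_column="id", columns_to_remove="batch")
print(used)
```

With both parameters left at their default empty strings, every column in the input is treated as a variable.
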
Test case | Parameters | IN in | OUT out |
---|---|---|---|
case1 | properties: idColumn = id, | in | out |