Clusters data using the clustering methods implemented in Weka. Note that Weka treats columns as variables and rows as samples. For some clustering methods, Weka does not report incorrect parameters (even though it should), so the component cannot give a warning about them.

Version 1.0
Bundle tools
Categories Clustering
Authors Viljami Aittomaki (viljami.aittomaki@helsinki.fi), Sirkku Karinen (sirkku.karinen@helsinki.fi)
Requires weka.jar (jar) ; csbl-javatools.jar (jar)
Source files component.xml


Inputs

Name | Type | Mandatory | Description
in | CSV | Mandatory | Data to cluster.


Outputs

Name | Type | Description
out | CSV | A CSV with three columns. The first column is the one named by the parameter idColumn, or "index" if idColumn is an empty string (see the parameter idColumn for details). The column "clusterId" contains the computed cluster index of each sample as an integer, starting from 1. The column "clusterProbs" contains the probabilities of the sample belonging to each cluster as a comma-separated list.
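
As a rough illustration of how the output could be consumed downstream, the following Python sketch reads such a file and splits "clusterProbs" into numbers. The function name read_clusters, the tab delimiter and the sample rows are assumptions made for this example; they are not taken from the component itself.

```python
import csv
import io

def read_clusters(handle, id_column="index", delimiter="\t"):
    """Return {row id: (cluster id, [probabilities])} from an output file.

    The delimiter is an assumption; adjust it to match the actual file.
    """
    result = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        # clusterProbs is a comma-separated list inside a single field.
        probs = [float(p) for p in row["clusterProbs"].split(",")]
        result[row[id_column]] = (int(row["clusterId"]), probs)
    return result

# Made-up sample output for two clusters (values are illustrative only).
sample = io.StringIO(
    "index\tclusterId\tclusterProbs\n"
    "1\t1\t0.9,0.1\n"
    "2\t2\t0.2,0.8\n"
)
clusters = read_clusters(sample)
```

If idColumn was set for the run, pass the same name as id_column so that the first column is looked up under the right header.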


Parameters

Name | Type | Default | Description
columnsToRemove | string | "" | Comma-separated list of names of columns to exclude from the clustering. Useful if you want to ignore some attributes in the data while clustering.
idColumn | string | "" | Name of a column that contains a unique identifier for each row in the input data. This column is not included in the clustering but is copied to the output. Use an empty string if the input has no such column; the output will then have a column called "index" with a running index number for each input row (starting from 1).
method | string | (no default) | Clustering method to use. This should be the name of the corresponding class in Weka without the package prefix (for example, SimpleKMeans). See the Weka API for the available methods (i.e. classes).
wekaParameters | string | "" | Parameters passed to the clustering method. See the Weka API for the possible values.
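
The effect of idColumn and columnsToRemove on the data can be sketched in Python as below. This is only a minimal model of the behaviour described in the table, not the component's implementation; the helper names split_columns and row_ids are hypothetical, and trimming whitespace around the listed column names is an assumption.

```python
def split_columns(header, id_column="", columns_to_remove=""):
    """Return (id column name or None, columns left for clustering)."""
    # Assumption: names are comma separated and surrounding spaces are ignored.
    removed = {n.strip() for n in columns_to_remove.split(",") if n.strip()}
    id_name = id_column or None
    used = [c for c in header if c != id_name and c not in removed]
    return id_name, used

def row_ids(rows, id_name):
    """Use the id column when given, else a running index starting from 1."""
    if id_name is None:
        return [str(i) for i in range(1, len(rows) + 1)]
    return [row[id_name] for row in rows]

# Hypothetical input with an identifier column and one column to ignore.
header = ["id", "a", "b", "batch"]
rows = [
    {"id": "s1", "a": "0.1", "b": "2.0", "batch": "x"},
    {"id": "s2", "a": "0.4", "b": "1.5", "batch": "y"},
]
id_name, used = split_columns(header, id_column="id", columns_to_remove="batch")
ids = row_ids(rows, id_name)
```

With both parameters left at their default (empty) values, every column would take part in the clustering and the rows would be labelled "1", "2", and so on.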

Test cases

Test case Parameters IN
case1 properties in out

idColumn = id, method = SimpleKMeans

Generated 2019-02-07 07:42:33 by Anduril 2.0.0