The component takes the output of the MMClustering component,
where each data sample has been clustered with multiple different cluster
numbers, and determines the optimal cluster number for each sample. The
optimal number of clusters for each sample is determined independently
of the other samples. The parameters method and metric determine how the
optimal number of clusters is chosen.
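As a rough illustration of how a method value could turn one sample's metric values into a chosen cluster number, here is a minimal Python sketch. It is not the component's implementation: the function names (`choose_k`, `changepoint`, `sse_of_line`) and the input layout (a dict from cluster number to metric value) are hypothetical, and the changepoint step assumes that "two linear models" means two least-squares lines meeting at the candidate point, scored by total squared error.

```python
import numpy as np

def sse_of_line(x, y):
    """Sum of squared residuals of a least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(np.sum((y - (slope * x + intercept)) ** 2))

def changepoint(ks, values):
    """Return the cluster number where two fitted lines give the least total error.

    Assumption: the candidate point belongs to both segments, and each
    segment needs at least two points.
    """
    if len(ks) < 3:
        raise ValueError("changepoint needs at least three cluster numbers")
    x = np.asarray(ks, dtype=float)
    y = np.asarray(values, dtype=float)
    best_k, best_err = None, np.inf
    for i in range(1, len(x) - 1):
        err = sse_of_line(x[: i + 1], y[: i + 1]) + sse_of_line(x[i:], y[i:])
        if err < best_err:
            best_k, best_err = ks[i], err
    return best_k

def choose_k(metric_by_k, method="min"):
    """Pick a cluster number for ONE sample from {cluster number: metric value}."""
    ks = sorted(metric_by_k)
    values = [metric_by_k[k] for k in ks]
    if method == "min":
        return ks[int(np.argmin(values))]
    if method == "max":
        return ks[int(np.argmax(values))]
    if method == "changepoint":
        return changepoint(ks, values)
    raise ValueError(f"unknown method: {method!r}")

# Made-up metric values; each sample is handled independently of the others.
per_sample = {
    "sampleA": {2: 10.0, 3: 6.0, 4: 2.0, 5: 1.5, 6: 1.0, 7: 0.5},
    "sampleB": {2: 3.0, 3: 1.0, 4: 2.5},
}
chosen = {name: choose_k(m, method="min") for name, m in per_sample.items()}
print(chosen)  # {'sampleA': 7, 'sampleB': 3}
```

With the made-up values for sampleA, the curve drops steeply up to 4 clusters and flattens afterwards, so `choose_k(per_sample["sampleA"], "changepoint")` returns 4 while "min" returns 7.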
Name | Type | Default | Description |
clusterClustCol | string | "cluster" | The name of the column in the clusterFiles that holds the cluster number of each row. |
method | string | "min" | Method used to choose the optimal clustering given the metric. Possible values are 'min', 'max' and 'changepoint'. 'min' and 'max' choose the clustering result with the minimum and maximum value of the metric, respectively. 'changepoint' fits two linear models to the data to detect the changepoint. |
metric | string | "SWR" | Metric used for choosing the optimal number of clusters for each sample. Possible values are SWR (Scale-free Weighted Ratio), AID (Average Intercluster Distance), IIR (Average Intracluster Distance / Average Intercluster Distance), AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion) or ICL (Integrated Completed Likelihood). |
nSample | int | 1000 | The number of data points to sample from each clustering result to calculate AID and IIR. If a clustering result has nSample or fewer data points, all data points are used. Large values of nSample can make the calculation very slow. |
seed | int | 123456 | Random seed. Used to make the sampling of the data reproducible. |
useAIDAndIIR | boolean | true | If true, calculate the AID and IIR metrics for the different clustering results. This can be time-consuming. |
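The interplay of nSample, seed and the AID and IIR metrics can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and it assumes Euclidean distances averaged over all point pairs, which the component may or may not use. Under that assumption the work grows with the square of the number of sampled points, which would explain why large nSample values are slow.

```python
import numpy as np

def sample_rows(points, labels, n_sample=1000, seed=123456):
    """Return at most n_sample rows; all rows are kept if there are n_sample or fewer."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    if len(points) <= n_sample:
        return points, labels
    rng = np.random.default_rng(seed)  # fixed seed makes the sample reproducible
    idx = rng.choice(len(points), size=n_sample, replace=False)
    return points[idx], labels[idx]

def aid_and_iir(points, labels):
    """Compute (AID, IIR) from pairwise Euclidean distances (assumed definition).

    AID = mean distance between points in different clusters.
    IIR = mean distance within clusters / AID.
    Needs at least one within-cluster pair and one between-cluster pair;
    otherwise the result is NaN.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)  # each unordered pair once
    same = labels[i] == labels[j]
    pair_dist = dist[i, j]
    inter = pair_dist[~same].mean()
    intra = pair_dist[same].mean()
    return float(inter), float(intra / inter)

# Toy data: two tight clusters that lie far apart.
pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
lab = [0, 0, 1, 1]
sampled_pts, sampled_lab = sample_rows(pts, lab, n_sample=1000)
aid, iir = aid_and_iir(sampled_pts, sampled_lab)
print(round(aid, 3), round(iir, 3))
```

In this toy case the clusters are well separated, so the IIR comes out small (roughly 0.1); a small IIR of this kind is what the 'min' method would favour when metric is set to IIR.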