Creates a classifier based on the given sample data.
Version | 1.1 |
---|---|
Bundle | tools |
Categories | Classification |
Authors | Marko Laakso (Marko.Laakso@Helsinki.FI), Sirkku Karinen (sirkku.karinen@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | weka.jar (jar) ; WekaUtils.jar (jar) ; csbl-javatools.jar (jar) ; installer (bash) |
Source files | component.xml |
Usage | Example with default values |

Inputs

Name | Type | Mandatory | Description |
---|---|---|---|
data | TextFile | Mandatory | Sample data for the supervised learning. |
testdata | TextFile | Optional | Validation data to estimate accuracy of the new classifier. |
classifydata | TextFile | Optional | Data for which classes are predicted. NOTE: This data is not used in training or in validation. Weka requires a class column for this dataset as well, so add a column whose name matches the parameter 'classColumn'. A useful trick is to give the id column the name set in 'classColumn'; the ids are then also included in the 'predictedClasses' output. |
inClassifier | BinaryFile | Optional | A classifier object that is used instead of building a new classifier from the training data. NOTE: If this input is set, the parameter 'methodClass' and the input 'data' are not used, but you must still provide them (with empty values). |
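
The id-column trick described for 'classifydata' can be done as a small preprocessing step before the file is passed to the component. The following is a minimal sketch, not part of the component itself: the helper name `rename_id_column`, the tab delimiter, and the example column names are assumptions made for illustration.

```python
import csv
import io


def rename_id_column(text, id_column, class_column, delimiter="\t"):
    """Return delimited text in which the id column is renamed to class_column.

    Hypothetical helper: the delimiter and column names are assumptions,
    not values taken from the component.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        raise ValueError("input has no header row")
    header = rows[0]
    if class_column in header:
        raise ValueError("column %r already exists" % class_column)
    # list.index raises ValueError if the id column is missing.
    header[header.index(id_column)] = class_column
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

With 'classColumn' set to `class`, calling `rename_id_column(text, "sample", "class")` on a file whose id column is named `sample` yields a dataset that has the required class column while still carrying the sample ids.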

Outputs

Name | Type | Description |
---|---|---|
outClassifier | BinaryFile | A new classifier that has been produced. |
report | Latex | Textual description for the classifier and its performance. The exact content of this report depends on the method selection. |
confusion | Matrix | Confusion matrix with the class prediction frequencies as columns. |
evaluation | CSV | Evaluation |
predictedClasses | CSV | If the input 'classifydata' is provided, classes are predicted for that data and the results are written to this output. Otherwise this output is an empty file. |
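
To make the 'confusion' output easier to read, the sketch below shows how a confusion matrix can be built from reference and predicted labels. It is an illustration only and does not reproduce the component's file format. The function names are hypothetical, and the orientation (reference classes as rows, predicted classes as columns) is an assumption based on the description above.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (reference, predicted) label pairs.

    Rows follow the reference classes and columns the predicted classes
    (assumed orientation). Returns the sorted labels and the matrix.
    """
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Each diagonal cell counts correct predictions for one class, and every off-diagonal cell counts samples of the row's class that were predicted as the column's class.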

Parameters

Name | Type | Default | Description |
---|---|---|---|
classColumn | string | (no default) | Name of the column that contains the reference class. |
columnsToRemove | string | "" | Comma-separated list of column names to exclude from classification. Useful for ignoring some attributes in the data while training the classifier. |
crossValidation | int | 500 | Number of folds for the cross-validation. |
dataType | string | "CSV" | Type of the data files (CSV/arff). |
methodClass | string | (no default) | A fully qualified Java class name for the implementation of Weka Classifier. |
processMissing | boolean | false | Convert NA values into a form suitable for Weka. |
randomSeed | int | 1 | Seed value provided to the pseudo-random generator. |
runInternalTests | boolean | false | Flag for running the internal tests of the classifier. These tests give information about the functionality of the classifier, and their report is printed at the end of the LaTeX report. |
sectionTitle | string | "" | Title for the LaTeX section. |
wekaParameters | string | "" | A space-separated list of parameters passed to the classification method. See the Weka API for possible values. |
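
The parameters 'crossValidation' and 'randomSeed' together determine how samples are split into folds. The sketch below illustrates the general idea of seeded k-fold partitioning in plain Python. It is not Weka's implementation, which has its own randomization and may stratify the folds by class. The function names are hypothetical.

```python
import random


def fold_assignments(n_samples, folds, seed=1):
    """Shuffle sample indices with a fixed seed and deal them into folds."""
    if folds < 2 or folds > n_samples:
        # More folds than samples would leave some folds empty.
        raise ValueError("folds must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i receives every folds-th index, starting at position i.
    return [indices[i::folds] for i in range(folds)]


def cross_validation_splits(n_samples, folds, seed=1):
    """Yield (train, test) index lists, using each fold once as test data."""
    parts = fold_assignments(n_samples, folds, seed)
    for i, test in enumerate(parts):
        train = [idx for j, part in enumerate(parts) if j != i for idx in part]
        yield train, test
```

Using the same seed reproduces the same partition, which is why a fixed 'randomSeed' makes repeated runs comparable. In this sketch a fold count larger than the number of samples is rejected, so a large 'crossValidation' value only makes sense for a dataset with at least that many rows.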

Test cases

Test case | Parameters | IN data | IN testdata | IN classifydata | IN inClassifier | OUT outClassifier | OUT report | OUT confusion | OUT evaluation | OUT predictedClasses |
---|---|---|---|---|---|---|---|---|---|---|
case1 | properties: classColumn = class, | data | (missing) | classifydata | inClassifier | outClassifier | report | confusion | evaluation | predictedClasses |
case2_NA | properties: classColumn = class, | data | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) | (missing) |