Randomly selects rows and columns from a text or CSV file without replacement.
The input file is interpreted as a CSV file if any of the column
parameters (columnFraction, headerColumns, numColumns, shuffleColumns) is given
a non-default value; otherwise, it is interpreted as a text
file. The number of sampled rows and columns can be specified as fractions or
absolute numbers. There may be header rows and columns that are copied verbatim
to the output; the random sample comes after the header entries.
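The row-sampling behavior described above can be sketched as follows. This is a minimal illustration, not the component's actual RandomSampler.java: the class name `RowSampleSketch` and its methods are hypothetical, `java.util.Random` stands in for the commons-math3 generator, and rounding a fractional row count to the nearest integer is an assumption. The sketch also emits sampled rows in shuffled order, which the component may or may not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: sample rows without replacement, copying header rows verbatim.
class RowSampleSketch {

    // Turns numRows into an absolute count, clamped to [0, available].
    // Rounding to the nearest integer is an assumption of this sketch.
    static int resolveCount(double num, boolean isFraction, int available) {
        long count = isFraction ? Math.round(num * available) : Math.round(num);
        return (int) Math.max(0, Math.min(available, count));
    }

    static List<String> sampleRows(List<String> lines, int headerRows,
                                   double numRows, boolean rowFraction, Random rng) {
        int header = Math.min(headerRows, lines.size());

        // Header rows go to the output unchanged and are never sampled.
        List<String> out = new ArrayList<>(lines.subList(0, header));

        // Shuffling a copy and taking a prefix samples without replacement.
        List<String> data = new ArrayList<>(lines.subList(header, lines.size()));
        int k = resolveCount(numRows, rowFraction, data.size());
        Collections.shuffle(data, rng);
        out.addAll(data.subList(0, k));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# header", "a", "b", "c", "d");
        // One header row, then half of the four data rows.
        System.out.println(sampleRows(lines, 1, 0.5, true, new Random(42)));
    }
}
```

Passing a seeded `Random` makes a run reproducible, which is convenient when comparing output against a stored test case.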
Version | 0.5 |
---|---|
Bundle | tools |
Categories | Analysis |
Specialties | generic |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
Requires | commons-math3-3.2.jar (jar) |
Source files | component.xml RandomSampler.java |

Inputs

Name | Type | Mandatory | Description |
---|---|---|---|
in | T (generic) | Mandatory | Input file. May be either a CSV file or a general text file. |

Outputs

Name | Type | Description |
---|---|---|
out | T (generic) | Random subset of the input file. |

Parameters

Name | Type | Default | Description |
---|---|---|---|
columnFraction | boolean | true | If true, numColumns is a fraction. If false, numColumns is an absolute count. Only used for CSV files. |
headerColumns | int | 0 | For CSV files, number of header columns before the actual randomized content. These are the first columns in the CSV file. Header columns are copied verbatim to each output row. |
headerRows | int | 1 | Number of header rows before the actual randomized content. The header is copied to output verbatim. For CSV files, this must be 1. |
numColumns | float | 1 | For CSV files, number or fraction of columns to be randomly selected. If columnFraction is true, this is a fraction between 0 and 1; otherwise, this is an absolute number. |
numRows | float | (no default) | Number or fraction of rows to be randomly selected. If rowFraction is true, this is a fraction between 0 and 1; otherwise, this is an absolute number. |
rowFraction | boolean | true | If true, numRows is a fraction. If false, numRows is an absolute count. |
shuffleColumns | boolean | false | If true, column order is randomly shuffled. If false, columns appear in the output in the same order as they are in the input. Only used for CSV files. |
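
How the column parameters interact can be illustrated with a short sketch. It is hypothetical rather than taken from the component source: the class `ColumnSampleSketch` and its methods are invented for illustration, the input is assumed to be tab-delimited, `java.util.Random` replaces the commons-math3 generator, and fractional counts are assumed to round to the nearest integer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: choose which column indices of a CSV row to keep.
class ColumnSampleSketch {

    static List<Integer> chooseColumns(int totalColumns, int headerColumns,
                                       double numColumns, boolean columnFraction,
                                       boolean shuffleColumns, Random rng) {
        int header = Math.min(headerColumns, totalColumns);

        // Only non-header columns are candidates for sampling.
        List<Integer> candidates = new ArrayList<>();
        for (int i = header; i < totalColumns; i++) candidates.add(i);

        // columnFraction decides whether numColumns is a fraction or a count.
        double requested = columnFraction ? numColumns * candidates.size() : numColumns;
        int k = (int) Math.max(0, Math.min(candidates.size(), Math.round(requested)));

        // Shuffle and take a prefix: sampling without replacement.
        Collections.shuffle(candidates, rng);
        List<Integer> picked = new ArrayList<>(candidates.subList(0, k));

        // Without shuffleColumns, restore the input order of the picked columns.
        if (!shuffleColumns) Collections.sort(picked);

        // Header columns always come first and are always kept.
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < header; i++) result.add(i);
        result.addAll(picked);
        return result;
    }

    // Applies the same column choice to one tab-delimited row.
    static String project(String row, List<Integer> columns) {
        String[] cells = row.split("\t", -1);
        List<String> kept = new ArrayList<>();
        for (int index : columns) kept.add(cells[index]);
        return String.join("\t", kept);
    }

    public static void main(String[] args) {
        // Four columns, one header column, keep half of the remaining three.
        List<Integer> columns = chooseColumns(4, 1, 0.5, true, false, new Random(1));
        System.out.println(project("id\ta\tb\tc", columns));
        System.out.println(project("r1\t1\t2\t3", columns));
    }
}
```

The column choice is made once and then applied to every row, so the header row and the data rows stay aligned.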

Test cases

Test case | Parameters | IN in | OUT out |
---|---|---|---|
case1_text_fraction | headerRows=2, … | in | out |
case2_text_count | headerRows=1, … | in | out |
case3_text_part | headerRows=0, … | in | out |
case4_csv_fraction | numRows=1, … | in | out |
case5_csv_count | numRows=1, … | in | out |
case6_csv_part1 | numRows=1, … | in | out |
case7_csv_part2 | numRows=3, … | in | out |