Splits a text file (such as CSV) into an array of smaller text files. Splitting works in one of two modes: fixed mode, in which the output array has a fixed number of elements, and non-fixed mode (the default), in which the number of elements depends on the input. Fixed mode is enabled when numElements is non-negative; otherwise non-fixed mode is used.
The input file is divided into records, each consisting of N consecutive rows, where N is given by rowsPerRecord. By default, each line is one record. Records are written into the output array according to the splitting criteria.
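
The record structure described above can be illustrated with a minimal Python sketch. It is not the component's Java implementation: the function name `split_header_and_records` is hypothetical, the arguments mirror the headerRows and rowsPerRecord parameters, and the sketch assumes that a trailing group with fewer than N rows is kept as a shorter final record, which the documentation does not specify.

```python
def split_header_and_records(lines, header_rows=1, rows_per_record=1):
    """Separate header rows from the body and group the body into records.

    Each record is a list of rows_per_record consecutive lines.
    Assumption: an incomplete trailing group is kept as a shorter record.
    """
    header = lines[:header_rows]
    body = lines[header_rows:]
    records = [
        body[i:i + rows_per_record]
        for i in range(0, len(body), rows_per_record)
    ]
    return header, records


# One header row followed by four data rows, grouped two rows per record.
lines = ["id,name", "1,a", "2,b", "3,c", "4,d"]
header, records = split_header_and_records(lines, header_rows=1, rows_per_record=2)
```

Here `header` holds the single header row and `records` holds two records of two rows each; header rows are not counted as records.
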
Common use cases:
This component is the inverse of CSVListJoin.
Version | 1.0 |
---|---|
Bundle | tools |
Categories | Internal |
Specialties | generic |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml TextFileSplitter.java |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
in | T1 (generic) | Mandatory | Input text file. |
Name | Type | Description |
---|---|---|
out | Array<T1> (generic) | Array of smaller text files whose contents are derived from the input file. Array elements and rows within individual elements are in the order of the original file. |
Name | Type | Default | Description |
---|---|---|---|
headerRows | int | 1 | Number of header rows at the beginning of the file. These rows are included in every output element and are not counted as records. |
keyPattern | string | "%d" | Defines the format of keys in the output array. The wildcard %d is replaced with the index of the current element, starting at 1. The default creates elements with keys 1, 2, etc. |
maxRecords | int | -1 | Maximum number of records in each element. If negative, there is no upper limit. Must be negative when fixed mode is enabled. |
numElements | int | -1 | Fixed number of elements in the output array. If negative, the number of elements is not fixed and non-fixed mode is used. If non-negative, fixed mode is enabled and maxRecords and splitRegexp cannot be used. |
rowsPerRecord | int | 1 | Number of rows that each record spans. |
splitRegexp | string | "" | Java regular expression that marks the start of a new array element. When the current line matches the expression, a new element is started and the current line is written at the beginning of the new element. If empty, the expression is not used. Must not be used when fixed mode is enabled. |
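
How the parameters interact may be easier to follow in code. The following Python sketch is an illustration under stated assumptions, not the component's Java source, and all function names are hypothetical. Assumptions: records are lists of lines; splitRegexp is tested against the first line of each record using Python's `re.search` as a stand-in for Java regular expressions (the component may require a full-line match); with no limit and no expression, non-fixed mode produces a single element; fixed mode distributes records into contiguous, near-equal chunks. The documentation only guarantees that the original order is preserved.

```python
import re


def check_parameters(num_elements=-1, max_records=-1, split_regexp=""):
    """Return True for fixed mode, False for non-fixed mode."""
    fixed = num_elements >= 0
    if fixed and (max_records >= 0 or split_regexp != ""):
        raise ValueError("maxRecords and splitRegexp cannot be used in fixed mode")
    return fixed


def split_non_fixed(records, max_records=-1, split_regexp=""):
    """Start a new element when the limit is reached or the regexp matches."""
    pattern = re.compile(split_regexp) if split_regexp else None
    elements, current = [], []
    for record in records:
        # Assumption: only the first line of a record is tested.
        starts_new = pattern is not None and pattern.search(record[0]) is not None
        full = max_records >= 0 and len(current) >= max_records
        if current and (starts_new or full):
            elements.append(current)
            current = []
        current.append(record)
    if current:
        elements.append(current)
    return elements


def split_fixed(records, num_elements):
    """Assumption: contiguous chunks whose sizes differ by at most one."""
    if num_elements == 0:
        return []
    base, extra = divmod(len(records), num_elements)
    elements, start = [], 0
    for i in range(num_elements):
        size = base + (1 if i < extra else 0)
        elements.append(records[start:start + size])
        start += size
    return elements


def render(header, elements, key_pattern="%d"):
    """Map each key to its rows; header rows are repeated in every element."""
    output = {}
    for index, element in enumerate(elements, start=1):
        key = key_pattern.replace("%d", str(index))  # indices start at 1
        output[key] = header + [line for record in element for line in record]
    return output


header = ["id,name"]
records = [["1,a"], ["2,b"], ["3,c"], ["4,d"], ["5,e"]]

by_limit = render(header, split_non_fixed(records, max_records=2), "part%d")
by_regexp = split_non_fixed(records, split_regexp="^3")
by_count = split_fixed(records, 2)
```

In this sketch, `by_limit` has the keys `part1`, `part2`, and `part3`, each beginning with the header row; `by_regexp` starts its second element at the line `3,c`; and `by_count` always contains exactly two elements.
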
Test case | Parameters | IN in | OUT out |
---|---|---|---|
case1_defaults | (missing) | in | out |
case2_maxrec | headerRows=4, | in | out |
case3_regexp | headerRows=2, | in | out |
case4_csv_numelem | numElements=7 | in | out |
case5_csv_maxrec | maxRecords=3 | in | out |