
JCSVJoin

Java implementation of the R CSVJoin component.

Joins rows from two or more CSV files supplied through any of the inputs, optionally using one column from each file as a matching key.

If a key column is not used, the result contains all rows and all columns of the input files. Missing values (NA) may be introduced when a column is not present in all input files. Each column is present once and duplicate rows are removed.
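
The keyless case can be pictured with a small Python sketch. It is not the component's Java code: the function name join_without_keys, the in-memory table representation, the output column order (first appearance) and the row order are assumptions made for illustration. Only the union of columns, the NA filling and the duplicate removal come from the description above.

```python
NA = "NA"

def join_without_keys(tables):
    """Stack rows of all tables over the union of their columns.

    tables: list of (header, rows) pairs, where header is a list of column
    names and rows is a list of dicts (hypothetical representation).
    """
    # Union of columns, each listed once, in order of first appearance (assumed).
    columns = []
    for header, _ in tables:
        for name in header:
            if name not in columns:
                columns.append(name)

    seen = set()
    result = []
    for _, rows in tables:
        for row in rows:
            # Columns absent from this table are filled with NA.
            full = tuple(row.get(name, NA) for name in columns)
            if full not in seen:  # drop duplicate rows
                seen.add(full)
                result.append(full)
    return columns, result

a = (["id", "x"], [{"id": "1", "x": "a"}, {"id": "2", "x": "b"}])
b = (["id", "y"], [{"id": "1", "y": "c"}])
c = (["id", "x"], [{"id": "1", "x": "a"}])  # repeats the first row of a

columns, rows = join_without_keys([a, b, c])
for r in rows:
    print(r)
```

Here the row from c disappears as a duplicate, while the row from b is kept because it differs in the y column.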

If a key column is used, the rows in each input file are matched using values from the key column. The result file has one row for each key value. In the result, the first column is the key column; its name is obtained from the first CSV input (csv1). If the intersection parameter is true, a key is included in the result if the key value is present in all inputs. If intersection is false, a key is included if it is present in at least one input (key union). Union semantics may introduce NA values in the result. If several input files have the same column, the value is obtained from the first file. However, if the first file contains a missing value (NA) and the second file contains a non-missing value, the non-missing value is used instead.
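
A minimal Python sketch of the keyed join described above, not the component's actual implementation. The function name join_with_keys, the table representation, the key ordering, keeping the first row when a key repeats within one file, and extending the NA fallback beyond the second file are all assumptions; the source only specifies the intersection/union rule, the naming of the key column and the preference for the first file's non-missing value.

```python
NA = "NA"

def join_with_keys(tables, key_names, intersection=True):
    """tables: list of (header, rows) pairs; key_names: key column per table."""
    # Index each table by key value; the first row wins on repeats (assumed).
    indexed = []
    for (_, rows), key in zip(tables, key_names):
        index = {}
        for row in rows:
            index.setdefault(row[key], row)
        indexed.append(index)

    # Keys in order of first appearance (assumed), filtered by the join mode.
    ordered = []
    for index in indexed:
        for k in index:
            if k not in ordered:
                ordered.append(k)
    key_sets = [set(index) for index in indexed]
    keep = set.intersection(*key_sets) if intersection else set.union(*key_sets)
    keys = [k for k in ordered if k in keep]

    # The key column comes first and takes its name from the first input.
    columns = [key_names[0]]
    for (header, _), key in zip(tables, key_names):
        for name in header:
            if name != key and name not in columns:
                columns.append(name)

    result = []
    for k in keys:
        out = [k]
        for name in columns[1:]:
            value = NA
            for key, index in zip(key_names, indexed):
                if name == key:
                    continue  # never read a key column as a data column
                candidate = index.get(k, {}).get(name, NA)
                if candidate != NA:  # first non-missing value wins
                    value = candidate
                    break
            out.append(value)
        result.append(out)
    return columns, result

t1 = (["id", "x", "y"], [{"id": "1", "x": "a", "y": "NA"},
                         {"id": "2", "x": "b", "y": "q"}])
t2 = (["key", "y", "z"], [{"key": "1", "y": "p", "z": "m"},
                          {"key": "3", "y": "r", "z": "n"}])

inner_cols, inner = join_with_keys([t1, t2], ["id", "key"], intersection=True)
outer_cols, outer = join_with_keys([t1, t2], ["id", "key"], intersection=False)
```

With intersection=True only key 1 survives, and its y value comes from t2 because t1 holds NA there. With intersection=False keys 2 and 3 are kept as well, with NA in the columns that the other file cannot supply.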

For more complex join operations, see TableQuery. You may also use CSVListJoin to join multiple large files efficiently.

Version 1.0
Bundle tools
Categories Convert
Specialties generic
Authors Vladimir Rogojin (vladimir.rogojin@helsinki.fi)
Source files component.xml JCSVJoin.java

Type parameters (generics)

Inputs

Name    Type          Mandatory  Description
csv1    CSV           Optional   CSV file 1.
csv2    CSV           Optional   CSV file 2.
csv3    CSV           Optional   CSV file 3.
csv4    CSV           Optional   CSV file 4.
csv5    CSV           Optional   CSV file 5.
csv6    CSV           Optional   CSV file 6.
csv7    CSV           Optional   CSV file 7.
csv8    CSV           Optional   CSV file 8.
csvDir  BinaryFolder  Optional   Directory containing CSV files.
array   Array<CSV>    Optional   Array containing CSV files.
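
Files from csvDir are filtered by the csvDirRegexp parameter (see Parameters), and subdirectories are searched as well. The Python sketch below illustrates that selection. The function name select_csv_files, matching the pattern against the whole file name rather than the path, and the sorted traversal order are assumptions; the component's exact matching rules are not stated on this page.

```python
import os
import re

def select_csv_files(csv_dir, pattern=".*"):
    """Return paths of all files under csv_dir whose name matches pattern."""
    regex = re.compile(pattern)
    selected = []
    for root, dirs, files in os.walk(csv_dir):
        dirs.sort()  # deterministic traversal order (assumed, for illustration)
        for name in sorted(files):
            # Assumption: the pattern must match the complete file name.
            if regex.fullmatch(name):
                selected.append(os.path.join(root, name))
    return selected

# Hypothetical usage:
# files = select_csv_files("/path/to/csvDir", r".*\.csv")
```

With the default pattern ".*" every file in the directory tree is selected.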

Outputs

Name  Type         Description
csv   T (generic)  Result CSV file.

Parameters

Name                 Type     Default  Description
arrayKeyColumnName   string   ""       Name of the key column in all CSV files from array.
csvDirKeyColumnName  string   ""       Name of the key column in all CSV files from csvDir.
csvDirRegexp         string   ".*"     If the csvDir input port is connected, this pattern selects the files to read from csvDir. Unlike the original R CSVJoin component, all subdirectories of csvDir and all files in them are considered.
intersection         boolean  true     Defines how keys are handled; only used when useKeys=true. If true, the result contains a key only if the key is present in all input files. If false, the result contains a key if the key is present in at least one input file.
keyColumnNames       string   ""       Comma-separated list of key column names for the csv1, csv2, ..., csv8 input ports; only used when useKeys=true. The first name refers to csv1, the second to csv2, and so on. An empty value refers to the first column. Trailing empty values may be omitted, so "col1", "col1," and "col1,," are all equivalent. To set the key column for files from csvDir and from array, see csvDirKeyColumnName and arrayKeyColumnName respectively.
minRows              int      0        Fail the component if there are fewer than minRows rows of data (excluding the header).
useKeys              boolean  true     If true, one column from each CSV file is used as a matching key column and the columns are combined. If false, rows are combined without a join.
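
The keyColumnNames convention can be illustrated with a short Python sketch. The helper names parse_key_column_names and resolve_key are hypothetical, and padding the list to eight entries is an assumption based on the eight csvN ports; the component's own parsing may differ in details such as whitespace handling.

```python
def parse_key_column_names(value, n_inputs=8):
    """Split the comma-separated list and pad it to one entry per csvN port."""
    names = value.split(",")  # "" yields [""], i.e. "first column" for csv1
    if len(names) > n_inputs:
        raise ValueError("more key column names than input ports")
    # Omitted entries behave like empty ones.
    return names + [""] * (n_inputs - len(names))

def resolve_key(header, name):
    """An empty name refers to the first column of the file."""
    return header[0] if name == "" else name

# The three spellings from the parameter description are equivalent.
same = (parse_key_column_names("col1")
        == parse_key_column_names("col1,")
        == parse_key_column_names("col1,,"))

# A leading empty entry keeps the default for csv1 and names the key of csv2.
names = parse_key_column_names(",KEY2")
key_csv1 = resolve_key(["id", "x"], names[0])
key_csv2 = resolve_key(["KEY2", "y"], names[1])
```

In this sketch csv1 falls back to its first column (id), while csv2 uses the explicitly named KEY2 column.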

Test cases

Inputs not listed for a test case are missing in that case.

Test case               Parameters                Inputs present               Output
case1                   (missing)                 csv1, csv2                   csv
case10_array_dir_files  useKeys=false             csv1 to csv8, csvDir, array  csv
case2_union             intersection=false        csv1, csv2                   csv
case3_nokeys            useKeys=false             csv1, csv2                   csv
case4_names             keyColumnNames=KEY1,KEY2  csv1, csv2                   csv
case5_many              useKeys=false             csv1, csv2, csv3             csv
case6_sepnames          (missing)                 csv1, csv2                   csv
case7_sepnames_nokeys   useKeys=false             csv1, csv2                   csv
case8_many              useKeys=false             csv1 to csv8                 csv
case9_dir               intersection=false        csvDir                       csv


Generated 2019-02-08 07:42:17 by Anduril 2.0.0