Up: Component summary Component


Joins rows from two or more CSV files from all the inputs, optionally using one column as a matching key.

If a key column is not used, the result contains all rows and all columns of the input files. Missing values (NA) may be introduced when a column is not present in all input files. Each column is present once and duplicate rows are removed.

If a key column is used, the rows in each input file are matched using values from the key column. The result file has one row for each key value. In the result, the first column is the key column; its name is obtained from the first CSV input (csv1). If the intersection parameter is true, a key is included in the result if the key value is present in all inputs. If intersection is false, a key is included if it is present in at least one input (key union). Union semantics may introduce NA values in the result. If several input files have the same column, the value is obtained from the first file. However, if the first file contains a missing value (NA) and the second file contains a non-missing value, the non-missing value is used instead.

For more complex join operations, see TableQuery. You may also use CSVListJoin to join multiple large files efficiently.

Version 1.1
Bundle tools
Categories Convert
Specialties generic
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi)
Issue tracker View/Report issues
Requires R
Source files component.xml CSVJoin.r
Usage Example with default values

Type parameters (generics)


Name Type Mandatory Description
in1 CSV Optional CSV file 1.
in2 CSV Optional CSV file 2.
in3 CSV Optional CSV file 3.
in4 CSV Optional CSV file 4.
inDir BinaryFolder Optional Directory containing CSV files.
in Array<CSV> Optional Array containing CSV files.


Name Type Description
out T (generic) Result CSV file.


Name Type Default Description
asConnection boolean false If true, assign file connections (class file) to the inputs to variables of the same name. This is useful e.g. if you want to read large data files row by row.
intersection boolean true Defines how keys are handled; only used when useKeys=true. If intersection is true, the result contains a key if the key is present in all input files. If false, the result contains a key if the key is present in at least one input file.
keyColumnNames string "" Comma-separated list of key column names; only used when useKeys=true. The first name refers to csv1, the second to csv2, etc. An empty value refers to the first column. Empty values may be omitted from the list, so all these are equivalent: "col1" ; "col1," ; "col1,," ; etc.
minRows int 0 Fail component if there are less than minRows rows of data (excluding the header).
useKeys boolean true If true, use one column from each CSV file as a matching key column and the columns are combined. If false, rows are combined without a join.

Test cases

Test case Parameters IN
case1 (missing) in1 in2 (missing) (missing) (missing) (missing) out
case10_array_dir_files properties in1 in2 in3 in4 inDir in out


case2_union properties in1 in2 (missing) (missing) (missing) (missing) out


case3_nokeys properties in1 in2 (missing) (missing) (missing) (missing) out


case4_names properties in1 in2 (missing) (missing) (missing) (missing) out


case5_many properties in1 in2 in3 (missing) (missing) (missing) out


case6_sepnames (missing) in1 in2 (missing) (missing) (missing) (missing) out
case7_sepnames_nokeys properties in1 in2 (missing) (missing) (missing) (missing) out


case8_many properties in1 in2 in3 in4 (missing) (missing) out


case9_dir properties (missing) (missing) (missing) (missing) inDir (missing) out


Generated 2019-02-08 07:42:15 by Anduril 2.0.0