Joins rows from two or more CSV files from all the inputs, optionally using one column as a matching key.
If a key column is not used, the result contains all rows and all columns of the input files. Missing values (NA) may be introduced when a column is not present in all input files. Each column is present once and duplicate rows are removed.
If a key column is used, the rows in each input file are matched using values from the key column. The result file has one row for each key value. In the result, the first column is the key column; its name is obtained from the first CSV input (csv1). If the intersection parameter is true, a key is included in the result if the key value is present in all inputs. If intersection is false, a key is included if it is present in at least one input (key union). Union semantics may introduce NA values in the result. If several input files have the same column, the value is obtained from the first file. However, if the first file contains a missing value (NA) and the second file contains a non-missing value, the non-missing value is used instead.
For more complex join operations, see TableQuery. You may also use CSVListJoin to join multiple large files efficiently.
Version | 1.1 |
---|---|
Bundle | tools |
Categories | Convert |
Specialties | generic |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | R |
Source files | component.xml CSVJoin.r |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
in1 | CSV | Optional | CSV file 1. |
in2 | CSV | Optional | CSV file 2. |
in3 | CSV | Optional | CSV file 3. |
in4 | CSV | Optional | CSV file 4. |
inDir | BinaryFolder | Optional | Directory containing CSV files. |
in | Array<CSV> | Optional | Array containing CSV files. |
Name | Type | Description |
---|---|---|
out | T (generic) | Result CSV file. |
Name | Type | Default | Description |
---|---|---|---|
asConnection | boolean | false | If true, assign file connections (class file) to the inputs to variables of the same name. This is useful e.g. if you want to read large data files row by row. |
intersection | boolean | true | Defines how keys are handled; only used when useKeys=true. If intersection is true, the result contains a key if the key is present in all input files. If false, the result contains a key if the key is present in at least one input file. |
keyColumnNames | string | "" | Comma-separated list of key column names; only used when useKeys=true. The first name refers to csv1, the second to csv2, etc. An empty value refers to the first column. Empty values may be omitted from the list, so all these are equivalent: "col1" ; "col1," ; "col1,," ; etc. |
minRows | int | 0 | Fail component if there are less than minRows rows of data (excluding the header). |
useKeys | boolean | true | If true, use one column from each CSV file as a matching key column and the columns are combined. If false, rows are combined without a join. |
Test case | Parameters▼ | IN in1 |
IN in2 |
IN in3 |
IN in4 |
IN inDir |
IN in |
OUT out |
---|---|---|---|---|---|---|---|---|
case1 | (missing) | in1 | in2 | (missing) | (missing) | (missing) | (missing) | out |
case10_array_dir_files | properties | in1 | in2 | in3 | in4 | inDir | in | out |
useKeys=false |
||||||||
case2_union | properties | in1 | in2 | (missing) | (missing) | (missing) | (missing) | out |
intersection=false |
||||||||
case3_nokeys | properties | in1 | in2 | (missing) | (missing) | (missing) | (missing) | out |
useKeys=false |
||||||||
case4_names | properties | in1 | in2 | (missing) | (missing) | (missing) | (missing) | out |
keyColumnNames=KEY1,KEY2 |
||||||||
case5_many | properties | in1 | in2 | in3 | (missing) | (missing) | (missing) | out |
useKeys=false |
||||||||
case6_sepnames | (missing) | in1 | in2 | (missing) | (missing) | (missing) | (missing) | out |
case7_sepnames_nokeys | properties | in1 | in2 | (missing) | (missing) | (missing) | (missing) | out |
useKeys=false |
||||||||
case8_many | properties | in1 | in2 | in3 | in4 | (missing) | (missing) | out |
useKeys=false |
||||||||
case9_dir | properties | (missing) | (missing) | (missing) | (missing) | inDir | (missing) | out |
intersection=false |