Transforms sets using union, intersection, difference and other functions. Transformations are defined using an expression syntax such as union(intersection(S1, S2), S3). Here, union and intersection are set functions, and S1..S3 are set names (IDs) that must be present in the input files. Set references can be quoted or unquoted: union(S1, S2) is equivalent to union('S1', "S2"). Quotes are required if a set name contains special characters. Expressions can be nested to arbitrary depth, and function calls accept an arbitrary number of arguments unless stated otherwise. The expression language is based on JEP 2. Transformations are evaluated in the order they appear, so the results of earlier transformations can be used in later ones.
Supported functions are described in the following table. In the Domain column, S denotes a set (corresponding to the Members column of SetList), C denotes a character string argument, N denotes an integer argument, S^{n} and C^{n} denote collections of sets and character strings, and "x" denotes the Cartesian product. Each set transformation must produce an object of type S, i.e., a single set.
Definition | Domain | Description |
---|---|---|
intersection(sets...) | S^{n} -> S | Intersection of n sets. Example: intersection(S1, S2, S3). |
union(sets...) | S^{n} -> S | Union of n sets. Example: union(S1, S2, S3). |
diff(sets...) | S^{n} -> S | Non-symmetric difference, so that diff(S1, ..., Sn) = S1 - S2 - ... - Sn = S1 - union(S2, ..., Sn), where Si - Sj denotes elements present in Si but not in Sj. For example, diff(S1, S2) returns the elements present in S1 but not in S2. |
freq(low, high, sets...) | N^{2} x S^{n} -> S | Return elements whose frequencies (occurrences) are within the given bounds; the bounds are given as two numeric arguments. For example, freq(1, 2, S1, S2, S3) returns elements that are present in 1 or 2 of the input sets. |
minfreq(low, sets...) | N x S^{n} -> S | Like freq but takes only the lower bound: minfreq(x, sets...) = freq(x, infinity, sets...). |
set(strings...) | C^{n} -> S | Construct a literal set. For example, set("x", "y") creates a set with two elements. Arguments must be strings. |
allnames() | none -> S | Return the set of all defined set names, corresponding to the ID column of SetList. Example: allnames(). |
match(pattern, set) | C x S -> S | Use Java regular expressions to filter elements of the given set. For example, match('a.*', S1) returns all elements in set S1 that start with a. |
setmatch(pattern) | C -> S^{n} | Use Java regular expressions to refer to set names. Return a collection of sets. Note that setmatch(pattern) = expand(match(pattern, allnames())). For example, setmatch("S[0-9]+") returns all sets whose names are like S1, S2, etc. |
replace(search, replace, set) | C^{2} x S -> S | Search-and-replace text strings in the elements of the given set. Return the set with modified items. The search pattern uses Java regular expressions and the replace pattern may refer to captured subpatterns using $1, $2, etc. For example, replace('a(b\|B)c', '$1', S1) replaces each occurrence of 'abc' with 'b' and 'aBc' with 'B' for every element in set S1. |
names(sets...) | S^{n} -> S | Return the names (identifiers) of the argument sets, corresponding to the ID column of SetList. For example, names(S1, S2) returns the set {"S1", "S2"}. |
expand(set) | S -> S^{n} | The reverse of names: expand a set of names (set identifiers) into a collection of sets. For example, expand(set('S1', 'S2')) returns the sets S1 and S2. |
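The semantics of diff, freq, minfreq, match and replace can be illustrated with a minimal Java sketch. This is not the component's actual code (that lives in Functions.java); the class and helper names below are hypothetical, and the sketch assumes that match requires the pattern to cover the whole element, which is consistent with the 'a.*' example in the table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the component.
class SetFunctionSketch {
    // Helper for building small literal sets (sorted for stable printing).
    static Set<String> setOf(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    // diff(S1, ..., Sn) = S1 - S2 - ... - Sn
    @SafeVarargs
    static Set<String> diff(Set<String> first, Set<String>... rest) {
        Set<String> result = new TreeSet<>(first);
        for (Set<String> s : rest) {
            result.removeAll(s);
        }
        return result;
    }

    // freq(low, high, sets...): elements present in at least `low`
    // and at most `high` of the input sets (bounds inclusive).
    @SafeVarargs
    static Set<String> freq(int low, int high, Set<String>... sets) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> s : sets) {
            for (String element : s) {
                counts.merge(element, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= low && entry.getValue() <= high) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // minfreq(low, sets...) = freq(low, infinity, sets...)
    @SafeVarargs
    static Set<String> minfreq(int low, Set<String>... sets) {
        return freq(low, Integer.MAX_VALUE, sets);
    }

    // match(pattern, set): keep elements that the regex matches in full
    // (whole-element matching is an assumption of this sketch).
    static Set<String> match(String pattern, Set<String> set) {
        Pattern compiled = Pattern.compile(pattern);
        Set<String> result = new TreeSet<>();
        for (String element : set) {
            if (compiled.matcher(element).matches()) {
                result.add(element);
            }
        }
        return result;
    }

    // replace(search, replacement, set): regex search-and-replace on each
    // element; $1, $2, ... refer to captured groups.
    static Set<String> replace(String search, String replacement, Set<String> set) {
        Set<String> result = new TreeSet<>();
        for (String element : set) {
            result.add(element.replaceAll(search, replacement));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> s1 = setOf("a", "b", "c");
        Set<String> s2 = setOf("b", "c", "d");
        Set<String> s3 = setOf("c", "e");
        System.out.println(diff(s1, s2, s3));       // [a]
        System.out.println(freq(1, 2, s1, s2, s3)); // [a, b, d, e]
        System.out.println(minfreq(2, s1, s2, s3)); // [b, c]
        System.out.println(match("a.*", setOf("abc", "bac", "a")));             // [a, abc]
        System.out.println(replace("a(b|B)c", "$1", setOf("xabcx", "aBc")));    // [B, xbx]
    }
}
```

In the example, "c" occurs in all three sets, so freq(1, 2, ...) excludes it while minfreq(2, ...) keeps it.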
If the transformation target or expression contains *, the transformation is iterated over a set of names, and each * is replaced with the current name in each iteration. This enables a single transformation to yield several result sets. Iterations that lead to invalid expressions are ignored, but the transformation must yield at least one result set. By default, iteration is done over all set names. This can be overridden by defining an IterationSet column in the transformation input that contains an expression evaluating to a set of names. If the column is not present or contains NA, the expression is allnames().
Example: Target is *_deg, Definition is union("*_up", "*_down") and IterationSet is names(S1, S2). Assuming sets S1_up, S1_down, S2_up and S2_down exist, this creates sets S1_deg = union(S1_up, S1_down) and S2_deg = union(S2_up, S2_down). If IterationSet is omitted, the transformation is looped over all sets.
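The wildcard substitution described above can be sketched in Java as plain text replacement. The class and method names are hypothetical and this is not the component's implementation; in particular, the sketch only produces the expanded (target, definition) pairs and does not evaluate them or skip iterations that lead to invalid expressions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the component.
class WildcardSketch {
    // Replace every literal * in the template with the current set name.
    static String substitute(String template, String name) {
        return template.replace("*", name);
    }

    // Expand one iterated transformation into target -> definition pairs,
    // one pair per name in the iteration set, preserving iteration order.
    static Map<String, String> expand(String target, String definition,
                                      List<String> iterationSet) {
        Map<String, String> expanded = new LinkedHashMap<>();
        for (String name : iterationSet) {
            expanded.put(substitute(target, name), substitute(definition, name));
        }
        return expanded;
    }

    public static void main(String[] args) {
        Map<String, String> expanded = expand(
                "*_deg", "union(\"*_up\", \"*_down\")", Arrays.asList("S1", "S2"));
        // Prints:
        // S1_deg = union("S1_up", "S1_down")
        // S2_deg = union("S2_up", "S2_down")
        expanded.forEach((target, definition) ->
                System.out.println(target + " = " + definition));
    }
}
```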
Version | 0.6 |
---|---|
Bundle | tools |
Categories | Convert |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | jep-2.4.1.jar (jar) |
Source files | component.xml SetTransformer.java Functions.java |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
transformation | CSV | Mandatory | Set transformations, one per CSV row. The columns Target (target set ID) and Definition (transformation expression) must be present. The column IterationSet may be present if iterated transformations are used. IterationSet should contain NA for non-iterated transformations. Any other columns are interpreted as annotation columns and are copied to the output. For iterated transformations, the wildcard * in annotations is replaced with the current set ID. |
set1 | SetList | Mandatory | Input sets 1. NA values in Members are interpreted as empty sets. |
set2 | SetList | Optional | Input sets 2. |
set3 | SetList | Optional | Input sets 3. |
set4 | SetList | Optional | Input sets 4. |
set5 | SetList | Optional | Input sets 5. |
Name | Type | Description |
---|---|---|
result | SetList | Result sets. Sets are in the order they appear in transformations. Set members are in alphabetic order. |
Name | Type | Default | Description |
---|---|---|---|
includeAnnotation | string | "*" | Comma-separated list of column names in the transformation input that should be used as annotation columns. The wildcard * includes all columns. The special columns Target, Definition and IterationSet are excluded automatically. |
includeOriginal | boolean | false | If true, the original sets from input files are included in the output as well. If false, only sets defined in transformations are included. |
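
As a rough illustration of how includeAnnotation could select annotation columns, the following Java sketch filters a list of column names. The class and method names are hypothetical, and several details are assumptions rather than documented behavior: names are trimmed of surrounding whitespace, the wildcard may appear anywhere in the list, and the result keeps the column order of the transformation input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of the component.
class AnnotationColumnsSketch {
    // Columns with a fixed meaning; never treated as annotations.
    static final Set<String> SPECIAL =
            new HashSet<>(Arrays.asList("Target", "Definition", "IterationSet"));

    // Return the annotation columns chosen by the includeAnnotation value.
    static List<String> select(List<String> columns, String includeAnnotation) {
        Set<String> wanted = new HashSet<>();
        for (String name : includeAnnotation.split(",")) {
            wanted.add(name.trim()); // trimming is an assumption of this sketch
        }
        boolean includeAll = wanted.contains("*");
        List<String> selected = new ArrayList<>();
        for (String column : columns) {
            if (SPECIAL.contains(column)) {
                continue; // special columns are excluded automatically
            }
            if (includeAll || wanted.contains(column)) {
                selected.add(column);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(
                "Target", "Definition", "IterationSet", "Note", "Source");
        System.out.println(select(columns, "*"));            // [Note, Source]
        System.out.println(select(columns, "Note, Target")); // [Note]
    }
}
```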
Test case | Parameters | IN transformation | IN set1 | IN set2 | IN set3 | IN set4 | IN set5 | OUT result |
---|---|---|---|---|---|---|---|---|
case1 | (missing) | transformation | set1 | set2 | (missing) | (missing) | (missing) | result |
case2_inclorig | properties (includeOriginal=true) | transformation | set1 | set2 | (missing) | (missing) | (missing) | result |
case3_wildcard | (missing) | transformation | set1 | (missing) | (missing) | (missing) | (missing) | result |
case4_annotation | properties | transformation | set1 | set2 | (missing) | (missing) | (missing) | result |
includeOriginal=true, |