Creates sample group tables based on sample names read from data files. These tables define relationships between samples, e.g., which samples are biological or technical replicates. The list of sample names is read from data files (either from the column names or from the values of one column), and sample groups are defined using regular expressions. Constant ("verbatim") groups, whose member samples do not depend on the data, can also be created. A maximum of nine patterns can be defined; the type of each pattern is given in the "patternTypes" parameter and is one of "re", "relist" or "verbatim".
Groups having the type "re" are defined by regular expressions. Here, patternN is a Java regular expression that matches sample names and definitionN has the format NAME,TYPE or NAME,TYPE,DESCRIPTION. NAME is the ID of the sample group and TYPE defines the relationship of samples within the group (mean, median, ratio or sample); see documentation on SampleGroupTable for details. Optionally, DESCRIPTION is a human-readable name. All samples that match the pattern are members of the group.
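
The definitionN format can be illustrated with a small parser. This is a minimal sketch, not the component's actual code: the class `GroupDefinition` and its `parse` method are hypothetical names, and the sketch assumes that only the first two commas act as separators, so that a DESCRIPTION may itself contain commas.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not taken from SampleGroupCreator.java.
final class GroupDefinition {
    // Group types listed in the text above.
    static final List<String> TYPES = Arrays.asList("mean", "median", "ratio", "sample");

    final String name;
    final String type;
    final String description; // empty string when DESCRIPTION is omitted

    GroupDefinition(String name, String type, String description) {
        this.name = name;
        this.type = type;
        this.description = description;
    }

    // Parses NAME,TYPE or NAME,TYPE,DESCRIPTION.
    // Assumption: split into at most three parts, so later commas stay in DESCRIPTION.
    static GroupDefinition parse(String definition) {
        String[] parts = definition.split(",", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "Expected NAME,TYPE or NAME,TYPE,DESCRIPTION: " + definition);
        }
        String type = parts[1].trim();
        if (!TYPES.contains(type)) {
            throw new IllegalArgumentException("Unknown group type: " + type);
        }
        String description = parts.length == 3 ? parts[2].trim() : "";
        return new GroupDefinition(parts[0].trim(), type, description);
    }
}
```

For instance, `GroupDefinition.parse("Controls,mean,Control samples")` would yield the name "Controls", the type "mean" and the description "Control samples".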
One pattern may spawn several sample groups if grouping operators "(" and ")" are used in the pattern. These capturing groups can be referred to as $1, $2, etc. in NAME and DESCRIPTION. For example, when pattern1 is "S([0-9]+)[a-z]" and definition1 is "MyGroup_$1,median", the following groups may be created: "MyGroup_1" containing "S1a,S1b" and "MyGroup_2" containing "S2a,S2b,S2c".
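
The way one pattern spawns several groups can be sketched as follows. This is an illustration under stated assumptions rather than the component's implementation: `ReGroupSketch`, `expand` and `group` are hypothetical names, and the sketch assumes that a pattern must match the whole sample name (`Matcher.matches`) rather than a substring.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "re" grouping; not taken from SampleGroupCreator.java.
final class ReGroupSketch {
    // Replaces $1, $2, ... in the template with the capturing groups of a successful match.
    static String expand(String template, Matcher m) {
        String result = template;
        // Highest index first, so that "$1" does not clobber "$10".
        for (int i = m.groupCount(); i >= 1; i--) {
            String value = m.group(i) == null ? "" : m.group(i);
            result = result.replace("$" + i, value);
        }
        return result;
    }

    // Maps each expanded group name to the samples whose names match the pattern.
    static Map<String, List<String>> group(String pattern, String nameTemplate,
                                           List<String> samples) {
        Pattern p = Pattern.compile(pattern);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String sample : samples) {
            Matcher m = p.matcher(sample);
            if (m.matches()) { // assumption: the whole sample name must match
                groups.computeIfAbsent(expand(nameTemplate, m), k -> new ArrayList<>())
                      .add(sample);
            }
        }
        return groups;
    }
}
```

With the pattern "S([0-9]+)[a-z]", the name template "MyGroup_$1" and the samples S1a, S1b, S2a, S2b and S2c, this sketch produces the two groups from the example above.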
Groups having the type "relist" are a special case of two-sample groups defined by two regular expressions. Here, patternN has the format PATTERN1,PATTERN2. Both elements are regular expressions. All two-sample pairs that match the patterns are created as groups. Capturing groups may be present only in PATTERN1; PATTERN2 may refer to them as $1, $2, etc. The definitionN parameter is as before and it may also refer to capturing groups of PATTERN1. For example, if pattern1 is "S([0-9]+)_green,S$1_red" and definition1 is "ratio_S$1,ratio", the following groups may be created: "ratio_S1" containing "S1_green,S1_red" and "ratio_S2" containing "S2_green,S2_red".
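
The pairing logic of "relist" can be sketched in the same spirit. This is a hedged illustration, not the component's code: `RelistSketch`, `pairs` and `expand` are hypothetical names. The sketch assumes that the first comma in patternN separates PATTERN1 from PATTERN2 (so PATTERN1 must not contain a comma), that both patterns must match whole sample names, and that captured text is inserted into PATTERN2 literally.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "relist" pairing; not taken from SampleGroupCreator.java.
final class RelistSketch {
    // Replaces $1, $2, ... with captured text, optionally quoted for use inside a regex.
    static String expand(String template, Matcher m, boolean quote) {
        String result = template;
        for (int i = m.groupCount(); i >= 1; i--) {
            String value = m.group(i) == null ? "" : m.group(i);
            result = result.replace("$" + i, quote ? Pattern.quote(value) : value);
        }
        return result;
    }

    // Returns group name -> [first sample, second sample] for every matching pair.
    static Map<String, List<String>> pairs(String patternN, String nameTemplate,
                                           List<String> samples) {
        int comma = patternN.indexOf(','); // assumption: first comma is the separator
        if (comma < 0) {
            throw new IllegalArgumentException("Expected PATTERN1,PATTERN2: " + patternN);
        }
        Pattern first = Pattern.compile(patternN.substring(0, comma));
        String secondTemplate = patternN.substring(comma + 1);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String a : samples) {
            Matcher m = first.matcher(a);
            if (!m.matches()) {
                continue;
            }
            // PATTERN2 is specialised with the text captured by PATTERN1.
            Pattern second = Pattern.compile(expand(secondTemplate, m, true));
            for (String b : samples) {
                if (second.matcher(b).matches()) {
                    // Pairs that expand to the same name overwrite each other here.
                    groups.put(expand(nameTemplate, m, false), Arrays.asList(a, b));
                }
            }
        }
        return groups;
    }
}
```

Calling `pairs` with "S([0-9]+)_green,S$1_red", the name template "ratio_S$1" and the samples S1_green, S1_red, S2_green and S2_red gives the groups "ratio_S1" and "ratio_S2" from the example above.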
Groups having the type "verbatim" are constant groups. Here, patternN is a comma-separated list of member sample group names and definitionN is as before, except that capturing group references ($1, $2, etc.) cannot be used.
Version | 1.2.2 |
---|---|
Bundle | tools |
Categories | Preprocessing |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
Source files | component.xml SampleGroupCreator.java |
Name | Type | Mandatory | Description |
---|---|---|---|
data1 | CSV | Mandatory | Data file 1 for reading sample names. Either column names or values on one column contain sample names. |
data2 | CSV | Optional | Data file 2 for reading sample names. Either column names or values on one column contain sample names. |
data3 | CSV | Optional | Data file 3 for reading sample names. Either column names or values on one column contain sample names. |
Name | Type | Description |
---|---|---|
groups | SampleGroupTable | Result sample groups. |
Name | Type | Default | Description |
---|---|---|---|
columns | string | "" | Defines which columns in input files contain sample names, or whether column names should be used (default). This parameter is a comma-separated list of at most three values which name columns in inputs data1 to data3. If an entry is empty, the column names of the corresponding input file are used instead. |
definition1 | string | "" | Definition of group 1. If empty, the group is omitted. Format: NAME,TYPE or NAME,TYPE,DESCRIPTION. |
definition2 | string | "" | Definition of group 2. If empty, the group is omitted. |
definition3 | string | "" | Definition of group 3. If empty, the group is omitted. |
definition4 | string | "" | Definition of group 4. If empty, the group is omitted. |
definition5 | string | "" | Definition of group 5. If empty, the group is omitted. |
definition6 | string | "" | Definition of group 6. If empty, the group is omitted. |
definition7 | string | "" | Definition of group 7. If empty, the group is omitted. |
definition8 | string | "" | Definition of group 8. If empty, the group is omitted. |
definition9 | string | "" | Definition of group 9. If empty, the group is omitted. |
pattern1 | string | "" | Pattern for group 1. If empty, the group is omitted. Format depends on the pattern type. |
pattern2 | string | "" | Pattern for group 2. If empty, the group is omitted. |
pattern3 | string | "" | Pattern for group 3. If empty, the group is omitted. |
pattern4 | string | "" | Pattern for group 4. If empty, the group is omitted. |
pattern5 | string | "" | Pattern for group 5. If empty, the group is omitted. |
pattern6 | string | "" | Pattern for group 6. If empty, the group is omitted. |
pattern7 | string | "" | Pattern for group 7. If empty, the group is omitted. |
pattern8 | string | "" | Pattern for group 8. If empty, the group is omitted. |
pattern9 | string | "" | Pattern for group 9. If empty, the group is omitted. |
patternTypes | string | "" | Comma-separated list of pattern types for each pattern. Each item is one of "re" (default), "relist" or "verbatim". The types are explained above. An empty value is interpreted as "re". For example, ",re,relist,,verbatim" specifies that pattern3 has the type "relist" and pattern5 the type "verbatim"; all others (including pattern6 and above) have the type "re". |
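
The defaulting rule for patternTypes can be made concrete with a short sketch. It is an illustration only: `PatternTypesSketch` and `resolve` are hypothetical names, and rejecting unknown type names with an exception is an assumption rather than documented behaviour.

```java
import java.util.Arrays;

// Hypothetical sketch of patternTypes resolution; not taken from SampleGroupCreator.java.
final class PatternTypesSketch {
    static final int MAX_PATTERNS = 9;

    // Returns the effective type of pattern1..pattern9 (index 0 is pattern1).
    static String[] resolve(String patternTypes) {
        String[] types = new String[MAX_PATTERNS];
        Arrays.fill(types, "re"); // missing and empty entries default to "re"
        // Limit -1 keeps trailing empty items instead of dropping them.
        String[] items = patternTypes.split(",", -1);
        for (int i = 0; i < items.length && i < MAX_PATTERNS; i++) {
            String item = items[i].trim();
            if (item.isEmpty()) {
                continue;
            }
            if (!item.equals("re") && !item.equals("relist") && !item.equals("verbatim")) {
                // Assumption: unknown type names are treated as an error.
                throw new IllegalArgumentException("Unknown pattern type: " + item);
            }
            types[i] = item;
        }
        return types;
    }
}
```

For the value ",re,relist,,verbatim", the sketch resolves pattern3 to "relist", pattern5 to "verbatim" and every other pattern to "re", matching the description in the parameter table.
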
Test case | Parameters | IN data1 | IN data2 | IN data3 | OUT groups |
---|---|---|---|---|---|
case1 | properties: pattern1=S([0-9]+).*, | data1 | (missing) | (missing) | groups |
case2_twochannel | properties: pattern1=S([0-9]+)_green,S$1_red, | data1 | data2 | (missing) | groups |