
RegionTransformer

Computes DNA region set operations such as union and overlap. Transformations are defined with an expression syntax such as union(r1, length(5, 15, "r2")). Here, union and length are functions, and r1 and r2 are keys of region sets in the regions input array. Region set references can be quoted or unquoted. See the SetTransformer component for an analogous interface for computing set operations.

This component is implemented using, and follows the API of, GROK (Genomic Region Operation Kit); see the GROK documentation for details. The GROK API can be used to implement more complex analyses than are possible with the simple function expressions described below: a full Python script can be supplied through the scriptFile input port. The test cases present some common use cases. However, they were not originally written for this Anduril component, so they may not be representative of its real use. Contributions are welcome.

Region transformation functions are divided into three basic types. Type P (identity preserving) functions preserve region identity and consider regions as indivisible relations. Type A (annotation preserving) functions modify properties of regions, but migrate annotations to new regions. Type N (non-preserving) functions operate at sequence (location) level and only preserve score annotations; new scores are computed using customizable aggregate functions.
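The difference between region-level (type P) and location-level (type N) operations can be illustrated with a small pure-Python model. This is not GROK code: regions are modelled as hypothetical (chromosome, left, right) tuples with half-open coordinates, the helper names union_regions and union_locations are made up for this sketch, and treating adjacent regions as one location range is an assumption.

```python
# Pure-Python model, not the GROK API. A region is a hypothetical
# (chromosome, left, right) tuple with half-open coordinates [left, right).

def union_regions(*region_sets):
    """Type P style: regions stay indivisible; exact duplicates are dropped."""
    seen = set()
    for regions in region_sets:
        seen.update(regions)
    return sorted(seen)

def union_locations(*region_sets):
    """Type N style: overlapping or adjacent regions collapse into one range."""
    merged = []
    for chrom, left, right in sorted(r for rs in region_sets for r in rs):
        if merged and merged[-1][0] == chrom and left <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], right))
        else:
            merged.append((chrom, left, right))
    return merged

r1 = [("chr1", 100, 200), ("chr1", 500, 600)]
r2 = [("chr1", 150, 250), ("chr1", 500, 600)]

region_level = union_regions(r1, r2)      # three distinct regions survive
location_level = union_locations(r1, r2)  # two covered coordinate ranges
```

In this model the region-level union keeps chr1:100-200 and chr1:150-250 as separate regions, whereas the location-level union reports the single covered range chr1:100-250.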

Function definitions are listed below. Here, regs... indicates any number of region set arguments, and reg, reg1 and reg2 indicate single region sets. [X] indicates an optional argument. Functions may also take integer or string arguments. Definitions of the form reg.function are method calls, where "reg" is a region set reference such as reg1.

Definition | Type | Description
union(regs...) | P | Regions that are present in any region set.
unionL(regs...) | N | Locations that are present in any region set.
intersection(regs...) | P | Regions that are present in all region sets.
intersectionL(regs...) | N | Locations that are present in all region sets.
freq(low, high, regs...) | P | Regions that are present in at least low and at most high region sets.
freqL(low, high, regs...) | N | Locations that are present in at least low and at most high region sets.
diff(reg1, reg2) | P | Regions that are present in reg1 but not in reg2.
diffL(reg1, reg2) | N | Locations that are present in reg1 but not in reg2.
reg.strand(n) | P | Regions whose strand matches the given numeric n: -1 for reverse, 0 for any, or 1 for forward strand.
reg.expand(start, end) | A | Expand regions by start positions at the start of each region and by end positions at its end. Negative values shrink regions. Takes strands into account.
reg.shift(n) | A | Shift regions by n positions in the sense direction (right for forward strand, left for reverse strand). Negative values shift in the anti-sense direction.
reg.flip([fixed]) | A | Change the strand of regions: forward becomes reverse and vice versa. If fixed (numeric: -1/0/1) is given, all regions get this strand instead.
reg.merge([gap]) | N | Merge regions that are separated by at most gap positions. gap defaults to 0.
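The strand-aware behaviour of shift, expand, flip and merge can be sketched with a toy model. This is not the GROK implementation: the Region tuple and the four helper functions are hypothetical, coordinates are assumed to be half-open, strandless regions are assumed to behave like forward-strand ones, and on the reverse strand the region "start" is assumed to be its right-hand edge.

```python
from typing import NamedTuple

class Region(NamedTuple):
    """Hypothetical region record; not a GROK type."""
    chrom: str
    left: int    # leftmost coordinate (inclusive)
    right: int   # rightmost coordinate (exclusive in this model)
    strand: int  # -1 reverse, 0 any, 1 forward

def shift(region, n):
    # Sense direction: right for forward strand, left for reverse strand.
    step = -n if region.strand == -1 else n
    return region._replace(left=region.left + step, right=region.right + step)

def expand(region, start, end):
    # Assumption: on the reverse strand, "start" grows the right-hand edge.
    if region.strand == -1:
        return region._replace(left=region.left - end, right=region.right + start)
    return region._replace(left=region.left - start, right=region.right + end)

def flip(region, fixed=None):
    # Without "fixed" the strand is negated; otherwise it is overwritten.
    strand = -region.strand if fixed is None else fixed
    return region._replace(strand=strand)

def merge(regions, gap=0):
    # Assumption: strand is ignored and merged regions get strand 0.
    merged = []
    for reg in sorted(regions):
        last = merged[-1] if merged else None
        if last is not None and last.chrom == reg.chrom and reg.left - last.right <= gap:
            merged[-1] = last._replace(right=max(last.right, reg.right))
        else:
            merged.append(reg._replace(strand=0))
    return merged

fwd = Region("chr1", 100, 200, 1)
rev = Region("chr1", 100, 200, -1)

shifted_fwd = shift(fwd, 10)      # moves right
shifted_rev = shift(rev, 10)      # moves left
expanded_rev = expand(rev, 5, 20) # 5 at the right edge, 20 at the left edge
joined = merge([fwd, Region("chr1", 205, 300, -1)], gap=5)
```

With these assumptions, the same call moves forward and reverse regions in opposite genomic directions, which is the point of the "sense direction" wording in the table.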

Version 0.3
Bundle sequencing
Categories
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi), Lauri Lyly (lauri.lyly@helsinki.fi)
Requires python ; python-dev (DEB) ; installer (bash)
Source files component.xml RegionTransformer.py

Inputs

Name | Type | Mandatory | Description
regions | Array<DNARegion2> | Optional | Source region sets. In the transformation expression, each region set is referred to by its key in the array: if the key is "r1", the corresponding FileReader region set is available in the variable "r1". The region sets can also be accessed through the "readers" dictionary, so readers["r1"] yields the same object.
region_set | DNARegion2 | Optional | Single source region set. In the transformation expression, this region set is referred to as "region_set".
folder | BinaryFolder | Optional | Source region sets, which can be of any file type known by GROK. In the transformation expression, each region set is referred to by its filename, as in folder["my_regions.csv"]. This yields a FileReader region set.
scriptFile | PythonSource | Optional | Python script to evaluate instead of the expression in the "script" parameter, if specified. The keys of the "regions" array are turned into local variables that refer to the corresponding FileReader region stores in the script. The script also sees certain other variables in its global scope, including all GROK functions and variables corresponding to the inputs and outputs.
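The way array keys become script variables can be pictured with plain Python. The following is only a guess at the mechanism, not the component's actual code: the regions dictionary holds simple lists as stand-ins for FileReader region stores, and the names script_globals and same_object are hypothetical.

```python
# Stand-ins for the FileReader region stores the component would provide.
regions = {"r1": ["chr1:100-200"], "r2": ["chr1:150-250"]}

# Each array key becomes a variable; "readers" exposes the same objects.
script_globals = dict(regions)
script_globals["readers"] = regions

# A multi-statement script, as could be supplied through scriptFile.
script = '''
same_object = r1 is readers["r1"]
result = r1 + r2
'''
exec(script, script_globals)

same_object = script_globals["same_object"]
result = script_globals["result"]
```

Here the script refers to r1 and r2 directly, and readers["r1"] is the very same object as r1, mirroring the two access styles described for the regions input.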

Outputs

Name | Type | Description
result | DNARegion2 | Result region set. It is set automatically if you specify the transformation expression in the "script" parameter instead of supplying the scriptFile input. From a script, you can write to it by storing a region set in the "result" variable; that region set is then written to this output port. Alternatively, write to the file cf.get_output("result") directly, with or without a writer.
array | Array<DNARegion2> | Optional array of produced region sets. Files are automatically added to the array by writing array["myfile"]. The file extension is appended automatically; it defaults to "csv" and can be changed with, e.g., array.set_type("bam"). All GROK output types are supported. Inside the script, the variable called array is an "AndurilOutputArray" from the Python anduril module.
folder | BinaryFolder | Optional output folder. The path is visible in the script as "folderOutput". A convenient way to build a file path is, e.g., os.path.join(folderOutput, "myfile.csv"), which requires importing the os module.
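Writing a file into the output folder could look like the sketch below. Inside a real script the component provides folderOutput; here a temporary directory stands in for it so that the example runs on its own. The column names are hypothetical and do not describe the DNARegion2 format.

```python
import csv
import os
import tempfile

# Stand-in for the folderOutput variable that the component would provide.
folderOutput = tempfile.mkdtemp()

# Build the output path as suggested in the folder output description.
path = os.path.join(folderOutput, "myfile.csv")

with open(path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["chromosome", "start", "end"])  # hypothetical columns
    writer.writerow(["chr1", 100, 200])
```

Using os.path.join keeps the script independent of the path separator of the platform on which the workflow runs.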

Parameters

Name | Type | Default | Description
script | string | "" | Same as the scriptFile input, except that it can contain only a single expression, whose value is written to the "result" output. The expression is evaluated with Python's "eval" function. Used only if the scriptFile input is unspecified.
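The single-expression restriction follows from how Python's eval works: it accepts an expression but rejects statements such as assignments. The sketch below demonstrates this with a toy namespace. The diffL defined here is a stand-in operating on plain integers, not the GROK function, and the namespace layout is an assumption.

```python
# Toy namespace: a stand-in diffL on integer "locations", not the GROK function.
namespace = {
    "diffL": lambda a, b: sorted(set(a) - set(b)),
    "r1": [1, 2, 3],
    "r2": [2],
}

# A single expression evaluates fine; its value would go to "result".
result = eval("diffL(r1, r2)", namespace)

# A statement is not an expression, so eval refuses it.
try:
    eval("x = diffL(r1, r2)", namespace)
    statement_rejected = False
except SyntaxError:
    statement_rejected = True
```

Anything that needs assignments, loops or imports therefore belongs in a script supplied through the scriptFile input rather than in the script parameter.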

Test cases

Test case | Parameters | IN regions | IN region_set | IN folder | IN scriptFile | OUT result | OUT array | OUT folder
case1 | properties: script=diffL(r1, r2) | regions | (missing) | (missing) | (missing) | (missing) | (missing) | (missing)
case2-annotations | properties | regions | (missing) | (missing) | scriptFile | result | array | folder
case3-filtering | properties: script=diffL(r1, r2) | regions | (missing) | (missing) | scriptFile | result | array | (missing)
case4-overlap | properties: script=diffL(r1, r2) | regions | (missing) | (missing) | scriptFile | result | array | (missing)
case5-partition | properties: script=diffL(r1, r2) | regions | (missing) | (missing) | scriptFile | result | array | (missing)
case6-read-iterate | properties: script=diffL(r1, r2) | regions | (missing) | folder | scriptFile | result | array | folder
case7-setops | properties: script=diffL(r1, r2) | regions | (missing) | folder | scriptFile | result | array | (missing)
case8-stores | properties: script=diffL(r1, r2) | regions | (missing) | (missing) | scriptFile | result | array | folder
case9-transformations | properties: script=diffL(r1, r2) | regions | (missing) | (missing) | scriptFile | result | array | (missing)


Generated 2019-02-08 07:42:12 by Anduril 2.0.0