Converts files containing genomic or other sequence-related regions to other formats, retaining all applicable information. Supported input and output types depend on the conversion method. The default method uses GROK's command-line interface, which supports all major sequencing file formats and various options. Type "grok" on the command line for an overview.

Note that GROK currently does not support BAM extra fields.

GROK's native CSV format is DNARegion2.

As a special case, DNARegion is explicitly supported as the output type "DNARegion" because some other components depend on it. This conversion allows specifying an ID column for the regions so that region identity can be preserved.
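The ID handling for DNARegion output can be pictured with the minimal sketch below. It is not the component's code: `assign_region_ids`, the output key "ID", and the region field names are hypothetical, and regions are modelled as plain dictionaries. It only illustrates the documented rule: copy the ID from the chosen column if one is given, otherwise number regions from 1 upwards.

```python
from typing import Dict, Iterable, List, Optional


def assign_region_ids(regions: Iterable[Dict[str, str]],
                      id_column: Optional[str] = None) -> List[Dict[str, str]]:
    """Attach an "ID" to each region (hypothetical helper, not the component's code).

    If id_column is non-empty, the ID is copied from that field so region identity
    is preserved; otherwise IDs are generated as 1, 2, 3, ... in input order.
    """
    result = []
    for index, region in enumerate(regions, start=1):
        region = dict(region)  # copy so the caller's data is not modified
        if id_column:
            region["ID"] = region[id_column]  # raises KeyError if the field is missing
        else:
            region["ID"] = str(index)
        result.append(region)
    return result


if __name__ == "__main__":
    # Field names below are illustrative only.
    regions = [
        {"chr": "chr1", "start": "100", "end": "200", "name": "peakA"},
        {"chr": "chr2", "start": "300", "end": "400", "name": "peakB"},
    ]
    print([r["ID"] for r in assign_region_ids(regions)])          # ['1', '2']
    print([r["ID"] for r in assign_region_ids(regions, "name")])  # ['peakA', 'peakB']
```

An empty string behaves like "no ID column", matching the parameter's default of "".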

Version: 0.1
Bundle: sequencing
Authors: Lauri Lyly (lauri.lyly@helsinki.fi)
Requires: python; installer (bash)
Source files: component.xml, convert_file.py
Name Type Mandatory Description
file BinaryFile Optional Single file to convert.
array Array<BinaryFile> Optional Array to convert.
folder BinaryFolder Optional Folder of files to convert. If the from parameter is set, only files matching that input type are converted; otherwise all convertible files are converted.
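One way to read the folder selection rule, combined with the suffix-based type deduction used when from is empty, is sketched below. The suffix table, `deduce_type`, and `select_files` are hypothetical illustrations; the actual list of accepted types is not documented here, and the component may match files differently.

```python
import os
from typing import List, Optional

# Hypothetical suffix-to-type table for illustration only; the real set of
# accepted types is not documented here.
SUFFIX_TYPES = {
    ".bam": "bam",
    ".sam": "sam",
    ".bed": "bed",
    ".gff": "gff",
    ".gtf": "gtf",
    ".vcf": "vcf",
    ".csv": "csv",
}


def deduce_type(filename: str) -> Optional[str]:
    """Guess a file type from its suffix, ignoring a trailing .gz."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    _, suffix = os.path.splitext(name)
    return SUFFIX_TYPES.get(suffix)


def select_files(filenames: List[str], from_type: str = "") -> List[str]:
    """Pick the files to convert from a folder listing.

    With an empty from_type, every file with a recognised suffix is selected;
    otherwise only files whose deduced type equals from_type are selected.
    """
    selected = []
    for filename in sorted(filenames):
        file_type = deduce_type(filename)
        if file_type is None:
            continue  # suffix not recognised, treat as not convertible
        if from_type and file_type != from_type:
            continue  # does not match the requested input type
        selected.append(filename)
    return selected


if __name__ == "__main__":
    listing = ["reads.bam", "peaks.bed.gz", "notes.txt", "variants.vcf"]
    print(select_files(listing))         # ['peaks.bed.gz', 'reads.bam', 'variants.vcf']
    print(select_files(listing, "bed"))  # ['peaks.bed.gz']
```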


Name Type Description
file BinaryFile File converted from the file input.
array Array<BinaryFile> Files converted from the array input.
folder BinaryFolder Files converted from the folder input.


Name Type Default Description
from string "" Input file type for all input files. By default, deduced from the file suffix. Accepted types are FIXME
id_column string "" Used only when the output type is "DNARegion". Specifies the column or annotation field from which to read each region's ID. If empty, IDs are generated sequentially, starting from 1.
method string "GROK" The framework used for conversion. Currently GROK is the only choice. The parameter exists so that other methods can be added later without triggering re-execution of existing workflows.
options string "" Method-specific options, usually appended to the command line or otherwise interpreted by the method. For example, --gzip makes GROK compress the output.
threads int 1 Maximum number of threads to use at once. Set to 0 to use the number of detected CPUs. The default of 1 is chosen for safety.
to string "csv" The output type for all output files. The default, "csv", means DNARegion2, a CSV format. For array and folder inputs, a suffix is appended to the output file names. Accepted types are FIXME
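The threads, to, and options parameters could be turned into concrete values roughly as in the sketch below. All three helpers are hypothetical. The exact output suffix scheme is an assumption, and so is the GROK argument order: `build_command` assumes, without verification, that grok takes the input and output paths as positional arguments, with the options string split shell-style and appended after them.

```python
import os
import shlex
from typing import List


def resolve_threads(threads: int) -> int:
    """Map the threads parameter to a concrete count: 0 means "all detected CPUs"."""
    if threads < 0:
        raise ValueError("threads must be >= 0")
    if threads == 0:
        return os.cpu_count() or 1  # os.cpu_count() may return None
    return threads


def output_name(input_name: str, to: str = "csv") -> str:
    """Name an output file for array/folder inputs by appending the output type.

    Hypothetical scheme: the exact suffix the component uses is not documented here.
    """
    return f"{input_name}.{to}"


def build_command(input_path: str, output_path: str, options: str = "") -> List[str]:
    """Build a GROK command line with the free-form options string appended.

    Assumes (unverified) that grok takes input and output paths as positional arguments.
    """
    return ["grok", input_path, output_path] + shlex.split(options)


if __name__ == "__main__":
    print(resolve_threads(4))      # 4
    print(output_name("reads.bam"))  # reads.bam.csv
    print(build_command("reads.bam", "reads.bam.csv", "--gzip"))
    # ['grok', 'reads.bam', 'reads.bam.csv', '--gzip']
```

Using `shlex.split` rather than a plain `str.split` keeps quoted option values with spaces intact when the options string is appended.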

