Converts files containing genomic or other sequence-related regions to other formats, retaining all applicable information. Supported input and output types depend on the conversion method. The default method is GROK's command-line interface, which supports all major sequencing file formats and various options. Type "grok" on the command line to get an overview.
Note that GROK currently does not support BAM extra fields.
GROK's native CSV format is DNARegion2.
As a special case, DNARegion is explicitly supported as the output type "DNARegion", because some other components depend on it. This conversion allows specifying an ID column for the regions so that region identity can be preserved.
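The ID rule for DNARegion output can be illustrated with a short sketch. It assumes each region is a dictionary of field values and that the ID is stored under a field named "ID". The helper name `assign_region_ids` and the "ID" field are hypothetical and not part of the component. If an ID column is given, its value is copied. Otherwise, regions are numbered 1, 2, 3 and so on.

```python
from typing import Iterable, List, Mapping


def assign_region_ids(regions: Iterable[Mapping[str, str]],
                      id_column: str = "") -> List[dict]:
    """Hypothetical helper: attach an ID to each region record.

    If id_column is non-empty, the ID is read from that field.
    Otherwise IDs are generated sequentially, starting from 1.
    The output field name "ID" is an assumption for illustration.
    """
    result = []
    for index, region in enumerate(regions, start=1):
        record = dict(region)  # copy so the input records are not modified
        if id_column:
            # Preserve region identity from the chosen column or annotation.
            record["ID"] = region[id_column]
        else:
            # No ID column given: number regions 1, 2, 3, ...
            record["ID"] = str(index)
        result.append(record)
    return result
```

A missing `id_column` field raises `KeyError` in this sketch. The real component may handle that case differently.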
Version | 0.1 |
---|---|
Bundle | sequencing |
Categories | |
Authors | Lauri Lyly (lauri.lyly@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | python ; installer (bash) |
Source files | component.xml convert_file.py |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
file | BinaryFile | Optional | Single file to convert. |
array | Array<BinaryFile> | Optional | Array to convert. |
folder | BinaryFolder | Optional | Folder of files to convert. If the from parameter is set, only files matching that input type are converted; otherwise, all convertible files are converted. |
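The following sketch shows one way to select files from a folder input, using the file-suffix type detection that the from parameter describes. The suffix table `KNOWN_SUFFIXES` and the function names are hypothetical, because this page does not list the accepted types. The sketch also assumes that a trailing ".gz" is ignored when deducing the type.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical suffix-to-type table; the component's real list of
# accepted types is not given in this documentation.
KNOWN_SUFFIXES = {
    ".vcf": "vcf",
    ".bed": "bed",
    ".bam": "bam",
    ".csv": "csv",
}


def detect_type(path: Path) -> Optional[str]:
    """Deduce a file type from its suffix, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in path.suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes:
        return None
    return KNOWN_SUFFIXES.get(suffixes[-1])


def select_folder_files(folder: Path, from_type: str = "") -> List[Path]:
    """Return convertible files in folder, filtered by from_type if set."""
    selected = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        file_type = detect_type(path)
        if file_type is None:
            continue  # unknown suffix: not convertible
        if from_type and file_type != from_type:
            continue  # does not match the requested input type
        selected.append(path)
    return selected
```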
Name | Type | Description |
---|---|---|
file | BinaryFile | File converted from the file input. |
array | Array<BinaryFile> | Files converted from the array input. |
folder | BinaryFolder | Files converted from the folder input. |
Name | Type | Default | Description |
---|---|---|---|
from | string | "" | Input file type for all input files. By default, the type is deduced from the file suffix. Accepted types are FIXME |
id_column | string | "" | Used only when the output type is "DNARegion". Specifies the column or annotation field from which to read the ID of each region. If empty, IDs are generated sequentially, starting from 1. |
method | string | "GROK" | The framework used for conversion. Currently, GROK is the only choice. The parameter exists so that the component can be extended without triggering re-execution. |
options | string | "" | Method-specific options, usually appended to a command line or interpreted in some other way. For example, use --gzip with GROK to compress the output. |
threads | int | 1 | Maximum number of threads to use at once. Set to 0 to use the detected number of CPUs. The default of 1 is chosen for safety. |
to | string | "csv" | The output type for all output files. The default type "csv" means DNARegion2, a CSV format. A suffix will be appended to the file names for arrays and folders. Accepted types are FIXME |
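The threads and to parameters can be interpreted as in the sketch below. The function names are hypothetical. The sketch assumes that output names for arrays and folders are formed by appending "." plus the output type to the input file name, following the description above. The real component may instead replace the original suffix.

```python
import os
from typing import List


def resolve_threads(threads: int) -> int:
    """Interpret the threads parameter: 0 means one thread per detected CPU."""
    if threads < 0:
        raise ValueError("threads must be >= 0")
    if threads == 0:
        # os.cpu_count() returns None if the CPU count cannot be determined.
        return os.cpu_count() or 1
    return threads


def output_names(input_names: List[str], to: str = "csv") -> List[str]:
    """Hypothetical naming rule: append the output type as a suffix."""
    return [f"{name}.{to}" for name in input_names]
```

For example, `output_names(["sample.vcf"])` gives `["sample.vcf.csv"]` under this assumption.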
Test case | Parameters | IN file | IN array | IN folder | OUT file | OUT array | OUT folder |
---|---|---|---|---|---|---|---|
case1_vcf2csv | from=vcf | file | (missing) | (missing) | file | array | (missing) |