Preprocesses RNA-seq BAM files according to GATK best practices: adding read groups, marking duplicates, Split'N'Trim (SplitNCigarReads), and base quality score recalibration.
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | Alignment |
Authors | Amjad Alkodsi (amjad.alkodsi@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml function.scala |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
reference | FASTA | Mandatory | The reference fasta file. |
in | BAM | Mandatory | Input BAM file. |
dbsnp | VCF | Mandatory | File with known SNPs to mask, usually from the latest dbSNP release. GATK uses this file to improve base quality score recalibration. |
mask | VCF | Optional | Additional file with known sites to mask. It is used in exactly the same way as 'dbsnp'. |
Name | Type | Description |
---|---|---|
alignment | BAM | Final preprocessed BAM file. |
markReport | BinaryFile | The report file created by Picard while marking duplicates. |
recalReport | GatkReport | The report file created in the first step and used in the second step of recalibration. |
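
The two-step recalibration behind 'recalReport' can be sketched as a pair of command lines. This is a minimal illustration, assuming GATK 3.x command-line syntax (suggested by the 'GenomeAnalysisTK.jar' name); it has not been checked against function.scala, and the function name and the file paths in the usage lines are hypothetical.

```python
def recalibration_commands(gatk_jar, reference, bam, dbsnp, recal_report,
                           out_bam, mask=None, memory="4g"):
    """Build the two GATK 3.x command lines for base quality score recalibration.

    Assumption: the component passes 'dbsnp' and the optional 'mask' as
    -knownSites arguments; this mirrors "used exactly as 'dbsnp'".
    """
    java = ["java", "-Xmx" + memory, "-jar", gatk_jar]

    # Known variant sites are masked so they are not counted as sequencing errors.
    known_sites = ["-knownSites", dbsnp]
    if mask:
        known_sites += ["-knownSites", mask]

    # Step 1: model the base quality errors and write the recalibration report.
    build_report = (java + ["-T", "BaseRecalibrator", "-R", reference, "-I", bam]
                    + known_sites + ["-o", recal_report])

    # Step 2: apply the report to the reads and write the recalibrated BAM.
    apply_report = java + ["-T", "PrintReads", "-R", reference, "-I", bam,
                           "-BQSR", recal_report, "-o", out_bam]
    return build_report, apply_report


# Hypothetical file names, for illustration only.
step1, step2 = recalibration_commands(
    "GenomeAnalysisTK.jar", "ref.fasta", "split.bam", "dbsnp.vcf",
    "recal.table", "final.bam", mask="mask.vcf")
print(" ".join(step1))
print(" ".join(step2))
```

The same report file is the output of the first command and the '-BQSR' input of the second, which is why it is exposed as a separate output port.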
Name | Type | Default | Description |
---|---|---|---|
ID | string | "1" | Read group ID. |
LB | string | "library1" | Read group library. |
PL | string | "ILLUMINA" | Read group platform. |
PU | string | "Unit1" | Read group platform unit. |
SM | string | "Sample1" | Read group sample name. |
gatkPath | string | "" | Path to the GATK directory containing 'GenomeAnalysisTK.jar'. If an empty string is given (default), the GATK_HOME environment variable is assumed to point to that directory. |
memory | string | "4g" | The amount of Java heap memory allocated to each GATK and Picard thread, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5 gigabytes), etc. |
picardPath | string | "" | Path to the Picard directory, e.g. "/opt/picard", which contains the Picard tools. If empty, the PICARD_HOME environment variable is used. |
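
The parameters above map onto the external tools roughly as follows. This is a hedged sketch, not the component's actual code: the helper names are hypothetical, the read-group arguments assume Picard's 'KEY=value' syntax for AddOrReplaceReadGroups, and the heap setting assumes the value is passed to the JVM as '-Xmx'.

```python
import os
import re


def resolve_tool_dir(path_param, env_var, environ=None):
    """Return path_param, or the value of env_var when path_param is empty."""
    environ = os.environ if environ is None else environ
    if path_param:
        return path_param
    if env_var not in environ:
        raise ValueError(env_var + " is not set and no path was given")
    return environ[env_var]


def jvm_heap_flag(memory="4g"):
    """Turn a value such as "4g" or "2560m" into a JVM -Xmx flag.

    Only whole numbers with a k, m, or g suffix are accepted here, which is
    why 2.5 gigabytes has to be written as "2560m".
    """
    if not re.fullmatch(r"[1-9][0-9]*[kKmMgG]", memory):
        raise ValueError("unsupported memory value: " + repr(memory))
    return "-Xmx" + memory


def read_group_args(ID="1", LB="library1", PL="ILLUMINA", PU="Unit1",
                    SM="Sample1"):
    """Read-group arguments in Picard AddOrReplaceReadGroups KEY=value form."""
    return ["RGID=" + ID, "RGLB=" + LB, "RGPL=" + PL, "RGPU=" + PU,
            "RGSM=" + SM]


# Usage with an explicit environment mapping; "/opt/gatk" is a made-up path.
gatk_dir = resolve_tool_dir("", "GATK_HOME", {"GATK_HOME": "/opt/gatk"})
print(os.path.join(gatk_dir, "GenomeAnalysisTK.jar"))
print(jvm_heap_flag("2560m"), " ".join(read_group_args(ID="ID2")))
```

Passing the environment as an argument keeps the lookup easy to test without touching the real GATK_HOME or PICARD_HOME variables.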
Test case | Parameters | IN reference | IN in | IN dbsnp | IN mask | OUT alignment | OUT markReport | OUT recalReport |
---|---|---|---|---|---|---|---|---|
case1 | properties (ID=ID2) | reference | in | dbsnp | (missing) | (missing) | (missing) | (missing) |