This function performs base quality score recalibration using the Genome Analysis Toolkit (GATK). Recalibration is performed as part of alignment in RNA-seq and in DNA whole-genome (WGS) or targeted (exome) sequencing analyses, following the GATK best practices. It requires a dbSNP file containing known sites of variation (see 'dbsnp').
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | Alignment |
Authors | Rony Lindell (rony.lindell@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml function.scala |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
reference | FASTA | Mandatory | The reference fasta file. |
bam | BAM | Mandatory | Input BAM file. The file can be a single-sample or a merged multi-sample alignment file. |
dbsnp | VCF | Mandatory | File with known SNPs to mask, usually from the latest dbSNP distribution. GATK uses the file to improve base quality recalibration. |
mask | VCF | Optional | File with additional known sites to mask. It is used in exactly the same way as 'dbsnp'. |
intervals | BED | Optional | Select only reads in the genomic regions specified in this file. File types other than BED can be forced; see the GATK wiki for the allowed formats. |
Name | Type | Description |
---|---|---|
alignment | BAM | Final recalibrated alignment BAM file. |
report | GatkReport | The report file created in the first step and used in the second step of recalibration. |
plots | | Plots generated into a PDF file for quality checking. |
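The two steps mentioned for the 'report' output can be sketched as GATK argument lists. This is a minimal illustration in Python, not the component's own function.scala. The helper name `recalibration_steps` and the file names are hypothetical, and the flags (`-T`, `-R`, `-I`, `-L`, `-knownSites`, `-BQSR`, `-o`) are assumed from GATK 3 command-line conventions, so they may differ in other GATK versions.

```python
from typing import List, Optional, Tuple


def recalibration_steps(
    reference: str,
    bam: str,
    dbsnp: str,
    report: str,
    alignment: str,
    mask: Optional[str] = None,
    intervals: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return GATK argument lists for the two recalibration steps (hypothetical helper)."""
    # Arguments shared by both steps: reference, input BAM, optional intervals file.
    common = ["-R", reference, "-I", bam]
    if intervals:
        common += ["-L", intervals]

    # Step 1: BaseRecalibrator writes the recalibration report.
    step1 = ["-T", "BaseRecalibrator"] + common + ["-knownSites", dbsnp]
    if mask:
        # 'mask' is passed the same way as 'dbsnp'.
        step1 += ["-knownSites", mask]
    step1 += ["-o", report]

    # Step 2: PrintReads applies the report and writes the recalibrated BAM.
    step2 = ["-T", "PrintReads"] + common + ["-BQSR", report, "-o", alignment]
    return step1, step2
```

Each returned list would be appended to a `java -jar GenomeAnalysisTK.jar` invocation. The report written by the first step is the only link between the two steps.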
Name | Type | Default | Description |
---|---|---|---|
exclude | string | "" | Genomic region whose reads are excluded (skipped), given in the same format as 'region'. |
gatk | string | "" | Path to the GATK directory containing the 'GenomeAnalysisTK.jar' and 'AnalyzeCovariates.jar' files. If an empty string is given (default), the GATK_HOME environment variable is assumed to point to the GATK directory containing GenomeAnalysisTK.jar. |
memory | string | "4g" | The amount of Java heap memory allocated to each GATK and Picard thread, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5g), etc. |
optionsRecal | string | "" | Custom parameters for GATK PrintReads (second step), given in their native format, e.g. "-dcov 40". |
optionsTables | string | "" | Custom parameters for GATK BaseRecalibrator (first step), given in their native format, e.g. "--no_standard_covs". |
plot | boolean | true | When true, plots of the calibration results are created in a PDF file. |
region | string | "" | Genomic region from which to select reads, e.g. "chr1" or "chr2:1-20000". The region can also be a comma-separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A more extensive list of regions can be defined as a file using the 'intervals' input. |
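The parameters above can be read as pieces of a GATK command line. The following Python sketch shows one plausible mapping and is not taken from the component's source. The helper names `java_prefix`, `region_args` and `extra_options` are hypothetical. The use of `-Xmx` for 'memory', `-L` for 'region' and `-XL` for 'exclude' is an assumption based on standard Java and GATK 3 options, and the memory check accepts only the lowercase "g" and "m" suffixes shown in the table.

```python
import os
import re
import shlex
from typing import List, Mapping, Optional


def java_prefix(gatk: str = "", memory: str = "4g",
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the 'java ... -jar GenomeAnalysisTK.jar' prefix (hypothetical helper)."""
    env = os.environ if env is None else env
    # An empty 'gatk' falls back to the GATK_HOME environment variable.
    gatk_dir = gatk or env.get("GATK_HOME", "")
    if not gatk_dir:
        raise ValueError("set 'gatk' or the GATK_HOME environment variable")
    # Assumption: only forms such as "4g" or "2560m" are accepted.
    if not re.fullmatch(r"\d+[gm]", memory):
        raise ValueError("memory must look like '4g' or '2560m'")
    jar = os.path.join(gatk_dir, "GenomeAnalysisTK.jar")
    return ["java", "-Xmx" + memory, "-jar", jar]


def region_args(region: str = "", exclude: str = "") -> List[str]:
    """Turn comma-separated 'region' and 'exclude' strings into -L and -XL arguments."""
    args: List[str] = []
    for flag, value in (("-L", region), ("-XL", exclude)):
        for part in value.split(","):
            part = part.strip()
            if part:  # skip empty entries, including the default ""
                args += [flag, part]
    return args


def extra_options(options: str) -> List[str]:
    """Split a native-format option string such as '-dcov 40' into arguments."""
    return shlex.split(options)
```

For example, `region_args("chr1,chr2:1-20000")` yields one `-L` argument per region, and `extra_options("-dcov 40")` yields the two tokens `-dcov` and `40`.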
Test case | Parameters | IN reference | IN bam | IN dbsnp | IN mask | IN intervals | OUT alignment | OUT report | OUT plots |
---|---|---|---|---|---|---|---|---|---|
case1 | properties | reference | bam | dbsnp | (missing) | (missing) | (missing) | (missing) | (missing) |
# Regional calibration, |