
BaseRecalibrator

This function performs base quality score recalibration (BQSR) using the Genome Analysis Toolkit (GATK). Recalibration is performed as part of alignment in RNA-seq and in DNA whole-genome (WGS) or targeted (exome) sequencing analyses, following the GATK best practices. It requires a dbSNP file containing known sites of variation (see 'dbsnp').

Complete documentation:

Version 1.0
Bundle sequencing
Categories Alignment
Authors Rony Lindell (rony.lindell@helsinki.fi)
Issue tracker View/Report issues
Source files component.xml function.scala
Usage Example with default values

Inputs

Name Type Mandatory Description
reference FASTA Mandatory The reference fasta file.
bam BAM Mandatory Input BAM file. The file can be a single-sample or a merged multi-sample alignment file.
dbsnp VCF Mandatory File with known SNPs to mask, usually from the latest dbSNP distribution. GATK uses the file to improve base quality calibration.
mask VCF Optional File with known sites to mask. It is used in exactly the same way as 'dbsnp'.
intervals BED Optional Select only reads in the genomic regions specified in this file. File types other than BED can be forced. See the GATK wiki for the allowed formats.

Outputs

Name Type Description
alignment BAM The final recalibrated alignment BAM file.
report GatkReport The report file created in the first step and used in the second step of recalibration.
plots PDF Plots generated into a PDF file for quality checking.

Parameters

Name Type Default Description
exclude string "" Genomic region from which to exclude (i.e., skip) reads, in the same format as 'region'.
gatk string "" Path to the GATK directory containing the 'GenomeAnalysisTK.jar' and 'AnalyzeCovariates.jar' files. If an empty string is given (default), the GATK_HOME environment variable is assumed to point to the GATK directory where GenomeAnalysisTK.jar is located.
memory string "4g" The amount of Java heap memory allocated to each GATK and Picard thread, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5g), etc.
optionsRecal string "" Custom GATK PrintReads (second step) parameters can be set in their native format. E.g. "-dcov 40".
optionsTables string "" Custom GATK BaseRecalibrator (first step) parameters can be set in their native format. E.g. "--no_standard_covs".
plot boolean true Plots of the calibration results are created in a pdf file when true.
region string "" Genomic region from which to select reads, e.g. "chr1" or "chr2:1-20000". The region can also be a comma-separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A more extensive list of regions can be defined in a file using the 'intervals' input.
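
The 'memory' and 'region' formats described above can be illustrated with a small Python sketch. It is not taken from the component's source (function.scala is not shown here); the helper names parse_memory_mb and split_regions are hypothetical, and the sketch assumes that only the "g" and "m" suffixes shown in the table need to be handled.

```python
import re


def parse_memory_mb(memory):
    """Convert a heap size such as "4g" or "2560m" to megabytes.

    Assumption: only the 'g' and 'm' suffixes shown in the parameter
    table are accepted; anything else raises ValueError.
    """
    match = re.fullmatch(r"(\d+)([gGmM])", memory.strip())
    if match is None:
        raise ValueError("unsupported memory format: %r" % memory)
    amount = int(match.group(1))
    unit = match.group(2).lower()
    return amount * 1024 if unit == "g" else amount


def split_regions(region):
    """Split a comma-separated 'region' value into individual intervals.

    An empty string (the default) yields an empty list.
    """
    return [part.strip() for part in region.split(",") if part.strip()]
```

For example, split_regions("chr1,chr2:1-20000,chrX:1-4000") gives three separate intervals, and parse_memory_mb("2560m") gives the same 2560 megabytes that the table describes as 2.5g.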

Test cases

Test case Parameters IN reference IN bam IN dbsnp IN mask IN intervals OUT alignment OUT report OUT plots
case1 properties reference bam dbsnp (missing) (missing) (missing) (missing) (missing)

# Regional calibration
memory=2g
region=chr1:10000-60000
exclude=chr1:30000-40000
optionsTables=--insertions_default_quality 20 -l DEBUG
optionsRecal=-dcov 20
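
To show how the case1 parameters might translate into the two recalibration steps, here is a hedged Python sketch that assembles GATK 3 style command lines. It is an illustration only, not the component's actual implementation: the function name bqsr_commands, the file names, and the choice to pass the region and exclude intervals to both steps are assumptions. The flags themselves (-T, -R, -I, -L, -XL, -knownSites, -BQSR, -o) follow GATK 3 conventions.

```python
import shlex


def bqsr_commands(reference, bam, dbsnp,
                  gatk_jar="GenomeAnalysisTK.jar", memory="4g",
                  region="", exclude="",
                  options_tables="", options_recal="",
                  report="recal.grp", alignment="recalibrated.bam"):
    """Build argument lists for the two recalibration steps.

    Hypothetical helper: output file names and the handling of
    region/exclude are assumptions, not the component's behaviour.
    """
    base = ["java", "-Xmx" + memory, "-jar", gatk_jar]
    shared = ["-R", reference, "-I", bam]
    # Each selected interval becomes -L, each excluded interval -XL.
    for interval in filter(None, region.split(",")):
        shared += ["-L", interval]
    for interval in filter(None, exclude.split(",")):
        shared += ["-XL", interval]
    # Step 1: BaseRecalibrator writes the recalibration report.
    step1 = (base + ["-T", "BaseRecalibrator"] + shared
             + ["-knownSites", dbsnp, "-o", report]
             + shlex.split(options_tables))
    # Step 2: PrintReads applies the report and writes the final BAM.
    step2 = (base + ["-T", "PrintReads"] + shared
             + ["-BQSR", report, "-o", alignment]
             + shlex.split(options_recal))
    return step1, step2


if __name__ == "__main__":
    first, second = bqsr_commands(
        "reference.fa", "input.bam", "dbsnp.vcf",
        memory="2g",
        region="chr1:10000-60000",
        exclude="chr1:30000-40000",
        options_tables="--insertions_default_quality 20 -l DEBUG",
        options_recal="-dcov 20",
    )
    print(" ".join(first))
    print(" ".join(second))
```

The custom option strings are split with shlex so that they are appended in their native GATK format, as the 'optionsTables' and 'optionsRecal' parameters describe.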


Generated 2019-02-08 07:42:21 by Anduril 2.0.0