Realigner

This component performs local realignment around indels using the Genome Analysis Toolkit (GATK).

Version: 1.0
Bundle: sequencing
Categories: Alignment
Authors: Amjad Alkodsi (amjad.alkodsi)
Source files: component.xml, Realigner.sh

Inputs

Name Type Mandatory Description
reference FASTA Mandatory The reference FASTA file.
bam1 BAM Mandatory Input BAM file for the case sample. The file can be a single-sample or a merged multi-sample alignment file.
bam2 BAM Optional Input BAM file for the control sample. If provided, matched realignment is performed on both the case and the control, and two separate outputs are produced.
knownTargets IntervalList Optional Pre-calculated realignment targets. If provided, new target calculation is skipped. This allows "static" targets derived from known indels to be reused for every sample when sample-wise targets are not wanted or needed.
indels1 VCF Optional File with known indels, usually from the 1000 Genomes Project or a similar large-scale project. GATK uses it during local realignment around indels to improve the result and speed up the process.
indels2 VCF Optional File with known indels, used in the same manner as 'indels1'.
indels3 VCF Optional File with known indels, used in the same manner as 'indels1'. To use more than three indel files, merge them before passing them in.
intervals BED Optional Select only reads in the genomic regions specified in this file. File types other than BED can be forced; see the GATK wiki for the allowed formats.
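The optional inputs mostly decide which arguments are passed and whether target creation runs at all. The following is a hedged sketch of that decision logic, again assuming GATK 3.x; `plan_inputs`, `RealignPlan` and the output suffix are hypothetical. For the matched case/control mode it assumes both BAMs are given to the tools with `-I` and that IndelRealigner's `-nWayOut` option is used to get one output BAM per input. Whether Realigner.sh does exactly this is not stated in this documentation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class RealignPlan:
    create_targets: bool      # False when 'knownTargets' is supplied
    input_args: List[str]     # one "-I" per BAM (case first, then control)
    known_args: List[str]     # one "-known" per supplied indel VCF
    interval_args: List[str]  # "-L <file>" when an 'intervals' file is given
    nway_args: List[str]      # per-input outputs for matched case/control runs


def plan_inputs(bam1: str, bam2: Optional[str] = None,
                known_targets: Optional[str] = None,
                indels: Sequence[Optional[str]] = (),
                intervals: Optional[str] = None) -> RealignPlan:
    bams = [bam1] + ([bam2] if bam2 else [])
    return RealignPlan(
        create_targets=known_targets is None,
        input_args=[a for b in bams for a in ("-I", b)],
        # indels1..indels3 are optional; unset ones (None) are skipped.
        known_args=[a for v in indels if v for a in ("-known", v)],
        interval_args=["-L", intervals] if intervals else [],
        # Hypothetical suffix; only used when a control BAM is present.
        nway_args=["-nWayOut", ".realigned.bam"] if bam2 else [],
    )


if __name__ == "__main__":
    plan = plan_inputs("case.bam", "control.bam",
                       indels=["indels1.vcf", "indels2.vcf", None])
    print(plan)
```

Returning a plain data object keeps the decisions testable without running GATK.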

Outputs

Name Type Description
realignedCase BAM Realigned case
realignedControl BAM Realigned control. Empty file if control is not provided.
targets IntervalList The targets used in the realignment. This will be a list of new targets unless 'knownTargets' was used.
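Because 'realignedControl' is always produced, a downstream step needs to tell a real BAM apart from the placeholder. A minimal sketch, assuming "empty" means a zero-byte file (the table above does not say more); both function names are hypothetical.

```python
import os
import tempfile


def write_control_placeholder(path: str) -> None:
    """Create (or truncate to) the zero-byte file used when no control BAM is given."""
    with open(path, "wb"):
        pass


def has_realigned_control(path: str) -> bool:
    """True when 'realignedControl' exists and holds data rather than the placeholder."""
    return os.path.exists(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "realignedControl.bam")
        write_control_placeholder(out)
        print(has_realigned_control(out))  # False: placeholder only
```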

Parameters

Name Type Default Description
exclude string "" Genomic region(s) from which to exclude (skip) reads, in the same format as 'region'.
gatk string "/opt/share/gatk/" Path to the GATK directory containing the 'GenomeAnalysisTK.jar' and 'AnalyzeCovariates.jar' files. If an empty string is given, the GATK_HOME environment variable is assumed to point to the GATK directory where GenomeAnalysisTK.jar is located.
memory string "4g" The amount of Java heap memory allocated to each GATK and Picard thread, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5g).
optionsRealigner string "" Custom GATK parameters for the realigner can be set in their native format. E.g. "-LOD 1.0 -noTags".
optionsTargets string "" Custom GATK parameters for the targets creator can be set in their native format. E.g. "-window 5 -minReads 2".
region string "" Genomic region from which to select reads, e.g. "chr1" or "chr2:1-20000". The region can also be a comma-separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A longer list of regions can be given as a file through the 'intervals' input.
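All of these parameters are plain strings. The sketch below shows one way to validate 'memory' and to turn 'region', 'exclude' and the two options strings into argument lists. It assumes GATK 3.x interval flags (`-L` to include, `-XL` to exclude, each repeatable) and that the memory value is handed to Java as `-Xmx<memory>`; the helper names are hypothetical, and this is not necessarily how Realigner.sh handles them.

```python
import re
import shlex
from typing import List

# Java heap sizes such as "4g" or "2560m": digits plus an optional k/m/g unit.
_HEAP = re.compile(r"(\d+)([kmg]?)", re.IGNORECASE)
_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def heap_megabytes(memory: str) -> float:
    """Convert a 'memory' value to megabytes, rejecting malformed strings."""
    m = _HEAP.fullmatch(memory.strip())
    if m is None:
        raise ValueError(f"not a Java heap size: {memory!r}")
    return int(m.group(1)) * _TO_MB[m.group(2).lower()]


def region_args(region: str, exclude: str) -> List[str]:
    """Turn comma-separated 'region'/'exclude' values into -L/-XL pairs."""
    args: List[str] = []
    for flag, spec in (("-L", region), ("-XL", exclude)):
        for item in spec.split(","):
            if item.strip():
                args += [flag, item.strip()]
    return args


def custom_args(options: str) -> List[str]:
    """Split 'optionsTargets'/'optionsRealigner' the way a POSIX shell would."""
    return shlex.split(options)


if __name__ == "__main__":
    print(heap_megabytes("2560m"), heap_megabytes("4g"))
    print(region_args("chr1,chr2:1-20000", "chr1:30000-40000"))
    print(custom_args("-window 5 -minReads 2"))
```

Rejecting a value such as "2,5g" early gives a clearer error than letting the JVM fail to start.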

Test cases

Test case | Parameters | IN reference | IN bam1 | IN bam2 | IN knownTargets | IN indels1 | IN indels2 | IN indels3 | IN intervals | OUT realignedCase | OUT realignedControl | OUT targets
case1 | properties | reference | bam1 | (missing) | (missing) | indels1 | indels2 | (missing) | (missing) | (missing) | (missing) | (missing)
case2 | properties | reference | bam1 | bam2 | (missing) | indels1 | indels2 | (missing) | (missing) | (missing) | (missing) | (missing)

case1 properties:

# Regional calibration,
memory=2g,
region=chr1:10000-60000,
exclude=chr1:30000-40000,
optionsTargets=-window 5 -minReads 2,
optionsRealigner=-LOD 1.0 -noTags

case2 properties:

# Regional calibration,
memory=2g,
region=chr1:10000-60000,
exclude=chr1:30000-40000,
optionsTargets=-window 5 -minReads 2,
optionsRealigner=-LOD 1.0 -noTags


Generated 2019-02-08 07:42:12 by Anduril 2.0.0