This function performs local realignment around indels using the Genome Analysis Toolkit (GATK).
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | Alignment |
Authors | Amjad Alkodsi (amjad.alkodsi) |
Source files | component.xml Realigner.sh |
Inputs:
Name | Type | Mandatory | Description |
---|---|---|---|
reference | FASTA | Mandatory | The reference fasta file. |
bam1 | BAM | Mandatory | Input BAM file for the case sample. It can be a single-sample or a merged multi-sample alignment file. |
bam2 | BAM | Optional | Input BAM file for the control sample. If provided, matched realignment is performed on both case and control, and two separate outputs are produced. |
knownTargets | IntervalList | Optional | Pre-calculated realignment targets. If provided, new target calculation is skipped. Use this to apply "static" targets derived from known indels, so that the same targets serve every sample when sample-wise targets are not wanted or needed. |
indels1 | VCF | Optional | File of known indels, usually from the 1000 Genomes Project or a similar large-scale project. GATK uses it during local realignment around indels to improve the result and speed up the process. |
indels2 | VCF | Optional | File of known indels, used in the same manner as 'indels1'. |
indels3 | VCF | Optional | File of known indels, used in the same manner as 'indels1'. To use more than three indel files, merge them before passing them in. |
intervals | BED | Optional | Restricts processing to reads in the genomic regions listed in this file. Formats other than BED can be forced; see the GATK wiki for the allowed formats. |
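In GATK 3, indel realignment is a two-step process: RealignerTargetCreator computes the intervals to realign, then IndelRealigner realigns reads within them. The Python sketch below shows one plausible way the inputs above could map onto those two command lines. It is a hypothetical illustration, not the logic of Realigner.sh: the function name `build_commands` and the output names `targets.intervals`, `realignedCase.bam` and `.realigned.bam` are assumptions. The flags used (`-T`, `-R`, `-I`, `-known`, `-L`, `-o`, `-targetIntervals`, `-nWayOut`) are standard GATK 3 arguments rather than anything this page confirms.

```python
from typing import List, Optional


def build_commands(
    gatk_jar: str,
    memory: str,
    reference: str,
    bam1: str,
    bam2: Optional[str] = None,
    known_targets: Optional[str] = None,
    indel_vcfs: Optional[List[str]] = None,
    intervals: Optional[str] = None,
    targets_out: str = "targets.intervals",  # hypothetical output name
) -> List[List[str]]:
    """Sketch: assemble GATK 3 command lines for indel realignment."""
    base = ["java", f"-Xmx{memory}", "-jar", gatk_jar]
    # Case and (optional) control BAMs are passed together so both see the same targets.
    inputs = ["-I", bam1] + (["-I", bam2] if bam2 else [])
    # Each known-indel VCF (indels1..indels3) becomes its own -known argument.
    known: List[str] = []
    for vcf in indel_vcfs or []:
        known += ["-known", vcf]
    # The 'intervals' input restricts processing to the listed regions.
    region = ["-L", intervals] if intervals else []

    commands: List[List[str]] = []
    if known_targets is None:
        # Step 1: compute new realignment targets from the reads and known indels.
        commands.append(base + ["-T", "RealignerTargetCreator", "-R", reference]
                        + inputs + known + region + ["-o", targets_out])
        target_file = targets_out
    else:
        # Pre-calculated targets were given, so target creation is skipped.
        target_file = known_targets

    # Step 2: realign reads around the chosen targets.
    realign = (base + ["-T", "IndelRealigner", "-R", reference]
               + inputs + known + region + ["-targetIntervals", target_file])
    if bam2:
        # Matched case/control realignment: one output BAM per input BAM.
        realign += ["-nWayOut", ".realigned.bam"]
    else:
        realign += ["-o", "realignedCase.bam"]  # hypothetical output name
    commands.append(realign)
    return commands


for cmd in build_commands("/opt/share/gatk/GenomeAnalysisTK.jar", "4g", "ref.fa",
                          "case.bam", indel_vcfs=["indels1.vcf", "indels2.vcf"]):
    print(" ".join(cmd))
```

Passing case and control in a single IndelRealigner call is one way to realize the matched realignment described for `bam2`. With `knownTargets` set, only the second command is produced, mirroring the "skip new target calculation" behavior.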
Outputs:
Name | Type | Description |
---|---|---|
realignedCase | BAM | Realigned case |
realignedControl | BAM | Realigned control. Empty file if control is not provided. |
targets | IntervalList | The targets used in the realignment: newly calculated targets, or the 'knownTargets' input if it was provided. |
Parameters:
Name | Type | Default | Description |
---|---|---|---|
exclude | string | "" | Genomic region(s) whose reads are excluded (skipped), given in the same format as 'region'. |
gatk | string | "/opt/share/gatk/" | Path to the GATK directory containing the 'GenomeAnalysisTK.jar' and 'AnalyzeCovariates.jar' files. If an empty string is given, the GATK_HOME environment variable is assumed to point to the GATK directory containing GenomeAnalysisTK.jar. |
memory | string | "4g" | The amount of Java heap memory allocated to each GATK and Picard thread, given in the format "4g" for 4 gigabytes or "2560m" for 2560 megabytes (2.5g), etc. |
optionsRealigner | string | "" | Custom GATK parameters for the realigner, given in their native format, e.g. "-LOD 1.0 -noTags". |
optionsTargets | string | "" | Custom GATK parameters for the target creator, given in their native format, e.g. "-window 5 -minReads 2". |
region | string | "" | Genomic region from which to select reads, e.g. "chr1" or "chr2:1-20000". The region can also be a comma separated list, e.g. "chr1,chr2:1-20000,chrX:1-4000". A more extensive list of regions can be defined as a file using the 'intervals' input. |
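The parameter formats above can be illustrated with a few small helpers. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and mapping 'region' and 'exclude' onto GATK 3's `-L` and `-XL` engine arguments is an assumption about how such strings are typically passed, not a description of Realigner.sh. The memory helper also shows the arithmetic behind the "2560m (2.5g)" example: JVM heap units are binary, so 2.5 × 1024 = 2560, and the JVM expects a whole number, which is why 2.5 gigabytes is written as "2560m".

```python
import os
import re
import shlex
from typing import List


def resolve_gatk_jar(gatk: str = "/opt/share/gatk/") -> str:
    """Hypothetical helper: an empty 'gatk' falls back to the GATK_HOME variable."""
    directory = gatk or os.environ.get("GATK_HOME", "")
    if not directory:
        raise ValueError("'gatk' is empty and GATK_HOME is not set")
    return os.path.join(directory, "GenomeAnalysisTK.jar")


def heap_megabytes(memory: str) -> int:
    """Convert a heap size such as '4g' or '2560m' to megabytes (whole numbers only)."""
    match = re.fullmatch(r"(\d+)([gGmM])", memory)
    if match is None:
        raise ValueError(f"unsupported memory format: {memory!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    return amount * 1024 if unit == "g" else amount


def region_args(region: str, exclude: str) -> List[str]:
    """Assumed mapping: comma-separated 'region' -> -L, 'exclude' -> -XL."""
    args: List[str] = []
    for flag, spec in (("-L", region), ("-XL", exclude)):
        for item in spec.split(","):
            if item.strip():  # empty strings (the defaults) add nothing
                args += [flag, item.strip()]
    return args


def extra_options(options: str) -> List[str]:
    """Split native GATK options such as '-window 5 -minReads 2' into tokens."""
    return shlex.split(options)


print(heap_megabytes("2560m"), region_args("chr1,chr2:1-20000", "chrX:1-4000"))
```

Using `shlex.split` for 'optionsRealigner' and 'optionsTargets' keeps quoted values intact, which a plain whitespace split would not.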
Test cases:
Test case | Parameters | IN reference | IN bam1 | IN bam2 | IN knownTargets | IN indels1 | IN indels2 | IN indels3 | IN intervals | OUT realignedCase | OUT realignedControl | OUT targets |
---|---|---|---|---|---|---|---|---|---|---|---|---|
case1 | properties (`# Regional calibration,`) | reference | bam1 | (missing) | (missing) | indels1 | indels2 | (missing) | (missing) | (missing) | (missing) | (missing) |
case2 | properties (`# Regional calibration,`) | reference | bam1 | bam2 | (missing) | indels1 | indels2 | (missing) | (missing) | (missing) | (missing) | (missing) |