The function collects alignment and coverage statistics from a BAM file. It uses Picard CollectMultipleMetrics, bedtools genomecov/coverage, and in-house plotting and reporting scripts to produce the output. The function accepts sorted BAM files produced by either whole-genome or targeted (exome) sequencing experiments.
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | Alignment |
Authors | Amjad Alkodsi (amjad.alkodsi@helsinki.fi) |
Requires | picard-tools ; bedtools ; ggplot2 (R-package) ; hwriter (R-package) |
Source files | component.xml function.scala |
Name | Type | Mandatory | Description |
---|---|---|---|
bam | BAM | Mandatory | Input BAM file; it should be sorted. |
refGenome | FASTA | Mandatory | The reference genome used to produce the alignment. |
targets | BED | Optional | Targets file, used if the experiment is targeted. |
chrLength | CSV | Optional | Chromosome lengths formatted as a headerless CSV with three columns: chr,start,end. Required only if the experiment is not targeted. Only chromosomes specified by this file will be analyzed. |
markStats | BinaryFile | Optional | The statistics file reported by Picard MarkDuplicates (Anduril function DuplicateMarker). If specified, the statistics will be included in the report. |
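
As an illustration of the chrLength format described above, the following minimal sketch derives the three-column rows from a FASTA index (.fai) file, whose first two tab-separated fields are the sequence name and its length. The function name `fai_to_chr_length`, the `keep` filter, the tab delimiter, and the choice of 0 as the start value are all assumptions; the documentation does not state the delimiter or the coordinate convention, so check them against a known-good chrLength file.

```python
import csv
import io


def fai_to_chr_length(fai_lines, keep=None, delimiter="\t"):
    """Convert FASTA index (.fai) lines into headerless chr,start,end rows.

    fai_lines: iterable of lines from a .fai file (name<TAB>length<TAB>...).
    keep:      optional set of chromosome names to retain (hypothetical helper
               argument; only listed chromosomes are analyzed by the component).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if keep is not None and name not in keep:
            continue
        # Assumption: start is 0 and end is the chromosome length.
        writer.writerow([name, 0, length])
    return out.getvalue()


# Usage with two made-up index lines, keeping only chr1.
example_fai = ["chr1\t1000\t6\t60\t61\n", "chrM\t200\t1030\t60\t61\n"]
print(fai_to_chr_length(example_fai, keep={"chr1"}), end="")
```

Restricting the rows with `keep` mirrors the note that only chromosomes listed in chrLength are analyzed.
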
Name | Type | Description |
---|---|---|
report | BinaryFolder | Binary folder containing index.html and plotted images. |
summary | CSV | Two-column CSV file with the metric names in the first column and the measured statistics in the second column, which is named after the sampleName parameter. This file can easily be combined with the summaries of other samples when iterating over a large number of samples. |
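
Because each summary has the metric names in the first column and one sample-named value column, summaries from several samples can be joined into a single wide table. The sketch below shows one way to do that with the standard library only. The function name `combine_summaries`, the tab delimiter, the `"NA"` placeholder for missing metrics, and the metric names in the usage example are assumptions, not part of the component.

```python
import csv
import io


def combine_summaries(sources, delimiter="\t", missing="NA"):
    """Merge two-column summaries (metric, value) into one wide table.

    sources: iterable of open text files (or any iterables of lines), each
             with a header row whose second field is the sample name.
    Returns a list of rows; the first row is the combined header.
    """
    samples = []
    metrics = []  # preserves the order in which metrics are first seen
    values = {}   # (metric, sample) -> value
    for source in sources:
        reader = csv.reader(source, delimiter=delimiter)
        header = next(reader)
        sample = header[1]
        samples.append(sample)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or incomplete lines
            metric, value = row[0], row[1]
            if metric not in metrics:
                metrics.append(metric)
            values[(metric, sample)] = value
    table = [["metric"] + samples]
    for metric in metrics:
        table.append([metric] + [values.get((metric, s), missing) for s in samples])
    return table


# Usage with two in-memory summaries; the metric names are made up.
s1 = io.StringIO("metric\tS1\nreads\t100\nmean_cov\t30\n")
s2 = io.StringIO("metric\tS2\nreads\t200\n")
for row in combine_summaries([s1, s2]):
    print("\t".join(row))
```

Metrics absent from a sample are filled with the placeholder rather than dropped, so every sample keeps its own column.
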
Name | Type | Default | Description |
---|---|---|---|
bedtools | string | "" | Path to the bedtools binary directory. If an empty string is given (default), the BEDTOOLS_HOME environment variable is assumed to point to the bedtools directory. |
maxCoverage | int | 150 | Coverage higher than this value will be truncated from the histogram. A negative or zero value suppresses truncation. |
memory | string | "4g" | Memory setting passed to Picard, e.g. "4g" or "8g". |
paired | boolean | true | Whether the BAM file contains paired-end (true) or single-end (false) reads. |
picard | string | "../../lib/picard" | Path to the Picard directory, e.g. "/mnt/csc-gc5/opt/picard-tools-1.113", which contains the Picard-tools .jar files. If an empty string is given, the PICARD_HOME environment variable is assumed to point to the Picard directory. Note that some older versions of Picard have bugs in the CollectMultipleMetrics module. |
sampleName | string | "Sample" | Sample name or key to be used in the report and the output summary. |
stopAfter | int | 0 | Number of reads that Picard will use to report the statistics. The default value "0" uses all reads in the input file. |
targeted | boolean | false | Whether the sequencing experiment is targeted or not. If true, the targets input should be specified, and if false, the chrLength input should be specified. |
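
Two rules in the table above lend themselves to a small pre-flight check: an empty tool path falls back to an environment variable (BEDTOOLS_HOME or PICARD_HOME), and the `targeted` flag decides whether `targets` or `chrLength` must be supplied. The following is a minimal sketch of those rules, not the component's actual implementation; the function names `resolve_tool_dir`, `required_input`, and `check_inputs` are hypothetical.

```python
import os


def resolve_tool_dir(param_value, env_var):
    """Return the explicit path if given, otherwise fall back to env_var."""
    if param_value:
        return param_value
    fallback = os.environ.get(env_var, "")
    if not fallback:
        raise ValueError(f"parameter is empty and {env_var} is not set")
    return fallback


def required_input(targeted):
    """Name of the optional input that must be supplied for this mode."""
    return "targets" if targeted else "chrLength"


def check_inputs(targeted, provided):
    """Raise ValueError if the input required by 'targeted' is missing.

    provided: set of input names that were actually connected.
    """
    needed = required_input(targeted)
    if needed not in provided:
        raise ValueError(f"targeted={targeted} requires the '{needed}' input")
    return needed


# Usage: a whole-genome run must provide chrLength.
print(check_inputs(False, {"bam", "refGenome", "chrLength"}))
```

Running such a check before launching a long workflow surfaces a missing `targets` or `chrLength` input immediately instead of partway through the analysis.
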
Test case | Parameters | IN bam | IN refGenome | IN targets | IN chrLength | IN markStats | OUT report | OUT summary |
---|---|---|---|---|---|---|---|---|
case1 | maxCoverage=100 | bam | refGenome | (missing) | chrLength | markStats | (missing) | (missing) |
case2 | targeted=true | bam | refGenome | targets | (missing) | (missing) | (missing) | (missing) |