
BSAlign

Aligns bisulfite sequencing (BS) or reduced representation bisulfite sequencing (RRBS) data using the BSMAP software, version 2.74. The component works only with base-space data and does not perform methylation calling; use the MethylCall component for that.

Version: 1.0
Bundle: sequencing
Categories: Alignment, DNA Methylation
Authors: Chiara Facciotto (chiara.facciotto@helsinki.fi)
Requires: download (bash); bsmap; samtools
Source files: component.xml, BSAlign.sh
Deprecated: BismarkAlign is a better aligner and should be preferred over this component.

Inputs

Name Type Mandatory Description
reference FASTA Mandatory The reference genome file in FASTA format. Gzipped FASTA is also supported.
reads BinaryFile Mandatory Reads in FASTA, FASTQ or BAM format. Gzipped FASTA/FASTQ is also supported.

Use the inputType parameter to define the format.

For paired-end alignment from a single BAM input, assign the same BAM file to both 'reads' and 'mates'.
mates BinaryFile Optional Mate reads in FASTA, FASTQ or BAM format. Gzipped FASTA/FASTQ is also supported.

If the format is BAM, assign the same BAM file to both 'reads' and 'mates'.
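The pairing rules above can be checked before the aligner is launched. The following is a minimal sketch, not part of the component: `infer_format` and `check_read_inputs` are hypothetical helpers, and guessing the format from the file extension is an assumption (the component itself relies on the inputType parameter).

```python
from typing import Optional

# Hypothetical extension lists; the component uses inputType instead of guessing.
FASTA_EXTS = (".fa", ".fasta", ".fna")
FASTQ_EXTS = (".fq", ".fastq")


def infer_format(path: str) -> str:
    """Guess 'fasta', 'fastq' or 'bam' from a file name (hypothetical helper)."""
    name = path.lower()
    gzipped = name.endswith(".gz")
    if gzipped:
        name = name[: -len(".gz")]
    if name.endswith(".bam"):
        if gzipped:
            # Only gzipped FASTA/FASTQ is documented as supported.
            raise ValueError(f"gzipped BAM is not a documented input: {path}")
        return "bam"
    if name.endswith(FASTA_EXTS):
        return "fasta"
    if name.endswith(FASTQ_EXTS):
        return "fastq"
    raise ValueError(f"cannot infer read format from: {path}")


def check_read_inputs(reads: str, mates: Optional[str] = None) -> str:
    """Return 'single' or 'paired', enforcing the BAM rule from the inputs table."""
    reads_format = infer_format(reads)  # also validates the reads file name
    if mates is None:
        return "single"
    mates_format = infer_format(mates)
    # Documented rule: with BAM input, both ports receive the same file.
    if "bam" in (reads_format, mates_format) and mates != reads:
        raise ValueError("for BAM input, assign the same file to 'reads' and 'mates'")
    return "paired"
```

For example, `check_read_inputs("pairs.bam", "pairs.bam")` is accepted as paired input, while two different BAM files are rejected.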

Outputs

Name Type Description
alignedReads BAM Aligned reads in compressed BAM format.

Parameters

Name Type Default Description
gapSize int 0 Gap size. BSMAP allows only one continuous gap (insertion or deletion) of up to 3 nucleotides. Gaps are not allowed within 6 nt of the read ends.

For a gapped alignment, the number of mismatches is calculated as gap_size + mismatches + 1.
mismatches float 0.08 Number of allowed mismatches. If the value is between 0 and 1, it is interpreted as a mismatch rate relative to the read length; otherwise it is interpreted as the maximum number of mismatches allowed per read. Max=15.

Example: mismatches = 5 (at most 5 mismatches per read); mismatches = 0.1 (at most read_length * 10% mismatches per read).
optionsAlignment string "" Other alignment options, passed verbatim to the aligner command line. Example: "-H -s 10" suppresses the header information in the SAM output and sets the seed size to 10 (note: the '-s' option is valid only in WGBS mode).
restrictionSite string "" Restriction site recognized by the restriction enzyme used in the experimental procedure. Setting it defines the enzyme digestion site and activates reduced representation bisulfite mapping mode (RRBS mode).

Possible restriction sites are "C-CGG" (recognized by MspI, the most commonly used restriction enzyme) and "T-CGA" (recognized by TaqI), where "-" marks the position at which the fragment is cut.

Note: To analyze whole genome bisulfite sequencing data (WGBS mode), leave restrictionSite = "" (the default).
threads int 1 The number of processors to use. Set to -1 to use BSMAP's default, which is the number of detected CPU cores (up to 8 threads).

Note: Parallel performance scales well up to 12 threads; beyond 12 threads there is no significant speed gain.

Test cases

Test case Parameters IN reference IN reads IN mates OUT alignedReads
case1_reads (missing) reference reads (missing) alignedReads
case2_reads_and_mate (missing) reference reads mates alignedReads
case3_restriction_site properties reference reads mates alignedReads

# Testing BSAlign component,
restrictionSite=C-CGG

case4_mismatches_gapSize properties reference reads mates alignedReads

# Testing BSAlign component,
restrictionSite=C-CGG,
mismatches=3.0,
gapSize=1

case5_optionsAlignment properties reference reads mates alignedReads

# Testing BSAlign component,
optionsAlignment=-s 10
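The test cases exercise the mapping from component parameters to BSMAP options. The sketch below shows one plausible mapping; `build_bsmap_args` is hypothetical, and the actual BSAlign.sh may build its command differently (for example, by writing SAM and converting it with samtools). The flags used (-d reference, -a/-b query files, -o output, -v mismatches, -g gap size, -D restriction site, -p processors) are BSMAP options, but verify them against the usage message of your installed bsmap.

```python
import shlex
from typing import List, Optional


def build_bsmap_args(reference: str, reads: str, output: str,
                     mates: Optional[str] = None, *,
                     mismatches: float = 0.08, gap_size: int = 0,
                     restriction_site: str = "", threads: int = 1,
                     options_alignment: str = "") -> List[str]:
    """Hypothetical mapping from BSAlign parameters to a bsmap argument list."""
    args = ["bsmap", "-d", reference, "-a", reads]
    if mates is not None:
        args += ["-b", mates]  # second query file for paired-end data
    args += ["-o", output,
             "-v", format(mismatches, "g"),  # 3.0 -> "3", 0.08 -> "0.08"
             "-g", str(gap_size)]
    if restriction_site:
        args += ["-D", restriction_site]  # a non-empty site switches BSMAP to RRBS mode
    if threads != -1:
        args += ["-p", str(threads)]  # -1: omit -p and let BSMAP pick its default
    # optionsAlignment is passed through verbatim, tokenized as a shell would.
    args += shlex.split(options_alignment)
    return args


if __name__ == "__main__":
    # Parameters of test case case4_mismatches_gapSize; file names are placeholders.
    case4 = build_bsmap_args("reference.fa", "reads.fq", "alignedReads.bam", "mates.fq",
                             mismatches=3.0, gap_size=1, restriction_site="C-CGG")
    print(shlex.join(case4))
```

Returning an argument list rather than a single string keeps file names with spaces intact when the list is handed to a process launcher; `shlex.join` is used only to display it.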


Generated 2019-02-08 07:42:12 by Anduril 2.0.0