

SISSRs is a software application for precise identification of genome-wide transcription factor binding sites from ChIP-Seq data. It is essentially a Perl implementation of the SISSRs algorithm outlined in Jothi et al. [1], with several new features that were not fully described in the original paper.

[1] Jothi R, Cuddapah S, Barski A, Cui K, Zhao K. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Research (2008) 36(16):5221-31.

The component installation folder contains a PDF version of the original manual.

Version 1.0
Bundle sequencing
Authors Lauri Lyly (lauri.lyly@helsinki.fi)
Issue tracker View/Report issues
Source files component.xml sissr.sh
Usage Example with default values


Inputs

Name Type Mandatory Description
treatment BED Mandatory The aligned ChIP-Seq reads for the sample in BED format.
control BED Optional The aligned ChIP-Seq reads for the control in BED format. [-b] background file containing tags in BED format to be used as control; -e and -p can be set to desired values to control sensitivity and specificity, respectively.
exclude TextFile Optional [-q] file containing genomic regions to exclude; reads mapped to these regions will be ignored; file format: 'chr startCoord endCoord'
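The exclude input expects one region per line in the form 'chr startCoord endCoord'. The following Python sketch shows one way to check such a file before running the component. It assumes whitespace-separated fields, integer coordinates, and start <= end; the table states only the column order, so these checks and the function name parse_exclude_regions are illustrative assumptions, not part of SISSRs.

```python
def parse_exclude_regions(text):
    """Parse 'chr startCoord endCoord' lines into (chrom, start, end) tuples.

    Assumptions (not stated in the component documentation):
    fields are whitespace-separated, coordinates are integers,
    and start must not exceed end.
    """
    regions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are skipped
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(fields)}"
            )
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])  # ValueError if not integers
        if start > end:
            raise ValueError(f"line {line_no}: start {start} > end {end}")
        regions.append((chrom, start, end))
    return regions
```

A file that passes this check contains only three-column lines, which matches the format described for the exclude input.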


Outputs

Name Type Description
output TextFile Output file containing the binding sites reported by SISSRs.


Parameters

Name Type Default Description
background_sites float 10 [-e] e-value (>=0); it is the number of binding sites one might expect to infer by chance (default: 10); this option is irrelevant if -b option is NOT used
fdr float 0.001 [-D] false discovery rate (default: 0.001), used when a random background model based on Poisson probabilities serves as the control, i.e. when no control file is given (see also option -b, the control input)
fraglen int -1 [-F] average length of DNA fragments that were sequenced (default: estimated from reads, i.e. -1)
genome_size float 2700000000 Genome size in base pairs (default: 2700000000).
keep_monostrand_sites boolean false [-u] (also) reports binding sites supported only by reads mapped to either the sense or the anti-sense strand; this option will recover binding sites whose sense or anti-sense reads were not mapped for some reason (e.g., because they fall in unmappable/repetitive regions)
keep_single boolean false [-a] only one read is kept if multiple reads align to the same genomic coordinate (minimizes amplification bias)
mappable_fraction float 0.8 [-m] fraction of genome mappable by reads (default: 0.8 for hg18, assuming ELAND was used to map the reads; could be different for different organisms and other alignment algorithms)
maxlen int 500 [-L] upper-bound on the DNA fragment length (default: 500)
merge_regions boolean false [-c] same as the -r (report_wide) option, except that it reports binding sites that are clustered within F bp of each other (F being the average fragment length, see fraglen) as a single binding region; this is the default reporting mode of SISSRs itself
min_reads int 2 [-E] min number of 'directional' reads required on each side of the inferred binding site (>0); (default: 2)
p_value float 0.001 [-p] p-value threshold for fold enrichment of ChIP tags at a binding site location compared to that at the same location in the control data (default: 0.001); this option is irrelevant if -b option is NOT used
print_progress boolean true [-x] do not print progress report (default: prints report)
report_narrow boolean false [-t] reports each binding site as a single genomic coordinate (transition point t in Fig 1A)
report_wide boolean false [-r] reports each binding site as an X-bp binding region centered on inferred binding coordinate; X denotes the distance from the start of the right-most red bar (see Fig 1A in the manuscript; lower-left) to the end of the left-most blue bar surrounding the actual binding site (transition point t in Fig 1A)
window_size int 20 [-w] scanning window size (even number > 1), which controls for noise (default: 20)
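The parameter table above maps each component parameter to a SISSRs command-line flag and states a few constraints (e-value >= 0, min_reads > 0, window size an even number > 1). The Python sketch below shows how such a mapping could be turned into an argument list. It is not the logic of sissr.sh, which is not shown here. The following points are assumptions: -F is omitted when fraglen is -1, -e and -p are passed only together with -b, -D is passed only without -b, and -x is emitted when print_progress is false. The flags for the treatment file, the output file, and the genome size are not listed in the table and are therefore left out. The function name build_option_args is hypothetical.

```python
# Defaults copied from the parameter table.
DEFAULTS = {
    "background_sites": 10, "fdr": 0.001, "fraglen": -1,
    "keep_monostrand_sites": False, "keep_single": False,
    "mappable_fraction": 0.8, "maxlen": 500, "merge_regions": False,
    "min_reads": 2, "p_value": 0.001, "print_progress": True,
    "report_narrow": False, "report_wide": False, "window_size": 20,
}

# Boolean parameters whose flag is emitted when the value is True.
SWITCHES = [
    ("keep_monostrand_sites", "-u"), ("keep_single", "-a"),
    ("merge_regions", "-c"), ("report_narrow", "-t"), ("report_wide", "-r"),
]


def build_option_args(params=None, control=None, exclude=None):
    """Return a list of SISSRs option flags for the given parameters."""
    p = {**DEFAULTS, **(params or {})}

    # Constraints stated in the table.
    if p["background_sites"] < 0:
        raise ValueError("background_sites (-e) must be >= 0")
    if p["min_reads"] <= 0:
        raise ValueError("min_reads (-E) must be > 0")
    w = p["window_size"]
    if w <= 1 or w % 2 != 0:
        raise ValueError("window_size (-w) must be an even number > 1")

    args = ["-m", str(p["mappable_fraction"]), "-L", str(p["maxlen"]),
            "-E", str(p["min_reads"]), "-w", str(w)]
    if p["fraglen"] != -1:
        # Assumption: -1 means "estimate from reads", so -F is omitted.
        args += ["-F", str(p["fraglen"])]
    if control is not None:
        # -e and -p are documented as irrelevant without -b.
        args += ["-b", control, "-e", str(p["background_sites"]),
                 "-p", str(p["p_value"])]
    else:
        # Assumption: -D applies to the Poisson background model only.
        args += ["-D", str(p["fdr"])]
    if exclude is not None:
        args += ["-q", exclude]
    args += [flag for name, flag in SWITCHES if p[name]]
    if not p["print_progress"]:
        # Assumption: -x ("do not print") corresponds to print_progress=false.
        args.append("-x")
    return args
```

Building the arguments as a list rather than a single string keeps file names with spaces intact if the list is later passed to a process launcher.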

Generated 2019-02-08 07:42:12 by Anduril 2.0.0