SISSRs is a software application for precise identification of genome-wide transcription factor binding sites from ChIP-Seq data. It is essentially a Perl implementation of the SISSRs algorithm outlined in Jothi et al. [1], with several new features that were not fully described in the original paper.

[1] Jothi R, Cuddapah S, Barski A, Cui K, Zhao K. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Research (2008) 36(16):5221-31.

The component installation folder contains a PDF version of the original manual.
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | |
Authors | Lauri Lyly (lauri.lyly@helsinki.fi) |
Issue tracker | View/Report issues |
Source files | component.xml sissr.sh |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
treatment | BED | Mandatory | The aligned ChIP-Seq reads for the sample in BED format. |
control | BED | Optional | The aligned ChIP-Seq reads for the control in BED format. [-b ] background file containing tags in BED format to be used as control; -e and -p can be set to desired values to control sensitivity and specificity, respectively. |
exclude | TextFile | Optional | [-q ] file containing genomic regions to exclude; reads mapped to these regions will be ignored; file format: 'chr startCoord endCoord' |
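
The `exclude` input expects one region per line in the form 'chr startCoord endCoord'. A minimal sketch of producing such a file body follows. The helper name `format_exclude_regions`, the single-space separator, and the example regions are assumptions made for illustration; the table above does not say whether coordinates are 0-based or 1-based, so the sketch only checks that each region is non-empty.

```python
def format_exclude_regions(regions):
    """Return text in the 'chr startCoord endCoord' layout, one region per line.

    `regions` is an iterable of (chromosome, start, end) tuples.
    The single-space separator is an assumption based on the quoted format.
    """
    lines = []
    for chrom, start, end in regions:
        # Reject negative starts and empty or reversed intervals.
        if start < 0 or end <= start:
            raise ValueError(f"invalid region: {chrom} {start} {end}")
        lines.append(f"{chrom} {start} {end}")
    return "\n".join(lines) + "\n"


# Hypothetical regions, for illustration only.
example = format_exclude_regions([("chr1", 1000, 2000), ("chrX", 500, 900)])
print(example, end="")
```

The returned string can be written to a text file and connected to the `exclude` input.
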
Name | Type | Description |
---|---|---|
output | TextFile | Output file |
Name | Type | Default | Description |
---|---|---|---|
background_sites | float | 10 | [-e] e-value (>=0); it is the number of binding sites one might expect to infer by chance (default: 10); this option is irrelevant if -b option is NOT used |
fdr | float | 0.001 | [-D] false discovery rate (default: 0.001) if a random background model based on Poisson probabilities needs to be used as control (see also option -b, the control input) |
fraglen | int | -1 | [-F] average length of DNA fragments that were sequenced (default: estimated from reads, i.e. -1) |
genome_size | float | 2700000000 | Genome size. |
keep_monostrand_sites | boolean | false | [-u] (also) reports binding sites supported only by reads mapped to either the sense or the anti-sense strand; this option will recover binding sites whose sense or anti-sense reads were not mapped for some reason (e.g., they fall in unmappable/repetitive regions) |
keep_single | boolean | false | [-a] only one read is kept if multiple reads align to the same genomic coordinate (minimizes amplification bias) |
mappable_fraction | float | 0.8 | [-m] fraction of genome mappable by reads (default: 0.8 for hg18, assuming ELAND was used to map the reads; could be different for different organisms and other alignment algorithms) |
maxlen | int | 500 | [-L] upper-bound on the DNA fragment length (default: 500) |
merge_regions | boolean | false | [-c] same as the -r (report_region) option, except that it reports binding sites that are clustered within F bp of each other as a single binding region; this is the default option |
min_reads | int | 2 | [-E] min number of 'directional' reads required on each side of the inferred binding site (>0); (default: 2) |
p_value | float | 0.001 | [-p] p-value threshold for fold enrichment of ChIP tags at a binding site location compared to that at the same location in the control data (default: 0.001); this option is irrelevant if -b option is NOT used |
print_progress | boolean | true | [-x] do not print progress report (default: prints report) |
report_narrow | boolean | false | [-t] reports each binding site as a single genomic coordinate (transition point t in Fig 1A) |
report_wide | boolean | false | [-r] reports each binding site as an X-bp binding region centered on inferred binding coordinate; X denotes the distance from the start of the right-most red bar (see Fig 1A in the manuscript; lower-left) to the end of the left-most blue bar surrounding the actual binding site (transition point t in Fig 1A) |
window_size | int | 20 | [-w] scanning window size (even number > 1), which controls for noise (default: 20) |
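
Several parameters above carry constraints (window_size must be an even number > 1, min_reads must be > 0, the e-value must be >= 0). The following is a minimal sketch of checking those constraints and collecting the bracketed flags from the table into an option list. The function name `build_sissrs_options` and the way parameters map to flags are assumptions: the sketch does not reproduce what `sissr.sh` actually does, `genome_size` is left out because the table lists no flag for it, and treating `fraglen = -1` as "omit -F" and `print_progress = false` as "pass -x" is inferred from the descriptions.

```python
# Default values copied from the parameter table above.
DEFAULTS = {
    "background_sites": 10, "fdr": 0.001, "fraglen": -1,
    "keep_monostrand_sites": False, "keep_single": False,
    "mappable_fraction": 0.8, "maxlen": 500, "merge_regions": False,
    "min_reads": 2, "p_value": 0.001, "print_progress": True,
    "report_narrow": False, "report_wide": False, "window_size": 20,
}


def build_sissrs_options(params):
    """Validate documented constraints and return a list of option strings."""
    if params["window_size"] <= 1 or params["window_size"] % 2 != 0:
        raise ValueError("window_size must be an even number > 1")
    if params["min_reads"] <= 0:
        raise ValueError("min_reads must be > 0")
    if params["background_sites"] < 0:
        raise ValueError("background_sites (e-value) must be >= 0")

    # Valued options, using the flags shown in brackets in the table.
    opts = [
        "-e", str(params["background_sites"]),
        "-D", str(params["fdr"]),
        "-m", str(params["mappable_fraction"]),
        "-L", str(params["maxlen"]),
        "-E", str(params["min_reads"]),
        "-p", str(params["p_value"]),
        "-w", str(params["window_size"]),
    ]
    # Assumption: -1 means "estimate from reads", so -F is omitted.
    if params["fraglen"] != -1:
        opts += ["-F", str(params["fraglen"])]

    # Boolean switches are added only when enabled.
    switches = [
        ("keep_monostrand_sites", "-u"), ("keep_single", "-a"),
        ("merge_regions", "-c"), ("report_narrow", "-t"),
        ("report_wide", "-r"),
    ]
    opts += [flag for name, flag in switches if params[name]]

    # Assumption: -x suppresses the report, so it is passed when printing is off.
    if not params["print_progress"]:
        opts.append("-x")
    return opts


print(build_sissrs_options(DEFAULTS))
```

Overriding a single value, for example `dict(DEFAULTS, window_size=21)`, raises a `ValueError` before any option list is produced.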