Wraps the iSeq R package, which implements the methods described in "A fully Bayesian hidden Ising model for ChIP-seq data analysis" by Qianxing Mo.
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | |
Authors | Lauri Lyly (lauri.lyly@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | iSeq (R-bioconductor) |
Source files | component.xml iSeq.r |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
treatment | CSV | Mandatory | Sample CSV file, with fields: chromosome name, region middle position, strand (either 1 for forward or 2 for reverse) |
control | CSV | Mandatory | Control CSV file with similar format as treatment file. |
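The three-field layout described above can be checked before running the component. The following is a minimal sketch only: it assumes a tab-delimited file with a single header row, which the description above does not confirm, and the function name `read_tags` and the header names in the sample are hypothetical.

```python
import csv
import io

def read_tags(text, delimiter="\t"):
    """Parse (chromosome, middle position, strand) rows and validate the strand code."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader)  # assumption: the first row is a header
    rows = []
    for chrom, pos, strand in reader:
        strand = int(strand)
        if strand not in (1, 2):  # 1 = forward, 2 = reverse
            raise ValueError(f"unexpected strand code: {strand}")
        rows.append((chrom, int(pos), strand))
    return rows

# Hypothetical two-tag sample in the assumed layout.
sample = "chr\tpos\tstrand\nchr1\t1050\t1\nchr1\t1120\t2\n"
tags = read_tags(sample)
```

If the actual files use a different delimiter, pass it through the `delimiter` argument.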
Name | Type | Description |
---|---|---|
peaks | CSV | Enriched regions returned from peakreg, as a data frame with one row per enriched region and the following columns. chr: chromosome ID. gstart: start genomic position of the enriched region. gend: end genomic position of the enriched region. rstart: row number for gstart in chrpos. rend: row number for gend in chrpos. peakpos: inferred center (peak) of the enriched region. meanpp: mean posterior probability of the merged regions/bins. ct1: total tag count for the region from gstart to gend for the chain corresponding to count[,1]; ct1 = sum(count[rstart:rend,1]). ct2: total tag count for the region from gstart to gend for the chain corresponding to count[,2]; ct2 = sum(count[rstart:rend,2]). ct12: ct1 + ct2. sym: a measure of whether the forward and reverse tag counts are symmetrical (balanced) in the enriched region; values range from 0.5 (perfect symmetry) to 0 (complete asymmetry). |
report | LaTeX | Intended to contain various plots. The exact contents are still to be determined, and no plot is produced yet. |
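To make the relationship between the peaks columns concrete, the sketch below selects enriched bins by posterior probability (ppcut) or by a direct posterior probability FDR (fdrcut), merges enriched bins that are closer than maxgap, and derives gstart, gend, meanpp, ct1, ct2, ct12 and sym for each region. It is not the iSeq implementation. The function names, the bin tuple layout, the way distance between bins is measured, and the formula sym = min(ct1, ct2) / ct12 (chosen only because it matches the stated 0 to 0.5 range) are all assumptions.

```python
from statistics import mean

def enriched_indices(pps, cutoff=0.5, method="ppcut"):
    """Return the sorted indices of bins called enriched."""
    if method == "ppcut":
        return [i for i, p in enumerate(pps) if p > cutoff]
    if method != "fdrcut":
        raise ValueError("method must be 'ppcut' or 'fdrcut'")
    # Rank bins by pp and keep the longest prefix whose mean (1 - pp),
    # the estimated FDR of that prefix, stays below the cutoff.
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    selected, expected_false = [], 0.0
    for k, i in enumerate(order, start=1):
        expected_false += 1.0 - pps[i]
        if expected_false / k >= cutoff:
            break
        selected.append(i)
    return sorted(selected)

def call_regions(bins, cutoff=0.5, method="ppcut", maxgap=300):
    """bins: position-sorted (start, end, fwd_count, rev_count, pp) tuples, one chromosome."""
    hits = enriched_indices([b[4] for b in bins], cutoff, method)
    spans = []
    for i in hits:
        # Assumption: distance is measured from the previous enriched bin's end.
        if spans and bins[i][0] - bins[spans[-1][-1]][1] < maxgap:
            spans[-1].append(i)
        else:
            spans.append([i])
    regions = []
    for span in spans:
        rstart, rend = span[0], span[-1]  # 0-based here; R row numbers are 1-based
        rows = bins[rstart:rend + 1]      # counts include bins between merged hits
        ct1 = sum(r[2] for r in rows)
        ct2 = sum(r[3] for r in rows)
        ct12 = ct1 + ct2
        regions.append({
            "gstart": bins[rstart][0], "gend": bins[rend][1],
            "rstart": rstart, "rend": rend,
            "meanpp": mean(bins[i][4] for i in span),
            "ct1": ct1, "ct2": ct2, "ct12": ct12,
            "sym": min(ct1, ct2) / ct12 if ct12 else 0.0,  # assumed formula
        })
    return regions

# Hypothetical bins: two nearby enriched bins, one weak bin between them, one distant bin.
bins = [(100, 150, 5, 5, 0.9), (160, 210, 1, 0, 0.2),
        (220, 270, 6, 2, 0.8), (1000, 1050, 4, 4, 0.95)]
regions = call_regions(bins)
```

With these hypothetical bins, the first and third bins merge into one region spanning 100 to 270, and the distant fourth bin forms a region of its own.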
Name | Type | Default | Description |
---|---|---|---|
a0 | float | 1 | a0: The scale hyper-parameter of the Gamma prior, alpha0. |
a1 | float | 0.5 | a1: The scale hyper-parameter of the Gamma prior, alpha1. |
b0 | float | 1 | b0: The rate hyper-parameter of the Gamma prior, beta0. |
b1 | float | 1 | b1: The rate hyper-parameter of the Gamma prior, beta1. |
burnin | int | 500 | burnin: The number of MCMC burn-in iterations. |
ctcut | float | 0.95 | ctcut: A value used to set the initial state of each window/bin. If the tag count of a bin is greater than quantile(Y[,4],probs=ctcut), its state is set to 1, otherwise -1. Because most regions in typical ChIP-seq data are non-enriched, a good value for ctcut could be in the interval (0.9, 0.99). |
cutoff | float | 0.5 | The cutoff value (a scalar) used to call enriched bins. If posterior probability is used as the criterion (peakreg_method="ppcut"), a bin is called enriched if its posterior probability (pp) is greater than the cutoff. If FDR is used as the criterion (peakreg_method="fdrcut"), bins are called enriched if the bin-based FDR is less than the cutoff. The FDR is calculated using a direct posterior probability approach (Newton et al., 2004). The default value 0.5 is suited to ppcut; a more appropriate value for fdrcut would be 0.05. |
gap | int | 300 | gap: The average length of the sequenced DNA fragments. If the distance between two nearest bins is greater than 'gap', a bin with 0 tag count is inserted between the two bins for modeling. |
k0 | float | 3 | k0: The initial parameter used to control the strength of interaction between neighboring bins, which must be a positive value (k0>0). A larger value of kappa represents a stronger interaction between neighboring bins. |
maxgap | int | 300 | The criterion used to merge enriched bins. If the genomic distance of adjacent bins is less than maxgap, the bins will be merged into the same enriched region. |
maxk | float | 10 | Unused for iSeq2. maxk: The maximum value of k(kappa) allowed. |
maxlen | int | 80 | The maximum length of the genomic window/bin into which sequence tags are aggregated. |
method | string | "iSeq1" | Either iSeq1 or iSeq2. iSeq1 implements the method that models the bin-based tag counts using Poisson-Gamma distribution and the hidden states of the bins using a standard 1D Ising model. iSeq2 is similar but uses a hidden high-order Ising model. |
mink | float | 0 | Unused for iSeq2. mink: The minimum value of k(kappa) allowed. |
minlen | int | 10 | The minimum length of the genomic window/bin into which sequence tags are aggregated. |
normsd | float | 0.1 | Unused for iSeq2. normsd: iSeq1 uses a Metropolis random walk proposal for sampling from the posterior distribution of the model parameter kappa. The proposal distribution is a normal distribution with mean 0 and standard deviation specified by normsd. |
ntagcut | int | 10 | The tag count cutoff value that triggers a bin size change. Suppose L_i and C_i are the length and tag count of bin i. If C_i >= ntagcut, the length of bin i+1 will be max(L_i/2, minlen); if C_i < ntagcut, the length of bin i+1 will be min(2*L_i, maxlen). Bin sizes thus decrease or increase by a factor of 2, so maxlen should be set to (2^n)*minlen for some integer n. |
peakreg_method | string | "ppcut" | 'ppcut' or 'fdrcut', depending on whether cutoff is applied to posterior probability values or false discovery rate. |
sampling | int | 2000 | sampling: The number of MCMC sampling iterations. The posterior probability of enriched and non-enriched state is calculated based on the samples generated in the sampling period. |
verbose | boolean | false | verbose: A logical variable. If TRUE, the number of completed MCMC iterations is reported. |
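The interaction between minlen, maxlen and ntagcut is easier to see in a small simulation. The sketch below is an illustration, not code from the iSeq package. It assumes that minlen and maxlen act as lower and upper bounds, so that halving never goes below minlen and doubling never exceeds maxlen. The function name `next_bin_length` and the tag counts are hypothetical.

```python
def next_bin_length(length, count, minlen=10, maxlen=80, ntagcut=10):
    """Length of bin i+1 given the length and tag count of bin i."""
    if count >= ntagcut:
        # Dense bin: halve the window, but never go below minlen.
        return max(length // 2, minlen)
    # Sparse bin: double the window, but never exceed maxlen.
    return min(length * 2, maxlen)

# Walk through hypothetical tag counts, starting from the largest window.
length = 80
lengths = []
for count in [12, 15, 20, 30, 2, 1]:
    length = next_bin_length(length, count)
    lengths.append(length)
# lengths is now [40, 20, 10, 10, 20, 40]
```

With the defaults, maxlen = 80 = (2^3) * 10 = (2^3) * minlen, so repeated halving and doubling always lands exactly on the two bounds.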
Test case | Parameters | IN treatment | IN control | OUT peaks | OUT report |
---|---|---|---|---|---|
case1 | properties | treatment | control | (missing) | (missing) |