
iSeq

Wraps the iSeq R package, which implements the methods described in "A fully Bayesian hidden Ising model for ChIP-seq data analysis" by Qianxing Mo.

Version 1.0
Bundle sequencing
Categories
Authors Lauri Lyly (lauri.lyly@helsinki.fi)
Issue tracker View/Report issues
Requires iSeq (R-bioconductor)
Source files component.xml iSeq.r
Usage Example with default values

Inputs

Name Type Mandatory Description
treatment CSV Mandatory Sample CSV file, with fields: chromosome name, region middle position, strand (either 1 for forward or 2 for reverse)
control CSV Mandatory Control CSV file, in the same format as the treatment file.
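
The input format above can be checked before running the component. The following is a minimal sketch, not part of the component: the function name validate_tag_rows is hypothetical, and it assumes the rows have already been parsed into three string fields in the order chromosome name, region middle position, strand.

```python
def validate_tag_rows(rows):
    """Check (chromosome, position, strand) rows and return them with typed fields.

    Hypothetical helper: it only enforces what the input description states,
    namely three fields per row and a strand of 1 (forward) or 2 (reverse).
    """
    cleaned = []
    for lineno, row in enumerate(rows, start=1):
        if len(row) != 3:
            raise ValueError(f"row {lineno}: expected 3 fields, got {len(row)}")
        chrom, pos, strand = row
        pos = int(pos)        # region middle position
        strand = int(strand)  # 1 = forward, 2 = reverse
        if strand not in (1, 2):
            raise ValueError(f"row {lineno}: strand must be 1 or 2, got {strand}")
        cleaned.append((str(chrom), pos, strand))
    return cleaned


rows = [("chr1", "10500", "1"), ("chr1", "10620", "2")]
checked = validate_tag_rows(rows)
```

The same check applies to both the treatment and the control file, since the control file is described as having the treatment file's format.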

Outputs

Name Type Description
peaks CSV Called peak regions returned from peakreg: a data frame with rows corresponding to enriched regions and columns corresponding to the following. chr: Chromosome IDs. gstart: The start genomic position of the enriched region. gend: The end genomic position of the enriched region. rstart: The row number for gstart in chrpos. rend: The row number for gend in chrpos. peakpos: The inferred center (peak) of the enriched region. meanpp: The mean posterior probability of the merged regions/bins. ct1: Total tag count for the region from gstart to gend for the chain corresponding to count[,1]; ct1=sum(count[rstart:rend,1]). ct2: Total tag count for the region from gstart to gend for the chain corresponding to count[,2]; ct2=sum(count[rstart:rend,2]). ct12: ct12 = ct1 + ct2. sym: A parameter used to measure whether the forward and reverse tag counts are symmetrical (or balanced) in enriched regions. The values range from 0.5 (perfect symmetry) to 0 (complete asymmetry).
report Latex Intended for plots. The exact contents are still to be determined; no plot is produced yet.
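
To illustrate how the columns of the peaks output relate to per-bin data, here is a minimal sketch of posterior-probability calling followed by merging. It is not the peakreg implementation. The function names, the 0-based row indices, the gap measure (start of the next enriched bin minus end of the previous one), and taking meanpp over the enriched bins only are all assumptions made for the example. peakpos and sym are left out because their formulas are not given here.

```python
def summarise_region(rows, chrpos, count, pp):
    """Build one output record from the row indices of merged enriched bins."""
    rstart, rend = rows[0], rows[-1]
    # Sum over every row from rstart to rend, as in sum(count[rstart:rend, k]).
    ct1 = sum(c[0] for c in count[rstart:rend + 1])
    ct2 = sum(c[1] for c in count[rstart:rend + 1])
    return {
        "chr": chrpos[rstart][0],
        "gstart": chrpos[rstart][1],
        "gend": chrpos[rend][2],
        "rstart": rstart,  # 0-based here; R row numbers would be 1-based
        "rend": rend,
        "meanpp": sum(pp[i] for i in rows) / len(rows),
        "ct1": ct1,
        "ct2": ct2,
        "ct12": ct1 + ct2,
    }


def merge_enriched_bins(chrpos, count, pp, cutoff=0.5, maxgap=300):
    """Call bins with pp > cutoff and merge neighbours closer than maxgap."""
    regions = []
    current = None  # row indices of enriched bins in the region being built
    for i, p in enumerate(pp):
        if p <= cutoff:
            continue
        if current is not None:
            prev = current[-1]
            same_chr = chrpos[prev][0] == chrpos[i][0]
            close = chrpos[i][1] - chrpos[prev][2] < maxgap
            if same_chr and close:
                current.append(i)
                continue
            regions.append(summarise_region(current, chrpos, count, pp))
        current = [i]
    if current is not None:
        regions.append(summarise_region(current, chrpos, count, pp))
    return regions


# Toy data: (chr, bin start, bin end), (forward count, reverse count), pp per bin.
chrpos = [("chr1", 100, 150), ("chr1", 150, 200), ("chr1", 200, 250),
          ("chr1", 900, 950), ("chr2", 100, 150)]
count = [(5, 4), (1, 0), (6, 7), (8, 9), (3, 3)]
pp = [0.9, 0.2, 0.8, 0.95, 0.7]
regions = merge_enriched_bins(chrpos, count, pp)
```

In the toy data the first and third bins merge into one region, and the non-enriched bin between them still contributes to ct1 and ct2 because the sums run over all rows from rstart to rend.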

Parameters

Name Type Default Description
a0 float 1 a0: The scale hyper-parameter of the Gamma prior, alpha0.
a1 float 0.5 a1: The scale hyper-parameter of the Gamma prior, alpha1.
b0 float 1 b0: The rate hyper-parameter of the Gamma prior, beta0.
b1 float 1 b1: The rate hyper-parameter of the Gamma prior, beta1.
burnin int 500 burnin: The number of MCMC burn-in iterations.
ctcut float 0.95 ctcut: A value used to set the initial state for each window/bin. If tag count of a bin is greater than quantile(Y[,4],probs=ctcut), its state will be set to 1, otherwise -1. For typical ChIP-seq data, because the major regions are non-enriched, a good value for ctcut could be in the interval (0.9, 0.99).
cutoff float 0.5 The cutoff value (a scalar) used to call enriched bins. If posterior probability is used as the criterion (peakreg_method="ppcut"), a bin is called enriched if its posterior probability (pp) is greater than the cutoff. If FDR is used as the criterion (peakreg_method="fdrcut"), bins are called enriched if the bin-based FDR is less than the cutoff. The FDR is calculated using a direct posterior probability approach (Newton et al., 2004). The default value 0.5 is intended for ppcut; a more suitable value for fdrcut would be 0.05.
gap int 300 gap: The average length of the sequenced DNA fragments. If the distance between two nearest bins is greater than 'gap', a bin with 0 tag count is inserted between the two bins for modeling.
k0 float 3 k0: The initial parameter used to control the strength of interaction between neighboring bins, which must be a positive value (k0>0). A larger value of kappa represents a stronger interaction between neighboring bins.
maxgap int 300 The criterion used to merge enriched bins. If the genomic distance of adjacent bins is less than maxgap, the bins will be merged into the same enriched region.
maxk float 10 Unused for iSeq2. maxk: The maximum value of k(kappa) allowed.
maxlen int 80 The maximum length of the genomic window/bin into which sequence tags are aggregated.
method string "iSeq1" Either iSeq1 or iSeq2. iSeq1 implements the method that models the bin-based tag counts using Poisson-Gamma distribution and the hidden states of the bins using a standard 1D Ising model. iSeq2 is similar but uses a hidden high-order Ising model.
mink float 0 Unused for iSeq2. mink: The minimum value of k(kappa) allowed.
minlen int 10 The minimum length of the genomic window/bin into which sequence tags are aggregated.
normsd float 0.1 Unused for iSeq2. normsd: iSeq1 uses a Metropolis random walk proposal for sampling from the posterior distribution of the model parameter kappa. The proposal distribution is a normal distribution with mean 0 and standard deviation specified by normsd.
ntagcut int 10 The tag count cutoff value for triggering a bin size change. For example, suppose L_i and C_i are the length and tag count for bin i, respectively. If C_i >= ntagcut, the length for bin i+1 will be max(L_i/2, minlen); if C_i < ntagcut, the length for bin i+1 will be min(2*L_i, maxlen). Note that the bin sizes decrease/increase by a factor of 2, so the user should set maxlen = (2^n)*minlen.
peakreg_method string "ppcut" 'ppcut' or 'fdrcut', depending on whether cutoff is applied to posterior probability values or false discovery rate.
sampling int 2000 sampling: The number of MCMC sampling iterations. The posterior probability of enriched and non-enriched state is calculated based on the samples generated in the sampling period.
verbose boolean false verbose: A logical variable. If TRUE, the number of completed MCMC iterations is reported.
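
Two of the parameter descriptions above describe small procedures that are easier to follow in code: the adaptive bin sizing controlled by ntagcut, minlen and maxlen, and FDR-based calling with cutoff. The sketch below is illustrative only and is not taken from the iSeq source. Both function names are hypothetical. The bin rule reads minlen and maxlen as lower and upper bounds on the bin length, so halving is clamped with max(..., minlen) and doubling with min(..., maxlen). The FDR function assumes the common form of the direct posterior probability approach, in which the estimated FDR of a set of bins is the mean of (1 - pp) over that set.

```python
def next_bin_length(length, tag_count, ntagcut=10, minlen=10, maxlen=80):
    """Length of bin i+1 given the length and tag count of bin i."""
    if tag_count >= ntagcut:
        # Dense bin: halve the next bin, but never go below minlen.
        return max(length // 2, minlen)
    # Sparse bin: double the next bin, but never exceed maxlen.
    return min(length * 2, maxlen)


def call_by_fdr(pp, cutoff=0.05):
    """Return indices of bins called enriched at the given FDR cutoff.

    Bins are ranked by decreasing posterior probability. The estimated FDR
    of the top-ranked set is the mean of (1 - pp) over that set (an assumed
    formulation), and bins are added while that estimate stays below cutoff.
    """
    order = sorted(range(len(pp)), key=lambda i: pp[i], reverse=True)
    selected = []
    error_sum = 0.0
    for rank, i in enumerate(order, start=1):
        error_sum += 1.0 - pp[i]
        if error_sum / rank >= cutoff:
            break  # the running mean cannot decrease, so stop here
        selected.append(i)
    return sorted(selected)


# Bin lengths along a stretch that starts dense and then becomes sparse.
lengths = [80]
for tags in [12, 15, 20, 30, 0, 0]:
    lengths.append(next_bin_length(lengths[-1], tags))

enriched = call_by_fdr([0.99, 0.2, 0.97, 0.6, 0.999], cutoff=0.05)
```

With the default minlen=10 and maxlen=80 the lengths move only among 10, 20, 40 and 80, which is why maxlen should be minlen times a power of two.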

Test cases

Test case Parameters IN treatment IN control OUT peaks OUT report
case1 properties treatment control (missing) (missing)


Generated 2019-02-07 07:42:21 by Anduril 2.0.0