RUBIC detects recurrent copy number aberrations using copy number breaks, rather than recurrently amplified or deleted regions. This allows for a vastly simplified approach, as recursive peak splitting procedures and repeated re-estimation of the background model are avoided. Furthermore, the false discovery rate is controlled at the level of called regions rather than at the probe level.
Consult the RUBIC webpage for the full documentation.
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | Copy Number Analysis |
Authors | Gabriele Partel (gabrielepartel@gmail.com) |
Issue tracker | View/Report issues |
Requires | installer (bash) ; biomaRt (R-bioconductor) ; data.table 1.9.4 ; pracma (R-package) ; digest (R-package) ; ggplot2 1.0.1 ; gtable (R-package) |
Source files | component.xml main.sh |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
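segments | Array<CSV> | Mandatory | Array of segmented copy number files. Every file must have the same columns in the same order. Each CSV may contain an arbitrary number of columns, but four fields are expected for each segment: chromosome, start position, end position, and log ratio (located through the colChr, colStart, colEnd, and colLogR parameters). |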
markers | CSV | Mandatory | The markers file gives the exact locations of the measurement probes (markers) for the given platform. For sequencing data, copy number values are often estimated in fixed-size bins (prior to segmentation). In that case each marker should correspond to one bin and be placed at the center genomic position of that bin. The file must contain at least 3 columns. |
genes | TextFile | Optional | Restricts plotting to selected genes. Headerless text file listing, in a single column, the Ensembl IDs of the genes to plot. |
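
The four segment fields are located by 1-based column numbers (colChr, colStart, colEnd, colLogR). The sketch below is not part of the component; it only illustrates how such column numbers map onto a segments file. The function name `read_segments`, the comma delimiter, the presence of a header row, and the example column names are all assumptions made for illustration.

```python
import csv
import io


def read_segments(handle, col_chr=1, col_start=2, col_end=3, col_logr=4,
                  delimiter=","):
    """Yield (chromosome, start, end, log_ratio) tuples from a segments file.

    Column numbers are 1-based, mirroring the colChr, colStart, colEnd and
    colLogR parameters. The delimiter and the header row are assumptions;
    adjust them to match the actual files.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # assumption: the first row is a header
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield (
            row[col_chr - 1],
            int(row[col_start - 1]),
            int(row[col_end - 1]),
            float(row[col_logr - 1]),
        )


# Hypothetical file content: the extra "sample" column is simply ignored.
example = io.StringIO(
    "chrom,start,end,logR,sample\n"
    "1,1000,5000,0.35,s1\n"
    "1,5001,9000,-0.42,s1\n"
)
segments = list(read_segments(example))
```

If the log ratio sits in a different column, as in test case2 (colLogR=6), only the `col_logr` argument changes.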
Name | Type | Description |
---|---|---|
gains | CSV | Focal gains output file. |
losses | CSV | Focal losses output file. |
plots | BinaryFolder | Folder containing two plots per chromosome: one showing the gains and one showing the losses. Each plot shows the locations of the genes used to compute the focal events. A different set of genes can be plotted by supplying the genes input. |
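
The genes input used to customize the plots is a headerless, single-column text file of Ensembl IDs. The sketch below shows one way to build such a file's content. The helper `gene_list_text` is hypothetical, and the check assumes unversioned human Ensembl gene IDs (`ENSG` followed by 11 digits); whether the component accepts other ID forms is not stated here.

```python
import re

# Assumption: unversioned human Ensembl gene IDs, e.g. ENSG00000141510.
ENSEMBL_GENE_ID = re.compile(r"^ENSG\d{11}$")


def gene_list_text(gene_ids):
    """Return headerless, one-ID-per-line text for the genes input."""
    bad = [g for g in gene_ids if not ENSEMBL_GENE_ID.match(g)]
    if bad:
        raise ValueError("not Ensembl gene IDs: " + ", ".join(bad))
    return "\n".join(gene_ids) + "\n"


# ENSG00000141510 is TP53 and ENSG00000136997 is MYC.
text = gene_list_text(["ENSG00000141510", "ENSG00000136997"])
```

Writing `text` to a plain file gives a list in the shape described for the genes input.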
Name | Type | Default | Description |
---|---|---|---|
ampLevel | float | 0.1 | A positive number specifying the threshold used for calling amplifications. |
assembly | string | "hg19" | Genome assembly used. Possible values: hg19, hg38. |
colChr | int | 1 | The number of the column containing the chromosome name in input segments CSV files. |
colEnd | int | 3 | The number of the column containing the end position of each segment in input segments CSV files. |
colLogR | int | 4 | The number of the column containing the log ratio value in input segments CSV files. |
colStart | int | 2 | The number of the column containing the start position of each segment in input segments CSV files. |
delLevel | float | -0.1 | A negative number specifying the threshold used for calling deletions. |
fdr | float | 0.25 | False discovery rate. |
maxMean | float | 0.0 | A number specifying the maximum mean copy number allowed. If 0, segments will not be filtered based on their maximum mean copy number. |
minMean | float | 0.0 | A number specifying the minimum mean copy number allowed. If 0, segments will not be filtered based on their minimum mean copy number. |
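
To make the ampLevel and delLevel thresholds concrete, the sketch below labels a segment's log ratio with the default values from the table. It is one plausible reading of the parameter descriptions, not RUBIC's internal procedure: the function name `classify_segment` and the use of strict inequalities are assumptions.

```python
def classify_segment(log_ratio, amp_level=0.1, del_level=-0.1):
    """Label a log ratio as 'gain', 'loss' or 'neutral'.

    Defaults mirror ampLevel=0.1 and delLevel=-0.1. Whether the component
    treats values exactly at a threshold as calls is an assumption here.
    """
    if amp_level <= 0 or del_level >= 0:
        raise ValueError("amp_level must be positive and del_level negative")
    if log_ratio > amp_level:
        return "gain"
    if log_ratio < del_level:
        return "loss"
    return "neutral"


calls = [classify_segment(x) for x in (0.35, -0.42, 0.05)]
```

With the default thresholds, small fluctuations around zero such as 0.05 are left uncalled.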
Test case | Parameters | IN segments | IN markers | IN genes | OUT gains | OUT losses | OUT plots |
---|---|---|---|---|---|---|---|
case1 | properties: assembly=hg19, | segments | markers | (missing) | (missing) | (missing) | (missing) |
case2 | properties: colLogR=6, | segments | markers | genes | (missing) | (missing) | (missing) |