Determine significantly mutated genes in a set of genetic variations using MutSig. The algorithm takes genetic variations mapped to patients as input, and computes gene significance based on sample- and gene-specific background mutation rates (BMRs). Gene-specific BMRs are estimated using gene features (covariates) that may affect mutation rate, such as gene expression. BMR is further estimated separately for different mutation categories such as C->T, A->G, and indels. MutSig requires silent (and optionally noncoding) variants in addition to nonsilent variants, as they are used for estimating BMR.
The output is a prioritized list of genes with their significance. Its columns (Gene, p, pFDR, MutSig metrics and copies of covariates) are described under the genes output below.
This component may be called using a variety of inputs. Mandatory inputs include only patient identifiers, variant locations and variant alleles. In this case, other information is inferred using ANNOVAR and custom logic. It is also possible to provide all necessary information a priori.
Genes are normally specified using HUGO symbols, as per MutSig convention. MutSig requires approximately 10 GB of memory. See below for installation instructions.
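
The background-rate idea described above can be illustrated with a deliberately simplified sketch. It is not MutSigCV's statistical model: it assumes a single background mutation rate per gene (no mutation categories, no covariates, no per-sample rates) and a plain binomial tail probability. The function names and the numbers in the usage lines are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, rate: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, rate)."""
    # Sum the probabilities of observing fewer than k events, then subtract.
    # This loses precision for extremely small tail probabilities.
    below = sum(comb(n, i) * rate**i * (1.0 - rate) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - below)


def toy_gene_p_value(n_mutations: int, covered_bases: int,
                     n_patients: int, bmr: float) -> float:
    """Toy significance: chance of seeing at least n_mutations by background alone.

    Assumption: every covered base in every patient mutates independently
    with the same probability bmr. MutSigCV's real model is more elaborate.
    """
    trials = covered_bases * n_patients
    return binomial_tail(n_mutations, trials, bmr)


if __name__ == "__main__":
    # Hypothetical gene: 1,500 covered bases, 100 patients, BMR of 2e-6 per base.
    for observed in (0, 1, 5):
        print(observed, toy_gene_p_value(observed, 1500, 100, 2e-6))
```

The more mutations a gene carries relative to what its background rate predicts, the smaller this probability becomes, which is the intuition behind ranking genes by significance.
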
Version | 1.0 |
---|---|
Bundle | sequencing |
Categories | VariationAnalysis |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
Requires | MutSigCV; MATLAB Compiler Runtime; ANNOVAR; Scala |
Source files | component.xml, function.scala |
Name | Type | Mandatory | Description |
---|---|---|---|
variants | CSV | Mandatory | Raw variants. Must contain columns for patient identifiers, chromosome, chromosomal position and variant (alternative) allele. May also contain columns for gene identifiers (HUGO), predicted effect of variation, and reference allele. Column names are configured using parameters. |
coverage | CSV | Optional | If given, contains genomic intervals for those parts of the genome that were selectively sequenced in the experiment. For example, an exome sequencing experiment would include the regions that were captured. The following columns must be present: Chromosome, Start (1-based), End (inclusive). Note: annotating coverage using ANNOVAR is relatively slow. |
covariates | CSV | Optional | Gene covariate table. The first column contains gene identifiers (HUGO) and the rest of the columns contain numeric attributes of the genes. |
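
As a rough illustration of the minimal variants input, the sketch below builds a small table and checks that the mandatory columns are present. The column names CHROM, POS, ALT and REF are the parameter defaults; the patient column name `patient`, the tab delimiter, the helper name and the data rows are assumptions made up for the example.

```python
import csv
import io

# Mandatory columns: patient identifier, chromosome, position, variant allele.
# "patient" is a hypothetical name (patientColumn has no default);
# the others are the documented defaults.
REQUIRED_COLUMNS = ["patient", "CHROM", "POS", "ALT"]


def missing_columns(table_text: str, required=REQUIRED_COLUMNS,
                    delimiter: str = "\t") -> list:
    """Return the required column names that are absent from the header."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Made-up rows, for illustration only.
variants_text = (
    "patient\tCHROM\tPOS\tREF\tALT\n"
    "P1\t1\t1000\tC\tT\n"
    "P2\t1\t2000\tA\tG\n"
)

if __name__ == "__main__":
    print(missing_columns(variants_text))
```

With only these columns supplied, gene identifiers and effects would be left for the component to infer, as described in the introduction.
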
Name | Type | Description |
---|---|---|
genes | CSV | Significance of mutated genes (for all defined genes). Contains the columns: Gene (HUGO); p (nominal p-value); pFDR (FDR-corrected p-value), as well as MutSig metrics and copies of covariates. |
genesExcel | Excel | The genes output as a formatted Excel workbook. |
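
The relationship between the p and pFDR columns can be sketched as follows. The documentation only says that pFDR is FDR-corrected; the Benjamini-Hochberg procedure used here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list) -> list:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```
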
Name | Type | Default | Description |
---|---|---|---|
altAlleleColumn | string | "ALT" | In variants, column name for the variant allele. Must always be defined. |
annovarBin | string | "" | Path to ANNOVAR binary installation directory. This directory contains convert2annovar.pl, table_annovar.pl, etc. Only needed when geneColumn or effectColumn is empty or custom coverage is provided. If empty, the environment variable ANNOVAR_HOME is used instead (when needed). |
annovarDB | string | "" | Path to ANNOVAR database directory. This directory often contains hgNN_X.{fa,idx,txt} files. If empty, the environment variable ANNOVAR_DB is used instead (when needed). |
builtinCovariates | string | "*" | Comma-separated list of column names for builtin covariates located in gene.covariates.txt in the MutSig installation directory. These covariates are combined with custom covariates, if defined. The special value * selects all builtin covariates. |
chromColumn | string | "CHROM" | In variants, column name for the chromosome. Must always be defined. |
effectColumn | string | "" | In variants, column name for variant effect. This is either the native MutSig effect (noncoding/nonsilent/silent/null); the Variant_Classification column in the MAF format; ANNOVAR output; or the AminoChange column in RikuRator export files. Non-MutSig effect specifications are converted to the native format. If empty, this is inferred using ANNOVAR. |
geneColumn | string | "" | In variants, column name for gene identifiers (HUGO). If empty, this is inferred using ANNOVAR. |
label | string | "MutSig" | Label for the experiment that is used as sheet name in the Excel report. |
matlab | string | "" | Path to the MATLAB compiler runtime directory. This directory contains the subdirectories bin, mcr, resources, etc. If empty, the environment variable MCRROOT is used instead. |
mutsig | string | "" | Path to MutSig installation directory. This directory contains the main MutSig MCR binaries (run_MutSigCV.sh, etc.) and supplementary data files available on the MutSig web site. Always needed is mutation_type_dictionary_file.txt. Depending on optional inputs, chr_files_hg19 (reference genome directory), exome_full192.coverage.txt and gene.covariates.txt may also be needed. If empty, the environment variable MUTSIG_HOME is used instead. |
patientColumn | string | (no default) | In variants, column name for patient identifiers. Must always be defined. |
positionColumn | string | "POS" | In variants, column name for chromosomal position. Must always be defined. |
refAlleleColumn | string | "REF" | In variants, column name for the reference allele. If empty, this is inferred from mutsig/chr_files_hg19. |
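
The path parameters above (annovarBin, annovarDB, matlab, mutsig) share one rule: an empty value falls back to an environment variable. A minimal sketch of that rule is shown below; the function name and the error handling are assumptions, while the parameter and variable names come from the table.

```python
import os

# Parameter name -> fallback environment variable, as listed in the table.
ENV_FALLBACKS = {
    "annovarBin": "ANNOVAR_HOME",
    "annovarDB": "ANNOVAR_DB",
    "matlab": "MCRROOT",
    "mutsig": "MUTSIG_HOME",
}


def resolve_path(param_name: str, param_value: str, environ=None) -> str:
    """Return the explicit parameter value, or the fallback variable's value."""
    environ = os.environ if environ is None else environ
    if param_value:
        return param_value
    env_name = ENV_FALLBACKS[param_name]
    value = environ.get(env_name, "")
    if not value:
        # Hypothetical behavior: the component's real error handling is not documented.
        raise ValueError(f"{param_name} is empty and {env_name} is not set")
    return value


if __name__ == "__main__":
    print(resolve_path("mutsig", "", {"MUTSIG_HOME": "/opt/mutsig"}))
```
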
Test case | Parameters | IN variants | IN coverage | IN covariates | OUT genes | OUT genesExcel |
---|---|---|---|---|---|---|
case1 | properties: geneColumn=gene, … | variants | coverage | (missing) | (missing) | (missing) |
case2_minimal | properties: patientColumn=patient, … | variants | (missing) | (missing) | (missing) | (missing) |
case3_effect | properties: geneColumn=gene, … | variants | coverage | (missing) | (missing) | (missing) |
case4_covariates | properties: geneColumn=gene, … | variants | coverage | covariates | (missing) | (missing) |
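
The effectColumn parameter accepts several effect vocabularies that are converted to the native MutSig values (noncoding, nonsilent, silent, null). The sketch below shows what such a conversion might look like for a few MAF Variant_Classification values. The mapping is a small illustrative subset and an assumption, not the component's actual logic; the authoritative mapping is the mutation_type_dictionary_file.txt in the MutSig installation directory.

```python
NATIVE_EFFECTS = {"noncoding", "nonsilent", "silent", "null"}

# Illustrative subset only (assumption); consult the MutSig dictionary file.
MAF_TO_MUTSIG = {
    "Silent": "silent",
    "Missense_Mutation": "nonsilent",
    "Nonsense_Mutation": "null",
    "Intron": "noncoding",
}


def to_mutsig_effect(effect: str) -> str:
    """Convert an effect label to the native MutSig vocabulary."""
    if effect in NATIVE_EFFECTS:
        # Already in native format: pass through unchanged.
        return effect
    if effect in MAF_TO_MUTSIG:
        return MAF_TO_MUTSIG[effect]
    raise ValueError(f"unrecognized effect: {effect!r}")


if __name__ == "__main__":
    print([to_mutsig_effect(e) for e in ("Missense_Mutation", "silent", "Intron")])
```
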