Performs Gene Set Enrichment Analysis (GSEA) using category definitions from KEGG or GO. First, a descriptive statistic (t-test statistic or signal-to-noise ratio) is calculated for each gene to compare the means of two sample groups, e.g. case and control. Then a second statistic is calculated at the gene set level. The idea is to summarize the single-gene statistics into one value that represents the whole set of genes. There are several ways of obtaining this gene-set-level value.
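As an illustration of the gene-level step, the following Python sketch computes both metrics for one gene. It is not the component's R code. The Welch form of the t-statistic and the signal-to-noise definition (difference of means divided by the sum of standard deviations) are assumptions, since this page does not state which variants the component uses, and the expression values are made up.

```python
import math
import statistics


def welch_t(case, control):
    """Welch t-statistic for the difference of two group means (assumed variant)."""
    m1, m2 = statistics.mean(case), statistics.mean(control)
    v1, v2 = statistics.variance(case), statistics.variance(control)
    return (m1 - m2) / math.sqrt(v1 / len(case) + v2 / len(control))


def signal_to_noise(case, control):
    """Signal-to-noise ratio: difference of means over the sum of standard deviations."""
    m1, m2 = statistics.mean(case), statistics.mean(control)
    s1, s2 = statistics.stdev(case), statistics.stdev(control)
    return (m1 - m2) / (s1 + s2)


# Hypothetical log-expression values of a single gene in the two groups.
case_values = [8.0, 9.0, 10.0]
control_values = [5.0, 6.0, 7.0]

t_stat = welch_t(case_values, control_values)
s2n = signal_to_noise(case_values, control_values)
print(t_stat, s2n)
```

Both statistics are positive when the gene is higher in the first group, so the sign carries the direction of regulation into the gene-set-level step.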
GSEAAnalyzer supports two approaches: summary score (SS) and enrichment score (ES). In SS, the gene-set-level score statistic is obtained by summing the single-gene statistics and normalizing the sum by the square root of the number of genes in the gene set. SS therefore ignores the background genes (the genes that are not in the gene set under study). ES takes the background genes into account: the score statistic is calculated by walking down the ranked list of all genes, increasing a running sum when a gene is in the gene set and decreasing it when it is not. The final score statistic is the maximum deviation of the running sum from zero. In both approaches, the statistical significance of the gene sets is assessed by permutation testing.
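The two gene-set-level scores can be sketched in Python as follows. This is an illustrative sketch, not the component's implementation. It assumes that the SS adjustment is a division by the square root of the set size, and that the ES running sum uses Kolmogorov-Smirnov style step sizes chosen so that the walk ends at zero; the page does not specify either detail. The gene names and statistics are hypothetical.

```python
import math


def summary_score(gene_stats, gene_set):
    """SS: sum of the set's gene statistics divided by sqrt(set size)."""
    values = [gene_stats[g] for g in gene_set]
    return sum(values) / math.sqrt(len(values))


def enrichment_score(gene_stats, gene_set, descending=True):
    """ES: maximum deviation from zero of a running sum over the ranked gene list.

    Assumes the gene set has at least one member and one non-member in gene_stats.
    """
    ranked = sorted(gene_stats, key=gene_stats.get, reverse=descending)
    n_hit = sum(1 for g in ranked if g in gene_set)
    n_miss = len(ranked) - n_hit
    up = math.sqrt(n_miss / n_hit)    # step for a gene inside the set
    down = math.sqrt(n_hit / n_miss)  # step for a background gene
    running, best = 0.0, 0.0
    for gene in ranked:
        running += up if gene in gene_set else -down
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical gene-level statistics and one gene set.
stats = {"g1": 3.0, "g2": 2.0, "g3": 0.5, "g4": -1.0, "g5": -2.5, "g6": -3.0}
members = {"g1", "g2"}

ss = summary_score(stats, members)
es = enrichment_score(stats, members)
print(ss, es)
```

Note that `summary_score` only reads the statistics of the set members, whereas `enrichment_score` ranks every gene, which is where the background genes enter the ES approach.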
Additionally, SS can be used in two ways: (a) to detect only the gene sets whose genes are regulated in the same direction, or (b) to detect gene sets regardless of the direction of regulation.
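The difference between the two SS modes can be illustrated with a short Python sketch. It assumes that the mode ignoring direction sums the absolute values of the gene statistics; this page does not spell out the exact formula, so treat this as one plausible reading. The function name and the example values are hypothetical.

```python
import math


def ss_by_mode(values, ss_method="directed"):
    """Summary score of one gene set under the two SS modes (assumed formulas)."""
    if ss_method == "absolute":
        # Direction ignored: up- and down-regulated genes both add to the score.
        values = [abs(v) for v in values]
    elif ss_method != "directed":
        raise ValueError("ss_method must be 'directed' or 'absolute'")
    return sum(values) / math.sqrt(len(values))


# Hypothetical set with two up-regulated and two down-regulated genes.
mixed_set = [3.0, -3.0, 2.0, -2.0]

directed_score = ss_by_mode(mixed_set, "directed")
absolute_score = ss_by_mode(mixed_set, "absolute")
print(directed_score, absolute_score)
```

In this example the opposite signs cancel in the directed score, while the absolute score stays large, so only the second mode would flag a set whose genes change in both directions.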
Version | 1.0 |
---|---|
Bundle | microarray |
Categories | GO Pathway |
Authors | Minna Miettinen (Minna.Miettinen@Helsinki.FI), Kari Nousiainen (Kari.Nousiainen@Helsinki.FI), Marko Laakso (Marko.Laakso@Helsinki.FI) |
Issue tracker | View/Report issues |
Requires | Category (R-bioconductor) ; csbl.go (R-package) ; genefilter (R-bioconductor) |
Source files | component.xml analyzer.R |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
annotation | CSV | Mandatory | Gene annotation table. Parameters sourceId and targetId specify the columns containing the gene names and their GO or KEGG annotations, respectively. Gene identifiers can be converted to GO annotations with the KorvasieniAnnotator component, and to KEGG annotations with the KEGGPathway component. |
expr | LogMatrix | Mandatory | The expression values of the genes. The first column should contain the same gene identifiers (Ensembl/UniProt) as the sourceId column in the annotation table. The number of rows, i.e. genes, may exceed the number of sourceIds in the annotation table. However, the expr table must contain expression values for every sourceId. |
sampleGroupTable | SampleGroupTable | Mandatory | SampleGroupTable represents the relation between a sample and its group. |
Name | Type | Description |
---|---|---|
report | Latex | LaTeX report of the GSEA results: a QQ-plot of the score statistics, scatter plots of the interesting gene sets, and a table of statistics for each gene set. |
ResultTable | CSV | CSV file of the GSEA results |
Name | Type | Default | Description |
---|---|---|---|
GeneSet | string | "GO" | Gene classification scheme. The possible values are "KEGG" and "GO". |
Method | string | "ES" | GSEA method. The possible values are "SS" and "ES", referring to summary score and enrichment score, respectively. |
Metric | string | "Ttest" | The metric used to score and rank the genes. The possible values are "Ttest" and "signal2noise". |
SSMethod | string | "directed" | SS method. Only used if Method is SS. The possible values are "directed" and "absolute". Use "directed" to detect the gene sets where the direction of regulation is the same, and "absolute" to detect the sets where the direction of regulation is not taken into account. |
geneOrder | string | "descending" | Used if Method is ES. Defines whether the genes should be sorted in ascending or descending order. The possible values are "ascending" and "descending". |
group1 | string | (no default) | Group label of group1. Group1 will be tested against group2. Preferably, choose case group as group1. |
group2 | string | (no default) | Group label of group2. Group2 will be tested against group1. Preferably, choose control group as group2. |
nperm | int | 1000 | The permutation distribution is computed based on nperm permutations. |
pMethod | string | "separate" | Method for calculating the p-value of each gene set. Only used if Method is ES. The possible values are "separate" and "same". With "separate", the p-value of a negative (positive) enrichment score is defined using only the negative (positive) permuted enrichment scores. With "same", all permuted enrichment scores, positive and negative, are used for every enrichment score. |
pagebreak | boolean | false | Tells if the result document should start with a page break. |
sLimit | float | 0.05 | Significance limit for calling a gene set interesting. |
section | string | "" | Section title for the table container or an empty string if no section should be generated. |
sectionType | string | "subsection" | Type of LaTeX section: usually one of: section, subsection, or subsubsection. No section statement is written if section title is empty. |
seed | int | 12345 | Seed number for the pseudo random number generator |
sourceId | string | "GeneId" | Column name of gene identifiers in input annotation |
targetId | string | "GeneSetId" | Column name of gene set identifiers in input annotation |
threshold | int | 10 | The minimum number of genes in a gene set |
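The effect of the pMethod parameter can be sketched in Python as below. This is a hedged illustration, not the component's code: the function name, the comparison rules (one-sided for "separate", absolute values for "same"), and the example scores are assumptions, and the permuted scores would in practice come from nperm permutations.

```python
def permutation_p(observed, permuted, p_method="separate"):
    """Empirical p-value of an enrichment score from permuted scores (assumed rules)."""
    if p_method == "separate":
        if observed >= 0:
            # Positive score: compare against the positive permuted scores only.
            null = [s for s in permuted if s >= 0]
            extreme = sum(1 for s in null if s >= observed)
        else:
            # Negative score: compare against the negative permuted scores only.
            null = [s for s in permuted if s < 0]
            extreme = sum(1 for s in null if s <= observed)
    elif p_method == "same":
        # All permuted scores are used; magnitudes are compared (an assumption).
        null = list(permuted)
        extreme = sum(1 for s in null if abs(s) >= abs(observed))
    else:
        raise ValueError("p_method must be 'separate' or 'same'")
    # No usable permuted scores: the p-value is undefined.
    return extreme / len(null) if null else float("nan")


# Hypothetical observed score and permuted scores.
observed_es = 0.6
permuted_es = [0.7, 0.5, 0.2, -0.3, -0.65, -0.9, 0.1, -0.1]

p_separate = permutation_p(observed_es, permuted_es, "separate")
p_same = permutation_p(observed_es, permuted_es, "same")
print(p_separate, p_same)
```

With these made-up numbers the "separate" p-value is 0.25 (one of four positive permuted scores reaches 0.6) and the "same" p-value is 0.375 (three of eight permuted scores reach 0.6 in magnitude).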
Test case | Parameters | IN annotation | IN expr | IN sampleGroupTable | OUT report | OUT ResultTable |
---|---|---|---|---|---|---|
case1 | properties | annotation | expr | sampleGroupTable | report | ResultTable |
group1 = low, |
case2 | properties | annotation | expr | sampleGroupTable | (expecting failure) | (expecting failure) |
group1 = low, |