GSEAAnalyzer

Performs Gene Set Enrichment Analysis (GSEA) using category definitions from KEGG or GO. First, a descriptive statistic (t-test statistic or signal-to-noise ratio) is calculated for each gene to compare the means of two sample groups, e.g. case and control. Then a second statistic is calculated at the gene set level, summarizing the single-gene statistics into one value that represents the whole set of genes. There are several ways of obtaining this gene-set-level value.
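As a minimal sketch of the two gene-level statistics named above, the following Python functions compare one gene's expression values between two groups. The page does not say which t-test variant the component uses, so Welch's unequal-variance form is an assumption here, as is the common GSEA definition of signal-to-noise (difference of means divided by the sum of standard deviations). The function names are hypothetical and not part of the component.

```python
import math
import statistics


def t_statistic(group1, group2):
    """Welch's t-statistic for two samples (assumed variant, see lead-in)."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return (m1 - m2) / math.sqrt(v1 / len(group1) + v2 / len(group2))


def signal_to_noise(group1, group2):
    """Signal-to-noise ratio: difference of means over the sum of sample SDs."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    return (m1 - m2) / (s1 + s2)


# Hypothetical log-expression values of one gene in the two sample groups.
case = [5.1, 5.4, 5.0, 5.5]
control = [4.0, 4.2, 3.9, 4.3]
print(t_statistic(case, control), signal_to_noise(case, control))
```

Both statistics are positive when the gene is expressed more highly in the first group, which is why the order of group1 and group2 matters for the ranking.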

GSEAAnalyzer supports two approaches: summary score (SS) and enrichment score (ES). In SS, the gene set score is obtained by summing the single-gene statistics of the set and adjusting the sum by the square root of the number of genes in the set. SS therefore ignores the background genes (the genes that are not in the gene set in question). ES takes the background genes into account: the score is calculated by walking down the ranked list of all genes, increasing a running sum when a gene is in the gene set and decreasing it when it is not. The final score is the maximum deviation of the running sum from zero. In both approaches, the statistical significance of the gene sets is assessed by permutation testing.
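The two set-level scores can be sketched as follows. This is an illustration under stated assumptions, not the component's implementation: "adjusting" in SS is taken to mean dividing by the square root of the set size, and the ES walk uses unweighted steps of 1/n for set members and 1/(N-n) for background genes, which the page does not specify. The function names and the toy gene identifiers are hypothetical.

```python
import math


def summary_score(stats, gene_set):
    """SS: sum of member statistics divided by sqrt(set size) (assumed form)."""
    members = [stats[g] for g in gene_set if g in stats]
    return sum(members) / math.sqrt(len(members))


def enrichment_score(stats, gene_set, descending=True):
    """ES: maximum deviation from zero of a running sum over the ranked genes."""
    ranked = sorted(stats, key=stats.get, reverse=descending)
    n_in = sum(1 for g in ranked if g in gene_set)
    n_out = len(ranked) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("need at least one member and one background gene")
    up, down = 1.0 / n_in, 1.0 / n_out  # assumed unweighted step sizes
    running, best = 0.0, 0.0
    for gene in ranked:
        running += up if gene in gene_set else -down
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical gene-level statistics (e.g. t-statistics) for five genes.
stats = {"g1": 3.0, "g2": 2.0, "g3": 0.5, "g4": -1.0, "g5": -2.5}
print(summary_score(stats, {"g1", "g2"}))
print(enrichment_score(stats, {"g1", "g2"}))
```

With these step sizes the running sum returns to zero at the end of the list, so a set concentrated at the top of the ranking yields a positive ES and a set concentrated at the bottom a negative one.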

Additionally, SS can be used in two ways: (a) to detect only the gene sets whose genes are regulated in the same direction, or (b) to detect gene sets regardless of the direction of regulation.
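One plausible reading of the two SS modes is sketched below: the directed mode sums the signed gene statistics, so opposite directions cancel, while the absolute mode sums their absolute values. How the component actually implements the "absolute" mode is not stated on this page, so treat this as an assumption. The function name is hypothetical.

```python
import math


def summary_score_with_mode(stats, gene_set, ss_method="directed"):
    """SS with signed ("directed") or absolute ("absolute") gene statistics."""
    values = [stats[g] for g in gene_set if g in stats]
    if ss_method == "absolute":
        values = [abs(v) for v in values]  # direction of regulation ignored
    elif ss_method != "directed":
        raise ValueError("ss_method must be 'directed' or 'absolute'")
    return sum(values) / math.sqrt(len(values))


# A hypothetical set with one up-regulated and one down-regulated gene.
mixed = {"a": 2.0, "b": -2.0}
print(summary_score_with_mode(mixed, {"a", "b"}, "directed"))
print(summary_score_with_mode(mixed, {"a", "b"}, "absolute"))
```

In this toy set the directed score is zero because the two genes cancel, whereas the absolute score is clearly non-zero.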

Version 1.0
Bundle microarray
Categories GO Pathway
Authors Minna Miettinen (Minna.Miettinen@Helsinki.FI), Kari Nousiainen (Kari.Nousiainen@Helsinki.FI), Marko Laakso (Marko.Laakso@Helsinki.FI)
Requires Category (R-bioconductor); csbl.go (R-package); genefilter (R-bioconductor)
Source files component.xml analyzer.R

Inputs

Name Type Mandatory Description
annotation CSV Mandatory Gene annotation table. Parameters sourceId and targetId specify the columns containing the names of the genes and the respective GO or KEGG annotations. Gene identifiers can be converted to GO annotations with the KorvasieniAnnotator component, and to KEGG annotations with the KEGGPathway component.
expr LogMatrix Mandatory The expression values of the genes. The first column should contain the same gene identifiers (Ensembl/UniProt) as the sourceId column in the annotation table. The table may contain more rows (genes) than there are sourceIds in the annotation table, but expression values should be found for every sourceId.
sampleGroupTable SampleGroupTable Mandatory SampleGroupTable represents the relation between a sample and its group.
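
The relation between the annotation and expr inputs can be illustrated with a small sketch. It assumes the annotation table has one row per gene and gene set pair, which this page does not confirm, and it represents rows as plain dictionaries rather than reading Anduril files. The column names and the minimum set size default to the documented values of sourceId, targetId and threshold. The function names are hypothetical.

```python
from collections import defaultdict


def build_gene_sets(rows, source_id="GeneId", target_id="GeneSetId", threshold=10):
    """Group genes by gene set and drop sets with fewer than `threshold` genes."""
    sets = defaultdict(set)
    for row in rows:
        sets[row[target_id]].add(row[source_id])
    return {name: genes for name, genes in sets.items() if len(genes) >= threshold}


def missing_expression_ids(rows, expr_ids, source_id="GeneId"):
    """Annotated gene identifiers that have no row in the expression matrix."""
    return sorted({row[source_id] for row in rows} - set(expr_ids))


# Hypothetical annotation rows and expression row identifiers.
rows = [
    {"GeneId": "ENSG01", "GeneSetId": "GO:0001"},
    {"GeneId": "ENSG02", "GeneSetId": "GO:0001"},
    {"GeneId": "ENSG03", "GeneSetId": "GO:0002"},
]
expr_ids = ["ENSG01", "ENSG02", "ENSG04"]
print(build_gene_sets(rows, threshold=2))
print(missing_expression_ids(rows, expr_ids))
```

A non-empty result from the second function would point to annotated genes without expression values, which the expr description asks to avoid.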

Outputs

Name Type Description
report Latex LaTeX report of the GSEA results: a Q-Q plot of the score statistics, scatter plots of the interesting gene sets, and a table presenting statistics for each gene set.
ResultTable CSV CSV file of the GSEA results

Parameters

Name Type Default Description
GeneSet string "GO" Gene classification scheme. The possible values are "KEGG" and "GO".
Method string "ES" GSEA method. The possible values are "SS" and "ES", referring to summary score and enrichment score, respectively.
Metric string "Ttest" The metric used to score and rank the genes. The possible values are "Ttest" and "signal2noise".
SSMethod string "directed" SS method. Only used if Method is SS. The possible values are "directed" and "absolute". Use "directed" to detect the gene sets where the direction of regulation is the same, and "absolute" to detect the sets where the direction of regulation is not taken into account.
geneOrder string "descending" Used if Method is ES. Defines whether the genes should be sorted in ascending or descending order. The possible values are "ascending" and "descending".
group1 string (no default) Label of the first group, which is tested against group2. Preferably, choose the case group as group1.
group2 string (no default) Label of the second group, which is tested against group1. Preferably, choose the control group as group2.
nperm int 1000 The permutation distribution is computed based on nperm permutations.
pMethod string "separate" Method for calculating the p-value of each gene set. Used if Method is ES. The possible values are "separate" and "same". If "separate" is used, only the negative permuted enrichment scores are taken into account for a negative enrichment score, and only the positive ones for a positive enrichment score. If "same" is used, all the permuted enrichment scores (positive and negative) are taken into account for every enrichment score.
pagebreak boolean false Tells if the result document should start with a page break.
sLimit float 0.05 Significance limit for calling a gene set interesting.
section string "" Section title for the table container or an empty string if no section should be generated.
sectionType string "subsection" Type of LaTeX section, usually one of section, subsection, or subsubsection. No section statement is written if the section title is empty.
seed int 12345 Seed number for the pseudo random number generator
sourceId string "GeneId" Column name of gene identifiers in input annotation
targetId string "GeneSetId" Column name of gene set identifiers in input annotation
threshold int 10 The minimum number of genes in a gene set
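
The difference between the two pMethod values can be sketched as follows, given an observed enrichment score and the scores obtained from nperm permutations. For "separate" the null distribution is restricted to permuted scores of the same sign as the observed score, as described above. For "same" the page only says that all permuted scores are used, so the comparison on absolute values is an assumption. The function name is hypothetical.

```python
import math


def permutation_p_value(observed, permuted, p_method="separate"):
    """Fraction of permuted scores at least as extreme as the observed score."""
    if p_method == "separate":
        if observed >= 0:
            null = [s for s in permuted if s >= 0]
            extreme = sum(1 for s in null if s >= observed)
        else:
            null = [s for s in permuted if s < 0]
            extreme = sum(1 for s in null if s <= observed)
    elif p_method == "same":
        null = list(permuted)
        # Assumed two-sided comparison on absolute values.
        extreme = sum(1 for s in null if abs(s) >= abs(observed))
    else:
        raise ValueError("p_method must be 'separate' or 'same'")
    if not null:
        return math.nan  # no permuted score of the required sign
    return extreme / len(null)


# Hypothetical permuted enrichment scores for one gene set.
permuted = [0.9, 0.5, 0.3, -0.2, -0.4, -0.8]
print(permutation_p_value(0.45, permuted, "separate"))
print(permutation_p_value(0.45, permuted, "same"))
```

In this toy example the "separate" p-value is 2/3 (two of the three positive permuted scores reach 0.45), while "same" gives 0.5 (three of all six permuted scores reach 0.45 in absolute value).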

Test cases

Test case Parameters annotation (IN) expr (IN) sampleGroupTable (IN) report (OUT) ResultTable (OUT)
case1 properties annotation expr sampleGroupTable report ResultTable

group1 = low,
group2 = high,
Method = ES,
Metric = signal2noise,
nperm = 1000,
SSMethod = directed,
geneOrder = descending,
metadata.timeout=0

case2 properties annotation expr sampleGroupTable (expecting failure) (expecting failure)

group1 = low,
group2 = high,
Method = ES,
Metric = signal2noise,
nperm = 1000,
SSMethod = directed,
geneOrder = descending,
sourceId = GeneID


Generated 2019-02-08 07:42:09 by Anduril 2.0.0