
GOEnrichment

Computes enriched GO terms in a set of genes or proteins. Enrichment analysis is done using Fisher's Exact Test. Fisher's test compares the observed frequency of each present GO term to the frequency in a reference gene/protein set. A GO term is present if some input gene/protein is annotated with the GO term or its descendants.
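As a rough illustration of the test described above, the following sketch computes a one-sided Fisher's exact test p-value from plain counts. It is not the component's R code: the function name, the count-based interface and the choice of a one-sided over-representation test are assumptions made for this example, and the component itself works from a probability table rather than raw reference counts.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: input genes annotated with the GO term or one of its descendants
    n: input genes in total
    K: reference genes annotated with the term
    N: reference genes in total
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probability of every outcome at least as extreme as k.
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total
```

A small p-value means the term occurs in the input set more often than its reference frequency would suggest.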

Genome-wide reference sets are provided for six organisms (see the parameter organism). If the study is not a genome-wide study (e.g. a microarray with only 25% of total genes), it is better to use a custom reference table, given as the enrichmentTable input. Custom tables are computed with GOProbabilityTable.

The component also computes adjusted p-values using FDR (Benjamini and Hochberg, 1995). For other multiple comparison correction methods, see MultipleComparisonCorrection. Note, however, that multiple comparison correction might not work well with GO enrichment analysis, since a large number of statistical tests are performed and no effort is made to reduce their number.
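For readers unfamiliar with the FDR adjustment, the standard Benjamini-Hochberg step-up procedure can be sketched as follows. The function name is hypothetical and the code is an illustration of the published method, not an excerpt from GOEnrichment.r.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Because each raw p-value is scaled by the number of tests divided by its rank, every additional GO term tested makes the correction stricter, which is the concern raised above.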

Visualization of enriched GO terms is created in GraphML format. There is one network for each GO ontology. Nodes can be colored according to the p-value. Colors follow a base-10 logarithmic scale, i.e. p-values 1, 0.1 and 0.01 are equally distant from each other. By default, nodes contain a URL hyperlink to a description of the GO term on the geneontology.org site.
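The logarithmic color scale can be pictured with the sketch below, which interpolates linearly between two colors on a log10 axis. It assumes plain RGB interpolation and a non-zero lower p-value bound; the function names are hypothetical, the defaults mirror the colorStart, colorEnd and colorMinP parameters, and the three-color case (colorMiddle) and the colorMinP=0 case are left out.

```python
import math


def color_position(p, color_min_p=0.0001):
    """Position of p on the color scale: 0 at p = 1, 1 at p <= color_min_p."""
    if not 0 < color_min_p < 1:
        raise ValueError("color_min_p must be between 0 and 1 (exclusive)")
    p = min(max(p, color_min_p), 1.0)  # clamp into [color_min_p, 1]
    return math.log10(p) / math.log10(color_min_p)


def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def p_value_color(p, color_start="#ffffff", color_end="#ff0000",
                  color_min_p=0.0001):
    """Interpolate between the two colors on a log10 p-value scale."""
    t = color_position(p, color_min_p)
    start, end = hex_to_rgb(color_start), hex_to_rgb(color_end)
    rgb = [round(s + t * (e - s)) for s, e in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

With the default bound of 0.0001, the p-values 1, 0.1 and 0.01 land at positions 0, 0.25 and 0.5, which is the equal spacing described above.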

Version 1.0.4
Bundle microarray
Categories GO
Authors Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso (Marko.Laakso@Helsinki.FI)
Requires csbl.go (R-package) ; igraph (R-package) ; installer (bash)
Source files component.xml GOEnrichment.r
Usage Example with default values

Inputs

Name Type Mandatory Description
goAnnotations CSV Mandatory GO annotations for genes or proteins. GO terms are searched using a regular expression, so the format is very flexible. Each row is considered as a distinct gene or protein.
enrichmentTable CSV Optional Custom GO probability reference table that is used in enrichment computation. If this is not given, a built-in table for the given organism is used (see the parameter organism). Probability tables can be created with the GOProbabilityTable component. The table must have the columns "goid" (GO accession number with GO: prefix), "prob" (probability of observing the GO term in a random gene product) and "ontology" (one of CC, BP, MF).
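The flexible annotation format of goAnnotations can be illustrated with a small sketch that scans every cell of a row for GO accessions. The pattern used here (GO: followed by seven digits) is an assumption based on the usual accession format; the component's own regular expression may differ, and the function name is hypothetical.

```python
import re

# Assumed accession pattern; the component's actual expression may differ.
GO_ID = re.compile(r"GO:\d{7}")


def go_terms_per_row(rows):
    """Return one set of GO IDs per row; each row is one gene or protein."""
    # Cells are joined with a space so that IDs are found regardless of
    # which column they appear in or which separator is used inside a cell.
    return [set(GO_ID.findall(" ".join(row))) for row in rows]
```

Rows without any match yield an empty set, so a gene without annotations still counts as a distinct row.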

Outputs

Name Type Description
goTerms CSV Enriched GO terms. The list is sorted by corrected p-values if filterFDR is true and by raw p-values otherwise. The following columns are present: GOID; Frequency (number of genes that match the term); Proportion (relative frequency, in the set of IDs that have some annotation in the same ontology); PValue (raw p-value); Priori (relative frequency in the reference set); PValueCorrected (multiple hypotheses corrected p-values); Ontology (one of BP, CC, MF); Description; IDs (list of genes that match the GO term).
graphBP GraphML The network of enriched GO terms from the biological process ontology in GraphML format. The network can be rendered using GraphVisualizer.
graphCC GraphML The network of enriched GO terms from the cellular component ontology in GraphML format. The network can be rendered using GraphVisualizer.
graphMF GraphML The network of enriched GO terms from the molecular function ontology in GraphML format. The network can be rendered using GraphVisualizer.

Parameters

Name Type Default Description
colorEnd string "#ff0000" When colorizing GO graphs, this is the color of a node whose p-value is at the lower threshold. The threshold is set by the colorMinP parameter. All nodes with a p-value below the threshold also get this color.
colorMiddle string "" When colorizing GO graphs, this is a color between the two extreme colors. This makes it possible to create color gradients between three colors. If the value is empty, a two-color gradient is used.
colorMinP float 0.0001 When colorizing GO graphs, all nodes with a p-value below this value get the color given by colorEnd. If the value is 0, the node with the smallest p-value gets the color colorEnd, i.e. the color range is scaled using the p-values present in the data.
colorStart string "#ffffff" When colorizing GO graphs, this is the color of a node with p-value 1. Setting this to empty disables node coloring.
filterFDR boolean false If true, use FDR-corrected p-values for filtering (column: pvalueCorrected). Otherwise, use raw p-values (column: pvalue).
filterParents boolean true If true, a GO term is excluded from the result if a child of the term occurs higher in the list (with a lower p-value).
includeGraphAttributes boolean true If true, the frequency and p-value of each GO term are included in the graph.
maxEdgeWidth float 10 Maximum edge line width in the graphs, in points. Edge widths are computed based on the frequency of the target node so that nodes with a large number of annotations have wide incoming edges. Setting this to 1 gives the same width for all edges.
maxFrequency int 999999 For output GO terms, maximum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is above this threshold. Filtering is done before FDR correction.
maxPriori float 1.00 Maximum a priori probability (relative frequency in the reference set) that is accepted for a GO term. Filtering is done before FDR correction.
minFrequency int 1 For output GO terms, minimum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is below this threshold. Filtering is done before FDR correction.
organism int 9606 NCBI taxonomy ID for the organism whose gene set is used for GO probabilities. This is used if the input enrichmentTable is not given. Supported organisms: Homo sapiens: 9606, Saccharomyces cerevisiae: 4932, Caenorhabditis elegans: 6239, Drosophila melanogaster: 7227, Mus musculus: 10090, Rattus norvegicus: 10116.
threshold float 0.05 P-value threshold for filtering GO terms.
urlPattern string "http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=%s" A printf-like pattern for creating a URL for a GO term. The pattern must contain one %s string that is expanded with the GO term in question, e.g. GO:0005575. If the value is empty, no hyperlinks are created in graphs.
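The urlPattern expansion described above can be sketched as follows. The function name is hypothetical, and plain string substitution is used here instead of real printf formatting so that other percent signs in a URL are left untouched; the component may expand the pattern differently.

```python
DEFAULT_URL_PATTERN = (
    "http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=%s"
)


def go_term_url(go_id, url_pattern=DEFAULT_URL_PATTERN):
    """Expand the pattern for one GO term, or return None if it is empty."""
    if not url_pattern:
        return None  # an empty pattern means no hyperlinks are created
    if url_pattern.count("%s") != 1:
        raise ValueError("urlPattern must contain exactly one %s")
    return url_pattern.replace("%s", go_id)
```

For example, the default pattern and the term GO:0005575 give a URL ending in term=GO:0005575.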

Test cases

Test case Parameters IN goAnnotations IN enrichmentTable OUT goTerms OUT graphBP OUT graphCC OUT graphMF
case1 properties (threshold=0.1, filterFDR=false, minFrequency=2) goAnnotations (missing) goTerms graphBP graphCC graphMF
case2_empty (missing) goAnnotations (missing) goTerms graphBP graphCC graphMF
case3 (missing) goAnnotations (missing) (missing) (missing) (missing) (missing)
case4_p0 properties (minFrequency=2, colorMinP=0) goAnnotations enrichmentTable goTerms graphBP graphCC graphMF


Generated 2019-02-08 07:42:09 by Anduril 2.0.0