Computes enriched GO terms in a set of genes or proteins. Enrichment analysis is done using Fisher's Exact Test. Fisher's test compares the observed frequency of each present GO term to the frequency in a reference gene/protein set. A GO term is present if some input gene/protein is annotated with the GO term or its descendants.
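As a rough illustration of the test described above, the following sketch computes a one-sided Fisher's exact p-value for a single GO term. The function name and the use of raw study and reference counts are assumptions made for this example. The component itself reads a probability table and may set up the test differently.

```python
from math import comb

def fisher_enrichment_p(study_hits, study_size, ref_hits, ref_size):
    """One-sided Fisher's exact test for over-representation.

    Hypothetical setup: the study set is treated as a sample drawn from
    the reference set, so with fixed margins the p-value is the
    hypergeometric upper tail P(X >= study_hits).
    """
    max_hits = min(study_size, ref_hits)
    total = comb(ref_size, study_size)
    # Sum the probabilities of all tables at least as extreme as observed.
    tail = sum(
        comb(ref_hits, k) * comb(ref_size - ref_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total

# Example: 8 of 40 study genes carry a term that 200 of 20000 reference genes carry.
p = fisher_enrichment_p(8, 40, 200, 20000)
```

A small p-value here indicates that the term is more frequent in the study set than the reference frequency would suggest.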
Genome-wide reference sets are provided for six organisms (see the parameter organism). If the study is not a genome-wide study (e.g. a microarray with only 25% of total genes), it is better to use a custom reference table, given as the enrichmentTable input. Custom tables are computed with GOProbabilityTable.
The component also computes adjusted p-values using FDR (Benjamini and Hochberg, 1995). For other multiple comparison correction methods, see MultipleComparisonCorrection. Note, however, that multiple comparison correction may not work well with GO enrichment analysis, since a large number of statistical tests are performed and no effort is made to reduce their number.
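For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic version of the published procedure, not the component's own code, and the function name is chosen for this example.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped at 1 and made monotone, which is why filtering terms before correction changes the result.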
A visualization of the enriched GO terms is produced in GraphML format, with one network for each GO ontology. Nodes can be colored according to their p-value. Colors follow a base-10 logarithmic scale, i.e. p-values 1, 0.1 and 0.01 are equally distant from each other. By default, each node contains a URL hyperlink to a description of the GO term on the geneontology.org site.
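The logarithmic color scale can be pictured with the sketch below, which interpolates between two colors linearly in log10(p). The function names are hypothetical and the defaults mirror the colorStart, colorEnd and colorMinP parameters. The three-color variant (colorMiddle) and the special case colorMinP = 0 are not handled, and the component's exact interpolation may differ.

```python
import math

def hex_to_rgb(color):
    """Convert '#rrggbb' into an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def p_to_color(p, color_start="#ffffff", color_end="#ff0000", color_min_p=0.0001):
    """Map a p-value to a color, assuming 0 < color_min_p < 1."""
    # Clamp so that p-values below the threshold all get color_end.
    p = min(max(p, color_min_p), 1.0)
    # t is 0 at p = 1 and 1 at p = color_min_p, linear in log10(p).
    t = math.log10(p) / math.log10(color_min_p)
    start, end = hex_to_rgb(color_start), hex_to_rgb(color_end)
    rgb = [round(s + t * (e - s)) for s, e in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

colors = [p_to_color(p) for p in (1.0, 0.1, 0.01, 0.001, 0.0001)]
```

With these defaults, each tenfold drop in p-value moves the node color the same distance from white toward red.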
Version | 1.0.4 |
---|---|
Bundle | microarray |
Categories | GO |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi), Marko Laakso (Marko.Laakso@Helsinki.FI) |
Issue tracker | View/Report issues |
Requires | csbl.go (R-package) ; igraph (R-package) ; installer (bash) |
Source files | component.xml GOEnrichment.r |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
goAnnotations | CSV | Mandatory | GO annotations for genes or proteins. GO terms are searched using a regular expression, so the format is very flexible. Each row is considered as a distinct gene or protein. |
enrichmentTable | CSV | Optional | Custom GO probability reference table that is used in the enrichment computation. If this is not given, a built-in table for the selected organism is used (see the parameter organism). Probability tables can be created with the GOProbabilityTable component. The table must have the columns "goid" (GO accession number with GO: prefix), "prob" (probability of observing the GO term in a random gene product) and "ontology" (one of CC, BP, MF). |
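To illustrate why the goAnnotations format can be flexible, the sketch below pulls GO accessions out of an arbitrary row of text with a regular expression. The pattern and the function name are assumptions for this example. The component's actual expression is not documented here.

```python
import re

# GO accessions consist of the prefix "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")

def extract_go_ids(row_text):
    """Return the distinct GO accessions in one row, in order of first appearance."""
    seen = []
    for go_id in GO_ID.findall(row_text):
        if go_id not in seen:
            seen.append(go_id)
    return seen

ids = extract_go_ids("P12345\tGO:0005575, GO:0008150;GO:0005575")
```

Because the match ignores separators and surrounding columns, comma-, semicolon- or space-delimited annotation lists are all handled the same way.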
Name | Type | Description |
---|---|---|
goTerms | CSV | Enriched GO terms. The list is sorted by corrected p-values if filterFDR is true and by raw p-values otherwise. The following columns are present: GOID; Frequency (number of genes that match the term); Proportion (relative frequency, in the set of IDs that have some annotation in the same ontology); PValue (raw p-value); Priori (relative frequency in the reference set); PValueCorrected (multiple hypotheses corrected p-values); Ontology (one of BP, CC, MF); Description; IDs (list of genes that match the GO term). |
graphBP | GraphML | The network of enriched GO terms from the biological process ontology in GraphML format. The network can be rendered using GraphVisualizer. |
graphCC | GraphML | The network of enriched GO terms from the cellular component ontology in GraphML format. The network can be rendered using GraphVisualizer. |
graphMF | GraphML | The network of enriched GO terms from the molecular function ontology in GraphML format. The network can be rendered using GraphVisualizer. |
Name | Type | Default | Description |
---|---|---|---|
colorEnd | string | "#ff0000" | When coloring GO graphs, this is the color of nodes with the lowest p-values. The threshold depends on the colorMinP parameter: all nodes with a p-value at or below the threshold get this color. |
colorMiddle | string | "" | When coloring GO graphs, this is a color between the two extreme colors. It allows a color gradient across three colors. If the value is empty, a two-color gradient is used. |
colorMinP | float | 0.0001 | When coloring GO graphs, all nodes with a p-value below this get the color given by colorEnd. If the value is 0, the node with the smallest p-value gets the color colorEnd, i.e. the color range is scaled using the p-values present in the data. |
colorStart | string | "#ffffff" | When colorizing GO graphs, this is the color of a node with p-value 1. Setting this to empty disables node coloring. |
filterFDR | boolean | false | If true, use FDR-corrected p-values for filtering (column: PValueCorrected). Otherwise, use raw p-values (column: PValue). |
filterParents | boolean | true | If true, a GO term is excluded from the result if a child of the term occurs higher in the list (with a lower p-value). |
includeGraphAttributes | boolean | true | If true, the frequency and p-value of each GO term are included in the graph. |
maxEdgeWidth | float | 10 | Maximum edge line width in the graphs, in points. Edge widths are computed from the frequency of the target node, so that nodes with a large number of annotations have wide incoming edges. Setting this to 1 gives the same width for all edges. |
maxFrequency | int | 999999 | For output GO terms, maximum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is above this threshold. Filtering is done before FDR correction. |
maxPriori | float | 1.00 | Maximum value of the priori probability that can be accepted for a GO term. Filtering is done before FDR correction. |
minFrequency | int | 1 | For output GO terms, minimum number of gene products that are annotated with the given term. GO terms are filtered from the output if their associated frequency is below this threshold. Filtering is done before FDR correction. |
organism | int | 9606 | NCBI taxonomy ID for the organism whose gene set is used for GO probabilities. This is used if the input enrichmentTable is not given. Supported organisms: Homo sapiens: 9606, Saccharomyces cerevisiae: 4932, Caenorhabditis elegans: 6239, Drosophila melanogaster: 7227, Mus musculus: 10090, Rattus norvegicus: 10116. |
threshold | float | 0.05 | P-value threshold for filtering GO terms. |
urlPattern | string | "http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=%s" | A printf-like pattern for creating a URL for a GO term. The pattern must contain one %s string that is expanded with the GO term in question, e.g. GO:0005575. If the value is empty, no hyperlinks are created in graphs. |
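The urlPattern expansion can be sketched as follows. The helper name and the example.org pattern are hypothetical, and Python's % operator stands in for the printf-like substitution, so a pattern containing other % characters would need escaping in this sketch.

```python
def go_term_url(go_id, url_pattern):
    """Expand a printf-like pattern containing one %s with a GO accession.

    An empty pattern means that no hyperlink is created.
    """
    if not url_pattern:
        return None
    return url_pattern % go_id

url = go_term_url("GO:0005575", "http://example.org/term?id=%s")
```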
Test case | Parameters | IN goAnnotations | IN enrichmentTable | OUT goTerms | OUT graphBP | OUT graphCC | OUT graphMF |
---|---|---|---|---|---|---|---|
case1 | properties (threshold=0.1) | goAnnotations | (missing) | goTerms | graphBP | graphCC | graphMF |
case2_empty | (missing) | goAnnotations | (missing) | goTerms | graphBP | graphCC | graphMF |
case3 | (missing) | goAnnotations | (missing) | (missing) | (missing) | (missing) | (missing) |
case4_p0 | properties | goAnnotations | enrichmentTable | goTerms | graphBP | graphCC | graphMF |
minFrequency=2, |