Converts a KEGG pathway in KGML format to a GraphML file. The KGML file can be fetched dynamically (using FTP) or provided as input. See the KEGG XML API for notes on the input format. Chemical reactions modeled in metabolic pathways are currently not supported, so the component is mostly useful for signaling pathways. Attributes in the GraphML file mostly correspond to attributes in the KGML file; see the KGML specification for details.
KEGG pathways use internal KEGG IDs. These can be cross-linked to other databases (e.g., UniProt) using the annotation parameter. This feature requires the SSOAP R package.
This component requires the XML R package, which may not be available for Windows directly from CRAN. However, there should be a Windows binary built by third parties; see the CRAN page for XML. On Unix, headers for the libxml2 library must be available. On Ubuntu, install libxml2-dev.
Nodes have (at least) the following attributes. "networkID" is the original numeric ID from the KGML file; in some cases the same ID is shared between several nodes. "name" is a KEGG database identifier and is suitable for use as the node title. "type" is one of ortholog, enzyme, gene, group (complex), compound, map.
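The mapping from KGML entries to node attributes can be illustrated with a minimal Python sketch. This is not the component's R implementation; the sample KGML fragment and the function name `read_entries` are assumptions made for illustration, and real KGML entries carry more attributes and child elements than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed KGML fragment used only for illustration.
KGML_SAMPLE = """
<pathway name="path:hsa04012" org="hsa" number="04012">
  <entry id="1" name="hsa:1956" type="gene"/>
  <entry id="2" name="cpd:C00076" type="compound"/>
</pathway>
""".strip()

def read_entries(kgml_text):
    """Return one attribute dict per KGML <entry> element."""
    root = ET.fromstring(kgml_text)
    nodes = []
    for entry in root.findall("entry"):
        nodes.append({
            "networkID": int(entry.get("id")),  # original numeric KGML ID
            "name": entry.get("name"),          # KEGG database identifier
            "type": entry.get("type"),          # gene, compound, group, ...
        })
    return nodes

nodes = read_entries(KGML_SAMPLE)
```

Here `nodes` holds two dictionaries, one for the gene entry and one for the compound entry.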
Protein complexes are split into individual nodes so that each protein gets its own node. Nodes with type=group are complexes; regular nodes with multiple names are also treated as complexes. Nodes have a "complex" attribute that contains a unique identifier for each complex, so that all proteins in the same complex have the same attribute value. This allows complexes to be reconstructed if necessary. All proteins in a complex have edges between each other. In addition, all proteins in a complex share the same incoming and outgoing edges to proteins outside the complex.
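The complex-splitting rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the component's R code: the function name `expand_complexes`, the `complex_<id>` identifier format, the use of `None` for proteins outside any complex, and the choice of one edge per protein pair inside a complex are all hypothetical. The sketch also assumes that each protein name occurs in only one entry.

```python
from itertools import combinations

def expand_complexes(entries, relations):
    """Split multi-protein entries into one node per protein.

    entries:   dict mapping a KGML entry ID to its list of protein names
    relations: list of (source_entry_id, target_entry_id) pairs
    """
    nodes = {}     # protein name -> node attributes
    edges = set()  # (source protein, target protein)
    for entry_id, names in entries.items():
        is_complex = len(names) > 1
        # Assumed identifier format; every member gets the same value.
        complex_id = f"complex_{entry_id}" if is_complex else None
        for name in names:
            nodes[name] = {"networkID": entry_id, "complex": complex_id}
        # Proteins in the same complex are connected to each other.
        for a, b in combinations(names, 2):
            edges.add((a, b))
    # Every member inherits the edges of the original entry.
    for src, dst in relations:
        for a in entries[src]:
            for b in entries[dst]:
                edges.add((a, b))
    return nodes, edges

nodes, edges = expand_complexes(
    {"1": ["hsa:A", "hsa:B"], "2": ["hsa:C"]},
    [("1", "2")],
)
```

In this example, entry 1 is a two-protein complex with a single outgoing relation to entry 2. After expansion, both `hsa:A` and `hsa:B` have an edge to `hsa:C` and are connected to each other, and grouping nodes by their "complex" value recovers the original complex.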
Edges have a "type" attribute that is one of ECrel, PPrel, GErel, PCrel, maplink. For each subtype with name X, there is an attribute named "subtype_X_name" and possibly "subtype_X_value" if the value contains interesting information.
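The attribute naming scheme for edges can be sketched in Python. The relation fragment below is a hypothetical sample, and the function `edge_attributes` and its `keep_value` predicate are assumptions: the documentation does not say what the "subtype_X_name" attribute stores or which values count as interesting, so the sketch stores the subtype name itself and leaves the decision about values to the caller.

```python
import xml.etree.ElementTree as ET

# Hypothetical KGML relation used only for illustration.
RELATION_SAMPLE = """
<relation entry1="1" entry2="2" type="PPrel">
  <subtype name="activation" value="--&gt;"/>
  <subtype name="phosphorylation" value="+p"/>
</relation>
""".strip()

def edge_attributes(relation_xml, keep_value=lambda value: True):
    """Flatten a KGML <relation> into a dict of edge attributes."""
    relation = ET.fromstring(relation_xml)
    attrs = {"type": relation.get("type")}
    for subtype in relation.findall("subtype"):
        x = subtype.get("name")
        # Assumption: the attribute value is the subtype name itself.
        attrs[f"subtype_{x}_name"] = x
        value = subtype.get("value")
        # keep_value is a hypothetical stand-in for "interesting" values.
        if value is not None and keep_value(value):
            attrs[f"subtype_{x}_value"] = value
    return attrs

attrs = edge_attributes(RELATION_SAMPLE)
```

With the default predicate, every subtype contributes both a name attribute and a value attribute; a stricter predicate would omit some of the value attributes.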
Version | 1.0 |
---|---|
Bundle | microarray |
Categories | Pathway Graph |
Authors | Kristian Ovaska (kristian.ovaska@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | igraph (R-package) ; SSOAP (R-package) ; XML (R-package) ; utils (R-package) |
Source files | component.xml KGML2GraphML.r |
Usage | Example with default values |

Inputs

Name | Type | Mandatory | Description |
---|---|---|---|
kgml | XML | Optional | Optionally contains the KEGG pathway KGML file. If this is not given, the file is fetched dynamically using the pathway specified by the pathwayID parameter. |

Outputs

Name | Type | Description |
---|---|---|
pathway | GraphML | The produced GraphML file representing the pathway topology |

Parameters

Name | Type | Default | Description |
---|---|---|---|
annotation | string | "" | Comma-separated list of cross-link databases used to annotate the nodes. Entries in the KGML file use KEGG identifiers; this parameter cross-links those identifiers to other databases. Available databases include uniprot and ncbi-geneid. The full list of databases is available at http://www.genome.jp/dbget/linkdb.html. The cross-links are written as vertex annotations in the graph using the name db_X, where X is the target database. |
pathwayID | string | "" | KEGG pathway identifier. For example, hsa04012 is the ErbB signaling pathway for Homo sapiens (hsa). |
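
How the two parameter values are structured can be shown with a small Python sketch. Both helper names are hypothetical, and the pattern used for pathwayID (lowercase organism code followed by digits) is an assumption based on the hsa04012 example rather than a rule stated by the component.

```python
import re

def split_pathway_id(pathway_id):
    """Split a KEGG pathway ID into organism code and pathway number."""
    match = re.fullmatch(r"([a-z]+)(\d+)", pathway_id)
    if match is None:
        raise ValueError(f"unexpected pathway ID: {pathway_id!r}")
    return match.group(1), match.group(2)

def annotation_attribute_names(annotation):
    """Map each database in the comma-separated list to its db_X attribute."""
    databases = [db.strip() for db in annotation.split(",") if db.strip()]
    return {db: f"db_{db}" for db in databases}

organism, number = split_pathway_id("hsa04012")
attribute_names = annotation_attribute_names("uniprot, ncbi-geneid")
```

For the ErbB example, `organism` is `"hsa"` and `number` is `"04012"`. The annotation value shown would produce the vertex attributes `db_uniprot` and `db_ncbi-geneid`, while the default empty string produces none.
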

Test cases

Test case | Parameters | IN kgml | OUT pathway |
---|---|---|---|
case1 | (missing) | kgml | (missing) |
case2 | (missing) | kgml | pathway |
case3 | (missing) | kgml | (missing) |