Hua Dong^{1, 2,} Yanghua Xiao^{3}, Wei Wang^{3}, Li Jin^{1,4}, Momiao Xiong^{1, 2*}
^{1}Laboratory of Theoretical Systems Biology and Center for Evolutionary Biology, State Key Laboratory of Genetic Engineering, School of Life Science, Fudan University, Shanghai 200433, China
^{2}Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
^{3}Department of Computing and Information Technology, Fudan University, Shanghai 200433, China
^{4}Chinese Academy of ScienceMaxPlanckGesellschaft Partner Institute for Computational Biology, Shanghai Institutes for Biological Science, CAS, Shanghai, 200433, China
Received Date: December 14, 2008; Accepted Date: December 24, 2008; Published Date: December 26, 2008
Citation: Hua D, Yanghua X, Wei W, Li J, Momiao X (2008) Symmetry of Metabolic Network. J Comput Sci Syst Biol 1: 001020. doi: 10.4172/jcsb.1000001
Copyright: © 2008 Hua D, et al. This is an openaccess article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Visit for more related articles at Journal of Computer Science & Systems Biology
Previous studies of properties of metabolic works have mainly focused on the statistic properties of networks, including the small world, and powerlaw distribution of node degree, and building block of network motifs. Symmetry in the metabolic networks has not been systematically investigated. In this report, symmetry in directed graph was introduced and an algorithm to calculate symmetry in directed and disconnected graphs was developed. We calculated several indices to measure the degree of symmetry and compared them with random networks. We showed that metabolic networks in KEGG and BioCyc databases are generally symmetric and in particular locally symmetric. We found that symmetry in metabolic networks is distinctly higher than that in random networks. We obtained all the orbits in networks which are defined as structurally equivalent nodes and found that compound pairs in the same orbit show much more similarity in chemical structures and function than random compound pairs in network, which suggests that symmetry in the metabolic network can generate the functional redundancy, increase the robustness and play an important role in network structure, function and evolution.
symmetry; metabolic network; structural equivalence; orbit
Metabolic networks are composed of consecutive chemical reactions to produce energy and various molecules. They are represented as directed hypergraphs, with compounds and(or) enzymes as nodes and the reactions activated by the enzymes as hyperarcs. How to characterize the structure of metabolic networks and how to link their structure with function are important in gaining deep understanding of metabolic networks. In the last decade, we have witnessed the great progress in theories and models of complex networks, which provide new powerful tools for study of metabolic networks. Previous research in complex networks have primarily focused on finding the statistical properties of various networks, such as small world properties(Jeong et al., 2000; Ma and Zeng, 2003; Wagner and Fell, 2001; Watts and Strogatz, 1998); powerlaw distribution of node degree(Mahadevan and Palsson, 2005; Samal et al., 2006); building block of network motifs(Eom, Lee and Jeong, 2006; Milo et al., 2002) and hierarchical structure of the network topology(Guimera and Nunes Amaral, 2005; Ravasz et al., 2002) . A lot of research exploiting the theory or model of complex networks has been dedicated towards metabolic networks. Jeong et al(Jeong et al., 2000), Mat and Zeng(Ma and Zeng, 2003) calculated the average path length of the metabolic networks and concluded that metabolic network belongs to smallworld network. The smallworld characteristic implies that information and energy can be rapidly transferred to the whole network and the cell can response quickly to perturbation of environments. Jeong(Jeong et al., 2000) also calculated the degree distribution and concluded that metabolic network follows the powerlaw distribution with exponential index r ≈ 2.2 . However, the smallworld property is still disputing(Arita, 2004). R. Milo(Milo et al., 2002) introduced the concept of‘network motifs’ and developed algorithms for their identification. YoungHo Eom (Eom et al., 2006) applied the concept of network motifs to metabolic network and identified the network motifs and the statistically significant subgraph patterns as well. Ravasz (Clauset, Moore and Newman, 2008; Guimera and Nunes Amaral, 2005; Ravasz et al., 2002) proposed the hierarchicalmodular model according to the characteristics of metabolic network. They calculated the average clustering coefficient for 43 different organisms as a function of the number of distinct substrates present in their metabolism. They found that, for all 43 organisms, the average clustering coefficient is about an order of magnitude larger than expected for a scalefree network of similar size.
However, symmetry, a universal property of real networks, has been rarely studied for metabolic network. Symmetry characterizes the invariance under possible transformations and implies conservation laws of nature (Hatcher, 2002; MacArthur, 2008; MacArthur and Anderson, 2006). Symmetry provides redundancy and robustness against perturbation of environments. There is increasing recognition that the universal evolution is caused by symmetry break, generating diversity and increasing complexity and energy(Mainzer, 2005; Quack, 2003). Symmetry break is often followed by addition of functional modules that usually show local symmetry, increasing network robustness(Felder et al., 2001). Until recently, after the concept of symmetry based on the automorphism has been utilized to explore real networks, quantitative methods for investigation network symmetry have been developed. MacArthur(MacArthur, 2008; MacArthur and Anderson, 2006) first found that large real networks are quite symmetric and such symmetry in real networks can be attributed to the local symmetry which is hidden in local substructures. Xiao(Xiao et al., 2008b; Xiao et al., 2008c) then proposed a principle referred to as “similar linkage pattern”, which means that nodes with similar properties such as degree tend to have similar linkage targets, to explain the emergence of symmetry. In Ref. [19], symmetry is exploited to characterize the structural heterogeneity of complex networks. Symmetry in real networks has been further used to characterize the simplicity hidden in networks and consequently has been utilized to simplify the network while reserving many key properties of original networks, such as complexity and communication(Xiao et al., 2008a).
To date, symmetry in metabolic networks and the relations between the structural symmetry and function of the network have rarely been investigated. It is still unknown whether symmetry exits in metabolic networks. If it does, the existence of such symmetry also begs a biological explanation. Purposes of this report are (1) to examine the symmetry of the metabolic networks, (2) to measure the abundance of the symmetry in metabolic networks, (3) to obtain the orbits (structurally equivalent nodes) through restricted network quotient and (4) to explore biological implication of the symmetry of the metabolic networks. To accomplish this, we first reconstruct metabolic networks for 705 organisms in KEGG and 373 in BioCyc databases. Then, we study the influence of connectivity of the reconstructed networks on the symmetry of metabolic networks. Previous works about symmetry in the general networks have focused on undirected graphs. However, the metabolic networks are usually handled as directed graphs. Hence, it’s necessary to explore symmetry in directed graphs and develope algorithms to find symmetry in the directed and disconnected networks. Based on these results, we then systematically investigate the properties of symmetry in metabolic networks, including the degree of symmetry, restricted network quotient. To explore functional implications of the structural symmetry of metabolic networks, we test the chemical structure similarity of the symmetric compounds in metabolic networks of 705 organisms which allow us to reveal the relationships between the network symmetry and its function.
Reconstruction of Metabolic Network
A metabolic network is represented as a directed graph G (N,E) with node set N representing compounds and the edge set E representing the chemical reactions which the compounds participate in. The direction of each edge implies the direction of the chemical reaction. We downloaded the metabolic network data from two major metabolic network databases: KEGG(Kanehisa and Goto, 2000) and BioCyc.
KEGG is a collection of simplified metabolic networks which are manually drawn pathway maps representing knowledge on reactions, while currency metabolites such as H_{2}Oand ATP are not included. Xml files of all metabolic pathways for 705 organisms were downloaded from KEGG FTP. We reconstructed the metabolic networks according to the reactions data extracted from the xml files and visualized the network by Pajek(Batagelj and Mrvar, 2003). According to KEGG metabolic functions classification, we integrated single pathways into functional modules. Finally we integrated 11 functional modules into a whole metabolic network. See Figure 1 (a) (b) for Drosophila melanogaster’s Cofactors and Vitamins Metabolism module and the integrated metabolic network.
Figure 1: KEGG Metabolic network reconstruction of Drosophila melanogaster
(A) Metabolism of Cofactors and Vitamins module for Drosophila melanogaster (fruit fly) Nodes: 100, Edges:96. The nodes represented compound ID in KEGG database. The nodes in different orbit are marked with different colours and nontrivial orbits are marked in green ellipse. We can see that there are 25 connected subgraphs and 12 orbits in the module. (B) All 100 metabolic pathways of Drosophila melanogaster were integrated into a metabolic network (Nodes: 1050, Edges:1340). The nodes in different orbit are marked with different colours. The compound ID is not shown in the figure.
The BioCyc collection of Pathway/Genome Databases (DBs) provides electronic reference sources on the pathways and genomes of different organisms. We collected metabolic pathways of 373 organisms, extracted the reaction data of each organism and integrated the reactions to a network for each organism. Direction of reactions is not given in BioCyc database but currency metabolites are included in. So we finally got 373 undirected graphs including currency metabolites from BioCyc. We first analyze symmetry of KEGG networks and replicate our experiment for BioCyc networks to validate whether our conclusions are ubiquitous in metabolic networks. See additional files 1 for organisms and pathways in KEGG and BioCyc.
The Connectivity of Metabolic Networks
For the integrated networks of 705 organisms in KEGG, the number of the connected subgraphs (NCS) varies from 1 to 271. Only 5 networks (0.7%) contain less than 10 connected subgraphs. The maximum connected subgraph (MCS) contains 7.8%100% nodes of the whole network. The proportion of MCS, which is defined as the ratio of the number of the nodes in MCS over the total number of nodes in the network, in 99.6% and 56% of the constructed metabolic networks is less than 80% and 60%, respectively. So we can conclude that the connectivity of metabolic is quite low. We introduced normalized entropy based on the connected subgraph (NECS) to measure the degrees of the connectivity of the network (See Materials and Methods). The larger NECS, the less the network connected. NECS value of the metabolic networks ranges from 0 to 0.778064 with the mean value 0.410917.
For the integrated networks of 373 organisms in BioCyc, the number of the connected subgraphs (NCS) varies from 2 to 76, which is distinctly less than that in KEGG. The maximum connected subgraph (MCS) contains 92.1%99.3% nodes of the whole network, definitely larger than that in KEGG. NECS value of the metabolic networks ranges from 0.007 to 0.075 with the mean value 0.0347128. As the currency metabolites were included in BioCyc , the connectivity of metabolic network is significantly increased.
To gain deeper understanding of the connectivity in metabolic networks, we compared it with the random networks. For each real graph, we generate 100 randomized networks by rewiring the local edges(Maslov, Sneppen and Zaliznyak, 2004). Then we compute the MCS, NCS and NECS for every network and summarize the mean and variance over the 100 randomized networks. From the error bar in Figure 2, we can see clearly that: most of the NCS in real metabolic network is larger than that in random network (89.8%); most of the MCS in real metabolic network is lower than that in random network (96.9%); NECS in real network is obviously larger than the corresponding random network (96.9%). We also compared the connectivity of BioCyc network with their random networks using the same method. In spite of relatively larger connectivity compared with that in KEGG, the connectivity in BioCyc metabolic network is still smaller than that in random networks. The results are shown in Supplementary Figure 1. The results of NCS, MCS and NECS value for 705 real networks and their corresponding randomized networks for both KEGG and BioCyc networks were shown in additional data file 2. The results above imply that the connectivity in metabolic network is rather small. Consequently connectivity has to be taken into account when exploring the network reduction of metabolic networks.
Figure 2: Comparison of the connectivity of metabolic network in KEGG with random network
The random networks are produced by randomly rewiring the local edge in the given real networks. (A) NCS of the real metabolic networks and their random networks. (B) MCS of the real metabolic networks and their random networks. (C) NECS of the real metabolic networks and their random networks. Please note that due to the tiny variances of MCS and NECS of random networks, the error bar looks like the scatter line.
Symmetry in Metabolic Networks
Given a metabolic network G(N,E), a onetoone mapping, or bijective mapping, from N onto itself is called a permutation on N. Two nodes are adjacent if there is an arc from one node to the other node. Among all the permutations in S(N), where S(N) is the set of permutations acting on N, some permutations can preserve the adjacency of the nodes and these permutations are called automorphisms acting on the node set. The set of automorphisms under the product of permutations forms a group: Aut(G)(Tinhofer and Klin, 1999).Two nodes x and y are automorphically equivalent to each other if there is an automorphism that transforms node x to node y. In the context without confusion, such equivalence relation of nodes is also called structural equivalence. A set of structurally equivalent nodes is defined as an orbit of Aut(G). According to such equivalent relation on node set, we can get a partition P={N_{1},N_{2},…N_{m}}, called as automorphism partition, which is composed of orbit sets N_{1},N_{2},…N_{m} . An orbit is nontrivial when it contains more than one node, otherwise it’s trivial. A network is called symmetric if we can find at least one nontrivial orbit in this network, otherwise the network is asymmetric. The quotient graph of an undirected graph is defined as a reduced graph which has every orbit (structurally equivalent nodes) as its new node and adds an edge to connect two nodes if there is at least one edge from any one node in the orbit to any one node in another orbit. The quotient graph of a directed graph is similar to that of undirected graph except that the direction of the arcs is preserved in the quotient of directed graph.
Consider the undirected graph G0 in Figure 3(A), the automorphism partition is P_{0}={N_{1},N_{2},N_{3}, N_{4}}, where N_{1}={1,2}, N_{2}={3}, N_{3}={4}, and N_{4}={5,6,7}. There are four orbits which are classified by different colours. The quotient graph of G_{0} is shown in G_{1}. The four orbits of G_{0} are reduced to four nodes in G_{1} , and as long as there is an edge between any two orbits of G_{0}, the corresponding nodes in G_{1} are adjacent. For directed graph G_{0} in Figure 3(B) , the automorphism partition is P_{0}={N_{1},N_{2},N_{3},N_{4},N_{5}}, where N_{1}={1,2}, N_{2}={3}, N_{3}={4}, and N_{4}={5,6} and N_{5}={7}. The five orbits of G_{0} are reduced to five nodes in G_{1} , and as long as there is an arc from one orbit to another orbit in G_{0}, there is an arc from one corresponding node to another corresponding node in G_{1} (shown in Figure 3(B)). Please note that in the directed graph G_{0} in Figure 3(B), the edges between orbits N_{1} and N_{2} and that between N_{2} and N_{3} are both bidirectional, which determines that the edge between corresponding nodes in G_{1} are bidirectional. Because the direction of arc <7, 4> is different from that of <4, 5> and<4, 6>, nodes 5 and 6 belong to the fourth orbit while node 7 belongs to the fifth orbit.
Consider Figure1 (A) where we take the Drosophila melanogaster’s Cofactors and Vitamins metabolism module as a real example. There are 100 nodes and 96 arcs in this module. The NCS(number of connected subgraph) is 25, which implies the connectivity of this network is very small. There are 12 nontrivial orbits in the network, each of which is marked in green ellipse.
To measure the degree of the symmetry of metabolic networks, we calculate α: the size of Aut(G), β: the relative degree of network symmetry of 705 metabolic networks^{1} α. reflects the absolute symmetry degree of network directly. β is used to measure the symmetry of networks with different sizes. Generally, the larger α and β is, the more symmetric the network is. Among all the tested metabolic networks, 99.3% of them have α larger than 1010 and 82.3% of them have α larger than 10100, which implies that most of metabolic networks are far away from asymmetric network. Hence, generally metabolic networks can be characterized as symmetric networks. Statistics of β shows that 98.7% of the metabolic network has β smaller than 0.1 and 83.3% of the metabolic network has β smaller than 0.01, which suggests that relative symmetry degree of metabolic network is quite low compared to the maximal symmetry degree of networks with equivalent number of nodes.
We also summarize φ: the degree of global symmetry for the networks. For some networks, there is some of automorphisms moving all the nodes or most of nodes. The action of such automorphism on node set will have global influence on the structure of the graph. However, for some other networks, all the automorphisms only move a small part of vertices or only action on the local subgraph of the network. Hence, when studying symmetry of networks, it’s necessary to characterize the degree of global symmetry or local symmetry of the network. Generally, the larger φ is, the more globally symmetric the network is. Among all the tested metabolic networks, 98.6% of which has φ smaller than 0.1 and 72.8% has φ between 0.050.01, which suggest that metabolic network is very local symmetric.
To validate our conclusion, we replicate our experiment using BioCyc datasets. Among all the tested metabolic networks, 97.6% of them have α larger than 10^{10} and only one network (metaCyc) has α larger than 10^{100}. Statistics of β shows that only 1metabolic network has β smaller than 0.001 and 5 of the metabolic networks has β larger than 0.01, β values of the other networks (98.4%) are all between 0.001 and 0.01. φ of BioCyc data varies from 0.002071 to 0.0434783 with a mean value of 0.008364. Please see additional data file 3 for the results of α, β and φ value for metabolic networks in KEGG and BioCyc.
To further investigate the symmetry of metabolic network, we compared three indices α, β and φ of metabolic network with corresponding randomized networks. For each real metabolic networks for703 organisms, we generated 100 randomized networks with the same number of nodes and edges as the real network following ErdosRenyi random graph model (Erdos and Renyi, 1960)^{2} .(Two organisms: Debaryomyces hansenii (dha) and Legionella pneumophila Corby (lpc) were not included because they cannot produce ErdosRenyi random networks due to their small network size). Then we compute α, β and φ for every random network and summarize the mean and variance over the 100 randomized networks for each real network. The comparisons of α, β and φ between real network and random networks are shown in Figure 4. We can see clearly from Figure 4 that 99.4% of α, 99.4% of β and 99.9% of φ in real metabolic network is larger than that in random network. These results demonstrated that the symmetry in metabolic network is obviously strong than that in random network, suggesting that symmetry is an important and nonignored feature in metabolic network.
Figure 4: Comparison of the symmetry indices of real metabolic networks with random networks
The random networks are produced by ErdosRenyi model in Pajek . (A) Error bar of the symmetry index αof real metabolic networks and corresponding randomized networks. (B) Error bar of the symmetry index βof real metabolic networks and their corresponding randomized networks. (C) Error bar of the symmetry index of real metabolic networks and their corresponding randomized networks. Please note that the variances of β and φ of randomized networks is so small that the error bar can not be observed obviously.
Again, we replicated our study in BioCyc metabolic networks. We found that in spite of relatively less symmetry compared with KEGG, the symmetry in BioCyc network is still substantially larger than that in random networks, suggesting that metabolic network is far away from asymmetric network. The comparisons of α, β and φ between real network in BioCyc and random networks are shown in Supplementary Figure 2. Please see additional data file 4 for the results of α, β and φ value for randomized networks in both KEGG and BioCyc.
Comparison of the Cconnectivity and Symmetry of Metabolic Networks between KEGG and BioCyc Datasets
In table 1, we compared the range, mean value and variance of three connectivity indices NCS, MCS, NECS and three symmetry indices α, β and φ between networks of 705 species in KEGG and networks of 373 species in BioCyc. We can see clearly that for all the six measures, the range and variances in BioCyc datasets is significantly less than that in KEGG datasets. (For the range of log_{10} α in BioCyc, if we get rid of the largest value 612.608 and the smallest value 0.903; remaining values range from 6.8596.367). The mean values of NCS and NECS of BioCyc datasets are less than that in KEGG datasets, however, MCS of BioCyc is larger than that of KEGG. All these facts suggest that metabolic network in BioCyc is more connected. Since symmetry is typically greater for lower connectivity and shorter branches networks (MacArthur, 2008), it’s naturally to find that symmetry in BioCyc networks is less than that in KEGG networks.
database  KEGG  BioCyc  

Number of organisms  705  373  
NCS  range  1271  276 
mean  124.048227  16.07238606  
variance  3128.634034  44.65872467  
MCS  range  7.8%100%  92.1%99.3% 
mean  0.548437869  0.966615067  
variance  0.023976601  0.00011063  
NECS  range  00.778064  0.0070.075 
mean  0.410917  0.0347128  
variance  0.012554904  0.000108611  
log_{10}α  range  0.301488.52  0.903612.608 
mean  199.39  30.69700184  
variance  11423.74545  1099.766327  
β  range  0.001960.72478  0.0005980.0581303 
mean  0.01  0.004181307  
variance  0.001709304  1.04908E05  
φ  range  0.005760.666667  0.002070.0434783 
mean  0.02  0.008363534  
variance  0.001298575  2.01729E05 
Table1: Comparison of the connectivity and symmetry of metabolic networks between KEGG and BioCyc datasets.
However, for both datasets, we clearly found that symmetry in real networks is obviously larger than that in randomized networks. No matter which metabolic network reconstruction methods were used, we came to the same conclusion that the symmetry of metabolic network, specifically, local symmetry, does exist.
Inferring Functional Equivalence from Structural Equivalence
We calculated the automorphism group (See material and methods for its definition) and obtained the orbits (structurally equivalent nodes) considering the connectivity and direction constraints for the reconstructed metabolic networks of all 705 organisms in KEGG. Since nodes in the same orbit are structurally equivalent to each other, it motivates us to further explore whether structurally equivalent nodes are functional equivalent. To accomplish this, we first generated two datasets which consist of similarity scores (See Materials and Methods for definition of similarity score). One dataset is referred to as orbit dataset. For each orbit in the metabolic network we calculated similarity scores for all pairs of compounds in the orbit and averaged them as the similarity score of the orbit. All similarity scores of the orbits in the networks of 705 organisms in KEGG were collected to form the orbit dataset where the replicated orbits were just calculated once. Another dataset which is referred to as random dataset was generated by collections of similarity scores of all pairs of compounds in the metabolic modules of all 705 organisms in KEGG. We used t statistics and wilcoxon rank sum statistics(Pagano and Gauvreau, 2000) to test whether there were significant differences in the similarity scores between orbit dataset and random dataset^{3}
The results were shown in Table 2, from which we can see that in all pathway/modules, the compounds in the orbit showed significant evidence of similarity in compound chemical structures, which implied that the structurally equivalent nodes in metabolic networks are similar in their chemical structure. The compounds with similar chemical structure will have similar functions and play similar roles in biochemical reactions(Gutteridge, Kanehisa and Goto, 2007). In Table 3, we listed the values of similarity of the compounds in all the orbits in Glycolysis/Gluconeogenesis and TCA cycle pathways. In Glycolysis/Gluconeogenesis pathway, 92.7% of orbits’ similarity is larger than 0.5, while in TCA cycle pathways 55% of orbits’ similarity is larger than 0.5. To gain further understanding of the nature of similarity among the compounds in the orbit, we presented the results of Glycolysis/ Gluconeogenesis pathway and TCA cycle pathway.
Modules  Orbits#  Compound pairs #  Pt1  Pt2  Pr 

Glycolysis  55  406  3.04E13  1.5814E10  2.3055E12 
TCA cycle  40  171  0.009  0.0192  0.0497 
Amino_Acid  557  131328  2.205E81  8.0344E32  2.9589E38 
Carbohydrate  618  92665  4.6399E110  1.5533E50  1.475E65 
Cofactors_Vitamins  211  37128  2.7279E92  1.3082E33  2.5883E47 
Energy  142  5050  0  0  0 
Glycan_Biosynthesis  24  2278  4.4982E26  0  8.2242E12 
Lipid  301  81406  3.6647E171  1.1755E63  3.395E90 
Nucleotide  193  9453  4.6127E66  2.1246E40  2.5256E49 
Other_Amino_Acids  102  10440  0  0.0015  0.0131 
Polyketides  9  780  0  0.0027  0.0007 
Secondary_Compounds  76  75078  1.5539E78  4.5943E17  9.5074E24 
Xenobiotics  134  73920  8.0514E59  7.1642E31  1.7688E43 
Table2: Pvalues for testing significance of the similarity of the compounds in the orbits.See additional files 5 for the module description. Orbits # denotes the number of pairs of compounds in the orbit, compound pairs # denotes the number of pairs of compounds in the random dataset, P_{t}^{1} is the Pvalues of right tail ttest with equal variance; P_{t}^{2} is the Pvalues of right tail ttest with unequal variance, P_{r} is the Pvalues of Wilcoxon twosided rank sum test.
Glycolysis/Gluconeogenesis  TCA cycle  

orbits  similarity  orbits  similarity 
C00668 C01172  1  C00042 C00122  1 
C00084 C00469  1  C00311 C05379  1 
C00221 C00267  1  C00036 C00149  1 
C00111 C00118  1  C00311 C00417  0.923077 
C00031 C00267  1  C00158 C00417  0.923077 
C00022 C00186  1  C00158 C00311 C00417  0.901099 
C00668 C01172 C05345  0.921569  C00122 C00149  0.888889 
C00111 C00197  0.909091  C00036 C00042  0.888889 
C00074 C00631  0.909091  C00036 C00122  0.888889 
C01172 C05345  0.882353  C00042 C00149  0.888889 
C00103 C01172  0.882353  C00068 C05381  0.787879 
C00103 C00668  0.882353  C00074 C00149  0.727273 
C00668 C05345  0.882353  C00036 C00074  0.727273 
C06187 C06188  0.88  C00149 C00158  0.692308 
C00197 C00631  0.833333  C00022 C00036  0.666667 
C00236 C06189  0.8125  C00022 C00149  0.666667 
C05345 C05378  0.8  C00022 C00074 C00149  0.664647 
C00197 C06189  0.785714  C00074 C00122  0.636364 
C00033 C00084  0.75  C00042 C00074  0.636364 
C00033 C00469  0.75  C00022 C00074  0.6 
C00221 C01172  0.75  C00022 C00122  0.555556 
C00111 C00118 C00668 C01172  0.75  C00033 C00122  0.5 
C00103 C00221  0.75  C00033 C00036  0.444444 
C00118 C00631  0.75  C00011 C 00026 C00311  0.433333 
C00103 C00267  0.75  C00036 C00158 C00566  0.333625 
C00221 C00668  0.75  C00011 C00026  0.3 
C00197 C00236  0.733333  C00022 C00024 C00033  0.273504 
C00197 C01159  0.733333  C00011 C00311  0.230769 
C00631 C01159  0.733333  C00158 C00566  0.177419 
C00074 C00197 C01159  0.716667  C00010 C00158  0.173077 
C01451 C06186  0.695652  C00024 C00074  0.173077 
C00031 C00267 C06187  0.681159  C00024 C00036  0.132075 
C00111 C00236  0.666667  C00024 C00149  0.132075 
C00111 C00118 C05378  0.666667  C00036 C00566  0.131148 
C00074 C00111  0.666667  C00042 C00091  0.125 
C00074 C00118  0.666667  C00091 C00122  0.125 
C00031 C00267 C06188  0.666667  C00091 C00149  0.122807 
C00118 C00236  0.666667  C00036 C00091  0.122807 
C00031 C00267 C06187 C06188  0.653913  C00022 C00024  0.096154 
C00103 C05378  0.636364  C00024 C00033  0.057692 
C00031 C06187 C06188  0.633913  
C00236 C00631  0.625  
C00111 C05345  0.625  
C00074 C00236  0.5625  
C00186 C00631  0.545455  
C00221 C05378  0.52381  
C00267 C05378  0.52381  
C00267 C06187  0.521739  
C00111 C05378  0.5  
C00267 C06188  0.5  
C00031 C06188  0.5  
C00186 C00236  0.4  
C00024 C00469  0.058824  
C00024 C00084  0.058824  
C00024 C00033  0.057692 
Table3: Orbits and their similarity in Glycolysis/Gluconeogenesis and TCA cycle pathways. Orbits were sorted in the descending order of similarity score. See additional files 4 for the orbit similarity and random compoun similarity of another 11 modules.
(1) The orbit [C00668, C01172] in Glycolysis/Gluconeogenesis pathway.
It included C00668 (alphaDGlucose 6phosphate) and C01172 (betaDGlucose 6phosphate). Their chemical structures were shown in Table 3. C01172 is an isomer of C0066, so their chemical structures are identical. We examined the pathways which two compounds C00668 and C01172 participate in and the enzymes catalyzing the biochemical reactions of these two compounds. We found that they shared most of the pathways and enzymes (some enzymes are the same; some enzymes are in the same category, like 3.2.1.86 and 3.2.1.26). Interested readers please see Table 4 for details.
C00668  C01172  
Name  alphaDGlucose 6phosphate  betaDGlucose 6phosphate 
Chemical structure  
Pathway 
map00052 Galactose metabolism map00500 Starch and sucrose metabolism 
map00500 Starch and sucrose metabolism 
Enzyme 
2.4.1.15 2.7.1.1 2.7.1.2 2.7.1.63 2.7.1.69
3.1.3.9 3.2.1.26 3.1.3.9 3.2.1.26 5.3.1.9 5.1.3.15 5.4.2.2 5.4.2.5 
1.1.1.49
2.7.1.1 2.7.1.2 2.7.1.63 3.2.1.86 5.3.1.9 5.1.3.15 5.4.2.6 
(2) The orbit [C00158, C00311, C00417] in TCA cycle pathway.
Their names, chemical structures, pathways which they participate in and enzymes they were catalyzed by were shown in Table 5. The compound C00311 (Isocitrate) is the isomer of the compound C00158 (Citrate). C00417 (cis Aconitate) is the product of dehydrolysis from C00158 and C00311. In TCA cycle, the three reactions among these three compounds are catalyzed by the same enzyme EC4.2.1.3:
C00158  C00311  C00417  
Name 



Chemical structure  
Pathway  map00020 Citrate cycle (TCA cycle)

map00020 Citrate cycle (TCA cycle)

map00020 Citrate cycle (TCA cycle)
map00660 C5Branched dibasic acid metabolism 
Enzyme  2.3.3.1 2.3.3.3 2.3.3.8 2.8.3.10 3.4.13.20 4.1.3.6 4.2.1.3 4.2.1.4 6.2.1.18 6.3.2.27 
1.1.1.41 1.1.1.42 1.1.1.286 2.3.1.126 4.1.3.1 4.2.1.3 
4.1.1.6 4.2.1.3 4.2.1.4 5.3.3.7 
Table5: Compounds in Orbits [C00158, C00311, C00417].
R01325: Citrate<=> cisAconitate + H_{2}O
R01900: Isocitrate <=> cisAconitate + H_{2}O
R01324: Citrate <=> Isocitrate
From the Table 5 we can see clearly that three compounds C00158, C00311 and C00417 participated in the same pathways and were catalyzed by same enzymes in most cases.
Since metabolic function is mainly determined by chemical structure(Gutteridge et al., 2007), our work showed that structural equivalent nodes in the metabolic networks were more likely to have the same or similar function.
Recently, symmetry in general complex networks has attracted certain research interest. All previous works (MacArthur, 2008; Xiao et al., 2008a; Xiao et al., 2008b; Xiao et al., 2008c)about symmetry in the networks have focused on undirected graphs. To explore symmetry in the metabolic networks, in this report we first introduced the concept of symmetry and developed algorithms to search symmetry in the directed networks. Then we further systematically investigated the symmetry properties of metabolic network, including the degree of symmetry, restricted network quotient. We observed much higher symmetry in metabolic networks which are reconstructed from KEGG and BioCyc datasets than that in random networks. Our preliminary results showed that metabolic networks are generally symmetric and in particular locally symmetric. To explore functional implications of the structural symmetry of metabolic networks, we tested significance of the chemical structure similarity of the compounds in the same orbit of the network. We found that compounds that are structurally equivalent to each other tend to have high similarity in chemical structures and that the structurally equivalent compounds often take part in the activities of the same pathway and are catalyzed by same enzymes. This may suggest that the symmetry in the metabolic network can generate the functional redundancy, and increase the robustness and the ability against attack of external disturbances.
Symmetry may arise from duplications in evolution of metabolic networks. In this report, we have focused on the symmetry of the metabolic networks. Due to the strong correlation between symmetry and duplication (a universe mechanism in biological networks) (Bhan, Galas and Dewey, 2002; Chung et al., 2003; Teichmann and Babu, 2004), symmetry is expected to be ubiquitous in a variety of other biological networks and to play an important role in the evolution of biological networks.
Despite increasing interests in exploring the role of symmetry in evolution of networks, mechanism of evolution of symmetry has not been well investigated. It is worth studying the mechanism of generating symmetry of the networks and the role of structurally equivalent compounds in cell and molecular functions in the future.
Metabolic Network Reconstruction
At present, two major metabolic reconstruction methods are usually used. One method is introduced in Ma and Zeng(Ma and Zeng, 2003), where “currency” metabolites like H2O, ATP, ADP are not included as nodes in network. This simplified metabolic network is biochemically meaningful in calculating path length. KEGG PATHWAY database uses the simplified metabolic network reconstruction method. The xml files of totally 152 metabolic pathways for every organism (705 organisms in total) were downloaded from KEGG FTP(Release date: Dec. 18, 2007). Reaction data were read from the xml files and represented as directed graph. The direction of each link implies the direction from an input compound (reactant) to an output compound (product) (See Figure 1(a)). Single pathways are combined to modules according to their metabolic functions, such as Carbohydrate Metabolism, Metabolism of Cofactors and Vitamins. There are 11 modules and each module contains 223 pathways. See additional data file 1. We also integrated all the 152 metabolic pathways into a metabolic network for every organism. Another metabolic network reconstruction method includes currency metabolites as nodes in network(Jeong et al., 2000), which makes metabolic network more connected. The metabolic network reconstruction in BioCyc database is a representative of this method. Totally 373 available Pathway/Genome Databases was downloaded (Release date: Oct. 15, 2008. Each database in the BioCyc collection describes the genome and
metabolic pathways of a single organism, including another independent database AraCyc which has not been combined into BioCyc yet). We processed the tabular flat files of reaction data and combined the reactions into an integrated network for each organism. Direction information was not given in reaction files of BioCyc. So the integrated networks are unidirectional networks.
Connectivity of Metabolic Networks
Since many enzymes have not been found in some organisms yet, the reactions (edges in network) which are catalyzed by these enzymes are absent in the current network. Hence, we expect that the connectivity of metabolic network is quite low compared to corresponding randomized synthetic networks. To verify such conjecture, we use the number of connected subgraphs (NCS) and ratio of size of maximum connected subgraph to the whole network size (MCS) to measure the connectivity of metabolic network. We calculate the number of connected subgraph (NCS) in every single network. Apparently, the larger the NCS is, the lower the connectivity of metabolic network is; the larger MCS is, the more connected the network is. Furthermore, based on these concepts, we can define a new index: entropy based on connected subgraph (ECS) to measure the average connectivity of a metabolic network:
ECS =  Σ p_{i} log p_{i}
0 ≤ i ≤ c,
where C is the set of connected subgraphs of the network and p_{i} is the probability that a node belongs to a connected subgraph C_{j}. Given all connected subgraphs C={C_{1} , C_{2} ,…, C_{k}}, we can calculate p_{i} as:
p_{i} = C_{i} / Σ _{j} C_{j} = C_{i} / N,
where N is the number of nodes in a graph. Clearly, for networks with N nodes, ECS_{max} =logN when p_{i}=1/N for each 0 ≤ i ≤ c (in this case, every node in the network is isolated from each other); ECS_{min} =0, when the network is a connected graph and consequently p=1 . Thus, we can define normalized entropy based on ECS_{max} and ECS_{min} :
NECS = ECS  ECS_{min} / ECS_{max}  ECS_{min} = ECS / log( N )
In general, networks that contain a large connected subgraph tend to have a relatively small value under the measurement of NECS. Whereas metabolic networks are expected to be less connected and consequently the value of NECS is expected to be relatively large. The connectivity statistics including NCS, MCS and NECS in this section are summarized from the underlying graphs of the metabolic networks, where the direction and self loops were removed from the original network.
Assessing Symmetry of Complex Networks
The degree of the symmetry of a graph G usually could be quantified by the following formula:
α =  Aut(G) 
which is the size of the automorphism group of graph G. Generally, α is very large and we usually use log_{10}Aut(G) .
In order to compare the symmetry of networks with different sizes, symmetry measure β is often used, which is defined as:
β = ( α / N! )^{1/N} ,
where N is the node number of network, β measures the symmetry relative to maximal possible automorphism group of a graph with N nodes.
In general, for empirical networks, when network grows its symmetry is often destroyed. As evolution proceeds we rarely find global symmetry in the network, which means we can rarely find automorphisms that transforms most of nodes. In fact, many real networks have been shown to be locally symmetric(MacArthur, 2008) , which means that we can only find automorphisms which transform only small part of nodes in the network. Here, we use φ to quantify the degree to which graph G is globally symmetric(Xiao Y et al., “Efficiently Indexing Shortest Paths by Exploiting Symmetry in Graphs”. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT’09), March 2326, 2009.):
φ = max{ supp(g) : g ε ID(G)} / N ,
where supp(g)={ v_{i} : v_{i}^{g }≠ v_{i} } and ID(G) is the set of indecomposable automorphisms of graph G. An indecomposable automorphism of Aut(G) is a nonidentity automorphism in Aut(G) that can not be decomposed into the product of two automorphisms g_{1} and g_{2} such that g_{1} ≠ e and g_{2} ≠ e and supp(g_{1}) Π supp(g_{2}) = φ , where e is the identity permutation that transform each vertex to itself.
To compute the above measures, the well known nauty program(McKay, 1981), which is one of the most efficient graph isomorphism algorithms available, is used to calculate the size and structure of automorphism groups.
Symmetry in Metabolic Networks
Symmetry in Directed Graph
In most of the previous studies of complex networks, networks are usually preprocessed as their underlying graphs: where weights, directions and selfloops are omitted. However, in the studies of metabolic network, the direction can’t be omitted since many reactions are irreversible and the direction determines the reaction rates and the product output. Hence, when exploring symmetry in metabolic networks, we need to investigate symmetry in directed networks first.
In general, a directed network is a pair (N, E) with N representing the node set and E representing the set of ordered pairs of N. The related concepts of symmetry in directed graph is completely the same as that in undirected graphs. The fact that we need to highlight when exploring symmetry in directed networks is that any automorphism in a directed network need to preserve the oriented relation instead of unoriented relation in undirected graph.
It’s trivial to show that if g is an automorphism of a directed graph G, g will also be an automorphism of its underlying graph G’. However the inverse does not necessarily hold true. Hence, if G’ is the underlying graph of graph G, we have Aut(G) ⊆ Aut(G’), which implies that the degree of symmetry of G is smaller than that of G’. Consequently, the automorphism partition of the directed graph is finer^{4} than that of its underlying graph.
Restricted Network Quotient
Recall that nodes of a symmetric network can be partitioned into disjoint equivalent classes which are called orbits of the graph according to the automorphic equivalence relation on nodes. Nodes in the same orbit are structurally equivalent and cannot be distinguished from each other(Tinhofer and Klin, 1999) by usual measurement on nodes, such as degree, clustering coefficient. Therefore they can be glued together to create a coarse reduced network, known as the quotient. In Figure 3, G_{1} are quotient networks of G_{0}. Since metabolic networks possess a nontrivial automorphism partition they carry a significant amount of redundant information in which more than one node plays the same structural role.
In most of the previous researches about complex network, only the automorphism group of the largest connected subgraph is exploited. In above sections, we have shown that metabolic networks are more disconnected than other empirical networks. Hence it is necessary to explore the symmetry of metabolic networks with all disconnected subgraphs taken into account.
However, preserving all disconnected subgraphs will pose a challenge to the calculation of quotient of metabolic structure. Please note that when calculating quotient of a graph consisting of two isomorphic disconnected subgraphs, these two subgraphs will be merged into one subgraph under the action of the automorphism group of the graph. Hence, calculation of network quotient should take into account the connectivity constraint so that the isomorphic isolated modules will not be merged into one reduced subgraph in the quotient.
Specifically, assume that graph G contains pairwise isomorphic and disconnected subgraphs G_{1}, G_{2}, …G_{m}. Let H(G) be the set consisting of all those automorphisms that swap nodes between pairwise and disconnected subgraphs, i.e.
H(G)={g: x^{g }= y and g ε Aut(G) and x ε V(G_{i}) and y ε V(G_{j}) and i ≠ j }.
Then we can calculate the restricted quotient of graph G under the action of R(G) = Aut(G)  H(G).
It’s easy to show that in the restricted quotient all disconnected subgraphs in the original graph will not be merged and consequently the number of disconnected subgraphs will be preserved. We obtained the orbits in network through the restricted network quotient.
Given a graph G_{0} consisting of two isomorphic subgraphs, as shown in Figure5(A). Obviously, the automorphism group of G_{0} can be decomposed into the wreath product(Rotman, 1999) of G_{in} and G_{out}, where G_{in} is the triangle, shown in Figure 5(B), and G_{out} is the abstracted outer graph, as shown in Figure 5(C). Note that in Aut(G), there are some automorphisms, such as g=(1,4)(2,5)(3,6) that swap nodes in different isolated subgraphs( e.g. 1 and 4 are transformed to each other in spite of that these two vertices are in different isolated subgraphs). Under the action of such automorphisms, we finally can obtain one single orbit {1,2,3,4,5,6}for G_{0}. Thus the quotient of G_{0} is just a single node (shown in Figure 5(D)), which contradicts to the fact that two isolated triangle structures often interpreted as two isolated functional modularity for biological networks. However based on the concept of restricted network quotient, the network G_{0} can be reduced to a quotient network consisting of two isolated nodes (shown in Figure 5(E)).
In the computation of symmetry of directed networks, we find that nauty program can not ensure the correctness for directed graphs. Hence, we first use nauty to get the automorphism partition of the underlying graph of the network and then refine the automorphism partition. Considering the direction and connectivity of metabolic networks, the algorithm to get the orbits is shown in Algorithm 1. Although theoretically we can not ensure that the resulting partition is equivalent to the automorphism partition under the restricted automorphism group, the resulting partition is practically close to the desired partition and is practically useful in the exploration of functional equivalence of nodes in the same orbits:
Algorithm1: getOrbits (G )
Input: a metabolic network G
Output: new orbit partition P’
{
1. P’=φ, G={ G_{1}, G_{2}, …G_{m} } // get Connected Subgraphs of G ;
2. R(G)=Aut(G)H(G) // get restricted automorphisms
3. P={V_{1}, V_{2}, …V_{k}} // obtain partition according to R(G);
4. for each V_{i} ε P such that V_{i}>1
5. for each v ε V_{i}
6. L(v)=(N^{+}(v),N^{}(v)); // compute the indegree and outdegree of v
7. Order(V_{i}) ; // Sort Vi according to lexicographic order of L(v)
8. {V_{i}_{1}, V_{i}_{2},…, V_{i}_{k}} = Subdivide(V_{i});
9. P’=P’ υ {V_{i}_{1}, V_{i}_{2},…, V_{i}_{k}}
10. return P’
}
Analyze Orbit Similarity
In the above section, we have known that nodes in the same orbit group are structurally equivalent. It is well known that structural equivalence implies functional equivalence. Whether the structurally equivalent nodes in the network are more similar in function has still not been validated. In metabolic networks, nodes are compounds with specific structure which determines its function in reactions. If two compounds are structurally similar to each other, they will function similarly. Alex Gutteridge et.al have found that chemical structure of small molecular compounds often determines compound suitability for use in regulation and how groups of similar compounds can regulate sets of enzymes(Gutteridge et al., 2007). Hence, it is reasonable to believe that whether the structurally equivalent nodes in the network are functionally equivalent can be inferred from the similarity between their chemical structures.
Many chemical structure comparison methods have been proposed to analyze the compound similarity in the metabolic pathways (Hattori et al., 2003; Nobeli et al., 2003; Raymond, Gardiner and Willett, 2002; Raymond and Willett, 2002). In most of the algorithms, the chemical structure is treated as a two dimensional (2D) object, which can be presented as a graph consisting of nodes (atoms) and edges (bonds). In this paper, we use the method proposed by Masahiro H.(Hattori et al., 2003) to compute the similarities between chemical compounds. In this method, the similarities between compounds are measured by the size of maximal common subgraph (MCS) between two graphs representing these two compounds. A normalized similarity score based on Jaccard coefficient(Watson, 1983) is used in this method, which is defined as the ratio of the size of the maximal common substructure to the size of the nonredundant set of all substructures:
JC( G_{1} , G_{2} ) ≡  G_{1}∩ G_{2}  /  G_{1} U G_{2}  =  MCS ( G_{1} , G_{2} )  /  G_{1} +  G_{2}    MCS ( G_{1} , G_{2} ) 
, where G is defined as the number of nodes of graph G.
After obtaining the similarity of all the compound pairs by Masahiro’s algorithm, we compared the compound similarities between nodes in the same orbits to the compound similarity between nodes in the network and test the significance of similarity scores of nodes in the same orbit.
All similarity score of the orbits in the networks of 705 organisms were collected to form the orbit dataset where the replicated orbits were just calculated once. Another dataset which is referred to as random dataset was generated by collections of similarity scores of all pairs of compounds in the metabolic modules of all 705 organisms.
Three statistics were used to test differences in the similarity scores between the orbit dataset and random dataset: right tail ttest with equal variance, right tail ttest with unequal variance and Wilcoxon twosided rank sum test(Pagano and Gauvreau, 2000).
We assume that the compounds in the same orbit should have similar chemical structure and function. So we use Hattori’s maximalcommonsubgraph based algorithm with loose weighting condition to do chemical structure comparison. We summarized the average over similarity scores of all compound pairs in the same orbit:
S = Σ _{1 ≤ i < j ≤ n }S_{i,j} / n(n1) / 2
Authors’ Contributions
Hua Dong analyzed the data and wrote the draft; Yanghua Xiao was responsible for the algorithms and the program. Hua Dong and Yanghua Xiao contribute equally to this work. All authors wrote and approved the final manuscript.
This work was partially supported by grant from Shanghai Commission of Science and Technology (04dz14003) and Grant from HiTech Research and Development Program of China(863) (2007AA02Z300). Thanks for Hoicheong Siu’s help on network reconstruction from BioCyc dataset.