Reconstruction and Analysis of Human Liver-Specific Metabolic Network Based on CNHLPP Data

Jing Zhao,†,#,⊥ Chao Geng,§ Lin Tao,† Duanfeng Zhang,‡ Ying Jiang,§ Kailin Tang,† Ruixin Zhu,‡ Hong Yu,† Weidong Zhang,# Fuchu He,*,§ Yixue Li,*,†,| and Zhiwei Cao*,‡,†,[

Shanghai Center for Bioinformation and Technology, Shanghai, China, School of Life Sciences and Technology, Tongji University, Shanghai, China, Key Laboratory of Arrhythmias, Ministry of Education, China, Department of Genomics and Proteomics, Beijing Institute of Radiation Medicine, Beijing, China, Beijing Proteome Research Center, Beijing, China, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China, Department of Natural Medicinal Chemistry, College of Pharmacy, Second Military Medical University, Shanghai, China, and Department of Mathematics, Logistical Engineering University, Chongqing, China

Received July 15, 2009

The liver is the largest internal organ in the body and plays central roles in metabolic homeostasis, detoxification of various substances, and the synthesis and storage of nutrients. To fulfill these complex tasks, thousands of biochemical reactions, densely interconnected into an intricate metabolic network, take place in the liver to cope with a wide range of foods and environmental variations. Here, the first human liver-specific metabolic network was reconstructed from proteomics data of the Chinese Human Liver Proteome Project (CNHLPP) and then investigated in the context of the genome-scale metabolic network of Homo sapiens. Topological analysis shows that this organ-specific metabolic network exhibits features similar to those of organism-specific networks, such as a power-law degree distribution, the small-world property, and a bow-tie structure. Furthermore, the liver network exhibits a modular organization in which the modules are formed around precursors from primary metabolism or hub metabolites from derivative metabolism. Most of the modules are dominated by one major category of metabolism, and enzymes within the same module tend to be expressed concertedly at the protein level. Network decomposition and comparison suggest that the liver network covers a predominant area of the global metabolic network of the H. sapiens genome, while the human network may develop extra modules to gain more specialized functionality outside the liver. The results of this study permit a high-level interpretation of the metabolite information flow in the human liver and provide a basis for modeling the physiological and pathological metabolic states of the liver.

Keywords: Metabolic network • Human Liver Proteome Project • Network reconstruction • Network topology • Network modularity

* Corresponding authors. For Z.C.: Tongji University; phone, 86-2154065003; fax, 86-21-54065057. For Y.L.: Shanghai Center for Bioinformation and Technology; phone, 86-21-54920089; fax, 86-21-5492-0143. For F.H.: Beijing Proteome Research Center; phone, 86-10-68177417; fax, 86-10-68171208.
† Shanghai Center for Bioinformation and Technology. ‡ College of Life Sciences and Technology, Tongji University. [ Key Laboratory of Arrhythmias. § Beijing Institute of Radiation Medicine, and Beijing Proteome Research Center. | Chinese Academy of Sciences. # Second Military Medical University. ⊥ Logistical Engineering University.

1648 Journal of Proteome Research 2010, 9, 1648–1658 Published on Web 02/06/2010
10.1021/pr9006188 © 2010 American Chemical Society

Introduction

In the past decade, the dramatic increase in the number of sequenced genomes has offered unprecedented opportunities to study cellular life at a systems level. Preliminary metabolic networks for hundreds of organisms have been reconstructed from annotated genome information and made available in metabolic databases.1-4 Recently, two refined metabolism databases have even been set up manually for humans from genomic and bibliomic data,5,6 which could provide fundamental assistance for studying the human metabolic network at the organism level. As the human is a complex multicellular, multiorgan organism in which different organs have different metabolic objectives and functions, reconstructing organ-specific metabolic networks could facilitate a better understanding of human metabolism as a whole. However, the lack of large-scale, accurate data about organ-specific metabolism hinders progress in this direction.7

As the largest internal organ in the body, the liver has many functions, including producing substances that break down fats, converting glucose to glycogen, producing urea, making certain amino acids, filtering harmful substances from the blood, storing vitamins and minerals, and maintaining a proper level of glucose in the blood. The liver is also the most important site for drug metabolism because cytochrome P450 (CYP450), a superfamily of enzymes involved in xenobiotic degradation and drug clearance, exists mainly in the liver.8 As the first initiative of the human proteome project for human organs,9 the Human Liver Proteome Project (HLPP) has obtained the overall proteome expression profile of human liver through sustained efforts over the past few years. Thus, it is now possible to reconstruct a human liver-specific metabolic network and investigate its metabolic functions at the organ level.

In this work, we reconstructed the human liver-specific metabolic network based on first-hand proteomics data from CNHLPP (Chinese Human Liver Proteome Project), one of the important parts of HLPP. We then explored the underlying organizing principles related to the main functions of human liver by analyzing the topological features and topological modules of the liver-specific network and by comparing it with the global human metabolic network. A database of the biochemically, genetically and genomically structured genome-scale human metabolic network (BiGG),5 which was manually reconstructed component by component from genomic and bibliomic data, was used as the background database.

Materials and Methods

Data Preparation. Identify All Expressed Proteins and Enzymes in the Human Liver. Below is a brief description of the sample preparation and data quality control from Jiang et al.10 Fresh human liver samples were collected from adult volunteers of the Chinese Han nationality who had undergone hepatic hemangioma resection after histopathological diagnostics. Powder samples were eventually pooled in equal amounts from 10 individuals. Four technical approaches for proteome analysis were designed: 2DLC-ESI, 3DLC-ESI, 2DE-MALDI and 1DE-LC-ESI. All qualified peak lists were searched against both the IPI Human v3.07 database and its reversed version. To control the likelihood of obtaining false positives, an identification threshold of 95% confidence was established for each technical approach. All peptides identified using the four approaches were combined, and the unique peptides were mapped back to the IPI Human v3.07 database to identify the corresponding proteins/groups. The data are stored online at http://hlpic.hupo.org.cn/dblep/datahttp/Suppl_Table11_Protein-core.zip. Semiquantitation of protein abundance was carried out using the spectral counts (SC) method. The Spectral Count Index (SCI) values from different batches were normalized to estimate protein abundance. Refer to Jiang et al.10 for the detailed procedures and the HUPO community SOPs for data quality control. In total, 6788 distinct proteins (IPI codes), each identified with two or more peptide sequences at a confidence level of 95%, were extracted from the overall proteome expression profile of human liver. The expression profile was generated by the CNHLPP project and is also supported by human liver transcriptome data obtained from this project. The 6788 proteins were matched to 6220 distinct genes, of which 1421 genes were found to encode 721 distinct enzymes.

In addition, protein quantitation data generated by the CNHLPP project were converted to enzyme abundance in the liver through gene IDs. The quantitation values were averaged when different IPI codes correspond to the same enzyme (see Supplementary 1).

Identify All Reactions in the Homo sapiens Genome and in the Human Liver. The BiGG database includes a list of 3311 reactions, comprising 1555 enzyme-catalyzed reactions and 1756 autocatalytic reactions, which do not need enzymes. By mapping the enzymes identified in CNHLPP onto the enzyme list of the BiGG database, we identified 1047 reactions that are catalyzed by liver enzymes.

Reconstruction of Metabolic Networks. There are many kinds of graph representations of metabolism.11,12 In this study, a metabolic network is represented by a directed graph whose nodes correspond to metabolites and whose arcs correspond to reactions between these metabolites; irreversible reactions are represented as directed arcs and reversible ones as bidirected arcs.11 Substrates and products were extracted from each reaction, and all resulting substrate-product pairs were listed to specify the connections between the substances. To focus on the metabolite flow in the metabolic system under study, we ignored the subcellular localization information in BiGG and treated the same metabolite appearing in different compartments as one node. Accordingly, both G6P[c] in the cytoplasm and G6P[r] in the endoplasmic reticulum correspond to one node, G6P, and all transport reactions between compartments are deleted. The reactions remaining after this processing were used to reconstruct the human metabolic network. In addition, some small molecules called currency metabolites are normally used as carriers for transferring electrons or certain functional groups and participate in many reactions, while typically not participating in product formation.13,14 Therefore, to capture biologically relevant transformations of substrates, we deleted 33 small molecules and their connections when they act as currency metabolites.
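The graph construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the three toy reactions, the `NAME[compartment]` naming pattern, and the helper names `strip_compartment` and `build_arcs` are hypothetical, and currency metabolites are dropped unconditionally here, whereas the paper retains them wherever they act as primary metabolites.

```python
import re
from itertools import product

# Hypothetical toy reactions: (substrates, products, reversible).
# Metabolite names follow the "NAME[compartment]" pattern used in the text.
REACTIONS = [
    (["GLC[c]", "ATP[c]"], ["G6P[c]", "ADP[c]", "H+[c]"], False),
    (["G6P[c]"], ["F6P[c]"], True),
    (["G6P[c]"], ["G6P[r]"], True),  # transport between compartments
]

# Small illustrative subset of the 33 currency metabolites.
CURRENCY = {"ATP", "ADP", "H+"}


def strip_compartment(metabolite):
    """Drop a trailing compartment tag, e.g. 'G6P[c]' -> 'G6P'."""
    return re.sub(r"\[[a-z]\]$", "", metabolite)


def build_arcs(reactions, currency):
    """Return the set of directed (substrate, product) arcs."""
    arcs = set()
    for substrates, products, reversible in reactions:
        subs = {strip_compartment(m) for m in substrates}
        prods = {strip_compartment(m) for m in products}
        if subs == prods:
            # After merging compartments, a transport reaction maps a
            # metabolite onto itself and carries no transformation.
            continue
        for s, p in product(subs - currency, prods - currency):
            if s == p:
                continue
            arcs.add((s, p))
            if reversible:
                arcs.add((p, s))  # bidirected arc for a reversible reaction
    return arcs


ARCS = build_arcs(REACTIONS, CURRENCY)
```

In this toy case the transport reaction disappears once G6P[c] and G6P[r] collapse into one node, and the ATP/ADP/H+ links are removed, leaving only the substrate-to-product transformations.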
We regarded a small molecule as a currency metabolite if it satisfies two criteria: (1) it was treated as a currency metabolite in ref 13, and (2) its corresponding node in the original human metabolic network, which includes all metabolites, has one of the highest degrees.14 The 33 deleted small molecules are: ADP, ATP, AMP, GDP, GMP, GTP, CDP, CMP, CTP, UDP, UMP, UTP, NAD+, NADH, NADP+, NADPH, FMN, FAD, FADH2, GLU-L, CoA, CO2, O2, H2O, H2O2, HCO3-, H+, K+, Na+, NH4+, Pi, PPi, SO4^2-. On the other hand, these small molecules were retained when they act as primary metabolites in the corresponding reactions. For example, purines (ADP, ATP, AMP, GDP, GMP, GTP) and pyrimidines (CDP, CMP, CTP, UDP, UMP, UTP) are retained in the pathways of purine metabolism and pyrimidine metabolism, respectively. The retained links were chosen manually according to the database constructed by Ma et al.13 (see Supplementary 2).

During reconstruction of the liver metabolic network, we need to judge which autocatalytic reactions are likely to happen in liver. We propose the following algorithm to obtain a reasonable reaction set:
1. Let the 1047 liver enzyme-catalyzed reactions constitute the original core reaction set, and let all of the autocatalytic reactions constitute the initial candidate reaction set.
2. Extract all metabolites appearing in the core reaction set to get the core metabolite set.
3. Scan the list of candidate reactions for core metabolites. If all substrates of a reaction can be found in the core metabolite set, add this reaction to the core reaction set and remove it from the candidate set.
4. If step 3 cannot add any more reactions to the core reaction set, stop; else, go to step 2.
Running this algorithm added 427 autocatalytic reactions to the original core reaction set. The resulting reaction set includes reactions that are highly likely to occur in liver and was thus used to reconstruct the liver metabolic network. The resulting metabolic networks of the H. sapiens genome and human liver include 1473 and 1093 metabolites, respectively.

Topological Features and Metrics of Networks. Arc Density. The arc density of a directed network is the ratio of its number of arcs to that of a completely connected network with the same number of nodes:

$$\text{Density} = \frac{A}{N(N-1)}$$

where $A$ and $N$ are the numbers of arcs and nodes in the network, respectively.

Degree Distribution of Network. In a directed network, the degree $k$ of a node is the number of arcs linked to this node. The degree distribution $p(k)$ of a network is the occurrence frequency of nodes with degree $k$ ($k = 1, 2, \ldots$).15

Diameter and Average Path Length of Network. In a directed network, the distance $d_{ij}$ from node $u_i$ to node $u_j$ is defined as the number of arcs in the shortest directed path from $u_i$ to $u_j$. The diameter of a graph is the length of the longest shortest path in the graph, while the average path length is the average length of the shortest paths between all pairs of reachable nodes in the graph.16

Random Counterparts of the Liver Network. Three types of random subnetworks of the human network were constructed as random counterparts of the liver network. A type I random subnetwork was constructed by randomly selecting, from the human network, the same number of arcs as in the liver network; its node set includes all the nodes of the selected arcs. A type II random subnetwork was generated by randomly choosing, from the human network, the same number of nodes as in the liver network; its arc set includes all the links between the selected nodes in the human network. A type III random subnetwork consists of all the enzyme-catalyzed reactions of the liver network and 427 randomly picked autocatalytic reactions.

Network Decomposition. In this study, we applied the simulated annealing algorithm17,18 to decompose the metabolic networks. This algorithm identifies topological modules by maximizing the network's modularity metric through a stochastic optimization technique that enables one to find 'low cost' configurations without getting trapped in 'high cost' local minima, thus generating a nearly optimal decomposition of the network. For a given decomposition of a network, the modularity metric is defined as the gap between the fraction of arcs within clusters and the expected fraction of arcs if the arcs are wired with no structural bias:19

$$M = \sum_{i=1}^{r} \left[ e_{ii} - \Big( \sum_{j} e_{ij} \Big)^{2} \right]$$

where $r$ is the number of clusters and $e_{ij}$ is the fraction of arcs that lead between vertices of clusters $i$ and $j$. The maximum modularity metric corresponds to the partition that comprises as many within-module links and as few intermodule links as possible.

Clustering of Human Genome Metabolic Network Based on Modules of Liver Network. To investigate how the liver network resides in the global human network, we propose the following algorithm to cluster the human network according to the topological modules of the liver network:
1. Map the modules of the liver network into the human network; that is, assign nodes of the human network that also belong to the liver network to the same modules as in the liver network.
2. For nodes included in the human network but not in the liver network, if they are linked to form a connected cluster with at least six nodes, assign them to a new module (six nodes is set as a threshold to filter out modules that are too small); otherwise, add them to the module obtained in step 1 with which they share the most links.
In this way, the human network is decomposed into modules highly similar to those in the liver network, as well as several extending modules that do not appear in the biggest connected part of the liver network.

Statistical Tests. Overlap Score. To measure the extent of overlap between modules generated by different decompositions, we apply a method described in ref 20. Consider two decompositions A and B (for example, two partitions of the network obtained by two independent runs of simulated annealing) and assume each metabolite is assigned to one module in each of A and B. Let $\phi_A(i)$ and $\phi_B(j)$ denote the fraction of nodes in module $i \in A$ and $j \in B$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$), respectively. Let $\phi_{AB}(i,j)$ denote the joint frequency of $i$ and $j$, that is, the fraction of nodes that are partitioned into both module $i \in A$ and module $j \in B$. In a random distribution of modules, the expectation value of $\phi_{AB}(i,j)$ is $\phi_A(i)\phi_B(j)$. If the modules of different decompositions overlap, $\phi_{AB}(i,j)$ will be larger than $\phi_A(i)\phi_B(j)$ for the overlapping pairs and smaller for the others. Thus, the overlap of modules in partitions A and B can be quantitatively measured by

$$\mu_{AB} = \sum_{i=1}^{m} \sum_{j=1}^{n} \left| \phi_{AB}(i,j) - \phi_A(i)\,\phi_B(j) \right|$$

Since the value of $\mu$ is affected by finite sizes, it is hard to judge whether a $\mu$-value indicates a good or bad overlap. Therefore, we normalize the $\mu$-value against those of the perfect overlaps and define the overlap score of partitions A and B as follows:

$$\nu_{AB} = \frac{\mu_{AB}}{\max(\mu_{AA}, \mu_{BB})}$$

The value of $\nu$ is between 0 and 1, and it is 1 for perfect matches. Generally, the overlap score can be used to measure the similarity between different categorizations. In this study, we also applied this metric to quantify the extent of overlap between network modules and the KEGG pathway organization.

HygCDF-Value. If we randomly draw $n$ samples from a finite set, the probability of getting $i$ samples with the desired feature by chance obeys the hypergeometric distribution:

$$f(i) = \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

where $N$ is the size of the set and $K$ is the number of items with the desired feature in the set. Then, the probability of getting at least $k$ samples with the desired feature by chance can be represented by the hypergeometric cumulative distribution function (HygCDF):

$$\text{HygCDF} = 1 - \sum_{i=0}^{k-1} f(i) = 1 - \sum_{i=0}^{k-1} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$
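As a rough illustration of the HygCDF-value, the upper-tail hypergeometric probability can be computed directly from binomial coefficients. The function names `hyg_pmf` and `hygcdf_value` and the toy numbers are assumptions made for this sketch; the symbols N, K, n, i and k follow the definitions in the text.

```python
from math import comb


def hyg_pmf(i, N, K, n):
    """f(i): probability of exactly i featured items among n draws."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def hygcdf_value(k, N, K, n):
    """Probability of at least k featured items among n draws."""
    return 1.0 - sum(hyg_pmf(i, N, K, n) for i in range(k))


# Toy case: a set of 10 items, 4 of which carry the feature; draw 3 and
# ask for the chance of seeing at least 2 featured items.
TOY_VALUE = hygcdf_value(2, N=10, K=4, n=3)
```

For the toy case the two excluded terms are f(0) = 20/120 and f(1) = 60/120, so the value is 1/3.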

Given a significance level α, which is usually set to 0.05, a HygCDF-value smaller than α indicates a low probability that the items with the desired feature were chosen by chance. Hence, the HygCDF-value can be used to measure whether the n samples drawn from the set are more enriched with items of the desired feature than would be expected by chance.21

Z-score. The Z-score22 was applied to quantify the difference between the liver network and its random counterparts:

$$Z = \frac{P - \bar{P}_r}{\Delta P_r}$$

where $P$ is the graph metric in the liver network, and $\bar{P}_r$ and $\Delta P_r$ are the mean and standard deviation of the corresponding graph metric in the random ensemble. The higher the absolute value of a Z-score, the more significant the difference.
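The overlap score and the Z-score defined in this section can be sketched together, since the overlap score of two partitions is later compared against a random ensemble. This is an illustrative sketch only: partitions are assumed to be plain dictionaries mapping each node to a module label, the function names are hypothetical, and the population standard deviation is an assumption because the text does not say which estimator was used.

```python
from collections import Counter
from statistics import mean, pstdev


def mu(part_a, part_b):
    """mu_AB = sum over (i, j) of |phi_AB(i,j) - phi_A(i) * phi_B(j)|."""
    nodes = list(part_a)
    total = len(nodes)
    phi_a = Counter(part_a[v] for v in nodes)
    phi_b = Counter(part_b[v] for v in nodes)
    phi_ab = Counter((part_a[v], part_b[v]) for v in nodes)
    return sum(
        abs(phi_ab[(i, j)] / total - (phi_a[i] / total) * (phi_b[j] / total))
        for i in phi_a
        for j in phi_b
    )


def overlap_score(part_a, part_b):
    """nu_AB: mu_AB normalized by the larger of the two perfect overlaps."""
    return mu(part_a, part_b) / max(mu(part_a, part_a), mu(part_b, part_b))


def z_score(value, random_values):
    """Z = (P - mean of ensemble) / standard deviation of ensemble."""
    return (value - mean(random_values)) / pstdev(random_values)


# Toy partitions of four nodes (labels are arbitrary).
A = {"a": 1, "b": 1, "c": 2, "d": 2}
B = {"a": "x", "b": "x", "c": "y", "d": "y"}  # same grouping, renamed
C = {"a": "x", "b": "y", "c": "x", "d": "y"}  # independent of A
```

Partitions A and B group the nodes identically, so their score is 1, while C cuts across A evenly and scores 0.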

Table 1. Basic Graph Metrics of Metabolic Networks under Study

| metabolic network | network for human liver | network for H. sapiens genome |
|---|---|---|
| Nodes | 1093 | 1473 |
| Arcs | 2209 | 3361 |
| Density | 0.0019 | 0.0016 |
| Degree distribution | P(k) ∼ k^-2.73 | P(k) ∼ k^-2.67 |
| Average path length | 8.5 | 9.8 |
| Diameter | 28 | 49 |
| Biggest cluster: Nodes | 1026 | 1407 |
| Biggest cluster: Arcs | 2159 | 3314 |
| Bowtie of biggest cluster: GSC | 424 (41.3%) | 987 (70.2%) |
| Bowtie of biggest cluster: S | 262 (25.5%) | 117 (8.32%) |
| Bowtie of biggest cluster: P | 187 (18.2%) | 272 (19.3%) |
| Bowtie of biggest cluster: IS | 153 (14.9%) | 31 (2.2%) |
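The density, average path length and diameter reported in Table 1 follow the definitions given in Materials and Methods. The sketch below recomputes the two densities from the node and arc counts in the table and evaluates the path metrics on a hypothetical four-node toy graph; the function names and the toy arcs are assumptions for illustration, and breadth-first search is simply one standard way to obtain directed shortest-path lengths.

```python
from collections import deque


def arc_density(n_arcs, n_nodes):
    """Density = A / (N (N - 1)) for a directed network."""
    return n_arcs / (n_nodes * (n_nodes - 1))


def distances_from(adj, source):
    """Breadth-first search: directed distances to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]  # a node's distance to itself is not a path
    return dist


def path_metrics(arcs):
    """Return (average path length, diameter) over reachable ordered pairs."""
    adj, nodes = {}, set()
    for u, v in arcs:
        adj.setdefault(u, set()).add(v)
        nodes.update((u, v))
    lengths = [d for s in nodes for d in distances_from(adj, s).values()]
    return sum(lengths) / len(lengths), max(lengths)


# Node and arc counts taken from Table 1.
LIVER_DENSITY = arc_density(2209, 1093)
HUMAN_DENSITY = arc_density(3361, 1473)

# Hypothetical toy graph: a 3-cycle a -> b -> c -> a with a tail c -> d.
TOY_APL, TOY_DIAMETER = path_metrics(
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
)
```

Rounded to four decimals, the two densities reproduce the 0.0019 and 0.0016 entries of Table 1.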

Results and Discussion

Metabolic Pathways Present in Liver. A total of 380 enzymes and 1047 enzyme-catalyzed reactions in BiGG were successfully mapped by CNHLPP data (see Supplementary 3). It is common knowledge that many enzymes appear in a variety of organs, especially those in central pathways that produce energy and precursors for biosynthesis. In this study, since we focus on liver metabolism against the background of the global human metabolic network, we simply partition enzymes, reactions and pathways into "present in the liver" and "absent from the liver". As in other metabolic databases,2,23 most pathways in BiGG include both enzyme-catalyzed and autocatalytic reactions. Considering that metabolic pathways often involve silent isozymes7 or highly active but low-abundance enzymes, and given the limitations of high-throughput proteomics technology, we allow a 10% identification failure rate as a cutoff: a pathway was regarded as present in liver if more than 90% of its reactions are autocatalytic or catalyzed by liver enzymes. Absent pathways were identified in a similar way. As listed in Table S1 of Supplementary 4, the main documented functions of the liver are reflected in the pathways present in liver, such as the citric acid cycle, glycolysis/gluconeogenesis, oxidative phosphorylation and lipid metabolism. In addition, the liver's critical function of detoxification is associated with present pathways such as CYP metabolism, ROS detoxification, selenoamino acid metabolism, and glutathione metabolism. Literature support is also provided as references in the table. Therefore, this analysis preliminarily verifies that the data from CNHLPP include key information for liver metabolism and can be further used to reconstruct a liver-specific metabolic network.

Subcellular Localization and Currency Metabolites during Network Reconstruction. To gain a clearer picture of metabolite flow, subcellular localization was ignored during the network reconstruction in this paper. Undoubtedly, taking it into account could give additional information. However, we found that the metabolic network would then be too dense, with multiple appearances of redundant compounds, to give a clear picture of metabolite flow, because the same metabolite can be localized in different cellular compartments linked by transport reactions. Since this paper focuses mainly on the organization of metabolite flow in the human liver, nonredundant metabolites are required to construct our metabolic network. In fact, many studies concerning the relationship between topology, function and evolution of human metabolic networks have been done without subcellular information.24-29 No evidence has been found that ignoring subcellular information affects the scientific justification in this area. Thus, we think ignoring subcellular localization is unlikely to affect our final results. Second, removing currency metabolites is commonly adopted14,25,26,29 for a similar purpose, as including them would result in supernodes with too many links to other compounds, which often leads to unrealistic conclusions. However, these metabolites are sometimes very important compounds and cannot simply be deleted from some reactions. To remove only the misleading currency metabolites and preserve the network structure as much as possible, Ma and Zeng manually developed a database for this purpose based on the KEGG database,13 which has undergone scientific justification and has been used in many high-level studies.25 In this paper, we followed their work and did not invent new ways to define currency metabolites.

Basic Topological Features of Metabolic Networks for Human Liver and H. sapiens Genome. Metabolic networks for the liver and the human genome were then reconstructed according to the method described in the Materials and Methods section (see Supplementary 5 and Supplementary 6, respectively). The basic topological features of these two networks are listed in Table 1. As can be seen in Table 1, the human network is dominated by one largest connected cluster that accounts for 95.5% of the nodes in the global network, and the liver network also has a biggest connected cluster of similar relative size (93.9%). This similarity suggests that the 427 autocatalytic reactions could be good bridges to link the liver enzyme-catalyzed reactions together to perform the metabolism in liver.
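The four-step expansion procedure from Materials and Methods, which is what selects these autocatalytic reactions, can be sketched as a simple fixed-point iteration. This is a minimal sketch under assumptions: reactions are represented as hypothetical `(name, substrates, products)` tuples, the function name `expand_core` and the toy data are invented for illustration, and only substrates are checked, as stated in step 3.

```python
def expand_core(core_reactions, candidate_reactions):
    """Grow the core reaction set until no candidate can be added.

    Each reaction is a (name, substrates, products) tuple. Returns the
    final core list and the list of candidates that were added.
    """
    core = list(core_reactions)
    candidates = list(candidate_reactions)
    added = []
    while True:
        # Step 2: collect every metabolite appearing in the core set.
        core_metabolites = set()
        for _, substrates, products in core:
            core_metabolites.update(substrates)
            core_metabolites.update(products)
        # Step 3: a candidate qualifies if all its substrates are in the core.
        feasible = [r for r in candidates if set(r[1]) <= core_metabolites]
        # Step 4: stop when nothing new can be added.
        if not feasible:
            return core, added
        core.extend(feasible)
        added.extend(feasible)
        candidates = [r for r in candidates if r not in feasible]


# Hypothetical toy data: one enzyme-catalyzed core reaction and three
# autocatalytic candidates, one of which is unreachable from the core.
CORE = [("enz1", ("A",), ("B",))]
CANDIDATES = [
    ("auto2", ("C",), ("D",)),
    ("auto1", ("B",), ("C",)),
    ("auto3", ("X",), ("Y",)),
]
FINAL_CORE, ADDED = expand_core(CORE, CANDIDATES)
```

In the toy data, `auto1` becomes feasible in the first pass, which in turn makes `auto2` feasible in the second pass, while `auto3` is never added because its substrate X is never produced.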
Like most real-world networks of social, technological and biological origin, the degree distributions of these two networks obey a power law,15 suggesting that most substrates produce only a few products, while only a small number of substrates participate in the generation of a large number of products. Moreover, the liver network exhibits the bowtie topology29 as the human network does, further supporting our earlier conclusion that the bowtie pattern is present at different levels and scales, and in different chemical and spatial units, of metabolic networks.29,30 As suggested in earlier studies, the bowtie structure implies an organization pattern of the metabolic system in which there are multiple inputs and


outputs being connected through a conserved processing core.31,32 The GSC of the bowtie is the most tightly connected part of the network, with multiple routes between any pair of nodes. Thus, it is the most robust part of the network from a topological viewpoint and is supposed to be more resistant to mutation or perturbation. As a subsystem, the liver has relatively independent functions and also needs to exchange material and information frequently with other subsystems. That the liver network has a higher proportion of S and IS nodes and a lower proportion of GSC nodes indicates that liver metabolism may provide more interfaces, with inputs and outputs, to interact with other human metabolic subsystems.

Figure 1. Cartographic representation of the human liver-specific metabolic network. Each circle represents a module and is colored according to the KEGG pathway classification of the reactions belonging to it, while the arcs reflect the connections between clusters. The area of each color in one circle is proportional to the number of reactions that belong to the corresponding metabolism. The width of an arc is proportional to the number of reactions between the two corresponding modules. For simplicity, bidirected arcs are presented by gray edges.

Table 2. Comparison of the Liver Metabolic Network with Its Random Counterparts

| network | statistic | nodes | arcs | density | average path length | diameter | fraction of nodes in the biggest cluster |
|---|---|---|---|---|---|---|---|
| Liver network | | 1093 | 2209 | 0.0019 | 8.5 | 28 | 0.9387 |
| 100 type I random subnetworks (same arcs as the liver network) | Mean | 1317 | 2209 | 0.0013 | 10.1 | 29.7 | 0.8575 |
| | Z-score | -21.83 | | 29.03 | -3.27 | -0.47 | 5.24 |
| 100 type II random subnetworks (same nodes as the liver network) | Mean | 1093 | 1870 | 0.0016 | 9.6 | 30.2 | 0.8309 |
| | Z-score | | 5.25 | 5.25 | -1.25 | -0.39 | 5.09 |
| 100 type III random subnetworks (same enzyme-catalyzed reactions as the liver network) | Mean | 1240 | 2287 | 0.0015 | 8.95 | 26.3 | 0.8806 |
| | Z-score | -23.2 | -6.17 | 27.2 | -1.8 | 0.67 | 5.04 |

In addition, both metabolic networks have short path lengths, on the logarithmic scale of the network sizes, agreeing with earlier results on organism-specific networks.13,24,33 It has been suggested that such a 'small-world' feature of metabolic networks may allow metabolism to react rapidly to perturbations in enzyme or metabolite concentrations; thus, the cell may react quickly to changes in its surroundings.33 Although we use a different database, the average path length of 9.8 for the human network here is consistent with the conclusion of an earlier study on the KEGG database that the average path lengths for eukaryotes, archaea and bacteria are 9.57, 8.50, and 7.22, respectively.13 It is interesting that the average path length of the liver network, 8.5, is significantly smaller than that of the human network,

implying that the liver metabolic network probably defines a particular region in the global metabolic network of the H. sapiens genome. To validate whether these topological features are specific to the liver network, we derived 300 random subnetworks of three types from the human network as random counterparts of the liver network. The comparison between the liver network and its random counterparts is summarized in Table 2. It can be seen that the liver network linked by the 427 autocatalytic reactions has markedly fewer nodes than random subnetworks with the same number of arcs, and more arcs than those with the same number of nodes. Overall, the liver network is evidently more densely connected and includes a bigger fraction of nodes in its biggest connected cluster than its random counterparts, agreeing with a common feature of functional networks found previously.29 These comparative results indicate that the liver network may correspond to a definite subsystem of human metabolism rather than a random subnetwork with no functional significance.

Figure 2. Distribution of the 12 precursors in the 16 modules of the liver-specific metabolic network. The three central pathways, Embden-Meyerhof-Parnas (EMP), tricarboxylic acid (TCA) and pentose phosphate pathway (PPP), for the generation of the 12 precursors are outlined.

Table 3. Main Derivative Metabolism Functions of the Topological Modules for Human Liver-Specific Metabolic Network

| main function category | module | main function |
|---|---|---|
| Glycan biosynthesis and metabolism | 5 | Biosynthesis of chondroitin/heparan sulfate and keratan sulfate |
| | 13 | Biosynthesis of N-glycan |
| | 12 | Degradation of heparan sulfate and N-glycan |
| | 14 | Degradation of chondroitin sulfate |
| | 16 | Degradation of keratan sulfate |
| Metabolism of cofactors and vitamins | 1 | Metabolism of folate and vitamin B6 |
| | 3 | R group synthesis |
| | 6 | Heme biosynthesis |
| | 9 | Heme degradation and vitamin A metabolism |
| | 11 | Tetrahydrobiopterin; Vitamin B12 Metabolism |
| Xenobiotics biodegradation and metabolism | 2 | ROS detoxification |
| | 5 | CYP metabolism |

Functional Organization of Liver Revealed by Topological Modules in Liver-Specific Metabolic Network. To investigate the organizing structure of the liver-specific metabolic network, we adopted the simulated annealing algorithm to decompose the network and obtained 16 topologically compact modules, as shown in Figure 1 in a cartographic representation.25 Since the simulated annealing algorithm is stochastic, different runs may yield different decompositions of the network because of different initial conditions. The initial condition is the seed for the random number generator, which must be a negative integer. To verify the robustness of the decomposition, we first randomly generated 20 different negative integers and then used each of them as the initial condition for an independent run of simulated annealing, generating 20 partitions of the liver network. We calculated the overlap score for each pair of partitions and obtained an average ν-value of 0.92 ± 0.02, suggesting that the different decompositions have very high fractions of overlapping modules. This confirms that the topological modules are robustly and consistently identified. We then mapped the reactions in the modules to KEGG pathways.34,35 Thus, the arcs in the network were categorized in two ways: by topological module and by KEGG pathway. We applied the overlap score to quantify the similarity between the two categorizations and obtained an overlap score of 0.68. We then generated 200 pairs of random clusterings of the liver network, in which the cluster sizes are the same as in the real data. The average overlap score of the random ensemble was calculated as 0.082, while the z-score for the overlap score of

the two real partitions was 89.03, suggesting a high and statistically significant extent of overlap between the topological modules and the pathway systems. It can be seen from Figure 1 that a majority of the 16 modules are dominated by one major category of metabolism, indicating their relatively independent functions. However, different conventionally categorized metabolisms can also be topologically grouped into one module in liver, providing alternative insight into liver metabolism. For instance, glycan biosynthesis/metabolism is closely related to modules of carbohydrate metabolism, and nucleotide metabolism is often coupled with modules of amino acid metabolism. This is reasonable because both carbohydrate metabolism and glycan biosynthesis/metabolism are related to the metabolism of sugars, while the metabolisms of both nucleotides and amino acids involve nitrogen atoms. Interestingly, cofactor and vitamin metabolism shows a strong tendency to be grouped into modules of lipid metabolism rather than carbohydrate or amino acid metabolism, especially in modules 3 and 9, indicating the essential role of vitamins in lipid metabolism reactions.36 Generally, the basic metabolic functions of the normal liver include carbohydrate, lipid and protein metabolism in order to maintain a stable range of glucose and other nutrients in

research articles

Zhao et al.

Figure 3. Distribution features of enzyme abundance in liver-specific metabolic network. (A) Log-log plot for distribution of enzyme abundance in overall liver network, P(Q) ∼ Q-2.34. (B) Average enzyme abundance within modules. (C) Percentages for high-abundance enzymes (Q > 2.35) within modules. (D) Percentages for low-abundance enzymes (Q < 0.5) within modules. The red column represents the global network. In B-D, the modules are ordered according to their average abundance in a decreasing way.

blood. On top of that, the liver is also responsible for the synthesis of bile acids and detoxification, which are carried out in module 6 and 2, respectively. In addition to that, urea cycle is one of the most critical aspects of protein metabolism occurring in the liver. Most reactions of urea cycle aggregate in module 7, while a few are carried out in module 1 where different main types of metabolism can be exchanged and interacted closely. The material metabolism in liver can be grouped into primary metabolism that produce common precursors and building blocks, and the derivative metabolism that then assembles these building blocks into macromolecular constituents of the cells or conducts metabolism of macromolecules.37 As described in ordinary textbook of biochemistry, external nutrients are first catabolized through primary metabolism to produce 12 common precursors such as glucose 6-phosphate, phosphoenolpyruvate and pyruvate; and then anabolism synthesizes larger building blockssamino acids, nucleotides, fatty acids and sugars, starting from the 12 precursors.31,37 Checking the network found all of the 12 precursors in it, verifying the fundamental function of liver in primary metabolism. As shown in Figure 2, the 12 precursors were distributed into 5 out of the 16 modules of liver network, and the precursors in EmbdenMeyerhof-Parnas (EMP) and tricarboxylic acid (TCA) scatter in several modules. Such distribution agrees with earlier observations concerning the high diversity of the TCA and EMP 1654

Journal of Proteome Research • Vol. 9, No. 4, 2010

pathway,38,39 as well as the clustering results for organismspecific metabolic networks.29,40 The fourth module is almost a pure carbohydrate metabolism module which includes reactions in glycolysis/gluconeogenesis, pentose phosphate pathway, fructose, mannose metabolism, and so on. Thus, it is reasonable that half of the precursors locate in this module. Moreover, most precursors are clustered into the same module together with the pathways that synthesize corresponding building blocks from them. For instance, pyruvate, oxaloacetate and 2-oxoglutarate are precursors for a variety of amino acids; acetyl-coA and succinyl-coA are precursors of fatty acid and heme, respectively.37 And these precursors locate in the first, third and sixth modules that include the metabolism of amino acids, fatty acids and heme, respectively. In addition to primary metabolism, the liver also performs derivative metabolism including glycan biosynthesis and metabolism, metabolism of cofactors and vitamins, and xenobiotics biodegradation and metabolism, in which the most dominant class is the metabolism of glycan in the modules labeled 5, 12, 13, 14, and 16. See Table 3 for a detailed list of the main categories of derivative metabolism in the modules. Enzyme Abundance in Topological Modules of LiverSpecific Metabolic Network. It has been proposed that the genes involved in a particular biological process or functional module could be coexpressed and coregulated under the given conditions.23,41 Similarly, it is expected that the enzyme within

Reconstruction/Analysis of Human Liver-Specific Metabolic Network

research articles

Figure 4. Clustering of the global human network according to the modules of the liver network. (A) Cartographic representation of the human metabolic network. Modules 1-16 are similar with corresponding modules of the liver network shown in Figure 1, while modules 101-115 are new modules not appearing in the biggest connected part of the liver network. (B) Module proportional distributions for metabolites that are in and out of the liver network. (C) Module proportional distributions for reactions that are in and out of the liver network.

one module would also function at concerted fashion in order to keep a relatively stable flux of various metabolites. To investigate this, we studied the distribution patterns of enzyme abundance. As Figure 3A shows, the distribution of enzyme abundance in overall liver-specific network obeys a power law distribution: P(Q) ∼ Q-2.34, indicating that only a small number of enzyme are highly abundant, whereas most enzyme are at a low level. Particularly, by setting enzyme abundance corresponding to the highest 10% and the lowest 70% as high-abundance and low-abundance enzymes, respectively, we obtained the following probabilities:

$P(Q > 2.35) = 10\%$; $P(Q < 0.5) = 70\%$

Thus, 2.35 and 0.5 were set as the threshold values for high- and low-abundance enzymes, respectively. Figure 3B-D shows the average enzyme abundance, as well as the percentages of high- and low-abundance enzymes, in each module and in the global network. It can be seen that, to some extent, enzymes within the same module tend to be expressed concertedly at the protein level. Specifically, modules mainly for primary metabolism tend to be enriched with high-abundance enzymes and relatively depleted of low-abundance enzymes. This is especially true for modules enriched with precursors, such as modules 4 and 1. In contrast, the modules mainly for derivative metabolism, such as modules 5 and 12-14, show the opposite pattern. Such a distribution pattern of enzyme abundance could result from the cells' constant requirement for large quantities of building blocks, which are widely used to assemble the macromolecular constituents of the cells.

Figure 5. Module 108 of the human network shown in Figure 4. Blue arcs, reactions catalyzed by enzymes out of liver; green arcs, autocatalytic reactions.

Comparison of Metabolic Network of Human Liver with That of the H. sapiens Genome. It is interesting to investigate how the structural discrepancy between the liver metabolic network and the global background network derived from the human genome relates to functional differences. The modular network of liver was first set as the reference, and the human metabolic network was clustered and extended accordingly by the algorithm described in Materials and Methods. As shown in Figure 4A, the human metabolic network was thereby clustered into 16 modules corresponding to the liver modules (coded as 1-16) and 15 extra modules. Figure 4B,C shows that most of the 16 corresponding modules (except modules 5 and 16) contain higher fractions of liver metabolites and reactions than the global network as a whole. We used the hypergeometric cumulative distribution21 to quantitatively measure whether a module is more enriched with liver metabolites and reactions than expected by chance. Given a significance level α = 0.05, a HygCDF-value smaller than α indicates a low probability that the liver metabolites and reactions appear in the same module by chance. The HygCDF-values for 10 of the 16 modules (modules 1-4, 6, 8, 9, 11-13) are smaller than 0.05, suggesting that most of the 16 corresponding modules in the human network are highly similar to those of the liver network with respect to metabolites and reactions. Generally, the 15 extra modules seldom include metabolites and reactions of liver and thus likely function outside the liver. In fact, only four extending modules, modules 103, 107, 110, and 114, contain some liver enzymes or reactions, which originally lie in the isolated parts of the liver network. Two big modules appear in the human network, modules 108 and 113, which include 28 and 59 metabolites, respectively; module 108 is shown in Figure 5. All reactions in module 108 belong to the sphingolipid metabolism pathway, while module 113 consists of three pathways: glycerophospholipid metabolism, glycosylphosphatidylinositol (GPI)-anchor biosynthesis, and N-glycan biosynthesis. Indeed, most extending modules represent specialized functions outside the liver, such as blood group biosynthesis in modules 102, 104, and 115, and biotin metabolism in module 110. From this point of view, the human metabolic network can be regarded as an extension of the liver-specific network that realizes more specialized functions.
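The hypergeometric enrichment test used in this comparison can be illustrated with a minimal sketch. It assumes that the HygCDF-value is the upper-tail probability of observing at least the given number of liver metabolites in a module; the function name `hypergeom_upper_tail` and all counts below are hypothetical and are not taken from the paper's data.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: metabolites in the whole human network
    successes:  metabolites that also belong to the liver network
    draws:      metabolites in the module under test
    k:          liver metabolites observed in that module
    """
    max_k = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, max_k + 1)
    )
    return tail / total


# Hypothetical counts (assumptions for illustration only).
ALPHA = 0.05
p_value = hypergeom_upper_tail(k=45, population=1000, successes=600, draws=50)
is_enriched = p_value < ALPHA
print(f"p = {p_value:.3g}, enriched with liver metabolites: {is_enriched}")
```

Exact integer binomial coefficients are used here so that the sketch needs only the standard library; for large networks a statistics package would be the more usual choice.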


Figure 6. Module comparison between the human metabolic network and the liver network, each decomposed independently by simulated annealing. The red squares along the diagonal correspond to the 18 modules of the human network, and the green squares within each red box indicate that the corresponding node pairs were clustered into the same module of the liver network.

Although our algorithm can detect which parts of the human network are extended from the liver network, this clustering is not an optimal decomposition of the network. To make a fair comparison, the human network was independently decomposed into modules by the simulated annealing algorithm without any reference, and 18 modules were obtained. To compare the human network with its subnetwork, the liver network, we placed the nodes in the isolated clusters of the liver network into one new module and the nodes not included in the liver network into another. In this way, the human network can be considered as being clustered by two methods: one partition includes the 18 modules generated by simulated annealing; the other includes the 16 liver modules and the two new modules described above. The overlap score of the two partitions was 0.72. We then generated 200 pairs of random decompositions of the human network, in which the module sizes are the same as in the real data. The average overlap score of the random ensemble was 0.17, while the z-score for the overlap score of the two real partitions was 65.8, suggesting a high degree of similarity between the modules in the human network and those in the liver network. Figure 6 shows the overlap between the modules of the human network and those of the liver network. It can be seen that two modules, modules 3 and 9, have high overlap with metabolites outside the liver network. Inspection of the reactions in these two modules shows that module 3 consists of blood group biosynthesis, sphingolipid metabolism, and O-glycan biosynthesis, and module 9 includes glycerophospholipid metabolism, glycosylphosphatidylinositol (GPI)-anchor biosynthesis, and N-glycan biosynthesis. That is, module 9 is similar to module 113 in Figure 4, while module 3 integrates module 108 and some other small modules in Figure 4.

In conclusion, as a subsystem of human metabolism, the liver metabolic network covers a predominant part of the human global network and plays a critical role in human metabolism. Beyond liver metabolism, the human system has developed more specialized modules for extra metabolic functions.
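The comparison of two partitions against a size-preserving random ensemble can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the overlap score is approximated here by a Jaccard index over co-clustered node pairs, which may differ from the ν-score defined in Materials and Methods, and the toy partitions and the helper names `co_clustered_pairs`, `overlap_score`, `shuffled` and `overlap_z_score` are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def co_clustered_pairs(partition):
    """Unordered node pairs sharing a module; partition maps node -> module label."""
    by_module = {}
    for node, module in partition.items():
        by_module.setdefault(module, []).append(node)
    pairs = set()
    for members in by_module.values():
        pairs.update(frozenset(pair) for pair in combinations(members, 2))
    return pairs


def overlap_score(part_a, part_b):
    """Jaccard index of co-clustered pairs (an assumed stand-in for the overlap score)."""
    pairs_a, pairs_b = co_clustered_pairs(part_a), co_clustered_pairs(part_b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


def shuffled(partition, rng):
    """Random partition with the same module sizes: permute labels over the nodes."""
    nodes = list(partition)
    labels = list(partition.values())
    rng.shuffle(labels)
    return dict(zip(nodes, labels))


def overlap_z_score(part_a, part_b, n_random=200, seed=0):
    """Return (real overlap, mean random overlap, z-score) for two partitions."""
    rng = random.Random(seed)
    real = overlap_score(part_a, part_b)
    null = [
        overlap_score(shuffled(part_a, rng), shuffled(part_b, rng))
        for _ in range(n_random)
    ]
    spread = stdev(null)
    z = (real - mean(null)) / spread if spread > 0 else float("inf")
    return real, mean(null), z


# Toy example: 12 nodes in 3 modules; the second partition moves one node.
part_a = {node: node // 4 for node in range(12)}
part_b = dict(part_a)
part_b[0] = 1
real, null_mean, z = overlap_z_score(part_a, part_b)
print(f"overlap = {real:.2f}, random mean = {null_mean:.2f}, z = {z:.1f}")
```

Permuting module labels over the nodes keeps every module size fixed, which matches the constraint that the random decompositions have the same module sizes as the real data.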

Conclusions

In this study, we have reconstructed a human liver-specific metabolic network based on proteomics data from the Chinese Human Liver Proteome Project (CNHLPP). Although complicated, the metabolic functions of liver are carried out in an ordered and modular way by clusters of enzymes densely associated through substrates and products. To maintain a stable flux of various metabolites in liver, the enzymes within one module need to function concertedly and thus tend to be expressed at similar abundance, although different enzymes may have different kinetic properties and life spans. From a comparative point of view, the topological structures of the liver and human global metabolic networks are indistinguishable. However, the human system has developed extra modules outside the liver for more specialized functions during the long process of evolution.28 Therefore, although the traditional classification of metabolic pathways has provided abundant information to biologists, the alternative analysis based on metabolite flow in this paper sheds light on the underlying structures supporting the function of liver, and thus could provide a basis for further metabolic modeling.

Acknowledgment. The authors thank Dr. Eytan Ruppin, Dr. Petter Holme, Livnat Jerby, Ori Folger, and the anonymous reviewers for their constructive comments, which contributed to the improvement of the manuscript. This work was supported in part by grants from National Natural Science Foundation of China (10971227, 30900832), Ministry of Science and Technology China (2006AA02312, 2007DFA31040, 2009zx10004-601, 2010CB833601), and Shanghai Municipal Education Commission (2000236018).

Supporting Information Available: Supplementary 1, protein abundance quantization generated from the CNHLPP project; Supplementary 2, list of links including currency metabolites that are reserved in the metabolic networks; Supplementary 3, reaction and enzyme list of BiGG database and their classification according to CNHLPP data; Supplementary 4, pathways being present in and absent from the liver predicted by CNHLPP data; Supplementary 5, reconstructed liver metabolic network; Supplementary 6, reconstructed human metabolic network. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Maltsev, N.; Glass, E.; Sulakhe, D.; Rodriguez, A.; Syed, M. H.; Bompada, T.; Zhang, Y.; D’Souza, M. PUMA2: grid-based high-throughput analysis of genomes and metabolic pathways. Nucleic Acids Res. 2006, 34 (Suppl_1), D369–372.


(2) Kanehisa, M.; Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28 (1), 27–30.
(3) Romero, P.; Wagg, J.; Green, M.; Kaiser, D.; Krummenacker, M.; Karp, P. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 2004, 6 (1), R2.
(4) Karp, P. D.; Ouzounis, C. A.; Moore-Kochlacs, C.; Goldovsky, L.; Kaipa, P.; Ahren, D.; Tsoka, S.; Darzentas, N.; Kunin, V.; Lopez-Bigas, N. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 2005, 33 (19), 6083–6089.
(5) Duarte, N. C.; Becker, S. A.; Jamshidi, N.; Thiele, I.; Mo, M. L.; Vo, T. D.; Srivas, R.; Palsson, B. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc. Natl. Acad. Sci. U.S.A. 2007, 104 (6), 1777–1782.
(6) Ma, H.; Sorokin, A.; Mazein, A.; Selkov, A.; Selkov, E.; Demin, O.; Goryanin, I. The Edinburgh human metabolic network reconstruction and its functional analysis. Mol. Syst. Biol. 2007, 3, 135.
(7) Shlomi, T.; Cabili, M. N.; Herrgard, M. J.; Palsson, B. O.; Ruppin, E. Network-based prediction of human tissue-specific metabolism. Nat. Biotechnol. 2008, 26 (9), 1003–1010.
(8) Delgoda, R.; Westlake, A. C. G. Herbal interactions involving cytochrome P450 enzymes. Toxicol. Rev. 2004, 23 (4), 239–249.
(9) He, F. Human liver proteome project: plan, progress, and perspectives. Mol. Cell. Proteomics 2005, 4 (12), 1841–1848.
(10) Jiang, Y.; Ying, W.; Wu, S.; Chen, M.; Guan, W.; Yang, D.; Song, Y.; Liu, X.; Li, J.; Hao, Y.; Sun, A.; Geng, C. First insight into the human liver proteome from PROTEOMESKY-LIVERHu 1.0, a publicly available database. J. Proteome Res. 2010, 9 (1), 79–94.
(11) Zhao, J.; Yu, H.; Luo, J.; Cao, Z.; Li, Y. Complex networks theory for analyzing metabolic networks. Chin. Sci. Bull. 2006, 51 (13), 1529–1537.
(12) Klamt, S.; Gilles, E. D. Minimal cut sets in biochemical reaction networks. Bioinformatics 2004, 20, 226–234.
(13) Ma, H.; Zeng, A.-P. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics 2003, 19 (2), 270–277.
(14) Huss, M.; Holme, P. Currency and commodity metabolites: Their identification and relation to the modularity of metabolic networks. IET Syst. Biol. 2007, 1, 280–285.
(15) Barabasi, A. L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
(16) Bondy, J. A.; Murty, U. S. R. Graph Theory with Applications; Macmillan: London, 1976.
(17) Guimera, R.; Amaral, L. A. N. Cartography of complex networks: modules and universal roles. J. Stat. Mech. 2005, P02001.
(18) Guimera, R.; Sales-Pardo, M.; Amaral, L. A. N. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys. 2004, 70, 025101.
(19) Newman, M. E. J.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys. 2004, 69, 026113.
(20) Holme, P. Model validation of simple-graph representation of metabolism. J. R. Soc. Interface 2009, 6 (40), 1027–1034.
(21) Tavazoie, S.; Hughes, J. D.; Campbell, M. J.; Cho, R. J.; Church, G. M. Systematic determination of genetic network architecture. Nat. Genet. 1999, 22 (3), 281–285.
(22) Maslov, S.; Sneppen, K.; Zaliznyak, A. Detection of topological patterns in complex networks: correlation profile of the internet. Physica A 2004, 333, 529–540.
(23) Burgard, A. P.; Nikolaev, E. V.; Schilling, C. H.; Maranas, C. D. Flux coupling analysis of genome-scale metabolic network reconstructions. Genome Res. 2004, 14 (2), 301–312.
(24) Jeong, H.; Tombor, B.; Albert, R.; Oltvai, Z. N.; Barabasi, A. L. The large-scale organization of metabolic networks. Nature 2000, 407 (6804), 651–654.
(25) Guimera, R.; Amaral, L. A. N. Functional cartography of complex metabolic networks. Nature 2005, 433 (7028), 895–900.
(26) Ravasz, E.; Somera, A. L.; Mongru, D. A.; Oltvai, Z. N.; Barabasi, A. L. Hierarchical organization of modularity in metabolic networks. Science 2002, 297 (5586), 1551–1555.
(27) Holme, P.; Huss, M.; Jeong, H. Subnetwork hierarchies of biochemical pathways. Bioinformatics 2003, 19 (4), 532–538.
(28) Zhao, J.; Ding, G.-H.; Tao, L.; Yu, H.; Yu, Z.-H.; Luo, J.-H.; Cao, Z.-W.; Li, Y.-X. Modular co-evolution of metabolic networks. BMC Bioinf. 2007, 8, 311.
(29) Zhao, J.; Yu, H.; Luo, J.; Cao, Z.; Li, Y. Hierarchical modularity of nested bow-ties in metabolic networks. BMC Bioinf. 2006, 7, 386.
(30) Zhao, J.; Tao, L.; Yu, H.; Luo, J.-H.; Cao, Z. W.; Li, Y. Bow-tie topological features of metabolic networks and the functional significance. Chin. Sci. Bull. 2007, 52, 1036–1045.


(31) Csete, M.; Doyle, J. Bow ties, metabolism and disease. Trends Biotechnol. 2004, 22 (9), 446–450.
(32) Tanaka, R.; Csete, M.; Doyle, J. Highly optimized global organisation of metabolic networks. IEE Proc: Syst. Biol. 2005, 152 (4), 179–184.
(33) Wagner, A.; Fell, D. A. The small world inside large metabolic networks. Proc. R. Soc. London, Ser. B 2001, 268 (1478), 1803–1810.
(34) Goto, S.; Nishioka, T.; Kanehisa, M. LIGAND: chemical database of enzyme reactions. Nucleic Acids Res. 2000, 28 (1), 380–382.
(35) Goto, S.; Okuno, Y.; Hattori, M.; Nishioka, T.; Kanehisa, M. LIGAND: database of chemical compounds and reactions in biological pathways. Nucleic Acids Res. 2002, 30 (1), 402–404.
(36) Fidanza, A.; Audisio, M. Vitamins and lipid metabolism. Acta Vitaminol. Enzymol. 1982, 4, 105–14.
(37) Han, L. Physiology of Escherichia coli in batch and fed-batch cultures with special emphasis on amino acid and glucose metabolism. Ph.D. dissertation, Royal Institute of Technology, Stockholm, Sweden, 2002.
(38) Dandekar, T.; Schuster, S.; Snel, B.; Huynen, M.; Bork, P. Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochem. J. 1999, 343, 115–124.
(39) Huynen, M. A.; Dandekar, T.; Bork, P. Variation and evolution of the citric-acid cycle: a genomic perspective. Trends Microbiol. 1999, 7 (7), 281–291.
(40) Ma, H.-W.; Zhao, X.-M.; Yuan, Y.-J.; Zeng, A.-P. Decomposition of metabolic network into functional modules based on the global connectivity structure of reaction graph. Bioinformatics 2004, 20 (12), 1870–1876.
(41) Papin, J. A.; Reed, J. L.; Palsson, B. O. Hierarchical thinking in network biology: the unbiased modularization of biochemical networks. Trends Biochem. Sci. 2004, 29 (12), 641–647.

PR9006188