Behind Every Good Metabolite there is a Great Enzyme (and perhaps a structure)

Editorial
In the early days of modern molecular biology the focus of scientific research was to simplify problems as much as possible and study one gene/protein at a time [1]. Eventually, in part due to technological advances and in part due to necessity, groups of genes and proteins were studied and biochemical pathways were identified. Today, owing to a still greater number of technological advances, including the ability to store and analyze large amounts of data, it is possible to study everything at the same time. These "totality" studies gave birth to the fields of genomics, transcriptomics, proteomics, and metabolomics. In turn, the combined study of all these global analyses gave birth to the field of systems biology [2].
Another "totality" field brought to life by emerging technologies is structural genomics, an effort to determine the three-dimensional structure of every protein encoded in a genome. The Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a specialized structural genomics effort composed of academic (University of Washington), government (Pacific Northwest National Laboratory), not-for-profit (Seattle BioMed), and commercial (Emerald BioStructures) institutions that is funded by the National Institute of Allergy and Infectious Diseases (Federal Contract: HHSN272200700057C and HHSN27220120025C) to apply genome-scale approaches in solving protein structures from biodefense organisms, as well as those causing emerging and re-emerging disease [3,4]. The SSGCID's target selection strategy focuses on drug targets, essential enzymes, virulence factors, and vaccine candidates from a number of bacterial (Bartonella, Brucella, Ehrlichia, Anaplasma, Rickettsia, Burkholderia, Borrelia and Mycobacterium) and eukaryotic (Babesia, Cryptosporidium, Toxoplasma, Giardia, Entamoeba, Coccidioides and Encephalitozoon) pathogens, as well as ssDNA and negative-strand ssRNA viruses. Community input (in the form of target requests for entry into the structure determination pipeline) is actively solicited at http://www.ssgcid.org/home/Community.asp. Target genes are PCR amplified, cloned, and screened for expression in Escherichia coli. Soluble proteins are purified in milligram quantities and screened for crystallization, and crystals are analyzed by X-ray diffraction using in-house sources or off-site synchrotron beam-lines. Small proteins (<25 kDa) that fail to crystallize are queued for structure determination by NMR-based methods. All expression clones, purified proteins, and protein structures produced by SSGCID are freely available to the scientific community and lay the groundwork for research in more than 40 collaborative projects.
In five years over 60 manuscripts describing SSGCID structures and methods have been published, including an entire edition of Acta Crystallographica F (Sept 2011) devoted to SSGCID [5]. Over 540 structures have been deposited into the Protein Data Bank (PDB) by SSGCID. About one third of all SSGCID structures contain bound ligands, many of which are [...]
Many of the selected SSGCID targets are annotated enzymes from known metabolomic pathways essential to cellular vitality, suggesting that selectively "knocking out" one of the enzymes in an important pathway with a drug may be fatal to the organism. Because the active site can differ between close homologues [7], SSGCID selects identically annotated enzymes from several genera and species in order to increase the structural coverage of metabolomic pathways. One reason metabolomic pathways are important is the small molecules, or metabolites, produced at various steps in these pathways and identified by metabolomic studies. Unlike genomics, transcriptomics, and proteomics, which may be influenced by epigenetic, post-transcriptional, and post-translational modifications, respectively, the metabolites present in the cell at any one time represent downstream biochemical end products; metabolite profiles may therefore be most closely associated with a phenotype [8] and provide valuable information for infectious disease research. Metabolomic data would be even more useful if it could be linked to the vast amount of structural genomics data. Towards this goal SSGCID has created an automated website tool (http://apps.sbri.org/SSGCIDTargetStatus/Pathway) that assigns selected SSGCID target proteins to MetaCyc pathways (http://metacyc.org/).
To date, SSGCID has selected at least one target from 936 of the 1790+ pathways in the MetaCyc database. To illustrate some of the information present on the SSGCID-Pathway website, the tabulated information for PWY-4981 (Proline Biosynthesis II (from Arginine)) as of 10/21/12 is shown in Figure 1. This pathway contains five enzymes, and SSGCID has targeted at least one enzyme in this pathway from five different genera. For each genus, the number of enzymes selected, enzymes in PDB, enzymes solved by SSGCID in PDB, SSGCID clones available, and SSGCID proteins available are listed. Information on SSGCID's genus-specific coverage may be pulled down by selecting a genus, as shown for Burkholderia in Figure 2.
Here, the current status (updated weekly) of each target in the SSGCID structure determination pipeline is provided. Still more specific information for each enzyme may be pulled down by selecting the EC number to the left of the enzyme's name, as shown in Figure 3 for ornithine carbamoyltransferase (2.1.3.3).
Here, every clone generated by SSGCID is identified with an SSGCID ID number and the status of each clone is provided in more detail. For this enzyme, orthologues of the gene from seven different Burkholderia species were selected to increase the odds of success (B. pseudomallei, B. ambifaria, B. cenocepacia, B. multivorans, B. phymatum, B. thailandensis, B. vietnamiensis, B. xenovorans). Additional information may be pulled down by selecting the target SSGCID ID number. For example, if ButhA.00088.a is selected then the amino acid sequence, cloning information, and, in this case, the PDB ID number (4F2G) are obtained. Note that the available clones may be ordered free of charge from the BEIR repository (http://www.beiresources.org) and available proteins (left over primarily following crystal screening) may be obtained free of charge from SSGCID (www.SSGCIDproteins.org). To guide the elicitation of new targets not currently in the SSGCID pipeline for structure determination, there is a user-friendly "box" for the scientific community to request a protein's entry into the SSGCID pipeline, as illustrated in Figure 4.
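The per-genus tabulation described for the SSGCID-Pathway tables (enzymes selected, enzymes solved, clones available, proteins available) can be illustrated with a minimal Python sketch. It is not the website's actual implementation, which is not described here. The `Target` record, the `tabulate_by_genus` function, the placeholder EC label, the `HYPO.*` identifiers, and all availability flags are assumptions made up for illustration; only ButhA.00088.a, EC 2.1.3.3, PDB ID 4F2G, and the genus names come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical record for one SSGCID target (field names are assumptions)."""
    target_id: str
    genus: str
    ec_number: str
    pdb_id: Optional[str] = None      # set once a structure has been deposited
    clone_available: bool = False
    protein_available: bool = False


def tabulate_by_genus(pathway_ecs, targets):
    """Summarize, per genus, the targets whose EC number belongs to one pathway."""
    rows = defaultdict(
        lambda: {"selected": set(), "solved": set(), "clones": 0, "proteins": 0}
    )
    for t in targets:
        if t.ec_number not in pathway_ecs:
            continue  # target is not an enzyme of this pathway
        row = rows[t.genus]
        row["selected"].add(t.ec_number)
        if t.pdb_id is not None:
            row["solved"].add(t.ec_number)
        row["clones"] += int(t.clone_available)
        row["proteins"] += int(t.protein_available)
    return {
        genus: {
            "enzymes_selected": len(r["selected"]),   # distinct EC numbers
            "enzymes_solved": len(r["solved"]),       # distinct EC numbers with a PDB ID
            "clones_available": r["clones"],
            "proteins_available": r["proteins"],
        }
        for genus, r in rows.items()
    }


# Illustrative input. Only EC 2.1.3.3, ButhA.00088.a and 4F2G appear in the text;
# the second EC label, the HYPO.* targets and all availability flags are invented.
PATHWAY_ECS = {"2.1.3.3", "EC-PLACEHOLDER-2"}
TARGETS = [
    Target("ButhA.00088.a", "Burkholderia", "2.1.3.3", pdb_id="4F2G",
           clone_available=True, protein_available=True),
    Target("HYPO.00001.a", "Burkholderia", "2.1.3.3", clone_available=True),
    Target("HYPO.00002.a", "Mycobacterium", "EC-PLACEHOLDER-2"),
    Target("HYPO.00003.a", "Mycobacterium", "EC-NOT-IN-PATHWAY"),
]

table = tabulate_by_genus(PATHWAY_ECS, TARGETS)
for genus, counts in sorted(table.items()):
    print(genus, counts)
```

Counting distinct EC numbers rather than individual clones for the "selected" and "solved" columns reflects the fact that several orthologues of the same enzyme may be targeted within one genus; whether the website counts this way is itself an assumption.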
In summary, the rapidly progressing field of metabolomics is poised to make significant strides in identifying important metabolites generated by infectious disease organisms. The SSGCID-Pathway website tool represents a first big step towards linking metabolites and metabolic pathways to structural genomic data with the goal of accelerating the discovery of new agents to battle infectious diseases.