Received date: August 02, 2010; Accepted date: August 22, 2011; Published date: August 25, 2011
Citation: Kikuchi J, Ogata Y, Shinozaki K (2011) ECOMICS: Ecosystem Trans-OMICS Tools and Methods for Complex Environmental Samples and Datasets. J Ecosys Ecograph S2:001. doi:10.4172/2157-7625.S2-001
Copyright: © 2011 Kikuchi J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Despite the apparent detachment of modern industrial societies from natural ecosystems, such systems continue to provide enormous and essential benefits in the form of ecosystem services, including potable water, clean air, and forest products and fisheries. Further, engineered ecosystems, but ecosystems nevertheless, provide the majority of agricultural and additional forestry products. Relationships between ecosystem services and human needs can be identified by connecting genomic information of component organisms with enzyme functions, metabolic pathways and resulting product chemicals such as biomass. Tools that advance the understanding of ecosystem function from the perspective of both environmental and metabolic systems are an important aspect of the emerging systems approach to biological science. We introduce here a web service designated “ECOMICS” to provide an omics approach to clarify such relationships. ECOMICS comprises the E-class web tool for taxonomic (metagenomic) classification based on prokaryotic and eukaryotic ribosomal sequences and selected functional (enzymatic) classification based on sequential domains, FT2DB for the digitization of NMR spectra for chemicals from metabolic to macromolecular phenotyping, Bm-Char for the chemical (macromolecular biomass) assignment of lignocellulose components, and HetMap for identifying and viewing correlations between heterogeneous trans-omics data sets that are produced by such web tools. This website is open to the public domain: https://database.riken.jp/ecomics/.
Ecosystem; Database; Environmental samples; NMR; DNA sequences
Much of the rapid industrial development of the previous century depended on intensive exploitation of underground resources. To avoid future resource depletion at a time when many developing countries are exploiting available resources, we need to consider the conversion from "oil-refinery" to "bio-refinery" as an innovative approach for sustainable industrial production based on primary industries. Biorefinery aims to extricate us from a society dependent upon underground resources, such as ore and crude oil, to develop successful skills to recycle materials from a variety of organisms, and to utilize such skills on a grand scale. It has been studied in many countries, especially in the United States, where biorefinery factory complexes have been constructed. In a biorefinery, i) photosynthetic organisms, such as trees, grasses, and algae, produce biomass by carbon metabolism through photosynthesis; ii) carbon dioxide, released from the soil since the industrial revolution, is fixed as biomass; and iii) biochar residue, after utilization of the biomass, is returned to the soil to achieve a carbon-negative balance and convert carbon dioxide, which may cause the greenhouse effect, into a usable renewed resource.
Photosynthetic organisms are effective resource crops in terms of untapped biomass utilization. For such utilization, it is critically important to understand how to cultivate resource crops and use them in production processes. In monocultural transformation, such as perennial grasslands transforming into cornfields, and tropical rainforest into oil palm plantations, although the transformation can be achieved in several years to decades, carbon originating from the soil is continuously released over a far longer timeframe, e.g., over 840 years, as estimated by Fargione et al. In biomass utilization, Campbell et al. reported that the generation of electricity by biomass combustion is more effective than bioethanol production by saccharification of lignocellulose. However, previous reports that estimated the influence of aerosol released by biomass combustion on cloud core production and global cooling must also be considered, as must economic effects, including health costs.
Biomass recycling is achieved by industrial and metabolic processes. Through industrial processes, a variety of biochar components, whose chemical structure and composition depend on the burning temperature, are produced as residue. In particular, well-selected biochar components may lead to the promotion of plant growth. Such application of biochar has allowed us to enter an era of advanced technology for analyzing complex organisms and chemical compounds, such as those found in soil and aerosol. In grasslands, perennial weeds as resource crops are grazed by herbivores, and the excretory products are biodegraded by insects and microorganisms in the soil. Such metabolic circulating systems have existed in nature since prehistoric times. If we apply such effective and functional information to industrial and metabolic processes related to resource crops, and if we regulate the processes efficiently, then we will succeed at creating a sustainable humanosphere.
The number of different species of organisms is over 100,000, even by underestimation, and it is estimated that we are only fully aware of approximately 1% of these. Genomes in such diverse organisms include the blueprints to acquire nutrients in various environments and survive using their metabolic systems. The blueprint, i.e., genomic DNA, contains the information required for synthesizing amino acids and then incorporating them into proteins. By specifying correct functional groups, such as amino acid side chains at different sites in a protein structure, a variety of molecular functions, which are carried out by the proteins and necessary for survival, is expressed.
Information science has had a marked influence on protein science. Along with structural genomics [10-12], nuclear magnetic resonance (NMR) technology, and instrument development in the field, global application of technological innovations in DNA sequencers has extended the grasp of our analysis into entirely new areas such as transcriptomics and metagenomics, and the present sophistication of retrieval techniques for mass data resources has had a tremendous impact in the area. In recent years, much of protein science has tended to focus on the interaction and binding affinity of candidate medicines with receptor proteins in studies of drug design, and these types of experiments are crucial in the discovery of applications. However, that constitutes only a small component of the total biological information resources at our disposal.
To give an example of a much more encompassing framework for visualizing this knowledge, we have focused on the metabolome and biomass. The metabolome is based on the proposition that the adjustment of balance between metabolites contributes to homeostasis [15,16]. Biomass is based on the physical principle that the conversion of energy into polysaccharides, lignins, lipids, etc. leads to the production of complex structures. However, much of the chemistry that would be necessary for effective utilization of unused biomass has not yet been described. Although woody biomass has been actively utilized because of its reasonable strength and refractoriness, exploitation has led to a decrease in soil fertility, deforestation, and groundwater supply. To avoid depletion of such resources, the improvement of monitoring and analysis technologies that are applicable to complex systems including plants, insects, and microorganisms in the soil is required. For such technologies, integrated applications and quantitative information retrieval (profiling) based on bioinformatics, stable isotopic labeling [17-19], NMR [20-24], and giga-sequencers are effective approaches that we now have at hand.
If we assume that strings of nucleotides (building blocks of genetic information) are primary biopolymers, and proteins (which carry out molecular mechanisms based on the translation of genetic information) are secondary biopolymers, then by an extension of this logic, biomass is composed additionally of polysaccharides as tertiary biopolymers (i.e., the product of proteins), and lignin as quaternary biopolymers (higher order biopolymers). Because biomass has low solubility and degradability, the development of molecular biological protocols for biomass research has been limited compared with what has been attained for proteins and nucleotides. To measure solid samples, scanning electron microscopy is used for analysis of surface structure, and solid-state NMR and Fourier transform infrared spectroscopy are used to gain information about internal chemical structure. For example, to utilize trees and grasses in biorefinery processes, it is important to optimize the decay of their crystalline structure by mechanical or ionic liquid pretreatment. To evaluate such decay, chemical shift values of C4 or C1 sites in cellulose, which may be measured by solid-state NMR, have been utilized as a marker of structural change.
Although physical properties that are characteristic of biomass are based on its macromolecular structure, higher resolution information acquired on its chemical composition leads to association of its physical properties with coordinates on a higher-order genetic blueprint. For example, genetic modification (GM) of the syringaldehyde:vanillin ratio, or compositional unit, of biomass has been attempted, especially in the United States, in order to increase its degradability, and there are extensive projects on strategic GM of biomass grasses and phenotypic profiling of their cell walls that have been described [25-29]. For such phenotypic profiling, it was shown that imidazole is effective at breaking hydrogen bonds in cellulose. However, it is still difficult to demarcate the chemical signals of hemicellulose, which shows broader chemical signals, and therefore, higher resolution will be necessary.
A similar strategy has been applied to the analysis of biochar and humic acids, which are utilized to return biorefinery residues to farmlands. However, chemical information on these macromolecules is insufficient to promote the utilization of unused biomass, and thus, technical developments for measurement, and informational developments based on next-generation sequencing (NGS) output, are required.
In analyses of small molecules, technical developments in metabolic profiling are continuously advancing, in areas such as mass spectrometry and NMR data processing. These developments have been reviewed by Kikuchi and Hirayama. In this paper, we introduce approaches we have considered to elicit information from extraction residue and unassigned chemical signals. NMR can be utilized for evaluating intact living tissue and low-solubility residue, as well as soluble extracts. Thus, we have developed a novel protocol of metabolite annotation to more effectively utilize plant resources. For this purpose, a systematic methodology to evaluate components of extracts and their residues and to detect changes in the components was required. We established this methodology by binning and matrix operation of two-dimensional (2-D) NMR data using 13C-labeled plants.
Information on chemical structure is as valuable as the material resources themselves. Therefore, we devised a quantitative index named "p-value" to evaluate metabolic profiling data, similar to the E-value of the BLAST program, which is the prevalent homology search engine, and we developed the SpinAssign program, which adopts a method using the p-value for metabolite profiling. By applying the program to 13C-labeled plants, we were able to acquire 221 metabolite candidates simultaneously. This methodology is a useful approach for databases of reference metabolite chemical shifts, and it is available at the SpinAssign website on the PRIMe server.
Recent advances in sequencing techniques to decode genomic and transcribed sequences have enabled us to acquire a vast amount of sequence information. The acquisition of genomic information leads to the regulation of biomass synthesis and degradation in order to utilize unused biomass and develop strategies for GM. Several equipment makers have developed next-generation sequencers (NGSs), such as the Genome Sequencer FLX system of Roche (http://www.454.com/), the 5500xl SOLiD system of Applied Biosystems (http://www.appliedbiosystems.com/), and the HiSeq 2000 system of Illumina (http://www.illumina.com/). Although the length of readable sequence obtained from such NGSs is still shorter than that routinely obtained by Sanger's method, the amount of available sequence information is rapidly increasing. A transition from decoding genomes of model organisms to decoding genomes of various organisms, at the level of individual variation, may be observed in research in the field. Since 2005, NGS data for genome decoding have been continuously and increasingly published [35-50], and this trend will continue and intensify.
In addition to genomic sequencing, sequencing using NGSs is applicable to transcriptomics [51-58], which evaluates quantitative gene expression, and to metagenomics [59-63], which obtains genomic sequences of whole microorganisms in a specific environment. In particular, gene expression analysis of practical organisms whose genomes are not yet decoded and metagenomic analysis of microorganism flora will provide the genetic information needed for utilization of unused biomass. In general, the biological phenomena involved in biomass synthesis and degradation include various elements, such as elements in the metagenome, metatranscriptome, meta-metabolome, etc. [64-69]. To reveal and understand such phenomena, a nonlinear approach, such as network analysis, will be useful.
As mentioned in the previous sections, rapid development of information science is revolutionizing the life sciences. However, omics research in model organisms, such as yeast and rice, tends toward "linear" analyses, in which each omics output is collected and analyzed without integration with other omics information. Recent advances in sequencing technology have promoted the shift from decoding genomic sequences of model organisms to profiling genomic information included in environmental and ecological samples, i.e., metagenomics. Figure 1 represents a hierarchical pyramid of various organisms including plants, animals, and microorganisms. Organisms interact not on the basis of "linear" but rather "nonlinear" relationships, as in a network or web, through a food chain and symbiotic relationships. Similarly, there is a hierarchy of molecules in a cell; it starts with genomic DNA, followed by RNA, protein, metabolites, and, finally, biomass supramolecules, such as lignocellulose. Organisms are also related to each other through their metabolic systems.
Figure 1: Lignocellulose located between macro-scaled and micro-scaled time-space regions. The left scheme represents the ecological pyramid: its time scale is year order and size scale is meter order. The right scheme represents omic space: its time scale is milli-second order and size scale is nano-meter order. Lignocellulose is accumulated and degraded in the ecosystem along the year order time scale and is formed as construction of the meter order size scale. It is, however, self-organized by providing composition materials under the genetic regulatory system, similar to other biomacromolecules.
The era of "planar" or integrative approaches is dawning in omics analysis. For example, the RIKEN Omics Research Platform, including NMR and Life Science Accelerator, is applicable to assembling information on life in the global environment, such as "population-omics" to extensively collect polymorphisms of sequences and phenotypes, and "eco-omics" to systematically collect information on interactions between environmentally-related organisms. Research based on techniques to utilize biomass requires high-throughput analyses of variability of non-model plants and of interactions between microorganisms, and, thus, datasets obtained from such high-throughput analyses are accumulating at an increasing rate. A database to provide such integrated information will be a model for next-generation information resources.
To apply an eco-omics approach to environmental samples, we developed the ECOMICS web services as a source of information and tools useful for trans-omics approaches in ecosystem and biomass research. ECOMICS is composed of i) the E-class web tool for taxonomical and functional classification of nucleotide and peptide sequence data (e.g., metagenomic datasets obtained from an NGS); ii) the FT2DB tool to digitize NMR spectra for downstream chemical phenotyping in quantitative analysis of chemical signals; iii) the Bm-Char webtool to assign biomass-related components (especially components of lignocelluloses, such as hemicellulose and lignin); and iv) the HetMap tool to integrate and view associations between heterogeneous trans-omics datasets (such as metagenomic or enzymatic data obtained from E-class, chemical signal data from FT2DB, and lignocellulosic data from Bm-Char).
E-class (Figure 2) runs a BLAST search of a large number of query sequences (e.g., a metagenomic dataset) against selected sequence databases and uses the results for taxonomic classification, based on small subunit ribosomal RNA (ssrRNA) of both prokaryotes and eukaryotes, and for protein domain classification, based on the enzymatic domains of carbohydrate-binding modules (CBM) and cytochrome P450 (CYP). The categorization of ssrRNA, CBM, and CYP is based on the public resources SILVA, CAZy, and Nelson, respectively. When a user submits a sequence dataset in multi-FASTA format, E-class provides a pie chart that represents the categorized distribution of query sequences, the BLAST output data, and the pie chart legend with the numbers of categorized sequences. We are planning to add databases for various protein functional analyses.
Figure 2: E-class for taxonomic and functional classification of an environmental sample. (A) The query form of E-class, involving the following 4 steps: 1) to input a sequence dataset or to upload a file in multi-FASTA format; 2) to select a database of interest, e.g., the 16S rRNA database for prokaryote taxonomic classification; 3) to select an output style (as of now, a pie chart or a bar chart); and 4) to select a taxonomic level when selecting rRNA as a database of interest. (B) An example of classification by E-class.
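The tallying step behind the pie chart legend can be sketched as follows. This is an illustration under stated assumptions, not the E-class implementation: it assumes BLAST tabular output (12 tab-separated columns, with the bit score in the last one), keeps the best hit per query by bit score, and maps subject identifiers to categories through a hypothetical lookup table (`subject_to_category`). All identifiers and category names in the example are made up.

```python
from collections import Counter


def best_hits(blast_tab_lines):
    """Return {query_id: subject_id}, keeping the highest bit score per query.

    Assumes the standard 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def category_distribution(hits, subject_to_category, query_ids):
    """Count queries per category; queries without a usable hit are 'unclassified'."""
    counts = Counter()
    for query in query_ids:
        subject = hits.get(query)
        counts[subject_to_category.get(subject, "unclassified")] += 1
    return counts


# Hypothetical example data (not taken from any real database).
example_lines = [
    "read1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180.0",
    "read1\tsubjB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120.0",
    "read2\tsubjB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150.0",
]
example_taxonomy = {"subjA": "Proteobacteria", "subjB": "Firmicutes"}
example_counts = category_distribution(
    best_hits(example_lines), example_taxonomy, ["read1", "read2", "read3"]
)
print(dict(example_counts))
```

The resulting counts correspond to the numbers shown in the chart legend; dividing each count by the number of query sequences would give the slice proportions.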
FT2DB (Figure 3) provides a web-based service and a downloadable suite of programs to produce a digitized data format of NMR spectra. FT2DB accepts "nmrPipe"-formatted data of 1- and 2-D NMR spectra. It produces binned data from an NMR spectrum and allows a user to specify the chemical shift region of interest. In the web service, it provides a pie chart representation of the overall distribution of chemical shift regions, which gives a quick overview for comparing different NMR spectra. Output data from FT2DB may serve as a query data set for HetMap.
Figure 3: FT2DB for generating tab-delimited binned text spectra. (A) A user can query multiple 1-D and 2-D NMR spectrum files with a specified region and number of binning divisions. (B) A user query results in binned text spectra and a quick overview of the queried spectra, each shown as a piece in a pie-chart view. The binned text spectra are easily transferred to editing software by simply copying and pasting the data.
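The binning idea can be illustrated with a minimal sketch for a 1-D spectrum. It assumes the spectrum has already been read into two parallel lists (chemical shift in ppm and intensity); reading nmrPipe-formatted files is not covered, and the function names `bin_spectrum` and `format_row` are hypothetical, not part of FT2DB.

```python
def bin_spectrum(ppm, intensity, ppm_min, ppm_max, n_bins):
    """Sum intensities into n_bins equal-width bins over [ppm_min, ppm_max].

    Points outside the requested region are ignored.
    """
    if n_bins <= 0 or ppm_max <= ppm_min:
        raise ValueError("need n_bins > 0 and ppm_max > ppm_min")
    width = (ppm_max - ppm_min) / n_bins
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if shift < ppm_min or shift > ppm_max:
            continue
        # A point exactly at ppm_max is placed in the last bin.
        index = min(int((shift - ppm_min) / width), n_bins - 1)
        bins[index] += value
    return bins


def format_row(sample_name, bins):
    """Render one binned spectrum as a tab-delimited text row."""
    return "\t".join([sample_name] + [f"{value:.3f}" for value in bins])


# Hypothetical example: six points binned into two bins over 0 to 4 ppm.
example_bins = bin_spectrum(
    ppm=[0.5, 1.5, 2.5, 3.5, 4.0, 5.0],
    intensity=[1, 2, 3, 4, 5, 6],
    ppm_min=0.0,
    ppm_max=4.0,
    n_bins=2,
)
print(format_row("sample_1", example_bins))
```

For a 2-D spectrum the same approach would apply along both chemical shift axes, giving a grid of bins that can be flattened into one row per sample.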
Bm-Char (Figure 4) is able to assign query chemical shifts to 88 known chemical signals of lignin and hemicellulose, including signals of 42 aromatic and 17 aliphatic sites in lignin, 26 hemicellulosic sites, and 3 uncategorized sites, as previously reported. Results are represented as a pie chart of categorized chemical signals, according to the assignment of known signals, with the numbers of the categorized signals included in the legend. As our assignment of chemical shifts related to lignocellulose improves, we plan to adopt the newly assigned signals into Bm-Char.
Figure 4: Bm-Char for characterizing the lignocellulosic component, the main biomass product. (A) The portal includes the Bm-Char table that represents relationships between 88 chemical group 1H-13C NMR signals of lignin and hemicellulose from plant macromolecular fractions, a picture of the plant sample, and structures of lignocellulose components. (B) Pie charts showing the retrieval results for 11 queried plant samples. Different plant species show compositional differences in lignin and hemicellulose. The charts are arranged according to the 2 axes of the main principal components based on information of the lignin aromatic region of the 2-D NMR spectra; i.e., tree-to-grass (horizontal) and syringyl-to-guaiacyl (vertical). The 3 tree species are located on the “tree” side of the quadrant, while all grasses are located on the “grass” side.
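One way to picture the assignment of query chemical shifts to known signals is a nearest-match search within a tolerance window on the 1H and 13C axes. The sketch below is an assumption about how such matching could work, not the Bm-Char algorithm; the tolerances are arbitrary and the reference entries are placeholders, not values from the Bm-Char table.

```python
from collections import Counter


def assign_peaks(query_peaks, reference, tol_h=0.03, tol_c=0.3):
    """Assign each (1H, 13C) query peak to the nearest reference signal.

    reference is a list of (name, category, ref_h, ref_c) tuples.
    A peak is left unassigned (None) when no reference signal lies
    within both tolerances.
    """
    assignments = []
    for h, c in query_peaks:
        best, best_dist = None, None
        for name, category, ref_h, ref_c in reference:
            dh, dc = abs(h - ref_h), abs(c - ref_c)
            if dh > tol_h or dc > tol_c:
                continue
            # Tolerance-scaled squared distance, so both axes weigh equally.
            dist = (dh / tol_h) ** 2 + (dc / tol_c) ** 2
            if best_dist is None or dist < best_dist:
                best, best_dist = (name, category), dist
        assignments.append(best)
    return assignments


def category_counts(assignments):
    """Count assigned peaks per category for a pie-chart style summary."""
    return Counter(a[1] if a else "unassigned" for a in assignments)


# Placeholder reference signals (illustrative values only).
example_reference = [
    ("ref_1", "lignin aromatic", 7.00, 110.0),
    ("ref_2", "hemicellulose", 4.50, 100.0),
]
example_queries = [(7.01, 110.1), (4.49, 100.2), (1.00, 20.0)]
example_assignments = assign_peaks(example_queries, example_reference)
print(category_counts(example_assignments))
```

Scaling each axis by its tolerance keeps the much wider 13C shift range from dominating the distance.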
HetMap is an integrated service that calculates a correlation matrix of heterogeneous datasets, including the common multiple variables (such as datasets that originate from a common environmental sample), that are obtained through the E-class, FT2DB, and Bm-Char analyses. It depicts a correlation matrix based on query datasets as a heat map, in which each cell represents a correlation of a pair of elements. The colors in the heat map correspond to the correlation coefficient (i.e., red and blue cells represent positive and negative correlation, respectively). HetMap enables the user to color cells according to statistically significant correlation coefficients.
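The correlation matrix described above can be sketched with Pearson correlation over variables that share the same set of samples. This is a minimal illustration, not the HetMap code: the choice of Pearson correlation is an assumption, the variable names are hypothetical, and the fixed threshold on |r| in `cell_color` is a stand-in for a proper significance test.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(variables):
    """Return (names, matrix) for a dict mapping variable name to per-sample values."""
    names = list(variables)
    matrix = [[pearson(variables[a], variables[b]) for b in names] for a in names]
    return names, matrix


def cell_color(r, threshold=0.8):
    """Red for strong positive, blue for strong negative, blank otherwise."""
    if math.isnan(r) or abs(r) < threshold:
        return ""
    return "red" if r > 0 else "blue"


# Hypothetical variables measured on the same four samples.
example_variables = {
    "taxon_X_count": [1, 2, 3, 4],    # e.g., from a sequence classification
    "nmr_bin_3ppm": [2, 4, 6, 8],     # e.g., from a binned spectrum
    "lignin_signal": [4, 3, 2, 1],    # e.g., from a biomass assignment
}
example_names, example_matrix = correlation_matrix(example_variables)
for name, row in zip(example_names, example_matrix):
    print(name, [cell_color(r) for r in row])
```

Each variable must list its values in the same sample order, since the correlation is computed across the shared samples.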
A suite of the ECOMICS web services is applicable to datasets acquired from an environmental sample, in which various organisms, proteins, metabolites, and biomass coexist and interrelate in terms of particular processes, many of which are still unknown or only poorly described. We believe that these tools will provide a useful approach to understand environmental processes at a higher level. All software and web services are freely available at the ECOMICS web server (https://database.riken.jp/ecomics/).
This research was supported in part by Grants-in-Aid for Scientific Research for challenging exploratory research (J.K.) and Scientific Research (A) (J.K.) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan. This work was also supported, in part, by grants from Research and Development Program for New Bio-industry Initiatives of the Bio-oriented Technology Research Advancement Institution (BRAIN) (J.K.), and the New Energy and Industrial Technology Development Organization (NEDO) (J.K.).