Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Denmark
Received date: February 17, 2015; Accepted date: April 01, 2015; Published date: April 06, 2015
Citation: Simonsen HT (2015) Elucidation of Terpenoid Biosynthesis in Non-model Plants Utilizing Transcriptomic Data. Next Generat Sequenc & Applic 2:111. doi:10.4172/2469-9853.1000111
Copyright: © 2015 Simonsen HT. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Transcriptome sequencing has become very cheap and is increasingly used for enzyme discovery in non-model plants, especially in plants for which no other genomic data are available (99.99% of all plants). This short commentary highlights the use of novel sequencing technologies in the characterization of terpenoid biosynthesis. It also shows how tissue-specific transcriptomics can be useful in this kind of research. Finally, a short overview of general procedures is given, together with the perspectives for transcriptomics in plant biochemistry.
Terpenoids; Transcriptomic; Non-model plants; Biochemistry
Terpenoids comprise the largest group of specialized plant metabolites, with tens of thousands of known structures. Terpenoids have key functions in cellular life (e.g. membrane fluidity, hormones and signalling compounds) in all kingdoms, but the vast majority of the characterized terpenoids are specialized plant metabolites that influence the fitness of the plant that biosynthesizes them. The specialized plant terpenoids serve as defence compounds, deterrents, and pollinator attractants. The role of a terpenoid has a large impact on where in the plant it is biosynthesized, and tissue-specific biosynthesis is an important feature to understand when utilizing transcriptomic data for biosynthesis discovery. Some terpenoids are synthesized in specialized structures such as trichomes, oil bodies or resin ducts [5,6], some are biosynthesized in general tissues such as fruits or roots, and others are produced throughout the plant. The largest subgroups of terpenoids are the sesqui- and diterpenoids. These subgroups contain molecules that are highly valuable and are used in a range of industrial and medicinal applications [9-12]. For the majority of the described terpenoids, the biosynthesis has not been described and no attempts to elucidate it have yet been made. Specialized terpenoid biosynthesis generally begins with a terpene synthase that utilizes a diphosphate substrate. This is often followed by cytochromes P450 (anywhere from one to tens of them) that decorate the backbone with different oxidations (alcohols, aldehydes, ketones and carboxylic acids) [13,14]. Following the cytochromes P450, enzyme groups such as alcohol dehydrogenases, reductases, and acyl transferases further decorate the molecule to form the final bioactive terpenoid. The diversity of terpenoids is mainly due to the huge diversity within the terpene synthases and cytochromes P450 that are involved in the biosynthesis.
Recently, the branching point in artemisinin biosynthesis (the reductase DBR2) has been shown to be the key regulatory point that determines the final product and thus the specific terpenoid profile of a given Artemisia annua variety. It is therefore also important for the terpenoid diversity within Artemisia.
Recent observations within sesquiterpenoid biosynthesis studies found that fewer than ten cytochromes P450 had been described as part of sesquiterpenoid biosynthesis, so far only within the CYP71 clan. This number is larger for diterpenoids, but excluding enzymes that are part of general metabolism, such as gibberellin biosynthesis, lowers it significantly and limits the described cytochromes to enzymes in the CYP71 and CYP85 clans [18,19]. Among the specialized terpenoids, the biosynthesis of the sesquiterpene lactone artemisinin is possibly the best studied [14,20,21], closely followed by that of the sesquiterpene lactone costunolide [22-26], with several diterpenoids from gymnosperms catching up through an increasing number of publications, especially from the group of Jörg Bohlmann [5,15,27-32]. The use of transcriptome data is greatly facilitating this work, and the number of papers utilizing this technology will increase dramatically in the coming years.
The terpene synthase enzyme family has been known for decades, well before next generation sequencing took off. In 1995, Joe Chappell, in one of the first reviews of terpenoid biosynthesis, included a section entitled “How little we know about terpenoid biosynthesis”. The same year, two papers were published on terpene synthases from plants, describing the biochemistry and thereby providing enzyme and transcript sequences that could be used in subsequent BLAST searches [34,35]. These and many successive studies provide the first baits for the initial searches in a newly sequenced plant transcriptome, and today several papers have described the biosynthesis of specific terpenoids, along with several on the general parts of terpenoid metabolism [17,36-38].
Within the sesquiterpenoids, artemisinin biosynthesis is the best described, and it was mostly elucidated before transcriptomics was available. Subsequently, it has been shown that all five genes involved can be found in the transcriptome of the leaf trichomes. This provided evidence that the trichomes are indeed the site of artemisinin biosynthesis, which has also been confirmed using GUS staining and promoter analysis of the genes involved. This has opened the way for the use of transcriptomics to study expression levels of known and unknown genes involved in this biosynthesis in specific tissues. The biosynthesis of other sesquiterpenoids has been fully or partly elucidated using transcriptomics. In 2009 and 2010, the transcriptomes of the root and fruit of Thapsia garganica and the root of Thapsia laciniata (syn. T. villosa) were sequenced. This showed the presence of several terpene synthase and CYP71 genes that could be involved in the biosynthesis of terpenoids, especially sesquiterpene lactones. The use of both 454 and Illumina sequencing, followed by thorough annotation of the sequenced genes, led to the discovery and characterization of the first step in the biosynthesis of thapsigargin from Thapsia garganica. Another example within the sesquiterpenoids is the characterization of the biosynthesis of santalol, the main component of sandalwood oil. Following transcriptomic analysis of the plant, the sesquiterpene synthase and the cytochrome P450 involved were discovered in 2008 and 2013 [41,42]. The characterization of the terpene synthases in 2008 did involve cDNA isolation and genome walking, since the transcriptomic data at that time did not provide the same depth and coverage as the later analysis. This is a general trend seen in non-model plants, and not only for terpenoid biosynthesis.
Chemical analysis of pine trees has revealed a stunning chemical diversity in the resin. Transcriptomic analysis revealed that this diversity is biosynthesized by only a few genes. The first genes were discovered using traditional degenerate primer design followed by genome walking, but transcriptomic analysis was later adopted. This has led to the description of several genes from numerous gymnosperms, including Sitka spruce, Abies balsamea and other conifers such as pines [5,27-32,44-47]. These studies have described the biochemical characterization of a range of terpene synthases, mainly found through BLAST searches that initially started with the general enzymes copalyl diphosphate synthase and/or ent-kaurene synthase. Subsequently, the BLAST searches included cytochromes P450, and this led to the description of e.g. the multifunctional cytochrome P450 CYP720B4, which is involved in the biosynthesis of a range of conifer defence compounds.
Other examples of the use of transcriptomics for the discovery of terpenoid biosynthesis include that of ginsenosides and saponins [48-50]. These examples are just some of those described in the literature, and they include a variety of genes that have been discovered and biochemically characterized. As the field develops, transcriptomic analysis will also feed into phylogenetic studies. This has been seen both within gene families and across plant species. Within gene families, the phylogeny of terpene synthases has been studied in several papers. This has led to a useful classification, annotated TPSa-h, that includes plants from bryophytes to angiosperms [16,39]. Within cytochromes P450, the use of phylogeny was established long before next generation sequencing and is continuously used in the annotation of these enzymes. The transcriptomic data also contain information that can be used in phylogenetic studies across species.
Lately, tissue- and development-specific transcriptomic analysis has proven to be a very effective tool in the discovery of new biosynthetic genes. In grapes, a developmental study revealed the presence of several new genes that could subsequently be characterized and shown to be involved in the biosynthesis of the peppery aroma of Shiraz wine. This study also revealed that the transcription of specific terpene synthases depended on the maturation state of the grape, which could be correlated with terpene content during grape maturation [54,55]. In another example, in Coleus plants, the use of laser dissection and imaging revealed that special cells in the root cork contained the diterpenoid compounds of interest. Transcriptomic analysis of these special cells revealed parts of the biosynthetic machinery for forskolin biosynthesis [4,9]. These studies show that combining transcriptomic analysis with modern dissection techniques significantly increases the likelihood of discovering enzymes that are part of a biosynthetic route when the study is focused on specific tissues at specific developmental stages.
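The idea of correlating transcript levels with terpene content across developmental stages can be illustrated with a minimal Python sketch. It is not the method used in the cited grape studies. The contig names, expression values and terpene measurements below are hypothetical, and a plain Pearson correlation is assumed as the ranking criterion.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_candidates(expression, metabolite):
    """Sort transcripts by how closely their expression tracks the metabolite."""
    scores = {name: pearson(values, metabolite)
              for name, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: one value per developmental stage, in the same order.
terpene_content = [0.1, 0.4, 1.2, 2.5, 3.0]
expression = {
    "contig_A": [1.0, 3.0, 9.0, 20.0, 24.0],     # rises with the terpene
    "contig_B": [15.0, 14.0, 16.0, 15.0, 14.0],  # roughly flat
    "contig_C": [30.0, 22.0, 10.0, 4.0, 2.0],    # falls as the terpene rises
}

ranked = rank_candidates(expression, terpene_content)
for name, r in ranked:
    print(f"{name}\t{r:+.2f}")
```

Transcripts at the top of such a ranking are only candidates. Co-expression with a metabolite does not demonstrate enzymatic function, which still requires biochemical characterization.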
Discovery of any new biosynthesis of small molecules should include the following steps in order to be truly successful and to utilize the full potential of the transcriptomic analysis. It is important first to establish thorough chemical knowledge of the plant, especially where and when the chemical constituents are found. The use of advanced metabolomics tools such as GC-MS and LC-MS is crucial to establish at what developmental stage the constituents are produced. Utilizing advanced microscopy techniques to establish the cellular location of the constituents will show which cells to target. Together, this will establish when and on what tissue to perform transcriptomic analysis in order to get the best coverage of biosynthetic genes. The first transcriptomic analysis should be based on tissues that clearly produce the compounds of interest. This can then be followed by studies that include stress induction of the producing tissues in order to enhance the biological understanding.
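As a small illustration of how chemical profiling can guide the choice of material for sequencing, the following Python sketch ranks tissue and stage combinations by the measured abundance of the compound of interest. The record layout, the sample names and the abundance values are assumptions made for the example, not data from any study.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    tissue: str
    stage: str
    abundance: float  # assumed: GC-MS or LC-MS peak area per unit fresh weight


def rank_samples(measurements, min_abundance=0.0):
    """Return samples that contain the compound, most abundant first."""
    producing = [m for m in measurements if m.abundance > min_abundance]
    return sorted(producing, key=lambda m: m.abundance, reverse=True)


# Hypothetical profiling results for one compound.
measurements = [
    Measurement("leaf", "young", 0.2),
    Measurement("root", "mature", 5.1),
    Measurement("fruit", "ripe", 3.4),
    Measurement("root", "seedling", 0.0),
]

ranked = rank_samples(measurements)
for m in ranked:
    print(f"{m.tissue}\t{m.stage}\t{m.abundance}")
```

The top-ranked samples would be the natural first choice for transcriptome sequencing, with non-producing samples kept in mind as possible contrasts in later expression studies.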
It is important to obtain very deep coverage of the RNA early on, since this will eventually prove very useful for sorting out chimeric and other misassemblies. Here, the latest developments within transcriptomics should be utilized (these techniques constantly improve; please consult technical reviews for the latest knowledge). A first deep transcriptomic dataset opens an avenue for later transcriptomic analysis of large sample sets, including time-course, developmental and stress studies. Collectively, all these studies will provide a solid database that includes knowledge of sequence data, expression level, and time of expression.
The obtained database can then be used for BLAST searches of gene families. Here the utilization of minimal datasets that are family specific can be very useful. Discovery of the genes of interest has to be followed by the design of primers that can confirm, by PCR on the original plant material, that the gene sequence is also found in the living plant. Only then should one order a synthetic clone of the gene of interest for biochemical characterization. The PCR confirms the in planta sequence, which is necessary because misassemblies do occur even with the current technologies and assembly algorithms.
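The BLAST step can be sketched as follows, assuming that family-specific bait sequences were searched against the assembled transcriptome and that the results were saved in the default 12-column tabular format of BLAST+ (`-outfmt 6`). The bait and contig names, the example rows and the e-value and alignment-length thresholds are illustrative assumptions, not recommended settings.

```python
import csv
import io

# Column order of the default BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def candidate_contigs(blast_tsv, max_evalue=1e-20, min_length=100):
    """Keep the best-scoring bait hit per contig that passes both thresholds.

    Queries are assumed to be the bait sequences and subjects the contigs.
    Returns a list of (contig, (bait, bitscore)), highest bit score first.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(blast_tsv.strip()),
                            fieldnames=BLAST6_FIELDS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        length = int(row["length"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue or length < min_length:
            continue  # weak or very short alignment, ignore
        contig = row["sseqid"]
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (row["qseqid"], bitscore)
    return sorted(best.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical hits; fields are joined with tabs to mimic a BLAST result file.
rows = [
    ("bait_TPS", "contig_12", "62.5", "540", "190", "6",
     "10", "549", "3", "542", "1e-150", "520"),
    ("bait_TPS", "contig_77", "35.0", "60", "39", "0",
     "100", "159", "1", "60", "2e-05", "45.1"),
    ("bait_P450", "contig_31", "55.1", "480", "210", "4",
     "5", "484", "12", "491", "3e-120", "410"),
    ("bait_TPS", "contig_31", "28.0", "150", "100", "5",
     "1", "150", "1", "150", "1e-25", "110"),
]
blast_tsv = "\n".join("\t".join(fields) for fields in rows)

hits = candidate_contigs(blast_tsv)
for contig, (bait, bitscore) in hits:
    print(f"{contig}\t{bait}\t{bitscore}")
```

Contigs that survive such a filter are still only assembled sequences, so the PCR confirmation on the original plant material described above remains necessary before a synthetic clone is ordered.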
With continuously falling prices for transcriptomic sequencing, increasing depth and coverage in the obtained datasets, and a continuously increasing amount of genomic data available online, the perspectives in this field are daunting. The lack of transcriptomic sequence data will never again be the bottleneck in biology and biochemistry. However, the lack of fast biochemical and physiological screening methods will be the bottleneck for decades to come. From just one transcriptomic analysis at € 2000, one will obtain enough RNA sequence information to keep tens of biochemistry postdocs and PhD students busy for many years. Utilization of tissue- and development-specific transcriptomics will significantly shorten the discovery time, as shown by both the grape and forskolin studies. Thus, chemical profiling and physiological studies of the plant prior to transcriptomic analysis are highly recommended to lower the amount of biochemical characterization needed in later studies.
It is foreseen that numerous studies will utilize transcriptomics throughout plant biology and that the field will explode in publications within the next couple of years.