Vijeta Sharma, Manjul Tripathi and KJ Mukherjee*
School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
Received date: May 14, 2016; Accepted date: June 22, 2016; Published date: June 27, 2016
Citation: Sharma V, Tripathi M, Mukherjee KJ (2016) Application of System Biology Tools for the Design of Improved Chinese Hamster Ovary Cell Expression Platforms. J Bioprocess Biotech 6: 284. doi:10.4172/2155-9821.1000284
Copyright: © 2016 Sharma V, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
CHO cells dominate other expression systems in the large-scale production of complex therapeutic proteins. There is thus a need to design superior cell lines with improved product quality and enhanced expression levels, which requires an understanding of the cellular components and their interactions from a ‘systems’ point of view. With the emergence of critical ‘omics’ data sets for CHO cells, which include transcriptomics, proteomics, metabolomics, fluxomics and glycomics, some clarity has emerged on the global regulatory mechanisms that control protein overexpression. Integrating this vast amount of information with bioprocess data can help identify significant targets for the cellular modifications required for hyperproduction. In mammalian systems, the information flow from genes to phenotype is mediated by complex regulatory networks, and mathematical modeling that incorporates this framework would also assist in identifying crucial targets for modification. This review covers recent advancements in OMICS technologies and the synergistic use of these platforms for designing improved cell lines.
CHO; Modeling; OMICS; Recombinant protein expression; System biology
CHO: Chinese Hamster Ovary; mAb: Monoclonal Antibody; GC-MS: Gas Chromatography Mass Spectroscopy; LC: Liquid Chromatography; GS: Glutamine Synthetase.
CHO cells are the favored hosts for scaling up production of complex recombinant proteins of therapeutic importance. Examples include therapeutic biologics like Factor VIII, tPA and erythropoietin, and antibodies like Rituximab and Bevacizumab. As per the latest estimates, CHO cells are used for the production of more than 70% of all therapeutic biologics. As such, they are amongst the most well characterized cell lines, with well-established protocols for genetic manipulation. With the growing market for biosimilars, improved cell lines are required that address issues of product quality, such as consistency of glycosylation patterns over different stages of production, along with higher product yields and specific productivity. An improved understanding of cellular dynamics would allow us to design hosts with improved product formation kinetics, promote growth to high cell densities and sustain expression for longer periods without any decline in product quality. To unravel the key nodes of the cellular machinery that can lead to hyperproduction, a better understanding of the complex genetic circuits of mammalian cells is required. Protein production is a complex process consisting of five major steps: transcription, RNA splicing, translation, folding and post-translational modification. Additionally, recombinant protein synthesis triggers a cellular stress response, which regulates these steps and also plays a significant role in deciding the physiological status of the cell. Hence, for high-level production of consistently good quality proteins, we must investigate all these steps from a systems perspective, including their synchronized responses and the regulatory mechanisms involved during growth and product formation (Figure 1).
Recent reviews have highlighted the importance of OMICS studies in the context of mammalian cells as production hosts [2-4]. In this review we provide an overview of the OMICS studies that have been carried out, given their usefulness in both metabolic modeling and the design of superior CHO cell platforms. We also underline the need to focus on the interactome, emphasizing its role in providing leads for rational cell design.
High-throughput genome-based technologies have improved our understanding of cellular behavior at the systems level. With the advent of various OMICS techniques, it is now possible to simultaneously monitor multiple levels of the cellular response and its dynamics. Integrating and correlating these data can help in analyzing and reconstructing the metabolic and regulatory networks of the cell using the tools of computational biology. This in turn can provide new targets for genetic modification and for the generation of superior host cell lines to aid upstream development. Systems biology utilizes various OMICS tools, as explained below.
Metabolic engineering of CHO cells has lagged because of the lack of comprehensive genetic information. This genomic information is required for the efficient utilization of the resources generated by transcriptome, proteome and metabolome analysis. The tools used for genomic analysis of CHO cells are mostly Illumina-based sequencing, de novo sequencing and Next Generation Sequencing (NGS).
Due to the recent availability of genomic information, researchers at the Manchester Centre for Integrative Systems Biology (MCISB) have reconstructed a metabolic model for CHO cells from the genome sequence and literature under the BioPreDyn project; the model is highly annotated and is updated regularly. This metabolic model can be used to understand cellular behavior under different culture conditions. By comparing the human genome with the CHO genome, Xu et al. were able to match 99% of the human glycosylation genes among the 24,383 predicted genes in CHO, although most of them were not expressed in the exponential phase of growth. This information could be useful in designing hosts with improved glycosylation patterns. Le et al. used all the available CHO genome references, including the CHO K1 genome sequences released in 2012 (NCBI Accession: GCF_000223135.1) and 2014 (NCBI Accession: GCF_000419365.1), to analyze RNAseq data using TopHat2 version 2.0.13 and obtained an 89.7% mapping rate, a significant improvement over the earlier value of 73.5%. Based on this information, Goudar et al. developed an RNAseq analysis tool on the R platform to analyze the large amounts of data generated by comparative transcriptomics and extract meaningful biological information. An important discovery is that CHO cells are prone to genetic rearrangements during the various steps of cell line development [12-14]. Kaas et al. sequenced CHO DXB11 cell lines and found a large number of haploid genes as well as specific areas of instability in each chromosome. Modifications in these regions have a critical impact on expression, viable cell density and, most importantly, product quality. Thus quality assurance programs need to focus on sequencing every developed cell line using the known Chinese Hamster genome sequence as a standard. Since chromosomes one and four were found to be more stable, they should be the preferred targets for knock-in of heterologous genes.
To identify the reasons behind the low productivity of coagulation factor VIII, Kaas et al. used deep sequencing and found that, because of transcript truncation, no correlation exists between RNA quantity and protein productivity. This discovery would not have been possible using transcriptomic data analysis alone.
The mRNA pool in the cell gives a dynamic snapshot of the differential levels of gene expression. Transcriptomic profiling allows us to monitor the changes in expression patterns over time and thus provides a qualitative as well as quantitative characterization of the cellular physiological response. Microarray and RNA sequencing techniques have made it possible to quantify this expression pool in a single assay [17-20]. These techniques have been used extensively in bioprocess optimization and in the analysis of industrial scale production. Unlike bacteria, mammalian cells do not show a rapid response to environmental factors, which makes sense from an evolutionary perspective. In a typical mammalian cell, around 15,000 genes are transcribed at any given time, of which only 10% are well expressed. There is very little change in the expression levels of these well expressed genes in microarray data, and most of these changes fall below the threshold of statistical significance. Only rare genes, for example regulators of gene expression, tissue specific genes, death inducing genes and genes related to the unfolded protein response, have been found to be differentially expressed under stressed conditions. However, these genes have a very high impact on cell physiology, and the changes in their expression levels need close monitoring. Thus biological information needs to be integrated into a statistical two-step analysis, by first filtering out the abundant genes and then looking at the differential expression of the low expressing rare genes. Improved normalization techniques are now available which do not make assumptions about “lack of variation” between gene sets, thus allowing us to capture smaller changes in expression levels.
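The two-step analysis described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a published pipeline: the function name `two_step_filter`, the abundance cutoff, the fold-change threshold, the pseudocount of 1, the gene labels and all expression values are hypothetical, and a real analysis would add replicate-aware statistics and multiple-testing correction.

```python
import math
from statistics import mean


def two_step_filter(expr_control, expr_stress, abundance_cutoff, min_abs_log2fc):
    """Return {gene: log2 fold change} for rare genes that change under stress.

    expr_control / expr_stress: dict mapping gene -> list of replicate
    expression values on a linear scale (hypothetical units).
    """
    hits = {}
    for gene, control_values in expr_control.items():
        control_mean = mean(control_values)
        stress_mean = mean(expr_stress[gene])

        # Step 1: filter out abundant genes, whose small relative changes
        # would otherwise dominate the statistics.
        if max(control_mean, stress_mean) >= abundance_cutoff:
            continue

        # Step 2: differential expression of the remaining rare genes.
        # The pseudocount of 1 (an assumption) avoids division by zero.
        log2fc = math.log2((stress_mean + 1.0) / (control_mean + 1.0))
        if abs(log2fc) >= min_abs_log2fc:
            hits[gene] = log2fc
    return hits


# Made-up example data: one abundant gene and two rare genes.
control = {
    "abundant_gene_1": [5000, 5200],
    "rare_upr_gene": [10, 14],
    "rare_gene_2": [20, 22],
}
stress = {
    "abundant_gene_1": [9000, 9400],
    "rare_upr_gene": [47, 49],
    "rare_gene_2": [21, 23],
}

hits = two_step_filter(control, stress, abundance_cutoff=1000, min_abs_log2fc=1.0)
print(hits)
```

With these made-up numbers the abundant gene is discarded in step 1 even though its level nearly doubles, and only the rare gene with a large relative change is reported.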
Clarke et al. developed a statistical modeling technique utilizing partial least squares and regression methods to predict the specific productivity of CHO cells from gene expression data obtained by microarray experiments. Using transcriptome data of batch and fed batch cultures, Wong et al. found four novel early-inducing targets for genetic modification against apoptosis. Schaub et al. compared the expression profiles of low and high titre fed batch processes of IgG producing CHO cell lines and, by a detailed analysis of their metabolism, derived a rational procedure for media design. To understand how screening for enhanced Methotrexate resistance generates overproducers, Viswanathan et al. looked at the transcriptome and found that the increase in productivity was due not only to gene amplification but also to an increase in the secretory activity of the cells. The results emphasize the need to design improved secretory systems in CHO cell lines, which could improve expression levels. In a meta-analysis of transcriptomic data it was observed that the transcriptome pool differed significantly between variants of CHO cell lines. With the available genome sequence, Becker et al. have developed a global transcriptome analysis platform exclusively for CHO cells called the CHO14K microarray. This consists of 41,304 probes covering 94.85% of the CHO genome. Harreither et al. used comparative microarray profiling of host cell subclones pre-selected on the basis of their production profiles. They used transient expression to circumvent the problems associated with changes in expression levels due to different sites of integration and found that the specific productivity was an inherent characteristic of the native cell line. These results underscore the importance of genome level modifications in creating hosts with higher expression capability.
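To make the idea of predicting specific productivity from expression data concrete, the following is a minimal sketch of a one-component partial least squares regression (PLS1) in plain Python. It is not the model of Clarke et al.; the function names, the two hypothetical genes, the expression values and the productivity values are all assumptions chosen so that the arithmetic is easy to follow.

```python
from math import sqrt


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model.

    X: list of samples, each a list of gene expression values.
    y: list of specific productivities, one per sample.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    yc = [value - y_mean for value in y]                        # centered y

    # Weight vector: direction in gene space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(value * value for value in w))
    if norm == 0.0:
        raise ValueError("expression data carry no covariance with y")
    w = [value / norm for value in w]

    # Scores (projection of each sample on w) and regression of y on them.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    b = sum(t[i] * yc[i] for i in range(n)) / sum(value * value for value in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "b": b}


def predict(model, x):
    """Predict specific productivity for one new expression profile."""
    score = sum(
        (x[j] - model["x_mean"][j]) * model["w"][j] for j in range(len(x))
    )
    return model["y_mean"] + model["b"] * score


# Made-up training set: three clones, two perfectly correlated genes.
X_train = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
qp_train = [3.0, 5.0, 7.0]  # hypothetical specific productivities

model = fit_pls1_one_component(X_train, qp_train)
predicted = predict(model, [4.0, 4.0])
print(predicted)
```

The two genes are deliberately collinear: ordinary least squares has no unique solution in that case, whereas the PLS component still gives a stable prediction, which is one reason such methods suit expression data with many correlated genes and few samples.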
Very little correlation has been found between the levels of the transcriptome and the proteome, suggesting that the efficiency of translation is non-linear. There is therefore a need to investigate translational regulation by profiling the whole polysome loaded mRNA of the cells, referred to as the translatome. By correlating transcriptome and translatome data, Courtes et al. were able to show that transcription and translation were uncoupled for 95% of the genes and simultaneously identified highly stable translated genes in DG44 cells. In future, translatome data can be used to identify the effect of codon usage on translational efficiency and to design translationally efficient vectors. This would also help in the study of translational regulation during stress and in the identification of crucial differences between highly expressing and poorly expressing clones. Recently, using a slightly different approach, Patrucco et al. achieved targeted enhancement of protein translation using synthetic SINEUP noncoding RNAs. These RNAs partially overlap sense coding mRNAs; by targeting the sequence encoding the secretory leader peptide, they enhance expression of secretory proteins.
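One simple way to picture what "uncoupled" means when transcriptome and translatome data are compared is sketched below. This is an illustrative assumption, not the procedure of Courtes et al.: the function name `classify_coupling`, the threshold of one log2 unit, the gene labels and the fold-change values are all hypothetical.

```python
def classify_coupling(total_log2fc, polysome_log2fc, threshold=1.0):
    """Label each gene by comparing total and polysome-bound mRNA changes.

    total_log2fc:    dict gene -> log2 fold change in total mRNA
    polysome_log2fc: dict gene -> log2 fold change in polysome-loaded mRNA
    """
    labels = {}
    for gene, transcription_change in total_log2fc.items():
        translation_change = polysome_log2fc[gene]
        transcription_moved = abs(transcription_change) >= threshold
        translation_moved = abs(translation_change) >= threshold
        same_direction = (transcription_change > 0) == (translation_change > 0)

        if transcription_moved and translation_moved and same_direction:
            labels[gene] = "coupled"      # both levels change together
        elif transcription_moved or translation_moved:
            labels[gene] = "uncoupled"    # only one level changes, or they oppose
        else:
            labels[gene] = "unchanged"
    return labels


# Made-up log2 fold changes for four hypothetical genes.
total = {"gene_a": 2.0, "gene_b": 1.5, "gene_c": 0.1, "gene_d": -1.2}
polysome = {"gene_a": 1.8, "gene_b": 0.2, "gene_c": 0.0, "gene_d": 1.1}

labels = classify_coupling(total, polysome)
print(labels)
```

A fixed threshold is the crudest possible choice; it is used here only to keep the sketch readable.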
Non-coding small RNAs, specifically miRNAs, are important regulators of cell phenotypes like growth, resistance against stress and protein production. A single miRNA has the ability to bind to more than 100 unique mRNA molecules, and thus miRNAs have a major role in post-transcriptional regulation. miRNA composition depends on environmental factors and is cell line specific. miRNAs thus play an important role in cell physiology and can affect parameters like growth, apoptosis, autophagy, stress responses, metabolic activity and protein secretion. By using them for cell engineering, we can improve production levels by alleviating the cellular stress response. About 400 conserved miRNAs have been found to date in CHO cells.
Hackl et al. identified and annotated the miRNAome of six cell lines using Next Generation Sequencing to identify the most conserved miRNA pool. Kelly et al. used a miRNA sponge decoy vector to deplete the growth inhibiting miR-34 family in growing cells, thereby increasing productivity ~2 fold. Strotbek et al. identified 9 miRNAs which positively affect protein expression, of which miR-557 and miR-1287 were found to increase both viable cell density and specific productivity, thus increasing product titers 1.3 fold. Druz et al. [36,37] found that a novel microRNA, mmu-miR-466h-5p, is activated by glucose depletion through oxidative stress and inhibition of histone deacetylation and affects apoptosis regulation in CHO cells. Stable inhibition of this miRNA improved apoptosis resistance and protein production in CHO cells. Jadav et al. found that overexpression of miRNA-17 enhanced viable cell count and increased the specific productivity 2 fold. miR-23 depleted CHO cells were found to be more efficient in oxidative metabolism due to a 30% enhancement of electron transport system activity. The successes and future challenges associated with this strategy have been reviewed recently.
The proteome pool of the cell provides the most direct estimate of the metabolic fluxes that govern the cellular phenotype. The dynamics of the proteome reflect the physiological changes taking place during the process, giving an overview of the active metabolic pathways and regulatory processes. A comparison of the proteomes of different cell lines or processes can provide a window to the unique possibilities for genetic modification. A proteome analysis thus offers innumerable targets for generating better hosts for bio-therapeutic production. Advancements in high throughput mass spectrometry and the availability of genomic sequences for CHO have made protein identification much easier. Construction of the proteome database for CHO is still in progress, and RNA-Seq based transcriptome profiling holds the promise of enriching this database.
Some of the proteomic studies which have led to improved cell lines are summarized in Table 1. These studies have provided rational guidelines for the modifications required to sustain productivity, increase cell growth, improve media utilization and give consistent glycosylation patterns. Strategies which help in channeling the metabolic flux towards product formation and delay apoptosis have also been implemented. The application of proteomics for designing better cell lines has been recently reviewed.
| Cell line | Goal | Results and future directions | Reference |
|---|---|---|---|
| CHO | To find the changes due to exposure of CHO cells to butyrate and zinc sulphate in the production media | Many proteins were induced, including enolase and thioredoxin | |
| rCHO producing antibody | To find the proteomic changes due to overexpression of the Bcl-xL gene and to find biomarker(s) for better growth | Galectin-1, a growth inhibitor, was found to be downregulated upon Bcl-xL introduction. Proteins required for the protein synthesis pathway were upregulated and those related to the cell cycle were differentially regulated | [43,44] |
| CHO | To understand the mechanism by which cMyc overexpression increases integrated viable cell count | Proteins related to cell proliferation, protein biosynthesis and energy metabolism were upregulated and those related to cell adhesion were downregulated | |
| CHO-DG44 | To prepare a proteome database for CHO-DG44 cells | Out of 1400 spots, 179 were annotated successfully | |
| rCHO producing erythropoietin | To study intracellular responses to serum free adaptation | Two proteins found to be upregulated, HSP60 and HSC70, were used to engineer robust cells with enhanced cell concentration and reduced adaptation time to serum free medium (SFM) | |
| rCHO producing antibody | To understand the cellular basis of the increase in specific growth rate on addition of hydrolysates | Proteins for cell proliferation, metabolism and cytoskeleton formation were upregulated | |
| rCHO K1 producing antibody | To understand the molecular basis of sustained productivity | 12 differentially regulated proteins were annotated; these can be used to engineer cells for an increased specific product formation rate | |
| rCHO producing anti-Rhesus D factor | To study apoptosis at various phases of cultivation | 79 proteins were found to be differentially expressed, of which 30 were characterized. These can help control apoptosis | |
| CHO producing SEAP | To find the biological explanation for the decrease in growth and increase in productivity due to dysregulation of miR-7 | Stathmin and catalase were found to be possible targets of miR-7; the study can be used to channel cellular energy towards higher productivity | |
| Chinese Hamster Ovary | To complement genomic sequences with the cellular proteome, secretome and glycoproteome | A total of 6164 proteins were identified, of which 2475 had no detected transcript | |
| Chinese Hamster Ovary (GS- cell line) | Proteome comparison of high and low antibody producing cell lines | 2890 proteins were detected, of which 277 were found to be modulated by culture conditions. Only 12 of them showed consistent modulation | |
| Chinese Hamster Ovary (DG44) | To find the biological basis of the increase in specific productivity but decrease in IVCC at higher glucose concentrations | The increase in productivity was due to upregulation of NCK1 and downregulation of PRKRA, generating ER stress that resulted in cell death and a decrease in IVCC | |
Table 1: Proteomics for mammalian bioprocess.
Secretome analysis focuses only on the set of secreted proteins present in the supernatant of growing cultures. These include growth regulating factors, proteolytic enzymes and other small proteins and peptides that affect cell physiology as well as the quality and quantity of the recombinant protein product (Figure 2). Lim et al. studied the secretome of CHO cells and found 260 secreted proteins. They identified 8 unique growth factors which, when supplemented in various combinations into protein free basal media, improved the growth of single cell transfectants by 30%.
A major problem with secretome analysis is the large variation in the results, which depends on the cell line being used as well as on media composition. More importantly, the secretome profile changes significantly as the cells go through different stages of growth. This prevents us from gaining a deeper insight into the critical factors, insight that is needed to generate a cell line which can sustain product formation for longer periods and help reduce purification costs. The challenges associated with secretome studies have been reviewed by Chaudhuri et al.
A major problem with maintaining product quality is the variation in the glycosylation pattern with the phase of cell growth, the culture environment and the number of passages undergone by the cells [58-62]. Glycosylation is a template independent process which lacks proofreading and has no direct relationship with the transcriptome and proteome. Independent studies are therefore required on the glycome changes associated with changes in the extracellular environment and with the various phases of growth, which in turn depend on the bioprocess strategy used for production. To get consistent and proper glycosylation, a better understanding of the process is required. Amand et al. used a mathematical model incorporating regulatory constraints and found that glycosylation can be controlled intrinsically by the cells. This information can be used to engineer cells for glycosylation control during biologics manufacturing. North et al. compared the glycomes of normal CHO cells with those of their glycosylation mutants to study the effect of these mutations and discovered diverse patterns of glycosylation. This type of study can help in designing strategies for glycosylation related cell engineering. Baik et al. were able to produce heparin, which is otherwise obtained from animal tissues, by metabolic engineering of CHO cells. They modified the enzymology of glycosylation in the pathway that synthesizes heparan sulfate (a less sulfated glycosaminoglycan), which is normally present in CHO, using human N-deacetylase/N-sulfotransferase and mouse heparan sulfate 3-O-sulfotransferase enzymes. Wong et al. overexpressed the CMP-sialic acid transporter, which enhanced the sialylation of IFNγ glycoprotein by 4 to 16%. In a separate study they looked at the effect of nucleotide sugar precursors as feed on the intracellular glycome and found that differential feeding can be used as a strategy for manipulating glycan quality.
With the advent of techniques like glycan microarrays, mass spectrometry and lectin microarrays, it is now possible to quantitatively and qualitatively analyze the glycome profile and how it changes under different culture conditions. This in turn can be used to understand, and hence modify, the glycosylation pattern using different cellular engineering approaches and/or by changing the culture conditions. This would help in identifying the best conditions for improved and consistent glycosylation of therapeutic proteins.
The recent shift towards utilizing metabolomic tools for analyzing cell cultures has helped in enhancing recombinant protein yields at large scale as well as in obtaining a product of consistent quality. Metabolic footprinting and fingerprinting are two essential components of systems biology that are not yet fully developed for cell culture. One reason is the presence of many compartments within the eukaryotic cell, which makes it harder to distinguish the metabolome of the cytoplasm from that of the cell organelles. The second problem is that metabolome quantification requires complex sampling protocols with multiple processing steps. Also, mammalian cells, being fragile, are prone to metabolite leakage. Mataszczyk et al. used subcellular fractionation with digitonin to distinguish between the metabolite pools of the cytosol and the subcellular compartments and were able to unveil the interplay between them. Simplified quenching and extraction procedures which could improve the quality of data have recently been reported, as reviewed by Leon et al.
Metabolomics and in silico modeling of a CHO fed batch culture were used by Selvarasu et al. to characterize the metabolic changes that take place when cells shift from growth to a non-growth phase. Ghorbaniaghdam et al. investigated the metabolome of antibody producing inducible cell lines and developed a kinetic model that integrated metabolic regulation. This model could analyze and describe the metabolic differences associated with clonal variants. The information gained from this model could help in clone selection as well as bioprocess development. Cocom et al. used metabolomic analysis to find the difference in amino acid consumption between naïve and recombinant CHO cells producing monoclonal antibodies. The information gained from this consumption analysis can be used to design better feeding strategies. Zang et al. used an LC-MS based metabolomics methodology to find the changes in media composition due to prolonged storage and identified riboflavin photosensitized degradation of tryptophan as the cause behind slow growth of cells. Comparative metabolomic profiling of cells grown under different bioprocess conditions could provide a deeper insight into cellular physiology, but this research is still at a nascent stage. Sellick et al. used GC-MS based metabolite profiling of GS-CHO cell lines to identify pro-survival and pro-productivity metabolites and found a clear correlation between the metabolite patterns and cellular state. Le et al. used time course metabolome analysis with GC-MS and LC-MS to quantify the amount of noise introduced during metabolome studies of mammalian bioprocesses. They concluded that statistical corrections are required at each stage in order to derive accurate dynamics of intracellular metabolism. This can help both in the identification of distinct gene targets and in bioprocess optimization.
Fluxomics or metabolic flux analysis
Since metabolic fluxes indicate the functional rate of a reaction, they are considered to be the final determinant of cell physiology. The integrated phenotype of genome, transcriptome and proteome finally gets reflected by the fluxes of metabolites in the cellular reaction pool [77-79]. The metabolism of mammalian cells changes drastically in a bioreactor; therefore it is crucial to understand the dynamic flux pattern in a production environment. Till now there is very little published data about the metabolic fluxes and regulations of intracellular metabolism during large scale cell culture possibly because most of the work has been done in industrial R&D settings.
There are two methods of flux estimation: isotope tracer studies [80-82] and metabolite balancing. Metabolite balancing is a steady state flux estimation method. Nolan and Lee used simulations with kinetic rate expressions to determine pseudo-steady state flux distributions in CHO cells and generated a dynamic model of metabolism. Ahn and Antoniewicz have reviewed in detail the application of metabolic flux analysis to CHO cell culture and the significance of dynamic flux estimation. Zamorano et al. used dynamic MFA to compute the macroscopic reaction rates during each phase of batch culture of CHO-320 cells.
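Metabolite balancing rests on the steady state condition S·v = 0, where S is the stoichiometric matrix and v the vector of fluxes: once enough exchange fluxes are measured, the remaining intracellular fluxes follow from a linear system. The sketch below illustrates this on a deliberately tiny, hypothetical network (glucose uptake, glycolysis to two pyruvate, lactate secretion and pyruvate entry into the TCA cycle). The reaction names, stoichiometry and measured rates are assumptions for illustration, and the solver handles only the exactly determined case.

```python
def solve_linear(A, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: more measurements are needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def balance_fluxes(S, reactions, measured):
    """Compute unknown fluxes from S.v = 0 given the measured fluxes."""
    unknown = [r for r in reactions if r not in measured]
    if len(unknown) != len(S):
        raise ValueError("this sketch needs as many balances as unknown fluxes")
    index = {r: j for j, r in enumerate(reactions)}
    # Split S.v = 0 into S_u.v_u = -S_m.v_m.
    A = [[row[index[r]] for r in unknown] for row in S]
    b = [-sum(row[index[r]] * v for r, v in measured.items()) for row in S]
    return dict(zip(unknown, solve_linear(A, b)))


# Hypothetical toy network; columns follow the order of `reactions`.
reactions = ["glc_uptake", "glycolysis", "lactate_out", "tca_entry"]
S = [
    [1, -1, 0, 0],   # intracellular glucose: made by uptake, used by glycolysis
    [0, 2, -1, -1],  # pyruvate: 2 per glucose, lost to lactate or the TCA cycle
]
measured = {"glc_uptake": 1.0, "lactate_out": 1.5}  # made-up rates

fluxes = balance_fluxes(S, reactions, measured)
print(fluxes)
```

Real CHO networks are usually underdetermined, which is one reason isotope tracer data or optimization-based approaches are brought in alongside simple balancing.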
Using 13C-metabolic flux analysis Nargund et al.  were able to elucidate the role of copper in cellular metabolism and found that its shortage causes ATP deficiency by disrupting the electron transport chain. This destabilizes the redox balance leading to oxidative stress and causing lactate accumulation. Since therapeutic protein production depends critically on energy metabolism this study underscores the importance of investigating the role of micro nutrients in cellular processes [88-91].
Interactomics: Using mathematical modeling and computational biology for bioprocess development
Integrated OMICS analysis allows us to approach the cellular system from a holistic perspective and is fast becoming the method of choice for designing host platforms with improved productivities (Figure 3) [53,92-95]. The main advantage of this strategy is that it can combine data from diverse OMICS platforms to generate a comprehensive model of the whole cell, incorporating different levels of control [19,96]. Variabilities introduced by biological, experimental and technical limitations (Figure 4) prevent us from fully utilizing and interpreting the available information, and building a comprehensive dynamic model of the cell therefore remains a futuristic goal. Since the steady state approximation is unable to capture the dynamic nature of the cell, time course analysis, which provides us with snapshots of the changes taking place in the cell, remains a useful alternative.
In terms of interactions, the complex cellular network can be divided into three parts, namely biological networks, gene regulatory networks and metabolic networks. These networks have been formalized on different modeling platforms (Boolean networks, coupled differential equations and Petri nets), which makes it harder to integrate the models. Integrated dynamic flux balance analysis has also been useful in integrating the regulatory, signaling and metabolic networks. These methods were originally designed for E. coli analysis and are now being adapted for CHO cells. For instance, Tebaldi et al. have developed an R/Bioconductor package which contains almost all the statistical tools for pair-wise comparison of transcriptome, translatome and proteome. Karra et al. developed a hybrid model combining a one-dimensional population model with an average cell model to explain the effect of culture media on the dynamics of protein production. Bayrak et al. used an agent based model combined with a transport model to study the effects of inoculum cell density, glucose feed and dissolved oxygen concentration on viability and viable cell count. With the growing amount of OMICS data now available for CHO cells, many of the models developed for other complex systems will find applications in this area. The main problem, however, is the merging of data platforms, especially since there is huge variability in the dimensionality of the data. Nevertheless, given the multiple levels of regulation occurring in CHO cells, the interactome becomes the central core of any modeling exercise. In silico prediction using these models could then provide us with a rational basis for cell design.
The emergence of powerful analytical techniques in the past decade has generated a wealth of information on CHO cells. We no longer see the host expression system as a “black box”; rather, we can look inside the cell and study its dynamics in real time. These data, along with mathematical models, allow us to predict host cell performance in a bioreactor. More importantly, these studies provide us with leads on the cellular modifications that need to be carried out to design improved expression platforms. This process has accelerated in the past few years with the emergence of powerful genome engineering technologies which allow multiple targeted gene knock-ins and knock-outs, like CRISPR/Cas-based RNA-guided DNA endonucleases, Zinc-Finger Nucleases (ZFNs) and Transcription Activator-Like Effector Nucleases (TALENs). These technologies will achieve their true potential only when we develop a clear understanding of cellular regulation and hence discover the rational basis for cell engineering.
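Of the formalisms mentioned above, the Boolean network is the simplest to demonstrate. The following sketch simulates a synchronous Boolean network in which every node is either on or off and all nodes are updated together from the previous state. The four nodes and their update rules are a hypothetical toy loosely inspired by a stress response; they are not a validated CHO regulatory model.

```python
def step(state, rules):
    """Synchronous update: every node is computed from the previous state."""
    return {node: rule(state) for node, rule in rules.items()}


def simulate(initial_state, rules, max_steps=20):
    """Iterate until a fixed point is reached or max_steps have been taken."""
    trajectory = [initial_state]
    for _ in range(max_steps):
        next_state = step(trajectory[-1], rules)
        if next_state == trajectory[-1]:
            break  # fixed point: the network has settled
        trajectory.append(next_state)
    return trajectory


# Hypothetical rules (assumptions, for illustration only).
rules = {
    "stress": lambda s: s["stress"],                          # external input, held
    "upr": lambda s: s["stress"],                             # stress switches on the UPR
    "chaperone": lambda s: s["upr"],                          # UPR induces chaperones
    "apoptosis": lambda s: s["upr"] and not s["chaperone"],   # unrelieved UPR signals death
}

initial = {"stress": True, "upr": False, "chaperone": False, "apoptosis": False}
trajectory = simulate(initial, rules)

for time_point, state in enumerate(trajectory):
    active = sorted(node for node, on in state.items() if on)
    print(time_point, active)
```

In this toy, the apoptosis node switches on transiently and is then shut off once the chaperone node responds. Networks with oscillating behavior never reach a fixed point, so the loop simply stops after `max_steps`.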