alexa RNA Interference Off-target Screening using Principal Component Analysis | OMICS International
ISSN: 2153-0602
Journal of Data Mining in Genomics & Proteomics
Make the best use of Scientific Research and information from our 700+ peer reviewed, Open Access Journals that operates with the help of 50,000+ Editorial Board Members and esteemed reviewers and 1000+ Scientific associations in Medical, Clinical, Pharmaceutical, Engineering, Technology and Management Fields.
Meet Inspiring Speakers and Experts at our 3000+ Global Conferenceseries Events with over 600+ Conferences, 1200+ Symposiums and 1200+ Workshops on
Medical, Pharma, Engineering, Science, Technology and Business

RNA Interference Off-target Screening using Principal Component Analysis

Jakob Müller* and Michael W. Pfaffl

Physiology Weihenstephan, Technische Universität München, Research Center for Nutrition and Food Science, Weihenstephaner Berg 3, 85350 Freising, Germany

*Corresponding Author:
Jakob Müller
Physiology Weihenstephan
Technische Universität München
Research Center for Nutrition and Food Science
Weihenstephaner Berg 3, 85350 Freising, Germany
Tel: +498161713867
Fax: +498161714204
E-mail: [email protected]

Received date: March 07, 2012; Accepted date: June 13, 2012; Published date: June 17, 2012

Citation: Müller J, Pfaffl MW (2012) RNA Interference Off-target Screening using Principal Component Analysis. J Data Mining Genomics Proteomics 3:116. doi:10.4172/2153-0602.1000116

Copyright: © 2012 Müller J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Visit for more related articles at Journal of Data Mining in Genomics & Proteomics


Off-target effects remain the major problem in any RNAi-knockdown application. Casting cell culture loss-of-function studies evaluated by heat map and principal component analysis (PCA) we realized that the PCA derived plots can clearly visualize off-target effects. Due to the inexistence of off-target effects in our cell culture model we created an in silico data model in order to demonstrate how PCA can be utilized therefore. With the presented in silico modulation it is possible to simulate the impact of various treatments on changing gene expression. Known effects caused by drug treatment or by inserted knockdowns could be clearly separated from unknown off-target effects. By creating various randomized gene expression data sets we demonstrate that PCA can assign more effective an off-target effect compared to a heat map gene regulation pattern.


RNAi Off-target Screening; Principal Component Analysis; Heatmap; in silico Data Modelling; Virtual RT-qPCR


Cq: Quantification Cycle; CV: Coefficient of Variation; EGCG: Epigallocatechin Gallate; HCA: Hierarchical Cluster Analysis; PCA: Principal Component Analysis; RNAi: RNA Interference; RT-qPCR: Reverse Transcription Quantitative Real-time Polymerase Chain Reaction; siRNA: Small Interfering RNA


RNA interference (RNAi) is nowadays a common used technology for gene silencing. It is applied for therapeutic aims or identification of drug targets as well as for basic research on gene function. The still unsolved major problem with this technique is the potential to cause unspecific off-target gene regulations in the treated cell culture or organism. These off-target effects are well described in many studies [1-3]. Since its discovery [4] and utilisation [5,6] RNAi has become an easy tool to use in cell culture assays. Compared to knockout based loss-of-function studies RNAi-knockdowns can be applied easily as a combined application beside a drug treatment. The reaction of the cell culture model to the treatments can subsequently be assessed on RNA level. Conventionally expression proofing studies are performed by quantitative real-time RT-qPCR or hybridization arrays. To visualize the differential gene expression changes hierarchical cluster analysis (HCA) or heat maps are very common [7]. Effects of a single treatment, like a siRNA-sequence, on a very broad range of genes can be visualized that way. An alternative method of visualizing complex datasets is the principal component analysis (PCA). This statistical and visualization tool reduces the dimensionality of a dataset consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the dataset [8].

Using an adenoviral based RNAi knockdown-model, we casted a loss-of-function study (data not shown) identifying the influence of a plant secondary metabolite (EGCG) under various gene knockdowns in an immunological signalling pathway. The outcome of the experiments was among others analyzed by PCA. Doing this we observed by chance a synergetic side effect exclusively in those of our treatment groups in which one particular knockdown was combined with the drug appliance. These findings led to the idea that if we can separate the roots of an effect applying PCA we can use it vice versa and isolate RNAi originating gene regulations from drug effects and this means off-target screening.

For an approval of our assumption we tried to find an off-target gene regulation in our cell culture model caused by one of the viral induced siRNA-knockdowns we had in stock. But without access to genome wide transcriptional array results no off-target effect was found. Hence we decided to use in silico data modelling to demonstrate how PCA can be utilized beneficially for RNAi off-target screening.


Data modelling

The underlying dataset was justified to the gene expression data output of our standard cell culture assays. Thereby the extracted RNA initially undergoes a RT (Reverse Transcription) with subsequent qPCR (quantitative real-time polymerase chain reaction) gene expression analysis. Accordingly the layout of the data model was designed. It consists of eight treatment groups (a-h) each containing four replicates, leading to 32 samples (Table 1). The eight groups were divided into two sections, the drug-treated groups (b,d,f,h) and the media control treated groups (a,c,e,g). The sections in turn were subclassified in four species each one characterized by a particular gene knockdown-combination. The four knockdown-combinations were composed from two target knockdowns (named “kd-I” or “kd-II”) and one knockdown-control (named “kd-c”). The two treatment groups and the four knockdown-groups were combined pair wise (Table 1).

Knockdown-combination kd-c kd-I kd-II kd-I&II
Control (cell culture media) a c e g
Treatment (drug appliance) b d f h

Table 1: Knockdown-assay layout.

For each of the 32 samples a dataset of 21 genes was rendered using our self designed spreadsheet template (Microsoft Office Excel 2007, Additional file 1). The 21 genes where group-wise attributed to a supposed class of gene regulation. The classes differ in how genes respond, either to drug treatment or knockdown-appliance. Four of these classes where created: (class I) stabile expressed reference genes, (class II) target-genes knocked down by RNAi, (class III) genes regulated (up or down) by the drug treatment, (class IV) genes influenced by the drug treatment and additionally bearing a RNAi-offtarget effect (Figures 1 and 2). The in silico expression data based on Cq- values of every single sample for every individual gene was created by our random algorithm (Microsoft Office Excel 2007, Additional file 1). The Cq-value represents the number of cycles that are required by the fluorescence signal of a single sample in a qPCR-experiment to cross a defined threshold [9,10]. All factors which influence the Cq-value of a sample were randomly generated: the expression level of each gene, its variation coefficient, the range of up- or down-regulation, and the range of on-target or off-target knockdown. The data model delivers a dataset of 21 x 32 values (genes in columns x samples in rows). This in silico modelled dataset represents the raw data outcome of a corresponding RT-qPCR gene expression analysis obtained from our standard cell culture assays. For all plots presented the same raw dataset was used (Additional file 2). All variables influencing the data were completely randomized, but expression ranges, gene regulations and the inherent variability were chosen to be realistic based on the knowledge about gene expression we gained from our cell culture assays. The range of the gene regulation here was set from 1-10 folds, the knockdown-efficiency between 45% and 95% and the applied CV-value was justified to 35% of the emulated gene regulation. In order to evaluate the quality of the raw data the distribution of the 32 Cq -values (32 samples) corresponding to every gene is plotted in a virtual RT-qPCR (Figure 1).


Figure 1: Cq-values plotted as virtual RT-qPCR. The one random output of our simulation which underlies all graphs in this work represents a raw dataset from a RT-qPCR gene expression analysis of 21 genes. The 32 Cq-values (originating from the eight different application combinations multiplied by the four replicates) are plotted gene wise on a line, like fluorescence-signals crossing a threshold in an in vitro RT-qPCR reaction. The random data distribution and within the emulated CV-value gets visible best in class I genes (unregulated reference-genes). Particular regulation effects become apparent by the separate clusters among the 32 data points of a gene.


Figure 2: Knockdown-evaluation from the in silico data set. Relative expression levels of simulated genes after knockdown and/or drug appliance calculated by the ΔΔCq-method plotted against a logarithmic ordinate. Net knockdown-effects are exposed by normalizing data to knockdown control samples. The 21 genes are arranged in four classes: (class I) reference genes (Ref-1, Ref-2, Ref-3); (class II) knockdown-targets (KD-1+, KD-1-, KD-1, KD-2+, KD-2-, KD-2); (class III) drug targets (G1-up, G2-down, G3-no, G4-up, G5-down, G6-no); (class IV) drug targets with an additional off-target effect (OT1-up, OT1-down, OT1-no, OT2-up, OT2- down, OT2-no). The additional prefix specification “+” or “-“ indicated at class II genes stands for an additional synergetic gene regulation effect which occurs when knockdown appliance is combined with drug appliance. The additional prefix specification “-up”, “-down” or “-no” indicated at class III and IV genes stands for the direction of the predestinated gene regulation. For every gene eight expression levels are given corresponding to the eight treatment groups: a - h (Table 1).

Data processing

Gene expression analysis: The raw data was analyzed under the terms of the ΔΔCq-method [11], and then relative expression values (shown in percentage) were plotted gene-wise on a logarithmic ordinate.

Hierarchical cluster analysis: Heat maps of the modelled data were plotted in GenEx software (version, MultiD, Sweden) using the ΔCq-values for each sample corresponding to the gene set [12].

Principal component analysis: The data processing using principal components analysis (PCA) was as well performed using GenEx software. Therefore the ΔCq-values for each sample corresponding to the gene set were calculated and then pasted into the PCA-algorithm input box. According to the aim of the PCA (results and discussion) the data was mean centered to columns or transposed and then mean centered to rows [7].

Results and Discussion

For this work we designed a data emulator based on the layout of our cell culture assays. Every time used it delivers a unique dataset. Within this emulator the range of the RT-qPCR data constituting parameters can be set up. Subsequently the values are rendered randomly (Additional file 1). In order to show that this data is authentic compared to a cell culture experiment we plotted the raw data (Cq-values) in a virtual RTqPCR (Figure 1). When this data is normalized to the control samples of the knockdown-treatments (group’s a and b) the net knockdowneffects get visible (Figure 2). As a result this plot shows explicitly the knockdown-effect which weighs on a particular gene. The six genes in class II are all predestinated to be knockdown-targets by the simulation. Thus they show down-regulation after the normalization used in this plot. All other genes are no knockdown-targets. But nevertheless some of them show a significant down-regulation. These are the genes from class IV which bear an off-target effect. Slight regulations on the remaining genes (classes I and II) emerge from the random variation coefficient integrated in the data simulation. We inflicted additional gene regulations on some of the class II genes (Figure 2). The additional prefix specification “+” or “-” at the gene name indicates a synergetic effect on the gene regulation when the RNAi-knockdown and the drug appliance are inserted simultaneously. We include this special scenario, because it is a well known phenomenon in in vitro or in vivo models as we experienced it from our own cell culture assays (data not shown). The same gene expression data is plotted in a heat map in Figure 3. In this visualization mode the ΔCq -values are directly used without any further normalization to control samples.


Figure 3: Heat map of relative expression data. ΔCq-values (normalized against reference gene means) for all 32 samples corresponding to the 21 genes are plotted in GenEx software. Green indicates a relative up-regulation, red a relative down-regulation. The same dataset as in Figure 2 was applied.

The randomly created dataset plotted in Figures 2 and 3 contains some special cases by chance. In the heatmap (Figure 3) we realize a separation between strong knockdown-effects evoked either by knockdown kd-I or knockdown kd-II. However only the genes influenced by knockdown kd-I can be globally separated clearly (KD-1, KD-1+, KD-1-, OT1, OT1-up, OT1-down). kd-II influenced genes fractionally merge undefined between red and green (KD-2, KD-2-, OT2-no). This happens due to the by chance relatively low knockdown-originated regulations on these genes. So, how can I adjust the undefined genes in order?

Therefore we additionally utilized PCA. A plot from a PCA will always have as much data points as the underlying dataset has rows. In every single data point you can include as much resources as you are interested in by integrating more or less columns of a dataset. In our case this means to consider more or less genes measured from one RNA-sample. Each gene will contribute its impact to the sample and influence its position in the PCA-plot. If a gene is strong regulated its impact is high. On the other hand a low regulated gene like a reference gene has poor impact on a sample. If such a gene (column) is included the position of a data point (sample) will not move in the PCA-plot. Figure 4a/b uses the same dataset as Figures 1, 2 and 3. In the plots from Figure 4a/b the data was sorted as rows = samples. With the data arranged like this the PCA algorithm regards the impact which lies on each of the 32 samples. On the algorithm level that means the dimension of the collective gene regulation corresponding to a single sample is expressed as a vector. The resulting PCA-plot then simplifies the multi dimensional dataset and adjusts it to the biggest vectors. That way the major gene regulations are exposed in a two dimensional plot. The way one single gene can influence the plot from a PCA gets clear in Figures 4a/b. While in Figure 4a we can recognize two clouds, there are four clouds in Figure 4b. This is due to the impact that is load on the data points (samples) by integrating more or less resources (genes). In Figure 4a the PCA is cast only with genes that explicitly bear no knockdown-effect (classes I & III). As a result the data points cluster only in accordance with media (circles) or drug (triangles) appliance. In Figure 4b there is only one more column (only one class IV gene: OT2up) included into the dataset. This single load has enough impact to split the two clouds into four. The knockdown lying on OT2up represents a second vector of regulation. For those of the samples in Figure 4b which are treated with the knockdown kd-II the PCA algorithm exposes another vector because the provided effect (an offtarget effect) is strong enough. For a screening that means if a gene set we want to analyze is not harassed by any off-target effects the PCA plot will remain in a treatment and a control cloud.


Figure 4: PCAs of the in silico data set from Figures 1,2 and 3. (a) PCA of all 32 samples (columns = genes, rows = samples) embracing exclusively genes of classes I and III. The samples cluster in two clouds according to media control (circles) or drug treatment (triangles). (b) PCA from the same data pool as in Figure 4a but including the class IV gene OT2-up. The incorporation of OT2-up divides the two clouds from Figure 4a by separating the samples which bear the off-target effect. (c) PCA with data transposed columns by rows including the complete dataset (32 samples x 21 genes). Genes of class II are indicated as squares (kd-I) or triangles (kd-II). Circles indicate genes from the three remaining classes. (d) Plot Identical with Figure 4c, but layout adjusted: gene names are given and class IV genes (with off-target effect) are now labeled in color according to the knockdown they originate from.

But we can go deeper into data analysis and retrieve more information about the gene regulation. In Figure 4c the PCA casted with the entire dataset is plotted. The major difference is that the data now is transposed columns by rows. In consequence every data point represents one gene including all 32 values from the corresponding samples. Three different kinds of symbols were attributed. Genes which are designated as knockdown-targets (class II) are labeled with squares (kd-I) or triangles (kd-II). All other genes (classes I, III & IV) are labeled as circles. The result is a pattern of the data branching into two directions (or vectors). In our model we have the advantage that we know which genes are additionally charged by an off-target effect (class IV genes). In an off-target screening this information would be the task. If we now label these genes with the color for the knockdown they bear we get a very clearly terminated evaluation (Figure 3d). Genes having an off-target effect from the knockdown kd-I cluster to the branch of kd-I target genes and accordingly behave genes with a kd-II originating off-target effect. This outcome is what we can use to screen for off-target effects using PCA. Genes clustering in a cloud containing a knockdown target are likely influenced by an off-target effect. They should be exclusively checked for off-target effects in an extra RTqPCR experiment.

But where is the advantage compared to a heat map analysis? The trouble in any case is when weak regulated genes shall be assigned. At that point the vectors in a PCA plot provide us extra information. A very low regulated gene in our dataset is OT2-no. In addition to that the two knockdown targets KD-2 and KD-2- are only weakly downregulated by the knockdown kd-II (Figures 2 and 3). If we look at these genes in the PCA-plots of Figure 4c/d we realize that this evaluation locates these genes in direction towards the cloud of genes affected by kd-II. Of course OT2-no merges slightly with the cloud of generally low regulated genes, but it settles at the border towards the kd-II containing cloud and explicitly not towards the kd-I influenced genes. In the heatmap these genes cannot be attributed to any treatment group and fade between other low regulated genes. At that point PCA can clearly provide more information about a gene regulation pattern than the HCA alone.

By using virtual RT-qPCR data modelling it is easy to create more similar datasets to demonstrate this evaluation method. For your concern we posted additional twelve PCA-plots online, even with extreme values for the regulative parameters or the CV-value (Additional file 2).


PCA in its innate manner takes that point of view on a dataset that visually maximizes the variation present in a dataset. As a result PCA cannot distinguish information which is not present in a dataset. But the task for the scientist is to pick the most vivid visualization mode for that what was measurable out of the processes which took place on biological level. Analyzing off-target effects using HCA and PCA gives us the advantage that a side effect can be easily attributed to its origin. In the PCA-plot the knockdown targets will cluster with genes regulated by the same cause. In a heat map only strong effects can be identified clearly. Weak regulations will fade between red and green. In addition the PCA assigns a direction to every source of gene regulation. Even when an effect is weak it will cluster towards that cloud of genes which are regulated by the same originator. Separated that way transcript expression patterns can efficiently be analyzed for siRNA specific gene regulations which means off-target screening in a very new mode.

Competing Interests

The authors declare that they have no competing interests.

Authors’ Contributions

The idea of utilizing principal component analysis for RNAi-off-target screening was devised by JM. The data modelling spread sheet was designed by JM. The manuscript was drafted by JM and supervised by MP.


This study was supported by a grant from the Vereinigung zur Förderung der Milchwissenschaftlichen Forschung an der Technischen Universität München e.V. I vigorously thank Katrin Danowski for basic and Dr. Ales Tichopad for advanced guidance with the GenEx software.


Select your language of interest to view the total content in your interested language
Post your comment

Share This Article

Relevant Topics

Recommended Conferences

Article Usage

  • Total views: 11981
  • [From(publication date):
    October-2012 - Jul 23, 2018]
  • Breakdown by view type
  • HTML page views : 8185
  • PDF downloads : 3796

Post your comment

captcha   Reload  Can't read the image? click here to refresh

Peer Reviewed Journals
Make the best use of Scientific Research and information from our 700 + peer reviewed, Open Access Journals
International Conferences 2018-19
Meet Inspiring Speakers and Experts at our 3000+ Global Annual Meetings

Contact Us

Agri & Aquaculture Journals

Dr. Krish

[email protected]

+1-702-714-7001Extn: 9040

Biochemistry Journals

Datta A

[email protected]

1-702-714-7001Extn: 9037

Business & Management Journals


[email protected]

1-702-714-7001Extn: 9042

Chemistry Journals

Gabriel Shaw

[email protected]

1-702-714-7001Extn: 9040

Clinical Journals

Datta A

[email protected]

1-702-714-7001Extn: 9037

Engineering Journals

James Franklin

[email protected]

1-702-714-7001Extn: 9042

Food & Nutrition Journals

Katie Wilson

[email protected]

1-702-714-7001Extn: 9042

General Science

Andrea Jason

[email protected]

1-702-714-7001Extn: 9043

Genetics & Molecular Biology Journals

Anna Melissa

[email protected]

1-702-714-7001Extn: 9006

Immunology & Microbiology Journals

David Gorantl

[email protected]

1-702-714-7001Extn: 9014

Materials Science Journals

Rachle Green

[email protected]

1-702-714-7001Extn: 9039

Nursing & Health Care Journals

Stephanie Skinner

[email protected]

1-702-714-7001Extn: 9039

Medical Journals

Nimmi Anna

[email protected]

1-702-714-7001Extn: 9038

Neuroscience & Psychology Journals

Nathan T

[email protected]

1-702-714-7001Extn: 9041

Pharmaceutical Sciences Journals

Ann Jose

[email protected]

1-702-714-7001Extn: 9007

Social & Political Science Journals

Steve Harry

[email protected]

1-702-714-7001Extn: 9042

© 2008- 2018 OMICS International - Open Access Publisher. Best viewed in Mozilla Firefox | Google Chrome | Above IE 7.0 version
Leave Your Message 24x7