ISSN: 0974-276X

Journal of Proteomics & Bioinformatics

Identification of Protein Biomarkers for Diabetic Retinopathy using Sequence Mining Techniques

Ratnagiri Devarapu1,2, G Murali3 and Hanuman Thota4*
1S R K R Engineering College, Bhimavaram, India
2Department of Computer Sciences and Engineering, Acharya Nagarjuna University, Guntur-522510, India
3KKR & KSR Institute of Technology and Sciences, Guntur, India
4VR Siddhartha Engineering College, Vijayawada, India
*Corresponding Author: Hanuman Thota, VR Siddhartha Engineering College, Vijayawada, India, Tel: +91-9849158545

Received Date: Feb 26, 2018 / Accepted Date: Apr 09, 2018 / Published Date: Apr 16, 2018


Abstract

Bioinformatics and sequence mining apply and develop data mining techniques to solve problems by interpreting biological data. Sequence analysis is the most basic operation in sequence mining. Modern sequence mining research focuses on sequential patterns that are relevant and distinct from one another; using the retrieved sequences, the similarity and distance between different protein sequences can be analyzed. Diabetic retinopathy is a major cause of blindness in individuals with diabetes, mostly adults, and is a common complication of diabetes mellitus across the world. Various studies report that many proteins take part in diabetic retinopathy. In this paper, we evaluated proteins closely related to diabetic retinopathy with the multiple alignment tool Clustal Omega and obtained a phylogenetic tree of 28 protein sequences gathered from the National Center for Biotechnology Information (NCBI). In this work, the data mining technique called sequence mining plays a significant role in providing the phylogram, which was obtained with the Neighbor-Joining algorithm. The phylogenetic tree showed that the cortistatin, vitamin-D receptor and somatostatin proteins have a close connection with diabetic retinopathy. Molecular docking, the most extensively used method for calculating protein-ligand interactions, was also performed. In silico docking studies indicated that four inhibitory compounds, i.e., Quercetin, Kaempferol, Naringenin and Melicitrin, interact with aldose reductase, which is also found to have a role in diabetic retinopathy. The outcomes suggest that techniques intended to normalize cortistatin, vitamin-D receptor and somatostatin activities would be of great advantage in inhibiting diabetic retinopathy.

Keywords: Bioinformatics; Data mining; Sequence mining; Diabetic retinopathy; Docking; Neighbor-Joining Algorithm


Introduction

Bioinformatics uses advances in computer science, information technology and communication technology to solve complex problems in the life sciences and biotechnology. Data mining and data warehousing have become major concerns for biotechnologists because of the growth of biological information in the form of protein sequences, protein 3D structures, metabolic pathway databases, genomes of many organisms and biodiversity-related information. Advances in information technology, particularly the use of the internet, play a significant role in gathering and accessing the ever-increasing information in biology and biotechnology. Bioinformatics combines biology, computer science and information technology and has become an integral part of research and development in areas such as functional genomics, proteomics, drug discovery and pharmacogenomics. Bioinformatics also has a major role in issues such as biodiversity and environmental change. For instance, climate change results from the release of excess carbon dioxide and other greenhouse gases since the industrial revolution, which has a negative impact on the earth's environment, and bioinformatics may help reduce such gases through the sequencing of microbial genomes. With its range of techniques and tools, bioinformatics is now in regular use in various research fields. Diabetes mellitus is a chronic condition characterized by lack of insulin as well as hyperglycemia, dyslipidemia and neurovascular damage. It can affect any organ of the body and harm patients' quality of life. Diabetic retinopathy is a microvascular complication of hyperglycemia that causes blindness. It is a common complication of type-1 and type-2 diabetes. Various molecular, clinical and biochemical factors contribute to the risk of diabetic retinopathy.
Different protein biomarkers, novel and traditional, can help improve primary and secondary prevention strategies for diabetic retinopathy [1]. Diabetic retinopathy can affect any individual irrespective of racial and ethnic background. Reports from different regions of the world indicate that diabetic retinopathy is significantly more prevalent in African/Afro-Caribbeans and South Asians than in white Europeans [2]. Sequence mining is a data mining topic concerned with identifying patterns of ordered events within a database. Its applications in medicine include prediction of disease susceptibility and readmission, and pharmacovigilance [3]. Sequence mining discovers meaningful sequential patterns in large quantities of data. Data mining is an area in which computer ethics play a major role, because abundant data gathered from various sources are used to study patterns. Leakage of personal information has become a serious issue in data mining: valuable data must be extracted while personal information leaks are prevented, so techniques need to be developed to stop them [4]. Molecular docking is a widely used method for calculating protein-ligand interactions, and AutoDock is software used globally for this purpose [5]. Docking lead compounds into the binding site of the aldose reductase protein and estimating their binding affinity plays a crucial role in the structure-based drug design process. Several interlinked biochemical mechanisms are involved in diabetic retinopathy. These include increased aldose reductase activity, which leads to abnormal glucose flux through the polyol pathway; formation of advanced glycation end products (AGEs), i.e., proteins glycated after exposure to excess sugars; formation of Reactive Oxygen Species (ROS); and reduced formation of endothelial nitric oxide (eNO) [6-13].
These mechanisms also activate VEGF, which damages the Blood Retinal Barrier (BRB), causes accumulation of fluid in the macula (diabetic macular edema, DME) and drives the formation of new blood vessels, causing Proliferative Diabetic Retinopathy (PDR). Several growth factors are related to diabetic retinopathy as a combination of angiogenic stimuli [14,15]. Various techniques have been used to measure retinal blood flow in individuals with diabetes. Despite some discrepancies between studies, individuals with a short duration of diabetes (less than 5 years) generally show narrowing of the retinal arteries and reduced retinal blood flow [16]. Animal cell culture studies reveal that impaired growth factor support, intense oxidative/nitrosative stress and its downstream effectors play roles in the pathogenesis of diabetic retinopathy. Evidence is emerging for an important role of poly(ADP-ribose) polymerase activation, a downstream effector of free radical and oxidant-induced DNA injury [17]. Diabetic retinopathy is a vascular disease characterized by changes in the retinal capillary bed, mostly in the inner nuclear and outer plexiform layers. Selective loss of pericytes is a characteristic lesion that occurs early in the histopathology of diabetic retinopathy; pericytes are the retinal capillary cells that contain abundant smooth-muscle actin and have a contractile function, thereby regulating retinal capillary blood flow [18]. Aldose reductase is the rate-limiting enzyme of the polyol pathway, which converts glucose to fructose. It plays a major role in diabetic retinopathy by inducing retinal lesions, including blood retinal barrier breakdown, loss of pericytes, neuroretinal apoptosis, glial reactivation and neovascularization, events that are associated with diabetic retinopathy.
Animal studies have reported that aldose reductase inhibitors administered to rats prevented basement membrane thickening, pericyte loss and the development of microaneurysms in the retinal capillaries [19]. However, results from clinical trials showed that changes in the concentrations and activities of other proteins and enzymes also have a role in the pathogenesis of diabetic retinopathy. Some studies suggest that vitamin-D inhibits vascular smooth muscle cell growth in vivo through its antiproliferative activity and that its deficiency leads to retinopathy in patients with type-1 and type-2 diabetes. The vitamin-D receptor binds the active form of vitamin-D and is highly expressed in human tissues, including the retina; the VDR gene is therefore regarded as a candidate gene associated with diabetic retinopathy [20]. Cortistatin (CORT) is a neuropeptide that is structurally similar to somatostatin. It is reported that the concentration of somatostatin (SST) in vitreous fluid is higher than in plasma in non-diabetic patients, and a lower intravitreous SST concentration has been detected in patients with PDR and diabetic macular edema. From this it can be inferred that SST could be a natural inhibitor of angiogenesis in the vitreous fluid and that a shortage of intravitreal SST could be related to retinal neovascularization [21]. The present study focuses on evaluating proteins/enzymes that take part in the development of diabetic retinopathy using sequence mining techniques and other tools such as multiple sequence alignment with Clustal Omega.

Materials and Methods

We selected 28 genes that are known to participate in the pathogenesis of diabetic retinopathy (Table 1), and the protein sequences of these genes were obtained in FASTA format from NCBI (National Center for Biotechnology Information). The obtained sequences were stored in a separate Word document file and then submitted to Clustal Omega for multiple sequence alignment, i.e., the alignment of two or more DNA/RNA or protein sequences, from which sequence homology can be deduced. From the Clustal Omega result we performed phylogenetic analysis, which shows the shared evolutionary origins of the sequences, and observed the distances between the proteins. In silico studies were also performed by docking 4 lead compounds, i.e., Quercetin, Kaempferol, Naringenin and Melicitrin (Figure 2), into the binding site of the aldose reductase protein using AutoDock 4.2. The four inhibitors were docked separately against the aldose reductase structure from the PDB (Protein Data Bank). The X, Y, Z coordinates for the PDB structure were selected using SPDBV [22].
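The sequence handling described above can be illustrated with a minimal sketch. It assumes the alignment has already been produced and exported as aligned FASTA text; the helper names `parse_fasta` and `p_distance` and the three toy records are hypothetical and are not the sequences used in this study. The p-distance shown here (fraction of mismatched residues over ungapped columns) is one simple way to express distance between aligned proteins and is not necessarily the measure Clustal Omega reports.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record ID to sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


# Toy aligned records (hypothetical, NOT the sequences used in the study).
ALIGNED_FASTA = """\
>protA toy record
MKT-LLVA
GC
>protB toy record
MKTALLIA
GC
>protC toy record
MRT-LQIA
-C
"""

aligned = parse_fasta(ALIGNED_FASTA)
distances = {
    (x, y): p_distance(aligned[x], aligned[y])
    for x, y in combinations(sorted(aligned), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A pairwise distance table of this kind is the usual input to a distance-based tree-building method.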

| S. No. | Gene | Protein | Length | Accession |
|---|---|---|---|---|
| 1 | ACE | Angiotensin-converting enzyme | 739 aa | AAH36375 |
| 2 | ADIPOQ | Adiponectin | 244 aa | AAH96310 |
| 3 | AGER | Advanced glycosylation end product-specific receptor | 404 aa | AAH20669 |
| 4 | AKR1B1 | Aldo-keto reductase family 1, member B1 (aldose reductase) | 316 aa | AAH00260 |
| 5 | ALDRL2 | Aldehyde reductase | 325 aa | AAB92369.1 |
| 6 | ANGPT2 | Angiopoietin 2 | 496 aa | AAI26203.1 |
| 7 | AOC3 | Amine oxidase, copper containing 3 (vascular adhesion protein 1) | 763 aa | AAH50549.1 |
| 8 | CORT | CORT protein, partial | 122 aa | AAH40034.1 |
| 9 | CRP | C-reactive protein isoform 1 precursor | 224 aa | NP_000558.2 |
| 10 | CTGF | Connective tissue growth factor | 349 aa | AAH87839.1 |
| 11 | GCNT1 | Glucosaminyl (N-acetyl) transferase 1, core 2 (beta-1,6-N-acetylglucosaminyltransferase) | 428 aa | AAI09103.1 |
| 12 | HGF | Hepatocyte growth factor (hepapoietin A; scatter factor) | 728 aa | AAI30285.1 |
| 13 | IGF1 | Insulin-like growth factor I isoform 4 preproprotein | 153 aa | NP_000609.1 |
| 14 | ITGA2 | Integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor) | 1181 aa | AAM34795.1 |
| 15 | MTHFR | MTHFR protein | 73 aa | AAH18766.1 |
| 16 | NFKB1 | NFKB1 protein, partial | 550 aa | AAH33210.1 |
| 17 | NOS2A | Nitric oxide synthase 2, inducible | 1153 aa | AAI30284.1 |
| 18 | NOS3 | Nitric oxide synthase 3 (endothelial cell) | 1203 aa | AAH63294.1 |
| 19 | PGF | Placental growth factor | 170 aa | AAH01422.1 |
| 20 | PRKCB1 | Protein kinase C, beta | 673 aa | AAH36472.1 |
| 21 | RAGE | RAGE protein | 231 aa | AAH53536.1 |
| 22 | SST | Somatostatin | 116 aa | AAH32625.1 |
| 23 | TGFA | Transforming growth factor alpha | 160 aa | AAA61159.1 |
| 24 | TIMP2 | TIMP metallopeptidase inhibitor 2 | 220 aa | AAH71586.1 |
| 25 | TNC | Tenascin C | 2201 aa | CAI15110.1 |
| 26 | TNF | Tumor necrosis factor (TNF superfamily, member 2) | 233 aa | BAE78639.1 |
| 27 | VDR | VDR protein | 473 aa | AAH33465.1 |
| 28 | VEGF | Vascular endothelial growth factor | 191 aa | CAI19965.1 |

Table 1: The 28 genes obtained from NCBI that are closely associated with diabetic retinopathy, with protein lengths (in amino acids) and accession numbers [23].


Figure 1: Phylogram constructed using Neighbor-Joining Algorithm with Clustal Omega.
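
As a reading aid for the phylogram in Figure 1, the following is a minimal textbook sketch of Neighbor-Joining on a pairwise distance matrix. It is not Clustal Omega's implementation; the function name `neighbor_joining`, the Newick output format and the five-taxon toy matrix are illustrative assumptions and are not data from this study.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by Neighbor-Joining.

    names: list of at least three taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("Neighbor-Joining needs at least three taxa")
    nodes = list(names)
    d = dict(dist)

    def get(a, b):
        return 0.0 if a == b else d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all other nodes.
        r = {a: sum(get(a, b) for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * get(a, b) - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and to b.
        len_a = 0.5 * get(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        len_b = get(a, b) - len_a
        new = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (get(a, c) + get(b, c) - get(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    # Join the last three nodes at a central trifurcation.
    x, y, z = nodes
    lx = 0.5 * (get(x, y) + get(x, z) - get(y, z))
    ly = 0.5 * (get(x, y) + get(y, z) - get(x, z))
    lz = 0.5 * (get(x, z) + get(y, z) - get(x, y))
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


# Toy five-taxon distance matrix (illustrative only).
TOY_NAMES = ["a", "b", "c", "d", "e"]
TOY_ROWS = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_dist = {
    frozenset((TOY_NAMES[i], TOY_NAMES[j])): TOY_ROWS[i][j]
    for i in range(5)
    for j in range(i + 1, 5)
}
tree = neighbor_joining(TOY_NAMES, toy_dist)
print(tree)
```

In a tree built this way, taxa that are joined early and separated by short branches are the ones with the smallest pairwise distances, which is the sense in which proteins are described as having "minimum distance" in a phylogram.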


Results

The proteins involved in the pathogenesis of diabetic retinopathy were analyzed using multiple sequence alignment, and we constructed a score table (Table 2) of the proteins that are closely related to diabetic retinopathy. From Clustal Omega we obtained a phylogenetic tree from the gathered data (FASTA sequences of the proteins). It revealed that VDR, CORT and SST are the three proteins with minimum distance, suggesting that they have a dominant role in diabetic retinopathy compared with the other 25 proteins studied. The phylogenetic tree indicated that the cortistatin, somatostatin and vitamin-D receptor proteins are closely related (Figure 1) and play a significant role in the pathogenesis of diabetic retinopathy. Proteomics studies in combination with sequence mining and multiple alignment tools are thus useful for the accurate prediction of biomarkers as new therapeutic targets associated with diabetic retinopathy. The natural compounds selected for molecular docking share some structural features. All the lead compounds showed good binding energies and interactions; lower free energy values indicate a more thermodynamically favored interaction. Melicitrin and quercetin exhibited binding energies of -10.88 kcal/mol and -9.45 kcal/mol respectively, with melicitrin interacting with Arg217, Arg268 and NAP1318 and quercetin with Leu212. Kaempferol interacts with Arg217, Glu229 and Leu212 with a binding energy of -8.86 kcal/mol, and naringenin interacts with Gly213 with a binding energy of -8.76 kcal/mol. This study indicates that all four natural compounds interact with ALR2 (aldose reductase).

| Test compound | Interacting amino acids | Binding energy, ΔG (kcal/mol) | Dissociation constant, Ki (nM) |
|---|---|---|---|
| Quercetin | Leu212(2) | -9.45 | 117.72 |
| Kaempferol | Arg217, Glu229, Leu212 | -8.86 | 317.8 |
| Naringenin | Gly213 | -8.76 | 381.94 |
| Melicitrin | Arg217, Arg268(2), NAP1318 | -10.88 | 10.52 |

Table 2: Experimental activities and predicted values by Lamarckian Genetic Algorithm dockings of the four compounds.
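
The two numeric columns of Table 2 are linked by the standard thermodynamic relation ΔG = RT ln(Ki). The sketch below checks that relation against the tabulated values, assuming a temperature of 298.15 K and R = 1.9872 cal/(mol·K); the temperature, the function names and the dictionary `TABLE_2` are assumptions made for illustration and are not stated in the text. Under these assumptions the tabulated binding energies and constants agree to within rounding.

```python
import math

R_KCAL = 1.9872e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15    # assumed temperature; not stated in the article


def ki_from_delta_g(delta_g_kcal, temperature=T_KELVIN):
    """Constant Ki in mol/L implied by a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature))


def delta_g_from_ki(ki_molar, temperature=T_KELVIN):
    """Binding free energy in kcal/mol implied by a constant Ki in mol/L."""
    return R_KCAL * temperature * math.log(ki_molar)


# Values copied from Table 2: (binding energy in kcal/mol, Ki in nM).
TABLE_2 = {
    "Quercetin": (-9.45, 117.72),
    "Kaempferol": (-8.86, 317.8),
    "Naringenin": (-8.76, 381.94),
    "Melicitrin": (-10.88, 10.52),
}

for name, (delta_g, ki_nm) in TABLE_2.items():
    recomputed = delta_g_from_ki(ki_nm * 1e-9)
    print(f"{name}: tabulated {delta_g:.2f}, recomputed {recomputed:.2f} kcal/mol")
```

Because the relation is logarithmic, a difference of about 1.4 kcal/mol in binding energy corresponds to roughly a tenfold change in Ki at this temperature.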


Figure 2: 3D structures of Quercetin, Kaempferol, Naringenin and Melicitrin.


Discussion

From the literature, it can be concluded that diabetic retinopathy is a complex process in which several cytokines, growth factors and free radicals play a vital role. Analysis of the phylogram shows that the proteins with minimum distance are cortistatin, VDR and somatostatin. Researchers have found that cortistatin mRNA has a significant role in immune cells such as monocytes, macrophages and dendritic cells, and that cortistatin inhibits insulin secretion but does not affect glucose levels during physiological processes. According to Hernandez C, et al., patients with PDR showed lower intravitreous cortistatin levels than non-diabetic patients, and cortistatin levels were higher in vitreous fluid than in plasma. The absence of a relationship between plasma and vitreous cortistatin concentrations suggests a possible role in retinal homeostasis [23,24]. In general, hyperglycemia increases the activity of aldose reductase, which in turn sets off a series of events that enhance the expression of iNOS, VEGF and PlGF and the production of free radicals. Several clinical trials have been carried out, but their results diverged. The isolation of vascular endothelial growth factor (VEGF), together with the finding that its expression and angiogenic activity increase in hypoxia, made it a prime candidate. However, VEGF antagonists showed limited beneficial effects, which suggests that changes in the concentrations and activities of other proteins, enzymes and growth factors also play a noteworthy part in diabetic retinopathy. Molecular docking is a widely used method for calculating protein-ligand interactions. AutoDock 4.2 uses a binding free energy assessment to assign the best binding conformation, and docking studies are commonly performed to predict the binding modes of ligands to proteins and their binding energies.
Molecular docking helps in the structure-based drug design process, and docking other inhibitory compounds related to diabetic retinopathy could be fruitful for designing new therapeutic drugs. The current multiple alignment study proposes that a close association exists between the cortistatin, somatostatin and vitamin-D receptor proteins in diabetic retinopathy; a multifaceted approach is therefore needed to fight the disease. This study strongly suggests further research on these proteins and on other genes closely involved in diabetes. Molecular docking studies can also help in treating diabetic retinopathy by guiding the design of new drugs against it.


Diabetes mellitus is a major health problem and has severe damaging effects on ocular health. Sequence mining is one of the notable techniques of data mining. Sequences usually occur in either partial or total order, for example nucleotide sequences in DNA and amino acid sequences in proteins, and detecting recurrent subsequences in them is useful in biological studies. Much research has focused on sequence pattern mining, both on the development of different algorithms and on particular domains, mainly biotechnology [25]. Sequence pattern mining assists in extracting the most frequent sequences of behavior from a sequence database. Apart from the life sciences, sequence pattern mining is also used in business organizations to analyze the behavior of customers [26]. Sequences contain the most useful primary data on proteins [27]. The two-step sorbitol pathway has been widely studied in diabetes complications, while continued pharmacological studies in animals showed the onset and progression of ocular complications including keratopathy, retinopathy and cataract [28]. Diabetic retinopathy is a microcirculatory disease of the retina, and several lines of evidence indicate that retinal neurodegeneration is an early event that contributes to the microcirculatory abnormalities occurring in diabetic retinopathy [29]. The somatostatin and cortistatin precursors are encoded by two different genes that evolved from a common ancestral gene by duplication. Somatostatin and cortistatin influence glucose homeostasis, insulin secretion and insulin resistance [30]. Hyperglycemia leads to the onset and progression of diabetic retinopathy, and its role as a leading factor in the pathogenesis of DR has been confirmed in the two largest prospective, multi-center, randomized clinical trials, i.e., the Diabetes Control and Complications Trial (DCCT) and the U.K. Prospective Diabetes Study (UKPDS) [31].
Targeting vascular endothelial growth factor gives remarkable results in the treatment of diabetic retinopathy. Several trials confirmed that intraocularly administered anti-VEGF agents gave better outcomes than laser therapies. Hyperglycemia leads to higher levels of methylglyoxal because of increased glycolysis. Methylglyoxal in turn activates matrix metalloproteinases, which increase vascular permeability. The chance of fluid leakage into the surrounding retinal tissue therefore increases, causing macular edema and visual loss. Hyperglycemia induces abnormal metabolism, resulting in excess production of free radicals. The accumulation of ROS can lead to oxidative stress, which damages the tissue in and around retinal vessels, ultimately resulting in DR [32]. Several studies have reported conflicting results regarding a potential association between AKR1B1 and DR [33]. eNOS is involved in regulating vascular tone by inhibiting smooth muscle contraction and platelet aggregation [34]. However, some studies reported no significant relation between eNOS polymorphisms and DR in type-2 diabetes. In view of the literature and the results obtained, it is necessary to develop new drugs, both traditional and advanced, that maintain the levels of cortistatin, vitamin-D receptor, somatostatin and several other proteins.
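
The detection of recurrent subsequences mentioned in this section can be illustrated with a deliberately simplified sketch. It counts contiguous k-mers shared by several sequences, which is only a special case of sequential pattern mining (general algorithms also allow gaps between items). The function name `frequent_kmers`, the parameters `k` and `min_support`, and the three toy peptide strings are assumptions chosen for illustration and are not data from this study.

```python
from collections import Counter


def frequent_kmers(sequences, k, min_support):
    """Return contiguous k-mers that occur in at least min_support sequences."""
    support = Counter()
    for seq in sequences.values():
        # A set ensures each sequence contributes at most once per k-mer.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: n for kmer, n in support.items() if n >= min_support}


# Toy peptide strings (hypothetical, for illustration only).
TOY_PEPTIDES = {
    "pep1": "AGCKNFFWKTFTSC",
    "pep2": "PCKNFFWKTFSSCK",
    "pep3": "MKTAYIAKQRQISF",
}

shared = frequent_kmers(TOY_PEPTIDES, k=4, min_support=2)
for kmer in sorted(shared):
    print(kmer, shared[kmer])
```

Raising `min_support` restricts the output to patterns shared by more sequences, while raising `k` demands longer conserved stretches.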


Authors' Contributions

Ratnagiri Devarapu, Dr. G. Murali and Dr. Hanuman Thota participated in the design of the study and the interpretation of the results, and prepared the manuscript on sequence mining techniques in bioinformatics. All authors read and approved the final manuscript.


Citation: Devarapu R, Murali G, Thota H (2018) Identification of Protein Biomarkers for Diabetic Retinopathy using Sequence Mining Techniques. J Proteomics Bioinform 11:094-098. DOI: 10.4172/jpb.1000472

Copyright: © 2018 Devarapu R, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
