A Review of Important Discontinuous B-Cell Epitope Prediction Tools | OMICS International
ISSN: 2155-9899
Journal of Clinical & Cellular Immunology


A Review of Important Discontinuous B-Cell Epitope Prediction Tools

Michelle Mukonyora1,2*
1Biotechnology Platform, Agricultural Research Council, Private Bag X05, Onderstepoort, 0110, South Africa
2Department of Life Sciences, College of Agriculture and Environmental Sciences, University of South Africa, Florida, 1710, South Africa
Corresponding Author : Michelle Mukonyora
Biotechnology Platform, Agricultural Research Council
Private Bag X05, Onderstepoort, 0110, South Africa
Tel: +27125299121
E-mail: [email protected]
Received: August 17, 2015 Accepted: October 02, 2015 Published: October 09, 2015
Citation: Mukonyora M (2015) A Review of Important Discontinuous B-Cell Epitope Prediction Tools. J Clin Cell Immunol 6:358. doi:10.4172/2155-9899.1000358
Copyright: © 2015 Mukonyora M. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract
The identification of B-cell epitopes is imperative for the rational design of vaccines, diagnostics and immunotherapeutics. Several bioinformatics resources are freely available for the prediction of B-cell epitopes; however, despite advances in recent years, they still possess limited predictive capabilities. The aim of this review is to highlight and describe the algorithms of the most widely used free B-cell epitope prediction resources. The reasons behind the limited predictive powers of these algorithms are also discussed.

Keywords: B-cell epitopes; Epitope prediction; BEPro; Discotope; Ellipro; Epitopia; SEPPA; Algorithms
Introduction
A B-cell epitope is a collection of distinct amino acid residues on an antigen that antibodies recognize and specifically bind to, thereby activating a protective immune response [1-3]. B-cell epitopes are classified according to their orientation in space as being either linear or discontinuous. Linear B-cell epitopes are composed of contiguous residues in the primary structure [1]. Discontinuous B-cell epitopes, on the other hand, comprise residues that are remotely located in the primary structure but are brought into close proximity by the folding of the protein [1]. Only 10% of B-cell epitopes are linear and 90% are discontinuous [3]. Since linear B-cell epitopes do in fact adopt a conformation, the distinction between linear and discontinuous B-cell epitopes is a grey area [2,4].
The identification of B-cell epitopes is key to designing more effective vaccines [5,6]. Recombinant vaccines containing either a single B-cell epitope or multiple B-cell epitopes from different serotypes can be rationally designed in a cost-effective, time-efficient and safe manner [7]. Also, protein subunits with the structural and immunogenic properties of their whole antigens may be designed and used as therapeutic and diagnostic tools [8-10].
Limitations of Experimental B-Cell Epitope Determination Methods
Experimental methods of elucidating B-cell epitopes include monoclonal antibody (MAb)-resistant variant studies, also known as virus neutralization tests [11], peptide scanning [8,12,13] and MAb-antigen contact studies [11]. Despite their successes, experimental B-cell epitope determination methods are laborious and not feasible when searching for epitopes on a large scale [1]. MAb-antigen contact studies, which are deemed the most reliable of the B-cell epitope determination strategies, are constrained by the limited availability of X-ray crystal structures of MAb-antigen complexes [14]. Computational B-cell epitope prediction methods have therefore been proposed as a cost- and time-effective alternative to the laborious and resource-intensive classical experimental methods [15].
Computational B-Cell Epitope Prediction Methods
High-performance computers are able to execute algorithms of increasing complexity at decreasing costs and timespans. Consequently, computational methods reduce epitope prediction time by as much as 95% [16] and also have the potential to predict B-cell epitopes on a genome-wide scale [1].
Computational B-cell epitope prediction methods exploit the inherent physicochemical properties of B-cell epitopes in their algorithms [17]. B-cell epitopes tend to be more exposed to solvent than their surrounding surface-exposed residues, and it is this high surface exposure of antigenic regions that makes them highly flexible [18-20]. High flexibility is necessary to accommodate the conformational changes that take place when B-cell epitopes bind to antibodies (Abs) [19]. Furthermore, it would be reasonable to assume that flexibility is a prerequisite of antigenic sites when one takes into consideration the plasticity of the complementarity-determining regions (CDRs) of Abs.
Computational B-cell epitope prediction methods are broadly divided into sequence and structure-based methods as well as into linear and discontinuous epitope prediction methods. What all these methods essentially have in common is that they provide a way of correlating the physico-chemical properties of the respective amino acids to their probable location in the protein structure [1,21].
Prediction Tools for Linear B-Cell Epitopes
Propensity scale methods are the most common way by which linear B-cell epitopes are predicted and they are entirely dependent on the primary structure of the proteins [22,23]. The original propensity scale methods make use of hydrophilicity [24], secondary structure [25,26] and side-chain solvent accessibility [27] in their algorithms [23]. Modern linear epitope algorithms make use of a combination of propensity scales, but have been shown to be only marginally better at predicting linear epitopes [1,28]. In a similar manner to the experimental peptide scanning methods, propensity scales are not highly successful for discontinuous B-cell epitope prediction unless a given reading frame contains the amino acids that are the major determinants of the conformation of the B-cell epitope [29]. Structure-based methods are better suited to the prediction of discontinuous B-cell epitopes [22].
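The sliding-window idea behind propensity scale methods can be illustrated with a minimal sketch. The scale values below are hypothetical placeholders chosen for the example, not a published hydrophilicity scale, and the function name and window length are assumptions.

```python
# Hypothetical toy scale: placeholder values, NOT a published propensity scale.
TOY_HYDROPHILICITY = {
    "K": 3.0, "D": 3.0, "S": 0.3, "G": 0.0, "A": -0.5, "L": -1.8, "F": -2.5,
}


def window_scores(sequence, scale, window=3):
    """Average the scale over each window and assign it to the centre residue."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = []
    for centre in range(half, len(sequence) - half):
        segment = sequence[centre - half:centre + half + 1]
        mean = sum(scale[aa] for aa in segment) / window
        scores.append((centre, mean))
    return scores


# Score a short made-up peptide and pick the highest-scoring window centre.
profile = window_scores("ALKDSGF", TOY_HYDROPHILICITY, window=3)
best_centre, best_score = max(profile, key=lambda item: item[1])
print(best_centre, round(best_score, 2))
```

Because each score depends only on residues that are neighbours in the sequence, a window of this kind cannot see residues that are brought together by folding, which is consistent with the limitation noted above for discontinuous epitopes.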
Prediction Tools for Discontinuous B-Cell Epitopes
Discontinuous B-cell epitope prediction methods employ various algorithms that mostly exploit a combination of structural and propensity scale-based information [22]. Some examples of discontinuous epitope prediction programmes that combine structure and propensity scale-based methods are Discotope2.0, BEPro and SEPPA [4,30,31]. It has been shown that the most successful integrated methods consider amino acid composition, secondary structure and surface exposure in their algorithms [30]. There are, however, some purely structure or propensity scale-based discontinuous prediction programmes that perform as well as integrated ones, namely Ellipro and Epitopia respectively [9,32].
Discontinuous B-cell epitope prediction methods require the 3-D structure of the antigen as input [33]. In cases where no structure is available, some of the programmes build homology models of the antigens and then proceed to predict B-cell epitopes from the models [32].
Devising a Discontinuous Epitope Prediction Method
Training dataset construction
The first step in devising any B-cell epitope prediction algorithm is the definition of a training dataset. Discontinuous B-cell epitope prediction methods use X-ray crystallographic information of MAb-antigen complexes to train their algorithms [22]. Redundancy is generally removed from the training datasets by giving protein families equal representation [18]. To avoid over-fitting the algorithm, different parts of the dataset are used for training and evaluation [18].
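One way to picture the two steps above (equal family representation, then separate training and evaluation parts) is sketched below. This is an illustration under stated assumptions rather than the procedure of any reviewed tool: the complex identifiers and family labels are made up, and keeping the first complex seen per family is an arbitrary choice.

```python
import random


def one_per_family(complexes):
    """Keep the first complex seen for each protein family."""
    representatives = {}
    for complex_id, family in complexes:
        representatives.setdefault(family, complex_id)
    return representatives


def split_families(families, n_folds, seed=0):
    """Shuffle families reproducibly and deal them into n_folds disjoint folds."""
    shuffled = sorted(families)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


# Made-up identifiers and family labels, for illustration only.
complexes = [
    ("cplx1", "lysozyme"), ("cplx2", "lysozyme"),
    ("cplx3", "hemagglutinin"), ("cplx4", "neuraminidase"), ("cplx5", "gp120"),
]
reps = one_per_family(complexes)
folds = split_families(reps, n_folds=2)
# Hold out one fold for evaluation and train on the remaining families.
evaluation = folds[0]
training = [family for fold in folds[1:] for family in fold]
```

Splitting by family rather than by individual complex keeps closely related antigens from appearing on both sides of the split.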
B-cell epitope definition and benchmark dataset annotation
In order to train the prediction algorithms, a B-cell epitope needs to be defined [22], and the various prediction methods define B-cell epitopes differently. In the Discotope2.0 [30] dataset, B-cell epitopes are defined as those antigen amino acids that lie at most 4 Å from any of the Ab atoms [18]. In the BEPro training dataset, a B-cell epitope is any antigen residue that is no further than 6 Å from the CDRs of the Ab chains, thereby excluding incidental contacts [4,34].
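A distance-cutoff definition of this kind can be sketched as follows. The data layout (tuples of residue identifier and coordinate), the function name and the toy coordinates are assumptions made for the example; a real annotation would read atoms from a crystal structure, and a BEPro-style definition would restrict the antibody atoms to the CDRs and use a 6 Å cutoff.

```python
import math


def epitope_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Return ids of antigen residues with any atom within cutoff of any Ab atom.

    antigen_atoms: list of (residue_id, (x, y, z)) tuples, one per atom.
    antibody_atoms: list of (x, y, z) tuples.
    """
    labelled = set()
    for residue_id, coord in antigen_atoms:
        if residue_id in labelled:
            continue  # residue already annotated through another atom
        if any(math.dist(coord, ab) <= cutoff for ab in antibody_atoms):
            labelled.add(residue_id)
    return labelled


# Toy coordinates in angstroms, invented for illustration.
antigen = [
    ("A10", (0.0, 0.0, 0.0)), ("A10", (1.0, 0.0, 0.0)),
    ("A11", (5.0, 0.0, 0.0)), ("A12", (20.0, 0.0, 0.0)),
]
antibody = [(4.5, 0.0, 0.0)]
contacts = epitope_residues(antigen, antibody, cutoff=4.0)
```

The all-against-all loop is adequate for a sketch; on full structures a spatial index would usually be preferred.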
Surface exposure is another measure incorporated in the B-cell epitope prediction algorithms in order to aid in the definition of epitopes. In Epitopia, a surface amino acid is defined as any residue on a 3-D structure with a relative accessible surface area (relative ASA) greater than 0.05 [17]. For SEPPA, a residue was defined as surface exposed if it had at least 1 Å² of ASA [31]. Furthermore, a surface-exposed residue was a B-cell epitope if it lost at least 1 Å² of ASA upon binding with its Ab [31]. In Discotope2.0, the upper half-sphere neighbour count measure [35] was used as a measure of surface exposure (Figure 1) [30,35].
An additional B-cell epitope definition that is part of Ellipro's algorithm is that of the protrusion index [32]. The protrusion index provides a simplistic way of detecting those parts of the protein that protrude from the protein's surface. Residues with high protrusion index values are often associated with antigenic sites [20].
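The ASA-based thresholds quoted above can be written down as a small sketch. It assumes that per-residue ASA values for the free and the Ab-bound antigen have already been computed by some external tool; the function names and example numbers are hypothetical.

```python
def is_surface_relative(relative_asa, threshold=0.05):
    """Epitopia-style surface test: relative ASA strictly above the threshold."""
    return relative_asa > threshold


def seppa_style_labels(asa_free, asa_bound, min_asa=1.0, min_loss=1.0):
    """Label residues from ASA (in square angstroms) before and after Ab binding."""
    labels = {}
    for residue, free in asa_free.items():
        if free < min_asa:
            labels[residue] = "buried"
        elif free - asa_bound[residue] >= min_loss:
            labels[residue] = "epitope"  # exposed, and loses ASA on binding
        else:
            labels[residue] = "surface"
    return labels


# Invented ASA values for three residues.
asa_free = {"R1": 0.2, "R2": 35.0, "R3": 50.0}
asa_bound = {"R1": 0.2, "R2": 34.5, "R3": 12.0}
labels = seppa_style_labels(asa_free, asa_bound)
```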
Discontinuous Epitope Prediction Machine-Learning Algorithms
Five discontinuous B-cell epitope prediction algorithms are discussed in this review, namely Discotope (versions 1.0 and 2.0) [18,30], BEPro [4], Ellipro [32], Epitopia [9] and SEPPA [31] (Table 1). These are among the most widely used and freely available discontinuous B-cell epitope prediction algorithms to date, as well as the ones suitable for the analysis of multimeric structures such as virus capsid proteins [36].
Discotope (versions 1.0 and 2.0) integrates amino acid statistics expressed as log-odds ratios, spatial information and surface exposure in its algorithm [18,30]. It is notable in that it was the first B-cell epitope prediction method (as Discotope1.0) to make use of both propensity scale scores and structural information in its algorithm [18]. During execution of the Discotope algorithm, a sphere of radius 10 Å around each residue along the antigen chain is explored for intramolecular contact residues (Figure 1). The total number of residues within the sphere is subtracted from the sum of the propensity scores of those 'contact' residues [30]. Discotope1.0 is available as a standalone version, while Discotope2.0 is available as an online server (Table 1).
BEPro, formerly known as PEPITO, was initially conceived as an alternative method to Discotope. BEPro, like Discotope, combines propensity scales with surface exposure information, namely the upper half-sphere neighbour count measure (Figure 1). BEPro utilizes the Discotope amino acid propensity scale [18], side-chain orientation and solvent accessibility data in its algorithm [4]. BEPro is available online as a part of the SCRATCH suite of programmes [37] (Table 1).
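The scoring step described above (sum of the propensities of the contact residues minus the number of residues in the sphere) can be sketched as follows. This is a simplified reading of the description, not the Discotope implementation: it assumes one representative coordinate per residue, excludes the central residue from its own sphere, and omits any weighting or smoothing the published method may apply. The propensity values are invented.

```python
import math


def discotope_like_scores(coords, propensity, radius=10.0):
    """Per residue: sum of neighbour propensities minus the neighbour count.

    coords: one (x, y, z) point per residue (assumed representative atom).
    propensity: one score per residue, in the same order.
    """
    scores = []
    for i, centre in enumerate(coords):
        contacts = [
            j for j, other in enumerate(coords)
            if j != i and math.dist(centre, other) <= radius
        ]
        scores.append(sum(propensity[j] for j in contacts) - len(contacts))
    return scores


# Four residues on a line, with invented propensity values.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
propensity = [0.5, 1.5, -0.5, 2.0]
scores = discotope_like_scores(coords, propensity)
```

Subtracting the neighbour count penalises residues in crowded, buried environments, so a residue scores highly only when it is both lightly packed and surrounded by residues with favourable propensities.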
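The upper half-sphere neighbour count used as an exposure measure can be sketched as below. The sketch assumes the common formulation in which the sphere around a residue's Cα atom is cut by the plane perpendicular to its Cα-to-Cβ vector, and other Cα atoms on the side-chain side are counted. The default radius is an assumption (the review does not state one), and residues without a Cβ atom, such as glycine, would need separate handling.

```python
import math


def upper_half_sphere_count(index, ca_coords, cb_coords, radius=10.0):
    """Count CA atoms within radius that lie on the side-chain side of a residue."""
    ca = ca_coords[index]
    # Vector from CA to CB points towards the side chain.
    side_chain = tuple(b - a for a, b in zip(ca, cb_coords[index]))
    count = 0
    for j, other in enumerate(ca_coords):
        if j == index:
            continue
        offset = tuple(o - a for a, o in zip(ca, other))
        in_sphere = math.dist(ca, other) <= radius
        in_upper_half = sum(s * o for s, o in zip(side_chain, offset)) > 0
        if in_sphere and in_upper_half:
            count += 1
    return count


# Toy coordinates: residue 0 has its side chain pointing along +z.
ca_coords = [(0, 0, 0), (0, 0, 5), (0, 0, -5), (3, 0, 4), (0, 0, 20)]
cb_coords = [(0, 0, 1.5), (0, 0, 6.5), (0, 0, -6.5), (3, 0, 5.5), (0, 0, 21.5)]
neighbours_up = upper_half_sphere_count(0, ca_coords, cb_coords)
```

A low count means few residues sit in the direction the side chain points, which is read as high surface exposure.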
Ellipro differs from the other B-cell epitope prediction methods in that it does not require training [32]. It is based on the notion that residues that protrude from the protein surface are more accessible for Ab binding [38] and that these protruding residues can be identified by treating the protein as an ellipsoid [39]. Ellipro uses Thornton’s method [20] in combination with a residue-clustering algorithm to predict B-cell epitopes [32]. Ellipro is available as a standalone version and as an online server, which is part of the Immune Epitope Database Analysis Resource (Table 1).
Epitopia applies two machine-learning based algorithms for the prediction of B-cell epitopes from either the tertiary structure of the antigen or directly from its sequence [9]. A total of 44 physicochemical and structural-geometrical properties for structure-based prediction and 41 properties for sequence-based prediction were used to train the Epitopia algorithm [17]. The immunogenic properties used to predict B-cell epitopes from sequences naturally do not include some of the structural-geometrical properties used for structure-based prediction [17]. These properties included previously used as well as novel amino acid propensity scales. Epitopia may be used via the online server or it may be downloaded as a standalone version (Table 1).
SEPPA employs the novel concept of the 'unit patch of residue triangle' to give an improved description of the local spatial context of a protein's surface amino acids [31]. A unit patch of residue triangle is made up of any three surface residues for which each side length of the triangle is less than 4 Å [31]. Unit patches containing at least two B-cell epitope residues were defined as epitope unit patches, and those containing fewer than two were defined as non-epitope patches [31]. Epitope propensity scores are summed for all unit patches within a 15 Å radius of each residue in the antigen [40].
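The unit patch construction can be illustrated with a brute-force sketch. It assumes one representative coordinate per surface residue, since the review does not say how inter-residue distances are measured, and the function names and toy coordinates are hypothetical. Propensity scoring of the patches is not shown.

```python
import math
from itertools import combinations


def unit_patches(surface_coords, max_side=4.0):
    """Return residue triples whose three pairwise distances are all below max_side.

    surface_coords: dict mapping residue id to an (x, y, z) point.
    """
    patches = []
    for triple in combinations(sorted(surface_coords), 3):
        points = [surface_coords[residue] for residue in triple]
        if all(math.dist(p, q) < max_side for p, q in combinations(points, 2)):
            patches.append(triple)
    return patches


def classify_patch(patch, epitope_residues):
    """Epitope patch if at least two of its three residues are epitope residues."""
    hits = sum(residue in epitope_residues for residue in patch)
    return "epitope" if hits >= 2 else "non-epitope"


# Toy surface: R1 to R3 form a small triangle, R4 is far away.
surface = {
    "R1": (0.0, 0.0, 0.0), "R2": (3.0, 0.0, 0.0),
    "R3": (1.5, 2.0, 0.0), "R4": (10.0, 0.0, 0.0),
}
patches = unit_patches(surface)
```

Enumerating all triples scales cubically with the number of surface residues, so this form is only suitable for illustration.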
Limitations of Computational B-Cell Epitope Prediction Methods
Despite significant advances made in devising computational B-cell epitope prediction methods, there are still limitations to the predictive powers of their algorithms. There are therefore continued efforts to improve their performance. One of the most widely used performance evaluators for machine-learning algorithms is the area under the receiver operating characteristic (ROC) curve, abbreviated AUC [41-43]. In a ROC curve, the true positive rate (TPR) is plotted on the y-axis and the false positive rate (FPR) is plotted on the x-axis, thereby illustrating how the TPR depends on the FPR [43]. The TPR is also called sensitivity or recall [43]. AUC values range between zero and one [41]. A method that scores 0.5 is deemed a random discriminator and one that scores a value of one has a perfect predictive capability [22,43]. Currently, the top performing B-cell epitope prediction methods have average AUC values ranging between 0.6 and 0.7, depending on the evaluation dataset used [4,9,30,31].
Improper benchmark annotation limits the predictive ability of B-cell epitope prediction algorithms and performance evaluations
One of the major limitations to the improved performance of computational B-cell epitope prediction methods is improper benchmark annotation. Most B-cell epitope prediction methods allow for the annotation of only one epitope per antigen in their training datasets [4,9,18]. This not only excludes a large portion of known epitopes but also fails to take into consideration the fact that not all B-cell epitopes on any particular antigen have been experimentally identified [4,17,30].
Another form of improper benchmark annotation is that most of the X-ray crystal structures in the training datasets consist of Abs bound to single antigen chains, yet Abs in vivo are raised against whole biological units [30]. A negative consequence of this is that several antigen residues that are predicted as being available for binding to an Ab are in fact involved in long-range intramolecular interactions [30].
Improper benchmark annotation therefore not only has a direct influence on the predictive abilities of the algorithms but also on the performance measures of the methods [4,22,30]. A limitation of the AUC for B-cell epitope prediction methods is that it underestimates the predictive power of the algorithms as long as the datasets are under-annotated [9,30]. Otherwise good predictors are consequently charged with apparent false positives, that is, correctly predicted residues that belong to epitopes missing from the annotation [9,30,44].
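For readers who want to see how an AUC value arises from per-residue predictions, here is a minimal sketch. It uses the rank-based identity that the AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The labels and scores are invented.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# 1 = annotated epitope residue, 0 = non-epitope residue (invented data).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy case five of the six positive/negative pairs are ordered correctly. If a residue labelled 0 were in fact part of an unannotated epitope, its high score would be counted against the predictor.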
In spite of the limited predictive powers of the respective B-cell epitope prediction methods, using a consensus of the results of the top performing methods can ameliorate these limitations [36,45]. When predicting putative novel B-cell epitopes, consensus results reduce the likelihood of false positive results and increase confidence in positive results [36].
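A consensus of several predictors can be as simple as a per-residue vote. The sketch below is one possible scheme, not the procedure used in the cited studies; the residue identifiers are invented and the vote threshold is an assumption.

```python
from collections import Counter


def consensus_epitope(predictions, min_votes=2):
    """Return residues predicted as epitope by at least min_votes tools.

    predictions: dict mapping a tool name to the residues it predicts.
    """
    votes = Counter(
        residue for residues in predictions.values() for residue in set(residues)
    )
    return {residue for residue, n in votes.items() if n >= min_votes}


# Invented predictions for one antigen; not real output of the named tools.
predictions = {
    "Discotope": {"A10", "A11", "A45"},
    "Ellipro": {"A10", "A11", "A12", "A80"},
    "SEPPA": {"A11", "A12", "A45"},
}
agreed = consensus_epitope(predictions, min_votes=2)
unanimous = consensus_epitope(predictions, min_votes=3)
```

Raising `min_votes` trades sensitivity for confidence: fewer residues are reported, but each one is supported by more methods.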
If B-cell epitope prediction methods are to improve, there need to be constant efforts to update the training datasets of algorithms with current experimental epitope data. The Immune Epitope Database 3.0 [46] is a valuable resource in this regard, as it currently holds curated experimental data on 120,000 B- and T-cell epitopes. This represents at least 95% of the epitopes published as of the end of 2012, and the data are freely available to the public [47].


Tables and Figures at a glance

Table 1

Figures at a glance

Figure 1