A. Cenci1, L. Tavoschi1, G. D’ Avenio2, P. Narino1, S. Becattini1,6, D. Bernasconi1,7, M. Chiappi1,8, L. La Torre1, H. Sukati3, E. Vardas4, A. Lo Presti5, E Cella5, M. Ciccozzi5, O. Picconi1, P. Monini1, B. Ensoli1 and S. Buttò1*
Received Date: April 16, 2012; Accepted Date: May 16, 2012; Published Date: May 20, 2012
Citation: Cenci A, Tavoschi L, D’ Avenio G, Narino P, Becattini S, et al. (2012) Characterization of Variable Regions of the Gp120 Protein from HIV-1 Subtype C Virus Variants Obtained from Individuals at Different Disease Stages in Sub-Saharan Africa. J AIDS Clinic Res S8:006. doi:10.4172/2155-6113.S8-006
Copyright: © 2012 Cenci A. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Background: The development of a vaccine against HIV/AIDS capable of preventing virus infection has been hampered by the heterogeneity of the HIV envelope (Env), which makes it difficult to induce neutralizing antibodies against Env proteins from different HIV clades. Several studies have indicated that the gp120 Env protein sequence tends to change considerably during the course of HIV disease, which allows the virus to escape the immune response. In order to define gp120 sequence changes, we have characterized the V1, V2, V4 and V5 variable regions of gp120 variants from 72 HIV-1 clade C-infected subjects from South Africa and Swaziland, who were naïve to antiretroviral (ARV) therapy and at different disease stages. Sequence characteristics, such as amino acid sequence length, presence of Putative N-Glycosylation Sites (PNGSs) and electric charge, were investigated.
Methods: According to the Avidity Index value and CD4+ T cell count, patients were classified for disease stage in three groups: recent, chronic and late stage, each one comprised of 24 patients. The V1 to V5 Env variable regions were directly PCR amplified from plasma virus RNA and sequenced.
Results: A significant increase in the amino acid sequence length of V1 and V4 domains, and a corresponding increase of the “shifting” PNGSs were observed in the HIV variants obtained from individuals at chronic stage of disease, as compared to the recent infection group. Finally, a significant increase of the net electric positive charge of the V5 loop was found in the HIV variants from the group of subjects with late disease, as compared to the chronic disease group.
Conclusion: We conclude that changes in sequence length, glycosylation pattern and net electrical charge in the variable V1, V4 and V5 regions of gp120 occur in the course of HIV infection, possibly in response to the pressure of the host immune response.
HIV-1 Subtype C; Envelope
According to 2011 UNAIDS statistics, the world’s AIDS epidemic has hit a plateau, with 2.7 million people becoming newly infected each year for the last five years. Two-thirds of all people infected with HIV live in sub-Saharan Africa, with severe effects on demography and national development. In this geographical area, South Africa and Swaziland are among the countries most affected by HIV/AIDS. UNAIDS has estimated that in 2009 the total number of persons living with HIV in South Africa was 5.7 million, a number that makes South Africa one of the countries with the largest HIV epidemic in the world. Recent data also indicate that in South Africa the incidence among men and women aged 15-49 was estimated to be 1.3 per 100 person-years at risk in the period 2005–2008. For Swaziland, the 26% HIV prevalence in the adult population found in 2006 through a national population-based survey is the highest ever documented. Finally, South Africa and Swaziland are also among the countries in the world where the number of recent infections is extremely high.
The development of a vaccine capable of preventing HIV infection and reducing virus transmission and spread would be essential to reduce the burden of AIDS and AIDS-related diseases, in particular in developing countries. However, such a preventative vaccine still remains a formidable challenge. One of the obstacles to its development is the design of an immunogen that elicits potent and broadly neutralizing antibodies, capable of preventing HIV infection. The primary targets of this neutralising response are epitopes in the HIV envelope (Env) that are involved in infectivity and transmission. However, attempts to induce these types of neutralising antibodies have failed so far, due to the high variability of the Env proteins, both among different clades and within the same subject during the course of the disease, which allows the virus to escape the host immune response.
Genetic diversity of HIV-1 is represented by distinct subtypes or clades, A to K, as well as by recombinants. The HIV-1 subtype C is prevalent in Sub-Saharan Africa and is widespread worldwide, being responsible for more than 52% of total HIV-1 infections [9,10].
Five loop regions of the gp120, V1 to V5, are particularly variable and play a major role in infectivity, transmission and resistance to neutralization [11-13]. Early data on HIV-1 clade B, and more recent data on clade C, have shown that the V1 region is highly variable, protects the CD4 binding-site from neutralising antibodies and may play a role in infectivity and tropism [14,15]. The V2 region, closely interacting with V1, is less variable, can bind the Dendritic Cell-Specific Intercellular adhesion molecule-3-Grabbing Non-integrin (DC-SIGN), and also plays a role in infectivity and tropism [16,17]. The V1 and V2 regions, together, have been shown to modulate neutralization sensitivity by hiding conserved epitopes, such as the coreceptor binding site [18-20]. The V3 region has been shown to bind both CCR5 and CXCR4 cellular co-receptors [21,22]. In HIV-1 subtype C viruses this region is more conserved than in other clades [23,24], and preferentially binds the CCR5 co-receptor, independently of the disease stage [25,26]. Finally, the function of V4 and V5 regions has not been clearly identified, although they are likely to influence Env conformation and glycan packing [27,28], thereby limiting accessibility to neutralization determinants by steric hindrance. In addition, some data indicate that epitopes in the C3-V4 region are major targets of the early autologous neutralizing response in HIV-1 subtype C infection.
HIV-1 Env gp120 is among the most heavily glycosylated proteins in nature [30-32]. Alterations of the glycosylation pattern can impact protein folding and play a role in masking linear and conformational epitopes that are sensitive targets for neutralization. Changes in the glycosylation pattern may also affect co-receptor binding, thereby playing a role in viral tropism. It has also been demonstrated that N-glycans in gp120 may play a role in viral transmission and infectivity through the interaction between high mannose oligosaccharides and DC-SIGN or other C-type lectins. Altered patterns of glycosylation can modulate the immunogenic properties of gp120 and contribute to escape from humoral and T-cell responses [14,37], thus providing a “shield” against the immune response of the host.
Increasing evidence, based on restricted numbers of donor/recipient couples and individuals with acute infection, suggests that a few or just one single viral variant, characterized by a more compact gp120 Env protein, can establish the initial infection [11,38-41]. More recently, it has been demonstrated that primary infection may be mediated by viruses with relatively short variable regions of gp120 and a low number of PNGSs and that these variants show a higher sensitivity to neutralization by sera [38,42,43]. Other data indicate that variants from different HIV-1 clades, isolated from recently infected patients, show shorter V1-V2 sequences than variants isolated from chronically infected individuals, and that these variants are also more sensitive to neutralisation by patient sera than those isolated during chronic infection [44-46]. Finally, it has been recently reported that variations in the net electric charge in both variable and constant regions of the virus may be responsible for virus resistance to neutralisation [23,34,47]. Therefore, it is plausible that during the course of the disease the sequence characteristics of Env variable domains may have an impact on the selection of virus variants and on their ability to be transmitted and/or escape immune control.
In order to identify key putative features of the Env sequence, we have performed a cross-sectional study on 72 HIV-1 clade C-infected individuals from Swaziland and South Africa at different disease stages, from whom we obtained virus variants that have been characterised for amino acid sequence length, glycosylation pattern and net electric charge of their gp120 variable regions.
A total of 72 plasma samples were collected in the period 2005-2007 from HIV-positive individuals, living in the Southern Africa region, all naïve for Antiretroviral Therapy (ART). Samples were obtained from 24 individuals enrolled during the 2006 Swaziland HIV National Serosurvey, 24 individuals attending the Chris Hani Baragwanath Hospital (CHBH) in Soweto, Johannesburg, South Africa, in the framework of the activities included in the AVIP project (www.avipeu.org) and 24 individuals attending the HIV/AIDS National Referral Laboratory (NRL) Hospital in Mbabane, Swaziland, in the framework of activities of projects from the Italian AIDS Programme. Ethical clearance for these studies was previously obtained from local Ethical Committees. Disease stage of each enrolled individual was determined on the basis of the Avidity Index [48,49] and CD4+ T cell count, as described in the Results section.
CD4+ T cell count
CD4+ T cell count was performed by means of MultiTEST and TruCOUNT tubes (Becton Dickinson) according to the manufacturer’s instructions.
Avidity Index (AI) assay
Viral RNA extraction and sequence amplification
Viral RNA was extracted from 0.5 ml of plasma using the QIAamp Viral RNA Mini Kit (Qiagen), after treatment with Heparinase (Sigma). The V1-V5 coding region in the Env gene was amplified by RT-PCR using the SuperScript™ One-Step RT-PCR System (Invitrogen) and specifically designed external primers for clade C HIV-1. Primers were: AC-Env Outer For 5′-CAGATGCATGAGGATATAATCA-3′; ED12m Outer Rev 5′-AGTGCTTCCTTGCTGCTCCCAA-3′.
The RT-PCR mix was composed of 0.5 to 1 μg of RNA added to an RT/Taq buffer mixture containing 0.4 mM dATP, dCTP, dTTP, dGTP (Roche), 2.4 mM MgSO4 (Invitrogen), 1 μl RT/Platinum Taq mix (Invitrogen), 40 U/μl of RNase inhibitor (Invitrogen), and 10 μM primers (MWB-Biotech). The reaction was carried out in a thermal cycler (Eppendorf) with the following program: 45°C for 30′ for reverse transcription and 94°C for 2′ for RT denaturation; the resulting cDNA was then amplified with 40 cycles of 94°C for 15″, 50°C for 30″ and 68°C for 1′30″, followed by a final extension step of 7′ at 68°C.
The resulting PCR product was amplified again in a nested PCR, using specifically designed primers located in conserved external portions flanking each target nucleotide sequence (i.e. V1-V2, V3-C3 and V4-C4-V5 Env regions). Primers specific for each region are reported in Table 1.
| ENV region | Primers I | Primers II |
|---|---|---|
| V1-V2 | ForA 5′-ACCCCACTCTGTGTCACTTT-3′; RevA 5′-TATTACACTTTAGAATCGCA-3′ | ForB 5′-AAGTTGACCCCACTCTGTGT-3′; RevB 5′-CTTTAGAATCGCATAACCAG-3′ |
| V3-C3 | ForA 5′-CTGTTAAATGGTAGCCTAGC-3′; RevA 5′-GCAATAGAAAAATTCTCC-3′ | ForB 5′-CACAGTACAATGTACACATG-3′; RevB 5′-RCAATAGAAAAATTCTCCTC-3′ |
| V4-C4-V5 | ForA 5′-GTRGAGGAGAATTTTTCTATTG-3′; RevA 5′-TATAATTCACTTCTCCARTTGTC-3′ | ForB 5′-TTTAATTGTRGAGGAGAATTTTTCTATTG-3′; RevB 5′-TATTTATATAATTCACTTCTCCAATTGTC-3′ |
If amplification failed with the first primer pair (Primers I), nested PCR was repeated with a second pair of inner primers (Primers II).
Table 1: Primers used for nested PCR amplification of selected regions of gp120 HIV env.
Each nested reaction was conducted as follows: an aliquot of 1-5 μl of the amplified product was added to a reaction mixture containing 200 μM dNTPs (Roche), 2.5 mM MgCl2 (Invitrogen), the pair of 20 μM primers corresponding to the region to be amplified (MWB-Biotech), 2.5 U/μl of AmpliTaq Gold DNA polymerase and 1x AmpliTaq Buffer (Applied Biosystems). Amplifications were carried out specifically for each region as follows: V1-V2: 96°C for 7′, then 15″ at 94°C, 30″ at 48°C, 30″ at 72°C (for 40 cycles); V3-C3: 96°C for 7′, then 15″ at 94°C, 30″ at 50°C, 30″ at 72°C (for 40 cycles); V4-C4-V5: 96°C for 7′, then 15″ at 94°C, 30″ at 44°C, 30″ at 72°C (for 40 cycles).
All these PCR reactions were followed by a final extension step of 7′ at 68°C.
DNA purification and sequencing
Amplified DNA of each region was purified using Qiaquick PCR purification kit (Qiagen), according to the manufacturer’s protocol. The DNA samples were then quantified and checked for purity by measuring absorbance at 260 nm and 280 nm to estimate contaminants.
The V1-V2, V3-C3 and V4-C4-V5 amplified nucleotide sequences were then sequenced according to the Sanger method using the same primers as in the nested PCR.
Sequencing was performed on uncloned PCR products to identify the prevalent viral quasispecies. The electropherogram was edited using Chromas Pro (www.techelsium.com.au/ChromasPro.html). All sequences were aligned using Clustal X  and corrected for multiple alignment by manual editing.
The nucleotide sequences were translated into amino acid sequences using GeneRunner (www.generunner.net) and further codon-aligned using BioEdit. The V1, V2 and V1-V2 regions were defined as the sequences corresponding to HXBc2 envelope amino acids 131 to 157, 158 to 196 and 131 to 196, respectively (Accession number AAB50262-1, Geneprot). The V3 region was defined as the sequence corresponding to HXBc2 envelope amino acids 296 to 331. The V4 and V5 regions were defined as the sequences corresponding to HXBc2 envelope amino acids 385 to 418 and 460 to 471, respectively.
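The coordinate-based region definitions above can be illustrated with a short Python sketch. It assumes that each patient sequence is available as a gapped pairwise alignment against the HXBc2 reference; the function name `extract_region` and the toy sequences are hypothetical and are not part of the original analysis pipeline.

```python
def extract_region(ref_aligned: str, query_aligned: str, start: int, end: int) -> str:
    """Return the query residues aligned to reference positions start..end.

    Positions are 1-based and inclusive, counted on the ungapped reference.
    Insertions relative to the reference are kept when they fall inside the
    region or immediately follow its last residue.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0  # number of reference residues seen so far
    region = []
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            ref_pos += 1
        if ref_pos < start:
            continue  # column lies before the region
        if ref_pos > end:
            break  # column lies after the region
        if query_char != "-":
            region.append(query_char)
    return "".join(region)


# Toy alignment (not real HXBc2 sequence): the query carries one insertion (X)
# and one deletion relative to the reference.
reference = "AB-CDE"
query = "ABXC-E"
v_region = extract_region(reference, query, 2, 3)
```

With real data, `start` and `end` would be the HXBc2 coordinates listed above (for example 131 and 157 for V1), and the length of the returned string gives the amino acid length of the region.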
The identification of PNGSs in the V1, V2, V3, V4 and V5 regions was performed by using N-GlycoSite (https://www.hiv.lanl.gov/content/sequence/GLYCOSITE/glycosite.html). The analysis of predicted coreceptor usage based on the V3 region was performed by using Geno2pheno (https://coreceptor.bioinf.mpi-inf.mpg.de/index.php).
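As an illustration of what a PNGS search involves, the sketch below scans an amino acid sequence for the usual N-X-S/T sequon, where X is any residue except proline. This is a minimal stand-in under that assumption, not the code of the web tool used in the study; the function name `find_pngs` is hypothetical.

```python
import re

# N-X-[S/T] sequon, with X being any residue except proline.
# The lookahead allows overlapping sequons (e.g. "NNST") to be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_pngs(sequence: str) -> list:
    """Return 1-based positions of the Asn of each sequon in the ungapped sequence."""
    ungapped = sequence.replace("-", "").upper()
    return [match.start() + 1 for match in SEQUON.finditer(ungapped)]


# Toy sequence: N-P-T is not a sequon, N-A-T is.
sites = find_pngs("NPTNAT")
```

Positions returned here refer to the ungapped input sequence; mapping them to HXBc2 numbering requires an alignment to the reference.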
Phylogenetic analysis was carried out using the PAUP software (version 4.0) with the K81 model of substitution and by the use of both Neighbor-Joining (NJ) and Maximum Likelihood (ML) tree-building methods. The evolutionary model was chosen as the best-fitting nucleotide substitution model in accordance with the results of the Hierarchical Likelihood Ratio Test (HLRT) implemented in MODELTEST software (version 3.6).
The parameters for the nucleotide substitution model were estimated by the ML method using a NJ tree (Jukes-Cantor distance) as the base tree.
The statistical robustness and reliability of the branching order within phylogenetic trees were confirmed by either a bootstrap analysis using 1000 replicates, for the NJ tree, or the zero branch length tests for the ML tree. All calculations were performed with PAUP software (version 4.0).
In order to evaluate genetic distances among sequences from variants isolated during the three disease stages, the Mega 4 program (www.megasoftware.net) with the Kimura 2-parameter model was used.
HIV group M subtypes and CRFs sequences were available at the Los Alamos database (www.hiv.lanl.gov/content/hiv-db/ALIGN_CURRENT/ALIGN-INDEX.html).
The Wilcoxon rank sum test was used to compare the differences between the groups of recently infected individuals, patients with chronic disease and patients at late disease stage, for each one of the parameters analysed in the study.
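A minimal sketch of a two-sided Wilcoxon rank sum test is given below, using midranks for ties and the normal approximation without tie or continuity correction. Statistical packages may apply exact tables or corrections, so p-values can differ slightly; the function names and the toy values are hypothetical.

```python
import math


def midranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, z, two-sided p) for the rank sum of sample x versus sample y."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return w, z, p_value


# Made-up values standing in for a parameter measured in two patient groups.
w, z, p_value = rank_sum_test([1, 2, 3], [4, 5, 6])
```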
Furthermore, to measure existing correlation between sequence length and glycosylation status, a linear regression, using Spearman’s correlation test, was performed (Stata 8.2).
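Spearman's coefficient is the Pearson correlation computed on ranks. The sketch below shows the calculation for paired length and PNGS counts; the helper names and the example values are hypothetical and do not reproduce the Stata analysis.

```python
import math


def average_ranks(values):
    """Rank values from 1, assigning tied values the mean of their positions."""
    first_position = {}
    occurrences = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
        occurrences[value] = occurrences.get(value, 0) + 1
    return [first_position[v] + (occurrences[v] - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)


# Made-up loop lengths and PNGS counts for four sequences.
rho = spearman_rho([30, 25, 28, 35], [4, 2, 3, 6])
```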
For the electric charge analysis of each region, the algebraic sum of all charged residues, both positive and negative, was considered as the total electric charge (Qtot), whereas the sums of all the positively charged residues (Arg, Lys, His) and of all negatively charged residues (Glu, Asp) were considered as total positive (Qpos) and total negative (Qneg) charge, respectively. An original Matlab program was written to yield the distribution of Qtot, Qpos and Qneg directly from the *.fas files with the aligned sequences. Such distributions were analyzed with Wilcoxon’s rank sum test, after a preliminary check of the non-Gaussian nature of the distributions using the Kolmogorov-Smirnov test, as previously described.
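The charge definitions above translate directly into a few lines of code. The sketch below is a Python illustration, not the authors' Matlab program; it assumes each charged residue contributes one unit of charge and reports Qneg as a negative number so that Qtot = Qpos + Qneg. The function names are hypothetical.

```python
POSITIVE = set("RKH")  # Arg, Lys, His, as defined in the text
NEGATIVE = set("DE")   # Asp, Glu


def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def region_charges(sequence: str) -> tuple:
    """Return (Qtot, Qpos, Qneg) for one aligned region; gaps are ignored."""
    sequence = sequence.upper()
    q_pos = sum(1 for residue in sequence if residue in POSITIVE)
    q_neg = -sum(1 for residue in sequence if residue in NEGATIVE)
    return q_pos + q_neg, q_pos, q_neg


# Toy alignment with two made-up sequences.
records = parse_fasta(">seq1\nRK-DEH\n>seq2\nNNTDD\n")
charges = {name: region_charges(seq) for name, seq in records.items()}
```

The per-sequence tuples can then be grouped by disease stage and compared with a rank sum test.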
The 72 HIV-1-infected patients from Swaziland and South Africa, all naïve for antiretroviral therapy, were classified for disease stage in three groups: Recent Stage (RS), Chronic Stage (CS), and Late Stage (LS), on the basis of the AI assay and the CD4+ T cell count. An AI ≤ 0.80 identified samples from recently infected individuals (RS) [48,49], whereas patients with chronic and late disease (CS and LS, respectively) were identified on the basis of an AI > 0.80 and CD4+ T cell counts. Specifically, patients with a CD4+ T cell count ≤ 200/μl were considered to be at a late stage of disease, whereas the group of patients with chronic disease was identified by CD4+ T cell counts > 200/μl (Table 2).
| Number of patients | AI | CD4+ T-cell count range; [median] (cells/μl) |
*NA: Not Applicable.
RS= Recent disease Stage; CS= Chronic disease Stage; LS= Late disease Stage.
Table 2: Classification of patients for disease stage according to AI and CD4+ T cell counts.
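The staging rule described in the text can be written as a small decision function. This is a sketch of the stated thresholds only (AI ≤ 0.80 for recent infection, CD4+ T cell count ≤ 200/μl for late stage); the function name `classify_stage` is hypothetical.

```python
from typing import Optional


def classify_stage(avidity_index: float, cd4_per_ul: Optional[float]) -> str:
    """Classify a patient as RS, CS or LS from the AI and the CD4+ T cell count."""
    if avidity_index <= 0.80:
        return "RS"  # recent infection is defined by the AI alone
    if cd4_per_ul is None:
        raise ValueError("CD4+ T cell count is required when AI > 0.80")
    return "LS" if cd4_per_ul <= 200 else "CS"


# Made-up example values.
stages = [classify_stage(0.65, None), classify_stage(0.95, 350), classify_stage(0.95, 120)]
```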
A Maximum Likelihood (ML) phylogenetic tree, based on the V3-V5 coding region, classified all variants obtained from the patients within the HIV-1 clade C subtype (Figure 1). The tree also showed that the Env sequences were intermingled in the topology, independently of the patients' region of origin. Moreover, nucleotide distances, computed with the Mega software, between viruses isolated from patients in Swaziland and viruses isolated from patients in South Africa never exceeded 17%, confirming a close relationship among the isolates from the two countries.
Amino acid sequence alignment of the V1-V2, V3, and V4-V5 regions showed high sequence variability in V1 and V4 sequences, for both amino acid substitutions and insertions/deletions of short amino acid sequences, mainly clustering in the C-terminal portion of both regions. The V2 and V5 regions were less variable, whereas the V3 region appeared well conserved among all the isolates. The high sequence variability also gave rise to new PNGSs that clustered in the C-terminal portion of the V1, V2 and V4 loops, in particular in the group of patients with chronic disease.
The V3 region was extremely conserved for both amino acid sequence length and PNGSs number (median amino acid sequence length: 35 aa; median PNGS number: 1) in all three disease stages, as previously described for HIV-1 subtype C virus (data not shown). Therefore, due to its high degree of sequence conservation, the V3 region was not included in the subsequent analyses of the characteristics of the gp120 variable sequences.
A comparative box plot analysis of amino acid sequence length and PNGS number for each variable region is reported in figures 2 and 3, respectively. Amino acid sequence length of the V1 and V4 regions significantly increased in the CS patients when compared to that observed in patients with recent infection (RS) (p values of 0.0083 and 0.0158, respectively) (figure 2). In addition, there was a statistically significant increase of the length of the V4 region in the group of LS patients compared to the RS patients group (p=0.0208). A reverse trend to shorter sequences (not statistically significant) was observed in LS individuals for both V1 and V4 regions. The V2 domain showed a somewhat inverted behaviour for length variability, although not statistically significant, since the sequence length tended to be slightly reduced in patients in the chronic disease stage when compared to individuals in the early and late disease stages. Finally, no statistically significant changes of amino acid length of the V5 region were present in each of the three groups of patients.
The box plot analysis of PNGSs number is reported in figure 3. The number of PNGSs in the V1 region was significantly increased in CS patients, as compared to the RS patients group (p=0.0121). Furthermore, a slight, not statistically significant reduction of PNGSs number in V1 was observed during the late stage of disease. In addition, a slight, not statistically significant increase of PNGS number was observed for the V4 region in CS patients, compared to the RS patients group, while the number remained quite constant in the late infection stage. Neither statistically significant changes in the PNGSs number, nor trends, were present in the V2 and V5 regions when the three groups of patients were compared.
A quantitative analysis of the PNGSs distribution revealed a stable presence of highly conserved PNGS, with an expression frequency ≥0.70, irrespective of the disease stage, in the positions N136, N141, N156, N160, N186, N301, N386, N392, N397, and N463 of the corresponding HXBc2 sequence, distributed in all the five variable regions.
The qualitative analysis of the distribution of PNGSs among the different groups of patients is reported in figure 4. Besides the conserved PNGSs, a number of PNGSs with a lower expression level could be identified (referred to as shifting PNGSs). These PNGSs clustered in the C-terminal portion of V1 and V4 and, more moderately, in V2, in the CS group of patients, as compared to RS patients. Their mean expression frequency was <0.40 and only a few of them were still present in the group of patients in the late stage of disease. An exception was represented by the PNGS N406 in the V4 region, which had an intermediate behaviour, being expressed with a frequency <0.5 in the group of recently infected patients and increasing up to >0.7 in the groups of patients in the chronic and late disease stages.
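The expression frequency of a PNGS is the fraction of sequences in a group that carry a sequon at a given reference position. A minimal sketch follows, assuming the PNGS positions of each sequence have already been mapped to HXBc2 numbering; the function names and the toy data are hypothetical.

```python
from collections import Counter


def pngs_frequencies(site_sets):
    """Return {position: fraction of sequences carrying a PNGS at that position}."""
    n_sequences = len(site_sets)
    if n_sequences == 0:
        raise ValueError("at least one sequence is required")
    counts = Counter(pos for sites in site_sets for pos in set(sites))
    return {pos: count / n_sequences for pos, count in counts.items()}


def conserved_sites(frequencies, threshold=0.70):
    """Positions whose expression frequency reaches the conservation threshold."""
    return sorted(pos for pos, freq in frequencies.items() if freq >= threshold)


# Toy group of four sequences with made-up PNGS positions.
group = [{136, 160}, {136}, {136, 406}, {136, 160}]
frequencies = pngs_frequencies(group)
conserved = conserved_sites(frequencies)
```

Running the same calculation separately for the RS, CS and LS groups would show how the frequency at a given position changes across disease stages.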
To investigate if the variability patterns of amino acid length and PNGSs distribution were related, a regression analysis was carried out for each variable V1, V2, V4 and V5 domain, individually. The results indicated that a strong statistically significant positive correlation exists between length and PNGSs number, associated with positive regression coefficients.
To complete the analysis of sequence characteristics, we calculated the net electric charge of each region, considering positive, negative and total electric charge distributions. Total electric charge (Qtot) of the V1, V2 and V4 regions was found to be stable, with no significant changes in the transition between different stages of disease. In the V5 region, the Qtot was lower in the CS patients, as compared to the groups of patients at both recent and late disease stage (figure 5), although the change was not statistically significant. However, when positive (Qpos) and negative (Qneg) electric charges were considered separately, a statistically significant increase of Qpos of the V5 region was observed in the LS patients group, compared to the CS patients (p=0.027), whereas Qneg was similar in all three groups (figure 5).
We have conducted a cross-sectional study in patients at different disease stages in South Africa and Swaziland, where HIV-1 subtype C virus is predominant [57-59], with the aim to investigate changes of sequence characteristics of gp120 variable domains from virus variants obtained during the course of the disease. In order to avoid bias due to the pressure of antiretroviral drugs, we included in the study only patients naive for antiretroviral therapy. In addition, we excluded sampling bias due to collection of samples from two different countries, although from the same geographical region, by generating a Maximum Likelihood phylogenetic tree which indicated that the two subgroups were similar (nucleotide distance <10%).
Our data on sequence analysis of gp120 variable domains confirmed an increase in overall amino acid variability in the chronic stage of the disease, predominantly in the V1 and V4 regions and, to a lesser extent, in the V2 and V5 regions, with the exception of the V3 region, which was instead highly conserved in all three disease stages, in line with previously published studies.
The V1 and V4 sequence lengths significantly increased in the group of patients in the chronic disease stage, compared to those in the early stage. These data, among the few obtained in cross-sectional studies, support what was previously observed in longitudinal studies on HIV-1 C and non-C subtypes, namely that HIV-1 isolates from patients with chronic disease are prone to length increase of the variable regions of gp120, which promotes resistance to neutralization [11,38,43,45,61,63]. It is possible that these large V1/V2 domains contain epitopes recognized by neutralizing antibodies, in addition to those present in the V4 region of the HIV-1 C subtype. This is further confirmed by recent data obtained on clade C variants indicating that shorter V1 loops can expose the epitope recognized by anti-subtype C antibodies directed to the V1/V2 region, resulting in higher neutralization titres.
Our data also show a slight decrease of sequence length in the V1 and V4 loops during the late stage of the disease when compared to the chronic stage. We could speculate that sequence length reduction is a consequence of the reduced pressure of the immune response in the late disease stage. In fact, variants bearing shorter V1 and V4 sequences may be favoured in late stages of the disease, as is the case with recent infections, when the HIV-specific immune response has not completely developed. This hypothesis is further supported by published data on subtype B, showing that the reduction of immunological pressure in the late disease stage may allow the re-emergence of previously existing variants. In addition, a CD4+ T cell count cut-off of 150 cells/μl has been proposed to define the drop in selective pressure on Env variable domains, which is consistent with the CD4+ T cell count median value of 107.5 cells/μl in our patients in the late disease stage.
The PNGSs distribution in the V1, V2, V4 and V5 domains during the course of the disease followed a pattern comparable to the one described for sequence length. The existence of a positive correlation between length and PNGS number suggests that changes in sequence length could reflect a viral mechanism of evolution towards variants with a more heavily glycosylated Env protein.
The distribution of conserved and shifting PNGSs over the variable loop sequences also revealed some other intriguing possibilities. The analysis confirmed the presence of conserved PNGSs characterized by a steady high expression level (>70% throughout the disease stages), as previously described. These PNGSs are likely to be subjected to a negative selective pressure by the immune system, in agreement with the extensive literature data on their key role in envelope protein functionality [31,67]. Therefore, the correlation between sequence length variability and number of PNGSs is likely to affect the population of shifting PNGSs. Zhang et al. have described these PNGSs as being positioned in hot spot regions. In agreement with these findings, our data show an extensive presence of shifting PNGSs in the V1 and V4 loops and, to a lesser extent, in the V2 loop, which accumulate in the chronic disease stage and cluster in the C-terminal portion of the domains. These results are also in line with previously published studies on subtype A, B and C viruses, which describe an increase of the PNGSs number after the early-infection stage in the V1-V2 domain [38,42,43,68], although our data demonstrate that the distribution of “hot-spot” areas of shifting PNGSs is different. Nevertheless, since HIV-1 subtype C is phylogenetically more related to clade A than clade B, our data support the possibility of evolution of a subtype-specific glycan shield by the shifting PNGSs within the constraints posed by glycan-dependent envelope functionality mediated by the more conserved PNGSs [27,31,46,69,70].
Taken together, our data on sequence length and PNGS number variation suggest a major effect of the positive selective pressure driven by the host immune system on the V1 and V4 loops compared to the V2 and V5 loops, indicating that the V1 and V4 regions could be among the main drivers of clade C HIV-1 virus evolution. This hypothesis is also in line with recent studies that identify the V1 loop as the major regulator of virus sensitivity towards neutralizing antibodies in subtype B and point to the V4 loop of subtype C as being a major target of neutralization activity.
Furthermore, our results suggest that the disease stage-related envelope evolution trend, previously observed in intra-individual dynamics, could be a distinctive mechanism of HIV-1 subtype C virus escape at the population level in Southern Africa.
It has been reported that differences in amino acid charges correlate with resistance to neutralising antibodies. In our study the Qtot of the V1, V2 and V4 regions was found to be stable throughout the course of the disease. Notably, the V5 loop was characterized by a decrease of Qtot in the transition from early to chronic disease stage followed by an increase in the transition from chronic to late disease stage. This was associated with converse changes in Qpos, statistically significant in the transition from chronic to late disease stage. We can speculate that the V5 domain, while conserved in its sequence length and PNGSs number, undergoes significant changes in its net electrical charge, suggesting a crucial role of V5 charges in the 3D structure of Env, which could influence glycan packing and immune evasion, as has been proposed previously [27-29].
In summary, our results have demonstrated the existence of a disease stage-related trend in the evolution of subtype C gp120 Env. Increased understanding of the exposure of different Env regions that have key function in virus infectivity and transmission may help current efforts to develop a preventative vaccine against HIV infection.
All 72 sequences generated in this study were deposited in GenBank under accession numbers JN120974-JN121117.
We thank Dr. Mario Falchi for his help in figure editing, and Mrs. Guendalina Fornari Luswergh and Mrs. Stefania Ceccarelli for their editorial assistance. We also thank Mrs. Emanuela Salvi, Mrs. Claudia Rovetto and Mrs. Patrizia Di Zeo for their outstanding technical assistance. We are indebted to Dr. van Regenmortel for contributing to the discussion of the results of this work.