Received Date: December 29, 2013; Accepted Date: January 28, 2014; Published Date: February 03, 2014
Citation: Bai X, Zhang H, Shao W, Li L, Zeng Z, et al. (2014) Application of Ambiguous Analysis for Determining Recent Infections of HIV-1 in a China Men Who Had Sex with Men Population. J AIDS Clin Res 5:279. doi:10.4172/2155-6113.1000279
Copyright: © 2014 Bai X, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Objective: To distinguish recent HIV-1 infection (≤300 days) from long-term HIV-1 infection (>300 days) in a Chinese population of men who have sex with men (MSM), we analyzed the change over time in the proportion of ambiguous nucleotides in patient plasma sequences. We hypothesized that this proportion could be used to identify recent infections.
Methods: HIV-1 sequences and clinical data of MSM were collected from June 2007 to September 2010. All sequences were obtained by single genome amplification and sequencing. HIV-ambi-count was used to calculate the proportion of ambiguous nucleotides, and the optimal cut-off values for identifying recent infection were established using receiver operating characteristic (ROC) analysis.
Results: A total of 188 sequences from 150 patients were collected (38 patients were sampled twice at different time points), consisting of 118 sequences of subtype CRF01_AE and 70 sequences of subtype B. The optimal cut-off values for identifying recent HIV-1 infection in subtypes CRF01_AE and B were 0.31% and 0.347%, respectively. The sensitivity, specificity, positive predictive value and negative predictive value were 76.5%, 55%, 89.3% and 32.4% for subtype CRF01_AE (P=0.005), and 80.6%, 62.5%, 94.3% and 29.4% for subtype B (P=0.017). The area under the ROC curve (AUCROC) was 0.652 (95% CI, 0.520-0.783, P=0.03) for subtype CRF01_AE and 0.75 (95% CI, 0.622-0.878, P=0.022) for subtype B.
Conclusions: The proportion of ambiguous nucleotides may be a good marker to distinguish recent infection from long-term HIV-1 infection with high positive predictive value in the China MSM population.
HIV-1; Recent infection; Long-term infection; Ambiguous nucleotide; MSM
HIV-1 infection is a global health problem, with millions of people newly infected every year. However, it is difficult to distinguish a recent infection from a chronic infection. HIV-1 incidence is classically estimated by prospective cohort studies, which are expensive and time-consuming. Moreover, once participants are recruited into a cohort study, interventions are taken to reduce high-risk behaviors, which leads to an underestimated incidence. Janssen et al. developed a modified enzyme immunoassay (EIA) that distinguishes recent (4-12 months) from long-term (>12 months) HIV-1 infections [4,5] by adjusting sample dilution and incubation time. Subsequently, several laboratory methods have been developed to estimate the time of infection, such as the BED-capture enzyme immunoassay (BED-CEIA), the antibody avidity assay, and detection of anti-p24 IgG3 antibodies. Currently, BED-CEIA is widely used to detect recent infection. It rests on the rationale that the HIV-1-specific IgG level rises gradually in the early stage of infection and then remains at a plateau for many years. Compared with prospective cohort studies, these methods are more convenient. The sensitivities of 13 serologic assays vary from 42% to 100% with a median of 89%, and the specificities range from 49.5% to 100% with a median of 86.8%. However, these methods still have many limitations, including inconsistent results between laboratories and misclassification. Thus, more studies are needed to develop a new method with high sensitivity and reproducibility.
HIV-1 infection is initiated in most cases by a single virus, and viral diversity then builds up gradually under the pressure of viral mutation and immune selection. Based on its genetic makeup, human immunodeficiency virus type 1 (HIV-1) is divided into four groups: M, N, O and P. Group M can be further divided into several subtypes, including subtypes A-D, F-H, J and K. The pandemic spread and high evolutionary rate of HIV-1, together with its high mutation rate and frequent recombination, have resulted in the emergence of the circulating recombinant form CRF01_AE [13,14]. Viral evolution and recombination can also accelerate disease progression to AIDS and reduce the effect of drugs. The most prevalent subtype in Europe and North America is subtype B (Los Alamos HIV database, https://www.hiv.lanl.gov). Early studies showed that subtype B was also the dominant subtype in the Chinese MSM population, but CRF01_AE strains have recently been found to surpass subtype B strains. In recent years, several methods based on viral sequences have been developed to identify recent HIV-1 infection (≤1 year) [16-18]. Importantly, Kouyos et al. showed that the proportion of ambiguous nucleotides correlates with the time elapsed between HIV infection and sampling for genotyping. A threshold of ≥0.5% ambiguous nucleotides performed well in identifying recent infections with HIV subtype B in a Swiss population, with high sensitivity (86.8%), high specificity (70%) and high negative predictive value (98.7%). However, because the subtypes found in Chinese patients are mainly CRF01_AE and B, it has yet to be determined whether measuring the proportion of ambiguous nucleotides in patient sequences can be used to identify recent infections in this setting. Our study aims to evaluate the performance of this method in Chinese patients.
All patients in this study were HIV-positive men who have sex with men (MSM). They were recruited from the local HIV/AIDS sentinel surveillance sites in Beijing Youan Hospital, Capital Medical University. Informed consent was given by participating patients during the period of 2007-2010. Eligible participants were adults with one or more sequences and complete clinical data such as age, time of infection, medical history, HIV viral load and CD4+ cell count. Sequences with an uncertain time of infection, sequences with incomplete clinical data, and sequences from patients coinfected with multiple HIV subtypes were removed from the analyses. None of the patients had been treated with anti-HIV medications.
To identify HIV infection, patient samples were tested by enzyme-linked immunosorbent assay (ELISA) (BIO-RAD xmar, HIV1/2 antibody ELISA kit, Shanghai, China) in Beijing Youan Hospital. Seropositive samples were sent to the Beijing Center for Disease Control and Prevention (CDC), where further confirmation tests were done.
Time of infection was estimated as follows: (1) If the ELISA result was negative but HIV RNA (VL-EasyQ NucliSens EasyQ Analyzer, NucliSens EasyQ HIV-1 v2.0, Biomerieux) was detected, the infection time was set to 14 days before sampling; (2) If the ELISA result was positive but the western blot (WB) result was indeterminate, the infection time was set to 30 days before sampling; (3) If the ELISA and WB results were both positive, the infection time was set to the midpoint of the interval between the last negative and first positive HIV test results [20,21]. Infection within 300 days was defined as recent infection; if the interval between HIV infection and sampling was longer than 300 days, the patient was considered to have a long-term infection.
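The three estimation rules and the 300-day threshold can be written out as a short Python sketch. This is only an illustration of the rules as stated above, not the authors' software; the function names, argument names and the string values used for the western blot result are assumptions introduced here.

```python
from datetime import date, timedelta
from typing import Optional

RECENT_MAX_DAYS = 300  # threshold stated in the text


def estimate_infection_date(
    sample_date: date,
    elisa_positive: bool,
    rna_detected: bool,
    wb_result: Optional[str] = None,      # hypothetical coding: "positive" or "indeterminate"
    last_negative: Optional[date] = None,
    first_positive: Optional[date] = None,
) -> Optional[date]:
    """Return the estimated infection date, or None if no rule applies."""
    # Rule 1: ELISA negative but HIV RNA detected -> 14 days before sampling.
    if not elisa_positive and rna_detected:
        return sample_date - timedelta(days=14)
    # Rule 2: ELISA positive, western blot not clearly positive -> 30 days before sampling.
    if elisa_positive and wb_result == "indeterminate":
        return sample_date - timedelta(days=30)
    # Rule 3: ELISA and western blot both positive -> midpoint between the
    # last negative and the first positive test.
    if elisa_positive and wb_result == "positive":
        if last_negative is None or first_positive is None:
            return None  # uncertain timing; such sequences were excluded in the study
        # Adding a timedelta to a date uses whole days only, so odd intervals round down.
        return last_negative + (first_positive - last_negative) / 2
    return None


def is_recent(sample_date: date, infection_date: date) -> bool:
    """Recent infection means sampling within 300 days of the estimated infection."""
    return (sample_date - infection_date).days <= RECENT_MAX_DAYS
```

For example, a patient with a last negative test on 1 January 2008 and a first positive test on 1 July 2008 would be assigned an infection date of 1 April 2008 under rule 3.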
All sequences were obtained by single genome amplification and sequencing. The sequenced HIV-1 pol region covered 1293 base pairs from the transframe protein gene (p6) to the reverse transcriptase gene (RT).
Analysis of sequences
JpHMM (jumping profile Hidden Markov Model) and BLAST search from Los Alamos HIV database (https://www.hiv.lanl.gov) were used to determine HIV subtypes.
Sequences were aligned using ClustalW in the MEGA 5.1 software package, and they were checked and edited using the BioEdit software package.
The proportion of ambiguous nucleotides within each sequence was calculated by HIV-ambi-count software. This analysis needs no reference sequences.
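As a rough illustration of what such a count involves, the sketch below computes the percentage of ambiguous positions in a single sequence. It is not the HIV-ambi-count implementation. The function name, the use of the IUPAC mixture codes, the exclusion of gap characters from the denominator, and the optional handling of N are all assumptions made for this example.

```python
# IUPAC codes that denote a mixture of two or three bases (assumed definition
# of "ambiguous" for this sketch; N is handled separately).
MIXTURE_CODES = set("RYSWKMBDHV")
UNAMBIGUOUS = set("ACGT")


def ambiguous_fraction(seq: str, count_n: bool = False) -> float:
    """Percentage of called positions that carry an ambiguity code.

    Gap characters and any other symbols are ignored. N is ignored unless
    count_n is True, in which case it counts as an ambiguous call.
    """
    called = 0
    ambiguous = 0
    for base in seq.upper():
        if base in UNAMBIGUOUS:
            called += 1
        elif base in MIXTURE_CODES or (count_n and base == "N"):
            called += 1
            ambiguous += 1
    if called == 0:
        raise ValueError("sequence contains no called positions")
    return 100.0 * ambiguous / called


if __name__ == "__main__":
    print(ambiguous_fraction("ACGTRY--AC"))  # 2 mixtures among 8 called positions
```

Under these assumptions, a 1293-position sequence with 4 ambiguous calls has a fraction of about 0.31%, so cut-offs of this size correspond to only a handful of mixed positions.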
SPSS 13.0 software was used to analyze the data. Continuous data were expressed as mean ± SD and analyzed using Student’s t-test. Categorical data were analyzed by the chi-square test or Fisher’s exact test. All P values were 2-sided, and P<0.05 was considered statistically significant.
Optimal cut-off values of the proportion of ambiguous sites were established to distinguish recent from long-term infections. The classification performance was evaluated with receiver operating characteristic (ROC) analyses. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for each subtype as follows: sensitivity = true positive / (true positive + false negative), specificity = true negative / (true negative + false positive), positive predictive value = true positive / (true positive + false positive), negative predictive value = true negative / (true negative + false negative).
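The four definitions translate directly into code. The sketch below treats recent infection as the positive class; the function name and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp: recent infections classified as recent
    fn: recent infections classified as long-term
    tn: long-term infections classified as long-term
    fp: long-term infections classified as recent
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in diagnostic_metrics(tp=8, fn=2, tn=3, fp=1).items():
        print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how many recent and long-term sequences are in the sample, a data set dominated by recent infections will tend to show a high PPV and a low NPV even when sensitivity and specificity are moderate.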
One hundred and eighty-eight sequences from 150 enrolled patients were collected (38 patients were sampled twice at different times), including 160 sequences from recent infections and 28 sequences from long-term infections (Table 1). The mean age of all participants was 31.24 ± 8.63 years. The mean ages of the recently infected and long-term infected groups were 31.01 ± 8.52 and 32.57 ± 9.28 years, respectively. No statistically significant difference in age was observed between the two groups (P=0.38).
|Characteristic|Recent infection (n=160)*|Long-term infection (n=28)*|P|
|Age (years)|31.01 ± 8.52†|32.57 ± 9.28|0.38|
|Time since infection (days)|140.28 ± 92.94|383.54 ± 112.25|<0.001|
|Viral load (log10 copies/ml)|4.64 ± 0.92|4.34 ± 1.14|0.141|
|Viral load not tested (n)|18|0| |
|CD4 count (cells/mm3)|445.36 ± 193.04|488.18 ± 175.82|0.27|
*Number of sequences. †Mean ± SD.
Table 1: Characteristics of participants.
Subtype analysis showed that 118 sequences were CRF01_AE (62.8%), and the remaining 70 sequences were subtype B (37.2%). As shown in Figure 1, the proportion of ambiguous nucleotides ranged from 0 to 4.27%, and differences between recent and long-term infections were observed in both subtypes. For subtype CRF01_AE, the median fractions of ambiguous nucleotides for recent and long-term infections were 0.155% and 0.318%, respectively (P=0.031). For subtype B, the median fractions of ambiguous nucleotides for recent and long-term infections were 0.080% and 0.425%, respectively (P=0.02).
Figure 1: Comparison of proportions of ambiguous nucleotides between recent and long-term infections. The ends of the whiskers mark the 5th and 95th percentiles. Outliers are marked by dots. For subtype CRF01_AE, the median fractions of ambiguous nucleotides for recent and long-term infections were 0.155% and 0.318%, respectively (P=0.031). For subtype B, the median fractions of ambiguous nucleotides for recent and long-term infections were 0.080% and 0.425%, respectively (P=0.02).
Optimal cut-off value of ambiguous nucleotides proportion
To establish an optimal cut-off value of the proportion of ambiguous sites for identifying recent infection, we evaluated a series of cut-off values. As Table 2 shows, a cut-off value of 0.31% performed well for subtype CRF01_AE in distinguishing between recent and long-term infections. That is, if the fraction of ambiguous sites in a sequence was not larger than 0.31%, the sequence was classified as coming from a recently infected patient; otherwise it was classified as coming from a long-term infected patient. The sensitivity, specificity, PPV and NPV for this cut-off were 76.5%, 55%, 89.3% and 32.4%, respectively (P=0.005). ROC analysis showed that the area under the ROC curve (AUCROC) was 0.652 (95% CI, 0.520-0.783, P=0.03) (Figure 2 A).
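The search over a series of cut-off values can be sketched as follows. The text does not state which criterion defined the "optimal" cut-off, so this example assumes Youden's J (sensitivity + specificity - 1), a common choice, and computes the AUC as the probability that a randomly chosen recent sequence has a lower ambiguity fraction than a randomly chosen long-term one. The function names and the toy data are invented for illustration.

```python
def best_cutoff_youden(fractions, is_recent):
    """Scan each observed fraction as a cut-off; 'recent' means fraction <= cut-off.

    Returns (cutoff, youden_j, sensitivity, specificity) for the best cut-off.
    """
    n_recent = sum(is_recent)
    n_long = len(is_recent) - n_recent
    if n_recent == 0 or n_long == 0:
        raise ValueError("both classes must be present")
    best = None
    for cutoff in sorted(set(fractions)):
        tp = sum(1 for f, r in zip(fractions, is_recent) if r and f <= cutoff)
        tn = sum(1 for f, r in zip(fractions, is_recent) if not r and f > cutoff)
        sens = tp / n_recent
        spec = tn / n_long
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


def auc_recent_lower(fractions, is_recent):
    """AUC as a rank statistic: P(recent < long-term), ties counted as one half."""
    recent = [f for f, r in zip(fractions, is_recent) if r]
    long_term = [f for f, r in zip(fractions, is_recent) if not r]
    wins = sum(
        1.0 if r < l else 0.5 if r == l else 0.0
        for r in recent
        for l in long_term
    )
    return wins / (len(recent) * len(long_term))


if __name__ == "__main__":
    # Toy data (ambiguity fractions in %), not values from the study.
    fractions = [0.0, 0.08, 0.15, 0.23, 0.31, 0.46, 0.39, 0.54, 0.23, 0.77]
    is_recent = [True] * 6 + [False] * 4
    print(best_cutoff_youden(fractions, is_recent))
    print(auc_recent_lower(fractions, is_recent))
```

A different selection criterion, such as a minimum required specificity, would generally pick a different cut-off from the same data.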
|Subtype|Cut-off, %|Sensitivity, %|Specificity, %|PPV, %|NPV, %|P value|
|CRF01_AE|0.31|76.5|55|89.3|32.4|0.005|
|B|0.347|80.6|62.5|94.3|29.4|0.017|
Table 2: Cut-off values for different subtypes.
When 0.347% was set as the cut-off value for subtype B, 50 of 62 recent infection sequences were classified correctly (sensitivity, 80.6%). Meanwhile, the specificity was 62.5%. As Table 2 shows, the PPV was 94.3%, and the NPV was 29.4% (P=0.017). The AUCROC was 0.75 (95% CI, 0.622-0.878, P=0.022) (Figure 2 B).
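The reported subtype B percentages can be checked against one another with a little arithmetic. The text gives 70 subtype B sequences, 62 of them recent, and 50 recent sequences classified correctly. The remaining counts in the sketch below are inferred from the reported specificity rather than stated in the article, so they should be read as a consistency check, not as published data.

```python
# Stated in the text.
n_total = 70      # subtype B sequences
n_recent = 62     # recent-infection sequences
tp = 50           # recent sequences classified as recent

# Inferred (assumption): the rest follow from the totals and the 62.5% specificity.
n_long = n_total - n_recent          # long-term sequences
fn = n_recent - tp                   # recent sequences classified as long-term
tn = round(0.625 * n_long)           # long-term sequences classified as long-term
fp = n_long - tn                     # long-term sequences classified as recent

metrics = {
    "sensitivity": round(100 * tp / (tp + fn), 1),
    "specificity": round(100 * tn / (tn + fp), 1),
    "ppv": round(100 * tp / (tp + fp), 1),
    "npv": round(100 * tn / (tn + fn), 1),
    "accuracy": round(100 * (tp + tn) / n_total, 1),
}

for name, value in metrics.items():
    print(f"{name}: {value}%")
```

With these inferred counts (8 long-term sequences, of which 5 are classified correctly), the recomputed sensitivity, specificity, PPV and NPV agree with the reported 80.6%, 62.5%, 94.3% and 29.4%.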
In our study, we found that the fraction of ambiguous nucleotides within individual patient sequences was a good indicator for distinguishing between recent and long-term infections in a Chinese MSM population. In total, 188 sequences were collected, most of which were subtype CRF01_AE (62.8%) (Table 1), which is consistent with other studies reporting that CRF01_AE strains have surpassed subtype B strains in Beijing MSM. The proportion of ambiguous nucleotides ranged from 0 to 4.27%, and differences between recent and long-term infections were observed in both subtypes (Figure 1).
We further analyzed the proportion of ambiguous nucleotides separately for each subtype, and found that the fraction of ambiguous nucleotides differed between recent and long-term infections, being higher in long-term infections. We then sought the optimal cut-off value of the fraction of ambiguous nucleotides for distinguishing recent infections from long-term infections. As Table 2 and Figure 2 show, the best cut-off value for subtype CRF01_AE was 0.31%. The sensitivity was 76.5% and the specificity was 55%. The positive predictive value was 89.3%. The AUCROC was 0.652. When 0.347% was set as the cut-off value for subtype B, the total classification accuracy was 78.6%. Similar to CRF01_AE, the sensitivity and specificity for subtype B were 80.6% and 62.5%, respectively. The PPV was 94.3%. The AUCROC was 0.75. These results were slightly lower than those described in some previous works [16,18]. This might be due to the shorter infection time of the long-term infected patients (383.54 ± 112.25 days), as we only used data sets of precisely timed longitudinal samples. Other mature laboratory methods such as BED-CEIA were reported to be quite reliable, with high sensitivity and specificity. However, they require a considerable amount of laboratory equipment, and the results may differ between laboratories. In contrast, sequencing-based assays can identify recent infection by analyzing sequences already collected for genotyping or drug resistance tests. No extra laboratory equipment is needed, and the method is less time-consuming.
However, our study still has limitations. Firstly, since immune selection plays an important role in the development of viral diversity, differences in immune pressure between patients, such as HLA types, should be taken into consideration. Secondly, all patients in this study were infected by a single founder strain. However, a previous study showed that around 36% of MSM were infected by more than one type of virus, and recent infections originating from multiple founder strains might be misclassified as long-term infections. Thirdly, antiretroviral therapy also strongly influences viral diversity, so the classification value of sequencing-based assays in treatment-exposed or treatment-experienced people is limited.
In conclusion, the proportion of ambiguous nucleotides is a useful marker for distinguishing recent from long-term HIV-1 infections, but this new genotypic tool cannot be used alone and must be interpreted together with clinical and serological data. This method has performed well in the Chinese population. Considering the limited sample size in our study, more investigations are needed to confirm our findings.
We thank Ms. Valerie Boltz at National Cancer Institute (NCI) HIV Drug Resistance, USA for editing the manuscript.
The project has been supported by the Major Science and Technology Special Project of China Eleventh and Twelfth Five-year Plan (2008ZX10103, 2012ZX10001-003, 006, 008) and the grant from Beijing Key Laboratory (BZ0089). All authors declare that they have no competing interests.