Is the HIV Dementia Scale a Reliable Tool for Assessing HIV-related Neurocognitive Decline?

Grace M Lu1, Bruce J Brew1,2,5, Krista J Siefried1,2, Brian Draper3 and Lucette A Cysique2,4-6*
1University of New South Wales, St. Vincent’s Hospital Clinical School, Sydney, Australia
2St. Vincent’s Hospital, Sydney, Australia
3School of Psychiatry, University of New South Wales, Sydney, Australia
4Neuroscience Research Australia, Sydney, Australia
5St. Vincent’s Centre for Applied Medical Research, Sydney, Australia
6Department of Neurology, level 4 Xavier building, St. Vincent's Hospital, Darlinghurst, NSW, 2010, Australia


Introduction
Human Immunodeficiency Virus type 1 (HIV-1)-Associated Neurocognitive Disorder (HAND) is a major neurological complication of HIV infection, affecting up to 50% of persons with advanced HIV disease [1][2][3][4]. Despite the introduction of combined antiretroviral therapy (cART) in the mid-1990s, HAND persists, principally because of increased life expectancy and the chronic neurotoxic effect of HIV on the brain [5,6]. Therefore, like HIV infection itself, HAND is now considered a chronic disease in countries with cART access.
However, milder neurocognitive impairment remains common, as found in the largest cART era cohort study in the United States (n=1500 patients from the years 2003 to 2007 in the CNS HIV Anti-Retroviral Therapy Effects Research project, CHARTER) [11]. Prevalence estimates of milder forms of HAND were found to be 33% asymptomatic neurocognitive impairment (ANI) and 12% mild neurocognitive disorder (MND) [11], remaining consistent with pre-cART reports [1,4]. Other earlier international cohort studies have also shown overall stable HAND prevalence rates between the pre-cART and cART era, as the number of dementia cases decreases but the number of mild HAND cases increases [2][3][4][12][13][14].
While mild forms of HAND do not interfere with everyday functioning as markedly as dementia, they still have a significant impact on employment and capacity to work efficiently [15], driving ability [16], mortality [17], and adherence [18], and they carry a greater risk of progression to more severe impairment [19,20].
Although standard neuropsychological (NP) assessment is the gold standard for the diagnosis of HAND, not all patients develop HAND, so indiscriminate NP assessment is costly and has low public health efficiency [21]. Our group was the first to propose an improved approach [22] using a 'neuroalgorithm' based on clinical data, which streamlines patients based on priority and differential diagnosis. Patients assessed as at risk on the neuroalgorithm complete a brief cognitive screen. Those who are impaired on the cognitive screen are then prioritized for a full neurological assessment, including a clinical NP examination.
The most popular pen and paper cognitive screen is the HIV dementia scale (HDS) [23]. The HDS was developed in 1995 to detect HIV-associated dementia (HAD) [23]. This initial study validated the scale in 29 HIV seronegative (HIV-) patients and 101 HIV-positive (HIV+) pre-cART patients; 39 with moderate dementia and seven with severe dementia. The HDS was found to have 80% sensitivity and 91% specificity using a raw cut-off score of ≤10. Since then it has been shown that in the cART era, raising the cut-off to ≤14 improves the sensitivity and specificity of the HDS to the mild forms of HAND, which are more common in the cART era [10].
However, one question that has not been thoroughly investigated is whether the HDS can be reliably used on repeated occasions to identify whether HAND is present or whether an HIV+ patient is declining. In the initial Power et al. study, the HDS was found to be highly reliable in a subset of 20 HIV+ individuals re-examined at one and a half months (test-retest reliability r=0.87, p<0.0001) [23]. To the best of our knowledge, this is the only study that has reported the HDS re-test reliability. This means that there is no information for a longer and more clinically appropriate re-test interval, as is likely to occur in the cART era.
In addition, the magnitude of clinically significant neurocognitive decline that the HDS can actually detect has never been quantified. Yet clinicians are likely to use the HDS repeatedly to reassess cognitive status as part of optimal care in chronic disease. To the best of our knowledge, our study is the first to investigate this issue comprehensively. In the context of our neuroalgorithm [22], priority for neurological care should also be given to patients who significantly decline on repeated HDS testing.

Study Objectives
The aims of our study are to establish:
1. The HDS re-test reliability in a clinically stable HIV+ sample on cART over a period of three to six months.
2. The capacity to detect clinically relevant cognitive change using the HDS.
3. Which demographic factors, HIV biomarkers, and HDS subtests are associated with HDS-based decline.

Participants
The study sample was composed of 55 HIV+ individuals enrolled in two parent studies, the HIV and Ageing Observational Cohort Study (n=49) and the Neuro-HAART HIV Trials (n=6), taking place at St Vincent's Hospital, Sydney. The baseline sample size was 60 participants and follow-up sample size was 55, as five participants were lost to follow-up; two had changed their contact details and three were travelling.
Baseline assessment occurred between October 2011 and October 2012, and follow-up occurred between May 2012 and December 2012. Eligible participants had historically advanced HIV with a nadir CD4 cell count ≤ 350 cells/µL, HIV duration ≥ 5 years, and stability on cART for at least 6 months. Participants of the Neuro-HAART HIV Trials had a HAND clinical diagnosis at entry that was based on standard neurological, NP, and laboratory examinations [7].
Participants were excluded if they reported any previous history of neurologic disease unrelated to HIV infection, current or past history of major psychiatric disorder such as schizophrenia or bipolar disorder, alcohol or drug dependence within the last 12 months, or any history of traumatic brain injury with loss of consciousness ≥ 30 minutes.
This study and the parent studies were approved by the St. Vincent's Hospital and the University of New South Wales Human Research Ethics Committees and all participants provided written informed consent prior to the study.

Procedure
To optimize the administration of the HDS for the non-specialist, we developed a standardized set of administration instructions based on the original HDS [23], and we provide those in Appendix 1 for the use of other researchers. The HDS was administered by a medical student (GL), research assistant, or board-registered neuropsychologist. All test administrators had been trained in the administration of the HDS by the senior neuropsychologist (LAC) and neurologist (BJB). All tests were conducted in a quiet and well-lit clinic room, and required approximately five minutes to complete. NP batteries were administered by registered neuropsychologists in training under the supervision of LAC, as well as by the St. Vincent's Hospital senior clinical neuropsychologist. In the latter instances, clinical and research NP data that overlapped were used to avoid unnecessarily repeating any tests. This was allowed by an ethics agreement for HIV+ patients at the Immunology and Neurology Department, which outlined that some of their clinical data may, upon request and with their consent, be used for clinical research. The clinical neuropsychologist requested permission for the use of any clinical NP data in research and all consented.
The Neuro-HAART HIV Trials battery required approximately 45 minutes to complete, and the HIV and Brain Aging Observational study battery required approximately two hours to complete. Participants were tested at baseline and follow-up, with a re-test interval of 6 months for the Neuro-HAART HIV Trials and 18 months for the Ageing Observational HIV cohort study. The HDS assessments were timed so that the parent study NP assessments of cognitive stability/decline status could be used as the gold standard. More specifically, for the Neuro-HAART HIV Trials the 6-month re-test interval coincided with the baseline and follow-up HDS. For the Ageing Observational HIV cohort study, the 18-month follow-up NP testing coincided with the baseline HDS, and the follow-up HDS was performed three to four months after. Due to the timing of the parent studies, we only assessed sensitivity and specificity to cognitive change status, not the cross-sectional validity of the HDS at either baseline or follow-up.
All the participants completed the baseline and follow-up NP testing. At follow-up, 87.3% of baseline participants completed the HDS. Participants were tested on average 3.9 (SD=1.1) months after their initial assessment.

NP gold standard evaluation:
Participants of the Neuro-HAART HIV Trials were assessed with a battery of standardized NP tests assessing five different cognitive domains, namely attention/working memory, speed of information processing, verbal learning and memory, executive functions, and motor functions (Appendix 2). The NP battery for participants of the HIV and Ageing Observational Cohort Study included the same domains, but with additional measures for the domains of speed of information processing and verbal memory, and the additional domain of verbal generativity. Both batteries had been selected to assess cognitive domains that are sensitive to HIV-related brain injury [7,24]. The other difference between the two batteries resided in the non-cognitive data that were collected. Moreover, the participants completed the Independence in Activities of Daily Living (IADL) scale, which measures the functional impact of emotional, cognitive, and physical impairments [25], and the Patient's Assessment of Own Functioning Inventory (PAOFI), which evaluates a patient's experience of everyday cognitive and functional problems [26].
Self-reported depressive complaints were assessed with the Depression, Anxiety and Stress Scale (DASS) [27] if taking part in the Neuro-HAART HIV Trials, or the Beck Depression Inventory-II (BDI-II) in the HIV and Aging Observational study [28]. We used the standard clinical cut-offs (BDI-II>17; DASS>14) to distinguish between no complaints, mild depressive complaints and clinically relevant depressive complaints.

Gold standard NP-impairment definition:
To determine the gold standard NP-impairment, we transformed raw NP data into z-scores. The reference z-scores were developed in a sample of 52 local HIV- controls with comparable demographic characteristics to the current sample. Each z-score was transformed into a deficit score as follows: a deficit score of 0 indicates no impairment (z-score ≥ -1), a deficit score of 1 indicates mild impairment (-1.5 ≤ z-score < -1), a deficit score of 2 indicates mild to moderate impairment (-2 ≤ z-score < -1.5), 3 indicates moderate impairment (-2.5 ≤ z-score < -2), 4 indicates moderate to severe impairment (-3 ≤ z-score < -2.5), and 5 indicates severe impairment (z-score < -3). Then, the individual deficit scores were averaged to create a summary z-score-based Global Deficit Score (GDS). As per convention, a GDS ≥ 0.5 was used to define a clinically relevant level of impairment [29][30][31]. A higher GDS indicates greater cognitive deterioration.
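The deficit-score banding and averaging described above can be illustrated with a minimal Python sketch. The function names are hypothetical and the input z-scores in the usage note are invented for illustration; only the band boundaries and the 0.5 cut-off come from the text.

```python
def deficit_score(z):
    """Map one NP z-score to a 0-5 deficit score using the bands in the text."""
    if z >= -1.0:
        return 0  # no impairment
    if z >= -1.5:
        return 1  # mild
    if z >= -2.0:
        return 2  # mild to moderate
    if z >= -2.5:
        return 3  # moderate
    if z >= -3.0:
        return 4  # moderate to severe
    return 5      # severe


def global_deficit_score(z_scores):
    """Average the per-test deficit scores into one summary score."""
    scores = [deficit_score(z) for z in z_scores]
    return sum(scores) / len(scores)


def is_impaired(gds, cutoff=0.5):
    """Apply the conventional cut-off for clinically relevant impairment."""
    return gds >= cutoff
```

For example, a hypothetical profile of four z-scores (0.2, -1.2, -1.7, -0.5) yields deficit scores of 0, 1, 2 and 0, hence a summary score of 0.75, which falls above the 0.5 cut-off.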
Next we determined in each of the HIV+ cases, the HAND classification (ANI, MND or HAD) according to the international diagnostic nomenclature [7], which was implemented as follows: GDS ≥ 0.5 and no IADL decline=ANI; GDS ≥ 0.5 and IADL decline=MND; GDS ≥ 1.5 and no IADL decline=MND; GDS ≥ 1.5 and severe IADL decline=HAD. To differentiate between ANI and MND, we used the IADL and PAOFI self-report as well as any clinical evidence of IADL decline (e.g. medical records and information from associated personnel).
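The classification rules above can be written as a small decision function. This is a sketch only: the function name and the three IADL labels ("none", "mild", "severe") are assumptions made for illustration, with "mild" standing for any IADL decline that is not severe, and "unimpaired" is a label introduced here for cases below the GDS cut-off.

```python
def classify_hand(gds, iadl_decline):
    """Return a HAND category from a GDS and an IADL decline rating.

    iadl_decline is one of "none", "mild" or "severe" (hypothetical labels).
    """
    if iadl_decline not in ("none", "mild", "severe"):
        raise ValueError("iadl_decline must be 'none', 'mild' or 'severe'")
    if gds < 0.5:
        return "unimpaired"
    if gds >= 1.5:
        # Severe NP impairment: HAD only when IADL decline is also severe.
        return "HAD" if iadl_decline == "severe" else "MND"
    # 0.5 <= GDS < 1.5: ANI without IADL decline, MND with any decline.
    return "ANI" if iadl_decline == "none" else "MND"
```

Under these assumptions, a GDS of 0.7 with no IADL decline maps to ANI, while a GDS of 1.6 with severe IADL decline maps to HAD.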
Laboratory data: HIV biomarkers were collected at baseline and included nadir CD4, current CD4, and plasma HIV RNA; CSF HIV RNA was available in 23 participants.

Data analysis
HDS impairment definition:
To provide a rate of overall cognitive impairment on the HDS, we used the most recently published cut-off: a raw HDS score ≤ 14 [10].

Re-test reliability:
A Pearson correlation coefficient was computed between the baseline arcsine-root HDS and the follow-up arcsine-root HDS to determine test-retest reliability. This was computed in the entire group and in those who had been found to be NP-stable on the gold standard testing (see section below on how cognitive stability was defined).
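As an illustration of this computation, the sketch below applies an arcsine-root transformation to total HDS scores and then computes the Pearson correlation. It rests on an assumption not stated in the text: that the transformation is applied to the score expressed as a proportion of a maximum total of 16. The function names are hypothetical.

```python
import math

HDS_MAX = 16  # assumption: maximum total raw score on the HDS


def arcsine_root(score, max_score=HDS_MAX):
    """Arcsine-root transform of a score expressed as a proportion of the maximum."""
    return math.asin(math.sqrt(score / max_score))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def retest_reliability(baseline, follow_up):
    """Correlate transformed baseline scores with transformed follow-up scores."""
    return pearson_r(
        [arcsine_root(s) for s in baseline],
        [arcsine_root(s) for s in follow_up],
    )
```

Calling `retest_reliability` once on the whole sample and once on the NP-stable subset would mirror the two analyses described above.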
Rating of gold standard NP decline/stability: Standard NP scores were transformed into standard regression-based change scores based on published normative standards in HIV- and HIV+ stable individuals. These normative longitudinal NP standards are inclusive of the demographics of the current sample [31]. Using these normative standards for NP change, we then determined clinically significant neurocognitive change as a z-score outside the 1-tailed 80% confidence interval. This definition of neurocognitive change was slightly less strict than in the normative standards' publication (90% 1-tailed) in order to encompass milder levels of NP change. Using this definition, we found that 12.7% declined, 9.1% improved, and 78.2% were stable. Finally, to preserve statistical power and focus on the detection of decline (rather than improvement), we used the 1-tailed 80% interval; hence cases who improved on the standard NP testing were grouped with the stable cases to form a non-decliner group, and the remaining cases were labelled decliners.
Rating of HDS-based decline/stability: HDS-based simple regression-based change scores were developed based on the HIV+ individuals who had stable NP performance (i.e., the stable 78.2%). These regression-based change score formulae were then applied to the rest of the group to yield an HDS-based change z-score in all cases. As for the gold standard NP decline definition, we selected a 1-tailed 80% confidence interval.
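A standard regression-based change score of this kind can be sketched as follows. This is a generic illustration under stated assumptions, not the study's exact formulae: follow-up scores are regressed on baseline scores in the stable reference group, the change z-score is the observed minus predicted follow-up score divided by the residual standard deviation, and decline is flagged when z falls below the lower 1-tailed 80% bound of the standard normal distribution (about -0.84). Function names are hypothetical, and any transformation of the scores would be applied before fitting.

```python
import math
from statistics import NormalDist


def fit_change_model(baseline, follow_up):
    """Least-squares fit of follow-up on baseline in the stable reference group.

    Returns (intercept, slope, residual_sd).
    """
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(follow_up) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, follow_up))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(baseline, follow_up))
    residual_sd = math.sqrt(sse / (n - 2))  # standard error of the estimate
    return intercept, slope, residual_sd


def change_z(model, baseline_score, follow_up_score):
    """Change z-score: (observed - predicted follow-up) / residual SD."""
    intercept, slope, residual_sd = model
    predicted = intercept + slope * baseline_score
    return (follow_up_score - predicted) / residual_sd


def is_decliner(z, confidence=0.80):
    """Flag decline when z is below the lower 1-tailed bound."""
    cutoff = NormalDist().inv_cdf(1 - confidence)  # about -0.84 for 80%
    return z < cutoff
```

Because the model predicts the follow-up score expected from practice effect and regression towards the mean, a case whose score simply fails to improve as predicted can receive a negative z-score.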
Change score methodology rationale: Use of the standard regression-based change score methodology was based on the following rationale: (1) This procedure corrected for practice effect and regression towards the mean in the prediction of both gold standard NP and HDS-based change, and it standardized the NP data into z-scores with a mean of zero and a SD of 1 [32]; (2) This procedure yielded predictions of neurocognitive change in individuals rather than at the group level [31]; hence it can provide guidance for individually based neurological care.
The HDS data were transformed to approximate the normal distribution, as the development of standard regression-based change scores assumes normally distributed data. We used log10 transformation for continuous data and arcsine-root transformation for dichotomous data.

Demographic, HIV disease and HAND characteristics
The study participants' characteristics are presented in Table 1. The sample was composed mostly of chronically HIV-infected men (two women), virally suppressed on stable cART, who in the past had historic AIDS as per the 1993 Centers for Disease Control (CDC) definition [35]; 36.4% had had an AIDS-defining illness. The prevalence of HAND in the sample was 49.1%.

Baseline HDS-based impairment rate
The rate of impairment at baseline by HDS testing was 36.4% (cut-off ≤ 14).

HDS re-test reliability
HDS re-test reliability was high in both the sub-sample with stable NP performance and the entire sample (r=0.76; p<0.0001).

HDS-based decline prevalence
The raw data of the HDS are presented in Table 2. At baseline the mean total raw score was 14.0 (SD=2.66), while at follow-up the mean total raw score was 14.2 (SD=2.60); this difference was not statistically significant. The prevalence of HDS-based decline was 21.8% (Figure 1).

NP standard vs. HDS change status
The HDS had 57% sensitivity and 82% specificity in detecting decline when compared to NP gold standard decline. The correct classification ratio was 79%, positive predictive value was 33%, and negative predictive value was 93%. Four cases were congruently identified as decliners between the HDS and the gold standard NP testing (Figure 2).
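For readers who wish to reproduce this type of comparison, the accuracy indices reported above all derive from a 2×2 table of HDS-based versus gold standard decline status. The sketch below uses a hypothetical function name and takes the four cell counts as inputs; the counts in the usage note are invented for illustration and are not the study's data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Accuracy indices from a 2x2 table of screen vs. gold standard status.

    tp: decliners on both measures      fn: gold standard decliners missed
    fp: screen-only decliners           tn: non-decliners on both measures
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),  # correct classification
    }
```

With illustrative counts of tp=8, fn=2, fp=5 and tn=35, `diagnostic_metrics(8, 2, 5, 35)` gives a sensitivity of 0.80 and a specificity of 0.875.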

Predictors of HDS-based decline
When considering the subtest scores of the HDS at baseline, we found that the memory recall subtest was significantly lower in the decliners versus non-decliners (p<0.04). However, when considering the baseline HDS cut-off (impaired vs. unimpaired) or the HDS total score, there was no significant difference between the decliners and non-decliners (see Table 3).
Finally, when comparing the HDS-based decliners and non-decliners on demographic, HIV and clinical markers, we found that having a HAND diagnosis at baseline and having a more severe HAND diagnosis were associated with a greater chance of decline on the HDS (p<0.03) (Table 3). There were no other differences.

Discussion
There are four main study findings. First, the HDS shows excellent re-test reliability over a 3.9-month test re-test interval. Second, while the HDS regression-based change score method detected decline in individuals (21.8%), the sensitivity of this method (57%) was restricted to the participants who declined at least moderately (2 SD below the mean HDS change score of zero), while its specificity was adequate (82%). This means that the HDS can be used to detect moderate to severe neurocognitive decline, but another strategy such as a standard NP assessment is needed when milder levels of neurocognitive decline are likely. Third, having baseline HAND and HAND of greater severity were associated with HDS-based decline, but overall impairment on the HDS was not. Only lower performance on the baseline HDS memory subtest was associated with decline. Fourth, no baseline HIV biomarkers were associated with HDS-based decline.
Our study yielded a test re-test reliability similar to that of the initial HDS study (r=0.76 versus r=0.87) [23], confirming very good test-retest reliability and extending it over a mean 3.9-month period.
Our study is the first to quantify neurocognitive decline based on the HDS, and this may provide an indication as to which patients require further prompt investigation within the framework of our neuroalgorithm [22]. It is likely that this decline was associated with progressing HIV-related brain injury, as other clinical and psychiatric confounds had been excluded. Supporting this interpretation is that having HAND at baseline and a more severe form of HAND were associated with declining HDS performance (6.3% of those with ANI declined, 28.6% of those with MND declined, and 50% of those with HAD declined). These data tentatively corroborate that a HAND diagnosis is associated with greater likelihood of neurocognitive deterioration even while on cART [19,20].
The fact that the baseline HDS memory subtest (the capacity to recall four words after a short interval) was the only subtest sensitive to decline shows that assessment of verbal learning and memory needs to be included in follow-up assessment of HAND, including at the screening level [20]. NP studies assessing which cognitive domains are most sensitive to decline in HIV+ individuals have, however, robustly identified a decrease in speed of information processing as the primary deteriorating function [13,36]. This could explain why the HDS was not more sensitive to decline, because the HDS does not have a subtest that assesses this core function in HAND.
We could not detect any association with traditional baseline HIV biomarkers, suggesting that the progression of HAND in chronic HIV infection is somewhat dissociated from baseline systemic disease markers [37,38]. Reliable biomarkers for HAND remain elusive. In virologically suppressed patients, current CD4 levels and viral suppression have been found to be unreliable markers for HAND, which may occur in up to 21% despite suppression of plasma and cerebrospinal fluid (CSF) HIV RNA [37,38].
However, the follow-up HIV biomarkers were not available at the time of the follow-up HDS and it is still possible that changes in those could have correlated with HDS decline. This warrants larger studies to assess the effects of co-varying HIV biomarkers on screening instruments used longitudinally.
While we were able to quantify HDS-based decline, it was overestimated by 9.1 percentage points compared to the gold standard NP decline. Because some participants' baseline HDS coincided with the follow-up gold standard assessment, it is possible that some cases declined after their last NP standard assessment. This is a possibility, but it remains unlikely because the gold standard assessment uses state of the art methods to detect cognitive change [22]. Stability using such procedures is robust and has been observed to span several years, especially in those who are optimally treated and virally undetectable, as were the participants included in this study [39]. It is therefore unlikely that those who declined on the HDS but not on NP testing deteriorated neurocognitively between the time of their last NP assessment and the HDS follow-up visit.
HDS-based neurocognitive decline was predicted in a total of 12 participants, among whom eight had stable performance on the standard NP testing. When closely inspecting the HDS performance we found two possible explanations for this discrepancy: (1) The HDS and NP standard decline were congruent only in those with moderate decline (2 SD below the mean HDS change score of zero). This represents a large decline in the HDS raw score of at least 3-4 points. (2) In contrast, in the other eight cases the raw HDS score change ranged only from 0 to 2 points, meaning that change was of a small magnitude at best. In these cases, improvement based on the HDS change score was probably predicted (as a correction for expected practice effect) but did not happen; hence the cases were wrongly classified as decliners. In other words, the regression-based change score method was not operating optimally on the HDS because the range of values is too restricted, despite the transformations applied to approximate the normal distribution.
Overall, the HDS has the most utility as a monitoring tool in those who have already been diagnosed with HAND (in particular MND and HAD) and are at risk of neurocognitive deterioration of a moderate degree. Our findings extend cross-sectional findings showing that the utility of the HDS is greater for MND and HAD [40,41]. In future studies, it would be important to assess whether these results are confirmed in a sample that includes HIV+ individuals with a wider range of HIV disease stages, more women, and persons with a greater range of education levels and ethnic backgrounds, to be more representative of global HIV epidemic characteristics. It would also be important to assess whether these results can be reproduced using the International-HDS [42]. Future studies will be needed to assess whether recently developed non-parametric longitudinal statistical models could improve the predictions of individual neurocognitive change on screening scales that have a limited range of values, such as the HDS.