Lesley S Park1,2*, Janet P Tate3,4, Maria C Rodriguez-Barradas5,6, David Rimland7,8, Matthew Bidwell Goetz9,10, Cynthia Gibert11,12, Sheldon T Brown13,14, Michael J Kelley15,16,17, Amy C Justice3,4 and Robert Dubrow1,2
Received Date: April 04, 2014; Accepted Date: June 23, 2014; Published Date: June 30, 2014
Citation: Park LS, Tate JP, Rodriguez-Barradas MC, Rimland D, Goetz MB, et al. (2014) Cancer Incidence in HIV-Infected Versus Uninfected Veterans: Comparison of Cancer Registry and ICD-9 Code Diagnoses. J AIDS Clin Res 5:318. doi:10.4172/2155-6113.1000318
Copyright: © 2014 Park LS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Background: Given the growing interest in the cancer burden in persons living with HIV/AIDS, we examined the validity of data sources for cancer diagnoses (cancer registry versus International Classification of Diseases, Ninth Revision [ICD-9] codes) and compared the association between HIV status and cancer risk using each data source in the Veterans Aging Cohort Study (VACS), a prospective cohort of HIV-infected and uninfected veterans from 1996 to 2008.
Methods: We reviewed charts to confirm potential incident cancers at four VACS sites. In the entire cohort, we calculated cancer-type-specific age-, sex-, race/ethnicity-, and calendar-period-standardized incidence rates and incidence rate ratios (IRR) (HIV-infected versus uninfected). We calculated standardized incidence ratios (SIR) to compare VACS and Surveillance, Epidemiology, and End Results rates.
Results: Compared to chart review, both Veterans Affairs Central Cancer Registry (VACCR) and ICD-9 diagnoses had approximately 90% sensitivity; however, VACCR had higher positive predictive value (96% versus 63%). There were 6,010 VACCR and 13,386 ICD-9 incident cancers among 116,072 veterans. Although ICD-9 rates tended to be double VACCR rates, most IRRs were in the same direction and of similar magnitude, regardless of data source. Using either source, all cancers combined, most viral-infection-related cancers, lung cancer, melanoma, and leukemia had significantly elevated IRRs. Using ICD-9, eight additional IRRs were significantly elevated, most likely due to false positive diagnoses. Most ICD-9 SIRs were significantly elevated and all were higher than the corresponding VACCR SIR.
Conclusions: ICD-9 may be used with caution for estimating IRRs, but should be avoided when estimating incidence or SIRs. Elevated cancer risk based on VACCR diagnoses among HIV-infected veterans was consistent with other studies.
Keywords: Neoplasms; Registries; International Classification of Diseases; HIV Infections
Abbreviations: ICD-9: International Classification of Diseases Ninth Revision; IRR: Incidence Rate Ratio; SIR: Standardized Incidence Ratio; VACCR: Veterans Affairs Central Cancer Registry; VACS: Veterans Aging Cohort Study
While the incidence of AIDS-defining cancers (ADC) has decreased among persons living with HIV/AIDS (PLWHA) in the antiretroviral therapy era, the number of non-AIDS-defining cancer (NADC) diagnoses is increasing, mainly due to the growth and aging of the PLWHA population. Furthermore, NADC risk among PLWHA compared to uninfected persons is elevated [2-4]. Given the growing interest in the cancer burden in PLWHA, we sought to examine how different sources of cancer diagnosis data influence epidemiologic analyses.
Studies of cancer incidence among PLWHA have used International Classification of Diseases, Ninth Revision (ICD-9) codes in claims databases to identify cancer diagnoses [6-8]. Claims data are primarily used for administrative purposes, such that their accuracy for epidemiologic investigations is variable. Investigators have assessed the validity of various algorithms using ICD-9 diagnosis and/or procedure codes for ascertaining incident cancer cases, and have observed varying sensitivity and positive predictive value (PPV) [9-14], with the expected tradeoff between the two [9,11,14]. Reasons for false positive incident cancer diagnoses include ascertainment of prevalent cancers, incorrect coding of cancer site, benign or in situ cancers incorrectly classified as malignant, and rule-out or provisional diagnoses [9,11,14]. Although the false positive rate may be reduced using procedure codes as evidence of cancer treatment [9,11,14], systematic differences between patients who do and do not receive treatment may introduce bias. Such bias would be of particular concern for comparisons of cancer incidence in HIV-infected and uninfected persons, since HIV-infected cancer patients are less likely than uninfected patients to be treated [15,16].
Other studies of cancer incidence among PLWHA have used cancer registries to identify cancer diagnoses [1,17,18]. Registries aim to collect standardized information on all cancer diagnoses that occur within a defined population, such as residents of a defined geographic area [19,20]. They require substantial resources, but meet stringent quality standards.
To our knowledge, the impact on the results of epidemiologic analyses of using registry versus claims data for cancer diagnoses has not been studied. To assess this impact on studies of cancer incidence in HIV-infected versus uninfected persons, we compared findings using registry versus claims data to identify incident cancer diagnoses in the Veterans Aging Cohort Study (VACS), which includes HIV-infected veterans and demographically-matched uninfected veterans in the US. Previously, ICD-9 code diagnoses were used to study cancer incidence in this cohort [7,22]; subsequently VACS was linked to the Veterans Affairs Central Cancer Registry (VACCR), which seeks to identify all cancer cases diagnosed or treated at Veterans Health Administration (VHA) medical centers. We hypothesized that VACCR diagnoses would have greater validity than ICD-9 code diagnoses and that the two data sources would lead to different study conclusions.
VACS consists of two cohorts. The larger VACS Virtual Cohort (VACS-VC) is based on national VHA databases (e.g., demographic, vital status, inpatient and outpatient encounters with associated diagnosis and procedure codes, pharmacy, laboratory results) with no patient contact. VACS-VC enrolls HIV-infected veterans when they begin HIV care in the VHA system and two matched uninfected patients also in VHA care. The VACS Eight Site Study (VACS-8) recruits HIV-infected patients at Infectious Disease clinics (a subset of VACS-VC) and frequency-matched uninfected patients from the General Internal Medicine clinics at eight VHA facilities. VACS-8 subjects complete surveys and provide consent to access medical charts. In both cohorts, subjects are matched on age, race/ethnicity, gender, and clinical site. VHA Connecticut Healthcare System and Yale University Institutional Review Boards have approved both cohorts and have granted VACS-VC a waiver of informed consent.
VACCR, which adheres to North American Association of Central Cancer Registries standards, maintains a database of cancer cases diagnosed or treated at the VHA, aggregated from local VHA medical center registries. We linked VACS and VACCR to ascertain incident cancer cases. We then mapped ICD for Oncology, third edition (ICD-O-3) topography and morphology codes from VACCR records and ICD-9 cancer diagnosis codes from VHA inpatient and outpatient encounter files to specific cancer types, consistent with Surveillance, Epidemiology, and End Results (SEER) recoding algorithms [27,28]. We required ICD-9 code diagnoses to be associated with at least one inpatient or two outpatient encounters on separate dates for the same cancer type, similar to methods used in prior VACS analyses [7,22,29]. The diagnosis date was the date provided by VACCR for VACCR diagnoses or the earliest ICD-9 code date for ICD-9 code diagnoses.
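The ICD-9 case definition described above (at least one inpatient encounter, or two outpatient encounters on separate dates, for the same cancer type) can be sketched as follows. This is a minimal illustration, not the study's actual code: the tuple layout of `encounters` and the function name are hypothetical, and treating the earliest code date across both settings as the diagnosis date is an assumed reading of the text.

```python
from collections import defaultdict
from datetime import date

def icd9_incident_diagnoses(encounters):
    """Return {(patient_id, cancer_type): diagnosis_date} for qualifying diagnoses.

    `encounters` is an iterable of (patient_id, cancer_type, setting, encounter_date)
    tuples, where setting is "inpatient" or "outpatient" (hypothetical layout).
    """
    inpatient = defaultdict(set)   # distinct inpatient dates per patient/cancer type
    outpatient = defaultdict(set)  # distinct outpatient dates per patient/cancer type
    for patient_id, cancer_type, setting, enc_date in encounters:
        key = (patient_id, cancer_type)
        if setting == "inpatient":
            inpatient[key].add(enc_date)
        elif setting == "outpatient":
            outpatient[key].add(enc_date)

    diagnoses = {}
    for key in set(inpatient) | set(outpatient):
        # Rule: >= 1 inpatient encounter OR >= 2 outpatient encounters on separate dates.
        if len(inpatient[key]) >= 1 or len(outpatient[key]) >= 2:
            # Assumed: diagnosis date is the earliest code date in either setting.
            diagnoses[key] = min(inpatient[key] | outpatient[key])
    return diagnoses

example = [
    ("p1", "lung", "outpatient", date(2001, 1, 5)),
    ("p1", "lung", "outpatient", date(2001, 1, 5)),  # same date, so counted once
    ("p2", "lung", "outpatient", date(2001, 1, 5)),
    ("p2", "lung", "outpatient", date(2001, 2, 1)),
    ("p3", "anal", "inpatient", date(2002, 3, 3)),
]
result = icd9_incident_diagnoses(example)
```

In this toy input, `p1` does not qualify because both outpatient codes fall on the same date, while `p2` and `p3` do.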
We ran parallel analyses to compare the findings using VACCR versus ICD-9 codes as the cancer diagnosis source. First, we assessed the validity of each data source for cancer-type-specific diagnoses using medical chart review as the gold standard; next, we compared HIV-status-specific, type-specific incidence rates and the association of HIV infection with cancer risk; and finally, we calculated HIV-status-specific, type-specific standardized incidence ratios (SIRs) by comparing observed and expected numbers of cases based on SEER rates. Tables for IRR and SIR analysis results include ADC, cancer types established to be associated with HIV, and select common NADC. The appendix includes complete results for all specific cancer types. Statistical analyses were performed using SAS version 9.2. We defined statistical significance as p<0.05 (two-sided).
Validity of VACCR and ICD-9 code diagnoses by chart review
We identified all potential incident cancer cases (excluding nonmelanoma skin cancers) from 1999-2008 for VACS-8 subjects at four VACS-8 sites with available electronic medical charts (Los Angeles, Pittsburgh, Bronx, Houston). Potential cases included all diagnoses identified from four sources: ICD-9 codes, VACCR, local VHA hospital cancer registries, or electronic medical record text searches for malignancy-related terms (“malignant”, “cancer”, “lymphoma”, “carcinoma”, etc.). We then adjudicated each potential case from the four sources by reviewing pathology, cytology, operative, radiology, and progress reports, as well as other records from the electronic medical chart. We compared sensitivity, specificity, PPV, and negative predictive value between VACCR and ICD-9 code diagnoses for all cancers combined and for specific cancer types, overall and by HIV status.
Impact of VACCR and ICD-9 code diagnoses on cancer incidence rates and rate ratios
Using VACS-VC data from 1996-2008, we compared the impact of the two data sources on HIV-status-specific cancer incidence rates and on rate ratios (HIV-infected vs. uninfected). We conducted each analysis for all cancers combined and for specific cancer types. We calculated observation time from 180 days post-baseline (to remove prevalent cases) to the date of cancer diagnosis, death, last healthcare utilization date, or December 31, 2008. In the all-cancer analysis, we used the first incident primary cancer diagnosis per subject as the endpoint. For specific cancer types, the endpoint was the first type-specific diagnosis.
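The observation window defined above can be sketched as a small date calculation. This is an illustrative sketch only: the function and argument names are hypothetical, and returning no window when the earliest end date falls on or before day 180 is an assumed way of handling subjects who contribute no observation time.

```python
from datetime import date, timedelta

STUDY_END = date(2008, 12, 31)   # administrative end of follow-up stated in the text
LAG = timedelta(days=180)        # lag used to remove prevalent cases

def observation_window(baseline, cancer_dx=None, death=None, last_utilization=None):
    """Return (start, end) of observation, or None if no time is contributed."""
    start = baseline + LAG
    # End of follow-up is the earliest of the available censoring/event dates.
    candidates = [d for d in (cancer_dx, death, last_utilization, STUDY_END) if d is not None]
    end = min(candidates)
    if end <= start:
        return None  # assumed handling: event or censoring before follow-up begins
    return start, end

def person_years(window):
    """Convert an observation window to person-years (365.25-day years)."""
    start, end = window
    return (end - start).days / 365.25

window = observation_window(date(2000, 1, 1), cancer_dx=date(2003, 5, 1), death=date(2005, 1, 1))
```

Here follow-up starts 180 days after a January 1, 2000 baseline and ends at the cancer diagnosis, which precedes death and the study end date.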
We calculated age-, sex-, race/ethnicity-, and calendar-period-standardized incidence rates (IR) using the person-year distribution of the entire cohort as the standard weights. We used exact person-time data calculation methods where time-dependent variables (i.e., age, calendar period) are classified at each day of observation. The categories used for standardization were the same as those presented in Table 1. To quantify the association of HIV with cancer risk, we calculated incidence rate ratios (IRR) (HIV-infected versus uninfected) and 95% confidence intervals (CI). For the remainder of the text, IR and IRR refer to calculations using standardized rates. To examine the effects of missed cases due to healthcare outside the VHA system, we used the one inpatient/two outpatient encounter algorithm to count additional ICD-9 code cancer diagnoses from Medicare and Medicaid encounters to recalculate IRs and IRRs.
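Direct standardization with the combined cohort's person-year distribution as weights can be sketched as below. This is a simplified illustration under stated assumptions: the stratum dictionaries and function names are hypothetical, every stratum is assumed to have non-zero person-time in both groups, and the confidence interval uses a log-scale delta-method approximation with Poisson variances, which is one common choice and not necessarily the method the authors cite.

```python
import math

def standardized_rate(strata, weights):
    """Directly standardized rate and its variance.

    strata: {stratum: (cases, person_years)}; weights: {stratum: weight}, summing to 1.
    """
    rate = 0.0
    var = 0.0
    for k, w in weights.items():
        cases, py = strata[k]
        rate += w * cases / py
        var += w ** 2 * cases / py ** 2  # Poisson variance of each stratum-specific rate
    return rate, var

def standardized_irr(strata_a, strata_b, z=1.96):
    """IRR (group a versus group b) of directly standardized rates with an approximate CI."""
    # Standard weights: person-year distribution of the entire (combined) cohort.
    total_py = {k: strata_a[k][1] + strata_b[k][1] for k in strata_a}
    grand_total = sum(total_py.values())
    weights = {k: v / grand_total for k, v in total_py.items()}
    rate_a, var_a = standardized_rate(strata_a, weights)
    rate_b, var_b = standardized_rate(strata_b, weights)
    irr = rate_a / rate_b
    se_log = math.sqrt(var_a / rate_a ** 2 + var_b / rate_b ** 2)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)

# Toy data (not from the study): stratum rates in group a are twice those in group b.
hiv_pos = {"younger": (20, 1000.0), "older": (40, 500.0)}
hiv_neg = {"younger": (30, 3000.0), "older": (20, 500.0)}
irr, lower, upper = standardized_irr(hiv_pos, hiv_neg)
```

Because both groups are weighted to the same person-year distribution, differences in age, sex, race/ethnicity, or calendar-period structure between the groups do not drive the ratio.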
|Covariate||HIV+ (N=38,123)||HIV- (N=77,949)|
Table 1: Baseline characteristics of VACS Virtual Cohort subjects who contributed observation time.
Impact of VACCR and ICD-9 code diagnoses on SIRs
We calculated SIRs with Wald confidence limits to compare cancer incidence in VACS with the SEER program cancer registries, which are representative of the US population. For each cancer type, we multiplied age-, sex-, race/ethnicity-, and calendar-period-specific SEER incidence rates obtained using SEER*Stat version 8.0.4 by the stratified person-time in VACS-VC to calculate the expected number of cases. We selected the first type-specific diagnosis for type-specific SEER incidence rates and the first cancer diagnosis of any type for the all-cancer SEER incidence rates.
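The SIR calculation described above can be sketched as follows. The function names and dictionary layout are hypothetical, reference rates are assumed to be expressed per 100,000 person-years, and the interval shown is a Wald interval on the log scale with standard error 1/sqrt(observed), which is an assumption about the exact form used.

```python
import math

def expected_cases(reference_rates, person_years):
    """Expected count: sum over strata of reference rate (per 100,000 PY) x cohort person-years."""
    return sum(reference_rates[k] * person_years[k] / 100_000 for k in person_years)

def sir_with_ci(observed, expected, z=1.96):
    """SIR = observed / expected, with a log-scale Wald interval (assumed form)."""
    sir = observed / expected
    half_width = z / math.sqrt(observed)  # SE of log(SIR) for a Poisson count
    return sir, sir * math.exp(-half_width), sir * math.exp(half_width)

# All-cancer VACCR counts for HIV-infected subjects as reported in Table 5
# (the expected count is rounded there, so small discrepancies are possible).
sir, lower, upper = sir_with_ci(observed=2421, expected=1407)
```

With these inputs the sketch gives approximately 1.72 (1.65, 1.79), in line with the all-cancer VACCR SIR reported for HIV-infected subjects.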
There were 124,936 VACS-VC subjects enrolled during 1996-2008, of whom 116,072 contributed observation time to this investigation (i.e., had observation time after 180 days post-baseline). Subjects who contributed observation time were predominantly male (98%); about half were non-Hispanic black and about two-fifths were non-Hispanic white; approximately 70% were 35-54 years of age at baseline; and more than half entered the cohort in 1996-1999 (Table 1). The median observation time was 6.0 years for HIV-infected subjects and 7.7 years for uninfected subjects. The medical chart review from the four VACS-8 sites included 3,222 subjects, whose baseline characteristics were similar to VACS-VC subjects, except that about 90% of VACS-8 subjects were enrolled during 2000-2005.
Validity of VACCR and ICD-9 code diagnoses by chart review
Medical chart review confirmed a total of 252 incident cancers in the four VACS-8 sites (Table 2). VACCR identified 229 potential cancers, and ICD-9 codes identified 369 potential cancers. Using chart review as the gold standard, overall VACCR type-specific sensitivity was slightly lower than ICD-9 code sensitivity (87% versus 92%), but overall VACCR type-specific PPV was much higher (96% versus 63%), with 136 false positive ICD-9 code diagnoses, but only 10 false positive VACCR diagnoses. When we excluded 28 “cancers of unspecified site,” the ICD-9 code PPV remained low (68%).
|Source of diagnosis||Source result||Chart review: incident cancer||Chart review: no cancer||Total||Validity measures|
|VACCR||Incident cancer||219||10||229||Sensitivity=219/252=87% Specificity=2982/2992=99.67% PPV =219/229=96% NPV =2982/3015=99%|
|ICD-9||Incident cancer||233||136||369||Sensitivity=233/252=92% Specificity =2926/3062=96% PPV =233/369=63% NPV =2926/2945=99%|
PPV=Positive Predictive Value, NPV=Negative Predictive Value
*For subjects with >1 potential cancer, each cancer was counted separately. Subjects with no potential cancers were counted once. ICD-9 codes identified more subjects with >1 potential cancer, resulting in a higher total.
†False negative diagnoses include those with no cancer diagnosis according to VACCR or ICD-9 code, respectively, as well as those with incorrect coding of the cancer type (two VACCR diagnoses and nine ICD-9 code diagnoses).
Table 2: Validity of Veterans Affairs Central Cancer Registry (VACCR) versus ICD-9 code cancer-type-specific diagnoses, with medical chart review as the gold standard*.
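The validity measures in Table 2 follow directly from the four cells of each 2x2 comparison against chart review. The sketch below recomputes them; the function name is hypothetical, and the false negative counts are derived here as 252 confirmed cancers minus the true positives, consistent with the denominators shown in the table.

```python
def validity(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # confirmed cancers that the source identified
        "specificity": tn / (tn + fp),  # non-cancers that the source did not flag
        "ppv": tp / (tp + fp),          # source diagnoses confirmed by chart review
        "npv": tn / (tn + fn),          # source negatives that were truly cancer-free
    }

CONFIRMED = 252  # incident cancers confirmed by chart review

vaccr = validity(tp=219, fp=10, fn=CONFIRMED - 219, tn=2982)
icd9 = validity(tp=233, fp=136, fn=CONFIRMED - 233, tn=2926)

for name, measures in (("VACCR", vaccr), ("ICD-9", icd9)):
    print(name, {k: f"{100 * v:.1f}%" for k, v in measures.items()})
```

The two sources differ little in sensitivity, but the 136 false positives among 369 ICD-9 diagnoses pull the ICD-9 PPV down to roughly 63%.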
VACCR diagnosis sensitivity and PPV did not meaningfully differ by HIV status (Table 3). ICD-9 code diagnosis sensitivity also did not differ by HIV status, but PPV was lower for HIV-infected subjects (59% versus 69%). VACCR sensitivity was below 90% for four of the nine common cancer types shown in Table 3 (anal, liver, lung, Kaposi sarcoma), whereas ICD-9 code sensitivity was below 90% for only two of these cancer types (anal, leukemia). However, VACCR PPV was at least 90% for all of the common cancer types except oral cavity and pharynx, whereas ICD-9 code PPV was below 90% for all but non-Hodgkin lymphoma. In all cases, PPV was lower using ICD-9 code than VACCR; the lowest ICD-9 code PPVs were for oral cavity and pharynx (23%) and colorectal (34%) cancers.
|Cancer type||VACCR sensitivity||VACCR PPV||ICD-9 code sensitivity||ICD-9 code PPV|
|Oral cavity and pharynx||3/3=100%||3/6=50%||3/3=100%||3/13=23%|
|Other specified cancers†||45/51=88%||45/46=98%||47/51=92%||47/89=53%|
|Cancers of unspecified site||2/3=67%||2/4=50%||2/3=67%||2/28=7%|
*Cancer types with at least 10 VACCR or ICD-9 code diagnoses
†Cancer-type-specific sensitivity and PPV for the following cancer types combined: esophagus, stomach, small intestine, biliary tract, pancreas, peritoneum and retroperitoneum, larynx, pleura, melanoma, breast, uterus or corpus, penis, testicular, bladder, kidney, brain and nervous system, thyroid, Hodgkin lymphoma, multiple myeloma
Table 3: Sensitivity and positive predictive value (PPV) of Veterans Affairs Central Cancer Registry (VACCR) and ICD-9 code diagnoses by HIV status and for common cancer types* (chart review as gold standard).
We compared the cancer diagnosis dates determined by the VACCR and ICD-9 codes (date of first code for the cancer type of interest) with the diagnosis date determined by the gold standard chart review. The VACCR diagnosis date was identical to the gold standard diagnosis date for 192 (88%) of the 219 cancer diagnoses identified by VACCR and confirmed by chart review. Of the 27 VACCR diagnosis dates that were incorrect, 24 (89%) were within one year of the gold standard date (median: 15 days, interquartile range: 4-54 days). The ICD-9 code diagnosis date was identical for 166 (71%) of the 233 cancer diagnoses identified by ICD-9 code and confirmed by chart review. Similar to VACCR, of the 67 ICD-9 diagnosis dates that were incorrect, 60 (90%) were within one year of the gold standard date (median: 28 days, interquartile range: 6-116 days).
Impact of VACCR and ICD-9 code diagnoses on cancer incidence rates and rate ratios
In the VACS-VC, there were 6,010 VACCR incident cancer diagnoses (5,615 first primary cancers of any type) and 13,386 ICD-9 code incident cancer diagnoses (9,807 first primary cancers of any type). All-cancer IRs per 100,000 person-years were about twice as high using ICD-9 code (HIV-infected: 2,120; uninfected: 1,087) compared to VACCR (HIV-infected: 1,103; uninfected: 571) diagnoses, regardless of HIV status (Table 4). For a few cancer types (i.e., trachea and mediastinum, bone and joint, soft tissue, eye, and brain and nervous system cancers), the number of diagnoses and consequent IRs were on the order of five- to fiftyfold higher using ICD-9 code compared to VACCR (Appendix Table A1). The most extreme instance was cervical cancer, with no VACCR cases, but 14 ICD-9 code cases (Table 4). VACCR indicated that four of these cases were intraepithelial (i.e., in situ) cancers. Furthermore, a search for procedure codes (e.g., hysterectomy) and pharmacy records (e.g., cisplatin) that would suggest treatment of invasive cervical cancer provided insufficient evidence to definitively support an incident invasive cervical cancer diagnosis among any of the 14 ICD-9 code cases.
|Cancer type||HIV status||VACCR # cases||VACCR IR||VACCR IRR||VACCR 95% CI||ICD-9 # cases||ICD-9 IR||ICD-9 IRR||ICD-9 95% CI|
|All cancers*||+||2,421||1,103.0||1.93||(1.82, 2.05)||4,282||2,119.6||1.95||(1.86, 2.05)|
|Oral cavity & pharynx||+||120||51.3||1.30||(1.03, 1.64)||286||123.6||1.65||(1.40, 1.94)|
|Colorectal||+||124||54.8||1.22||(0.97, 1.52)||376||162.5||1.99||(1.70, 2.32)|
|Anal||+||149||62.0||28.14||(16.06, 49.32)||281||118.1||20.78||(14.24, 30.31)|
|Liver||+||173||75.9||3.27||(2.50, 4.29)||272||118.9||2.70||(2.21, 3.31)|
|Lung||+||445||195.8||1.87||(1.63, 2.14)||647||284.9||1.84||(1.64, 2.06)|
|Melanoma||+||46||19.5||2.21||(1.40, 3.49)||110||47.3||1.66||(1.27, 2.17)|
|Prostate||+||396||181.1||0.96||(0.85, 1.07)||588||276.4||0.98||(0.89, 1.08)|
|Penis||+||11||4.8||4.59||(1.38, 15.27)||24||10.0||5.11||(2.19, 11.94)|
|Hodgkin lymphoma||+||75||32.3||9.08||(5.09, 16.19)||136||58.4||6.63||(4.50, 9.78)|
|Non-Hodgkin lymphoma||+||315||133.8||6.93||(5.35, 8.96)||627||271.1||6.68||(5.57, 8.00)|
|Leukemia||+||51||22.2||2.02||(1.33, 3.08)||90||39.9||1.45||(1.10, 1.92)|
|Kaposi sarcoma||+||227||94.4||519.61||(229.02, 1178.9)||528||223.3||180.31||(115.18, 282.26)|
*Includes cancers of unspecified type
Table 4: Age-, sex-, race/ethnicity-, and calendar-period-standardized incidence rates (IR, per 100,000 person-years), incidence rate ratios (IRR) and 95% confidence intervals (95% CI) for Veterans Affairs Central Cancer Registry (VACCR) diagnoses and Veterans Health Administration (VHA) ICD-9 code diagnoses for selected cancer types.
|Cancer type||HIV status||VACCR observed # cases||VACCR expected # cases||VACCR SIR||VACCR (95% CI)||ICD-9 observed # cases||ICD-9 expected # cases||ICD-9 SIR||ICD-9 (95% CI)|
|All cancers†||+||2421||1407||1.72||(1.65, 1.79)||4282||1277||3.35||(3.25, 3.45)|
|-||3194||3669||0.87||(0.84, 0.90)||5525||3401||1.62||(1.58, 1.67)|
|Oral cavity & pharynx||+||120||61||1.96||(1.64, 2.35)||286||61||4.71||(4.19, 5.29)|
|-||227||154||1.48||(1.30, 1.68)||427||153||2.79||(2.53, 3.06)|
|Colorectal||+||124||164||0.76||(0.64, 0.90)||376||162||2.31||(2.09, 2.56)|
|-||264||423||0.62||(0.55, 0.70)||471||420||1.12||(1.02, 1.23)|
|Anal||+||149||6||23.05||(19.63, 27.07)||281||6||43.56||(38.75, 48.96)|
|-||13||16||0.82||(0.48, 1.42)||33||16||2.09||(1.49, 2.94)|
|Liver||+||173||46||3.80||(3.28, 4.41)||272||45||5.98||(5.31, 6.74)|
|-||139||118||1.18||(1.00, 1.39)||259||118||2.20||(1.95, 2.48)|
|Lung||+||445||216||2.06||(1.88, 2.26)||647||214||3.02||(2.79, 3.26)|
|-||614||564||1.09||(1.01, 1.18)||894||561||1.59||(1.49, 1.70)|
|Melanoma||+||46||44||1.04||(0.78, 1.39)||110||44||2.50||(2.07, 3.01)|
|-||51||110||0.46||(0.35, 0.61)||162||109||1.48||(1.27, 1.73)|
|Prostate||+||396||571||0.69||(0.63, 0.77)||588||558||1.05||(0.97, 1.14)|
|-||1085||1496||0.73||(0.68, 0.77)||1555||1456||1.07||(1.02, 1.12)|
|Penis||+||11||2||5.91||(3.27, 10.67)||24||2||12.90||(8.64, 19.24)|
|-||6||5||1.24||(0.56, 2.75)||11||5||2.27||(1.26, 4.10)|
|Hodgkin lymphoma||+||75||9||8.24||(6.57, 10.33)||136||9||14.97||(12.65, 17.71)|
|-||20||21||0.93||(0.60, 1.44)||49||21||2.29||(1.73, 3.02)|
|Non-Hodgkin lymphoma||+||315||67||4.69||(4.20, 5.23)||627||67||9.42||(8.71, 10.19)|
|-||113||169||0.67||(0.55, 0.80)||229||169||1.36||(1.19, 1.54)|
|Leukemia||+||51||35||1.44||(1.10, 1.90)||90||35||2.55||(2.07, 3.14)|
|-||63||90||0.70||(0.54, 0.89)||156||90||1.73||(1.48, 2.02)|
|Kaposi sarcoma||+||227||8||28.07||(24.65, 31.97)||528||8||65.92||(60.53, 71.79)|
|-||1||18||0.05||(0.01, 0.39)||7||18||0.38||(0.18, 0.81)|
*Expected numbers of cases are rounded to the nearest whole number. SIR calculations used the exact expected number of cases (unrounded). †Includes cancers of unspecified type
Table 5: Observed and expected numbers of cancer cases, standardized incidence ratios (SIR), and 95% confidence intervals (95% CI) for Veterans Affairs Central Cancer Registry (VACCR) diagnoses and Veterans Health Administration (VHA) ICD-9 code diagnoses for selected cancer types*.
The all-cancer IRRs (HIV-infected vs. uninfected) were 1.93 (95% CI: 1.82, 2.05) for VACCR and 1.95 (95% CI: 1.86, 2.05) for ICD-9 code (Table 4). Most type-specific IRRs were in the same direction and of similar magnitude, regardless of data source. Using either VACCR or ICD-9 code diagnoses, most viral-infection-related cancers (oral cavity and pharynx, anal, liver, and penis cancers; Hodgkin and non-Hodgkin lymphoma; and Kaposi sarcoma), lung cancer, melanoma, and leukemia had significantly elevated IRRs. Using ICD-9 code diagnoses, colorectal, biliary tract, nasal cavity and middle ear, bone and joint, soft tissue, cervical, brain and nervous system cancers, and multiple myeloma were also significantly elevated (Table 4; Appendix Table A1). Among ADC, the magnitude of significantly elevated IRRs ranged from about 7 for non-Hodgkin lymphoma to 180 (ICD-9) to 520 (VACCR) for Kaposi sarcoma. Significantly elevated NADC IRRs ranged from 1.30 (oral cavity and pharynx cancer) to almost 30 (anal cancer) using VACCR diagnoses and from 1.45 (leukemia) to about 21 (anal cancer) using ICD-9 code diagnoses.
In our secondary analysis of outside utilization with Medicare/Medicaid, we found that 48% of HIV-infected subjects and 36% of uninfected subjects accessed Medicare/Medicaid services between 1997 and 2008. In the all-cancer analysis, the addition of Medicare/Medicaid diagnoses increased ICD-9 code first primary cancer diagnoses of any type by 19% for HIV-infected subjects, but only by 11% for uninfected subjects (Appendix Table A1). Therefore, adding Medicare/Medicaid ICD-9 code diagnoses to VHA ICD-9 code diagnoses resulted in higher IRRs (for all cancers combined and for specific cancer types). However, most results were qualitatively similar to results using VHA ICD-9 code diagnoses alone.
Impact of VACCR and ICD-9 code diagnoses on SIRs
The greater number of observed ICD-9 code diagnoses compared to VACCR diagnoses resulted in more censoring based on cancer diagnosis date for ICD-9 code diagnoses, yielding less observation time and consequently a lower number of expected cancer cases based on SEER rates (HIV+: 1,277 versus 1,407 cases; HIV-: 3,401 versus 3,669 cases) (Table 5). The all-cancer SIR for HIV-infected subjects was almost twice as high for ICD-9 code diagnoses (SIR=3.35; 95% CI: 3.25, 3.45) as for VACCR diagnoses (SIR=1.72; 95% CI: 1.65, 1.79). For uninfected subjects, VACCR identified fewer cancers than expected (SIR=0.87, 95% CI: 0.84, 0.90), but ICD-9 code diagnoses identified more cancers than expected (SIR=1.62, 95% CI: 1.58, 1.67).
Using VACCR diagnoses, with the exception of melanoma, cancer types with significantly elevated IRRs (most viral-infection-related cancers, lung cancer, and leukemia) in HIV-infected subjects also had significantly elevated SIRs (Table 5). Biliary tract, nasal cavity and middle ear, larynx, and pleura cancers were also significantly elevated (Appendix Table A2). For uninfected subjects, SIRs were significantly elevated for oral cavity and pharynx, esophagus, larynx, lung, and pleura cancers, but the majority of SIRs were less than or around 1.0.
Using ICD-9 code diagnoses, most SIRs for both HIV-infected and uninfected subjects were significantly elevated (Table 5; Appendix Table A2). Furthermore, all of the ICD-9 SIRs were greater than the corresponding VACCR SIRs. The strikingly higher number of ICD-9 code than VACCR observed cases for trachea and mediastinum, bone and joint, soft tissue, brain and nervous system, and cervical cancers resulted in markedly higher SIRs by ICD-9 code than by VACCR.
This investigation not only reviewed the validity of cancer registry and ICD-9 code diagnoses compared to medical chart review, but also examined the impact of these data sources in a study of the association between HIV status and cancer risk. As expected, we found VACCR diagnoses to be more valid than ICD-9 code diagnoses. With chart review as gold standard, we found that both VACCR and ICD-9 code diagnoses exhibited high sensitivity, missing only about 10% of cancer cases. However, the validity of the two data sources diverged when we evaluated PPV, which was appreciably lower using ICD-9 codes. Furthermore, the VACCR cancer diagnosis date was more accurate than the ICD-9 code diagnosis date, with 88% of VACCR diagnosis dates, but only 71% of ICD-9 diagnosis dates, in exact agreement with the gold standard. Based on these results, we can adopt VACCR as our preferred data source for cancer epidemiologic analyses, and can predict that the low PPV (and resultant high proportion of false positive cases) for ICD-9 code diagnoses will lead to artifactually inflated estimates of IRs and SIRs.
Our analyses using VACCR diagnoses were broadly consistent with previous findings [2,3], providing face validity for our adoption of VACCR diagnoses in future analyses. First, the incidence of the two main ADC, Kaposi sarcoma and non-Hodgkin lymphoma, was appreciably elevated among PLWHA as shown by both IRRs and SIRs. Second, our all-cancer IRR, as well as our SIR for PLWHA, were both almost two, similar to the result for all-NADC combined in a meta-analysis of the incidence of NADC in PLWHA compared to the general population. Third, the incidence of oral cavity and pharynx, anal, liver, lung, and penis cancers, Hodgkin lymphoma, and leukemia, NADCs well-established to be elevated among PLWHA [2,3], was significantly elevated among PLWHA according to both IRRs and SIRs.
Consistent with our prediction, the all-cancer IR was about twofold higher using ICD-9 code than VACCR diagnoses. However, for some cancer types (e.g., bone and joint and brain and nervous system), the number of diagnoses was many-fold higher using ICD-9 codes, resulting in IRs (and SIRs) that were biased upward many-fold. Interestingly, in spite of the inflated IRs using ICD-9 code diagnoses, many type-specific IRRs were similar across the two data sources, apparently due to similar type-specific multiplicative inflation factors (ICD-9 IRs versus VACCR IRs) across HIV status. Consistent with this observation, the 2009 VACS study that used ICD-9 code cancer diagnoses reported IRs that we now know to be inflated. However, IRRs for anal, liver, lung, and prostate cancers, Hodgkin lymphoma, non-Hodgkin lymphoma, and Kaposi sarcoma were of similar magnitude and significance to the VACCR IRRs reported here; only the cervical cancer IRR was notably erroneous. Furthermore, in the current study, IRRs for other cancer types differed across the two data sources. For example, for colorectal cancer, which is generally found not to have elevated incidence in PLWHA, the VACCR IRR was 1.22 (95% CI: 0.97, 1.52), whereas the ICD-9 code IRR was 1.99 (95% CI: 1.70, 2.32) (Table 4). Finally, for other cancer types, such as biliary tract and nasal cavity and middle ear, the magnitude of the IRR was similar across the two data sources, but the IRR based on ICD-9 code diagnoses was statistically significant, whereas the IRR based on VACCR diagnoses was not.
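The point about multiplicative inflation factors can be checked with simple arithmetic on the all-cancer IRs reported in the Results (per 100,000 person-years). The sketch below is illustrative only; the variable names are hypothetical, and the uninfected IRs are the rounded values quoted in the text.

```python
# All-cancer standardized IRs per 100,000 person-years, as reported in the Results.
vaccr_ir = {"hiv_pos": 1103.0, "hiv_neg": 571.0}
icd9_ir = {"hiv_pos": 2119.6, "hiv_neg": 1087.0}

# Inflation factor within each HIV-status group: ICD-9 IR divided by VACCR IR.
inflation = {group: icd9_ir[group] / vaccr_ir[group] for group in vaccr_ir}

irr_vaccr = vaccr_ir["hiv_pos"] / vaccr_ir["hiv_neg"]
irr_icd9 = icd9_ir["hiv_pos"] / icd9_ir["hiv_neg"]

# Identity: the ICD-9 IRR equals the VACCR IRR times the ratio of inflation factors,
# so the IRR is preserved whenever both groups are inflated by about the same factor.
irr_icd9_from_identity = irr_vaccr * inflation["hiv_pos"] / inflation["hiv_neg"]

print(f"inflation HIV+: {inflation['hiv_pos']:.2f}, HIV-: {inflation['hiv_neg']:.2f}")
print(f"IRR VACCR: {irr_vaccr:.2f}, IRR ICD-9: {irr_icd9:.2f}")
```

Both inflation factors are close to 1.9, so the all-cancer IRR barely moves between sources; where the factors differ by HIV status, as the colorectal results suggest, the ICD-9 IRR departs from the VACCR IRR.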
The effect of low PPV and large number of false positive ICD-9 code diagnoses was more evident in our SIR analysis, with ICD-9 code diagnoses resulting in inflated SIRs, significantly greater than 1.0, for most cancer types in both HIV-infected and uninfected subjects. The all-cancer SIR of 3.35 for HIV-infected subjects was unrealistically high. Although we would expect cancer incidence in uninfected veterans to be higher than in the general US population due to higher prevalence of certain cancer risk factors, such as smoking and hepatitis C virus infection, the all-cancer SIR of 1.62 was also unrealistically high. Furthermore, cancer types that appear to be unrelated to HIV infection, including colorectal, pancreas and bladder cancers, exhibited significantly elevated SIRs among PLWHA.
The inflated ICD-9 code SIRs, which resulted from comparing cancer incidence rates measured by ICD-9 code in the VACS to cancer incidence rates measured by cancer registry in the general US population, highlight the danger of comparing disease risk in two populations for which the method of case ascertainment differed. On the other hand, the VACCR SIRs, which we established to be more valid, compared cancer incidence rates measured by cancer registry in both populations.
This study had appreciable strengths. Few studies have compared both registry and claims data to a medical chart review; and to our knowledge, no study has examined data source impact on epidemiologic analyses. VACS has the strengths of a very large sample size in the VACS-VC, providing generally high statistical power for epidemiologic analyses, combined with the smaller VACS-8 sample, providing in-depth medical record data for validation analyses. Thus, this study was able to review the fine detail of cancer data accuracy with chart review, while also calculating IRs, IRRs, and SIRs for many specific cancer types in the larger sample.
Our study also had limitations. Since our cohort was 98% male, we had scant statistical power to assess female cancer types and were unable to generalize our results to women. In addition, we did not review the medical chart for each of the 3,222 subjects from the four VACS-8 sites. However, we believe that the text search for malignancy-related terms should have captured all of the subjects with potential incident cancer diagnoses.
As demonstrated in this study, analyses that use VACCR data will underestimate cancer incidence. First, although the VACCR diagnoses exhibited a relatively high sensitivity, VACCR did miss 13% of cases. Second, the all-cancer SIR for uninfected subjects was significantly below 1.0, suggesting that VACCR is missing cancer cases since we would expect cancer incidence for uninfected veterans to be greater than in the general population. Finally, we found that adding Medicare/Medicaid ICD-9 code diagnoses to VHA ICD-9 code diagnoses resulted in 19% more diagnoses among HIV-infected veterans and 11% more diagnoses among uninfected veterans. Although some of these diagnoses were certainly false positives, many would be true positives not captured by VACCR.
The underestimation of cancer incidence in the VACS-VC using VACCR diagnoses would result in SIRs that are biased downward. Given the differential outside utilization with Medicare/Medicaid apparently resulting in a greater proportion of outside cancer diagnoses among HIV-infected veterans, we can project that IRRs would be biased downward as well. Unfortunately, we were unable to directly show how outside utilization may affect IRRs based on VACCR diagnoses.
In conclusion, ICD-9 codes are a convenient source of data because of low cost and availability, especially since linking cancer registry data to HIV cohorts is not always feasible. However, the low PPV of ICD-9 code diagnoses resulted in overestimation of incident cancer cases, IRs, and SIRs, such that ICD-9 code diagnoses should not be used to estimate these measures. Although VACCR and ICD-9 code IRRs tended to be similar, there were meaningful exceptions, indicating that ICD-9 code IRR estimates should be interpreted with caution. Elevated cancer risk based on VACCR diagnoses among HIV-infected veterans was consistent with other studies. Cancer registry diagnoses should be used for epidemiologic analyses whenever feasible.
We are grateful to the Veterans Affairs Central Cancer Registry (VACCR) for linking the VACS with the VACCR database and providing us with a dataset containing the linked records. This study would not have been possible without the VACCR’s generous assistance.
Sources of support: Research reported in this publication was supported by the US Department of Veterans Affairs and by grants from the National Institute of Mental Health (5T32-MH020031, P30-MH062294), the National Institute on Alcohol Abuse and Alcoholism (1U01-AA020790, U24-AA020794, U10-AA013566), National Institute of Allergy and Infectious Diseases (5U01-A1069918), and the National Cancer Institute (F31-CA180775) of the National Institutes of Health.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the Department of Veterans Affairs.