Lessons Learned in Dealing with Missing Race Data: An Empirical Investigation
Corresponding Author:
Dr. Mulugeta Gebregziabher
Division of Biostatistics and Epidemiology
Medical University of South Carolina, Charleston, USA
E-mail: [email protected]
Received Date: February 13, 2012; Accepted Date: April 13, 2012; Published Date: April 15, 2012
Citation: Gebregziabher M, Zhao Y, Axon N, Gilbert GE, Echols C, et al. (2012) Lessons Learned in Dealing with Missing Race Data: An Empirical Investigation. J Biom Biostat 3:138. doi:10.4172/2155-6180.1000138
Copyright: © 2012 Gebregziabher M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Background: Missing race data is a ubiquitous problem in studies that use large administrative datasets, such as those of the Veterans Health Administration. The most common approach has been Complete Case Analysis (CCA), which analyzes only those records with complete data. CCA requires the assumption that data are Missing Completely At Random (MCAR) and can otherwise lead to biased estimates with inflated standard errors.
Objective: To examine the performance of a new imputation approach, Latent Class Multiple Imputation (LCMI), for imputing missing race data, and to compare it with CCA, Multiple Imputation (MI), and Log-Linear Multiple Imputation (LLMI).
Design/Participants: We empirically compared LCMI with CCA, MI, and LLMI using simulated data, and demonstrated their application using data from a sample of 13,705 veterans with type 2 diabetes, 23% of whom had unknown or missing race information.
Results: Our simulation study shows that, when data are Missing At Random (MAR), LCMI leads to lower bias and lower standard error estimates than CCA, MI, and LLMI. Our data example does not conform to MCAR, because subjects with missing race information had lower rates of medical comorbidities than those with race information. In this example LCMI also outperformed MI and LLMI, providing lower standard errors, especially when a relatively large number of latent classes was assumed for the latent class imputation model.
Conclusions: Our results show that LCMI is a valid statistical technique for imputing missing categorical covariate data, particularly missing race data, and that it offers advantages with respect to the precision of estimates.
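The reason CCA needs MCAR can be illustrated with a small simulation. The sketch below is not the authors' LCMI method or their simulation design; it is a minimal illustration using invented parameters and hypothetical names (a binary `group` indicator standing in for race, and a binary `comorbid` flag). Missingness depends only on the observed comorbidity flag, so the data are MAR but not MCAR, and records without comorbidity are more likely to lack the group value. The complete-case proportion then drifts away from the full-data proportion, while a simple estimate that is computed within comorbidity strata and reweighted (a stand-in for any method that conditions on observed covariates, not multiple imputation itself) stays close to it.

```python
import random


def simulate(n=20000, seed=0):
    """Generate (group, comorbid, observed_group) rows; all rates are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hypothetical binary indicator standing in for race.
        group = 1 if rng.random() < 0.30 else 0
        # Comorbidity is more common in group 1 (assumed rates).
        comorbid = 1 if rng.random() < (0.60 if group == 1 else 0.30) else 0
        # MAR mechanism: missingness depends only on the observed comorbidity flag.
        p_missing = 0.10 if comorbid == 1 else 0.35
        observed = None if rng.random() < p_missing else group
        rows.append((group, comorbid, observed))
    return rows


def true_proportion(rows):
    """Proportion in group 1 using the full (normally unavailable) data."""
    return sum(g for g, _, _ in rows) / len(rows)


def complete_case_proportion(rows):
    """Complete Case Analysis: drop every record whose group value is missing."""
    obs = [o for _, _, o in rows if o is not None]
    return sum(obs) / len(obs)


def stratified_proportion(rows):
    """Estimate within each comorbidity stratum, then weight by stratum size."""
    total = len(rows)
    estimate = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[1] == level]
        obs = [r[2] for r in stratum if r[2] is not None]
        estimate += (len(stratum) / total) * (sum(obs) / len(obs))
    return estimate


if __name__ == "__main__":
    rows = simulate()
    print("full data     :", round(true_proportion(rows), 4))
    print("complete case :", round(complete_case_proportion(rows), 4))
    print("stratified    :", round(stratified_proportion(rows), 4))
```

With these assumed rates, group 1 is over-represented among complete cases because its members are more likely to have comorbidities and therefore less likely to have a missing value. Conditioning on comorbidity removes that distortion, which is the same idea that imputation models exploit when they use observed covariates to predict the missing category.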