2011 HSR&D National Meeting Abstract
3030 — Dealing with Missing Race Data: An Empirical Investigation of Imputation Methods
Gebregziabher M (Charleston REAP/MUSC), Zhao Y (Charleston REAP/MUSC), Echols C (Charleston REAP), Gilbert G (Charleston REAP), Egede LE
Missing race data is common in studies that use data from the Veterans Health Administration (VHA). While several methods have been suggested in the literature for dealing with missing categorical covariate data, the most commonly used approach has been to analyze only the complete cases, which could lead to biased estimates with inflated standard errors.
In this study, we examined the performance of a new imputation approach, latent class multiple imputation (LCMI), for imputing missing race data under a missing-at-random mechanism. We empirically investigated its performance and compared it with other imputation techniques that are appropriate for missing categorical data, such as multiple imputation (MI) and log-linear imputation (LLMI). We used data from a retrospective cohort of 13,416 veterans with type 2 diabetes, of whom 22% had unknown or missing race data. In this cohort, the distribution of missing race differed by level of comorbidity, such that those with missing race data showed lower rates of comorbidities. There were also differences in HbA1c, blood pressure, and lipid control outcomes, as well as in other demographic variables, between those with and without race data. We used statistical information criteria and standard errors of estimates to assess the performance of the methods under a logistic regression model. Furthermore, simulation studies were used to investigate the statistical properties of LCMI in comparison with the other methods under all possible missing data mechanisms (missing completely at random, missing at random, and not missing at random). The procedures were compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates.
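The three missingness mechanisms named above can be illustrated with a minimal sketch. This is not the authors' simulation code: the function name missing_mask, the binary codings of race and comorbidity, and the 1.5x/0.5x missingness probabilities are assumptions chosen only for illustration. The sample size and the 22% overall missingness rate are taken from the cohort description.

```python
import numpy as np

def missing_mask(race, comorbid, mechanism, rng, rate=0.22):
    """Return a boolean mask marking which race values to treat as missing.

    Hypothetical helper: race and comorbid are 0/1 arrays of equal length.
    """
    n = race.shape[0]
    if mechanism == "MCAR":
        # Missing completely at random: unrelated to any variable.
        p = np.full(n, rate)
    elif mechanism == "MAR":
        # Missing at random: depends only on an observed covariate.
        # Here, patients without comorbidity are more likely to lack race.
        p = np.where(comorbid == 0, 1.5 * rate, 0.5 * rate)
    elif mechanism == "MNAR":
        # Not missing at random: depends on the unobserved race value itself.
        p = np.where(race == 1, 1.5 * rate, 0.5 * rate)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random(n) < p

rng = np.random.default_rng(0)
n = 13416  # cohort size reported in the abstract
race = rng.integers(0, 2, size=n)      # assumed binary coding
comorbid = rng.integers(0, 2, size=n)  # assumed binary coding

masks = {m: missing_mask(race, comorbid, m, rng) for m in ("MCAR", "MAR", "MNAR")}
for name, mask in masks.items():
    print(name, "overall missing rate:", round(float(mask.mean()), 3))
```

Under MAR the mask can be predicted from the observed comorbidity indicator, which is the setting in which imputation from observed covariates is expected to work; under MNAR it cannot, so a simulation would use masks like these to check how each method degrades.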
Our simulation results show that, under many missingness scenarios, LCMI performs favorably and can be used to handle missing race data in VHA datasets. The simulation results were also supported by the results from the actual data example.
See results section.
The accuracy of health disparity studies, as well as of other studies that adjust for race, depends on complete race data. However, a substantial share of race data is missing in some VHA data sets. When race data cannot be filled in using other patient files, imputation techniques that are specifically developed for missing categorical data could reduce the impact of missingness.