Health Services Research & Development

2011 HSR&D National Meeting Abstract

3030 — Dealing with Missing Race Data: An Empirical Investigation of Imputation Methods

Gebregziabher M (Charleston REAP/MUSC), Zhao Y (Charleston REAP/MUSC), Echols C (Charleston REAP), Gilbert G (Charleston REAP), Egede LE (Charleston REAP/MUSC)

Objectives:
Missing race data is common in studies that use data from the Veterans Health Administration (VHA). Although several methods for handling missing categorical covariate data have been proposed in the literature, the most commonly used approach has been complete-case analysis (analyzing only records with observed race), which can lead to biased estimates with inflated standard errors.

Methods:
In this study, we examined the performance of a new imputation approach, latent class multiple imputation (LCMI), for imputing missing race data under a missing-at-random mechanism. We empirically investigated its performance and compared it with other imputation techniques appropriate for missing categorical data, such as multiple imputation (MI) and log-linear imputation (LLMI). We used data from a retrospective cohort of 13,416 veterans with type 2 diabetes, 22% of whom had unknown or missing race data. In this cohort, missingness of race varied by level of comorbidity: those with missing race data had lower rates of comorbidities. Those with and without race data also differed in HbA1c, blood pressure, and lipid control outcomes, as well as in demographic variables. We used statistical information criteria and the standard errors of estimates to assess the performance of the methods under a logistic regression model. In addition, simulation studies were used to investigate the statistical properties of LCMI relative to the other methods under all missing-data mechanisms (missing completely at random, missing at random, and not missing at random). The procedures were compared with respect to bias, asymptotic standard error, type I error, and coverage probability of 95% confidence intervals for the parameter estimates.
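
To make the three missingness mechanisms and the evaluation criteria concrete, the following is a minimal illustrative sketch in Python. It is not the authors' simulation and it does not implement LCMI. It evaluates only a complete-case estimate of a single proportion, and every probability in it (the 50% comorbidity rate, the group rates of 0.4 and 0.2, and the missingness rates chosen to average roughly 22%) is a hypothetical assumption made for illustration.

```python
import math
import random
import statistics

# Marginal probability of the race category under the ASSUMED model below (0.30).
TRUE_P = 0.5 * 0.4 + 0.5 * 0.2

def p_missing(mechanism, comorbid, group):
    """Hypothetical probability that race is missing (about 22% overall)."""
    if mechanism == "MCAR":   # unrelated to any variable
        return 0.22
    if mechanism == "MAR":    # depends only on the fully observed covariate
        return 0.07 if comorbid else 0.37
    if mechanism == "MNAR":   # depends on the unobserved race value itself
        return 0.36 if group else 0.16
    raise ValueError(f"unknown mechanism: {mechanism}")

def one_replicate(mechanism, n, rng):
    """Return (estimate, lower, upper) for the complete-case proportion."""
    observed = []
    for _ in range(n):
        comorbid = rng.random() < 0.5
        group = rng.random() < (0.4 if comorbid else 0.2)
        if rng.random() >= p_missing(mechanism, comorbid, group):
            observed.append(group)  # race is recorded for this person
    m = len(observed)
    p_hat = sum(observed) / m
    se = math.sqrt(p_hat * (1 - p_hat) / m)  # Wald standard error
    return p_hat, p_hat - 1.96 * se, p_hat + 1.96 * se

def evaluate(mechanism, n=2000, reps=200, seed=1):
    """Summarize bias, empirical standard error, and 95% interval coverage."""
    rng = random.Random(seed)
    results = [one_replicate(mechanism, n, rng) for _ in range(reps)]
    estimates = [r[0] for r in results]
    covered = sum(lo <= TRUE_P <= hi for _, lo, hi in results)
    return {
        "bias": statistics.mean(estimates) - TRUE_P,
        "empirical_se": statistics.stdev(estimates),
        "coverage": covered / reps,
    }

summary = {mech: evaluate(mech) for mech in ("MCAR", "MAR", "MNAR")}
for mech, stats in summary.items():
    print(mech, {k: round(v, 3) for k, v in stats.items()})
```

Under these assumptions the complete-case estimate should be roughly unbiased only when data are missing completely at random. In the other two scenarios the missingness is related to the race category, either through comorbidity or directly, so the complete cases are no longer representative.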

Results:
Our simulation results show that, under many missingness scenarios, LCMI performs favorably and can be used to handle missing race data in VHA datasets. The simulation results were also supported by the results from the actual data example.

Implications:
See results section.

Impacts:
The accuracy of health disparity studies, and of other studies that adjust for race, depends on complete race data. However, a substantial share of race data is missing in some VHA data sets. When race data cannot be filled in using other patient files, imputation techniques developed specifically for missing categorical data could reduce the impact of missingness.
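
As background on how multiple-imputation results are typically combined, the sketch below applies the standard Rubin's rules to estimates from several imputed data sets. The abstract does not state how the authors pooled their results, so this is an assumption about common practice; the function name `pool_rubin` and the input values are hypothetical.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool M completed-data estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (M - 1 divisor)
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return q_bar, math.sqrt(total)

# Hypothetical log-odds ratios and squared standard errors from M = 5 imputed data sets.
estimates = [0.42, 0.38, 0.45, 0.40, 0.35]
variances = [0.010, 0.011, 0.009, 0.010, 0.012]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(round(pooled_estimate, 3), round(pooled_se, 3))
```

The pooled standard error exceeds the average single-imputation standard error because the between-imputation term carries the extra uncertainty due to the missing values.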

