Lead/Presenter: Bocheng Jing, San Francisco VAMC
All Authors: Jing, B (San Francisco VAMC, Northern California Institute for Research and Education, University of California, San Francisco); Jeon, SY (San Francisco VAMC, University of California, San Francisco); Boscardin, WJ (San Francisco VAMC, University of California, San Francisco); Lee, AK (San Francisco VAMC, University of California, San Francisco); Lee, SJ (San Francisco VAMC, University of California, San Francisco)
There is tremendous interest in using electronic health record (EHR) data to identify high-risk patients. However, it is unclear which prediction model development methods yield the most accurate models. Our goal was to compare the discrimination and diagnostic test characteristics of regression models developed using backward selection with those of random forests (RF).
We used VA EHR data to identify patients older than 50 years who had clinic visits in 2005. We used a random 5% sample for training and a separate sample for validation (n = 50,544 each). For survival models, the outcome was time to death, with follow-up through 12/31/2017. For logistic and RF models, the outcome was death within 10 years of the index visit. Potential predictors included ICD9 diagnoses (293 Clinical Classifications Software categories), VA drug classes (399), labs (71 different lab tests, categorized as abnormal low, normal, abnormal high, nonsensical, and missing), healthcare utilization (149 different clinics and hospitalizations, categorized as 0, 1 and 1+ visits), vital signs (9), and demographics (age, sex). We applied backward selection in the training sample (stay-in p-value < 0.0001) for survival and logistic models. We configured the RF to choose, at each split, the variable and cut-point that minimized classification error. We calculated AUC, sensitivity, specificity, and positive/negative predictive value (PPV/NPV) for each model.
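For readers less familiar with these measures, the following is a minimal Python sketch of how AUC, sensitivity, specificity, PPV and NPV can be computed from predicted risks and observed 10-year deaths. It is illustrative only and is not the code used in this study. The function names (auc, diagnostic_metrics) and the 0.5 risk threshold are assumptions; the abstract does not report which cut-point was used to classify patients.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (death, survivor) pairs in which the patient who
    died has the higher predicted risk; ties count as one half.
    This pairwise form is O(n^2) and is meant for small illustrations."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")


def diagnostic_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Classify as predicted death when score >= threshold (an assumed rule),
    then derive the four diagnostic test characteristics."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_death = s >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death and y == 0:
            fp += 1
        elif not predicted_death and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),  # deaths correctly flagged
        "specificity": _ratio(tn, tn + fp),  # survivors correctly cleared
        "ppv": _ratio(tp, tp + fp),          # flagged patients who died
        "npv": _ratio(tn, tn + fn),          # cleared patients who survived
    }
```

Unlike AUC, the four diagnostic characteristics depend on the chosen threshold, so models with nearly identical AUCs can still differ in sensitivity and specificity.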
Backward selection yielded approximately 120 predictors for the survival models and 98 predictors for the logistic model. RF used 658 variables per tree on average. The training and testing AUCs were almost identical for each model. All models had similar AUCs (0.828 vs. 0.828 vs. 0.831 vs. 0.823). The Gompertz model had the highest sensitivity (81.5%) while RF had the highest specificity (86.6%). The Weibull model had the highest PPV (81.3%) while the logistic model had the highest NPV (78.3%).
Using VA EHR data, RF, survival (Weibull and Gompertz) and logistic regression led to mortality prediction models with similar discrimination and diagnostic test characteristics.
Since different model development methods lead to models with similar accuracy, developers of future prediction models using VA EHR data should choose model development techniques based on factors beyond model accuracy, such as computational efficiency and ability to accommodate missing variables.