Despite increasing attention on VA heart failure (HF) readmission rates and quality improvement (QI) efforts directed at their reduction, rates have remained unchanged over at least the past 7 years. While clinical trials have shown that focused multi-component interventions initiated before discharge and continued into the post-discharge period are most likely to decrease readmission risk, applying such interventions to all HF patients may not be financially sustainable. Thus, the ability to identify HF patients at highest readmission risk is key to selective targeting of appropriate interventions. Only one of several existing models predicting individual HF patients' readmission risk (Amarasingham et al., 2010) has been developed into a decision support tool to target HF inpatients at hospital admission for intensive case management and successfully used to decrease readmissions. So far this model has been applied only outside the VA, but it holds promise for VA use. This model demonstrated the highest known c-statistic (a measure of the ability to discriminate between low- and high-risk patients) among such models at 0.72, versus approximately 0.60 for all others. This higher predictive ability was purportedly achieved by including "social instability" variables such as number of address changes in the prior year. However, this prediction level is still only modest. Machine learning (ML), which develops algorithms for identifying complex relationships among variables, may offer a predictive advantage over traditional logistic models. While no ML methods have been applied to readmission prediction, various ML methods (e.g., Support Vector Machines [SVMs] and Decision Trees) have shown promise when applied to other healthcare outcomes.
Specific aims were to:
1. Adapt to the VA setting an existing non-VA model that uses readily available automated data to predict HF inpatients' risk of readmission.
2. Evaluate non-linear ML modeling techniques, specifically Decision Trees, naïve Bayes classifiers, and SVMs, in predicting readmissions. Compare the results of each method with one another and with the model(s) from Aim 1.
This was a retrospective observational pilot study using FY08-FY13 national data. VA data sources included CDW laboratory, vital signs, appointment files, Medical SAS datasets (demographics, inpatient and outpatient utilization and diagnostic and procedure codes, means test, insurance status, homeless status), Decision Support System files (medications and priority status), and Vital Status file (death dates). We created a zip code-based variable, "residence census tract in lowest socioeconomic quintile" using the 2013 US Census Bureau's American Community Survey.
Our study sample consisted of all index discharges with a principal diagnosis of HF during FY09 through FY13 (we used FY08 for baseline utilization and diagnostic information). We defined an index discharge as either the first hospitalization during the study period or one occurring more than 30 days after the previous index discharge. We excluded discharges where the patient died in-hospital, leaving a final sample of 96,555 HF index discharges among 64,782 patients. Our outcome measure was 30-day all-cause VA readmission, defined per CMS methods.
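The index-discharge rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the study's actual code: the function name `flag_index_discharges` and the dictionary keys are hypothetical, the 30-day gap is assumed to run from the previous index discharge date to the next admission date, and the CMS-specific refinements (for example, handling of planned readmissions or transfers) are not modeled.

```python
from datetime import date


def flag_index_discharges(stays):
    """Label one patient's HF stays as index discharges or readmissions.

    stays: list of (admit_date, discharge_date) tuples sorted by admit_date.
    Assumption: the gap is counted from the previous index discharge date
    to the admission date of the later stay.
    """
    results = []
    last_index = None  # position in `results` of the most recent index discharge
    for admit, discharge in stays:
        if last_index is None:
            is_index = True  # first hospitalization in the study period
        else:
            gap_days = (admit - results[last_index]["discharge"]).days
            is_index = gap_days > 30
            if not is_index:
                # Admission within 30 days of an index discharge: readmission
                results[last_index]["readmitted_30d"] = True
        results.append({
            "admit": admit,
            "discharge": discharge,
            "is_index": is_index,
            "readmitted_30d": False,  # only meaningful for index discharges
        })
        if is_index:
            last_index = len(results) - 1
    return results


# Made-up example: three stays for one hypothetical patient
example = flag_index_discharges([
    (date(2010, 1, 1), date(2010, 1, 5)),
    (date(2010, 1, 20), date(2010, 1, 25)),
    (date(2010, 3, 15), date(2010, 3, 20)),
])
```

In this made-up example the second stay begins 15 days after the first discharge, so it counts as a readmission rather than a new index discharge; the third stay begins well over 30 days after the last index discharge and starts a new index episode.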
For Aim 1, we started with the variables used in Amarasingham et al.'s model (where available). We added VA-specific factors (e.g., VA priority status and dual insurance status). Additionally, we included two variables used as screens for transition problems in other studies: "prescription of 5 routine medications" and presence of high-risk medications (insulin, oral hypoglycemic agents, anticoagulants, aspirin and clopidogrel dual therapy, digoxin). For laboratory and vital signs, we selected the most abnormal value occurring within 24 hours of the admission date. All variables included in the final model from Aim 1 were used in the Aim 2 ML models.
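One way to pick the "most abnormal" laboratory or vital sign value near admission is sketched below. The abstract does not say how abnormality was scored, so this sketch assumes it is the distance outside a normal range; the function name `most_abnormal`, the range bounds, and the use of an admission timestamp (rather than a calendar date) are all hypothetical.

```python
from datetime import datetime, timedelta


def most_abnormal(readings, admit_time, normal_low, normal_high, window_hours=24):
    """Return the reading farthest outside [normal_low, normal_high].

    readings: list of (timestamp, value) tuples.
    Only readings within `window_hours` of admit_time (before or after)
    are considered; returns None if there are none.
    """
    def deviation(value):
        # Assumed abnormality score: distance outside the normal range
        if value < normal_low:
            return normal_low - value
        if value > normal_high:
            return value - normal_high
        return 0.0

    window = timedelta(hours=window_hours)
    in_window = [v for t, v in readings if abs(t - admit_time) <= window]
    if not in_window:
        return None
    return max(in_window, key=deviation)


# Made-up serum sodium readings (mmol/L) with an assumed normal range of 135-145
admit = datetime(2010, 1, 1, 8, 0)
sodium = [
    (admit + timedelta(hours=2), 138.0),
    (admit + timedelta(hours=10), 128.0),
    (admit + timedelta(hours=30), 120.0),  # outside the 24-hour window
]
worst_sodium = most_abnormal(sodium, admit, 135.0, 145.0)
```

The reading of 120.0 is more extreme but falls outside the window, so the sketch selects 128.0.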
We built models predicting readmission risk, first using logistic regression and then using hierarchical generalized linear mixed effects models (GLMMs) with facilities as a random effect. We assessed model performance using c-statistics or areas under the curve (AUCs), as applicable, and compared models with varying inclusion and specifications of variables (e.g., continuous vs. categorical).
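For readers unfamiliar with the c-statistic, the sketch below computes it directly from its definition: the share of (readmitted, not readmitted) pairs in which the readmitted discharge received the higher predicted risk, with ties counted as one half. For a binary outcome this equals the area under the ROC curve. The function name and the toy data are illustrative only and are not taken from the study.

```python
def c_statistic(y_true, y_score):
    """Concordance (c) statistic for a binary outcome.

    y_true:  list of 0/1 outcomes (1 = readmitted within 30 days).
    y_score: list of predicted risks, same length as y_true.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("c-statistic needs at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Toy data: 3 of the 4 event/non-event pairs are ranked correctly
toy_c = c_statistic([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination. The pairwise loop is quadratic in the number of discharges, so a rank-based formulation would be preferable for a sample of the size used here.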
For Aim 2, we first performed 10-fold cross-validation on the final baseline logistic regression model from Aim 1 and averaged c-statistics across iterations. We then examined Decision Trees using Classification and Regression Tree (CART) models. We also conducted preliminary analyses using a naïve Bayes classifier and SVMs.
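The cross-validation step can be sketched as below: split the data into k folds, fit on k-1 folds, score the held-out fold, and average the c-statistics. This is a minimal illustration, not the study's procedure. The helper names are hypothetical, and the stand-in "model" simply predicts the observed readmission rate for each count of prior-year hospitalizations; the study's own baseline model was a logistic regression.

```python
import random


def c_statistic(y_true, y_score):
    """Pairwise concordance; ties count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("fold must contain both outcome classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=None):
    """Split range(n) into k folds; shuffle first if a seed is given."""
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_c(X, y, fit, predict, k=10, seed=None):
    """Average held-out c-statistic across k folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in held_out]
        scores.append(c_statistic([y[i] for i in held_out], preds))
    return sum(scores) / len(scores)


# Hypothetical stand-in model: readmission rate by prior-hospitalization count
def fit_rate_by_count(X_train, y_train):
    totals, events = {}, {}
    for x, y in zip(X_train, y_train):
        totals[x] = totals.get(x, 0) + 1
        events[x] = events.get(x, 0) + y
    overall = sum(y_train) / len(y_train)
    return {"rates": {x: events[x] / totals[x] for x in totals}, "overall": overall}


def predict_rate(model, x):
    # Fall back to the overall rate for counts unseen in training
    return model["rates"].get(x, model["overall"])
```

A call such as `cross_validated_c(prior_hosp_counts, readmitted, fit_rate_by_count, predict_rate, k=10, seed=0)` would return the average held-out c-statistic, where `prior_hosp_counts` and `readmitted` are placeholder names for the predictor and outcome lists. A fold containing only one outcome class raises an error in this sketch; stratified folds would avoid that.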
Of VA HF discharges, 21% had a 30-day readmission; 27% had an HF hospitalization in the preceding year.
Aim 1: Our final base multivariate logistic regression model had a c-statistic of 0.641. Re-running models using different variable specifications had no meaningful impact on results: c-statistics varied from a low of 0.632, when age, laboratory, and vital signs variables were combined into a mortality score as was done by Amarasingham et al., to a high of 0.656, when this score minus age was added to a model containing all the individual variables that went into the score. GLMMs yielded results similar to those of our logistic regression model; the AUC was 0.643 for our baseline GLMM. Various variable specifications similarly had minimal impact on GLMM results. Of note, using only the variable "number of hospitalizations in the prior year" in the model yielded a c-statistic of 0.617.
Aim 2: Ten-fold cross-validation supported the validity of our baseline logistic regression model (average c-statistic = 0.64). We ran several CART models. None yielded clinically coherent results, but they did highlight a fundamental characteristic of the dataset that was consistent with the logistic model findings: the main driver of group definition was "number of hospitalizations in the prior year." Dividing the sample based solely on this variable produced results comparable to the more detailed CART model. Preliminary analyses with naïve Bayes classifiers and SVMs did not provide predictive benefit over logistic regression models.
We could not replicate the relatively high c-statistic found by Amarasingham et al., despite incorporating similar socioeconomic variables (the presumed reason Amarasingham et al.'s results are better than those of other models) as well as VA-specific variables. Instead, we found only modest c-statistics or AUCs. Results were similar across numerous sensitivity analyses and across simple logistic regression and hierarchical models. More sophisticated ML approaches offered little advantage over logistic models. Our results are similar in magnitude to those of most existing studies on readmission. Nevertheless, one useful and practical finding from our study is that number of hospitalizations in the prior year predicted readmission risk almost as well as the full model. Future research will examine whether targeting HF patients at high readmission risk because of multiple admissions in the prior year for intensive inpatient and post-discharge case management reduces readmissions.
Grant Number: I21HX001360-01