VA Health Systems Research

HSR Citation Abstract

Improving diagnosis-based quality measures: an application of machine learning to the prediction of substance use disorder among outpatients.

Hoggatt KJ, Harris AHS, Hayes CJ, Washington D, Williams EC. Improving diagnosis-based quality measures: an application of machine learning to the prediction of substance use disorder among outpatients. BMJ Open Quality. 2025 Mar 22; 14(1). DOI: 10.1136/bmjoq-2024-003017.

Abstract:

OBJECTIVE: Substance use disorder (SUD) is clinically under-detected and under-documented. We built and validated machine learning (ML) models to estimate SUD prevalence from electronic health record (EHR) data, and we assessed variation in facility-level SUD identification using clinically documented diagnoses vs model-based estimated prevalence.

METHODS: Predictors included demographics, SUD-related diagnoses and healthcare utilisation. The criterion outcome for model development was prevalent SUD assessed via a patient survey across 30 geographically representative Veterans Health Administration (VA) sites (n = 5989 patients). We split the data into training and testing datasets and built a series of ML models using cross-validation to minimise over-fitting. We selected the final model based on its performance in predicting SUD in the testing dataset. Using the final model, we estimated SUD prevalence at all 30 sites. We then compared facilities on SUD identification using two alternative measures: the facility-level SUD diagnosis rate and model-based estimated SUD prevalence.

RESULTS: The best-performing model, a LASSO model with 61 predictors, doubled the sensitivity for classifying SUD relative to a model with only documented SUD diagnoses (0.682 vs 0.331). Across the 30 sites, SUD diagnosis rates ranged from 6.4% to 13.9%, and predicted SUD prevalence ranged from 9.7% to 16.0%. The difference in facility-level SUD identification (observed diagnosis rate minus predicted prevalence) ranged from -7.2 to +1.3 percentage points. Comparing facilities' rank ordering on documented SUD diagnosis rates vs estimated SUD prevalence, 16 of the 30 sites had a ranking that changed by at least a quintile (ie, 6 places or more).

CONCLUSIONS: This analysis shows that the use of model-based performance measures may help address measurement blind spots that arise from differences in diagnostic accuracy across sites. Although model-based estimates approximate SUD prevalence better than diagnoses alone for facility quality assessment, further improvements, as well as detection of SUD in individual patients, require enhanced direct screening for non-alcohol drug use.
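
The two facility-level comparisons in the abstract (observed diagnosis rate minus predicted prevalence, and the number of sites whose rank moves by at least a quintile) reduce to simple arithmetic once each site's model-based prevalence is available. The Python sketch below illustrates that arithmetic under stated assumptions; it is not the authors' code and does not fit the LASSO model, so predicted prevalence is taken as an input. The `Site` type, the function names and the toy rates are hypothetical. Ranking highest-first, breaking ties by input order, and defining a quintile as n // 5 places (which gives 6 places for 30 sites, as in the abstract) are also assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """One facility (hypothetical type). Rates are percentages, e.g. 9.5 means 9.5%."""
    name: str
    diagnosis_rate: float        # documented SUD diagnosis rate
    predicted_prevalence: float  # model-based estimated SUD prevalence (an input here)


def identification_gap(site: Site) -> float:
    """Observed diagnosis rate minus predicted prevalence, in percentage points.

    Negative values mean the site documents fewer SUD diagnoses than the
    model-based estimate suggests.
    """
    return site.diagnosis_rate - site.predicted_prevalence


def ranks(values: list[float]) -> list[int]:
    """1-based ranks with the highest value ranked 1.

    Assumption: tied values keep their input order (Python's sort is stable).
    """
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_shifts(sites: list[Site]) -> list[int]:
    """Absolute change in each site's rank between the two measures."""
    by_diagnosis = ranks([s.diagnosis_rate for s in sites])
    by_prevalence = ranks([s.predicted_prevalence for s in sites])
    return [abs(a - b) for a, b in zip(by_diagnosis, by_prevalence)]


def count_large_rank_changes(sites: list[Site], min_shift: Optional[int] = None) -> int:
    """Count sites whose rank moved by at least `min_shift` places.

    By default `min_shift` is one quintile of the site count (n // 5, at least 1),
    which is 6 places for 30 sites.
    """
    if min_shift is None:
        min_shift = max(1, len(sites) // 5)
    return sum(1 for shift in rank_shifts(sites) if shift >= min_shift)


# Hypothetical toy data for illustration only; these are NOT study results.
EXAMPLE_SITES = [
    Site("A", 10.0, 12.0),
    Site("B", 8.0, 9.0),
    Site("C", 12.0, 11.0),
    Site("D", 7.0, 15.0),
    Site("E", 11.0, 10.0),
]

if __name__ == "__main__":
    for site in EXAMPLE_SITES:
        print(site.name, identification_gap(site))
    print(rank_shifts(EXAMPLE_SITES))
    print(count_large_rank_changes(EXAMPLE_SITES))
    print(count_large_rank_changes(EXAMPLE_SITES, min_shift=2))
```

For the five toy sites, the gaps are -2.0, -1.0, 1.0, -8.0 and 1.0 percentage points and the rank shifts are [1, 1, 2, 4, 2]. With only five sites, one quintile is a single place, so all 5 sites count at the default threshold and 3 count at a 2-place threshold. Ignoring ties, absolute rank shifts are the same whether rank 1 is the highest or the lowest rate, as long as both measures are ranked in the same direction.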
