Talk to the Veterans Crisis Line now
U.S. flag
An official website of the United States government

VA Health Systems Research

Go to the VA ORD website
Go to the QUERI website

HSR&D Citation Abstract

Search | Search by Center | Search by Source | Keywords in Title

A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes.

Morrow D, Zamora-Resendiz R, Beckham JC, Kimbrel NA, Oslin DW, Tamang S, Million Veteran Program Suicide Exemplar Work Group , Crivelli S. A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes. Journal of psychiatric research. 2022 Jul 1; 151:328-338.

Dimensions for VA is a web-based tool available to VA staff that enables detailed searches of published research and research projects.

If you have VA-Intranet access, click here for more information

VA staff not currently on the VA network can access Dimensions by registering for an account using their VA email address.
   Search Dimensions for VA for this citation
* Don't have VA-internal network access or a VA email address? Try searching the free-to-the-public version of Dimensions


The onset and persistence of life events (LE) such as housing instability, job instability, and reduced social connection have been shown to increase risk of suicide. Predictive models for suicide risk have low sensitivity to many of these factors due to under-reporting in structured electronic health records (EHR) data. In this study, we show how natural language processing (NLP) can help identify LE in clinical notes at higher rates than reported medical codes. We compare domain-specific lexicons formulated from Unified Medical Language System (UMLS) selection, content analysis by subject matter experts (SME) and the Gravity Project, to data-driven expansion through contextual word embedding using Word2Vec. Our analysis covers EHR from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) and measures the prevalence of LE across time for patients with known underlying cause of death in the National Death Index (NDI). We found that NLP methods had higher sensitivity of detecting LE relative to structured EHR (S-EHR) variables. We observed that, on average, suicide cases had higher rates of LE over time when compared to patients who died of non-suicide related causes with no previous history of diagnosed mental illness. When used to discriminate these outcomes, the inclusion of NLP derived variables increased the concentration of LE along the top 0.1%, 0.5% and 1% of predicted risk. LE were less informative when discriminating suicide death from non-suicide related death for patients with diagnosed mental illness.

Questions about the HSR website? Email the Web Team

Any health information on this website is strictly for informational purposes and is not intended as medical advice. It should not be used to diagnose or treat any condition.