3089 — Classifying Features from VA Clinical Documents to Identify Homeless or At-Risk Veterans
Shen S (SLC IDEAS Center, VA Salt Lake City Health Care System), South BR (SLC IDEAS Center, VA Salt Lake City Health Care System), Palmer M (SLC IDEAS Center, VA Salt Lake City Health Care System), DuVall SL (SLC IDEAS Center, VA Salt Lake City Health Care System), Nelson R (SLC IDEAS Center, VA Salt Lake City Health Care System), Samore MH (SLC IDEAS Center, VA Salt Lake City Health Care System), Gundlapalli AV (SLC IDEAS Center, VA Salt Lake City Health Care System)
Developing new methods to identify veterans who are homeless or who have documented risk factors for homelessness is a priority area for health services research. We demonstrate a semi-automated approach to induce lexical domain knowledge and identify risk factors for homelessness mentioned in VA clinical documents. Once an initial lexicon is created, it can be used to support training and evaluation of automated methods, such as Natural Language Processing (NLP) systems, for detection and prediction of homelessness.
Using expert input and literature resources, we used a "think out loud" approach to develop an initial lexicon of features related to homelessness. Domain experts identified five lexical categories amenable to common mentions: 1) social stressors (e.g., recent divorce, unemployment); 2) behavioral risk factors (e.g., drug/alcohol abuse); 3) supporting evidence (e.g., lives in shelter, no housing); 4) direct mention of homelessness (e.g., homeless patient); and 5) other risk factors (e.g., exposure to war-related trauma). Using our initial lexicon, we pre-annotated 600 VA clinical documents extracted from the Veterans Informatics Computing Infrastructure (VINCI) for the time period 1/1/2000-12/31/2009. We used a prototype system that supports interactive annotation and semi-automated curation of user-defined information classes. Domain experts reviewed the pre-annotated documents and determined whether information was correctly identified, made modifications, added missed features, or rejected annotations found to be incorrect or irrelevant.
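The lexicon-driven pre-annotation step can be illustrated with a minimal sketch. It is not the prototype system described here: the `LEXICON` entries below are only the example phrases quoted in the category list, and the category keys, the `Annotation` class, and the `pre_annotate` function are hypothetical names introduced for illustration. The sketch assumes simple case-insensitive phrase matching, which is much cruder than a full NLP pipeline.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: one key per category named in the abstract,
# populated only with the example phrases quoted there (not the real 83 entries).
LEXICON = {
    "social_stressor": ["recent divorce", "unemployment"],
    "behavioral_risk_factor": ["drug abuse", "alcohol abuse"],
    "supporting_evidence": ["lives in shelter", "no housing"],
    "direct_mention": ["homeless patient"],
    "other_risk_factor": ["war-related trauma"],
}


@dataclass(frozen=True)
class Annotation:
    start: int      # character offset where the match begins
    end: int        # character offset just past the match
    text: str       # matched text as it appears in the document
    category: str   # lexicon category that produced the match


def pre_annotate(document, lexicon=LEXICON):
    """Return lexicon matches in document order for later expert review."""
    annotations = []
    for category, terms in lexicon.items():
        for term in terms:
            # Whole-phrase, case-insensitive match (an assumption of this sketch).
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, document, flags=re.IGNORECASE):
                annotations.append(
                    Annotation(m.start(), m.end(), m.group(0), category)
                )
    return sorted(annotations, key=lambda a: a.start)


note = "Pt reports recent divorce and unemployment; currently lives in shelter."
for ann in pre_annotate(note):
    print(ann.category, repr(ann.text), ann.start, ann.end)
```

Each annotation keeps its character offsets so that a reviewer interface could highlight the span in context, which is where the accept, modify, add, or reject decision would be made.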
Our initial lexicon had 83 entries. After two rounds of review applied to 75 documents, 38 concepts were added. Pre-annotated information helped reviewers focus attention on important contextual cues that can be used to further refine our lexicon. These efforts will be used to develop a reference standard for NLP system training and evaluation.
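One way to picture a review round feeding back into the lexicon is sketched below. This is an assumption about how reviewer decisions might be recorded, not a description of the prototype system: the `apply_review` function and the `(action, category, term)` tuple format are hypothetical. The sketch adds reviewer-supplied terms and only tallies rejections, since a rejected annotation in one context does not necessarily mean the term should leave the lexicon.

```python
from collections import Counter


def apply_review(lexicon, decisions):
    """Fold one round of reviewer decisions into a copy of the lexicon.

    lexicon:   dict mapping category name -> set of lowercase terms
    decisions: iterable of (action, category, term) tuples, where action is
               "accept", "add", or "reject" (hypothetical format)
    Returns (updated_lexicon, rejection_counts).
    """
    # Copy so the input lexicon is left untouched between rounds.
    updated = {category: set(terms) for category, terms in lexicon.items()}
    rejections = Counter()
    for action, category, term in decisions:
        term = term.lower()
        if action == "add":
            # A feature the pre-annotation missed becomes a new entry.
            updated.setdefault(category, set()).add(term)
        elif action == "reject":
            # Tally only; whether to drop the term is left to the experts.
            rejections[(category, term)] += 1
        # "accept" confirms an existing entry and changes nothing.
    return updated, rejections


lexicon = {"direct_mention": {"homeless patient"}}
decisions = [
    ("accept", "direct_mention", "homeless patient"),
    ("add", "supporting_evidence", "Living in car"),
    ("reject", "direct_mention", "homeless patient"),
]
updated, rejections = apply_review(lexicon, decisions)
print(sum(len(terms) for terms in updated.values()), "entries after review")
```

Repeating this over successive rounds mirrors the iterative growth reported above, where entries are added as reviewers encounter features the initial lexicon missed.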
Our methods can quickly and efficiently generate lexical domain knowledge by combining information from the literature and expert feedback through iterative refinement. These methods can easily be adapted to other VA priority domains, adding a source of information for case identification or care management.
Homelessness is a high priority area for VHA. Identifying homeless or at-risk veterans has the potential to improve access to community care and preventive services for veterans.