2015 HSR&D/QUERI National Conference Abstract

3067 — Extracting Left Ventricular Ejection Fraction, a Marker of Heart Function, from Semi-structured and Unstructured Data

Kim Y, University of Utah; Garvin JH, VA Salt Lake City Health Care System and University of Utah; Nixon G, VHA Health Informatics; Felicio LS, VA San Diego Health Care System and University of California, San Diego; Rubin MA, VA Salt Lake City Health Care System and University of Utah; Redd A, VA Salt Lake City Health Care System and University of Utah; Meystre SM, University of Utah

The objective of this study was to extract mentions of left ventricular ejection fraction (EF) and their associated values from semi-structured and unstructured clinical notes. We analyzed the quantitative values extracted from the two sources to compare, for each patient, the prevalence of a marker of poor cardiac function. This study was undertaken as part of the HMP (Health Management Platform) project, which aims to automatically extract quality measures and combine them at the patient level, giving clinicians rapid access to critical information about heart failure (HF) to improve clinical decision-making.

We developed a software tool that analyzes semi-structured echocardiogram reports in XML (Extensible Markup Language) format. The tool has two functions: (1) detecting and processing specific XML tags to extract EF values from XML files, and (2) converting XML files to human-readable text files. To extract EF mentions and values from unstructured text, we created a sequential tagger trained on a collection of 790 manually annotated clinical notes. Features for the machine-learning algorithm include words, syntactic information, orthographic information, and combinations of these features.
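The following is a minimal Python sketch of the two XML functions and of word and orthographic features for a sequential tagger. It is not the authors' implementation. The abstract does not give the echocardiogram schema, so the tag and attribute names (`Measurement`, `name`, `value`), the sample report, and all function names are hypothetical. Syntactic features such as part-of-speech tags are omitted because they would require an external parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical report layout; the real schema is not described in the abstract.
SAMPLE_REPORT = """<EchoReport>
  <Patient id="P001"/>
  <Measurement name="LVEF" value="35" unit="%"/>
  <Measurement name="LVIDd" value="5.2" unit="cm"/>
  <Impression>Moderately reduced systolic function.</Impression>
</EchoReport>"""

EF_TAG = "Measurement"  # assumed tag holding numeric measurements
EF_NAME = "LVEF"        # assumed label for ejection fraction


def extract_ef_values(xml_text):
    """Function 1: read EF values from the assumed measurement tags."""
    root = ET.fromstring(xml_text)
    values = []
    for node in root.iter(EF_TAG):
        if node.get("name") == EF_NAME and node.get("value") is not None:
            values.append(float(node.get("value")))
    return values


def xml_to_text(xml_text):
    """Function 2: flatten the XML into one readable line per element."""
    root = ET.fromstring(xml_text)
    lines = []
    for node in root.iter():
        attrs = ", ".join(f"{k}={v}" for k, v in node.attrib.items())
        text = (node.text or "").strip()
        parts = [p for p in (attrs, text) if p]
        if parts:  # skip pure container elements
            lines.append(f"{node.tag}: {' | '.join(parts)}")
    return "\n".join(lines)


def token_features(tokens, i):
    """Word and orthographic features for token i, plus a one-word context window."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "is_number": bool(re.fullmatch(r"\d+(\.\d+)?", tok)),
        "has_percent": "%" in tok,
        "is_upper": tok.isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


if __name__ == "__main__":
    print(extract_ef_values(SAMPLE_REPORT))
    print(xml_to_text(SAMPLE_REPORT))
    print(token_features(["EF", "is", "35", "%"], 2))
```

Feature dictionaries of this shape could be fed to any sequence labeler that accepts per-token features; the abstract does not name the learning algorithm used.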

The XML conversion tool successfully analyzed the XML-formatted echocardiogram reports, and subsequent manual review showed that all EF values were correctly extracted. For unstructured text, we processed 865 clinical notes, and our sequential tagger reached an overall F1-measure of 90.1%. At the patient level, both systems detected EF values for 167 patients, with 93.4% accuracy in finding EF values below 40%.
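The two kinds of figures reported above can be illustrated with a short Python sketch. The F1-measure is the standard harmonic mean of precision and recall. The patient-level comparison rests on assumptions, since the abstract does not say how accuracy was computed: here one source is treated as the reference, a patient is flagged when any extracted EF value falls below 40%, and accuracy is the share of shared patients on which the two sources agree. The function names and the toy data are hypothetical and are not study results.

```python
def f1_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def low_ef_flag(ef_values, threshold=40.0):
    """Assumed rule: a patient is flagged if any EF value is below the threshold."""
    return any(v < threshold for v in ef_values)


def patient_level_accuracy(reference, predicted, threshold=40.0):
    """Share of shared patients whose low-EF flag matches across the two sources."""
    patients = reference.keys() & predicted.keys()
    if not patients:
        raise ValueError("no patients in common")
    agree = sum(
        low_ef_flag(reference[p], threshold) == low_ef_flag(predicted[p], threshold)
        for p in patients
    )
    return agree / len(patients)


if __name__ == "__main__":
    # Toy data for illustration only.
    from_xml = {"a": [35.0], "b": [55.0], "c": [60.0]}
    from_text = {"a": [30.0], "b": [50.0], "c": [38.0]}
    print(f1_measure(tp=8, fp=2, fn=2))
    print(patient_level_accuracy(from_xml, from_text))
```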

This study shows that automatic extraction of EF mentions and values can achieve satisfactory performance by combining a conversion tool that analyzes semi-structured data with a machine learning-based tagger trained on unstructured clinical notes. Although there is ample room for improvement in processing unstructured text, this information extraction approach can be used to extract EF information when it is absent from semi-structured data.

Providing guideline-concordant, timely care for patients with poor heart function reduces morbidity and mortality. Yet in today's busy clinical practice, data buried in large records can be overlooked. The tools presented here improve the efficiency of acquiring key clinical data for quality metrics by extracting left ventricular EF and providing it in structured, computable form.