2012 HSR&D/QUERI National Conference Abstract
3011 — Detecting Mentions and Values of Left Ventricular Ejection Fraction in Echocardiogram Reports
Kim Y, University of Utah; Garvin JH, VA Salt Lake City Health Care System and University of Utah; Meystre SM, University of Utah;
The aim of this study is to automatically extract mentions of left ventricular ejection fraction (EF) and its associated qualitative and quantitative values from echocardiogram reports. Knowledge of the left ventricular EF is necessary to monitor the progression and treatment of heart failure. This study is undertaken as part of the ADAHF (Automated Data Acquisition for Heart Failure) project, which aims to automate the extraction and analysis of congestive heart failure treatment performance indicators.
We approached the detection of EF mentions and values as a sequence-tagging problem. A training corpus of 280 manually annotated echocardiogram reports was used to train a classifier. After segmenting the text into sentences and tokens, and tagging each token with its part of speech (POS), we assigned BIO tags (B: beginning of a term, I: inside the term, O: outside) to the tokens, marking each EF mention or value, and trained a sequential classifier using Conditional Random Fields. As classification features, we used words, POS tags, affixes, orthographic features, and combinations of these.
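The preprocessing described above can be illustrated with a minimal sketch. The example sentence, the label names (EF, QUANT), the hand-assigned POS tags, the three-character affix length, and the neighboring-word features are all assumptions made for illustration; the abstract does not specify the annotation labels or the exact feature templates used in the study. The sketch only prepares BIO tags and per-token feature dictionaries of the kind a Conditional Random Fields toolkit could consume; it does not train a model.

```python
from typing import Dict, List, Tuple

# (start token index, end token index exclusive, label).
# Label names are hypothetical; the abstract does not list them.
Span = Tuple[int, int, str]


def bio_tags(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert annotated token spans into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the term
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the term
    return tags


def word_shape(token: str) -> str:
    """Orthographic feature: uppercase -> 'A', lowercase -> 'a', digit -> '0'."""
    shape = []
    for ch in token:
        if ch.isdigit():
            shape.append("0")
        elif ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        else:
            shape.append(ch)                # keep punctuation and symbols as-is
    return "".join(shape)


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Word, POS, affix, orthographic, and combined features for token i."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "pos": pos_tags[i],
        "prefix3": tok[:3].lower(),         # assumed affix length of 3
        "suffix3": tok[-3:].lower(),
        "shape": word_shape(tok),
        "word+pos": f"{tok.lower()}|{pos_tags[i]}",   # a feature combination
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Illustrative sentence, not taken from the study corpus.
tokens = ["LVEF", "is", "estimated", "at", "55", "%", "."]
pos_tags = ["NNP", "VBZ", "VBN", "IN", "CD", "NN", "."]   # hand-assigned
spans = [(0, 1, "EF"), (4, 6, "QUANT")]

tags = bio_tags(len(tokens), spans)
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]

for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

Each sentence then becomes a pair of equal-length sequences (feature dictionaries and BIO tags), which is the usual input shape for a linear-chain sequence classifier.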
Our system reached an overall 96.87% precision and 93.21% recall (95.01% F1-measure) with a testing corpus of 491 echocardiogram reports. Mentions of EF were extracted with 97.00% precision and 93.20% recall (95.06% F1-measure), qualitative values with 97.10% precision and 92.49% recall (94.74% F1-measure), and quantitative values with 96.43% precision and 93.98% recall (95.19% F1-measure). Most EF mention false positives occurred when an EF entity appeared in a sentence without an associated value. Our classifier missed some associated values cited too far away from the EF mention.
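For readers who want to relate the reported figures, the F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the rounded percentages quoted above; the function and variable names are illustrative. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) as quoted in the abstract.
reported = {
    "overall": (96.87, 93.21, 95.01),
    "EF mentions": (97.00, 93.20, 95.06),
    "qualitative values": (97.10, 92.49, 94.74),
    "quantitative values": (96.43, 93.98, 95.19),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_measure(p, r):.2f}, reported = {f1:.2f}")
```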
This study shows that an information extraction method based on Natural Language Processing can successfully detect medical concepts and their values. In our machine learning-based detection, we observed that lexical and syntactic features contributed most to high accuracy.
Heart failure is a prevalent condition in the Veteran population, and quality improvement efforts in this domain rely on treatment performance metrics such as the left ventricular EF. The automated detection of EF mentions and values in clinical notes, as presented here, could allow for faster and more comprehensive acquisition of these metrics.