3077 — Extracting Time Concepts from Electronic Health Records: A Prerequisite for Surveillance of Medical Events across Time
Reeves RM (Tennessee Valley Healthcare System), Ong FR (Vanderbilt University Medical Center), Gobbel GT (Tennessee Valley Healthcare System), Montella D (Tennessee Valley Healthcare System), Matheny ME (Tennessee Valley Healthcare System), Brown SH (VHA Office of Health Information), Speroff T (Tennessee Valley Healthcare System)
Many research and quality improvement initiatives depend on expensive, time-consuming chart abstraction for establishing when, how long, and in what sequence medically relevant events occur. Our goal was to develop an automated natural language processing (NLP) tool for capturing time information in medical narratives, and to subsequently evaluate the accuracy of automated time classification within clinical documents.
The data were 100 clinical records of hospitalized patients. We developed Med-TTK, a tool for extracting time information from medical records, as an adaptation of the TARSQI Tool Kit (TTK), an open-source NLP application. TTK’s rule-based approach assigns each temporal expression it identifies to one of four classes:
time (10:23 p.m.; in 4 hours);
date (December 21, 2004; two months ago);
duration (for 12 minutes; over a period of 8 years); and
set (every four hours; B.I.D.; Monday, Wednesday and Friday).
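The four classes above can be illustrated with a minimal rule-based sketch. The patterns below are hypothetical and far simpler than anything in TTK or Med-TTK; they are written only to cover the example expressions listed above, and the names `RULES`, `WEEKDAY`, and `classify` are assumptions introduced for illustration.

```python
import re

# Hypothetical, simplified patterns; NOT the actual TTK or Med-TTK rules.
WEEKDAY = r"(?:mon|tues|wednes|thurs|fri|satur|sun)day"

# Rules are tried in order, so recurring patterns ("set") are checked
# before the single-occurrence classes.
RULES = [
    ("set", re.compile(
        rf"every\b.*|[bqt]\.i\.d\.|{WEEKDAY}(?:(?:,\s*|\s+and\s+){WEEKDAY})+",
        re.I)),
    ("duration", re.compile(r"(?:for|over a period of)\s+\d+\s+\w+", re.I)),
    ("time", re.compile(
        r"\d{1,2}:\d{2}(?:\s*[ap]\.m\.)?|in\s+\d+\s+(?:minutes?|hours?)",
        re.I)),
    ("date", re.compile(
        r"[a-z]+\s+\d{1,2},\s+\d{4}|\w+\s+(?:days?|weeks?|months?|years?)\s+ago",
        re.I)),
]


def classify(expression):
    """Return the first class whose pattern matches the whole expression, else None."""
    text = expression.strip()
    for label, pattern in RULES:
        if pattern.fullmatch(text):
            return label
    return None


if __name__ == "__main__":
    for example in ["10:23 p.m.", "two months ago", "for 12 minutes", "B.I.D."]:
        print(example, "->", classify(example))
```

A production system would also need to locate expressions inside running text and normalize them, which this sketch does not attempt.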
Temporal terminology and formatting modifications were implemented within Med-TTK to optimize performance on medical documents. Each document was independently reviewed by two annotators, with disagreements resolved by a referee. Performance was measured by comparing the TTK and Med-TTK classifications against the adjudicated gold standard to yield precision (positive predictive value) and recall (sensitivity).
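The evaluation described above can be sketched as a set comparison between system output and the gold standard. This is an illustrative sketch only: it assumes each annotation is a `(document, start, end, class)` tuple and that only exact matches count as correct, neither of which is specified in the abstract. The function name `precision_recall` and the toy annotations are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for two collections of annotations.

    An annotation counts as a true positive only if it appears, identically,
    in both the predicted set and the gold-standard set (an assumed criterion).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy annotations (made up for illustration, not study data).
gold = {
    ("doc1", 0, 10, "time"),
    ("doc1", 20, 37, "date"),
    ("doc1", 40, 54, "duration"),
    ("doc1", 60, 66, "set"),
}
predicted = {
    ("doc1", 0, 10, "time"),
    ("doc1", 20, 37, "date"),
    ("doc1", 40, 54, "date"),  # right span, wrong class: counted as an error
}

precision, recall = precision_recall(predicted, gold)
```

Here two of the three predictions are correct and two of the four gold annotations are found, so precision is 2/3 and recall is 1/2.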
Recognized time expressions in the documents numbered 1581, 3275, and 3213 for TTK, Med-TTK, and the gold standard, respectively. TTK had a precision of 0.26 (95% CI 0.24, 0.29) and a recall of 0.13 (95% CI 0.12, 0.14). Med-TTK had a precision of 0.82 (95% CI 0.81, 0.83) and a recall of 0.84 (95% CI 0.82, 0.85). Med-TTK's performance by class was: time: precision 0.91, recall 0.91; date: precision 0.77, recall 0.91; duration: precision 0.77, recall 0.67; and set: precision 0.87, recall 0.83.
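The abstract does not state how the confidence intervals were computed. As a rough illustration, a normal-approximation (Wald) interval for a proportion can be sketched as follows; the choice of method and the function name `wald_interval` are assumptions, and the inputs are the rounded values reported above rather than the underlying counts.

```python
import math


def wald_interval(proportion, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to a two-sided 95% interval. This is one common
    method, assumed here for illustration; the study may have used another.
    """
    half_width = z * math.sqrt(proportion * (1.0 - proportion) / n)
    return max(0.0, proportion - half_width), min(1.0, proportion + half_width)


# Med-TTK precision: 0.82 over the 3275 expressions it recognized.
low, high = wald_interval(0.82, 3275)
```

With these rounded inputs the interval comes out at roughly 0.81 to 0.83, in line with the reported interval for Med-TTK precision. Small differences for the other figures are expected because the reported proportions are rounded to two decimals.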
We achieved a substantial improvement in moving from TTK, which was developed to capture time in news articles, to the medically customized Med-TTK. The high accuracy of the enhanced Med-TTK in recognizing and classifying time expressions positions it to provide accurate and meaningful time stamps for medical events in the electronic health record.
Natural language processing applications provide an avenue for meaningful longitudinal mapping of patient history elements across electronic health records. This informatics research highlights a tool that could be used to initiate electronic surveillance of medical events across time.