3014 — Extracting Free Text from Electronic Health Record: Natural Language Processing to Identify Lines and Devices in Portable Chest X-Ray Reports
Rubin D (Stanford University and VA Palo Alto Health Care System), Wang D (GRECC and CHCE, VA Palo Alto Health Care System), Chambers DA (GRECC and CHCE, VA Palo Alto Health Care System), Chambers J (GRECC and CHCE, VA Palo Alto Health Care System), South B (VA Salt Lake City Health Care System), Goldstein MK (GRECC and CHCE, VA Palo Alto Health Care System)
Patients in the intensive care unit (ICU) frequently develop infections that are likely related to the presence of medical devices and the length of time they remain inserted (dwell time), so device presence and dwell time are important data for infection surveillance. These patients often receive portable chest X-ray (CXR) imaging throughout their ICU stay. The resulting reports contain rich information about medical devices that could be used to determine dwell time; however, this information is recorded as free text rather than as structured data elements. Our aim was to develop a natural language processing (NLP) system that extracts structured data from CXR reports to enable epidemiological research correlating line/device types or dwell time with clinical parameters.
We developed an NLP system, Chest X-Ray Device Extractor (CXDE), that analyzes reports in two sequential processing steps using the GATE framework. The first step segments the text into individual sentences and identifies spans of text related to medical lines/devices. The terms identified include line/device names and synonyms, as well as words and phrases that indicate the insertion, removal, presence, or absence status of a device. The second step analyzes the results of the first step, using prioritized parameters to infer the status of each line/device from the status annotations located in its proximity.
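The two-step design described above can be illustrated with a minimal Python sketch. This is not the CXDE implementation, which runs in GATE; the term lists, the priority order, the nearest-cue rule, and the default of "present" for a device with no nearby status cue are all assumptions made for illustration, and every function name here is hypothetical.

```python
import re

# Hypothetical lexicons; the abstract does not list CXDE's actual terms.
DEVICE_TERMS = ["endotracheal tube", "nasogastric tube",
                "central venous catheter", "chest tube", "picc line"]
STATUS_TERMS = {
    "inserted": ["placed", "inserted", "new"],
    "removed": ["removed", "extubated"],
    "absent": ["no evidence of"],
    "present": ["in place", "unchanged", "remains"],
}
# Assumed priority order, used only to break ties between equally close cues.
STATUS_PRIORITY = ["removed", "inserted", "absent", "present"]


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_spans(sentence, terms, label):
    """Return (start, end, label, term) for each whole-word term match."""
    lowered = sentence.lower()
    spans = []
    for term in terms:
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            spans.append((m.start(), m.end(), label, term))
    return spans


def annotate(sentence):
    """Step 1: mark device spans and status-cue spans in one sentence."""
    devices = find_spans(sentence, DEVICE_TERMS, "device")
    statuses = []
    for status, cues in STATUS_TERMS.items():
        statuses.extend(find_spans(sentence, cues, status))
    return devices, statuses


def infer_status(devices, statuses):
    """Step 2: give each device the nearest status cue, ties broken by priority."""
    results = []
    for d_start, d_end, _, d_term in devices:
        if not statuses:
            # Assumption: a bare device mention is treated as "present".
            results.append((d_term, "present"))
            continue

        def rank(cue):
            s_start, s_end, s_label, _ = cue
            distance = min(abs(s_start - d_end), abs(d_start - s_end))
            return (distance, STATUS_PRIORITY.index(s_label))

        results.append((d_term, min(statuses, key=rank)[2]))
    return results


def extract(report):
    """Run both steps over every sentence of a report."""
    found = []
    for sentence in split_sentences(report):
        devices, statuses = annotate(sentence)
        found.extend(infer_status(devices, statuses))
    return found


report = ("The endotracheal tube has been removed. "
          "A nasogastric tube remains in place. "
          "New chest tube inserted on the right.")
print(extract(report))
```

Restricting the search for status cues to the sentence that contains the device is one simple way to approximate "proximity"; a production system would also need to handle negation, abbreviations, and cues that span sentence boundaries.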
We prepared an annotation guideline/schema and used it to create a reference standard set of CXR reports for testing the performance of CXDE. The reference standard consisted of 90 randomly selected CXR reports annotated by three team members, who rotated the roles of annotator and adjudicator with each batch. CXDE was evaluated against the reference standard, and recall and precision were calculated as recall = TP/(TP+FN) and precision = TP/(TP+FP), where TP, FN, and FP denote true positives, false negatives, and false positives, respectively.
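As a worked illustration of these two formulas, the sketch below scores system output against a reference standard. It assumes, purely for illustration, that each annotation can be represented as a (report ID, device, status) tuple and that a prediction counts as a true positive only on an exact match; the abstract does not specify the matching criteria used for CXDE, and the example annotations are invented, not study data.

```python
def precision_recall(reference, predicted):
    """Compute precision and recall from two sets of annotations."""
    tp = len(reference & predicted)   # found by the system and in the reference
    fp = len(predicted - reference)   # found by the system, not in the reference
    fn = len(reference - predicted)   # in the reference, missed by the system
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented example annotations: (report_id, device, status).
reference = {
    (1, "endotracheal tube", "present"),
    (1, "nasogastric tube", "present"),
    (2, "picc line", "inserted"),
    (3, "chest tube", "removed"),
}
predicted = {
    (1, "endotracheal tube", "present"),
    (1, "nasogastric tube", "present"),
    (2, "picc line", "inserted"),
    (3, "chest tube", "present"),     # wrong status: one FP and one FN
}

precision, recall = precision_recall(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under exact matching, a device found with the wrong status counts as both a false positive and a false negative, which is why mention detection and status assignment are usually scored separately, as in the results reported here.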
The reference set contained 148 line/device terms. Of the 148, 23 were annotated with status inserted, 16 removed, 107 present, and none absent. CXDE identified line/device mentions with recall and precision of 97% and 99%, respectively, and identified the presence status type with recall and precision of 95% and 96%, respectively. For the inserted and removed status types, recall and precision were 91-94% and 94-95%, respectively.
We created an NLP system that detects device mentions in CXR reports and infers their insertion status, with good performance in this preliminary testing.
CXDE has the potential to speed record review for epidemiologic infection surveillance by automating the detection of lines/devices.