4101 — A Flexible, Rapid, and Scalable System for Identifying AKI Risk Factors within Medical Free Text: Development and Evaluation
Lead/Presenter: Glenn Gobbel
All Authors: Gobbel GT (VA Tennessee Valley Healthcare System); Sanjib S (Tennessee Valley Healthcare System); Reeves RM (Tennessee Valley Healthcare System); FitzHenry F (Tennessee Valley Healthcare System); Gentry NH (Tennessee Valley Healthcare System); Hanchrow EE (Tennessee Valley Healthcare System); Matheny ME (Tennessee Valley Healthcare System)
Acute kidney injury (AKI) is a significant risk following cardiac catheterization. Although multiple factors are known to predict AKI risk, no models currently exist in the VA to support risk stratification and surveillance. These predictive factors are commonly buried in free text. We sought to develop a natural language processing (NLP) system capable of extracting disparate risk factors from unstructured text to create an AKI risk model and support patient-level risk analysis in near real time.
We augmented our existing probabilistic RapTAT NLP tool for concept extraction by developing a rule-based module to improve adaptability and performance, and we trained a conditional random field-based model to assign an assertion value to each extracted concept. We embedded the NLP system in the VA-developed LEO framework for multi-threaded processing. The document corpus contained more than 1.2 million documents from 158,432 patients across 74 VA medical centers and included 7 document types ranging from progress notes to discharge summaries. For system training and testing, two nurse reviewers and a third adjudicator annotated 1,344 documents for 14 different AKI risk-related concepts across 4 broad categories: medications, fluid status, renal function, and radiographic media exposure. System performance measures were precision, recall, and F1.
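The performance measures named above can be illustrated with a minimal Python sketch of micro- and macro-averaged precision, recall, and F1. This is not the RapTAT evaluation code: the concept names and counts are hypothetical, and the sketch assumes the macro-average is the unweighted mean of per-concept scores, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """True positives, false positives, and false negatives for one concept."""
    tp: int
    fp: int
    fn: int


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_average(per_concept: dict[str, Counts]) -> tuple[float, float, float]:
    """Pool counts over all concepts, then score once (frequent concepts dominate)."""
    tp = sum(c.tp for c in per_concept.values())
    fp = sum(c.fp for c in per_concept.values())
    fn = sum(c.fn for c in per_concept.values())
    return prf(tp, fp, fn)


def macro_average(per_concept: dict[str, Counts]) -> tuple[float, float, float]:
    """Score each concept separately, then take the unweighted mean (assumed definition)."""
    scores = [prf(c.tp, c.fp, c.fn) for c in per_concept.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical counts for two made-up concepts; not data from the study.
example = {
    "nsaid_use": Counts(tp=90, fp=10, fn=10),
    "contrast_exposure": Counts(tp=8, fp=2, fn=8),
}
micro = micro_average(example)
macro = macro_average(example)
print("micro P/R/F1:", [round(x, 3) for x in micro])
print("macro P/R/F1:", [round(x, 3) for x in macro])
```

In this toy example the rare, poorly recalled concept lowers the macro-average more than the micro-average, which is one reason to report both.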
Micro-averaged inter-annotator agreement (based on F1) was 0.93, and the macro-average was 0.88 ± 0.09 (mean ± SD). Performance of the trained NLP tool was similar, with micro-averaged precision, recall, and F1 of 0.92, 0.91, and 0.91, respectively, and macro-averages of 0.89, 0.87, and 0.88. For assertion status assignment, precision, recall, and F1 were 0.93, 0.98, and 0.956, respectively, for positive assertions and 0.91, 0.74, and 0.82 for negative assertions. Using 10 parallel processes, the NLP system annotated all 1.26 million documents in 806 minutes, a rate of 138.2 kilobytes of text (~26.0 documents) per second.
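The reported throughput can be checked with simple arithmetic. The sketch below uses only the figures stated above (1.26 million documents, 806 minutes, 138.2 kilobytes per second); the average document size and total text volume are derived estimates, not values reported in the abstract, and they assume 1 kilobyte = 1,000 bytes.

```python
# Figures reported in the abstract.
documents = 1_260_000      # rounded document count
minutes = 806              # total processing time
kb_per_second = 138.2      # reported text throughput

seconds = minutes * 60
docs_per_second = documents / seconds

# Derived estimates (not reported in the abstract).
kb_per_document = kb_per_second / docs_per_second
total_gb = kb_per_second * seconds / 1_000_000  # assumes 1 GB = 1,000,000 KB

print(f"documents per second: {docs_per_second:.2f}")
print(f"estimated mean document size: {kb_per_document:.2f} KB")
print(f"estimated total text processed: {total_gb:.2f} GB")
```

The computed rate of about 26.05 documents per second agrees with the reported ~26.0, given that the document count is rounded.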
These results demonstrate the feasibility of building a rapid NLP system that accurately extracts concepts to support near-real-time risk analysis for acute kidney injury after cardiac catheterization.
The RapTAT NLP tool should enable the creation of systems that rely on near-real-time extraction of multiple factors from unstructured notes for analyzing risk and assisting in population health management.