1025 — Automating Performance Measurement of Congestive Heart Failure: Development of a Gold Standard to Train and Test a Natural Language Processing Tool
Garvin JH (Center for Health Equity Research and Promotion, NewCourtland Center for Transitions and Health); Leecaster M (IDEAS Center, VA Salt Lake City Health Care System and University of Utah); Field S (Center for Health Equity Research and Promotion); Elkin P (Center for Biomedical Informatics, Mount Sinai School of Medicine); Brown S (Nashville TREP, TVHS); Hoke L (Center for Health Equity Research and Promotion); Quiaoit Y (Center for Health Equity Research and Promotion); LaJoia J (Center for Health Equity Research and Promotion); Speroff T (Nashville TREP, TVHS)
The most widely available method for gathering data from free-text documents is manual review by human abstractors. Natural Language Processing (NLP) techniques have the potential to accurately extract the information contained in discharge instructions and to apply rules to it. To use these techniques, however, gold standard assessments by human experts are required to develop and refine NLP tools that apply complex rules to free-text documents. The purpose of this presentation is to discuss an example of gold standard development for congestive heart failure (CHF) quality measurement, which will be used to train and test an NLP tool.
Nurse practitioners reviewed the discharge instructions of 160 patients in the research cohort for the presence of quality criteria. Based on External Peer Review Process (EPRP) requirements, complete discharge instructions include: activity level, diet, discharge medications, follow-up appointment with MD/NP/PA, weight monitoring after discharge, when to contact their health care provider if significant weight change occurs, and what to do if symptoms worsen. For each reviewer, the discharge instructions were summarized as complete (1) if all instructions were present and incomplete (0) if any instruction was absent. Inter-rater reliability was assessed using Kappa statistics on the summary as well as on the six instructions separately, and bootstrap confidence intervals were calculated. Following independent chart review, adjudication, and recording of data, the reviewers discussed all cases in which they disagreed about the presence of required elements. The consensus exercise and its qualitative findings were summarized.
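The reliability analysis described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: two raters, Cohen's kappa as the Kappa statistic, and a percentile bootstrap that resamples patients with replacement. The function names and the toy ratings are hypothetical and are not study data.

```python
import random


def summarize(flags):
    """Return 1 (complete) if every required instruction is present, else 0."""
    return int(all(flags))


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' labels over the same records (assumed statistic)."""
    n = len(a)
    labels = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_exp = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_exp == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for kappa, resampling records with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical summary ratings (1 = complete, 0 = incomplete) for ten records.
rater_a = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 0, 0, 0, 1, 1, 0, 1]
kappa = cohen_kappa(rater_a, rater_b)
ci_low, ci_high = bootstrap_ci(rater_a, rater_b)
print(f"kappa={kappa:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The same functions could be applied to each individual instruction by passing that instruction's present/absent ratings in place of the summary ratings.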
The Kappa for the summary was 0.88, with a 95% confidence interval of 0.78 to 0.93. Kappas for the individual discharge instruction criteria ranged from 0.38 to 1.0. Based on independent review, there were 8 patient records with disagreement about overall completion of discharge instructions.
While agreement on the overall completion of discharge instructions was high, there was disagreement on the presence of instructions for activity level, weight monitoring, and what to do if symptoms worsen.
Performance measurement is a key endeavor in the VA. This research provides information about an important aspect of automating performance measures with NLP.