Adherence to hospital quality measures should be aligned with improved patient outcomes so that measure-adherent care is synonymous with optimal care. Although patients are diverse and heterogeneous, clinical guidelines and quality measures are generally conceived as 'one size fits all'. There is growing discomfort with this approach because of the increasing recognition that guidelines for optimal care are not simple and may vary with patient characteristics. To develop patient-centered quality measures, two daunting challenges need to be overcome: first, to identify the best evidence-based care for a particular patient, and second, to efficiently determine whether such care was received. The currently available structured data sources are not adequate for this task, and most clinical trials are either narrowly focused or lack sufficient power to evaluate evidence-based treatments in subgroups of patients. Certain quality measures do collect the necessary data, but this often requires manual abstraction from clinical notes, which makes these measures resource intensive and limits them to a relatively small subset of eligible patients. If collection of these data could be automated, it could be extended to all Veterans, providing the data needed to develop patient-centered quality measures and reducing the resources required for ongoing quality measurement. Natural language processing (NLP) methods can extract information from clinical notes, making this possible. This project focuses on applying NLP methods to a complex quality measure, the Surgical Care Improvement Project (SCIP) measure of perioperative beta blocker continuation, SCIP-Card-2. The measure, as defined, requires evidence found in clinical notes in addition to the information available in structured data.
The purpose of this pilot project is to assess whether NLP methods, in conjunction with the VA's electronic medical record (EMR), can successfully reproduce the SCIP-Card-2 measure. To do this, we design and test an automated re-creation of the SCIP-Card-2 data that does not require manual chart abstraction. Because patients' care can meet the measure with varying degrees of beta blocker continuation, we also examine those patterns of continuation among adherent patients.
This is a retrospective cohort study using SCIP-Card-2 adherence data for fiscal years 2014 and 2015 from the VA External Peer Review Program (EPRP) SCIP module, merged with VA records including pharmacy and vital sign data, clinic visits, ICD-9 and CPT codes, and free-text clinic notes. The development of the reconstructed measure will proceed in stages, using the SCIP-Card-2 data already collected by EPRP as the gold standard. We will first identify the indeterminate cases, that is, those whose adherence to the measure cannot be determined using inpatient pharmacy data alone. These indeterminate cases will then be used to train NLP methods to extract the necessary information from clinical documents. The reconstructed measure will then be built from a combination of the NLP data and the structured EMR data and tested for accuracy. Finally, we will use the SCIP-Card-2 and reconstructed data to classify patients who adhered to the measure by their pattern of beta blocker continuation.
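The staged logic described above can be illustrated with a minimal sketch. It is not the project's actual algorithm: the `SurgeryCase` fields and the `classify` function are hypothetical names introduced here, vital sign data are omitted for brevity, and the real measure definition is considerably more detailed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurgeryCase:
    # Hypothetical stand-ins for illustration only, not actual VA data elements.
    pharmacy_shows_continuation: bool       # evidence from structured inpatient pharmacy data
    nlp_shows_continuation: Optional[bool]  # evidence from clinical notes; None if nothing usable


def classify(case: SurgeryCase) -> str:
    """Stage 1: structured data; stage 2: NLP evidence for indeterminate cases."""
    if case.pharmacy_shows_continuation:
        return "met"        # determined from pharmacy data alone
    if case.nlp_shows_continuation is None:
        return "undefined"  # indeterminate, and the notes add no evidence
    return "met" if case.nlp_shows_continuation else "failed"


print(classify(SurgeryCase(True, None)))
print(classify(SurgeryCase(False, True)))
print(classify(SurgeryCase(False, None)))
```

The point of the ordering is that NLP is only consulted for cases the structured data cannot resolve, which keeps the text-processing workload confined to the indeterminate subset.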
We have received SCIP-Card-2 data from EPRP for 22,308 surgeries in FY 2013 and 2014; 98.3% met the measure, while only 1.7% failed. Of those, 13,452 (60.3%) cases could be determined to have met the measure based on inpatient pharmacy data alone. The remaining 8856 indeterminate cases were randomly divided into 75% (N=6647) for training NLP and 25% (N=2209) for testing results. The reconstructed measure using only inpatient pharmacy data and vital signs from the EMR agreed with the official measure on 1929 (87.3%) of the indeterminate test cases, for an estimated overall agreement rate of 94.9%. The reconstructed measure using both NLP and vital sign data agreed with SCIP-Card-2 on 2046 (92.6%) of the indeterminate test cases, for an estimated overall agreement rate of 97.0%. Of the remaining 162 disagreements, 74 (46%) were false positives, 72 (44%) were false negatives, and 17 (10%) were classified by the reconstructed measure as 'undefined', suggesting an overall lack of bias in the direction of residual disagreement. Overall, the estimated positive predictive value was 98.7%, while the negative predictive value was low at 1.4% (1 out of 73).
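The overall agreement estimates can be approximated with simple arithmetic. The sketch below assumes, since the text does not state the exact method, that all cases determined from pharmacy data agree with the official measure and that the test-set agreement rate extends to every indeterminate case. It also assumes the 73 in the negative predictive value comprises the 72 false negatives plus one true negative.

```python
TOTAL = 22308                        # surgeries with SCIP-Card-2 data
DETERMINED = 13452                   # resolved from inpatient pharmacy data alone
INDETERMINATE = TOTAL - DETERMINED   # remaining cases needing further evidence
TEST_N = 2209                        # indeterminate cases held out for testing


def overall_agreement(test_agreements: int) -> float:
    """Extrapolate the test-set agreement rate to all indeterminate cases.

    Assumption: every case determined from pharmacy data agrees with the
    official measure.
    """
    rate = test_agreements / TEST_N
    return (DETERMINED + INDETERMINATE * rate) / TOTAL


structured_only = overall_agreement(1929)  # pharmacy + vital signs
with_nlp = overall_agreement(2046)         # NLP + vital signs

# Negative predictive value = true negatives / all predicted negatives.
npv = 1 / (1 + 72)

print(f"structured only: {structured_only:.1%}")
print(f"with NLP:        {with_nlp:.1%}")
print(f"NPV:             {npv:.1%}")
```

Under these assumptions the estimates come out within roughly a tenth of a percentage point of the reported 94.9% and 97.0%, which suggests the reported figures were derived in a similar but not necessarily identical way.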
This project supports the feasibility of using EMR data and NLP to reconstruct a complex quality measure. However, practical implications in this case may be limited by ceiling effects: the true rate of measure failure is only 1.7%, which is not much greater than the error rate that can reasonably be expected from the manual record abstraction used for the official measure, or from an automated measure. The best performance may ultimately come from using automated algorithms for a first pass and efficiently allocating manual record abstraction to a limited number of ambiguous cases. These findings do suggest that nuanced quality measures may be monitored for all eligible patients without manual record abstraction, which would provide a critical tool for developing system-wide, patient-centered, tailored care guidelines and comparative effectiveness studies.
None at this time.
Treatment - Observational, Prevention, Prognosis