Atherosclerotic plaquing in the internal carotid artery results in carotid stenosis, a well-recognized risk factor for ischemic stroke. Treatment of symptomatic carotid stenosis is one of the most cost-effective interventions in stroke prevention. Carotid procedures remove the atherosclerotic plaque and thereby reduce the risk of future stroke, but at the cost of an elevated periprocedural risk of stroke and death. Therefore, management of carotid stenosis requires a careful weighing of benefit and risk, and effective communication of that information so that patients can make an informed decision.
The typical study design for carotid stenosis identifies patients who have undergone a carotid procedure, followed by a retrospective analysis to determine whether the procedure should have been undertaken. However, this approach omits patients who do not undergo a carotid procedure. The ideal study design would start by identifying patients with carotid stenosis. This approach is not undertaken because there is no simple method for identifying the population with this condition without conducting an expensive and time-intensive chart review. Natural language processing (NLP) of radiology reports promises to reduce the number of charts to review, or the time needed to review each chart, through a combination of filtering reports for relevancy and highlighting or structuring information in a report for quicker review. In this project, we developed an NLP application for identifying ultrasound reports describing significant internal carotid stenosis.
The specific aim of this project is to determine how accurately an NLP application can identify patients with significant internal carotid stenosis when compared against a reference standard of abstractions created by chart review.
The VA Office of Quality and Performance (OQP), Patient Care Services (PCS), and the Stroke QUERI collaborated to conduct the Office of Quality and Performance Stroke Special Study, a retrospective cohort of 5000 veterans admitted to Veterans Affairs Medical Centers in FY 2007 with a primary discharge diagnosis of ischemic stroke. Several diagnostic tests had been performed for detecting carotid artery stenosis, including carotid ultrasound, MRA of the neck, and CTA of the neck. The most commonly performed test was carotid ultrasound, so we focused on ultrasound reports in this project. A carotid test typically has two results, one each for the right and left carotid arteries. In the OQP cohort, 3750 carotid test results were abstracted for the period from 12 months prior to admission to 6 months after admission. Originally, we intended for these results collected by the chart abstractors to serve as the reference standard (RS) in this project. However, we were unable to link these abstractions to the provided reports because report-to-abstraction linkage information (e.g., date of exam, report type field) was lacking. Therefore, we randomly selected 19 patients (n=222 reports) for training, 41 patients (n=86 reports) for tuning, and 34 patients (n=110 reports) for testing from the cohort for algorithm development and evaluation. For this subset of OQP reports, we generated a reference standard of abstraction annotations. Two annotators marked each finding mention of carotid stenosis and its contextual descriptors on a development set. We developed the ConText algorithm by adding missing phrases and regular expressions from the training set, validated ConText's performance on a tuning set, and finally evaluated its performance against a blind test set. We report results of an annotation study, a formative evaluation, and a summative evaluation.
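As a rough illustration of the phrase-and-regular-expression approach described above, the following sketch finds stenosis mentions in a report and attaches contextual descriptors found in the same sentence. It is not the pyConText API and does not reproduce the project's knowledge base: the lexicon, the `annotate` function, the record fields, and the sentence-level scope rule are all assumptions made for illustration.

```python
import re

# Hypothetical lexicon (illustrative only, not the project's knowledge base).
FINDING = re.compile(r"\b(stenosis|stenoses|narrowing|occlusion)\b", re.I)
NEGATION = re.compile(r"\b(no|without|negative for)\b", re.I)
SIDE = re.compile(r"\b(left|right|bilateral)\b", re.I)
PERCENT = re.compile(r"(\d{1,3})\s*(?:-\s*(\d{1,3}))?\s*%")


def annotate(report: str) -> list[dict]:
    """Return one record per sentence that mentions a stenosis finding."""
    records = []
    # Naive sentence split; descriptors only apply within the same sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
        finding = FINDING.search(sentence)
        if finding is None:
            continue
        # A negation trigger counts only if it precedes the finding mention.
        negated = NEGATION.search(sentence[: finding.start()]) is not None
        side = SIDE.search(sentence)
        pct = PERCENT.search(sentence)
        percent = None
        if pct:
            low = int(pct.group(1))
            high = int(pct.group(2)) if pct.group(2) else low
            percent = (low, high)
        records.append(
            {
                "finding": finding.group(0).lower(),
                "negated": negated,
                "side": side.group(1).lower() if side else None,
                "percent": percent,
            }
        )
    return records


example = (
    "There is 50-69% stenosis of the right internal carotid artery. "
    "No evidence of stenosis on the left."
)
for record in annotate(example):
    print(record)
```

A sentence-level scope like this is the simplest possible choice and would not handle the tabular report layouts discussed in the error analysis.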
Annotation Study: We report inter-annotator agreement by concept on the development set using the F1-score, the harmonic mean of sensitivity (recall) and positive predictive value (precision).
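For reference, the F1-score can be computed from match counts as in the minimal sketch below. When it is used for agreement, one annotator's marks are commonly treated as the reference and the other's as the prediction; whether the study did so is an assumption here, and the function name and counts are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 mentions marked by both annotators,
# 2 marked only by annotator B, 2 marked only by annotator A.
print(f1_score(tp=8, fp=2, fn=2))
```

Because F1 is symmetric in false positives and false negatives, swapping which annotator serves as the reference gives the same value.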
Formative evaluation: For each carotid stenosis finding identified by ConText, using eHOST (extensible Human Oracle Suite of Tools), we measured F1-score for finding mentions and contextual descriptors to compare ConText annotations against manual annotations on the tuning set.
Summative evaluation: For each abstraction from our test set, we calculated the accuracy of the NLP-based application by dividing the number of correct abstractions by the total number of abstractions made by the human chart reviewers. We also evaluated how well the algorithm asserted whether a report had no stenosis, insignificant stenosis, or significant stenosis using Cohen's kappa, sensitivity (recall), positive predictive value (PPV), specificity (true negative rate), and negative predictive value (NPV).
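The per-class measures and Cohen's kappa listed above can all be derived from a confusion matrix, as in the following sketch. The function names and the small 3x3 matrix are hypothetical and serve only to show the arithmetic; each class is scored one-vs-rest, which is an assumption about how the three-way assertion was evaluated.

```python
def one_vs_rest(matrix: list[list[int]], k: int) -> dict[str, float]:
    """Score class k against all others.

    matrix[i][j] is the number of items whose reference class is i
    and whose predicted class is j. No guard against empty classes.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # reference k, predicted other
    fp = sum(row[k] for row in matrix) - tp  # predicted k, reference other
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


def cohens_kappa(matrix: list[list[int]]) -> float:
    """Chance-corrected agreement between reference and prediction."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not study data.
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
print(one_vs_rest(toy, 0))
print(cohens_kappa(toy))
```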
Annotation Study: From the development set, we observed high inter-annotator agreement between annotators for carotid stenosis finding mentions (88%) and directly related contextual descriptors e.g., neurovascular anatomy (94%), sidedness (89%), and severity (89%). Lower agreement was found on the existence indicator (74%), probably because this entails identifying expressions of uncertainty - a task that notoriously achieves moderate agreement. These results demonstrate that our schema is reliable.
Formative evaluation: From the tuning set, we observed moderate performance of ConText across all concepts: neurovascular anatomy (66%), finding (55%), sidedness (49%), severity (48%), and existence indicator (33%). An error analysis revealed the need to develop more sophisticated pre-processing techniques to help localize ConText's information extraction efforts, e.g., a section tagger for identifying impression sections and a template parser for extracting concepts from embedded tabular structures such as headings and subheadings.
Summative evaluation: From the blind test data of 103 patient reports containing 206 abstractions (1 for each side), we observed a distribution of 158 (77%) no stenosis, 36 (17%) insignificant, and 12 (6%) significant carotid stenosis findings. ConText predicted a distribution of 170 (83%) no stenosis, 22 (11%) insignificant, and 14 (6%) significant carotid stenosis findings. We report an overall accuracy of 89% and Cohen's kappa of 69%. For no stenosis, we observed high sensitivity (99%), high positive predictive value (92%), moderate specificity (73%), and high negative predictive value (97%). For insignificant stenosis, we observed moderate sensitivity (61%), high positive predictive value (100%), high specificity (100%), and high negative predictive value (92%). For significant stenosis, we observed moderate sensitivity (42%), low positive predictive value (36%), high specificity (95%), and high negative predictive value (96%).
ConText made 22 errors in total: 14 due to tabular structures, 5 due to insufficient regular expressions, and 3 due to insufficient scope. Misclassifications by assertion (ConText/RS) were: 8 (~4%) significant/insignificant, 7 (~3%) no stenosis/significant, 6 (~3%) no stenosis/insignificant, and 1 (<1%) significant/no stenosis. These results demonstrate that ConText could be useful for filtering out negative (no stenosis) carotid stenosis reports. However, template and tabular structures must be addressed to parse and aggregate finding concepts more accurately when predicting insignificant and significant assertions.
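The reported distributions and misclassification counts are enough to reconstruct a confusion matrix, which the sketch below checks against the reported totals. The matrix is an inference from the figures above rather than a table supplied by the study, and it assumes the 6 reports that ConText labeled no stenosis had a reference standard of insignificant stenosis.

```python
LABELS = ["no stenosis", "insignificant", "significant"]

# Rows: reference standard (RS); columns: ConText assertion.
# Off-diagonal cells come from the reported misclassification counts;
# diagonal cells are the RS totals minus each row's errors.
CONFUSION = [
    [157, 0, 1],  # RS no stenosis: 1 asserted significant
    [6, 22, 8],   # RS insignificant: 6 asserted no stenosis, 8 significant
    [7, 0, 5],    # RS significant: 7 asserted no stenosis
]

reference_totals = [sum(row) for row in CONFUSION]
predicted_totals = [sum(col) for col in zip(*CONFUSION)]
total = sum(reference_totals)
correct = sum(CONFUSION[i][i] for i in range(len(LABELS)))
errors = total - correct
accuracy = correct / total

print(dict(zip(LABELS, reference_totals)))  # reported RS: 158 / 36 / 12
print(dict(zip(LABELS, predicted_totals)))  # reported ConText: 170 / 22 / 14
print(errors, round(accuracy, 2))           # reported: 22 errors, 89% accuracy
```

Under this reconstruction, all 13 false "no stenosis" assertions come from reports with some degree of stenosis in the reference standard, which is the error type that matters most when ConText is used as a negative filter.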
This research supports a Stroke QUERI goal: Develop, evaluate, and integrate interventions to improve risk factor control among veterans at high risk of stroke. Carotid stenosis, a high risk factor for stroke, remains relatively unstudied because there is no feasible method of identifying a population with this risk factor. A successful NLP algorithm to identify a population with carotid stenosis will allow researchers to finally design and conduct feasible studies about the management of stroke and to assess comparative effectiveness for veterans, in particular. Our initial results suggest ConText could aid health service researchers in the identification of patients with significant carotid stenosis.
- Mowery DL, Franc D, Ashfaq S, Zamora T, Cheng E, Chapman WW, Chapman BE. Developing a Knowledge Base for Detecting Carotid Stenosis with pyConText. Presented at: American Medical Informatics Association Annual Symposium; 2014 Nov 1; Washington, DC.