Ischemic stroke is an important cause of morbidity, mortality, and cost within the VHA. An accurate assessment of who has had a stroke is a vital part of many ongoing VHA research and quality improvement efforts. However, current approaches to ascertaining this outcome have important limitations. Chart review, while accurate, is extremely resource-intensive, which limits the size of projects that rely on it. Approaches using administrative data (e.g., ICD-9 codes) are less resource-intensive but also less accurate. We propose a novel approach to ascertaining ischemic strokes from VHA automated data using natural language processing (NLP) and machine learning, which apply artificial intelligence principles to "teach" a computer to extract meaningful data from free-text notes with acceptable sensitivity and specificity.
Aim 1: Develop and evaluate an automated approach to the extraction of ischemic stroke from the free text of VHA medical records.
Aim 2: Deploy this empirically evaluated approach to an existing database of over 200,000 patients who have received warfarin (an oral anticoagulant) from the VHA and are thus at elevated risk for ischemic stroke.
We began by creating a training set of documents with which to train the NLP algorithm. All documents were drawn from patients who received warfarin from the VHA during FY08; we specifically selected document types likely to contain a mention of stroke. Our final training set consisted of 300 true positive documents, which contained one or more mentions of a stroke, and 1408 true negative documents, which contained no mention of a stroke. The true positive documents were heavily annotated by two physician chart reviewers; through extensive discussion, the two reviewers produced a final consensus set of 300 harmonized, annotated documents. These concept-level annotations included identification of each text string that mentioned stroke. In addition, the annotations captured several parameters for each utterance: multiplicity (one vs. more than one stroke event), whether the stroke was ischemic or hemorrhagic, the speaker's degree of certainty regarding stroke (certainly yes, probably yes, possibly, probably no, certainly no), negation ("did not have a stroke"), and hypothetical formulations ("to prevent a stroke in the future"). The 1408 true negative documents were also reviewed by both physicians, who agreed that none of them suggested a history of stroke. In theory, a set of fully annotated true positive notes and a larger set of confirmed true negative notes should be sufficient for training an NLP algorithm. We fed these documents into our NLP system, which consisted of cTAKES 1.1 and ARC 2.0. We first tested the algorithm's concept-level performance internally on the same set of 300 annotated positive notes and 1408 negative notes. We then tested its concept-level performance on an unselected set of 2000 patients who received anticoagulation from the VHA during FY08.
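The per-utterance annotation parameters described above can be pictured as a simple record attached to each stroke mention. The sketch below is purely illustrative; the class and field names are our own invention, not the project's actual annotation schema:

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    # The five certainty levels used by the chart reviewers.
    CERTAINLY_YES = "certainly yes"
    PROBABLY_YES = "probably yes"
    POSSIBLY = "possibly"
    PROBABLY_NO = "probably no"
    CERTAINLY_NO = "certainly no"


@dataclass
class StrokeMention:
    """One annotated stroke mention within a document (illustrative schema)."""
    text: str              # the text string that mentioned stroke
    multiple_events: bool  # multiplicity: more than one stroke event?
    ischemic: bool         # ischemic (True) vs. hemorrhagic (False)
    certainty: Certainty   # speaker's degree of certainty regarding stroke
    negated: bool          # e.g. "did not have a stroke"
    hypothetical: bool     # e.g. "to prevent a stroke in the future"


# Example: an annotation for the negated utterance "did not have a stroke".
m = StrokeMention("did not have a stroke", multiple_events=False,
                  ischemic=True, certainty=Certainty.CERTAINLY_NO,
                  negated=True, hypothetical=False)
```

As the external validation results below show, correctly handling the negated and hypothetical flags turned out to be one of the hardest parts of the task.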
Statistics used to evaluate performance included recall (analogous to sensitivity), precision (analogous to positive predictive value), and F-Score (a harmonic mean reflecting both recall and precision). Our prespecified goal was to achieve a precision of over 90% using the unselected (external) validation set.
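The three metrics are simple functions of the true positive (TP), false positive (FP), and false negative (FN) counts. The sketch below uses hypothetical counts for illustration; they are not taken from the study:

```python
def precision(tp: int, fp: int) -> float:
    """Precision (analogous to positive predictive value): TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Recall (analogous to sensitivity): TP / (TP + FN)."""
    return tp / (tp + fn)


def f_score(p: float, r: float) -> float:
    """F-score: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts: 78 true positives, 7 false positives, 22 false negatives.
p = precision(78, 7)   # ~0.92
r = recall(78, 22)     # 0.78
f = f_score(p, r)      # ~0.84
```

Because the F-score is a harmonic mean, it is pulled toward the lower of the two components, so a system cannot hide poor precision behind high recall (or vice versa).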
Our results for internal validation (within the training set), at the paragraph level, were as follows. Identification of any instance of stroke had a recall of 0.78, a precision of 0.92, and an F-score of 0.84. Identification of ischemic stroke (as opposed to hemorrhagic) had a recall of 0.70, a precision of 0.83, and an F-score of 0.77. Identification of negated stroke had a recall of 0.70, a precision of 0.86, and an F-score of 0.77. The results of external validation were as follows. Among 2000 unselected patients who received warfarin from the VHA in 2008, the expected prevalence of stroke was approximately 10%. However, our algorithm identified 50% of patients as stroke-positive, implying a projected precision of approximately 20% even in the best case, in which every true stroke patient is among those flagged. This falls far short of our goal of 90% precision. Upon chart review, we found that many patients were falsely identified as having had a stroke because the system had difficulty processing items such as family history, hypothetical utterances, and negated strokes. In addition, highly structured items contained within notes, such as checklists, presented great difficulty for our NLP algorithm.
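The 20% figure follows from a back-of-envelope calculation under the stated assumptions (roughly 10% true prevalence, roughly 50% of patients flagged positive):

```python
# Projected precision in the external validation set, under the stated
# assumptions: ~10% of the 2000 unselected patients truly had a stroke,
# and the algorithm flagged ~50% of patients as stroke-positive.
n_patients = 2000
true_prevalence = 0.10   # expected stroke prevalence
flagged_fraction = 0.50  # fraction labeled stroke-positive by the algorithm

true_strokes = n_patients * true_prevalence  # 200 patients
flagged = n_patients * flagged_fraction      # 1000 patients

# Best case: every true stroke patient is among those flagged,
# so precision can be at most true_strokes / flagged.
best_case_precision = true_strokes / flagged
print(f"projected precision <= {best_case_precision:.0%}")
```

Even this best-case bound of 20% sits far below the prespecified 90% target, which is why the external validation was judged a failure regardless of the algorithm's recall.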
At present, the main impact of our project is to offer a cautionary tale to other researchers regarding the many complex issues that must be addressed to identify stroke with high fidelity using NLP. We are preparing a manuscript describing our results to date and the lessons that others in the field may draw from them.
- D'Avolio LW. Code-free natural language processing / case finding using the Automated Retrieval Console (ARC) - VA Informatics and Computing Infrastructure. [Cyberseminar]. 2012 Jul 31.