National Meeting 2007

3089 — Use of Natural Language Processing Techniques to Extract Surveillance Information from Diverse Types of CPRS Notes

South BR (Salt Lake City TREP, University of Utah), Phansalkar S (Salt Lake City TREP, University of Utah), Swaminathan AD (University of Utah), Delisle S (VA Maryland Health Care System, University of Maryland), Perl T (Johns Hopkins University), Samore MH (Salt Lake City TREP, University of Utah)

We applied natural language processing (NLP) methods to diverse types of CPRS free-text notes to enhance case detection for influenza-like illness (ILI). Two NLP techniques were compared with respect to accuracy, information extracted, and ease of use.

A random sample of 15,377 outpatient encounters occurring during the study period (10/01/03 to 3/31/04) at the VA Maryland Health Care System and the VA Salt Lake City Health Care System was selected for chart review. Cases were identified using an explicit definition of ILI based on CDC criteria. Following chart review, all positive cases and a 14% random sample of negative cases were selected for text processing. ILI concepts from the case definition were mapped to a standard vocabulary using the UMLS Metathesaurus. Two text processing methods were applied to CPRS notes: string matching using UMLS concepts in conjunction with a negation algorithm called NegEx, and an NLP system called MedLEE. For both methods, the presence of two unique non-negated concepts in the same note denoted ILI. False positive cases were further reviewed to identify negation problems encountered by each case detection method.
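The decision rule described above (two unique non-negated concepts in the same note) can be illustrated with a minimal Python sketch. This is not the study's adapted NegEx implementation or MedLEE: the concept lexicon, the short trigger list, the five-word negation window, and the function names are all assumptions chosen for illustration, and plain strings stand in for UMLS concepts.

```python
import re

# Hypothetical concept lexicon (concept label -> surface strings).
# The study mapped ILI concepts to UMLS; these entries are illustrative only.
CONCEPT_TERMS = {
    "fever": ["fever", "febrile"],
    "cough": ["cough"],
    "sore_throat": ["sore throat", "pharyngitis"],
}

# A few pre-concept negation triggers in the spirit of NegEx.
# This is an assumed subset, not the published trigger list.
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]

# Number of words after a trigger that are treated as negated (assumption).
NEGATION_WINDOW = 5


def _find_phrase(words, phrase):
    """Yield (start, end) word positions where the phrase occurs."""
    target = phrase.split()
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            yield i, i + len(target)


def non_negated_concepts(note):
    """Return the set of concept labels mentioned without a preceding negation."""
    found = set()
    for sentence in re.split(r"[.;\n]+", note.lower()):
        words = re.findall(r"[a-z]+", sentence)
        # Mark word positions that fall inside a negation window.
        negated = set()
        for trigger in NEGATION_TRIGGERS:
            for _, end in _find_phrase(words, trigger):
                negated.update(range(end, min(len(words), end + NEGATION_WINDOW)))
        # Keep a concept if at least one of its mentions is not negated.
        for concept, terms in CONCEPT_TERMS.items():
            for term in terms:
                if any(start not in negated for start, _ in _find_phrase(words, term)):
                    found.add(concept)
    return found


def is_ili(note, min_concepts=2):
    """Apply the rule: at least two unique non-negated concepts in one note."""
    return len(non_negated_concepts(note)) >= min_concepts
```

Under these assumptions, is_ili("Patient reports fever and cough. Denies sore throat.") is true and is_ili("No fever, cough, or sore throat.") is false. A hypothetical template-style line such as "Fever: yes. Cough: no." is flagged as ILI by this sketch because the negation follows the concept, which is one way templated text could defeat a simple forward-looking negation window.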

False positive cases were primarily caused by non-negated concepts found in templated note sections. Modifications were made to our adapted version of the NegEx algorithm and to MedLEE pre-processing steps to improve text processor performance. Additionally, we were able to tailor the NegEx algorithm to identify negations in specific note templates unique to each VA facility. Sensitivity, specificity, and positive predictive value for the final text-based case detection models were: NegEx (88%, 94%, 21%) and MedLEE (91%, 91%, 16%). NLP provides additional information including demographic factors, duration of illness, and prior exposure to infection.
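For reference, the three reported measures follow the standard confusion-matrix definitions, sketched below in Python. The function name and the example counts are hypothetical and are not the study's data; the abstract does not report the underlying counts or how the 14% sampling of negative cases was handled in the calculation.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard case detection measures from confusion-matrix counts.

    tp: chart-review positive, flagged by the text processor
    fp: chart-review negative, flagged by the text processor
    tn: chart-review negative, not flagged
    fn: chart-review positive, not flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly passed
        "ppv": tp / (tp + fp),          # share of flagged notes that are true cases
    }


# Hypothetical counts for illustration only (not from the study).
example = detection_metrics(tp=90, fp=60, tn=940, fn=10)
```

Because positive predictive value depends on how common true cases are among the notes processed, it can be much lower than sensitivity and specificity when cases are rare.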

Application of text processing methods to CPRS notes provides high specificity and positive predictive value for detection of ILI cases. NLP provides additional information not available in structured data.

Note templating is pervasive in VA free-text clinical documents. Adapting text processing techniques for use on a diverse array of note types and templates expands the capabilities of CPRS for decision support applications.