Intermittent exposures to chemical or biological agents have been a concern for decades among our nation's military forces because they pose challenges to detection, which can delay treatment for months or years, frequently until after the soldier has been discharged. Medical data for surveillance has typically been limited to chief complaint and microbiology test result data. However, the majority of reported symptoms are recorded as free text within the clinical record, and accessing this type of data is a challenge that may be successfully addressed by natural language processing (NLP) tools. Developing and validating an NLP system for symptom and disease detection algorithms could support automated syndromic surveillance for veterans.
The objectives of this study were to (a) develop a set of rules based upon keywords and SNOMED-CT concepts that capture signs and symptoms suggestive of infectious syndromes and diseases using an NLP tool, and to (b) evaluate the accuracy of these clinical rules to detect examples of infectious diseases relevant to biosurveillance.
A symptom training set (60), a symptom testing set (444), and a disease detection set (216) were randomly selected from VHA VISN-9 emergency department, urgent care, and primary care physician records. All of the documents were parsed and evaluated with a natural language processing system. The symptom data sets were manually reviewed for the occurrence of 18 symptoms associated with three clinical diseases (tuberculosis, influenza, and acute hepatitis). The disease data set was manually reviewed to determine the clinical diagnoses associated with the episode of care. Rules were developed to detect the symptoms in the training set using SNOMED-CT concepts and keywords, and subsequently evaluated using the testing set. The classic symptoms for each disease were evaluated using logistic regression in the disease set to determine if any expected associations could be detected.
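As a rough illustration of what a keyword-based symptom rule with negation handling might look like, the following is a minimal sketch only. The keyword lists, negation cues, and the function name `detect_symptoms` are hypothetical; the abstract does not give the study's actual rules, its SNOMED-CT concept mappings, or the NLP tool's interface, and none of those are reproduced here.

```python
import re

# Hypothetical keyword rules for three example symptoms; the study's actual
# rules and SNOMED-CT concept lists are not given in the abstract.
SYMPTOM_KEYWORDS = {
    "cough": ["cough", "coughing"],
    "fever": ["fever", "febrile"],
    "night sweats": ["night sweats"],
}

# Hypothetical negation cues, checked only in the text preceding a keyword
# within the same sentence.
NEGATION_CUES = ["no", "denies", "without", "negative for"]


def detect_symptoms(text):
    """Return {symptom: "present" | "negated"} for each keyword hit.

    If a symptom appears in several sentences, the last mention wins.
    """
    findings = {}
    # Naive sentence split on periods, semicolons, and newlines.
    for sentence in re.split(r"[.;\n]+", text.lower()):
        for symptom, keywords in SYMPTOM_KEYWORDS.items():
            for keyword in keywords:
                match = re.search(r"\b" + re.escape(keyword) + r"\b", sentence)
                if match is None:
                    continue
                prefix = sentence[:match.start()]
                negated = any(
                    re.search(r"\b" + re.escape(cue) + r"\b", prefix)
                    for cue in NEGATION_CUES
                )
                findings[symptom] = "negated" if negated else "present"
                break  # one keyword hit per symptom per sentence is enough
    return findings


note = "Patient reports fever and cough. Denies night sweats."
result = detect_symptoms(note)
```

A production system would need far more than this (concept normalization, scope-limited negation, handling of historical or family-history mentions), which is why evaluation against manually reviewed records matters.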
The automated symptom detection algorithm achieved an overall sensitivity of 0.903 and a positive predictive value of 0.906 (TP=2,399, FP=248, FN=259). It correctly determined negation in 77.3% (1542/1995) of symptoms that were found by both manual and automated means, and the system detected the expected associations between symptoms and the 3 diseases.
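The reported figures follow directly from the counts given. The short sketch below recomputes them using the standard definitions of sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP); the function and variable names are illustrative and not taken from the study.

```python
def sensitivity(tp, fn):
    """Fraction of manually identified symptoms that the algorithm found."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Fraction of algorithm-flagged symptoms confirmed by manual review."""
    return tp / (tp + fp)


# Counts reported in the results above.
tp, fp, fn = 2399, 248, 259

sens = round(sensitivity(tp, fn), 3)               # 2399 / 2658
ppv = round(positive_predictive_value(tp, fp), 3)  # 2399 / 2647

# Negation accuracy among symptoms found by both manual and automated review.
negation_accuracy_pct = round(100 * 1542 / 1995, 1)

print(sens, ppv, negation_accuracy_pct)
```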
There is a high likelihood of biological, chemical, or environmental exposures taking place among OEF/OIF veterans, many of which will present with indolent, chronic, non-specific symptoms that gradually cause morbidity and disability. An informatics and statistical framework developed within the VA could detect clusters of symptoms among these veterans and prompt further focused evaluation of those symptoms and illnesses. This project demonstrates how codified narrative text from the medical record would enable detection of higher-than-expected rates of non-specific symptoms and clusters of symptoms well in advance of manual review by public health officials.