Pro-WATCH: Epidemiology of Medically Unexplained Syndromes
Matthew H. Samore MD
VA Salt Lake City Health Care System, Salt Lake City, UT
Funding Period: September 2010 - August 2014
More than 2 million members of the armed services have been deployed in Afghanistan or Iraq since 2001. Operation Enduring Freedom (OEF) and Operation Iraqi Freedom (OIF) veterans are experiencing a wide variety of health problems related to deployment. Although veterans of previous wars have experienced a variety of chronic, unexplained symptoms, relatively little is known about the prevalence of medically unexplained symptoms and syndromes (MUS) in OEF/OIF veterans.
The objectives of this study are to: (a) use natural language processing (NLP) techniques to extract information about symptoms from Veterans Affairs (VA) ambulatory progress notes; (b) validate an algorithm to detect the presence of MUS, using responses to symptom questionnaires as the reference standard; and (c) apply automated algorithms to national VA data to assess variation in the prevalence of MUS by year, region, deployment exposure, blast injury, age, and co-morbid illness, including post-traumatic stress disorder (PTSD).
Tools that use NLP to support semi-automated ontology creation were developed. Design and development included input from multiple ontology experts. Informatics ontologies for irritable bowel syndrome (IBS), fibromyalgia, and chronic fatigue were created using the new tools.
A symptom tool was trained using a three-step process. Step one was the annotation of randomly selected Veteran documents to establish a reference standard; a team of four reviewers was trained for the annotation task. Step two was using the reference standard to train an NLP pipeline to identify positive assertions of symptoms. To maximize recall and precision, different negation algorithms and machine learning methods were incorporated into the NLP pipeline and compared. Step three was applying the NLP pipeline to the entire corpus.
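The second step can be illustrated with a minimal sketch. It is not the project's pipeline: the symptom lexicon, the negation triggers, the scope terminators, and the five-word window are all hypothetical, and the rule is a simple trigger-word heuristic rather than any of the negation algorithms the study compared. It shows how a positively asserted symptom might be separated from a negated one, and how precision and recall could be computed against an annotated reference standard.

```python
import re

# Hypothetical mini-lexicon, triggers, and window size, for illustration only.
SYMPTOMS = ["fatigue", "abdominal pain", "headache", "joint pain"]
NEGATION_TRIGGERS = ["denies", "no", "without", "negative for"]
SCOPE_TERMINATORS = {"but", "however"}
WINDOW = 5  # how many words before a symptom are searched for a trigger


def context_before(text, start):
    """Return up to WINDOW words before `start`, cut at the last scope terminator."""
    words = re.findall(r"[a-z]+", text[:start])[-WINDOW:]
    for i in range(len(words) - 1, -1, -1):
        if words[i] in SCOPE_TERMINATORS:
            words = words[i + 1:]
            break
    return " ".join(words)


def positive_symptoms(sentence):
    """Return the lexicon symptoms in one sentence that are not negated."""
    text = sentence.lower()
    found = set()
    for symptom in SYMPTOMS:
        for match in re.finditer(r"\b" + re.escape(symptom) + r"\b", text):
            context = context_before(text, match.start())
            negated = any(
                re.search(r"\b" + re.escape(trigger) + r"\b", context)
                for trigger in NEGATION_TRIGGERS
            )
            if not negated:
                found.add(symptom)
    return found


def precision_recall(predicted, reference):
    """Compare predicted symptoms with the annotated reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


predicted = positive_symptoms(
    "Patient reports fatigue and headache but denies abdominal pain."
)
precision, recall = precision_recall(predicted, {"fatigue", "headache", "joint pain"})
```

Cutting the negation scope at words such as "but" keeps a trigger earlier in the sentence from negating a symptom that is asserted later.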
The annotation instances are currently being reviewed by human reviewers for validity. An additional 250 documents are being annotated to increase the number of symptom instances in the reference standard. Additional methods are being applied to the NLP pipeline to improve symptom recognition.
An algorithm to infer the presence of MUS will be trained and validated using a reference standard consisting of Veteran responses to a symptom questionnaire and clinician-reviewed records. A patient-level record review tool is being built to assist in the clinician review process.
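As a rough illustration of the validation step, the sketch below compares patient-level algorithm output with a reference standard and reports sensitivity, specificity, and positive predictive value. The function name, the dictionary layout, and the example values are assumptions made for illustration; the study does not specify which validation metrics will be reported.

```python
def validation_metrics(predicted, reference):
    """Compare algorithm output with the reference standard.

    Both arguments are dicts mapping a patient identifier to True (MUS present)
    or False (MUS absent). Every patient in `reference` must appear in `predicted`.
    """
    tp = fp = tn = fn = 0
    for patient_id, truth in reference.items():
        flagged = predicted[patient_id]
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif not flagged and truth:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # None marks a metric that is undefined for this sample.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Hypothetical example: five patients with questionnaire-based reference labels.
reference = {1: True, 2: True, 3: False, 4: False, 5: True}
predicted = {1: True, 2: False, 3: False, 4: True, 5: True}
metrics = validation_metrics(predicted, reference)
```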
The validated algorithm will be used to analyze the epidemiology of MUS using all available electronic records on OEF/OIF Veterans in the VA Informatics and Computing Infrastructure (VINCI). The prevalence of MUS will be reported, including specific prevalences for chronic fatigue syndrome, irritable bowel syndrome, and fibromyalgia. Multivariable mixed-effects Poisson regression models will be used to determine the independent contributions of year, region, and duration of deployment.
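The simplest form of this analysis can be sketched as a crude prevalence per stratum and a prevalence ratio between two strata. This is deliberately not the multivariable mixed-effects Poisson model described above: it has no random effects and no adjustment for other covariates, and corresponds only to the special case of a single categorical predictor. The record layout and field names (`region`, `has_mus`) are hypothetical.

```python
from collections import defaultdict


def prevalence_by_stratum(records, key):
    """Return the crude MUS prevalence for each value of the stratum field `key`.

    `records` is an iterable of dicts, one per Veteran, each with a boolean
    'has_mus' field and the stratum field named by `key`.
    """
    cases = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        stratum = record[key]
        totals[stratum] += 1
        cases[stratum] += int(record["has_mus"])
    return {stratum: cases[stratum] / totals[stratum] for stratum in totals}


def prevalence_ratio(prevalence, stratum, reference_stratum):
    """Return the prevalence in `stratum` relative to `reference_stratum`."""
    return prevalence[stratum] / prevalence[reference_stratum]


# Hypothetical records: region A has 2 of 4 Veterans with MUS, region B has 1 of 4.
records = (
    [{"region": "A", "has_mus": flag} for flag in (True, True, False, False)]
    + [{"region": "B", "has_mus": flag} for flag in (True, False, False, False)]
)
prevalence = prevalence_by_stratum(records, "region")
ratio = prevalence_ratio(prevalence, "A", "B")
```

A regression model becomes necessary once several predictors, such as year, region, and duration of deployment, have to be separated from one another, which crude stratum-level ratios cannot do.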
Health data on 856,815 Veterans who had been deployed in Iraq or Afghanistan between October 2001 and October 2011 were analyzed. The corpus of text data for the OEF/OIF Veterans included 46 million clinical documents that belonged to the Text Integration Utility (TIU) note type.
A reference standard of positively asserted symptoms has been completed: human annotators reviewed 750 documents and identified 5,572 symptoms. A lexicon of symptoms has been compiled from the symptoms identified by annotation. Within each document, subjective symptom expressions were compared to assertions of symptoms in clinical terms and to the ICD-9-CM codes assigned for the encounter. A total of 543 subjective symptom expressions were identified, of which 66.5% were categorized as mental/behavioral experiences and 33.5% as somatic experiences. Machine learning for symptoms is complete. An ontology development tool has been created and pilot tested. Ontologies have been developed for IBS, fibromyalgia, and chronic fatigue.
An additional 250 documents need to be annotated for the reference standard, and human annotators have recently begun this work. The original 750 documents are also being re-reviewed for accuracy of the symptom findings. Once all annotation is complete, the algorithm will be tested again on the test set of documents.
By examining symptoms and symptom clusters in OEF/OIF Veterans after deployment, VA will be able to continually assess their health status and health care utilization. Specifically, VA will be able to measure the prevalence of MUS, identify symptoms related to combat exposures, and identify co-morbid conditions. With this information, VA can provide more comprehensive care to the patient rather than reacting to and treating individual symptoms.
DRA: Autoimmunity and Allergy, Cardiovascular Disease, Military and Environmental Exposures
DRE: Research Infrastructure, Epidemiology, Diagnosis
Keywords: Clinical Diagnosis and Screening, Healthcare Algorithms, Information Management, Knowledge Integration, Natural Language Processing, Reintegration Post-Deployment, Risk Factors, Surveillance
MeSH Terms: none