The service-directed research proposal entitled "Protecting Warfighters using Algorithms for Test Processing to Capture Health Events (Pro-WATCH)" focused on the post-deployment health status of Iraq and Afghanistan Veterans. We proposed to develop and implement informatics tools to monitor, detect, and prevent health problems in the deployed Veteran populations. Our work was intended to directly inform the efforts of clinical programs for deployed Veterans and to complement other studies of post-deployment health, including the Millennium Cohort study.
The specific aims of this study were to: (1) use natural language processing (NLP) techniques to extract information about symptoms and related concepts from Veterans Affairs (VA) ambulatory progress notes; (2) develop and validate algorithms to examine phenotypes of deployed Veterans with syndromic diagnoses; and (3) characterize the epidemiology of medically unexplained syndromes in deployed Veterans.
Annotations were performed by human reviewers to establish reference standards for training NLP systems to extract concepts. A total of 950 documents were annotated for presence of symptom concepts. In addition, 1200 documents were reviewed for the presence of adverse childhood experiences and suicide gestures and 3500 documents were reviewed to support extraction of psychosocial concepts.
Concept extraction was performed using a mix of key word matching, rule-based procedures, and statistical techniques. Machine learning was used to distinguish concepts that were positively asserted and experienced by the patient from concepts that were negated or not experienced by the patient. As described in more detail in "findings", a variety of technical advances were made to address text processing challenges such as the extensive use of templates in VA notes. Additional improvements in the NLP system were made to enhance its scalability for processing extremely large text corpora.
Epidemiological analyses were conducted of Veterans deployed during Operation Enduring Freedom/Operation Iraqi Freedom/Operation New Dawn (OEF/OIF/OND) and the Persian Gulf War (Operation Desert Storm). Poisson regression was used to examine healthcare utilization among deployed Veterans who carried a diagnosis of fibromyalgia. Logistic regression was used to characterize the relationship between adverse childhood experiences and a variety of types of diagnoses, including mental health disorders and medically unexplained syndromes.
Our report on the project findings is divided into 3 sections:
1) Development of the ontological framework
2) Technical advances in informatics methods for concept extraction
3) Epidemiological analyses of chronic multi-symptom illness and mental health syndromes
1) Ontological framework (related to specific aims 1 and 2)
The defining characteristic of medically unexplained syndromes is that the underlying pathological processes are not well understood. Medically unexplained syndromes are similar to mental health diseases in that tests applied to laboratory or tissue specimens are only useful insofar as they exclude other conditions. We demonstrated that even with these limitations, existing realist ontologies provided a sufficient framework to represent medically unexplained syndromes.
Our initial analysis of the ways that symptoms were described in clinical notes revealed significant use of lay expressions that were sometimes written in the patient voice; these vivid statements often involved metaphors or similes (e.g., "my head felt like it was hit with a hammer"). We showed that these lay expressions were rarely represented in ICD-9 codes. Our analysis of symptom descriptions in clinical notes also established that symptom terms were distributed across different semantic categories within the UMLS Metathesaurus. We improved the accuracy of dictionary look-up of symptoms by systematically identifying activity concepts within UMLS that represented symptoms when coupled with appropriate modifiers. For example, an activity such as "sleep" is a symptom when coupled with the modifier "poor" but not a symptom when included in the phrase "sleeps well". This information was incorporated into the machine learning algorithms that were used to improve the accuracy of dictionary look-up.
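The idea of an activity term becoming a symptom only when paired with a suitable modifier can be illustrated with a minimal rule-based sketch. This is not the project's machine learning method; the lexicon `SYMPTOM_MODIFIERS`, the function `find_activity_symptoms`, and the context window size are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon (an assumption, not taken from UMLS):
# activity terms mapped to modifiers that turn them into symptoms.
SYMPTOM_MODIFIERS = {
    "sleep": {"poor", "trouble", "difficulty"},
    "appetite": {"poor", "decreased", "loss"},
}


def find_activity_symptoms(text, window=2):
    """Return (activity, modifier) pairs where a modifier occurs near the activity."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        for activity, modifiers in SYMPTOM_MODIFIERS.items():
            # Prefix match so that "sleeps" and "sleeping" map to "sleep".
            if not tok.startswith(activity):
                continue
            # Look at a few tokens on either side of the activity term.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            found = [m for m in context if m in modifiers]
            if found:
                hits.append((activity, found[0]))
    return hits


print(find_activity_symptoms("Patient reports poor sleep."))  # [('sleep', 'poor')]
print(find_activity_symptoms("He sleeps well."))              # []
```

In this sketch "poor sleep" is flagged as a symptom while "sleeps well" is not, mirroring the example in the text.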
We also determined that existing methods to group symptoms into organ system categories were insufficient. We implemented a novel method to map symptoms into anatomically related categories. Starting from root concepts within the UMLS Metathesaurus, ancestor-descendant relationships were traversed to find symptoms and signs within a given organ system. Superfluous concepts were removed by blocking paths that deviated from the organ system. Our method to map symptoms to organ systems was used to categorize a total of 115,000 concepts. Overall, 90% of 668 unique symptom and sign concepts in a 750-document corpus were correctly mapped to their organ systems.
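The traversal with blocked paths can be sketched as a breadth-first search over a parent-to-children map. The toy hierarchy below, the concept names, and the `descendants` function are hypothetical stand-ins; the actual work traversed relationships in the UMLS Metathesaurus.

```python
from collections import deque

# Hypothetical toy hierarchy (parent -> children); not real UMLS content.
CHILDREN = {
    "respiratory_system_finding": ["cough", "dyspnea", "chest_finding"],
    "chest_finding": ["wheezing", "chest_wall_pain"],
    "chest_wall_pain": ["rib_fracture_pain"],
    "cough": ["productive_cough"],
}

# Concepts where the path is assumed to leave the organ system of interest.
BLOCKED = {"chest_wall_pain"}


def descendants(root, children, blocked):
    """Collect all descendants of root, never entering or passing through blocked nodes."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in blocked or child in seen:
                continue
            seen.add(child)
            queue.append(child)
    return seen


print(sorted(descendants("respiratory_system_finding", CHILDREN, BLOCKED)))
# ['chest_finding', 'cough', 'dyspnea', 'productive_cough', 'wheezing']
```

Because the blocked concept is never queued, everything reachable only through it ("rib_fracture_pain" here) is excluded as well.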
2) Technical advances in informatics methods for concept extraction (related to specific aims 1 and 2)
Symptom extraction: The NLP pipeline for symptom extraction consisted of a series of component parts each assigned to a specific task. New modules were developed to address the distinct challenges associated with symptom extraction. We implemented a novel method to identify hierarchically nested sections within clinical documents. A "clinical document sections ontology" was created, consisting of more than 1000 concepts, to represent section headers, sections, and properties, as well as parent-child relationships. Sections in clinical documents were annotated by applying an algorithm to generate an initial parsed structure, correct errors in the structure, and then produce a final output. The algorithm was implemented as a module within the symptom extraction pipeline.
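One way to picture hierarchically nested sections is a stack-based parse driven by a header-to-parent lookup. The sketch below is a simplified illustration only: the `PARENT` table stands in for the much larger sections ontology, headers are assumed to end with a colon, and the error-correction step described above is omitted.

```python
# Hypothetical header -> parent-header table (None means top level).
PARENT = {
    "history": None,
    "social history": "history",
    "family history": "history",
    "assessment": None,
}


def nest_sections(lines):
    """Build a tree of sections from note lines using the PARENT table."""
    root = {"header": None, "text": [], "children": []}
    stack = [root]
    for line in lines:
        stripped = line.strip()
        key = stripped.rstrip(":").lower()
        if stripped.endswith(":") and key in PARENT:
            node = {"header": key, "text": [], "children": []}
            # Pop until the expected parent (or the root) is on top of the stack.
            while len(stack) > 1 and stack[-1]["header"] != PARENT[key]:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
        elif stripped:
            # Body text belongs to the most recently opened section.
            stack[-1]["text"].append(stripped)
    return root


note = [
    "History:", "Deployed 2005",
    "Social History:", "Lives alone",
    "Family History:", "Noncontributory",
    "Assessment:", "Poor sleep",
]
tree = nest_sections(note)
print([c["header"] for c in tree["children"]])
print([c["header"] for c in tree["children"][0]["children"]])
```

Here "Social History" and "Family History" end up as children of "History", while "Assessment" returns to the top level.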
A novel approach to extract templates from VA notes was developed based on a copy-plagiarism algorithm. The algorithm was designed to identify novel templates even when portions of the template varied from document to document. A canonical signature was determined for each template in order to separate the boilerplate part of the template from the text added by providers in the form of answers to questions. Six commonly used templates were successfully extracted from a 750 document corpus; the prevalence of use of these templates in the corpus ranged from 1.3% to 11.2%.
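Separating boilerplate from provider-entered text can be illustrated with a much simpler stand-in than the copy-plagiarism algorithm described above: lines that recur across a large share of documents are treated as template text, and everything else as provider text. The function names, the `min_fraction` threshold, and the sample notes are hypothetical.

```python
from collections import Counter


def boilerplate_lines(documents, min_fraction=0.5):
    """Return normalized lines that appear in at least min_fraction of documents."""
    counts = Counter()
    for doc in documents:
        # Count each distinct line at most once per document.
        counts.update({ln.strip().lower() for ln in doc.splitlines() if ln.strip()})
    threshold = min_fraction * len(documents)
    return {line for line, n in counts.items() if n >= threshold}


def split_note(doc, boilerplate):
    """Split one note into (template lines, provider-entered lines)."""
    template, provider = [], []
    for line in doc.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        target = template if stripped.lower() in boilerplate else provider
        target.append(stripped)
    return template, provider


docs = [
    "Do you have trouble sleeping?\nYes, most nights",
    "Do you have trouble sleeping?\nNo",
    "Follow-up visit for knee pain",
]
common = boilerplate_lines(docs)
print(split_note(docs[0], common))
# (['Do you have trouble sleeping?'], ['Yes, most nights'])
```

This exact-match approach would miss templates whose wording varies between documents, which is the harder case the project's algorithm was designed to handle.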
The performance of the NLP pipeline for symptom extraction was evaluated against a human-annotated reference corpus of 950 VA electronic clinical notes that contained 7676 positively asserted symptoms. The best-performing Naïve Bayes machine learning model, evaluated with 10-fold cross-validation, yielded a precision of 0.94, a recall of 0.89, and an F-measure of 0.91 for identifying non-asserted symptoms; for positively asserted symptoms, the precision was 0.66, the recall 0.79, and the F-measure 0.71.
Scalability: We developed a novel dictionary look-up algorithm to increase the speed of the NLP pipeline. The performance of this new algorithm was compared to two other NLP pipelines, MetaMap and cTAKES; it was demonstrated to be considerably faster in processing documents and similar or better in overall accuracy. Leo, a set of functionalities simplifying the UIMA-Asynchronous Scale-out tools for NLP pipelines, was also developed by SLC. The simplification functionalities within Leo were employed in the symptom pipeline.
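For readers unfamiliar with the F-measure, the sketch below computes it as the harmonic mean of precision and recall, assuming the balanced (F1) form was the one reported. The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Non-asserted symptoms: precision 0.94, recall 0.89.
print(round(f_measure(0.94, 0.89), 2))  # 0.91
# Positively asserted symptoms: precision 0.66, recall 0.79.
print(round(f_measure(0.66, 0.79), 2))  # 0.72
```

The first value matches the reported F-measure. The second comes out at 0.72 rather than the reported 0.71, a small difference that is plausibly explained by the reported precision and recall having been rounded.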
We demonstrated the ability to scale the NLP pipeline to large text corpora for several use cases related to symptom extraction. For example, we extracted psychosocial concepts from a corpus of 316,355 high yield clinical documents. The module demonstrated excellent performance, with a precision of 80%. More recently, the psychosocial concepts were extracted from a corpus of 9,000,000 documents. Other large corpora have been processed using this NLP pipeline, including _ notes from OEF/OIF/OND Veterans.
An alternative method for large scale text mining was employed to extract information about adverse childhood experiences and suicide gestures. This method involved the use of the machine learning system called the Automated Retrieval Console to classify text snippets that contained key words related to target concepts such as "childhood physical abuse". Recall and precision ranged from 0.64 to 0.90 for different categories of adverse childhood experiences. A total of 44 million documents were processed using this method.
3) Epidemiological analyses (related to specific aim 3)
Diagnoses that are considered to be examples of medically unexplained syndromes include fibromyalgia, chronic fatigue syndrome, and irritable bowel syndrome. Because symptom constellations experienced by deployed Veterans commonly overlap, the term "chronic multi-symptom illness" (CMI) came into use in studies that tracked the prevalence and incidence of unexplained symptoms in Gulf War Veterans. We conducted an analysis of CMI in female Veterans deployed to Iraq and Afghanistan during Operation Enduring Freedom/Operation Iraqi Freedom/Operation New Dawn (OEF/OIF/OND).
The prevalence of at least one of the three CMI diagnoses described above (fibromyalgia, chronic fatigue syndrome, and irritable bowel syndrome) was 2.1-fold higher among female Veterans deployed during OEF/OIF/OND than among all female Veterans accessing VHA care (8.2% versus 3.9%). The fibromyalgia diagnosis had the highest prevalence (5.1%), followed by irritable bowel syndrome (3.5%) and chronic fatigue syndrome (0.4%). Post-traumatic stress disorder (PTSD) was diagnosed in 43% of women Veterans with a CMI diagnosis and in 19.5% of women Veterans without a CMI diagnosis.
Utilization of primary care, mental health, and rheumatology sub-specialty care was examined in more detail in female Veterans who carried a fibromyalgia diagnosis. The index diagnosis was most frequently made in the primary care setting. However, most female Veterans diagnosed with fibromyalgia received combined care, that is, they were also followed in mental health or rheumatology clinics or both. Receipt of combined care was associated with increased prescribing of both opioid and non-opioid pain medications.
We performed an epidemiological analysis of the relationship between adverse childhood experiences and mental health syndromes, including suicide gestures, in Gulf War Veterans. The number of documented adverse childhood experiences was found to correlate strongly with suicide gestures, depression, anxiety, PTSD, psychosis, personality disorder, and bipolar disorder. Weaker correlations were also observed between adverse childhood experiences and the CMI diagnoses of myalgia and irritable bowel syndrome. These associations with mental disorder and CMI diagnoses remained statistically significant after accounting for potential sources of information bias.
By examining symptoms and symptom clusters of OEF/OIF Veterans after deployment, VA will be able to continually assess the health status and health care utilization of these Veterans. Specifically, VA will be able to measure the prevalence of medically unexplained syndromes (MUS), identify symptoms that are related to combat exposures, and identify co-morbid conditions. With this information, VA can provide more comprehensive care to the patient rather than reacting to and treating individual symptoms.