The overall goal of this project is to advance methods for automated concept extraction and interpretation from the free-text fields of electronic clinical records within a collaborative research environment. Achieving this goal will require integrating formal domain ontologies and existing methodologies with novel approaches. New or hybrid natural language processing (NLP) tools, modules, and algorithms will expand and improve the information extraction methods suitable for implementation within a pipeline-based architecture.
1. Expand the available natural language processing (NLP) methods and modules in the areas of contextual understanding, machine learning, and evaluation by adapting existing open-source NLP modules and developing new algorithms, modules, and platforms to improve accuracy and ease of use.
2. Collaborate with CHIR investigators, other VA investigators, and the wider NLP community to develop a set of standards to support interoperability among NLP modules and systems.
We developed an ontology and a common data model to support interoperability among NLP modules and systems. To improve NLP productivity, the Information Extraction Method team developed individual tools and integrated them in various ways. We used the ontology and common data model to facilitate the integration of individual modules.
Aim 1: Identify and prioritize challenges in information extraction methods according to VA clinical relevance and scientific significance.
We examined the information extraction (IE) needs in the context of MRSA and PTSD use cases. We also identified challenges in several other clinical use cases including homelessness identification and symptom extraction.
Aim 2: Develop new NLP methods and modules to address the challenges identified in Aim 1.
v3NLP (Salt Lake) and ARC (Boston) are novel, user-friendly NLP tools for clinical information extraction. Voogo, the information retrieval tool developed by the Salt Lake team, allows easy access to and review of text data in conjunction with coded data. San Diego built Onyx, a data extraction tool with speech recognition that builds medical charts in real time as a clinician dictates into a microphone. The ORBIT informatics software registry (Boston) is hosted by the NLM.
Aim 3: Develop a set of standards to support interoperability among NLP modules and systems.
We drafted a set of "CDA+" NLP software standards (Salt Lake) to allow "plug and play" NLP pipelines. We also created a common data model.
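As an illustration only, the idea of "plug and play" modules that share a common data model can be sketched in a few lines of Python. This is not the CDA+ draft standard or the v3NLP API; the `Annotation` and `Document` classes, the two toy modules, and `run_pipeline` are all hypothetical names chosen for this sketch. The assumption is simply that every module reads and writes the same annotation structure, so modules can be reordered or swapped without reprogramming.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """Hypothetical shared annotation record: a labeled character span."""
    start: int
    end: int
    label: str
    source: str  # name of the module that produced the annotation


@dataclass
class Document:
    """Hypothetical shared document record passed between modules."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Every module has the same signature, which is what makes it pluggable.
Module = Callable[[Document], Document]


def sentence_splitter(doc: Document) -> Document:
    """Toy module: mark sentence spans using end-of-sentence punctuation."""
    for m in re.finditer(r"[^.!?]+[.!?]?", doc.text):
        if m.group().strip():
            doc.annotations.append(
                Annotation(m.start(), m.end(), "Sentence", "sentence_splitter")
            )
    return doc


def keyword_tagger(terms: List[str], label: str) -> Module:
    """Toy module factory: tag case-insensitive occurrences of given terms."""
    def run(doc: Document) -> Document:
        for term in terms:
            for m in re.finditer(re.escape(term), doc.text, flags=re.IGNORECASE):
                doc.annotations.append(
                    Annotation(m.start(), m.end(), label, "keyword_tagger")
                )
        return doc
    return run


def run_pipeline(doc: Document, modules: List[Module]) -> Document:
    """Apply each module in order; all of them share the same data model."""
    for module in modules:
        doc = module(doc)
    return doc


if __name__ == "__main__":
    note = Document("Patient reports nightmares. MRSA screen negative.")
    result = run_pipeline(note, [sentence_splitter, keyword_tagger(["MRSA"], "Concept")])
    for ann in result.annotations:
        print(ann.label, repr(result.text[ann.start:ann.end]), ann.source)
```

Because each module only depends on the shared `Document` and `Annotation` structures, a different tagger could replace `keyword_tagger` in the list passed to `run_pipeline` without changes to the other modules.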
Lack of standards has kept data extraction tools in the laboratory and out of the clinics. NLP software standards allow modules to be moved from one platform to another without reprogramming. Easy-to-use tool sets will allow researchers and clinicians to configure data extraction tools without a programmer. IEM has developed an inventory of tools that will soon be customized to meet the draft NLP standards. ARC alone has been downloaded by over 50 institutions worldwide. ARC provides data selection tools and a "sandbox" for running and configuring the v3NLP framework with "plug and play" modules. Onyx, a novel data extraction tool, uses speech recognition with machine learning and data mining. The ORBIT central registry for medical informatics tools will soon be hosted by the National Library of Medicine (NLM).
- Redd D, Frech TM, Murtaugh MA, Rhiannon J, Zeng QT. Informatics can identify systemic sclerosis (SSc) patients at risk for scleroderma renal crisis. Computers in biology and medicine. 2014 Oct 1; 53:203-5.
- Bui DD, Zeng-Treitler Q. Learning regular expressions for clinical text classification. Journal of the American Medical Informatics Association : JAMIA. 2014 Sep 1; 21(5):850-7.
- Hoogenboom WS, Perlis RH, Smoller JW, Zeng-Treitler Q, Gainer VS, Murphy SN, Churchill SE, Kohane IS, Shenton ME, Iosifescu DV. Limbic system white matter microstructure and long-term treatment outcome in major depressive disorder: a diffusion tensor imaging study using legacy data. The world journal of biological psychiatry : the official journal of the World Federation of Societies of Biological Psychiatry. 2014 Feb 1; 15(2):122-34.
- Figueroa RL, Zeng-Treitler Q. Text classification performance: is the sample size the only factor to be considered? Studies in health technology and informatics. 2013 Jan 1; 192:1193.
- Scarton LA, Del Fiol G, Treitler-Zeng Q. Completeness, accuracy, and presentation of information on interactions between prescription drugs and alternative medicines: an internet review. Studies in health technology and informatics. 2013 Jan 1; 192:841-5.
- Figueroa RL, Zeng-Treitler Q, Ngo LH, Goryachev S, Wiechmann EP. Active learning for clinical text classification: is it better than random sampling? Journal of the American Medical Informatics Association : JAMIA. 2012 Sep 1; 19(5):809-16.
- Zeng Q, Nebeker JR. Characterizing Clinical Text and Sublanguage: A Case Study of the VA Clinical Notes. Journal of Health and Medical Informatics. 2011 Dec 26; 10(12):1-9.
- Butler J, Hayden C, Samore MH, DuVall SL, Zeng Q, Gundlapalli AV, Nebeker JR. Qualitative Methods and Text Processing: Complimentarily Connected: oral presentation for workshop. Tools for Exploring and Analyzing Text Data in Health Services Research and Epidemiology. Paper presented at: VA HSR&D / QUERI National Meeting; 2012 Jul 16; Washington, DC.
- Zeng Q, Samore MH, Divita G. Finding Medically Unexplained Symptoms within VA Clinical Documents using v3NLP. Poster session presented at: International Society for Disease Surveillance Annual Conference; 2011 Dec 7; Park City, UT.
- DuVall SL, South B, Shen S, Nebeker JR, Samore MH, Gundlapalli AV. Creating reusable annotated corpora using the clinical document architecture. Paper presented at: Hawaii Annual International Conference on System Sciences; 2011 Jan 5; Koloa, HI.
- Divita G, Zeng Q. A standardization effort to aid interoperability between processing systems. Poster session presented at: International Society for Disease Surveillance Annual Conference; 2010 Dec 1; Park City, UT.
- Divita G, Zeng Q, Meystre S, South B, Shen S, Cornia R, Garvin JH, Nebeker JR, Samore MH. Standardization to aid interoperability between NLP systems. Paper presented at: International Society for Disease Surveillance Annual Conference; 2010 Dec 1; Park City, UT.
- DuVall SL, Ferraro JP. The role of co-reference resolution in outbreak reporting and detection. Paper presented at: Hawaii Annual International Conference on System Sciences; 2010 Jan 5; Koloa, HI.
- DuVall SL, Ferraro JP, Riloff E. The ProMED-mail Coreference Corpus: Outbreak Detection Reports Annotated for Conference Resolution. Paper presented at: International Society for Disease Surveillance Annual Conference; 2009 Dec 3; Miami, FL.
Research Infrastructure, Technology Development and Assessment