HIR 09-005
Consortium for Health Care Informatics Research: Information Extraction
Qing Zeng, PhD
VA Salt Lake City Health Care System, Salt Lake City, UT
Funding Period: September 2010 - March 2015
BACKGROUND/RATIONALE:
The overall goal of this project is to advance methods for automated concept extraction and interpretation from the text fields of electronic clinical records within a collaborative research environment. Achieving this goal will require formal domain ontologies and the integration of existing methodologies with novel approaches. New or hybrid natural language processing (NLP) tools, modules, and algorithms will expand and improve the information extraction methods suitable for implementation within a pipeline-based architecture.
OBJECTIVE(S):
1. Expand the available NLP methods and modules in the areas of contextual understanding, machine learning, and evaluation by adapting existing open-source NLP modules and by developing new algorithms, modules, and platforms to improve accuracy and ease of use.
2. Collaborate with CHIR investigators, other VA investigators, and the wider NLP community to develop a set of standards to support interoperability among NLP modules and systems.
METHODS:
We developed an ontology and a common data model to support interoperability among NLP modules and systems. To improve NLP productivity, the Information Extraction Methods (IEM) team developed individual tools and integrated them in various ways. We used the ontology and common data model to facilitate the integration of individual modules.
FINDINGS/RESULTS:
Aim 1: Identify and prioritize challenges in information extraction methods according to VA clinical relevance and scientific significance.
We examined information extraction (IE) needs in the context of the MRSA and PTSD use cases. We also identified challenges in several other clinical use cases, including homelessness identification and symptom extraction.
Aim 2: Develop new NLP methods and modules to address the challenges identified in Aim 1.
v3NLP (Salt Lake) and ARC (Boston) are novel, user-friendly NLP tools for clinical information extraction.
Voogo, the information retrieval tool developed by the Salt Lake team, allows easy access to and review of text data in conjunction with coded data. San Diego built Onyx, a data extraction tool with speech recognition that builds medical charts in real time as a clinician dictates into a microphone. The ORBIT informatics software registry (Boston) is hosted by the NLM.
Aim 3: Develop a set of standards to support interoperability among NLP modules and systems.
We created "CDA+", a draft set of NLP software standards (Salt Lake), to allow a "plug and play" NLP pipeline. We also created a common data model.
IMPACT:
Lack of standards has kept data extraction tools in the laboratory and out of the clinics. NLP software standards allow modules to be moved from one platform to another without reprogramming. Easy-to-use tool sets will allow researchers and clinicians to configure data extraction tools without a programmer. IEM has developed an inventory of tools that will soon be customized to meet the draft NLP standards. ARC alone has been downloaded by over 50 institutions worldwide. ARC provides data selection tools and a "sandbox" for running and configuring the v3NLP framework with "plug and play" modules. Onyx, a novel data extraction tool, uses speech recognition with machine learning and data mining. The ORBIT central registry for medical informatics tools will soon be hosted by the National Library of Medicine (NLM).
DRA: Health Systems
DRE: Technology Development and Assessment, Research Infrastructure
Keywords: none
MeSH Terms: none