VA Health Systems Research

CRE 12-315 – HSR Study

CRE 12-315
A VHA NLP Software Ecosystem for Collaborative Development and Integration
Qing Zeng, PhD
Washington DC VA Medical Center, Washington, DC
Funding Period: December 2013 - November 2019
The VA has invested heavily in electronic medical records and has achieved a nationwide system that collects medical information from all patients. Currently, the textual information in the medical records is inaccessible to all but a small number of researchers. To obtain the highest value from this existing system, researchers need to be able to access the textual information they need. Clinical natural language processing (NLP) is an important part of the solution.

The value of NLP has been recognized in the biomedical domain. However, the general consensus in the informatics community is that processing and utilizing textual data remains challenging due to a lack of interoperability and collaboration. Although synergistic development has the promise of advancing the science of NLP and accelerating the pace of NLP tool production, the field lacks a vibrant collaborative environment that attracts the participation of a significant number of clinical NLP developers and researchers. We have created a prototype NLP ecosystem called V3NLP that supports the interoperability and integration of heterogeneous tools into VA research and operational initiatives. However, the environment needed to foster collaboration and a critical mass of users is limited.

In the proposed project, we will study the needs of existing and potential users of the V3NLP ecosystem to increase its utility and ease of adoption and to facilitate collaboration.

1. Collect and analyze the needs of NLP developers, health informatics researchers and health services researchers to inform the design of a collaborative NLP ecosystem that will facilitate development of more accurate methods.
2. Design and implement a clinical NLP ecosystem that fosters collaboration and accelerates research and adoption of accurate and generalizable NLP methods.
3. Conduct a comprehensive sublanguage analysis to guide the creation of adaptable NLP tools and methods based on VA text notes to support text processing and information extraction across multiple VA clinical domains.

First, workshops will be organized to identify a consensus development environment to support a clinical NLP ecosystem and to identify NLP software requirements for health services researchers and for clinicians at the point of care. Workshop attendees will include NLP developers who implement, adapt, and debug NLP methods and systems, NLP researchers who design and evaluate NLP methods and systems, clinical informaticians who select and use NLP methods and systems, and health services researchers and providers who would be end users of NLP. Second, we will take the knowledge gained from the needs analysis workshops and refine and extend the V3NLP system to create a clinical NLP ecosystem. Specifically, we will refine the existing functions in V3NLP, develop a new collaborative environment, and develop benchmarking support. Finally, we will develop a sublanguage model to guide the creation of high-priority NLP functionalities.

Aims 1 and 2 are largely complete. The prototype Ecosystem has been developed and launched in a test account available to informatics researchers. Numerous NLP tools and an extensive bibliography are loaded on the platform. We designed the Ecosystem to facilitate collaboration with stakeholders, validate NLP systems, and disseminate tools, datasets, and information. Our frameworks make it feasible to process text in extremely large corpora, a capability referred to as scale-out functionality. We are using the results of our analysis of semi-structured interview data and workshop discussions to inform the next iteration of the Ecosystem.
Aim 3 is in progress. UMLS concepts and bi-grams were extracted from a corpus of 1,000,000 documents. We are adjusting our analytical methods to manage the extremely large size of the data files. We are collaborating with the other projects in the CREATE through shared personnel and tools shared by other NLP groups. We provided NLP tools to Puget Sound GRECC's clinical implementation project on Early Detection of Dementia. We will provide NLP consultation to the Baltimore GRECC patient safety project.
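
As a rough illustration of the bi-gram extraction mentioned in the Aim 3 findings, the following Python sketch counts word bi-grams over a stream of documents. It is not taken from V3NLP: the tokenizer, the function names, and the sample notes are hypothetical, and mapping text to UMLS concepts is not shown. Documents are consumed one at a time, so only the counts (not the corpus) need to fit in memory.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical tokenizer: lowercase runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text: str) -> List[str]:
    """Split a note into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())

def bigrams(tokens: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield adjacent token pairs within a single document."""
    return zip(tokens, tokens[1:])

def count_bigrams(documents: Iterable[str]) -> Counter:
    """Accumulate bi-gram counts while streaming over the documents."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(bigrams(tokenize(doc)))
    return counts

# Made-up sample notes, for illustration only.
sample_notes = [
    "Patient denies chest pain.",
    "Chest pain resolved after rest.",
]
bigram_counts = count_bigrams(sample_notes)
```

Because bi-grams are formed per document, no pair spans a document boundary. For a corpus on the scale described above, the counts from separate batches could be merged by adding the `Counter` objects together.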

The ultimate goal of an NLP ecosystem is to produce new and more accurate NLP methods for clinical text. This requires a good understanding of the characteristics of various types of clinical text and the strengths and weaknesses of existing methods. The proposed ecosystem has the potential to advance NLP science and accelerate the pace of NLP tool production. Furthermore, the ecosystem will reduce the cost of re-use and aid in the rapid development of novel NLP techniques.
The research team has been assessing the impact of sublanguage analysis for machine learning. The data we have gathered will be an important resource for the broader NLP community. The current sublanguage analysis tasks, medical concept and word frequency trend analysis, have both immediate and longer-term utility. The data collected to produce the analysis (document and term frequencies) are useful for information retrieval tasks such as search, and for tuning search engines by identifying the contexts in which words and concepts are used. The sublanguage analysis is intended to raise questions prompted by underlying shifts in word usage, for surveillance, policy, adoption, and utilization purposes.
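
To make the terms concrete, the Python sketch below computes term frequencies, document frequencies, and a simple per-year relative frequency for one word. It is only an assumption about what such an analysis could look like, not the project's method: the whitespace tokenization, the function names, and the dated sample notes are all hypothetical.

```python
from collections import Counter, defaultdict

def term_and_document_frequencies(documents):
    """Return (tf, df): total occurrences and number of documents per term."""
    tf = Counter()
    df = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        tf.update(tokens)        # every occurrence counts
        df.update(set(tokens))   # each document counts at most once per term
    return tf, df

def yearly_relative_frequency(dated_documents, term):
    """Return {year: occurrences of term / total tokens in that year}."""
    term_counts = defaultdict(int)
    total_counts = defaultdict(int)
    for year, doc in dated_documents:
        tokens = doc.lower().split()
        term_counts[year] += tokens.count(term)
        total_counts[year] += len(tokens)
    return {
        year: term_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0
    }

# Made-up (year, note) pairs, for illustration only.
sample = [
    (2010, "pain pain controlled"),
    (2010, "no pain reported"),
    (2011, "telehealth visit completed"),
    (2011, "telehealth follow up scheduled"),
]
tf, df = term_and_document_frequencies(doc for _, doc in sample)
trend = yearly_relative_frequency(sample, "telehealth")
```

In this toy data the word "telehealth" is absent in 2010 and present in 2011, which is the kind of shift in word usage that a trend analysis would surface for further review.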

External Links for this Project

NIH Reporter

Grant Number: I01HX001145-01


None at this time.

DRA: Health Systems
DRE: Diagnosis, Technology Development and Assessment
Keywords: Data Visualization, Decision Support, Healthcare Algorithms, Information Management, Natural Language Processing, Personal Health Record, Qualitative Methods, Research Tools, Technology Development
MeSH Terms: none
