The VA has invested heavily in electronic medical records and has achieved a nationwide system that collects medical information from all patients. Currently, the textual information in the medical records is inaccessible to all but a small number of researchers. To obtain the highest value from this existing system, researchers need to be able to access the textual information they need. Clinical natural language processing (NLP) is an important part of the solution.
The value of NLP has been recognized in the biomedical domain. However, the general consensus in the informatics community is that processing and utilizing textual data remains challenging due to a lack of interoperability and collaboration. Although synergistic development has the promise of advancing the science of NLP and accelerating the pace of NLP tool production, there is no vibrant collaborative environment that attracts the participation of a significant number of clinical NLP developers and researchers. We have created a prototype NLP ecosystem called V3NLP that supports the interoperability and integration of heterogeneous tools into VA research and operational initiatives. However, the environment needed to foster collaboration and attract a critical mass of users remains limited.
In the proposed project, we will study the needs of existing and potential users of the V3NLP ecosystem to increase its utility and ease of adoption and to facilitate collaboration.
1. Collect and analyze the needs of NLP developers, health informatics researchers and health services researchers to inform the design of a collaborative NLP ecosystem that will facilitate development of more accurate methods.
2. Design and implement a clinical NLP ecosystem that fosters collaboration and accelerates research and adoption of accurate and generalizable NLP methods.
3. Conduct a comprehensive sublanguage analysis to guide the creation of adaptable NLP tools and methods based on VA text notes to support text processing and information extraction across multiple VA clinical domains.
First, workshops will be organized to identify a consensus development environment to support a clinical NLP ecosystem and to identify NLP software requirements for health services researchers and for clinicians at the point of care. Workshop attendees will include NLP developers who implement, adapt, and debug NLP methods and systems, NLP researchers who design and evaluate NLP methods and systems, clinical informaticians who select and use NLP methods and systems, and health services researchers and providers who would be end users of NLP. Second, we will take the knowledge gained from the needs analysis workshops and refine and extend the V3NLP system to create a clinical NLP ecosystem. Specifically, we will refine the existing functions in V3NLP, develop a new collaborative environment, and develop benchmarking support. Finally, we will develop a sublanguage model to guide the creation of high-priority NLP functionalities.
Aims 1 and 2 are largely complete. The prototype Ecosystem has been developed and launched in a test account available to informatics researchers. Numerous NLP tools and an extensive bibliography are loaded on the platform. We designed the Ecosystem to facilitate collaboration with stakeholders, validate NLP systems, and disseminate tools, datasets and information. Our frameworks make it feasible to process text in extremely large corpora, a capability referred to as scale-out functionality. We are using the results of our analysis of semi-structured interview data and workshop discussions to inform the next iteration of the Ecosystem.
Aim 3 is in progress. UMLS Concepts and bi-grams were extracted from a corpus of 1,000,000 documents. We are adjusting our analytical methods to manage the extremely large size of the data files. We are collaborating with the other projects in the CREATE through shared personnel and tools shared by other NLP groups. We provided NLP tools to Puget Sound GRECC's clinical implementation project on Early Detection of Dementia. We will provide NLP consultation to the Baltimore GRECC patient safety project.
The ultimate goal of an NLP ecosystem is to produce new and more accurate NLP methods for clinical text. This requires a good understanding of the characteristics of various types of clinical text and the strengths and weaknesses of existing methods. The proposed ecosystem has the potential to advance NLP science and accelerate the pace of NLP tool production. Furthermore, the ecosystem will reduce the cost of re-use and aid in the rapid development of novel NLP techniques.
The research team has been assessing the impact of sublanguage analysis on machine learning. The data we have gathered will be an important resource for the broader NLP community. The current sublanguage analysis tasks, medical concept and word frequency trend analysis, have both immediate and longer-term utility. The data collected to produce the analysis (document and term frequencies) are useful for information retrieval tasks such as search, and for tuning search engines by identifying the contexts in which words and concepts are used. The sublanguage analysis is also intended to raise questions prompted by underlying shifts in word usage, for surveillance, policy, adoption and utilization purposes.
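As an illustration of the document and term frequencies mentioned above, the following is a minimal sketch only, not the project's actual pipeline. It assumes notes arrive as (year, text) pairs and uses a simple lowercase word tokenizer; the function names, the toy notes, and the per-year grouping are all hypothetical choices made for this example, and UMLS concept extraction is not shown.

```python
import re
from collections import Counter, defaultdict

# Assumption: a crude tokenizer that keeps lowercase letters and digits only.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def bigrams(tokens):
    """Return adjacent word pairs as strings (sentence boundaries are ignored)."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def frequencies_by_year(notes):
    """Count term and document frequencies per year.

    notes: iterable of (year, text) pairs, processed one at a time so the
    whole corpus never has to be held in memory.
    Returns (term_freq, doc_freq, n_docs), each keyed by year.
    """
    term_freq = defaultdict(Counter)  # total occurrences of each term
    doc_freq = defaultdict(Counter)   # number of notes containing each term
    n_docs = Counter()                # number of notes per year
    for year, text in notes:
        tokens = tokenize(text)
        terms = tokens + bigrams(tokens)
        term_freq[year].update(terms)
        doc_freq[year].update(set(terms))  # count each term once per note
        n_docs[year] += 1
    return term_freq, doc_freq, n_docs


def doc_rate(doc_freq, n_docs, term):
    """Fraction of notes per year that mention the term, as a simple trend."""
    return {year: doc_freq[year][term] / n_docs[year] for year in sorted(n_docs)}


# Toy, made-up notes used only to exercise the functions.
notes = [
    (2014, "Patient denies chest pain. No chest pain on exertion."),
    (2014, "Follow up for diabetes."),
    (2015, "Chest pain resolved."),
    (2015, "Reports chest pain at night."),
]
term_freq, doc_freq, n_docs = frequencies_by_year(notes)
print(term_freq[2014]["chest pain"])             # 2
print(doc_freq[2014]["chest pain"])              # 1
print(doc_rate(doc_freq, n_docs, "chest pain"))  # {2014: 0.5, 2015: 1.0}
```

Keeping term frequency and document frequency separate lets the same counts serve both trend analysis (how often a term appears over time) and retrieval weighting (how widely a term is spread across notes).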
- Divita G, Carter ME, Tran LT, Redd D, Zeng QT, Duvall S, Samore MH, Gundlapalli AV. v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text. EGEMS (Washington, DC). 2016 Aug 11; 4(3):1228.
- Redd D, Kuang J, Mohanty A, Bray BE, Zeng-Treitler Q. Regular Expression-Based Learning for METs Value Extraction. AMIA Summits on Translational Science proceedings. 2016 Jul 20; 2016:213-20.
- Divita G, Carter M, Redd A, Zeng Q, Gupta K, Trautner B, Samore M, Gundlapalli A. Scaling-up NLP Pipelines to Process Large Corpora of Clinical Notes. Methods of Information in Medicine. 2015 Nov 4; 54(6):548-52.
- Murtaugh MA, Gibson BS, Redd D, Zeng-Treitler Q. Regular expression-based learning to extract bodyweight values from clinical notes. Journal of Biomedical Informatics. 2015 Apr 1; 54:186-90.
- Garvin JH, Zeng Q, Coronado G, Redd D, Kelly N. Preliminary Results from formative Evaluation. Poster session presented at: VA HSR&D Field-Based Partners, Agenda, Implementation, and Dissemination of VA Informatics Research Meeting; 2016 Sep 29; Indianapolis, IN.