
Health Services Research & Development


Consortium for Healthcare Informatics Research (CHIR)

The Consortium for Healthcare Informatics Research (CHIR) develops natural language processing methods and makes accessible the information that is currently stored as free text in the VA electronic health record (EHR). The research conducted by CHIR and supported by the VA Informatics and Computing Infrastructure (VINCI) aims to unlock the information content of EHRs and advance knowledge that improves the care of Veterans.

Mission of CHIR

CHIR advances the effective use of unstructured text and other types of clinical data in the EHR to improve the health of Veterans.

CHIR is Multi-disciplinary

CHIR is a multi-disciplinary group of collaborating investigators who lead multiple, inter-related projects. CHIR's headquarters are in Salt Lake City, Utah. Additional research teams are located at VA medical centers distributed across the U.S., including Portland, Palo Alto, San Diego, Indianapolis, Nashville, Tampa, West Haven, Boston, and Pittsburgh. The academic institutions affiliated with each of these VA hospitals serve as research partners.

Natural Language Processing (NLP)
Natural language processing uses algorithms to automatically structure and extract relevant information from unstructured free text. CHIR investigators test and improve methods to model complex datasets to better identify clinically relevant patterns, such as previously undiscovered relationships between treatments, symptoms, and diseases.

Additional Disciplines and Concentration Areas
Other areas represented by CHIR investigators include:

  • Knowledge representation
  • Machine learning
  • Biostatistics
  • Clinical epidemiology
  • Applied informatics
  • Health services research

Current Work

Applied projects that use NLP in clinical domains to translate scientific advances to the bedside and to demonstrate the value of CHIR's research activities:

  • Methicillin-resistant Staphylococcus aureus (MRSA) infection
  • Post-traumatic stress disorder (PTSD)
  • Translational use case projects

Methodologically oriented research to develop novel methods that overcome current challenges, or to critique existing methods in depth:

  • De-identification
  • Information extraction
  • Inference and modeling

Short-term projects and pilot studies

  • Automated Extraction of Ejection Fraction from VA Free Text Data
  • Extraction of lymph node status from VA pathology reports

Research integration

  • Document quality
  • Annotation
  • Evaluation


There are a number of research tools being developed by CHIR researchers.
  • Online Registry of Biomedical Informatics Tools (ORBIT)
    • The ORBIT Project was created to provide researchers and developers with a single, simple way to register and find open source software resources. ORBIT was formed by and for biomedical informatics researchers, leading to features we hope will be useful for others in the community.
  • Extensible Human Oracle Suite of Tools (eHOST)
    • A prototype annotation system that provides an open-source stand-alone client for manual annotation of clinical texts.
      • Leng J, Shen S, Gundlapalli A, South B. The Extensible Human Oracle Suite of Tools (eHOST) for Annotation of Clinical Narratives. Poster session presented at: American Medical Informatics Association Spring Congress; 2010 May 25; Phoenix, AZ.
  • Chest X-Ray Device Extractor (CXDE)
    • A tool developed at the VA to identify medical devices and device statuses from free-form text in chest x-ray reports.
      • Wang D, Chambers J, Chambers D, Rubin D, Goldstein M. Training an NLP system for Chest X-Ray Reports. Poster session presented at: American Medical Informatics Association Annual Symposium; 2010 November 11-17; Washington, DC.
  • v3NLP
    • A suite of information extraction tools that annotate medically relevant concepts from clinical records.
      • Cornia R, Redd D, Zeng-Treitler Q, Derby L, Nebeker J. VNLP: A Service Oriented Architecture for Clinical NLP. Poster session presented at: American Medical Informatics Association Annual Symposium; 2010 November 11-17; Washington, DC.
  • Automated Retrieval Console (ARC v2.0)*
    • Open-source software designed to improve the processes of information retrieval (e.g., natural language processing, machine learning, and information extraction).
      • D'Avolio LW, Nguyen TM, Farwell WR, Chen Y, Fitzmeyer F, Harris OM, et al. Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (ARC). J Am Med Inform Assoc. 2010 Jul-Aug;17(4):375-82.
  • RapidMiner Components*
    • Components that have been developed and will be made available to the research community including expanded term weighting options; fast attribute normalization; and wrappers around existing programs such as a fast and memory-efficient implementation of Singular Value Decomposition (SVD) and the National Library of Medicine's normalization tool.
  • TrANEMap*
    • A tree-based named entity recognition (NER) engine that can run dramatically faster than other NER systems while retaining comparable effectiveness.
  • Module to extract microbiology laboratory results
    • This module is a cumulative pipeline: each step uses the information added by previous steps to perform a more complex task. It comprises four general tasks: section identification, organism detection, susceptibility detection, and MRSA inference.
      • Jones M, DuVall SL, Spuhl J, Samore MH, Nielson C, Rubin M. Extraction of Methicillin-Resistant Staphylococcus aureus Data from the Nation's Veterans Affairs Medical Centers. BMC Med Informat Decis Making. (Submitted.)
  • Module to extract ejection fraction
    • The EF module searches through echocardiogram reports and extracts information on ejection fraction concepts, quantitative values, qualitative descriptions, and the methods used to measure ejection fraction. It then classifies each document as providing evidence for an ejection fraction of >=40% or <40%, an indicator used as a quality measure.
      • Garvin JH, South BR, Bolton D, Shen S, DuVall S, Bray B, Heidenreich P, Samore M, Goldstein MK. Automated Ejection Fraction (EF) for Heart Failure (HF) from VA Echocardiogram Reports. Abstract presented at: VA HSR&D National Meeting 2011; 2011 February 16-18; National Harbor, MD.
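
To make the EF module's extract-then-classify idea concrete, here is a minimal Python sketch. It is not the CHIR implementation: the regular expression, the function names, the midpoint rule for ranges such as "35-40%", and the use of the lowest value found in a report are all assumptions chosen for illustration. It handles only quantitative values, not the qualitative descriptions or measurement methods the module also extracts.

```python
import re
from typing import List, Optional

EF_THRESHOLD = 40.0  # the >=40% / <40% cut-off named in the module description

# Hypothetical pattern: an EF mention, up to 40 non-digit characters,
# then a percentage or a range such as "35-40%" or "35 to 40%".
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection fraction)\b[^\d%]{0,40}?"
    r"(\d{1,2}(?:\.\d+)?)"
    r"(?:\s*(?:-|to)\s*(\d{1,2}(?:\.\d+)?))?\s*%",
    re.IGNORECASE,
)


def extract_ef_values(report: str) -> List[float]:
    """Return every quantitative EF value found in the report text."""
    values = []
    for low, high in EF_PATTERN.findall(report):
        low_value = float(low)
        # Assumption: represent a range by its midpoint.
        values.append((low_value + float(high)) / 2 if high else low_value)
    return values


def classify_report(report: str) -> Optional[str]:
    """Label a report '>=40%' or '<40%', or None if no EF value is found."""
    values = extract_ef_values(report)
    if not values:
        return None
    # Assumption: the lowest value mentioned decides the document label.
    return ">=40%" if min(values) >= EF_THRESHOLD else "<40%"
```

For example, `classify_report("LVEF is estimated at 35-40%.")` yields `"<40%"` under the midpoint assumption, while a report with no EF mention yields `None`.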

*Tools that are not yet available to all researchers but will be released in the near future.
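
The cumulative four-step design of the microbiology module listed above (section identification, organism detection, susceptibility detection, MRSA inference) can be sketched as follows. This is an illustrative Python outline only: the report layout, the section names `CULTURE RESULT` and `SUSCEPTIBILITY`, the patterns, and all function names are hypothetical, and real microbiology reports vary far more than this handles (negation, for instance, is ignored).

```python
import re
from typing import Dict, Optional


def identify_sections(report: str) -> Dict[str, str]:
    """Step 1: split the report into sections keyed by 'HEADER:' lines."""
    sections: Dict[str, str] = {}
    current = "UNLABELED"
    for line in report.splitlines():
        match = re.match(r"^\s*([A-Z][A-Z ]+):\s*(.*)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = match.group(2)
        else:
            sections[current] = (sections.get(current, "") + " " + line).strip()
    return sections


def detect_organism(sections: Dict[str, str]) -> bool:
    """Step 2: look for S. aureus in the (hypothetical) culture section."""
    text = sections.get("CULTURE RESULT", "")
    pattern = r"\b(?:staphylococcus|staph\.?|s\.)\s+aureus\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def detect_susceptibilities(sections: Dict[str, str]) -> Dict[str, str]:
    """Step 3: map each drug to R, S, or I from pairs like 'Oxacillin R'."""
    text = sections.get("SUSCEPTIBILITY", "")
    pairs = re.findall(r"\b([A-Za-z]+)\s+(R|S|I)\b", text)
    return {drug.lower(): result for drug, result in pairs}


def infer_mrsa(report: str) -> Optional[bool]:
    """Step 4: combine the earlier annotations into an MRSA decision.

    Returns True or False, or None when S. aureus is present but no
    methicillin or oxacillin result is reported.
    """
    sections = identify_sections(report)
    if not detect_organism(sections):
        return False
    results = detect_susceptibilities(sections)
    tested = [results[d] for d in ("methicillin", "oxacillin") if d in results]
    if not tested:
        return None
    return "R" in tested


sample = (
    "SPECIMEN: Wound swab\n"
    "CULTURE RESULT: Staphylococcus aureus, moderate growth\n"
    "SUSCEPTIBILITY: Oxacillin R, Vancomycin S"
)
print(infer_mrsa(sample))  # True
```

Each step consumes only the output of the steps before it, which is what makes the pipeline cumulative: the inference step never reads raw text, only the sections, organism flag, and susceptibility table already produced.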

Collaborative Research

Achieving the objectives of both VINCI and CHIR requires cooperative research by data specialists, computer programmers, medical informaticians, research physicians, and many other medical and information specialists. Together, the two programs bring together a community of researchers that can improve the care of Veterans by integrating surveillance, decision support, and other information technology-based interventions to enhance the effectiveness of patient care.

Contact CHIR

CHIR Principal Investigator
Matthew Samore, MD

CHIR Scientific Program Manager
Jorie Butler, PhD
VA Salt Lake City Healthcare System
500 Foothill Drive
Salt Lake City, UT 84148
(801) 582-1565 ext. 1964

