VA Health Systems Research

HIR 09-007 – HSR Study

HIR 09-007
Consortium of Healthcare Informatics Research: Translational Use Case Projects
Mary K. Goldstein, MD MS
VA Palo Alto Health Care System, Palo Alto, CA
Funding Period: February 2009 - March 2014
The mission of the Consortium for Healthcare Informatics Research (CHIR) has been to improve the health of veterans through foundational and applied informatics research to advance the effective use of unstructured text in the electronic health record.

The CHIR Translational Use Case Projects (TUCPs) grant, one of the overall CHIR projects, aimed to assess the capability for rapid development of natural language processing (NLP) for topics of high clinical and quality importance to the VA. The TUCPs applied information extraction techniques to identify and resolve issues, providing early experience for CHIR in practical matters such as reference standard annotation and use of the secure VINCI data resource. Sequential rounds of TUCPs built on other CHIR work.

Each TUCP developed its own algorithms for text abstraction. Typically, projects mapped key concepts in text to a standardized vocabulary suited to the clinical domain. Lexicons were refined as necessary to include synonyms, abbreviations, and common spellings of key words. The text-abstraction findings were compared with a reference standard annotation: records manually marked by trained annotators, using annotation schemata prepared through field testing, to indicate the text that the text-processing algorithms should identify. These records form an annotated corpus of reports used to test the accuracy and precision of the NLP tools. Several rounds of TUCPs addressed VA clinical/quality high-priority areas and/or extended successful NLP to move closer to wide application to VA data.
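The comparison of NLP output with a reference standard can be illustrated with a minimal Python sketch. It assumes exact-match scoring of annotated spans; the `Span` type, its fields, and the function name are hypothetical and are not taken from any TUCP tool. Individual projects may have used different matching rules (for example, partial overlap).

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One annotated mention (hypothetical structure for illustration)."""
    doc_id: str
    start: int
    end: int
    concept: str


def score_against_reference(predicted: set, reference: set) -> dict:
    """Score NLP output against reference annotations using exact span match."""
    true_pos = len(predicted & reference)   # found by NLP and marked by annotators
    false_pos = len(predicted - reference)  # found by NLP only
    false_neg = len(reference - predicted)  # marked by annotators only

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if reference else 0.0
    # F measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Toy data, invented for this example only.
reference = {
    Span("doc1", 10, 21, "lymph_node"),
    Span("doc2", 5, 16, "lymph_node"),
}
predicted = {
    Span("doc1", 10, 21, "lymph_node"),
    Span("doc2", 30, 41, "lymph_node"),
}
print(score_against_reference(predicted, reference))
```

Treating annotations as sets keeps the scoring logic short; a looser matching rule would replace the set operations with a pairwise overlap check.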

(1) The Lymph Node (LN) project team developed the Automated Retrieval Console (ARC). ARC converts unstructured text to structured data for submission to supervised machine learning algorithms. The algorithm identified lymph nodes examined and lymph nodes positive for cancer with recall of 0.96 for both and precision of 0.94 and 0.95, respectively.
(2) The Ejection Fraction TUCP team developed NLP software to extract the ejection fraction (EF) value from free-text echocardiogram reports to automate measurement reporting. The software output was compared to a reference standard developed through human review. The EF system, entitled "Capture with UIMA of Needed Data using Regular Expressions for EF (CUIMANDREef)," was developed using echocardiography reports from 7 VA medical centers and showed excellent performance metrics. System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9%, a positive predictive value of 95%, and an F measure of 91.9% (Garvin et al. 2012). To assess applicability of the NLP to records from other VA medical centers not included in initial development and to records from different data sources within VistA, we annotated echocardiography reports from a random selection of VA medical centers (details available from PI). Collaborating investigators at VA Salt Lake City built on their NLP work in the Congestive Heart Failure Information Extraction Framework (CHIEF) with a series of adaptations validated using 5-fold cross-validation.
(3) The Chest X-Ray (CXR) TUCP project team developed the Chest X-Ray Device Extractor (CXDE), an NLP system that analyzes chest x-ray reports in two steps using the GATE framework. Extracted terms include lines and the words or phrases that indicate line status. CXDE was evaluated against a human-annotated reference standard using precision and recall metrics. After iterative development, with the addition of new terms, CXDE identified device mentions with recall and precision of 95% and 98%, respectively. We have developed an updated version of the CXR NLP that captures line information from ICU chest x-ray exams at the report level. The output of this NLP is passed to a separate module that aggregates information at the patient-day level. This updated software has many new capabilities, including producing an automated count of central line (CL) days, calculating various patient CL-day statistics, and creating visual patient timelines of line-day presence. We have also evaluated the system on a small set of CT reports and found that the NLP performs well on this new modality, suggesting that the system can be used to extract line information from a wide variety of chest-related radiology exams.
(4) The Contraception-TUCP team, based at New Haven, developed an annotation schema, ontology, and NLP system for capturing terms related to contraceptive use, duration of use and consistency of use over time. The annotation schema was applied to 1,739 text notes for 227 female Veteran patients. The ontology identified 84 (out of 1,739) notes with contraception terms, 52 (of 84) notes that had multiple terms and 7 (of 84) terms negated.
(5) The Falls-TUCP team, based at Tampa VAMC, developed a multi-step process that involves natural language processing, statistical text mining, association rule mining, and contrast sets to create classifiers that can accurately classify progress notes. A dataset of 5,009 EMR clinical progress notes was annotated to indicate the presence or absence of fall-related injuries. An automated classification process was developed using a combination of customized, open source software that creates a classifier composed of the best combined rule sets. The preliminary results demonstrate that the process creates reasonable classifiers. The resulting rule-based classifiers are easily interpretable and can serve as a base for refinements.
(6) Measured Value Assignment for the Prothrombin Time / International Normalized Ratio (INR) project: The INR project team compared methods to retrieve useful INR values from text entered into Health Factors from VA clinical reminder note templates. A Bayes classifier was used to identify the target dataset for training, and algorithms were run across the entire Health Factors dataset. Although all the algorithms were sufficient for identifying INR values, they were less efficient than parallel processing string matching algorithms such as the implemented cached-iFTS in SQL Server 2008. The final algorithm successfully identified non-VA INR values. The INR is not otherwise recorded in existing data elements, and the algorithms provide a critical step toward better quality of care for patients on warfarin.
(7) The hypertension project focuses on extracting information relevant to applicability of performance measures recorded in clinic notes. The team has developed a prototype hypertension NLP system based on an annotation guideline informed by subject matter experts. The team manually annotated 100 reports: 50 for the developers' reference and 50 reserved for future testing of the system. The team is finalizing automated methods for comparing NLP output with human annotator output.
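As an illustration of the regular-expression approach described for the Ejection Fraction project in (2), the following Python sketch pulls EF percentages from free text and applies a document-level EF <40% rule. The pattern, function names, and the choice to use the low end of a reported range are assumptions made for this example; they are not the CUIMANDREef rules, which were built on UIMA and are described in Garvin et al. 2012.

```python
import re

# Hypothetical pattern: an EF mention, a short gap containing no digits,
# then a percentage or a percentage range such as "55-60%".
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection\s+fraction)\b"       # EF mention
    r"[^\d%]{0,30}?"                             # short gap, no digits or '%'
    r"(\d{1,2}(?:\.\d+)?)"                       # value, or low end of a range
    r"(?:\s*(?:-|to)\s*(\d{1,2}(?:\.\d+)?))?"    # optional high end of a range
    r"\s*%",
    re.IGNORECASE,
)


def extract_ef_values(report: str) -> list:
    """Return (low, high) EF percentages for each mention found in the report."""
    values = []
    for match in EF_PATTERN.finditer(report):
        low = float(match.group(1))
        high = float(match.group(2)) if match.group(2) else low
        values.append((low, high))
    return values


def has_ef_below_40(report: str) -> bool:
    """Document-level flag: any mention whose low value is under 40%.

    Using the low end of a range is an assumption of this sketch.
    """
    return any(low < 40 for low, _ in extract_ef_values(report))


# Invented example text, not taken from any VA report.
sample = "The left ventricular ejection fraction is estimated at 35%. Prior LVEF 55-60%."
print(extract_ef_values(sample))
print(has_ef_below_40(sample))
```

A production system would also need to handle qualitative statements (for example, "severely reduced") and negation, which this sketch ignores.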

Overall, the Translational Use Case Projects have had impact in the following significant ways: (1) they have illustrated what can be accomplished in a short time in focused areas, (2) they have developed NLP tools that can be used by VA, (3) they have enhanced knowledge of VA data systems and their use on VINCI, and (4) they have developed tools that work directly with VA data and can be used or revised by others. There are many potential uses for the NLP tools. CXDE can potentially be used by infection preventionists as part of infection control monitoring for central lines. EF results available through automated extraction can potentially be used for quality management purposes.
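For the central line monitoring use mentioned above, the patient-day aggregation described for the updated CXR NLP could look roughly like the Python sketch below. The `ReportFinding` record and the counting rule (a calendar day counts once if any report that day indicates a central line) are assumptions for illustration; the source does not specify how the actual module counts CL days.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class ReportFinding(NamedTuple):
    """Report-level NLP output (hypothetical structure for illustration)."""
    patient_id: str
    exam_date: date
    central_line_present: bool


def central_line_days(findings: list) -> dict:
    """Count distinct calendar days per patient with a central line reported."""
    days_by_patient = defaultdict(set)
    for finding in findings:
        if finding.central_line_present:
            # A set keeps each patient-day once, even with several reports.
            days_by_patient[finding.patient_id].add(finding.exam_date)
    return {pid: len(days) for pid, days in days_by_patient.items()}


# Invented findings for two hypothetical patients.
findings = [
    ReportFinding("A", date(2012, 3, 1), True),
    ReportFinding("A", date(2012, 3, 1), True),   # second report, same day
    ReportFinding("A", date(2012, 3, 2), True),
    ReportFinding("A", date(2012, 3, 3), False),
    ReportFinding("B", date(2012, 3, 1), False),
]
print(central_line_days(findings))
```

Patients with no positive reports are simply absent from the result; a caller that needs explicit zeros could fill them in from the list of patient IDs.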


Journal Articles

  1. D'Avolio LW, Nguyen TM, Goryachev S, Fiore LD. Automated concept-level information extraction to reduce the need for custom software and rules development. Journal of the American Medical Informatics Association : JAMIA. 2011 Sep 1; 18(5):607-13.
  2. Garvin JH, DuVall SL, South BR, Bray BE, Bolton D, Heavirland J, Pickard S, Heidenreich P, Shen S, Weir C, Samore M, Goldstein MK. Automated extraction of ejection fraction for quality measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure. Journal of the American Medical Informatics Association : JAMIA. 2012 Sep 1; 19(5):859-66.
  3. Garvin JH, Elkin PL, Shen S, Brown S, Trusko B, Wang E, Hoke L, Quiaoit Y, Lajoie J, Weiner MG, Graham P, Speroff T. Automated quality measurement in Department of the Veterans Affairs discharge instructions for patients with congestive heart failure. Journal for healthcare quality : official publication of the National Association for Healthcare Quality. 2013 Jul 1; 35(4):16-24.
  4. D'Avolio LW, Nguyen TM, Farwell WR, Chen Y, Fitzmeyer F, Harris OM, Fiore LD. Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (ARC). Journal of the American Medical Informatics Association : JAMIA. 2010 Jul 1; 17(4):375-82.
  5. Rubin D, Wang D, Chambers DA, Chambers JG, South BR, Goldstein MK. Natural language processing for lines and devices in portable chest x-rays. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium. 2010 Nov 13; 2010:692-6.

  1. Brandt CA, Womack JA. Analysis of Contraceptive Use Among Female Veterans at the VA. San Francisco, CA: AMIA Joint Summits on Translational Science; 2012 Mar 1. 21-23 p.
Center Products

  1. Wang D, Rubin DL, Chambers JG, Goldstein MK. Chest X-Ray Device Extractor (CXDE) NLP software - developed to identify medical devices and device statuses from free-form text in chest x-ray reports. [Software]. 2010 Oct 1.
VA Cyberseminars

  1. Goldstein MK. Automated Detection of Lines/Devices from Chest Radiograph Reports: CXR Translational Use Case Project. [Cyberseminar]. 2010 Jun 15.
  2. Goldstein MK, Garvin J, Meystre S. Developing Applied Informatics Information Extraction Tools in VA: CUIMANDREef (Capture with UIMA of Needed Data using Regular Expressions for EF) and CHIEF (Congestive Heart Failure Information Extraction Framework). HSR&D Cyber Seminars on CHIR (Consortium for Healthcare Informatics Research). [Cyberseminar]. 2011 Jun 30.
  3. Goldstein MK. Potential for CHIR in QUERI and HSR&D Projects. [Cyberseminar]. 2010 Apr 1.
  4. Weir CR, Nebeker JR. Timely Topics of Interest: The Orderly and Effective Visit: Impact of the Electronic Health Record on Modes of Cognitive Control. [Cyberseminar]. 2012 Mar 29.
  5. Goldstein MK, Wang DY, Hwang TS. Using Natural Language Processing to Identify Lines and Devices in Portable Chest X-Ray Reports. [Cyberseminar]. 2012 Mar 29.
Conference Presentations

  1. Garvin JH, South B, Bolton DJ, Shen S, Samore MH, DuVall SL. Automated extraction of ejection fraction (EF) for Heart Failure (HF) from VA Echocardiogram reports. Poster session presented at: VA HSR&D National Meeting; 2011 Feb 18; National Harbor, MD.
  2. Rubin DL, Wang D, Chambers DA, Chambers J, South B, Goldstein MK. Extracting Free Text from Electronic Health Record: Natural Language Processing to Identify Lines and Devices in Portable Chest X-Ray Reports. Poster session presented at: VA HSR&D National Meeting; 2011 Feb 17; National Harbor, MD.
  3. Goldstein MK. HSR&D Future Directions II: Medical Informatics. Paper presented at: VA HSR&D Career Development Annual Meeting; 2010 Feb 26; San Francisco, CA.
  4. Wang D, Rubin DL, Hwang TS, Chambers DA, Chambers JG, South BR, Goldstein MK. Identifying Line and Device Insertion Status from Chest X-Ray Reports Using Natural Language Processing (NL). Paper presented at: American Medical Informatics Association Annual Symposium; 2012 Nov 3; Chicago, IL.
  5. Kim Y, Garvin J, Heavirland J, Meystre SM. Improving Heart Failure Information Extraction Domain Adaptation. Paper presented at: International Medical Informatics Association World Congress on Medical and Health Informatics; 2013 Aug 22; Copenhagen, Denmark.
  6. Hope CJ, Garvin JH, Gundlapalli AV. Incomplete and selective Documentation of delirium in the VA Electronic medical Record. Poster session presented at: American Medical Informatics Association Annual Symposium; 2011 Oct 23; Washington, DC.
  7. Hynes D, Young A, Ohl M, Houston T, Goldstein MK. Innovation and Synergy of HIT Approaches Addressing Complex Care with the VA. Presented at: American Medical Informatics Association Annual Symposium; 2013 Nov 16; Washington, DC.
  8. DuVall SL. Large scale clinical text processing and process optimization. Paper presented at: International Medical Informatics Association World Congress on Medical and Health Informatics; 2013 Aug 20; Copenhagen, Denmark.
  9. Wang D, Rubin DL, Chambers DA, Chambers J, South B, Hwang TS, Goldstein MK. Natural Language Processing of Portable Chest X-Ray Reports for Infection Surveillance. Poster session presented at: American Public Health Association Annual Meeting and Exposition; 2012 Oct 29; San Francisco, CA.
  10. D'Avolio LW, South B, Shen S, Garvin JH, Goldstein MK. Reducing Dependency on Manual Chart Review through Automated Information Extraction Methods. Poster session presented at: VA HSR&D National Meeting; 2009 Feb 12; Baltimore, MD.
  11. Kim Y, Garvin JH, Heavirland J, Meystre S. Relatedness Analysis of LVEF Qualitative Assessments and Quantitative Values. Poster session presented at: American Medical Informatics Association Spring Congress; 2013 Mar 20; San Francisco, CA.
  12. Wang D, Chambers J, Chambers DA, Rubin DL, Goldstein MK. Training an NLP system for Chest X-Ray Reports. Poster session presented at: American Medical Informatics Association Annual Symposium; 2010 Nov 10; Washington, DC.
  13. Wang DY, Hwang TS, Rubin D, Chambers J, South BR, Goldstein MK. Using Natural Language Processing for Extracting Information from Portable Chest X-Ray Reports. Presented at: American Geriatrics Society Annual Meeting; 2013 May 3; Grapevine, TX.
  14. Wang D, Rubin DL, Chambers DA, South BR, Hwang TS, Goldstein MK. Using Natural Language Processing to Identify Lines and Devices in Portable Chest X-Ray Reports. Poster session presented at: Bay Area Clinical Research Annual Symposium; 2011 Nov 4; San Francisco, CA.

DRA: Cardiovascular Disease, Health Systems
DRE: Technology Development and Assessment, Treatment - Comparative Effectiveness, Diagnosis
