VA Health Systems Research

HSR&D Citation Abstract

Taming Big Data: An Information Extraction Strategy for Large Clinical Text Corpora.

Gundlapalli AV, Divita G, Carter ME, Redd A, Samore MH, Gupta K, Trautner B. Taming Big Data: An Information Extraction Strategy for Large Clinical Text Corpora. Studies in health technology and informatics. 2015 Jan 1; 213:175-8.

Concepts of interest for clinical and research purposes are not uniformly distributed in clinical text available in electronic medical records. The purpose of our study was to identify filtering techniques to select 'high yield' documents for increased efficacy and throughput. Using two large corpora of clinical text, we demonstrate the identification of 'high yield' document sets in two unrelated domains: homelessness and indwelling urinary catheters. For homelessness, the high yield set includes homeless program and social work notes. For urinary catheters, concepts were more prevalent in notes from hospitalized patients; nursing notes accounted for a majority of the high yield set. This filtering will enable customization and refining of information extraction pipelines to facilitate extraction of relevant concepts for clinical decision support and other uses.
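The filtering idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how high yield sets were computed, so the keyword match, the per-note-type hit rate, the 0.5 threshold, the function names, and the toy notes below are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_note_types(documents, keywords):
    """Rank note types by the fraction of their documents mentioning any keyword.

    `documents` is an iterable of (note_type, text) pairs. Plain substring
    matching stands in for a real concept extractor (an assumption).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    lowered = [k.lower() for k in keywords]
    for note_type, text in documents:
        totals[note_type] += 1
        body = text.lower()
        if any(k in body for k in lowered):
            hits[note_type] += 1
    rates = {t: hits[t] / totals[t] for t in totals}
    # Highest hit rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def high_yield_types(documents, keywords, min_rate=0.5):
    """Keep note types whose hit rate meets a hypothetical threshold."""
    return [t for t, rate in rank_note_types(documents, keywords) if rate >= min_rate]


# Made-up toy notes, not data from the study.
sample = [
    ("nursing note", "Foley catheter in place, draining clear urine."),
    ("nursing note", "Indwelling urinary catheter removed this morning."),
    ("nursing note", "Patient ambulating without assistance."),
    ("radiology report", "Chest x-ray shows no acute disease."),
    ("radiology report", "No fracture identified."),
]

if __name__ == "__main__":
    print(high_yield_types(sample, ["catheter", "foley"]))
```

In this sketch, only documents from the selected note types would be passed to the downstream extraction step. A real system would replace the substring match with a proper concept extractor and tune the threshold on annotated data.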