
Health Services Research & Development


2011 HSR&D National Meeting Abstract



2005 — An Introductory Look at Statistical Text Mining for Health Services Researchers

McCart JA (James A. Haley Veterans Hospital), Jarman J (James A. Haley Veterans Hospital), Finch DK (James A. Haley Veterans Hospital), Luther SL (James A. Haley Veterans Hospital)

Workshop Objectives:
To facilitate veteran care, VistA maintains a massive repository of patient-related data, including over 1.3 billion textual documents (e.g., progress notes, discharge summaries). This rich source of clinical data holds tremendous promise for expanding the ability to identify and manage adverse events and other health problems. This workshop will explore a technique called statistical text mining (STM), which derives statistically relevant patterns from these textual data for uses such as surveillance systems and the identification of under-coded or uncoded conditions. In particular, this workshop will provide an overview of (1) what STM is; (2) the relationship between STM and natural language processing (NLP); (3) how STM could be used in the VA; (4) the process of going from textual notes to a trained model that can be used for classification tasks; and (5) what software applications are available to researchers interested in performing STM.

A demonstration will walk through the statistical text mining process using an open-source text mining application. Aspects of the process to be covered include (1) term-by-document matrix generation, (2) weighting schemes, (3) dimensionality reduction, (4) modeling, and (5) analysis of results. In addition, results will be presented from an ongoing STM study of approximately 20,000 outpatient progress notes from four VA medical centers. Participants are encouraged to share questions and interests in STM throughout the workshop.
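
For readers who have not seen these five steps before, the following is a minimal Python sketch of how they fit together. It is not the workshop's application or method, which the abstract does not name. The notes, labels, and function names are invented for illustration. Two deliberate simplifications are assumed: document-frequency filtering stands in for dimensionality reduction (a real study might use singular value decomposition instead), and a nearest-centroid rule with cosine similarity stands in for the model (a real study might use logistic regression).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def term_document_matrix(docs):
    """Step 1: raw counts, one row per term and one column per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]


def select_terms(vocab, matrix, min_df=2):
    """Step 3 (stand-in): keep terms found in at least min_df documents."""
    keep = [i for i, row in enumerate(matrix)
            if sum(1 for x in row if x > 0) >= min_df]
    return [vocab[i] for i in keep], [matrix[i] for i in keep]


def idf_weights(matrix):
    """Step 2: inverse document frequency, log(N / df), for each term row."""
    n_docs = len(matrix[0])
    return [math.log(n_docs / sum(1 for x in row if x > 0)) for row in matrix]


def vectorize(text, vocab, idf):
    """Turn one note into a TF-IDF vector over the retained vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[t] * w for t, w in zip(vocab, idf)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(docs, labels, vocab, idf):
    """Step 4 (stand-in): average the TF-IDF vectors of each class."""
    sums, sizes = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, value in enumerate(vectorize(doc, vocab, idf)):
            acc[i] += value
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}


def classify(text, vocab, idf, centroids):
    """Assign the label whose centroid is most similar to the note."""
    vec = vectorize(text, vocab, idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Invented toy notes and labels (hypothetical, not real patient data).
train_docs = [
    "patient reports a fall at home with hip pain",
    "fall in bathroom last night, hip bruised",
    "routine follow up for blood pressure",
    "medication refill, blood pressure stable",
]
train_labels = ["fall", "fall", "no_fall", "no_fall"]

vocab, matrix = term_document_matrix(train_docs)
vocab, matrix = select_terms(vocab, matrix)
idf = idf_weights(matrix)
centroids = train_centroids(train_docs, train_labels, vocab, idf)

# Step 5: compare predictions on held-out notes with their known labels.
test_docs = ["fall on stairs, hip sore", "blood pressure check"]
test_labels = ["fall", "no_fall"]
predictions = [classify(d, vocab, idf, centroids) for d in test_docs]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(vocab, predictions, accuracy)
```

With only four training notes the result says nothing about real-world performance; the point is only to show the order of the steps and what each one produces.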

Target Audience:
VA health services researchers interested in leveraging textual progress notes found in VistA for research, quality control, or decision support systems.

Assumed Audience Familiarity with Topic:
Audience members are not expected to be familiar with statistical text mining. However, basic knowledge of statistics (e.g., principal components, factor analysis, logistic regression) will help audience members better understand certain aspects of the STM process.
