2011 HSR&D National Meeting Abstract
2005 — An Introductory Look at Statistical Text Mining for Health Services Researchers
McCart JA (James A. Haley Veterans Hospital), Jarman J (James A. Haley Veterans Hospital), Finch DK (James A. Haley Veterans Hospital), Luther SL (James A. Haley Veterans Hospital)
To facilitate veteran care, VistA maintains a massive repository of patient-related data, including over 1.3 billion textual documents (e.g., progress notes and discharge summaries). This rich source of clinical data could substantially improve the ability to identify and manage adverse events and other health problems. This workshop will explore statistical text mining (STM), a technique that derives statistically relevant patterns from these textual data for use in surveillance systems, identification of under-coded or uncoded conditions, and similar applications. In particular, the workshop will provide an overview of (1) what STM is; (2) the relationship between STM and natural language processing (NLP); (3) how STM could be used in the VA; (4) the process of going from textual notes to a trained model that can be used for classification tasks; and (5) which software applications are available to researchers interested in performing STM.
The workshop will include a demonstration that walks through the STM process using an open-source text mining application. The steps covered include (1) term-by-document matrix generation, (2) weighting schemes, (3) dimensionality reduction, (4) modeling, and (5) analysis of results. In addition, results will be presented from an ongoing STM study of approximately 20,000 outpatient progress notes from four VA medical centers. Participants are encouraged to share questions and interests in STM throughout the workshop.
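The demonstration steps can be sketched in a few dozen lines of Python. This is a minimal, standard-library-only sketch under stated assumptions, not the open-source application used in the workshop: it assumes a toy regex tokenizer, tf-idf as the weighting scheme (one of several options; log-entropy is another common choice), and logistic regression fit by plain gradient descent as the model. Dimensionality reduction (step 3, often a truncated singular value decomposition of the weighted matrix) is omitted to keep the example dependency-free. The four "notes" and their labels are invented for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple (hypothetical) tokenizer: lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def term_by_document(docs):
    """Step (1): raw term counts per document plus a sorted vocabulary."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def idf_weights(vocab, counts):
    """Step (2), global part of tf-idf: log(N / document frequency)."""
    n_docs = len(counts)
    return {t: math.log(n_docs / sum(1 for c in counts if t in c)) for t in vocab}


def vectorize(counts, vocab, idf):
    # Weighted vector = raw term count * idf; terms outside the vocabulary are dropped.
    return [[c.get(t, 0) * idf[t] for t in vocab] for c in counts]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Step (4): logistic regression fit by batch gradient descent (no regularization)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one document: sigmoid(score) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy "notes"; label 1 = mentions a fall, 0 = does not.
train_docs = [
    "Patient reports fall at home, bruised left hip.",
    "Found on floor after fall; no head injury.",
    "Routine follow up, blood pressure stable.",
    "Medication refill, no new complaints.",
]
train_labels = [1, 1, 0, 0]

vocab, counts = term_by_document(train_docs)
idf = idf_weights(vocab, counts)
X = vectorize(counts, vocab, idf)
w, b = train_logistic(X, train_labels)

# Step (5): a minimal look at results (training accuracy and top-weighted terms).
preds = [predict(w, b, x) for x in X]
accuracy = sum(p == t for p, t in zip(preds, train_labels)) / len(train_labels)
top_terms = sorted(zip(w, vocab), reverse=True)[:3]
print("training accuracy:", accuracy)
print("top positive terms:", [t for _, t in top_terms])
```

Training accuracy on four hand-picked notes says little about real performance; with a corpus like the 20,000-note study, results would normally be judged on held-out notes, and the weighted matrix would typically be reduced to far fewer dimensions before modeling.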
This workshop is intended for VA health services researchers interested in using the textual progress notes in VistA for research, quality control, or decision support systems.
Assumed Audience Familiarity with Topic:
Audience members are not expected to be familiar with statistical text mining. However, basic knowledge of statistics (e.g., principal component analysis, factor analysis, logistic regression) will help audience members better understand certain aspects of the STM process.