Health Services Research & Development

HSR&D Citation Abstract

Use of Statistical Text Mining (STM) to Adjust Estimation of Colonoscopy Follow-up Rates for Patients with Positive Fecal Occult Blood Test (FOBT+) Results.

Nugent SM, Nelson DB, Gravely AA, Lillie SE, Partin MR. Use of Statistical Text Mining (STM) to Adjust Estimation of Colonoscopy Follow-up Rates for Patients with Positive Fecal Occult Blood Test (FOBT+) Results. Poster session presented at: VA HSR&D / QUERI National Meeting; 2015 Jul 10; Philadelphia, PA.


Objectives: We used STM to search unstructured text from clinical notes for valid reasons for not receiving a colonoscopy in the VHA within 6 months post FOBT+, namely colonoscopy refusal (CR) or private sector colonoscopy (PSC). This information was used to adjust overall estimates of colonoscopy follow-up rates.
Methods: We identified 74,014 patients who had an FOBT+ result between August 2009 and March 2011. More than 85,000 clinical documents were extracted for the 41.4% of FOBT-appropriate patients who did not receive a colonoscopy within 6 months post FOBT+. Annotation was performed using eHOST software on a corpus of 828 notes from 250 randomly selected patients. Annotators highlighted key words (i.e., terms) in the notes and classified each note as associated with CR, PSC, or neither. Annotated terms were used in STM to develop logistic-regression-based classification algorithms that separately predict CR and PSC, using split-half development (DS) and validation (VS) subsets of the annotated notes. The developed models were used to construct predicted probabilities of CR and PSC for the non-annotated notes. These predicted probabilities were used in sensitivity analyses of our main results, which assessed organizational predictors of colonoscopy follow-up, by reclassifying predicted refusals and PSC from having no follow-up to having appropriate follow-up.
Results: Annotators demonstrated very good agreement in classifying notes as indicating CR (kappa = .898) and PSC (kappa = .834). Model agreement of our CR classification algorithm was 98% for DS and 80% for VS; our PSC algorithm yielded 87% for DS and 75% for VS. Applying the scored logistic regression model to all FOBT+ cases in the sample, we estimated that 8.8% refused colonoscopy while 10.1% received a colonoscopy in the private sector. The sensitivity analysis treating identified CR or PSC as adequate follow-up markedly increased our estimates of overall colonoscopy follow-up at 6 months, from 49% to over 67%.
Conclusions: We successfully employed STM techniques to estimate CR and PSC in our population of FOBT+ patients.
Impact: Receipt of care outside the VA or intention to treat is often only documented in clinical notes. STM provides a useful technique to glean structured information from unstructured text. With the advent of the Veterans Choice Act, receipt of care outside the VA will most likely increase.
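
The reclassification step in the sensitivity analysis can be illustrated with a minimal sketch. This is not the authors' code: the `Patient` record, its field names, the toy data, and the 0.5 probability cutoff are all assumptions made for illustration, since the abstract does not state how predicted probabilities were turned into reclassification decisions.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical per-patient record (field names are illustrative)."""
    had_colonoscopy: bool    # documented VHA colonoscopy within 6 months of FOBT+
    p_refusal: float = 0.0   # assumed model-predicted probability of CR
    p_private: float = 0.0   # assumed model-predicted probability of PSC


def follow_up_rate(patients, threshold=None):
    """Return the share of patients counted as appropriately followed up.

    With threshold=None only documented colonoscopies count. Otherwise a
    patient whose predicted CR or PSC probability reaches the threshold is
    reclassified from "no follow-up" to "appropriate follow-up".
    """
    if not patients:
        raise ValueError("patients must not be empty")
    followed = 0
    for p in patients:
        reclassified = (
            threshold is not None
            and max(p.p_refusal, p.p_private) >= threshold
        )
        if p.had_colonoscopy or reclassified:
            followed += 1
    return followed / len(patients)


# Toy data, invented for illustration only (not study data).
patients = [
    Patient(had_colonoscopy=True),
    Patient(had_colonoscopy=True),
    Patient(had_colonoscopy=False, p_refusal=0.9),
    Patient(had_colonoscopy=False, p_private=0.7),
    Patient(had_colonoscopy=False, p_refusal=0.1, p_private=0.2),
    Patient(had_colonoscopy=False),
]

unadjusted = follow_up_rate(patients)               # 2 of 6 patients
adjusted = follow_up_rate(patients, threshold=0.5)  # 4 of 6 patients
print(f"unadjusted: {unadjusted:.1%}, adjusted: {adjusted:.1%}")
```

Varying `threshold` shows how sensitive the adjusted rate is to the cutoff, which is the purpose of a sensitivity analysis of this kind.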
