Colonoscopy is widely used for colorectal cancer (CRC) screening; however, there are no data describing its yield in important demographic subgroups. Knowing the yield of colonoscopy for clinically important neoplasia (CIN) and the factors associated with CIN, and having a tool to stratify individual patients by their risk for CIN, would increase the efficiency and effectiveness of screening colonoscopy and of CRC screening in general.
1) Measure and compare the yield of first-time colonoscopy for CIN within pre-specified demographic subgroups and across indications for colonoscopy; 2) Explore associations between demographic and clinical features and risk for CIN; 3) Determine which features stratify risk for CIN, and derive a risk index for CIN; 4) Establish a database and infrastructure for future cohort studies of the yield of subsequent colonoscopy.
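As a minimal sketch of objective 1, the yield within a subgroup can be computed as the proportion of first-time colonoscopies that found CIN, with a Wilson score 95% confidence interval to support comparisons between subgroups. The record layout (the `sex`, `age_band`, and `cin` fields) and the helper names are hypothetical, the Wilson interval is our choice rather than a method stated by the study, and the example records are invented for illustration, not study data.

```python
import math
from collections import defaultdict


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def yield_by_subgroup(records: list[dict], key: str) -> dict:
    """Map each subgroup value to (n, n_with_cin, yield, ci_low, ci_high)."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [colonoscopies, CIN cases]
    for rec in records:
        tally = counts[rec[key]]
        tally[0] += 1
        tally[1] += int(bool(rec["cin"]))
    summary = {}
    for group, (n, k) in counts.items():
        low, high = wilson_interval(k, n)
        summary[group] = (n, k, k / n, low, high)
    return summary


if __name__ == "__main__":
    # Invented records for illustration only; NOT study data.
    records = [
        {"sex": "M", "age_band": "50-59", "cin": True},
        {"sex": "M", "age_band": "60-69", "cin": False},
        {"sex": "F", "age_band": "50-59", "cin": False},
        {"sex": "F", "age_band": "60-69", "cin": True},
        {"sex": "M", "age_band": "60-69", "cin": False},
    ]
    for group, (n, k, y, lo, hi) in sorted(yield_by_subgroup(records, "sex").items()):
        print(f"sex={group}: {k}/{n} with CIN, yield {y:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The same function can be called with `key="age_band"` or an indication field to tabulate yield by age group or by colonoscopy indication.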
We created a remote data extraction tool to retrieve de-identified data from the VA's electronic medical record (EMR), pilot-tested it for accuracy, and used it to retrieve selected EMR data for veterans aged 40 years and older at any of 18 geographically diverse VA Medical Centers (VAMCs) who had a first VA-based colonoscopy between 2002 and 2008 for any indication other than cancer or polyp surveillance. Programs and software for data extraction were developed and pilot-tested through independent, "behind-the-firewall" review of a random sample of EMRs at the Indianapolis VAMC and through remote review of an EMR sample from the other sites. Once the tool's accuracy was judged acceptable, it was used to extract relevant clinical information from each site, including colonoscopy and pathology reports, clinical features (e.g., colonoscopy indication), and candidate risk factors: age, sex, race/ethnicity, physical features (e.g., weight, height, blood pressure), family history of CRC, lifestyle factors (e.g., cigarette smoking, ethanol use), medications, and comorbidities (e.g., diabetes, cholecystectomy, coronary disease). Because local data storage varied across sites, the extraction tool had to be re-developed twice to capture all relevant data. To categorize the colorectal findings, we used natural language processing (NLP) software developed and tested at our Regenstrief Institute. The NLP software determined the location, size, and histology of colorectal lesions from free-text colonoscopy and pathology reports; this process was validated by independent review of a random sample of reports from each site. We will describe and compare the prevalence of CIN within specific demographic subgroups and by colonoscopy indication. We will also attempt to construct a risk index that stratifies an individual's risk for CIN and could be used to tailor screening colonoscopy.
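One common way to turn a multivariable logistic regression into a simple risk index is to express each coefficient as a multiple of a reference coefficient, round to integer points, sum the points for a patient, and apply cut-offs. The sketch below assumes that points-based approach; the study does not state its method, and the factor names, coefficients, intercept, and tier cut-offs are hypothetical placeholders, not estimates from VA data.

```python
import math

# HYPOTHETICAL log-odds coefficients for illustration only; NOT study estimates.
# In practice these would come from a fitted logistic regression of CIN on risk factors.
HYPOTHETICAL_BETAS = {
    "age_60_69": 0.50,
    "age_70_plus": 0.80,
    "male": 0.55,
    "current_smoker": 0.35,
    "family_history_crc": 0.25,
}
HYPOTHETICAL_INTERCEPT = -2.6


def to_points(betas: dict[str, float], reference: float) -> dict[str, int]:
    """Express each coefficient as a multiple of `reference`, rounded to an integer."""
    return {name: round(beta / reference) for name, beta in betas.items()}


def risk_score(patient: dict[str, bool], points: dict[str, int]) -> int:
    """Sum the points for every risk factor the patient has."""
    return sum(pts for name, pts in points.items() if patient.get(name, False))


def model_probability(patient: dict[str, bool], betas: dict[str, float], intercept: float) -> float:
    """Predicted probability of CIN from the underlying (hypothetical) logistic model."""
    logit = intercept + sum(b for name, b in betas.items() if patient.get(name, False))
    return 1.0 / (1.0 + math.exp(-logit))


def risk_tier(score: int, low_max: int = 1, high_min: int = 5) -> str:
    """Map a point total to a tier using HYPOTHETICAL cut-offs."""
    if score <= low_max:
        return "low"
    if score >= high_min:
        return "high"
    return "intermediate"


if __name__ == "__main__":
    points = to_points(HYPOTHETICAL_BETAS, reference=0.25)
    patient = {"age_70_plus": True, "male": True, "current_smoker": True}
    score = risk_score(patient, points)
    prob = model_probability(patient, HYPOTHETICAL_BETAS, HYPOTHETICAL_INTERCEPT)
    print(points, score, risk_tier(score), round(prob, 3))
```

Integer points trade a little precision for ease of bedside use; the continuous `model_probability` is kept alongside so the two can be compared when choosing cut-offs.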
Study conduct has been challenging because IT regulatory regimes vary across VISNs and regions. A total of 6 sites have been excluded: at 3 sites the data are in a format that cannot be used in analysis, and at the other 3 sites IT personnel would not or could not install the software needed for remote data extraction. We are currently completing the NLP of the data we collected. We designed and validated NLP software on these data to identify, extract, and classify the most advanced colorectal finding from each colonoscopy. An analysis of NLP performance on the Indianapolis data was accepted for publication (Imler TD, et al., Clin Gastroenterol Hepatol 2013;11(6):689-94). We are in the early stages of statistical analysis, and results are expected in the near term.
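Selecting the most advanced finding can be sketched once lesion attributes have been extracted. This sketch assumes the NLP output has already been reduced to per-lesion size, histology, and a high-grade dysplasia flag, and it applies a commonly used hierarchy: cancer, then advanced adenoma (10 mm or larger, villous component, or high-grade dysplasia), then non-advanced adenoma, then any non-adenomatous finding. The histology labels, class names, and the treatment of CIN as advanced adenoma or cancer are assumptions, not the logic of the Regenstrief NLP software.

```python
from dataclasses import dataclass
from enum import IntEnum


class Finding(IntEnum):
    # Assumed ordering: a larger value means a more advanced finding.
    NONE = 0
    NON_ADENOMATOUS = 1  # everything else (simplification: serrated lesions not separated)
    NON_ADVANCED_ADENOMA = 2
    ADVANCED_ADENOMA = 3
    CANCER = 4


@dataclass
class Lesion:
    size_mm: float
    histology: str  # hypothetical labels: "tubular", "tubulovillous", "villous", "adenocarcinoma", ...
    high_grade_dysplasia: bool = False


ADENOMA_TYPES = {"tubular", "tubulovillous", "villous"}


def classify(lesion: Lesion) -> Finding:
    """Assign one lesion to a category in the assumed hierarchy."""
    if lesion.histology == "adenocarcinoma":
        return Finding.CANCER
    if lesion.histology in ADENOMA_TYPES:
        villous = lesion.histology != "tubular"
        if lesion.size_mm >= 10 or villous or lesion.high_grade_dysplasia:
            return Finding.ADVANCED_ADENOMA
        return Finding.NON_ADVANCED_ADENOMA
    return Finding.NON_ADENOMATOUS


def most_advanced(lesions: list[Lesion]) -> Finding:
    """Most advanced finding across all lesions from one colonoscopy."""
    return max((classify(lesion) for lesion in lesions), default=Finding.NONE)


def has_cin(lesions: list[Lesion]) -> bool:
    """ASSUMED definition: CIN = advanced adenoma or cancer."""
    return most_advanced(lesions) >= Finding.ADVANCED_ADENOMA
```

For example, a colonoscopy with a 4 mm tubular adenoma and a 12 mm tubular adenoma would be classified by its 12 mm lesion as an advanced adenoma.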
This proposal provides new knowledge by quantifying the yield of colonoscopy for CIN by age, sex, and race. We expect to identify one or more factors associated with CIN that may help stratify risk for CIN. This research should improve veterans' health care by providing a scientific basis for tailoring CRC screening. Such tailoring would allow providers to target high-risk veterans for screening colonoscopy and to identify low-risk veterans, for whom CRC screening may be performed with less invasive methods (e.g., immunochemical fecal occult blood testing) or deferred until their risk increases. Using risk for CIN to tailor CRC screening should make screening more efficient and cost-effective.
- Imler TD, Morea J, Imperiale TF. Clinical decision support with natural language processing facilitates determination of colonoscopy surveillance intervals. Clinical Gastroenterology and Hepatology. 2014 Jul 1; 12(7):1130-6.
- Imler TD, Morea J, Kahi C, Imperiale TF. Natural language processing accurately categorizes findings from colonoscopy and pathology reports. Clinical Gastroenterology and Hepatology. 2013 Jun 1; 11(6):689-94.
- Imler TD, Morea J, Kahi CJ, Xu H, Calley C, Imperiale TF. Age, Gender, Insurance, and Race: Risk Factors for Early Repeat Colonoscopy. [Abstract]. The American Journal of Gastroenterology. 2014 Oct 19; 1(1):P421.
- Imler TD, Morea J, Kahi C, Xu H, Calley C, Imperiale TF. Age, Gender, and Race: Risk Factors for Early Repeat Colonoscopy. Poster session presented at: American College of Gastroenterology Annual Meeting; 2014 Oct 19; Philadelphia, PA.
- Imler TD, Morea J, El Hajj II, Klochan CM, Sagi S, Umar N, Imperiale TF. Clinical Decision Support Processing for Colonoscopy Surveillance Intervals. Poster session presented at: Digestive Disease Week Annual Meeting; 2013 May 19; Orlando, FL.
Health Systems, Cancer
Diagnosis, Prevention, Research Infrastructure
Cancer, Clinical Diagnosis and Screening, Guideline Development and Implementation, Natural Language Processing, Outcomes, Predictive Modeling, Risk Adjustment, Screening