VA Health Systems Research

IIR 14-011 – HSR Study

Estimating Risk of Sporadic Colorectal Cancer in Veterans Under Age 50
Thomas F. Imperiale, MD
Richard L. Roudebush VA Medical Center, Indianapolis, IN
Funding Period: June 2016 - March 2020
Screening for colorectal cancer (CRC) is recommended for average-risk persons aged 50 years and older. However, 7-11% of all CRC occurs in persons younger than 50, most of whom have no classic risk factors at the time of diagnosis. These patients are not only younger but often present with more advanced disease and have a less favorable prognosis than older patients. Over the last 20 years, the incidence of CRC has fallen in persons aged 50 and older but has risen steadily in persons under age 50. For these reasons, it is critically important to identify Veterans under age 50 (Veterans being already a high-risk group) who are at high risk for CRC and may be candidates for "early" screening. From a practical perspective, an efficient way to identify these Veterans using electronic medical record (EMR) data would facilitate implementation.

1) Identify risk factors for sporadic (i.e., non-hereditary) CRC in persons < age 50;

2) Derive and validate a prediction model for quantifying absolute and relative risks for CRC;

3) Compare the accuracy of automated data abstraction using natural language processing for identifying and abstracting risk factor information from VA electronic health information to the gold standard of manual electronic medical record review.
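
The comparison in objective 3 can be illustrated with a minimal sketch. It assumes each chart yields one binary flag per risk factor (for example, current tobacco use) from both manual review and the NLP tool, and it treats manual review as the reference. The function name `agreement` and the choice of sensitivity, specificity, and Cohen's kappa as the metrics are assumptions made for illustration; the source does not state which accuracy measures the study used.

```python
def agreement(manual, nlp):
    """Compare NLP-abstracted binary flags with manual review (the reference).

    Hypothetical helper: `manual` and `nlp` are equal-length sequences of
    0/1 flags, one pair per chart, for a single risk factor.
    """
    if len(manual) != len(nlp):
        raise ValueError("both sequences must cover the same charts")
    pairs = list(zip(manual, nlp))
    tp = sum(1 for m, n in pairs if m and n)          # both say present
    fn = sum(1 for m, n in pairs if m and not n)      # NLP missed it
    fp = sum(1 for m, n in pairs if not m and n)      # NLP over-called it
    tn = sum(1 for m, n in pairs if not m and not n)  # both say absent
    total = len(pairs)

    def ratio(num, den):
        # Return None rather than dividing by zero on degenerate input.
        return num / den if den else None

    observed = ratio(tp + tn, total)
    # Agreement expected by chance, from the marginal totals.
    expected = ratio((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn), total * total)
    kappa = None
    if observed is not None and expected != 1:
        kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up flags for eight charts, for demonstration only.
    print(agreement([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

Reporting kappa alongside sensitivity and specificity is one reasonable choice because it corrects raw agreement for chance, which matters when a risk factor is rare in the charts.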

Using the VA Central Cancer Registry, we will identify incident cases of CRC diagnosed between 2008 and 2014. We will verify case eligibility by manual review of CPRS, excluding those with inflammatory bowel disease, a high-risk family history, polyposis syndrome, or hereditary nonpolyposis colon cancer syndrome. Using medical SAS datasets, we will match each final case to 4 controls from the same time period, and we will validate the control group against a second control group with a negative (i.e., no neoplasia) diagnostic colonoscopy. The same exclusions will apply to controls, along with previous colectomy of any extent and for any reason. Cases and controls will be matched by facility. Trained research personnel will manually review the EMR in VistAweb to identify candidate risk factors: lifestyle habits (cigarette and ethanol use, occupation, leisure activity/exercise), family cancer history, BMI, socio-demographic features, certain laboratory test results, prior CRC screening test results, and medication use. Logistic regression will be used to identify factors independently associated with CRC. A prediction model will be derived and internally validated. Age- and gender-specific SEER CRC incidence rates will be used in conjunction with the prediction model to estimate absolute and relative CRC risks (or "colon age"). Depending on the magnitude of the absolute risk and how it compares with SEER population risks, CRC screening with some modality may be considered. From a methodological perspective, we will create a natural language processing tool and use it to perform automated identification and abstraction on the EMRs of cases and controls, comparing its capture of information with that of manual EMR review.
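
The step that combines population incidence rates with the model's relative risk can be sketched as follows. This is an illustration under stated assumptions, not the study's method: it assumes the model yields a relative risk measured against the average person of the same age (odds ratios from a case-control design approximate relative risks only when the disease is rare), it takes absolute risk as the population rate multiplied by that relative risk, and it reads "colon age" as the youngest tabulated age at which the population rate reaches the person's absolute risk. The incidence values are hypothetical placeholders, not SEER data, and the function names are invented for this example.

```python
# HYPOTHETICAL annual CRC incidence per 100,000 persons by age.
# Placeholder values for illustration only; these are NOT SEER rates.
ILLUSTRATIVE_INCIDENCE = {
    30: 5.0,
    35: 8.0,
    40: 15.0,
    45: 30.0,
    50: 60.0,
    55: 75.0,
    60: 100.0,
}


def absolute_risk(age, relative_risk, incidence=ILLUSTRATIVE_INCIDENCE):
    """Annual risk per 100,000: population rate at `age` times relative risk."""
    return incidence[age] * relative_risk


def colon_age(age, relative_risk, incidence=ILLUSTRATIVE_INCIDENCE):
    """Youngest tabulated age whose population rate meets the person's risk.

    Assumed reading of "colon age"; the source does not define it precisely.
    """
    risk = absolute_risk(age, relative_risk, incidence)
    for tabulated_age in sorted(incidence):
        if incidence[tabulated_age] >= risk:
            return tabulated_age
    # Risk exceeds every tabulated rate: report the oldest tabulated age.
    return max(incidence)


if __name__ == "__main__":
    # A 45-year-old with twice the average risk, using the placeholder table.
    print(absolute_risk(45, 2.0), colon_age(45, 2.0))
```

Under this reading, a person under 50 whose colon age is 50 or older would carry the same estimated annual risk as someone already within the recommended screening age range.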

Chart abstractions confirmed the lower exclusion rates of earlier screening. Twenty percent of the original cohort was excluded. Of the included cohort, 65% were 45-49 years old at index and 27% were 40-44. By race and ethnicity, the included cohort was 32% Black, 60% White, and 5% unknown, with 6.39% reporting some Hispanic background. The most frequent presenting symptom was rectal bleeding (46% of the cohort), followed by abdominal pain (38%) and blood in the stool (30%). Hypertension was the most common comorbidity. Roughly 32% were current tobacco users.

Identifying risk factors for sporadic colorectal cancer (CRC) and creating a prediction model for it will help target high-risk persons for early screening. Such targeting may reduce morbidity and mortality from this particularly devastating disease in this vital age group without the need to apply screening broadly to a population in which non-targeted screening is likely to cause more harm than good. A natural language processing tool that accurately performs automated identification and data abstraction will facilitate the conduct of health services research, expediting the completion of research and the implementation of its findings in clinical practice.

External Links for this Project

NIH Reporter

Grant Number: I01HX001650-01A2

Journal Articles

  1. Redd DF, Shao Y, Zeng-Treitler Q, Myers LJ, Barker BC, Nelson SJ, Imperiale TF. Identification of colorectal cancer using structured and free text clinical data. Health Informatics Journal. 2022 Jan 1; 28(4):14604582221134406.
  2. Imperiale TF, Myers LJ, Barker BC, Larson J, Stump TE, Daggy JK. Risk Factors for Early-onset Sporadic Colorectal Cancer in Male Veterans. Cancer Prevention Research (Philadelphia, Pa.). 2023 Sep 1; 16(9):513-522.

DRA: Cancer
DRE: Treatment - Observational, TRL - Applied/Translational
Keywords: Predictive Modeling, Risk Factors, Cancer, Clinical Diagnosis and Screening, Natural Language Processing
MeSH Terms: none
