Colorectal cancer (CRC) is the third most common cancer in veterans, with approximately 3,200 new cases diagnosed in VA facilities each year. The Institute of Medicine and VA leadership consider cancer health services research to be a priority. National VA databases have previously been used to conduct cancer health services research; however, these studies have typically used simple methods, such as a single ICD-9 diagnosis code, to identify cancer cases without testing the validity and reliability of these methods. Lack of a valid and reliable method to identify veterans with incident cancers in national VA databases is a significant barrier to conducting meaningful cancer health services research. Another barrier has been the lack of access by VA investigators to the national VA central cancer registry. Consequently, the development of methodology to identify incident cancer cases would be a major advance in VA cancer health services research.
The objective of this study was to develop and validate a clinically informed algorithm that uses VA electronic medical record (EMR) data to identify incident colorectal cancer cases.
To achieve this objective, we obtained existing VISTA EMR data and cancer registry data (1997-2008) from the Indianapolis VA. The case population (N=273) included subjects diagnosed with CRC during 2001-2006. Two control populations were identified: (1) a 5% random sample of "cancer-free" veterans utilizing Indianapolis VA healthcare (N=9,086) and (2) subjects diagnosed with other types of cancer (excluding CRC) during 2001-2006 (N=2,792). The cancer registry data were considered the "gold standard" for cancer case identification. VISTA data were used to develop and test algorithms to identify CRC cases. The overall goal of algorithm development was to determine the set of identifying factors that maximized discrimination between CRC cases and controls. Sensitivity, specificity, and positive predictive value (PPV) were calculated for each potential algorithm.
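The three measures named above can be sketched as follows, treating the registry as the reference standard. This is a minimal illustration, not the study's code: the function name `evaluate`, the `Metrics` container, and the toy patient identifiers are all hypothetical and do not come from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float


def evaluate(flagged: set, registry_cases: set, cohort: set) -> Metrics:
    """Compare algorithm-flagged patients with registry-confirmed cases.

    All three arguments are sets of patient identifiers (hypothetical).
    """
    flagged = flagged & cohort        # ignore IDs outside the study cohort
    cases = registry_cases & cohort
    tp = len(flagged & cases)         # flagged and confirmed by the registry
    fp = len(flagged - cases)         # flagged but not in the registry
    fn = len(cases - flagged)         # registry cases the algorithm missed
    tn = len(cohort - flagged - cases)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return Metrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
    )


# Toy example with made-up identifiers, for illustration only.
cohort = set(range(1, 11))
registry_cases = {1, 2, 3, 4}
flagged = {1, 2, 3, 5}
m = evaluate(flagged, registry_cases, cohort)
```

In this toy cohort the algorithm finds three of the four registry cases and flags one non-case, so sensitivity and PPV are both 0.75.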
More than 97% of CRC cases had an ICD-9 diagnosis code for CRC in the EMR; however, the diagnosis dates in the cancer registry and VISTA data agreed for only 14% of subjects. We examined more than 60 different models to predict CRC in VISTA data. The models involved different combinations of diagnoses, procedures, treatments, laboratory tests, and clinic visits with oncologists. The model with the highest sensitivity (97.1%) was based on a subject having at least one inpatient and/or outpatient ICD-9 diagnosis for CRC in the VISTA data. The highest PPV (92.7%) was achieved for an algorithm that required an inpatient diagnosis in combination with codes for surgical resection, CEA testing, visits with oncologists, chemotherapy, radiation, and/or having received a CRC procedure with biopsy. All models that we examined had specificity >99%.
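The two rules described above can be sketched as simple predicates over a patient record. This is an illustration under stated assumptions, not the study's algorithm: the ICD-9 prefixes (153.x for colon, 154.0 and 154.1 for rectosigmoid junction and rectum) are an assumed code list, the `Patient` fields and evidence labels are hypothetical, and the "and/or" in the high-PPV rule is read here as "at least one supporting item".

```python
from dataclasses import dataclass, field

# Assumed CRC code list; the abstract does not state which ICD-9 codes were used.
CRC_PREFIXES = ("153", "154.0", "154.1")

# Hypothetical labels for the supporting evidence named in the abstract.
SUPPORTING = frozenset({
    "resection", "cea_test", "oncology_visit",
    "chemotherapy", "radiation", "procedure_with_biopsy",
})


@dataclass
class Patient:
    patient_id: int
    inpatient_dx: list = field(default_factory=list)   # ICD-9 code strings
    outpatient_dx: list = field(default_factory=list)  # ICD-9 code strings
    evidence: set = field(default_factory=set)         # labels from SUPPORTING


def is_crc_code(code: str) -> bool:
    return code.startswith(CRC_PREFIXES)


def any_dx_rule(p: Patient) -> bool:
    """High-sensitivity rule: at least one inpatient or outpatient CRC diagnosis."""
    return any(is_crc_code(c) for c in p.inpatient_dx + p.outpatient_dx)


def inpatient_plus_evidence_rule(p: Patient) -> bool:
    """High-PPV rule (assumed reading): inpatient CRC diagnosis plus any supporting item."""
    has_inpatient = any(is_crc_code(c) for c in p.inpatient_dx)
    return has_inpatient and bool(p.evidence & SUPPORTING)


# Made-up records, for illustration only.
a = Patient(1, outpatient_dx=["153.9"])
b = Patient(2, inpatient_dx=["154.1"], evidence={"resection", "cea_test"})
c = Patient(3, inpatient_dx=["185"], evidence={"chemotherapy"})
```

Patient `a` satisfies only the broad rule, `b` satisfies both, and `c` satisfies neither. This mirrors the trade-off reported above: the broad rule captures more cases, while the stricter rule flags fewer patients who are not true cases.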
This pilot project has shown that it is feasible to develop a clinically informed algorithm to identify CRC cases in VA databases when cancer registry data are unavailable. The methodology we developed will be used to expand and improve cancer health services research using existing VA databases. Although such an algorithm is feasible, VA Central Cancer Registry (VACCR) data remain the optimal source for conducting cancer health services research with national VA data.
None at this time.