2017 HSR&D/QUERI National Conference

4103 — Assuring Quality in Transforming the Department of Veterans Affairs Data to the Observational Medical Outcomes Partnership (OMOP) Common Data Model

Lead/Presenter: Stephen Deppen, Resource Center - VINCI
All Authors: Deppen S (VA Tennessee Valley Healthcare System); Cao A (Tennessee Valley Healthcare System); DuVall SL (VA Salt Lake City Healthcare System); Park D (Tennessee Valley Healthcare System); Lynch KE (Salt Lake City Health Care System); Viernes B (Salt Lake City Health Care System); Hanchrow EE (Tennessee Valley Healthcare System); FitzHenry F (Tennessee Valley Healthcare System); Matheny ME (Tennessee Valley Healthcare System)

Objectives:
Quality assurance is critical to ensure that any data transformation maintains source fidelity and integrity, properly characterizes and documents data transformations, and improves usability. VINCI is transforming national VA medical record data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), and transforming data this complex and comprehensive requires especially careful review.

Methods:
Iterative data quality review was implemented during the OMOP transformation, measuring row counts, proportions, and effects of the OMOP-transformed data compared with the source domains within the VA Corporate Data Warehouse (CDW). In addition to dedicated staff performing the review, VINCI fostered the creation of a VA community of OMOP users. The group meets twice a month, reviews issues, and contributes by identifying and further curating subsets of high-priority data.
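A row-count review of this kind can be sketched as a per-domain ratio check plus a duplicate count. The sketch below is illustrative only: the `CoverageCheck` class, the `flag_low_coverage` and `duplication_rate` helpers, the 0.90 default threshold (borrowed from the > 90% coverage figure reported under Results), and the toy counts in the usage lines are all hypothetical and do not describe VINCI's actual tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CoverageCheck:
    """Row-count comparison for one source domain (hypothetical structure)."""
    domain: str
    source_rows: int
    omop_rows: int

    @property
    def coverage(self) -> float:
        # Proportion of source rows represented in the OMOP table.
        return self.omop_rows / self.source_rows if self.source_rows else 0.0


def flag_low_coverage(checks, threshold=0.90):
    """Return the domains whose OMOP/source row-count ratio is below threshold."""
    return [c.domain for c in checks if c.coverage < threshold]


def duplication_rate(rows):
    """Fraction of rows that repeat an earlier row exactly (rows must be hashable)."""
    if not rows:
        return 0.0
    counts = Counter(rows)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(rows)


# Toy counts for illustration only; these are not VA figures.
checks = [
    CoverageCheck("visit_occurrence", source_rows=1000, omop_rows=950),
    CoverageCheck("drug_exposure", source_rows=1000, omop_rows=850),
]
print(flag_low_coverage(checks))  # ['drug_exposure']
print(duplication_rate([(1, "A"), (1, "A"), (2, "B"), (3, "C")]))  # 0.25
```

Running a check like this after each ETL iteration gives reviewers a short list of domains to investigate rather than a full table-by-table comparison.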

Results:
VA CDW data for 22 million patients from 2000 to 2016 were transformed into the OMOP CDM. Volume coverage, measured as OMOP row counts relative to source, was > 90%, including more than 21 million inpatient visits, 2.6 billion outpatient visits, and their concomitant clinical activity. Row duplication from source was < 1%. Row counts recorded at each step of the ETL logic traced each source table through the local transformation to its final OMOP domain. For example, 94.7 million inpatient diagnosis rows in the source translated to 94.5 million rows in the local transformation table. Exclusion rules for incorrect ICD codes and for activity before 2000 dropped 72,979 rows from the OMOP Condition_Occurrence domain table.
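The stage-by-stage tracing and the exclusion rules can be sketched as follows. This is a minimal illustration under stated assumptions: the `StageCount` class and the `trace_losses`, `is_excluded`, and `apply_exclusions` helpers are hypothetical, the ICD validity test is reduced to membership in a toy code set because the abstract does not describe the actual check, and only the two inpatient diagnosis counts (rounded to 0.1 million) come from the results above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StageCount:
    """Row count recorded at one ETL stage (hypothetical structure)."""
    stage: str
    rows: int


def trace_losses(stages):
    """Rows lost and fraction retained between consecutive ETL stages."""
    report = []
    for prev, curr in zip(stages, stages[1:]):
        lost = prev.rows - curr.rows
        retained = curr.rows / prev.rows
        report.append((prev.stage, curr.stage, lost, retained))
    return report


def is_excluded(icd_code, start_date, valid_codes, cutoff=date(2000, 1, 1)):
    """Hypothetical rule: unrecognized ICD code, or activity before the cutoff."""
    return icd_code not in valid_codes or start_date < cutoff


def apply_exclusions(rows, valid_codes):
    """Split (icd_code, start_date) rows into kept rows and a dropped-row count."""
    kept = [r for r in rows if not is_excluded(r[0], r[1], valid_codes)]
    return kept, len(rows) - len(kept)


# Inpatient diagnosis counts as reported in the abstract (rounded).
stages = [
    StageCount("CDW source", 94_700_000),
    StageCount("local transformation", 94_500_000),
]
for src, dst, lost, retained in trace_losses(stages):
    # prints: CDW source -> local transformation: 200,000 rows lost, 99.79% retained
    print(f"{src} -> {dst}: {lost:,} rows lost, {retained:.2%} retained")

# Toy code set and rows for illustration; not the VA vocabulary or data.
valid = {"250.00", "I10"}
rows = [
    ("I10", date(2005, 3, 1)),     # kept
    ("XYZ", date(2010, 1, 1)),     # dropped: unrecognized code
    ("250.00", date(1998, 6, 1)),  # dropped: before 2000
]
kept, dropped = apply_exclusions(rows, valid)
print(len(kept), dropped)  # 1 2
```

Recording a count at every stage, rather than only at the final OMOP table, is what makes it possible to attribute a discrepancy to a specific transformation step or exclusion rule.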

Implications:
Our quality review process ensured that the data transformation produced a curated CDM that maintained integrity with the source data and incorporated data quality best practices published by VA researchers and technical groups. The resulting environment reduced the need for domain-specific knowledge and improved data access. It allows integration with other OMOP data warehouses being developed within and outside the VA and makes community-developed data exploration and analysis tools available.

Impacts:
Through standard methods, collaboration with data stewards and the OMOP community, ongoing review, and user contributions, VINCI supports the successful transformation of healthcare data from a large, clinically diverse population of over 22 million Veterans into the OMOP CDM for use by the vibrant VA research community.