
VA Health Systems Research


SDR 18-004 – HSR Study

Efficient electronic phenotyping using APHRODITE in the Million Veteran Program
Jennifer S Lee, MD PhD MA
VA Palo Alto Health Care System, Palo Alto, CA
Themistocles Assimes, MD PhD MS
VA Palo Alto Health Care System, Palo Alto, CA
Funding Period: August 2019 - July 2021


The Million Veteran Program (MVP) is currently the largest biobank study in the world. The resource provides an unprecedented opportunity to identify the genetic causes of a variety of human diseases that disproportionately affect our veterans, including diseases of the neurological, cardiovascular, pulmonary, gastrointestinal, endocrine, and musculoskeletal systems. Fast-paced technological progress over the last 10 years now allows us to reliably and densely profile individuals across their entire genome. Such data have already been generated and linked to a wide spectrum of human diseases and physiologic traits. However, many more links remain to be made. These links will give the scientific community additional clues to the root causes of many life-threatening diseases, as well as valuable insights into how to develop new drugs to treat or prevent them.
The current challenge in making these additional discoveries is no longer the generation of high-quality genetic data in large numbers, but rather the organization and querying of the very large and complex electronic health records (EHR) that these large biobank studies rely on. Until now, much effort and time have been spent painstakingly developing and validating rules-based definitions to identify individuals with a specific disease, syndrome, or state across a variety of EHR platforms. However, the recent mapping of the VA Corporate Data Warehouse to the Observational Medical Outcomes Partnership common data model (OMOP-CDM) gives us the opportunity to apply new “electronic phenotyping” tools that can identify individuals with a specific disease, syndrome, or state much more efficiently than rules-based methods.
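To make the idea of a rules-based definition concrete, the following is a minimal sketch rather than any definition used in this project. It assumes toy rows shaped like the OMOP-CDM condition_occurrence table (person_id, condition_concept_id, condition_start_date). The concept IDs, the "two distinct dates" rule, and the function name rules_based_cases are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical concept IDs standing in for one disease's code set (not real OMOP concepts).
QUALIFYING_CONCEPTS = {1001, 1002}

# Toy rows shaped like OMOP condition_occurrence:
# (person_id, condition_concept_id, condition_start_date)
condition_occurrence = [
    (1, 1001, date(2015, 3, 1)),
    (1, 1002, date(2016, 7, 9)),
    (2, 1001, date(2017, 1, 5)),
    (3, 2002, date(2018, 2, 2)),
    (3, 2002, date(2018, 5, 2)),
]


def rules_based_cases(rows, concepts, min_dates=2):
    """Return person_ids with a qualifying code on at least min_dates distinct dates."""
    dates_by_person = {}
    for person_id, concept_id, start_date in rows:
        if concept_id in concepts:
            dates_by_person.setdefault(person_id, set()).add(start_date)
    return {p for p, dates in dates_by_person.items() if len(dates) >= min_dates}


cases = rules_based_cases(condition_occurrence, QUALIFYING_CONCEPTS)
print(sorted(cases))
```

In practice each such rule set has to be hand-built and validated per disease, which is the cost the abstract contrasts with automated phenotyping.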
The goal of this proposal is to comprehensively test the ability of one of these new tools, APHRODITE (Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation), to identify established genetic links among MVP participants. APHRODITE was developed at Stanford by one of our co-investigators and uses state-of-the-art machine learning algorithms to identify individuals with a condition in a fraction of the time it takes to identify them through rules-based definitions. The algorithm has shown great promise within the Stanford clinical data warehouse but requires validation in other EHR cohorts. In aim 1, we will compare the accuracy of an APHRODITE classifier with that of a rules-based classifier for at least 5 diseases using gold-standard sets in the VA. In aim 2, we will test whether the APHRODITE classifiers from aim 1 can be applied to MVP participants to replicate established genetic associations. If the automated methods in APHRODITE perform as well as or better than rules-based methods for multiple diseases, they may be leveraged for phenotypes for which no rules-based methods exist, maximizing the efficiency of genetic discovery in MVP and facilitating rapid replication of MVP findings in other EHRs mapped to the OMOP-CDM.
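The comparison described in aim 1 can be illustrated with a minimal sketch. The abstract does not state which accuracy measures the study uses, so sensitivity, specificity, and positive predictive value are assumed here as common choices. The gold-standard labels, the two sets of predicted cases, and the function names are hypothetical toy values, not project data or APHRODITE output.

```python
def confusion_counts(predicted_cases, gold):
    """Count true/false positives and negatives against gold-standard labels.

    predicted_cases: set of person_ids a classifier flags as cases.
    gold: dict mapping person_id -> True (case) or False (non-case).
    """
    tp = fp = tn = fn = 0
    for person_id, is_case in gold.items():
        flagged = person_id in predicted_cases
        if flagged and is_case:
            tp += 1
        elif flagged and not is_case:
            fp += 1
        elif not flagged and is_case:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(predicted_cases, gold):
    """Return sensitivity, specificity, and PPV (None when a denominator is zero)."""
    tp, fp, tn, fn = confusion_counts(predicted_cases, gold)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "ppv": tp / (tp + fp) if (tp + fp) else None,
    }


# Hypothetical gold-standard labels from chart review (toy values).
gold = {1: True, 2: True, 3: False, 4: False, 5: True, 6: False}

# Hypothetical outputs of the two classifiers being compared.
rules_cases = {1, 3}
automated_cases = {1, 2, 3}

for name, cases in [("rules-based", rules_cases), ("automated", automated_cases)]:
    print(name, metrics(cases, gold))
```

Scoring both classifiers against the same gold-standard set keeps the comparison like for like; a real evaluation would also report uncertainty, for example confidence intervals.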

External Links for this Project

NIH Reporter

Grant Number: I01HX002487-01


Journal Articles

  1. Klarin D, Busenkell E, Judy R, Lynch J, Levin M, Haessler J, Aragam K, Chaffin M, Haas M, Lindström S, Assimes TL, Huang J, Min Lee K, Shao Q, Huffman JE, Kabrhel C, Huang Y, Sun YV, Vujkovic M, Saleheen D, Miller DR, Reaven P, DuVall S, Boden WE, Pyarajan S, Reiner AP, Trégouët DA, Henke P, Kooperberg C, Gaziano JM, Concato J, Rader DJ, Cho K, Chang KM, Wilson PWF, Smith NL, O'Donnell CJ, Tsao PS, Kathiresan S, Obi A, Damrauer SM, Natarajan P, INVENT Consortium, Veterans Affairs’ Million Veteran Program. Genome-wide association analysis of venous thromboembolism identifies new risk loci and genetic overlap with arterial vascular disease. Nature Genetics. 2019 Nov 1; 51(11):1574-1579.

DRA: None at this time.
DRE: TRL - Applied/Translational
Keywords: Electronic Health Record
MeSH Terms: None at this time.
