As outlined in the commentary by Fihn, the advent of “big data” poses important challenges for health services research. While researchers have worked with very large data sets for a long time, what defines “big data” is not only the volume of data but also its variety and velocity. Health systems now have near real-time access to the broad variety of data generated within their organizations, and they have invested in analytic shops that mine these data to advance the goals of better quality, higher satisfaction, lower costs, and greater efficiency. Similarly, VA has made substantial investments in data analytics through the creation of its Corporate Data Warehouse (CDW) and the Office of Informatics and Analytics. What was once the domain of research (building data sets, documenting variation in processes and outcomes, and exploring factors associated with good or bad outcomes) is now part of the core business of a learning health care system.
‘Big R’ Research Contributions
Speakers at a recent Electronic Data Methods Forum (www.edm-forum.org/home) used the concepts of “little r” and “Big R” research to distinguish the operations-focused analysis discussed by Fihn from the hypothesis-testing research funded by VA and NIH. The challenge for those of us who fund or conduct “Big R” research is to adapt to this new world. To remain relevant, we must add value to the more nimble, yet equally sophisticated, analyses being done by our operations partners. That means building on our partners’ work and ensuring that it is translated effectively into improvements in care, outcomes, and policy.
“Big R” research contributions are essential across four broad areas.
1. Advancing basic methods for big data research. Big data and new techniques such as machine learning have greatly increased our ability to predict important clinical outcomes, such as adherence, readmission, or death, and to explore associations. However, because the volume of data is so large and the exploration is not driven by prior hypotheses, drawing causal inferences from observed patterns or associations is even more complicated than in other observational research.1 Large data sets offer no protection against important sources of bias, such as confounding by indication or differences in case mix; in fact, the large sample sizes and multiple comparisons that are part of big data analyses increase the likelihood of finding associations that are statistically significant but spurious. At the same time, big data approaches that draw on more diverse data sources, including data extracted from text notes and self-reported patient data, may greatly improve on traditional administrative data by making it possible to measure and control for previously unmeasured confounders.
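The multiple-comparisons problem described above can be illustrated with a small simulation. The sketch below is hypothetical and is not drawn from any VA analysis: it generates a synthetic outcome (think of readmission) that is, by construction, unrelated to 200 random binary patient features, tests each feature with a two-proportion z-test, and counts how many reach p < 0.05 with and without a Bonferroni correction. The function names, sample size, and rates are illustrative assumptions.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # No variation in the outcome: nothing to test.
    z = (x1 / n1 - x2 / n2) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def screen_null_features(n_patients=5_000, n_features=200, base_rate=0.15,
                         alpha=0.05, seed=1):
    """Count 'significant' features when no feature is truly associated."""
    rng = random.Random(seed)
    # Hypothetical outcome drawn independently of every feature below,
    # so any association that passes the test is spurious by construction.
    outcome = [rng.random() < base_rate for _ in range(n_patients)]
    total_events = sum(outcome)
    p_values = []
    for _ in range(n_features):
        feature = [rng.random() < 0.5 for _ in range(n_patients)]
        n1 = sum(feature)
        x1 = sum(1 for f, y in zip(feature, outcome) if f and y)
        p_values.append(two_proportion_p_value(
            x1, n1, total_events - x1, n_patients - n1))
    nominal = sum(p < alpha for p in p_values)
    # Bonferroni: divide alpha by the number of tests performed.
    corrected = sum(p < alpha / n_features for p in p_values)
    return nominal, corrected


if __name__ == "__main__":
    nominal, corrected = screen_null_features()
    print(f"nominally significant: {nominal}; after Bonferroni: {corrected}")
```

Because every association is spurious by design, roughly 5% of the features (about 10 of 200) are expected to look "significant" at the nominal threshold, while the Bonferroni-corrected count is usually zero. Bonferroni is only one of several possible corrections and is used here for its simplicity.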
Just as epidemiology developed rules for drawing inferences from observational data, methodological research is needed to improve causal inference from big data. A report from the National Research Council, “Frontiers in Massive Data Analysis,” highlights the need to combine mathematical and statistical perspectives to guide inferences.2 Research can help identify ways to improve data quality and determine how to address the “noise” inherent in all large data sets, which arises from missing, erroneous, or non-uniform data.
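The earlier point that large samples do not remove confounding by indication can be made concrete with another hypothetical simulation, not drawn from any VA data. In the sketch below, sicker patients are more likely to be treated, death depends only on severity, and the treatment has no effect at all. With 100,000 synthetic patients, the crude comparison shows treated patients dying more often, a difference that would easily pass a conventional significance test, while standardizing the stratum-specific differences over the measured severity variable brings the estimate back near zero. The rates, names, and the choice of simple standardization are illustrative assumptions; this kind of adjustment only works because severity is recorded, and it cannot correct for confounders that are missing from the data.

```python
import random


def simulate_patients(n_patients=100_000, seed=7):
    """Return (severe, treated, died) tuples for hypothetical patients."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_patients):
        severe = rng.random() < 0.30
        # Confounding by indication: severe patients are far more likely to be treated.
        treated = rng.random() < (0.80 if severe else 0.20)
        # Death depends only on severity; treatment has no effect by construction.
        died = rng.random() < (0.20 if severe else 0.02)
        records.append((severe, treated, died))
    return records


def risk(records, treated, severe=None):
    """Observed death risk among patients with the given treatment (and severity)."""
    rows = [r for r in records
            if r[1] == treated and (severe is None or r[0] == severe)]
    return sum(r[2] for r in rows) / len(rows)


def crude_and_adjusted_risk_difference(records):
    """Crude vs severity-standardized (treated minus untreated) risk difference."""
    crude = risk(records, True) - risk(records, False)
    n = len(records)
    adjusted = 0.0
    for severe in (True, False):
        # Weight each severity stratum by its share of the whole population.
        weight = sum(1 for r in records if r[0] == severe) / n
        adjusted += weight * (risk(records, True, severe)
                              - risk(records, False, severe))
    return crude, adjusted


if __name__ == "__main__":
    crude, adjusted = crude_and_adjusted_risk_difference(simulate_patients())
    print(f"crude difference: {crude:.3f}; severity-adjusted: {adjusted:.3f}")
```

Under these assumed rates, the expected crude difference is about 0.10 (roughly 13% versus 4% mortality), even though the true treatment effect is zero; the adjusted estimate should differ from zero only by sampling noise.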
2. Increasing the clinical utility of big data insights. Big data methods have advanced our ability to predict clinical outcomes and costs for individual patients. Large data sets also make it easier to detect clinically distinct sub-populations within a larger group, such as patients with different adherence patterns for the same drug. But research is needed to determine how to turn predictions into better interventions and better outcomes. VA has successfully rolled out the Care Assessment Need (CAN) score, which can accurately identify patients at high risk for hospitalization or death. But CAN scores alone don’t tell clinicians how to intervene to lower patients’ risk. When VA designed an intensive management program for high-risk patients, now being piloted at five sites, it became evident that high CAN scores reflect a diverse range of patients with distinct needs, from the patient in his or her last months of life who needs palliative care to the homeless patient with mental illness who has trouble managing his diabetes. Research can help refine big data outputs to make them more clinically useful and can then test how to use them most effectively in clinical care.
3. Exploring the value of non-clinical data. One of the exciting frontiers in big data is the potential value of linking individual clinical data with non-clinical data, such as census, geographic, and social network data. Many of the factors that influence the health of our Veterans lie outside the health care system, in the community, and VA will attempt to capture community and other patient information as we pursue a vision of population health. Accessing and linking such data, however, is challenging and potentially expensive, so it is important to determine when doing so adds value.
4. Understanding the human element in big data. Qualitative research is needed to determine how best to present data so that they improve knowledge and decision making. Although such questions fall more within the domain of health informatics than of big data specifically, they are critical if we intend to bring big data to the bedside or exam room. If not applied carefully, the new torrent of real-time data could simply inundate clinicians and patients, and might even worsen rather than improve the decisions they make.
The worlds of “little r” and “big R” research need each other to succeed. As the advent of “big data” makes clear, those of us in “Big R” research have much to learn but also much to contribute to the common goal of improving patient outcomes.
1. Kaplan, R.M., et al. “Big Data and Large Sample Size: A Cautionary Note on the Potential for Bias.” Clinical and Translational Science 2014; 7:342-6.
2. National Research Council. Frontiers in Massive Data Analysis. Washington, D.C.: The National Academies Press; 2013.