The VA's rich data resources are beginning to be used to evaluate costs and to improve the efficiency and quality of the care Veterans receive. To date, VA cost data analyses have been limited in scope and validity because of four special distributional features of health care costs. First, a certain proportion of the population can be expected to incur no health care costs during the study period. Second, non-zero cost observations are highly skewed to the right because a small percentage of patients invariably incur extremely high costs relative to most patients. Third, cost data have a hierarchical structure. Fourth, cost data exhibit heteroscedasticity, meaning that the variance of cost observations is not constant. Statistical analyses of cost data that ignore these four characteristics can lead to unreliable inferences. In this project, we developed robust nonparametric and semiparametric cost models that allow for the detailed evaluation of costs within the VA.
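The four features listed above can be illustrated with a small synthetic example. The sketch below is not the project's data or model; every number in it (share of zero-cost patients, log-normal parameters, facility effects, the age covariate) is an assumption chosen only to make each feature visible.

```python
import math
import random
import statistics

def simulate_costs(n_patients=5000, n_facilities=20, seed=0):
    """Generate synthetic (facility, age, cost) records; all parameters are hypothetical."""
    rng = random.Random(seed)
    # Hierarchical structure: each facility shifts log-costs by its own random effect.
    facility_effect = [rng.gauss(0.0, 0.3) for _ in range(n_facilities)]
    records = []
    for _ in range(n_patients):
        facility = rng.randrange(n_facilities)
        age = rng.uniform(40, 90)
        # Zero mass: a share of patients incurs no cost in the period.
        if rng.random() < 0.25:
            records.append((facility, age, 0.0))
            continue
        # Heteroscedasticity: the error scale grows with age.
        sigma = 0.5 + 0.01 * (age - 40)
        log_cost = (7.0 + 0.02 * (age - 40)
                    + facility_effect[facility] + rng.gauss(0.0, sigma))
        # Right skew: exponentiating a symmetric error gives a long right tail.
        records.append((facility, age, math.exp(log_cost)))
    return records

def summarize(records):
    """Report simple diagnostics for zero mass, skewness and heteroscedasticity."""
    costs = [c for _, _, c in records]
    positive = [c for c in costs if c > 0]
    log_young = [math.log(c) for _, a, c in records if c > 0 and a < 65]
    log_old = [math.log(c) for _, a, c in records if c > 0 and a >= 65]
    return {
        "share_zero": 1 - len(positive) / len(costs),
        "mean_positive": statistics.mean(positive),
        "median_positive": statistics.median(positive),
        "sd_log_young": statistics.stdev(log_young),
        "sd_log_old": statistics.stdev(log_old),
    }

summary = summarize(simulate_costs())
print(summary)
```

In this toy setup the mean of the positive costs sits well above the median (right skew), and the spread of log-costs is larger in the older group (non-constant variance).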
The project had two primary objectives:
1. To test and validate statistical properties of new nonparametric transformation regression models, and to use these models to assess costs and identify important sources of variation.
2. To apply the models to estimate disease-attributable costs of common conditions in the VA.
Data sources: The study used the VHA Corporate Data Warehouse (CDW), National Data Systems (NDS), VIReC Medicare Datasets, VA National MS Data Repository, VA Central Cancer Registry (VACCR), and VA Site Tracking (VAST) System.
Population: Veterans in the VHA cohort (known to the VA and eligible or potentially eligible to receive healthcare through the VHA) and in Medicare data during FY2004-2008: Veterans aged 65 and older, Veterans under 65 with certain disabilities, and Veterans of all ages with end-stage renal disease (ESRD). The analysis included 6.9 million records, covering all Veterans who were eligible to receive care within the VA and who incurred any cost between FY2004 and FY2008. No Medicare data from outside the VHA cohort were used.
Main measures: DSS cost, total cost per patient, per month and year.
We developed semiparametric transformation models to deal with the special features of cost distributions. We then conducted extensive simulation studies to assess the finite-sample performance of the proposed methods, comparing them with existing methods in terms of the bias and mean squared error of the estimators and the coverage accuracy of the confidence intervals. We used regression and matching techniques to develop disease-attributable cost models.
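To make the three simulation criteria concrete (bias, mean squared error, and confidence interval coverage), the following is a minimal Monte Carlo sketch. It does not reproduce the project's simulations: the estimator is simply the sample mean of log-normally distributed costs with a normal-approximation 95% interval, and the sample size, number of replications and distribution parameters are all assumptions.

```python
import math
import random
import statistics

def simulate_performance(n=50, n_reps=2000, mu=7.0, sigma=1.0, seed=1):
    """Estimate bias, MSE and 95% CI coverage of the sample mean under skewed costs."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + sigma ** 2 / 2)  # mean of a log-normal distribution
    errors = []
    covered = 0
    for _ in range(n_reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        estimate = statistics.mean(sample)
        # Normal-approximation interval: estimate +/- 1.96 standard errors.
        half_width = 1.96 * statistics.stdev(sample) / math.sqrt(n)
        errors.append(estimate - true_mean)
        if estimate - half_width <= true_mean <= estimate + half_width:
            covered += 1
    bias = statistics.mean(errors)
    mse = statistics.mean([e * e for e in errors])
    return {"bias": bias, "mse": mse, "coverage": covered / n_reps}

performance = simulate_performance()
print(performance)
```

Running the same loop for each competing estimator, over a grid of sample sizes, gives the kind of side-by-side comparison described above.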
We have developed several new and more accurate statistical models for predicting future health care costs.
First, to address skewed health care costs, we proposed, tested, and validated a new nonparametric heteroscedastic linear transformation model (Ding and Zhou, 2013). Our theoretical results show that the model is more general and robust than the linear transformation models discussed in the literature. We also conducted extensive simulation studies to evaluate the finite-sample properties of the newly proposed methods and demonstrated that our proposed method outperforms the existing ones.
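For orientation, a heteroscedastic linear transformation model is commonly written in the following generic form. The notation is an assumption introduced here for illustration and is not taken from this abstract; the cited paper should be consulted for the exact specification.

```latex
H(Y) = X^{\top}\beta + \sigma(X)\,\varepsilon
```

Here $Y$ is a positive cost, $X$ a covariate vector, $H$ an unknown strictly increasing transformation, $\sigma(\cdot)$ an unknown positive function capturing heteroscedasticity, and $\varepsilon$ an error term with an unspecified distribution. Leaving $H$, $\sigma$ and the error distribution unspecified is what makes the model nonparametric in those components.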
Second, we developed a new semiparametric regression model for longitudinal skewed data, which provides a much broader class of models than the existing additive and multiplicative models. We showed theoretically that the newly proposed estimators are consistent and asymptotically normally distributed. In simulation studies, we demonstrated that the proposed semiparametric method is robust with little loss of efficiency.
Third, we explored three approaches for estimating the disease-attributable costs of 31 chronic conditions: the matched method, the typical demographic-adjusted regression approach, and comorbidity-adjusted regression. Our analysis showed that the three approaches yielded substantially different disease-attributable cost estimates for each of the 31 conditions. For example, for patients with diabetes, one of the most common and expensive conditions in the VA, the matching approach estimated the cost attributable to diabetes at $3,851, the demographic-adjusted regression approach at $3,394, and comorbidity-adjusted regression at $1,071. All three methods appear to overestimate the costs attributable to any individual condition: the sum of the individual disease-attributable costs would exceed actual FY2008 expenditure by 252% using the matching approach, 231% using the demographic-adjusted regression approach, and 110% using the comorbidity-adjusted regression approach. We concluded that these methods are not recommended for identifying disease-attributable costs and that investigators should be aware of their tendency to exaggerate the cost attributable to any individual condition.
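One way to see how per-condition estimates can add up to more than total spending is a toy cohort in which patients with two conditions are especially costly. The sketch below uses an unadjusted difference in mean costs, which is simpler than any of the three approaches above, and the four patients and their costs are invented for illustration.

```python
import statistics

# Hypothetical toy cohort: (has_condition_a, has_condition_b, annual_cost)
PATIENTS = [
    (False, False, 1000.0),
    (True, False, 3000.0),
    (False, True, 3000.0),
    (True, True, 15000.0),  # comorbid patient whose cost is counted under both conditions
]

def attributable_total(patients, index):
    """Mean cost difference (with vs. without the condition) times patients with it."""
    with_condition = [p[2] for p in patients if p[index]]
    without_condition = [p[2] for p in patients if not p[index]]
    per_patient = statistics.mean(with_condition) - statistics.mean(without_condition)
    return per_patient * len(with_condition)

total_spend = sum(p[2] for p in PATIENTS)
summed_attributable = attributable_total(PATIENTS, 0) + attributable_total(PATIENTS, 1)
ratio = summed_attributable / total_spend
print(total_spend, summed_attributable, round(ratio, 2))
```

Because the comorbid patient's high cost raises the estimate for both conditions, the summed attributable cost exceeds total spending in this toy cohort.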
Finally, we compared our new nonparametric partially linear single-index transformation model with the existing generalized linear model and the ordinary least squares method using the cohort of Veterans with multiple sclerosis (MS). The MS cohort included 17,681 patients, who were randomly divided into a training set (10,000 patients) and a validation set (7,681 patients). We found that the partially linear single-index transformation model fit the data best; the generalized linear model ranked behind it, and ordinary least squares performed worst.
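The split-sample comparison described above can be sketched as follows. The cohort sizes come from the text; the integer patient identifiers, the random seed, and the use of root mean squared error as the comparison metric are assumptions, since the abstract does not state which fit criterion was used.

```python
import math
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Randomly divide patient IDs into a training set and a validation set."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def root_mean_squared_error(observed, predicted):
    """Out-of-sample prediction error for one fitted model (assumed metric)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical IDs 0..17680 stand in for the MS cohort.
train_ids, validation_ids = split_cohort(range(17681), n_train=10000)
print(len(train_ids), len(validation_ids))
```

Each candidate model would be fitted on the training patients only and then scored on the validation patients, so that the ranking reflects predictive performance rather than in-sample fit.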
Our newly proposed regression models will provide value to VA researchers by allowing multiple complex covariates to be appropriately included in cost models, regardless of their distributional properties. Additionally, our newly proposed regression models have the following statistical advantages over existing methods: (1) For non-zero costs, they allow both the transformation function and the error distribution function to be unknown, and they can handle unknown heteroscedasticity. (2) As two-stage regression models for cost data with zero values, they allow the link function for the probability of zero values to be unknown, in addition to the unknown transformation and error distribution functions. (3) The random effects transformation models for clustered cost data allow the distribution function of the random effects to be unknown, in addition to the unknown transformation and error distribution functions. (4) The nonparametric estimators are asymptotically normal with a convergence rate of the square root of n, the best convergence rate one can expect for a parametric model. In addition, unlike existing tests for heteroscedasticity, our newly proposed nonparametric test will allow us to test for the intrinsic heteroscedasticity of cost data without specifying a parametric form for the mean cost model or for the heteroscedasticity. Consequently, the improved statistical models developed for this study will allow VA investigators to make full use of the information contained in the VA's cost databases.
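The two-stage structure mentioned in point (2) rests on a simple decomposition: expected cost equals the probability of incurring any cost multiplied by the mean cost among those who incur one. The sketch below shows only that decomposition, using plain group averages in place of the semiparametric estimators described above; the group labels and costs are hypothetical.

```python
from collections import defaultdict

def two_part_expected_cost(records):
    """records: iterable of (group, cost) pairs. Returns {group: expected cost}."""
    by_group = defaultdict(list)
    for group, cost in records:
        by_group[group].append(cost)
    expected = {}
    for group, costs in by_group.items():
        positive = [c for c in costs if c > 0]
        # Stage 1: probability of incurring any cost.
        p_any_cost = len(positive) / len(costs)
        # Stage 2: mean cost among patients with non-zero cost.
        mean_positive = sum(positive) / len(positive) if positive else 0.0
        expected[group] = p_any_cost * mean_positive
    return expected

# Hypothetical records: group "a" has two zero-cost patients, group "b" has only zeros.
example = [("a", 0.0), ("a", 100.0), ("a", 300.0), ("a", 0.0), ("b", 0.0), ("b", 0.0)]
print(two_part_expected_cost(example))
```

With plain group averages the product simply reproduces each group's overall mean. The decomposition becomes useful once each stage is given its own regression model, which is where an unknown link function and an unknown transformation would enter.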
- Backhus LM, Farjah F, Varghese TK, Cheng AM, Zhou XH, Wood DE, Kessler L, Zeliadt SB. Appropriateness of imaging for lung cancer staging in a national cohort. Journal of clinical oncology : official journal of the American Society of Clinical Oncology. 2014 Oct 20; 32(30):3428-35.
- Chen B, Zhou XH. Doubly Robust Estimates for Binary Longitudinal Data Analysis with Missing Response and Missing Covariates. Biometrics. 2011 Sep 1.
- Makarov DV, Hu E, Walter R, Braithwaite S, Sherman S, Zhou XA, Gross C, Zeliadt SB. Regional Variation and Time Trends in Prostate Cancer Imaging Utilization among Veterans with Incident Disease. Poster session presented at: AcademyHealth Annual Research Meeting; 2014 Jun 10; San Diego, CA.
- Zeliadt SB, Makarov D, Au DH, Backhus LM, Zhou XA. Frequency of unnecessary imaging prior to Choosing Wisely among Veterans diagnosed with low-risk cancer. Paper presented at: AcademyHealth Annual Research Meeting; 2014 Jun 6; San Diego, CA.