Change initiatives, such as implementation of new evidence-based programs, frequently have poor success rates, and sustained implementation of new programs remains a challenge in health services. Some research suggests that baseline organizational readiness to change, such as staff attitudes about change, leadership support, and slack resources, may play a major role in successful organizational change initiatives. However, much of this research has been retrospective and relies on measures with limited validation. Recent systematic reviews find that few published measures of organizational readiness to change have undergone rigorous validation.
Researchers in the VA Ischemic Heart Disease Quality Enhancement Research Initiative developed a survey, the Organizational Readiness to Change Assessment (ORCA), designed to be fielded among clinicians and staff implementing an evidence-based clinical practice. The survey is anchored by an opening "evidence statement" that links the evidence-based practice to the outcome it is meant to achieve. The respondent then answers 77 items organized into 3 scales, corresponding to elements of the Promoting Action on Research Implementation in Health Services (PARIHS) framework: (1) the strength and nature of the evidence for the practice change, as perceived by stakeholders (Evidence); (2) the quality of the organizational context that supports the practice change (Context); and (3) the organization's capacity for internal facilitation of the practice change (Facilitation). The Evidence scale comprises 4 subscales, Context comprises 6, and Facilitation comprises 9. Each subscale is measured with 3-6 survey items. The items consist of statements with which respondents express agreement or disagreement, scored on a 5-point Likert scale (1=strongly disagree; 5=strongly agree).
The ORCA is scored at the level of the operational unit making the change (a collective unit, such as a site, a department, or a lab). Subscales are averaged within respondent to form scales, and the scales are then averaged among respondents within each unit to produce aggregate unit-level scores for each scale.
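This two-step aggregation can be sketched in a few lines of code. The subscale names and scores below are invented for illustration; the real ORCA subscales and items differ:

```python
from statistics import mean

# Each respondent's answers, grouped by (hypothetical) subscale:
# {subscale_name: [item scores on the 1-5 Likert scale]}
respondents = [
    {"research_evidence": [4, 5, 4], "clinical_experience": [3, 4, 4]},
    {"research_evidence": [2, 3, 3], "clinical_experience": [4, 4, 5]},
]

def scale_score(respondent):
    # Average items within each subscale, then average the
    # subscale scores to form the respondent's scale score
    return mean(mean(items) for items in respondent.values())

# Unit-level (e.g., site-level) score: average the scale
# score across all respondents at the unit
unit_score = mean(scale_score(r) for r in respondents)
```

The per-respondent averaging step means each respondent contributes equally to the unit score regardless of how many items they answered within a subscale.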
Only the Evidence and Context scales are fielded at baseline, i.e., once plans for the evidence-based practice change are sufficiently developed that there is a defined objective. The full ORCA, including the Facilitation scale, may be fielded after implementation activities have commenced. The ORCA can be used as a measure of implementation effectiveness by fielding the survey to the same respondents at multiple points in time and assessing changes in the ORCA scales.
The ORCA has previously been validated in terms of internal-consistency reliability and factor structure, i.e., confirming that the survey questions correlate together according to the predicted subscales and scales. In addition, a study of 9 VA substance use disorder clinics found that some baseline ORCA subscales were associated with subsequent implementation of hepatitis prevention practices. Together, these findings were promising. However, several psychometric properties of the ORCA had not been thoroughly evaluated. Notably, evidence of predictive validity (i.e., the extent to which a baseline ORCA score accurately predicts subsequent implementation) was limited to the analysis of the substance use disorder clinics; there had been no prior assessment of inter-rater reliability (i.e., the extent to which respondents agree on assessments of collective readiness to change); and there had been no research to ascertain discriminant validity (i.e., to ensure that the ORCA measures readiness and not respondents' general feelings about the organization).
Our goal was to rigorously assess the psychometric properties of the ORCA by extending knowledge about: 1) inter-rater and internal consistency reliabilities; 2) content validity; and 3) criterion validity, including predictive, convergent, and discriminant validities.
This study included cross-sectional and longitudinal secondary data from 4 independent partner projects conducted in VA. Each partner project tested an external-facilitation intervention to improve the implementation of an evidence-based practice. The objective of the first partner project was to improve use of cognitive behavioral therapy for treating depression in primary-care settings, and the ORCA was fielded to primary care-based mental health providers at 20 sites. The objective of the second partner project was to improve use of a personal health record among Veterans with spinal cord injury, and the ORCA was fielded to spinal cord injury team members at 2 sites. The objective of the third partner project was to implement hepatitis C screening and treatment referral at substance use disorder clinics, and the ORCA was fielded to substance use disorder and gastroenterology clinicians at 23 sites. The objective of the fourth partner project was to enhance use of tools and strategies to improve metabolic side-effect monitoring and management for patients taking antipsychotic medications, and the ORCA was fielded to mental health specialty providers at 12 sites. Each partner project fielded a baseline ORCA within 0-4 months of project initiation; the degree to which the practice was implemented was assessed via self-report 5-6 months after the ORCA was fielded. Two partner studies also fielded measures of job satisfaction in order to test convergent and discriminant validities of the ORCA.
We conducted 2 scale reliability analyses using classic psychometric measures: First, we assessed inter-rater reliability with intra-class correlation coefficients and a multi-item measure of observed agreement compared to random agreement. Second, we assessed internal-consistency reliability with Cronbach's alpha and item-rest correlations.
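Cronbach's alpha, the internal-consistency statistic named above, can be computed directly from its standard formula. A minimal sketch with made-up data (not from the study):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list of scores per item, all over the same respondents.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))"""
    k = len(item_scores)
    item_var_sum = sum(pvariance(item) for item in item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]  # total score per respondent
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Three perfectly correlated items yield alpha = 1.0
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```

Alpha approaches 1 as the items covary strongly relative to their individual variances, which is the sense in which it indexes internal consistency.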
We assessed content validity using a modified Delphi technique. While the ORCA was originally organized around the PARIHS framework, we conducted content validation to determine whether there were important aspects of readiness to change that we were not measuring. A 9-member expert panel identified 15 conceptual domains critical for understanding readiness to change; panel members represented a range of expertise related to implementation, including a quality improvement (QI) expert who had used the ORCA in QI research. A separate Delphi panel, comprising 160 volunteers with varying levels of expertise in implementation science recruited from attendees at 3 implementation science conferences, then rated the fit of each ORCA item as an accurate measure of each of the 15 conceptual domains, in order to identify domains inadequately measured by the ORCA (defined as domains measured by fewer than 3 ORCA items).
We tested 3 types of criterion validity: predictive, discriminant, and convergent. For predictive validity, we used simple Spearman correlations and scatter plots, with implementation effectiveness as the outcome and the ORCA scales as the independent variables. For convergent and discriminant validities, we examined Pearson correlations between the ORCA scales and measures of different aspects of job satisfaction that should have greater or lesser associations with the ORCA scales.
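A predictive-validity check of this kind reduces to a rank correlation between baseline ORCA scores and the implementation outcome. A minimal sketch using the no-ties Spearman formula (the data below are illustrative, not from the study):

```python
def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between the ranks of x_i and y_i."""
    n = len(x)
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# E.g., hypothetical site-level ORCA Context scores vs. an
# implementation-effectiveness score (perfectly monotonic here)
orca = [3.1, 4.2, 2.8, 3.9]
implementation = [0.4, 0.9, 0.2, 0.7]
rho = spearman_rho(orca, implementation)
```

Because it operates on ranks, this statistic captures monotonic association without assuming a linear relationship, which suits an ordinal outcome like extent of implementation.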
From the 4 partner projects, we obtained ORCA data from a total of 53 sites (i.e., units), with 130 respondents to the Evidence scale (a mean of 2.5 respondents per site) and 140 respondents to the Context scale (a mean of 2.6 respondents per site). In partner project 1, the response rate was 65%, with 18 of 20 sites returning one or more surveys. In partner project 2, the response rate was 96%, with 2 of 2 sites returning surveys. In partner project 3, a response rate could not be calculated because the denominator of respondents was not tracked; however, 21 of 23 sites returned one or more surveys (60 surveys total). In partner project 4, the response rate was 87%, with 12 of 12 sites returning surveys.
The ORCA subscales and scales exhibited good internal-consistency reliability (i.e., items intended to measure the same concept were highly correlated), with the exception of the Evidence scale, which had poor reliability for some subscales in each of the partner projects; however, which Evidence subscales exhibited poor reliability differed across partner projects. The site-level mean scores, across all sites, also exhibited limited variation.
We found mixed results for inter-rater reliability. The Evidence and Context scales exhibited overall strong within-site vs. between-site agreement, with 20% and 29% of ORCA variance, respectively, attributable to site. However, this was largely driven by very strong within-site agreement among a minority of sites (Evidence: 43% of sites; Context: 17% of sites). We also had too few respondents per site to obtain reliable estimates of mean site-level scores: we estimated that we would have needed 9.2 (Evidence) and 5.9 (Context) respondents per site to obtain reliable site-level mean scores.
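Respondents-per-site estimates of this kind are typically projected with the Spearman-Brown prophecy formula, which relates the reliability of a site mean to the single-respondent intra-class correlation. A sketch assuming a target reliability of .70 (that target is our illustrative assumption, not a figure stated by the study):

```python
def raters_needed(icc_single, target=0.70):
    """Spearman-Brown prophecy: number of respondents per site needed
    so the site mean reaches `target` reliability, given the
    single-respondent intra-class correlation icc_single."""
    return (target / (1 - target)) * ((1 - icc_single) / icc_single)

# With ~20% of variance attributable to site (ICC(1) of about .20),
# roughly 9 respondents per site are needed to reach reliability .70;
# with ICC(1) of about .29, roughly 6 are needed
n_evidence = raters_needed(0.20)
n_context = raters_needed(0.29)
```

The formula makes the trade-off explicit: the lower the single-respondent ICC, the more respondents per site are required for a trustworthy site mean.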
We also found mixed results for the criterion validation, with negative findings for the predictive validation but positive findings for the discriminant and convergent validations. Sites exhibited substantial variation in the extent of implementation between baseline and follow-up: 6 sites had a negative change (i.e., reduced use of the practice change from baseline to follow-up), 14 sites had small changes, 10 sites had medium changes, and 9 sites had large changes. This is important because predictive validation is predicated on the existence of differences among sites in the outcome. Nonetheless, neither the site-level Evidence scale nor the Context scale was associated with extent of implementation.
For discriminant and convergent validation we tested the correlation of the ORCA scales with 4 measures of job satisfaction. We predicted that the ORCA scales would be significantly correlated with satisfaction with senior leadership and direct supervision because quality of senior leadership and direct supervision are part of the Context subscales in the ORCA; conversely, we predicted the ORCA scales would have modest or no correlation with overall job satisfaction and satisfaction with pay, as they have limited conceptual overlap with the ORCA scales. As predicted, the Context scale had significant correlations with satisfaction with direct supervision (r=.48, p<.01) and senior management (r=.69, p<.01), and no other correlations were significant between the ORCA scales and job satisfaction questions.
For the content validation, the Delphi survey participants achieved consensus after 2 rounds (73 of 160 participants [45.6%] completed both rounds) and determined that 4 of the 15 conceptual domains were each well measured by fewer than 3 ORCA items, and thus were deemed inadequately measured by the ORCA: 1) compatibility of the evidence-based practice with the user or the setting; 2) users' commitment to implementing the evidence-based practice; 3) users' outcome expectancy; and 4) adaptability of the evidence-based practice change to the local setting. These 4 conceptual domains are areas for further development of the ORCA.
Findings from this study suggest potential revisions to the instrument and raise additional questions about how to effectively use this instrument, or similar instruments, to support implementation activities. Findings suggest that a minimum of 6-10 respondents per unit is needed to adequately measure unit-level readiness with the ORCA. The lack of predictive validity of the ORCA may be due to limited unit-level variance in ORCA scores, or to the need for ORCA items measuring additional domains of readiness, such as compatibility and outcome expectancy. These are areas for future research.
- Helfrich CD, Blevins D, Smith JL, Kelly PA, Hogan TP, Hagedorn H, Dubbert PM, Sales AE. Predicting implementation from organizational readiness for change: a study protocol. Implementation Science. 2011 Jul 22;6(1):76.
- Helfrich CD, Blevins D, Kelly P, Smith J, Hogan T, Hagedorn H, Gylys-Colwell I, Orlando RM. Inter-rater reliability and criterion validity of the Organizational Readiness to Change Assessment in four implementation studies. Poster session presented at: AcademyHealth Annual Research Meeting; 2013 Jun 24; Baltimore, MD.
- Helfrich CD, Blevins D, Kelly PA, Gylys-Colwell IM, Dubbert PM. Using different intra-class correlations to assess inter-rater reliability and inter-rater agreement: example of organizational readiness to change from three implementation studies. Paper presented at: National Institutes of Health Conference on the Science of Dissemination and Implementation: Research at the Crossroads; 2012 Mar 19; Bethesda, MD.