Publication Title

JAMA Netw Open


Importance: Unreliable performance measures can mask poor-quality care and distort financial incentives in value-based purchasing.

Objective: To examine the association between test-retest reliability and the reproducibility of hospital rankings.

Design, Setting, and Participants: In a cross-sectional design, Centers for Medicare & Medicaid Services Hospital Compare data were analyzed for the 2017 (based on 2014-2017 data) and 2018 (based on 2015-2018 data) reporting periods. The study was conducted from December 13, 2020, to September 30, 2021. This analysis was based on 28 measures, including mortality (acute myocardial infarction, congestive heart failure, pneumonia, and coronary artery bypass grafting), readmissions (acute myocardial infarction, congestive heart failure, pneumonia, and coronary artery bypass grafting), and surgical complications (postoperative acute kidney failure, postoperative respiratory failure, postoperative sepsis, and failure to rescue).

Exposures: Measure reliability based on test-retest reliability testing.
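
As an illustration of how test-retest reliability can be summarized with an intraclass correlation coefficient, the sketch below computes a one-way random-effects ICC from paired measurements of the same hospitals in two reporting periods. The abstract does not state which ICC form the authors used, so the one-way form, the function name `icc_oneway`, and the toy values are all assumptions made for illustration only.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC for k = 2 measurements per hospital.

    ASSUMPTION: the study's exact ICC form is not given in the abstract;
    this one-way form is used here only as an illustration.
    `pairs` is a list of (period_1_value, period_2_value) per hospital.
    """
    n = len(pairs)
    k = 2
    grand_mean = sum(x + y for x, y in pairs) / (n * k)
    # Between-hospital mean square: spread of hospital means around the grand mean.
    ms_between = k * sum(((x + y) / 2 - grand_mean) ** 2 for x, y in pairs) / (n - 1)
    # Within-hospital mean square: spread of each hospital's two values around its own mean.
    ms_within = sum(
        (x - (x + y) / 2) ** 2 + (y - (x + y) / 2) ** 2 for x, y in pairs
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical toy data: each hospital has the same value in both periods,
# so all variation is between hospitals.
stable_pairs = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
icc_stable = icc_oneway(stable_pairs)
```

With perfectly repeated values the within-hospital variance is zero, so this ICC equals 1; the more a hospital's value shifts between periods relative to the spread across hospitals, the lower the coefficient.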

Main Outcomes and Measures: The reproducibility of hospital rankings was quantified by calculating the reclassification rate across the 2017 and 2018 reporting periods after categorizing the hospitals into terciles, quartiles, deciles, and statistical outliers. Linear regression analysis was used to examine the association between the reclassification rate and the intraclass correlation coefficient for each of the classification systems.
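
The reclassification rate described above can be sketched as follows: rank hospitals into equal-sized groups in each reporting period, then count the share whose group changes. This is a minimal sketch, not the authors' code; the function names, the rank-based grouping rule, the tie handling, and the toy rates are assumptions.

```python
def quantile_groups(values, n_groups):
    """Assign each value to one of n_groups rank-based groups (0 = lowest).

    ASSUMPTION: groups are formed from ranks; ties are broken by input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values)
    return groups


def reclassification_rate(period_a, period_b, n_groups):
    """Share of hospitals whose group differs between the two periods."""
    groups_a = quantile_groups(period_a, n_groups)
    groups_b = quantile_groups(period_b, n_groups)
    changed = sum(a != b for a, b in zip(groups_a, groups_b))
    return changed / len(period_a)


# Hypothetical risk-adjusted rates for the same 8 hospitals in two periods.
rates_2017 = [10.1, 11.4, 12.0, 12.9, 13.5, 14.2, 15.0, 15.8]
rates_2018 = [10.5, 12.2, 11.8, 13.6, 13.1, 14.0, 16.1, 15.2]

quartile_rate = reclassification_rate(rates_2017, rates_2018, 4)
tercile_rate = reclassification_rate(rates_2017, rates_2018, 3)
```

In this toy example, half of the hospitals change quartile while none change tercile, which mirrors the pattern in which finer classification systems yield higher reclassification rates for the same underlying data.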

Results: The analytic cohort consisted of 28 measures from 4452 hospitals with a median of 2927 (IQR, 2378-3160) hospitals contributing data for each measure. The hospitals participating in the Inpatient Prospective Payment System (n = 3195) had a median bed size of 141 (IQR, 69-261), average daily census of 70 (IQR, 24-155) patients, and a median disproportionate share hospital percentage of 38.2% (IQR, 18.7%-36.6%). The median intraclass correlation coefficient was 0.78 (IQR, 0.72-0.81), ranging between 0.50 and 0.85. The median reclassification rate was 70% (IQR, 62%-71%) when hospitals were ranked by deciles, 43% (IQR, 39%-45%) when ranked by quartiles, 34% (IQR, 31%-36%) when ranked by terciles, and 3.8% (IQR, 2.0%-6.2%) when ranked by outlier status. Increases in measure reliability were not associated with decreases in the reclassification rate. Each 0.1-point increase in the intraclass correlation coefficient was associated with a 6.80 (95% CI, 2.28 to 11.30; P = .005) percentage-point increase in the reclassification rate when hospitals were ranked into performance deciles, 4.15 (95% CI, 1.16 to 7.14; P = .008) when ranked into performance quartiles, 1.47 (95% CI, -1.84 to 4.77; P = .37) when ranked into performance terciles, and 3.70 (95% CI, 1.30 to 6.09; P = .004) when ranked by outlier status.

Conclusions and Relevance: In this study, more reliable measures were not associated with lower rates of reclassifying hospitals using test-retest reliability testing. These findings suggest that measure reliability should not be assessed with test-retest reliability testing.

Medical Subject Headings

Cross-Sectional Studies; Hospitals; Humans; Medicaid; Medicare; Quality of Health Care; Reproducibility of Results; United States


