The Reliability of the Tönnis Grading System in Patients Undergoing Hip Preservation

Publication Title

The American journal of sports medicine


BACKGROUND: The presence of pre-existing osteoarthritis (OA) has been associated with poor results after hip arthroscopic surgery. There is limited evidence validating the currently available grading systems of hip OA in patients undergoing hip preservation.

PURPOSE/HYPOTHESIS: Our purpose was to evaluate the interobserver and intraobserver reliabilities of 2 grading systems in a group of patients undergoing hip preservation: the Tönnis grading system and a simple 4-choice Likert scale. The hypothesis was that interobserver and intraobserver reliabilities using the Tönnis grading system would be poor among surgeons experienced in hip preservation and that a 4-choice Likert scale would be more reliable.

STUDY DESIGN: Cohort study (diagnosis); Level of evidence, 3.

METHODS: A total of 100 hip radiographs were reviewed by 8 experienced hip preservation surgeons. Overall, 2 rounds of reviews were performed, at least 3 weeks apart, assessing for the presence, degree, and/or location of joint space narrowing, joint space asymmetry, subchondral cysts, osteophytes, and sclerosis. The radiographs were assigned a Tönnis grade as well as a Likert grade of OA, reported as none, mild, moderate, or severe. Statistical analysis was conducted to provide Fleiss kappa values with 95% CIs. Agreement was classified as poor for κ < 0.00, slight for 0.00-0.20, fair for 0.21-0.40, moderate for 0.41-0.60, substantial for 0.61-0.80, and almost perfect for κ > 0.80.
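
For readers unfamiliar with the statistic, the following is a minimal sketch of how a Fleiss kappa can be computed for several raters assigning categorical grades, and how the result maps onto the commonly used Landis and Koch agreement bands. It is not the authors' analysis code: the function names `fleiss_kappa` and `landis_koch_label`, the toy ratings, and the exact handling of band boundaries are assumptions made for illustration, and confidence intervals are not computed.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss kappa for a list of subjects, each rated once by every rater.

    ratings    -- list of lists; ratings[i] holds one category label per rater
    categories -- all category labels that raters could choose from
    (Hypothetical helper for illustration, not the study's software.)
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: for each subject, the share of rater pairs that agree.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall proportion of each category.
    total = n_subjects * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa):
    """Map a kappa value to a Landis and Koch agreement band (assumed cutoffs)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    grades = ["none", "mild", "moderate", "severe"]
    # Toy data: 4 radiographs, 3 raters each (made up for illustration only).
    toy = [
        ["none", "none", "mild"],
        ["mild", "moderate", "mild"],
        ["severe", "severe", "severe"],
        ["moderate", "mild", "none"],
    ]
    k = fleiss_kappa(toy, grades)
    print(f"kappa = {k:.2f} ({landis_koch_label(k)})")
```

In this sketch, each inner list would correspond to one radiograph and each entry to one surgeon's grade; the same function applies to the Tönnis grade or the 4-choice Likert grade because both are treated as unordered categories.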

RESULTS: A total of 50 patients (28 female and 22 male) with a mean age of 42.8 ± 14.2 years (range, 19-70 years) were reviewed. The Tönnis grade demonstrated an interobserver kappa value of 0.30 (95% CI, 0.26-0.34). The Likert grade demonstrated an interobserver kappa value of 0.33 (95% CI, 0.28-0.37). All other measures demonstrated interobserver kappa values classified as slight or fair, except for subchondral cysts, for which agreement was moderate. Intraobserver reliabilities were statistically significantly higher than interobserver reliabilities. Intraobserver reliabilities for the Tönnis grade (κ = 0.55 [95% CI, 0.51-0.60]) and the Likert grade (κ = 0.59 [95% CI, 0.55-0.63]) were similar and consistent with moderate agreement. Subchondral cysts demonstrated the strongest interobserver (κ = 0.53) and intraobserver (κ = 0.85) reliabilities.

CONCLUSION: Interobserver and intraobserver reliabilities were fair and moderate, respectively, for grading OA. Given the limited interobserver reliability, caution should be used when interpreting and translating studies that utilize the Tönnis grade or other rating to dictate treatment algorithms.

Medical Subject Headings

Humans; Male; Female; Adult; Middle Aged; Cohort Studies; Reproducibility of Results; Osteoarthritis, Hip; Arthroscopy; Radiography; Observer Variation
