Martinez-Zayas G, Almeida FA, Yarmus L, Steinfort D, Lazarus DR, Simoff MJ, Saettele T, Murgu S, Dammad T, Duong DK, Mudambi L, Filner JJ, Molina S, Aravena C, Thiboutot J, Bonney A, Rueda AM, Debiane LG, Hogarth DK, Bedi H, Deffebach M, Sagar AS, Cicenia J, Yu DH, Cohen A, Frye L, Grosu HB, Gildea T, Feller-Kopman D, Casal RF, Machuzak M, Arain MH, Sethi S, Eapen GA, Lam L, Jimenez CA, Ribeiro M, Noor LZ, Mehta A, Song J, Choi H, Ma J, Li L, and Ost DE. Predicting Lymph Node Metastasis in Non-small Cell Lung Cancer: Prospective External and Temporal Validation of the HAL and HOMER Models. Chest 2021; 160(3):1108-1120.
BACKGROUND: Two models, the Help with the Assessment of Adenopathy in Lung cancer (HAL) and Help with Oncologic Mediastinal Evaluation for Radiation (HOMER), were recently developed to estimate the probability of nodal disease in patients with non-small cell lung cancer (NSCLC) as determined by endobronchial ultrasound-transbronchial needle aspiration (EBUS-TBNA). The objective of this study was to prospectively externally validate both models at multiple centers.
RESEARCH QUESTION: Are the HAL and HOMER models valid across multiple centers?
STUDY DESIGN AND METHODS: This multicenter prospective observational cohort study enrolled consecutive patients with PET-CT clinical-radiographic stages T1-3, N0-3, M0 NSCLC undergoing EBUS-TBNA staging. HOMER was used to predict the probability of N0 vs N1 vs N2 or N3 (N2|3) disease, and HAL was used to predict the probability of N2|3 (vs N0 or N1) disease. Model discrimination was assessed using the area under the receiver operating characteristic curve (ROC-AUC), and calibration was assessed using the Brier score, calibration plots, and the Hosmer-Lemeshow test.
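For readers less familiar with these metrics, the following is a minimal, dependency-free sketch of the standard textbook definitions for a binary endpoint such as HAL's N2|3 prediction. It is not the authors' analysis code. The function names are hypothetical, the Hosmer-Lemeshow test assumes the conventional 10 groups of sorted predictions with df = groups - 2, and predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def roc_auc(y, p):
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher predicted probability than a randomly chosen negative
    case (ties count as one half). O(n_pos * n_neg), fine for a sketch."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y, p):
    """Binary Brier score: mean squared difference between the predicted
    probability and the observed 0/1 outcome (0 is perfect)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X >= x). The closed form used
    here is exact only for positive even degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow test (hypothetical helper): sort by predicted
    probability, split into roughly equal-sized groups, and compare observed
    with expected event counts in each group. Assumes 0 < p < 1.
    Returns (statistic, p-value) with df = groups - 2."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # Events and non-events share the same squared gap, each divided by
        # its own expected count.
        stat += (observed - expected) ** 2 * (1.0 / expected + 1.0 / (n_g - expected))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A small Hosmer-Lemeshow P value signals that observed and expected counts differ somewhere across the risk groups. With large samples the test can flag deviations that are small in absolute terms, which is one reason to read it alongside absolute measures such as the forecast-minus-observed differences reported in the results.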
RESULTS: Thirteen centers enrolled 1,799 patients. HAL and HOMER demonstrated good discrimination: HAL ROC-AUC = 0.873 (95% CI, 0.856-0.891), and HOMER ROC-AUC = 0.837 (95% CI, 0.814-0.859) for predicting N1 disease or higher (N1|2|3) and 0.876 (95% CI, 0.855-0.897) for predicting N2|3 disease. Brier scores were 0.117 for HAL and 0.349 for HOMER. Calibration plots demonstrated good calibration for both models. For HAL, the difference between forecast and observed probability of N2|3 disease was +0.012; for HOMER, the difference was -0.018 for N1|2|3 and +0.002 for N2|3. The Hosmer-Lemeshow test was significant for both models (P = .034 and P = .002, respectively), indicating a small but statistically significant calibration error.
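The results mix a binary model (HAL) with a three-category model (HOMER), so the sketch below shows, under stated assumptions, how the reported quantities can be derived. It assumes that "forecast minus observed" means mean predicted probability minus observed event rate (positive values indicating over-prediction), and that HOMER's N1|2|3 probability is the sum of its N1 and N2|3 probabilities. The abstract does not say how the HOMER Brier score was defined; one common multicategory form, used here as an assumption, sums squared errors over all three categories and ranges from 0 to 2 rather than 0 to 1, in which case it would not be directly comparable with HAL's binary score. All function names are hypothetical.

```python
def forecast_minus_observed(y, p):
    """Calibration-in-the-large: mean predicted probability minus the
    observed event rate. Positive values mean the model over-predicts on
    average; negative values mean it under-predicts."""
    return sum(p) / len(p) - sum(y) / len(y)


def collapse_homer(probs):
    """Hypothetical helper: turn a three-category prediction
    (P(N0), P(N1), P(N2|3)) into the two binary endpoints reported for
    HOMER, namely P(N1|2|3) and P(N2|3)."""
    _p_n0, p_n1, p_n23 = probs
    return p_n1 + p_n23, p_n23


def multiclass_brier(outcomes, probs):
    """Brier's original multicategory score (an assumed definition): squared
    error summed over the categories, then averaged over patients.
    outcomes holds category indices (0 = N0, 1 = N1, 2 = N2|3)."""
    total = 0.0
    for k, triple in zip(outcomes, probs):
        total += sum((q - (1.0 if j == k else 0.0)) ** 2 for j, q in enumerate(triple))
    return total / len(outcomes)
```

For example, a patient with observed N0 disease who is assigned equal probabilities of one third to each category contributes 2/3 to the multicategory score, whereas a confident, correct prediction contributes 0.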
INTERPRETATION: HAL and HOMER demonstrated good discrimination and calibration across multiple centers. Although calibration error was present, its magnitude was small enough that the models remain informative.