Battle of the Bots: Assessing the Ability of Four Large Language Models to Tackle Different Surgery Topics
Recommended Citation
Madi M, Araji T, Hazimeh D, and Adra SW. Battle of the Bots: Assessing the Ability of Four Large Language Models to Tackle Different Surgery Topics. Am Surg 2025.
Document Type
Article
Publication Date
5-26-2025
Publication Title
The American Surgeon
Abstract
Objective: Our study aims to compare the performance of different large language model chatbots on surgical questions across different topics and question categories.
Materials and Methods: Four chatbots (ChatGPT 4.0, Medical Chat, Google Bard, and Copilot AI) were used in our study. A total of 114 multiple-choice surgical questions covering 9 different topics were entered into each chatbot, and their answers were recorded.
Results: The performance of ChatGPT was significantly better than that of Bard (P < 0.0001) and Medical Chat (P = 0.0013) but not significantly better than that of Copilot (P = 0.9663). We also found a statistically significant difference among the chatbots on ENT (P = 0.0199) and GI (P = 0.0124) questions when we assessed their performance by surgical specialty. Finally, the mean scores of Bard, Copilot, Medical Chat, and ChatGPT 4.0 were higher on diagnosis questions than on management questions; however, the difference was statistically significant only for Bard (P = 0.0281).
Conclusion: Our study offers insight into the performance of different chatbots on surgery-related questions and topics. The strengths and shortcomings of each can provide a better understanding of how to use chatbots in the surgical field, including surgical education.
Medical Subject Headings
chatbots; educational tools; innovation; large language models; resident education; surgical education
PubMed ID
40420550
ePublication
ePub ahead of print
First Page
31348251346538
Last Page
31348251346538
