Performance of large language models in addressing patient queries on colorectal cancer screening in different languages: An international study across 28 countries
Recommended Citation
Maida M, Papaefthymiou A, Gupta S, Voiosu T, Lau LHS, Baraldo S, Pal P, Mwachiro M, Zuchelli T, Uchima H, Aguila E, Bouberra D, Degroote H, Düzenli T, Gameel A, Khurelbaatar T, Lakkasani S, Luvsandagva B, Maulahela H, Nobre R, Okubo Y, Rimondi A, Taiymi A, and Mostafa I. Performance of large language models in addressing patient queries on colorectal cancer screening in different languages: An international study across 28 countries. Dig Liver Dis 2025;58(2):250-257.
Document Type
Article
Publication Date
2-1-2026
Publication Title
Digestive and Liver Disease: Official Journal of the Italian Society of Gastroenterology and the Italian Association for the Study of the Liver
Keywords
Humans, Colorectal Neoplasms, Early Detection of Cancer, Language, Comprehension, Surveys and Questionnaires, Asia, Multilingualism, Europe, Male, Female, Mass Screening, Africa, Large Language Models
Abstract
BACKGROUND: Colorectal cancer (CRC) screening reduces incidence and mortality, yet patient adherence remains suboptimal. Large language models may improve participation by addressing patient questions in native languages, but their multilingual performance has not been systematically assessed.
METHODS: From April to June 2025, we conducted a cross-continental study involving 28 countries and 23 languages. A standardized set of 15 CRC screening-related questions was translated into each language and submitted to ChatGPT (GPT-4o). Responses were independently evaluated by 140 gastroenterologists (five per country) for accuracy, completeness, and comprehensibility on a 5-point Likert scale. Statistical analyses included t-tests, chi-square tests, and two-way ANOVA.
RESULTS: The study included experts and data from Europe, Asia, Africa, the Americas, and Oceania. Mean scores (±SD) for accuracy, completeness, and comprehensibility were 4.1 ± 1.0, 4.1 ± 1.0, and 4.2 ± 0.9, respectively. Most languages achieved high ratings, with 73.9%, 86.9%, and 82.6% of languages scoring ≥4 for accuracy, completeness, and comprehensibility, respectively. However, lower scores were observed in Chinese, Dutch, and Greek. Variability was also noted between countries sharing the same language, highlighting language- and context-dependent performance.
DISCUSSION: ChatGPT showed strong ability to answer CRC screening questions across multiple languages, supporting its promise as a multilingual patient education tool. Nonetheless, regional variability requires careful validation before clinical integration.
Medical Subject Headings
Humans; Colorectal Neoplasms; Early Detection of Cancer; Language; Comprehension; Surveys and Questionnaires; Asia; Multilingualism; Europe; Male; Female; Mass Screening; Africa; Large Language Models
PubMed ID
41436291
ePublication
ePub ahead of print
Volume
58
Issue
2
First Page
250
Last Page
257
