Advancing Post-Radiotherapy Toxicity Extraction: A Novel Privacy-Preserving, Parameter-Efficient Language Model Fine-Tuning
Recommended Citation
Khanmohammadi R, Ghanem AI, Turfa J, Siddiqui M, Elshaikh MA, Bagher-Ebadian H, Movsas B, Chetty IJ, Ghassemi MM, Thind KS. Advancing Post-Radiotherapy Toxicity Extraction: A Novel Privacy-Preserving, Parameter-Efficient Language Model Fine-Tuning. Med Phys 2025; 52(10):248.
Document Type
Conference Proceeding
Publication Date
9-30-2025
Publication Title
Med Phys
Keywords
aged, cancer patient, chain of thought prompting, conference abstract, cross validation, human, language model, learning, llama, natural language processing, nonhuman, privacy, prostate cancer, radiotherapy, reasoning, recall, toxicity, Wilcoxon signed ranks test
Abstract
Purpose: Extracting late radiotherapy-induced toxicities from free-text clinical notes with natural language processing is complicated by the difficulty of identifying negative symptoms, by computational demands, and by data privacy constraints. This study introduces a novel parameter-efficient fine-tuning method for compact language models that combines Low-Rank Adaptation (LoRA) with Chain-of-Thought prompting to improve accuracy and efficiency while maintaining data privacy.
Methods: Two Llama-based models (3.2-3B and 3.1-8B) were fine-tuned to extract long-term toxicities from 5,848 expert-labeled clinical notes of 100 prostate cancer patients who received definitive radiation therapy of 78-79.2 Gy in 39-44 fractions between 2017 and 2021. LoRA with a rank of 128 was applied to the attention and feed-forward layers for efficient parameter tuning and continual learning. Chain-of-Thought prompting was incorporated to improve reasoning during toxicity classification. Five-fold stratified cross-validation was performed with splits of 4,675 training, 584 validation, and 589 testing samples. Models were evaluated on precision, recall, and F1 score for negative and positive toxicity symptoms, and statistical significance was assessed with the Wilcoxon signed-rank test.
Results: For the 3.1-8B model, precision, recall, and F1 score for negative classifications improved from 0.52 [0.49-0.56], 0.90 [0.83-0.91], and 0.64 [0.60-0.70] to 0.98 [0.95-1.00], 0.94 [0.91-0.95], and 0.93 [0.91-0.96], respectively. For positive classifications, precision, recall, and F1 score increased from 0.83 [0.80-0.85], 0.89 [0.87-0.91], and 0.85 [0.83-0.87] to 0.93 [0.90-0.97], 1.00 [0.95-1.00], and 0.95 [0.93-0.96], respectively. The 3.2-3B model showed similar improvements, with F1 scores for negative classifications rising from 0.48 [0.44-0.52] to 0.87 [0.81-0.91] and for positive classifications from 0.63 [0.60-0.68] to 0.83 [0.76-0.85]. All improvements were statistically significant (p<0.05, Wilcoxon signed-rank test).
Conclusion: This novel fine-tuning approach significantly improves compact language model performance in extracting radiotherapy-induced toxicities, particularly for negative toxicity symptoms. This efficient method provides a privacy-preserving solution for automated toxicity extraction and monitoring in radiation oncology.
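Illustrative note: the abstract reports precision, recall, and F1 separately for negative and positive toxicity classifications. The following is a minimal Python sketch of how such per-class metrics are commonly computed. It is not the authors' evaluation code; the function name `per_class_metrics`, the label strings, and the toy labels are assumptions made only for illustration.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treating `label` as the target.

    Hypothetical helper for illustration; not taken from the study.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == label and truth == label:
            counts["tp"] += 1  # correctly predicted as `label`
        elif pred == label:
            counts["fp"] += 1  # predicted `label`, but truth differs
        elif truth == label:
            counts["fn"] += 1  # truth is `label`, but prediction differs
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (made up, not study data): one label per symptom mention.
y_true = ["negative", "negative", "negative",
          "positive", "positive", "positive", "positive", "positive"]
y_pred = ["negative", "negative", "positive",
          "positive", "positive", "positive", "positive", "negative"]

neg_metrics = per_class_metrics(y_true, y_pred, "negative")
pos_metrics = per_class_metrics(y_true, y_pred, "positive")
print("negative:", neg_metrics)
print("positive:", pos_metrics)
```

Scoring each class separately in this way is what makes it possible to see a weakness on negative symptoms that an overall accuracy figure could hide.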
Volume
52
Issue
10
First Page
248
