Advancing Post-Radiotherapy Toxicity Extraction: A Novel Privacy-Preserving, Parameter-Efficient Language Model Fine-Tuning

Document Type

Conference Proceeding

Publication Date

9-30-2025

Publication Title

Med Phys

Keywords

aged, cancer patient, chain of thought prompting, conference abstract, cross validation, human, language model, learning, llama, natural language processing, nonhuman, privacy, prostate cancer, radiotherapy, reasoning, recall, toxicity, Wilcoxon signed ranks test

Abstract

Purpose: Extracting late radiotherapy-induced toxicities from free-text notes using natural language processing is complicated by negative symptom identification, computational demands, and data privacy. This study introduces a novel parameter-efficient fine-tuning method for compact language models, using Low-Rank Adaptation (LoRA) and Chain-of-Thought prompting to improve accuracy and efficiency while maintaining data privacy.
Methods: Two Llama-based models (3.2-3B and 3.1-8B) were fine-tuned to extract long-term toxicities from 5,848 expert-labeled clinical notes of 100 prostate cancer patients who received definitive radiation therapy (78-79.2 Gy in 39-44 fractions) between 2017 and 2021. LoRA with a rank of 128 was applied, targeting attention and feed-forward layers for efficient parameter tuning and continual learning. Chain-of-Thought prompting was incorporated to improve reasoning during toxicity classification. Five-fold stratified cross-validation was performed with splits of 4,675 training, 584 validation, and 589 testing samples. Models were evaluated on precision, recall, and F1 score, focusing on negative and positive toxicity symptoms, with statistical significance tested using the Wilcoxon signed-rank test.
Results: For the 3.1-8B model, precision, recall, and F1 scores for negative classifications improved from 0.52 [0.49-0.56], 0.90 [0.83-0.91], and 0.64 [0.60-0.70] to 0.98 [0.95-1.00], 0.94 [0.91-0.95], and 0.93 [0.91-0.96], respectively. For positive classifications, precision, recall, and F1 scores increased from 0.83 [0.80-0.85], 0.89 [0.87-0.91], and 0.85 [0.83-0.87] to 0.93 [0.90-0.97], 1.00 [0.95-1.00], and 0.95 [0.93-0.96], respectively. The 3.2-3B model showed similar improvements, with F1 scores for negative classifications rising from 0.48 [0.44-0.52] to 0.87 [0.81-0.91], and for positive classifications from 0.63 [0.60-0.68] to 0.83 [0.76-0.85]. All improvements were statistically significant (p<0.05, Wilcoxon signed-rank test).
Conclusion: This novel fine-tuning approach significantly improves compact language model performance in extracting radiotherapy-induced toxicities, particularly for negative toxicity symptoms. This efficient method provides a privacy-preserving solution for automated toxicity extraction and monitoring in radiation oncology.
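Illustrative note: the parameter efficiency of rank-128 LoRA can be seen with a short calculation. For a frozen weight matrix of shape d_out x d_in, LoRA trains two factors, A (rank x d_in) and B (d_out x rank), so it adds rank * (d_in + d_out) trainable parameters. The sketch below is a minimal example under stated assumptions, not the authors' implementation: the layer names, matrix shapes, and block count are assumed values resembling a publicly released 8B Llama configuration and are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linear:
    """Shape of one frozen weight matrix that receives a LoRA adapter."""
    name: str
    d_in: int
    d_out: int


def lora_params(layer: Linear, rank: int) -> int:
    # LoRA trains A (rank x d_in) and B (d_out x rank); the base weight is frozen.
    return rank * (layer.d_in + layer.d_out)


def frozen_params(layer: Linear) -> int:
    # Size of the original (untrained) weight matrix.
    return layer.d_in * layer.d_out


# ASSUMPTION: shapes for one transformer block; not stated in the abstract.
HIDDEN, KV_DIM, FFN = 4096, 1024, 14336
BLOCK = [
    Linear("q_proj", HIDDEN, HIDDEN),     # attention
    Linear("k_proj", HIDDEN, KV_DIM),
    Linear("v_proj", HIDDEN, KV_DIM),
    Linear("o_proj", HIDDEN, HIDDEN),
    Linear("gate_proj", HIDDEN, FFN),     # feed-forward
    Linear("up_proj", HIDDEN, FFN),
    Linear("down_proj", FFN, HIDDEN),
]
NUM_BLOCKS = 32   # ASSUMPTION: number of transformer blocks
RANK = 128        # rank reported in the abstract


def trainable_total(block, num_blocks: int, rank: int) -> int:
    # Total adapter parameters across all blocks.
    return num_blocks * sum(lora_params(layer, rank) for layer in block)


def frozen_total(block, num_blocks: int) -> int:
    # Total frozen parameters in the adapted matrices across all blocks.
    return num_blocks * sum(frozen_params(layer) for layer in block)


if __name__ == "__main__":
    trainable = trainable_total(BLOCK, NUM_BLOCKS, RANK)
    frozen = frozen_total(BLOCK, NUM_BLOCKS)
    print(f"trainable adapter parameters: {trainable:,}")
    print(f"share of adapted weights:     {trainable / frozen:.1%}")
```

Under these assumed shapes the adapters amount to a few percent of the weights they modify, which is consistent with the abstract's description of the method as parameter-efficient.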

Volume

52

Issue

10

First Page

248
