Leveraging AI to Personalize Symptom Assessment for Bipolar Disorder: A Comparative Study of LLM-driven PHQ-9 Tailoring
Recommended Citation
Ramekar S, Liu Y, Breitzig M, Kong L, Saunders E, Liu G. Leveraging AI to Personalize Symptom Assessment for Bipolar Disorder: A Comparative Study of LLM-driven PHQ-9 Tailoring. Bipolar Disorders 2025; 27:S105.
Document Type
Conference Proceeding
Publication Date
9-12-2025
Publication Title
Bipolar Disorders
Abstract
Introduction: This study evaluates the ability of large language model (LLM)-based tools to customize the Patient Health Questionnaire-9 (PHQ-9) for individuals with bipolar disorder. Method: We simulated 50 cases with diverse demographic and personal characteristics, including age, gender, bipolar type, socioeconomic status, interests/hobbies, and social/emotional tendencies. ChatGPT®, Gemini®, Microsoft Copilot®, and Claude® were used to adapt the standard PHQ-9 questions to each case. Two independent evaluators carried out a qualitative analysis to assess the quality of the PHQ-9 adaptations, their contextual relevance, linguistic sensitivity, clarity, and level of personalization. Results: ChatGPT performed best, followed by Claude, then Gemini and Copilot. Adaptations by ChatGPT were highly tailored to each individual's background, sensitive in tone, and more conversational and faithful to the original questions. For one simulated case (Alice, a female college freshman from an upper-middle-class background with Bipolar I, majoring in finance and on the college track and field team), the PHQ-9 item 'Little interest or pleasure in doing things' was personalized by ChatGPT as 'Alice, have you felt like you're not really enjoying things you normally do? Like, have you been feeling less excited about practice, hanging out with friends, or even school activities lately?'. Questions generated by Claude had a more empathetic tone but deviated stylistically from the originals. Gemini and Copilot achieved little or minimal personalization. Conclusion: Our study demonstrated the utility of LLM-based tools in personalizing the PHQ-9. Further studies are warranted to investigate whether this AI-based alternative may help promote patient engagement and improve the accuracy of assessment.
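Illustrative sketch (not part of the published abstract): the abstract does not report the prompts given to the LLM tools, so the following is only a minimal, hypothetical example of how a simulated case profile might be combined with a PHQ-9 item into an adaptation request. The `CaseProfile` fields, the `build_adaptation_prompt` function, and the prompt wording are all assumptions made for illustration. Only the Alice case details and the item text are taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import List

# PHQ-9 item text as quoted in the abstract.
PHQ9_ITEM_1 = "Little interest or pleasure in doing things"


@dataclass
class CaseProfile:
    """Hypothetical container for one simulated case (field names are assumptions)."""
    name: str
    gender: str
    life_stage: str
    socioeconomic_status: str
    bipolar_type: str
    interests: List[str] = field(default_factory=list)


def build_adaptation_prompt(case: CaseProfile, item: str) -> str:
    """Assemble a plain-text request asking an LLM to personalize one PHQ-9 item.

    The wording is illustrative only; it is not the prompt used in the study.
    """
    interests = ", ".join(case.interests) if case.interests else "not specified"
    return (
        "Rewrite the following PHQ-9 item for the person described below. "
        "Keep the clinical meaning of the item unchanged, use a sensitive and "
        "conversational tone, and refer to the person's own context.\n"
        f"Item: {item}\n"
        f"Name: {case.name}\n"
        f"Gender: {case.gender}\n"
        f"Life stage: {case.life_stage}\n"
        f"Socioeconomic status: {case.socioeconomic_status}\n"
        f"Diagnosis: {case.bipolar_type}\n"
        f"Interests: {interests}\n"
    )


# The single example case described in the abstract.
alice = CaseProfile(
    name="Alice",
    gender="female",
    life_stage="college freshman, finance major",
    socioeconomic_status="upper-middle class",
    bipolar_type="Bipolar I",
    interests=["college track and field team"],
)

prompt = build_adaptation_prompt(alice, PHQ9_ITEM_1)
print(prompt)
```

The resulting string could be submitted to any of the chat tools compared in the study. Keeping the original item text verbatim in the request makes it easier for evaluators to check an adaptation against its source item.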
Volume
27
First Page
S105
