Predicting risk of suicidal behavior from insurance claims data vs. linked data from insurance claims and electronic health records

Document Type


Publication Date


Publication Title

Pharmacoepidemiology and drug safety


PURPOSE: Observational studies assessing effects of medical products on suicidal behavior often rely on health record data to account for pre-existing risk. We assess whether high-dimensional models predicting suicide risk using data derived from insurance claims and electronic health records (EHRs) are superior to models using data from insurance claims alone.

METHODS: Data were from seven large health systems identified outpatient mental health visits by patients aged 11 or older between 1/1/2009 and 9/30/2017. Data for the 5 years prior to each visit identified potential predictors of suicidal behavior typically available from insurance claims (e.g., mental health diagnoses, procedure codes, medication dispensings) and additional potential predictors available from EHRs (self-reported race and ethnicity, responses to Patient Health Questionnaire or PHQ-9 depression questionnaires). Nonfatal self-harm events following each visit were identified from insurance claims data and fatal self-harm events were identified by linkage to state mortality records. Random forest models predicting nonfatal or fatal self-harm over 90 days following each visit were developed in a 70% random sample of visits and validated in a held-out sample of 30%. Performance of models using linked claims and EHR data was compared to models using claims data only.

RESULTS: Among 15 845 047 encounters by 1 574 612 patients, 99 098 (0.6%) were followed by a self-harm event within 90 days. Overall classification performance did not differ between the best-fitting model using all data (area under the receiver operating curve or AUC = 0.846, 95% CI 0.839-0.854) and the best-fitting model limited to data available from insurance claims (AUC = 0.846, 95% CI 0.838-0.853). Competing models showed similar classification performance across a range of cut-points and similar calibration performance across a range of risk strata. Results were similar when the sample was limited to health systems and time periods where PHQ-9 depression questionnaires were recorded more frequently.

CONCLUSION: Investigators using health record data to account for pre-existing risk in observational studies of suicidal behavior need not limit that research to databases including linked EHR data.

Medical Subject Headings

Humans; Suicidal Ideation; Electronic Health Records; Semantic Web; Self-Injurious Behavior; Insurance

PubMed ID



ePub ahead of print





First Page


Last Page