Deep Learning Classification of Neuro-Oncology Medical Documents
Wells M, Sabedot T, Malta T, Snyder J, Poisson L, and Noushmehr H. Deep Learning Classification of Neuro-Oncology Medical Documents. Cancer Res 2019; 79(13).
Introduction: Precision medicine and big-data approaches to cancer discovery require well-curated, indexed critical health care data; however, few resources to date successfully parse unstructured clinical data in neuro-oncology. Current practice relies on time-consuming manual extraction by researchers or clinicians, which produces inconsistent data and limits data set volume. Rule-based natural language processing algorithms can handle simple, consistent text, but medical documents are written longitudinally by multiple people over long periods of time, and the resulting inconsistencies and semantic heterogeneity render rule-based techniques insufficient.

Methods: We applied a deep learning text classification method to multiple clinical document categories, including clinical pathology reports and a text-based clinical database spanning 17 years of clinical narratives with approximately 4000 unique cases. For this study we identified clinically relevant molecular criteria for glioma outlined in the 2016 WHO classification of CNS tumors, including IDH mutation, MGMT methylation, and 1p/19q co-deletion status. Using a convolutional neural network with two densely connected layers of 30 rectified linear nodes, we classified patients into their respective molecular cohorts with an accuracy of 98%.

Conclusion: Parsing unstructured text-based clinical narratives and pathology reports with convolutional neural networks is a promising method for extracting heterogeneous molecular data in neuro-oncology for large-scale data analysis.
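
The architecture named in Methods (a convolutional layer followed by two densely connected layers of 30 rectified linear nodes) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not state the embedding size, filter count, kernel width, label set, or training procedure, so `EMBED_DIM`, `NUM_FILTERS`, `KERNEL_SIZE`, `LABELS`, the `TextCNN` class, and the example report text are all hypothetical. The weights are random and untrained, so the output probabilities carry no clinical meaning.

```python
import math
import random
import re

EMBED_DIM = 8      # assumed; not stated in the abstract
NUM_FILTERS = 16   # assumed
KERNEL_SIZE = 3    # assumed: each filter spans 3 consecutive tokens
HIDDEN = 30        # from the abstract: two dense layers of 30 ReLU nodes
LABELS = ["mutant", "wildtype", "not reported"]  # hypothetical label set


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rand_matrix(rng, rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def relu(v):
    return [max(0.0, x) for x in v]


def dense(v, w, b):
    """Affine map v @ w + b, where w has shape (len(v), len(b))."""
    return [sum(x * w[i][j] for i, x in enumerate(v)) + b[j]
            for j in range(len(b))]


def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


class TextCNN:
    """Embedding -> 1D convolution -> global max pool -> 2 x 30 ReLU -> softmax."""

    def __init__(self, vocab, seed=0):
        rng = random.Random(seed)
        # Index 0 is reserved for unknown tokens and padding.
        self.vocab = {word: i + 1 for i, word in enumerate(vocab)}
        self.embed = rand_matrix(rng, len(vocab) + 1, EMBED_DIM)
        self.conv_w = rand_matrix(rng, KERNEL_SIZE * EMBED_DIM, NUM_FILTERS)
        self.conv_b = [0.0] * NUM_FILTERS
        self.w1 = rand_matrix(rng, NUM_FILTERS, HIDDEN)
        self.b1 = [0.0] * HIDDEN
        self.w2 = rand_matrix(rng, HIDDEN, HIDDEN)
        self.b2 = [0.0] * HIDDEN
        self.w_out = rand_matrix(rng, HIDDEN, len(LABELS))
        self.b_out = [0.0] * len(LABELS)

    def predict_proba(self, text):
        ids = [self.vocab.get(tok, 0) for tok in tokenize(text)]
        ids += [0] * max(0, KERNEL_SIZE - len(ids))  # pad very short inputs
        vectors = [self.embed[i] for i in ids]
        # 1D convolution: apply every filter to each window of tokens.
        windows = []
        for start in range(len(vectors) - KERNEL_SIZE + 1):
            flat = [x for vec in vectors[start:start + KERNEL_SIZE] for x in vec]
            windows.append(relu(dense(flat, self.conv_w, self.conv_b)))
        # Global max pooling: keep each filter's strongest response.
        pooled = [max(column) for column in zip(*windows)]
        hidden = relu(dense(pooled, self.w1, self.b1))
        hidden = relu(dense(hidden, self.w2, self.b2))
        return softmax(dense(hidden, self.w_out, self.b_out))


# Hypothetical usage on an invented report fragment.
model = TextCNN(vocab=["idh1", "r132h", "mutation", "detected", "negative"])
probs = model.predict_proba("IDH1 R132H mutation detected by sequencing")
print(dict(zip(LABELS, probs)))
```

Because the max-pooling step keeps the strongest filter response wherever it occurs in the document, a phrase such as "IDH1 R132H mutation detected" can contribute the same signal regardless of its position or the surrounding wording, which is one plausible reason a convolutional model tolerates the heterogeneity that defeats fixed rules. One such classifier per molecular marker (IDH, MGMT, 1p/19q) is an assumption of this sketch, not something the abstract specifies.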