Accurate Prostate Lesion Classification Using Convolutional Neural Network and Weighted Extreme Learning Machine
Zong W, Lee J, Liu C, Carver E, Mohamed E, Chetty I, Pantelic M, Hearshen D, Movsas B, and Wen N. Accurate Prostate Lesion Classification Using Convolutional Neural Network and Weighted Extreme Learning Machine. J Med Phys 2019; 46(6):e108.
J Med Phys
Purpose: To accurately classify lesion malignancy for prostate cancer (PCa) patients from multiparametric MR imaging (mpMRI), and to tackle the small-sample problem by combining the feature extraction power of convolutional neural networks (CNNs) with the discriminant power of traditional shallow classifiers. Methods: A retrospective collection of 201 patients with 320 lesions from the SPIE-AAPM-NCI PROSTATEx Challenge (https://doi.org/10.7937/k9tcia.2017.murs5cl) was used for training and validation. Lesions with biopsy-proven Gleason Grading Group 1 were defined as benign and those with Group 2 and above as malignant. All patients were scanned with an mpMRI protocol including T2-weighted (T2W), diffusion-weighted and dynamic contrast-enhanced (DCE) imaging. All modalities were registered to T2W. Image rotation and scaling were used to augment the data in order to minimize bias caused by the imbalance between the numbers of malignant and benign lesions. A CNN with four convolutional layers was trained from scratch. Features learned in each layer were then visualized and quantitatively assessed by inputting them to a weighted extreme learning machine (wELM) classifier, which automatically emphasizes samples from minority categories. Results: Experiments with 10-fold cross-validation showed the most accurate combination of modalities to be T2W, apparent diffusion coefficient (ADC) and b-value maps (b = 50 s/mm²), with optimal patch sizes ranging from 30×30 to 34×34 pixels. During phase 1 of CNN training, sensitivity, specificity and G-mean over the 10 folds, shown as mean (std.), were 0.53 (0.22), 0.83 (0.05), and 0.65 (0.14), respectively. Features from the first convolutional layer were found to be the most discriminating, with respective results of 0.80 (0.16), 0.79 (0.11), and 0.78 (0.09). Conclusion: This work found substantial performance improvement from combining features learned by the CNN with the shallow wELM classifier versus the CNN alone.
Qualitative and quantitative evaluation of the features learned in each CNN layer also gave a better understanding of how deep learning models interpret medical images.
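
The classifier and metric named in the abstract can be illustrated with a minimal sketch. It assumes the commonly described weighted ELM formulation: a random, untrained sigmoid hidden layer, per-sample weights equal to the inverse of the class count, and a ridge-regularized least-squares solve for the output weights. The class name `WeightedELM`, the hidden-layer size, the regularization constant `C`, and the input feature matrix are illustrative assumptions, not the settings or code used in this work. G-mean is taken here as the geometric mean of sensitivity and specificity.

```python
import numpy as np


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = malignant)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == 1] == 1)
    specificity = np.mean(y_pred[y_true == 0] == 0)
    return float(np.sqrt(sensitivity * specificity))


class WeightedELM:
    """Illustrative weighted ELM; hyperparameters are assumptions."""

    def __init__(self, n_hidden=50, C=1.0, seed=0):
        self.n_hidden = n_hidden
        self.C = C
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W_in + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Hidden-layer parameters are drawn once and never trained.
        self.W_in = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One column per class, coded +1 for the true class and -1 otherwise.
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)
        # Each sample is weighted by 1 / (size of its class), so the
        # minority class contributes as much to the loss as the majority.
        counts = np.array([(y == c).sum() for c in self.classes_])
        w = 1.0 / counts[np.searchsorted(self.classes_, y)]
        HtW = H.T * w  # equivalent to H^T @ diag(w)
        A = np.eye(self.n_hidden) / self.C + HtW @ H
        self.beta = np.linalg.solve(A, HtW @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

In a pipeline like the one described, `X` would hold the flattened activations of one convolutional layer for each lesion patch and `y` the benign/malignant labels; `WeightedELM().fit(X_train, y_train).predict(X_test)` followed by `g_mean(y_test, y_pred)` would then score that layer's features on one fold.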