Gleason grade group predictions from mp-MRI of prostate cancer patients using auto deep learning

Document Type

Conference Proceeding

Publication Date


Publication Title

Cancer Res


Though histopathology remains the gold standard, there has been significant interest in predicting Gleason Grade using noninvasive imaging such as mp-MRI. Most studies simplify the task into binary classification because of the high uncertainty in each individual group. Handcrafted radiomic features have been heavily investigated but are prone to errors arising from region-of-interest definition, variation in feature extraction, etc. We proposed an automated deep learning framework (auto-Keras) to predict the Grade Group (GG) directly from the 3D data of the whole prostate gland. The training cohort A consisted of 96 PCa patients from the SPIE-AAPM-NCI Challenge. The number of patients in Groups 1 through 5 was 30, 35, 18, 7, and 6, respectively. The testing cohort B consisted of 34 PCa patients from our institute (10, 14, 4, 3, and 3). We resampled and rigidly registered ADC and T2WI. N4 bias correction was applied to correct intensity non-uniformity. For each slice, we performed Gaussian blurring followed by cropping of the prostate using contours delineated by two clinicians. We tested five scenarios: T2WI alone as input, ADC alone, both together, two parallel inputs followed by feature concatenation, and prediction ensemble. The augmentation search space included translation, flip, rotation, zooming, and contrast. The architecture search space comprised vanilla, ResNet, and Xception architectures. With ADC alone, the model detected 75% of patients in Group 3. Using T2WI and ADC as input, 46% of Group 2 and 40% of Group 1 were identified. Since GG 2 is less aggressive and has a favorable outcome, we further studied the performance of classifying 1 vs. 2-5 and 1-2 vs. 3-5. For 1-2 vs. 3-5, the models' precision and recall were 91% and 72% for 1-2, and 60% and 24% for 3-5. For 1 vs. 2-5, precision was 96% and recall was 73% for 2-5.
The model performed better at predicting lower GG when the input contained both T2WI and ADC, and better at higher GG when the features were concatenated at the output level.
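
The binary evaluations described above (1 vs. 2-5 and 1-2 vs. 3-5) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `dichotomize` and `precision_recall` and the example labels are hypothetical, chosen only to show how five-class grade group predictions might be collapsed at a threshold and then scored with precision and recall.

```python
from typing import List, Sequence, Tuple


def dichotomize(grade_groups: Sequence[int], threshold: int) -> List[int]:
    """Map grade groups 1-5 to 1 (at or above threshold) or 0 (below it).

    threshold=2 gives the 1 vs. 2-5 split; threshold=3 gives 1-2 vs. 3-5.
    """
    return [1 if g >= threshold else 0 for g in grade_groups]


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for illustration only; these are NOT data from the study.
true_gg = [1, 1, 2, 2, 3, 4, 5, 2]
pred_gg = [1, 2, 2, 1, 3, 2, 4, 3]

for threshold, name in [(2, "1 vs. 2-5"), (3, "1-2 vs. 3-5")]:
    y_true = dichotomize(true_gg, threshold)
    y_pred = dichotomize(pred_gg, threshold)
    # positive=1 scores the higher-grade class; positive=0 scores the lower one.
    prec_hi, rec_hi = precision_recall(y_true, y_pred, positive=1)
    prec_lo, rec_lo = precision_recall(y_true, y_pred, positive=0)
    print(f"{name}: higher class P={prec_hi:.2f} R={rec_hi:.2f}; "
          f"lower class P={prec_lo:.2f} R={rec_lo:.2f}")
```

Scoring each side of the split separately, as the abstract does for 1-2 and 3-5, matters because the cohorts are imbalanced: a model can reach high precision on the majority class while recalling few of the minority cases.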