Develop a pathway level classifier to identify clinically relevant subtypes of glioblastoma

Document Type

Conference Proceeding

Publication Date


Publication Title

Cancer Res


Introduction: Glioblastoma multiforme (GBM) is a malignant grade IV brain tumor. With standard treatment, the median survival of adults with GBM, IDH-wildtype, is approximately 11-15 months. We aim to identify clinically relevant GBM subtypes based on cell signaling pathway information to facilitate personalized medicine.

Methods: We rebuilt a MultiPLIER model for GBM using the recount2 compendium and the updated v7.0 canonical pathways from the MSigDB database. MultiPLIER is a machine learning model built on the pathway-level information extractor (PLIER), a matrix factorization approach that identifies specific pathways regulating gene expression by combining a large public dataset, spanning multiple tissues and biological conditions, with prior biological knowledge. PLIER takes two inputs: a gene expression matrix and prior knowledge in the form of gene sets. It decomposes the expression data into eigengene-like latent variables (LVs) that approximate relevant pathways, together with a sparse matrix that specifies which prior gene sets and pathways contribute to each LV. MultiPLIER is an unsupervised transfer learning framework that transfers a PLIER model to a specific dataset or disease with a smaller sample size, so that knowledge learned from an extensive collection of datasets can be used to discover unseen patterns in the target domain. We used 2,315 gene sets covering 12,604 genes as prior biological information for PLIER training. The decomposition produced 903 LVs, 200 of which were associated with prior gene sets with high confidence (area under the curve (AUC) > 0.7, false discovery rate (FDR) < 0.05). We then projected the TCGA GBM HT-HG-U133A dataset, comprising 526 primary solid tumor samples and 10 solid tissue normal samples, into the 903-dimensional MultiPLIER latent space to obtain a GBM-MultiPLIER model.

Results: We used univariate Cox models to select 169 survival-related LVs (p < 0.05) for the GBM subtype analysis. We then applied an unsupervised clustering method, consensus clustering (Monti et al., 2003), to discover GBM subtypes.
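The projection of the TCGA samples into the 903-dimensional LV space, described in the Methods above, can be illustrated with a minimal sketch. It assumes that each gene is z-scored across the new samples and that the new LV scores are a ridge-regularized least-squares fit of that expression onto the pretrained gene-by-LV loading matrix Z, that is B_new = (Z^T Z + λI)^(-1) Z^T Y, restricted to genes shared by the model and the new data. The helper names (`row_zscore`, `project_to_lvs`), the penalty `l2`, the gene IDs, and the random loadings are hypothetical placeholders, not values from the study.

```python
import numpy as np

def row_zscore(expr):
    """Standardize each gene (row) to mean 0 and SD 1 across samples."""
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes become 0 instead of dividing by 0
    return (expr - mean) / sd

def project_to_lvs(expr, expr_genes, Z, model_genes, l2):
    """Project new samples (genes x samples) onto loadings Z (genes x LVs).

    Assumed ridge projection: B_new = (Z^T Z + l2 * I)^(-1) Z^T Y, computed
    only on genes present in both the new data and the model.
    """
    expr_index = {g: i for i, g in enumerate(expr_genes)}
    shared = [(expr_index[g], j) for j, g in enumerate(model_genes) if g in expr_index]
    Y = row_zscore(expr)[[i for i, _ in shared], :]
    Zc = Z[[j for _, j in shared], :]
    k = Zc.shape[1]
    # Solve the regularized normal equations instead of forming an explicit inverse.
    return np.linalg.solve(Zc.T @ Zc + l2 * np.eye(k), Zc.T @ Y)

# Toy usage with hypothetical genes and loadings (3 LVs instead of 903).
rng = np.random.default_rng(0)
model_genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
Z = np.abs(rng.normal(size=(6, 3)))
expr_genes = ["G2", "G9", "G1", "G5", "G6", "G3", "G4", "G7"]  # G9, G7 not in model
expr = rng.normal(size=(8, 5))                                   # 8 genes x 5 samples
B_new = project_to_lvs(expr, expr_genes, Z, model_genes, l2=1.0)
print(B_new.shape)  # (3, 5): one score per LV per sample
```

Genes absent from the model are simply dropped, so the resulting LV-by-sample matrix can be fed directly to downstream survival screening and clustering.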
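The core of consensus clustering as described by Monti et al. (2003) is a consensus matrix: the data are repeatedly subsampled and clustered, and each entry records the fraction of runs, among those in which two samples were both drawn, that placed them in the same cluster. The sketch below assumes k-means as the base clusterer (with farthest-point initialization), 80% subsampling without replacement, and a fixed number of resamples; the abstract does not state these settings, and `kmeans` and `consensus_matrix` are hypothetical helpers. Samples are rows, so an LV-by-sample matrix would be transposed first.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-point initialization; returns labels."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def consensus_matrix(X, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair of samples co-clusters."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))  # runs where i and j got the same label
    sampled = np.zeros((n, n))   # runs where i and j were both drawn
    m = int(round(frac * n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

# Toy usage: two well-separated groups of 20 samples in a 5-LV space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(20, 5)), rng.normal(10, 0.5, size=(20, 5))])
C = consensus_matrix(X, k=2, n_resamples=50)
print(C[:20, :20].mean(), C[:20, 20:].mean())
```

In the full procedure, final labels are usually obtained by clustering 1 - C (for example with hierarchical clustering), and the number of clusters is chosen by comparing consensus matrices across several values of k.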
Five subtypes were identified, and their survival differed significantly (log-rank test, p = 0.00009). We also evaluated silhouette width, the statistical significance of the clustering, and differential expression using the Bioconductor package CancerSubtypes in R. We found patterns of differentially expressed LVs (adjusted p-value < 0.01) between each subtype and the normal samples, and 37 LVs were differentially expressed in all five subtypes relative to normal samples (adjusted p-value < 0.01). Subtype 3, the subtype with the best survival, has one LV with an estimated area under the ROC curve (AUC) of 0.81 (95% confidence interval, 0.75 to 0.86) when subtype 3 is compared with the other tumor samples.

Conclusions: The GBM-MultiPLIER model can reveal consistent pathway or gene set differences across subtypes and capture subtype-specific patterns. These findings provide an opportunity to further untangle the underlying biological meaning.
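The single-LV AUC reported for subtype 3 versus the other tumor samples can be estimated with the Mann-Whitney formulation of the ROC AUC: the probability that a subtype 3 sample scores higher on the LV than another tumor sample, counting ties as one half. The abstract does not say how its 95% confidence interval was computed; the sketch below assumes a stratified percentile bootstrap (DeLong's method is a common alternative). The helper names and the simulated LV values are hypothetical and do not reproduce the reported numbers.

```python
import numpy as np

def auc(pos, neg):
    """Mann-Whitney AUC: P(pos > neg) + 0.5 * P(pos == neg) over all pairs."""
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def bootstrap_auc_ci(pos, neg, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI, resampling each group separately."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        stats[b] = auc(rng.choice(pos, size=len(pos), replace=True),
                       rng.choice(neg, size=len(neg), replace=True))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return auc(pos, neg), (float(lo), float(hi))

# Toy usage with simulated LV scores (hypothetical group sizes and effect).
rng = np.random.default_rng(42)
lv_subtype3 = rng.normal(1.0, 1.0, size=60)
lv_others = rng.normal(0.0, 1.0, size=300)
est, (lo, hi) = bootstrap_auc_ci(lv_subtype3, lv_others)
print(f"AUC {est:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Resampling the two groups separately keeps the subtype 3 and comparison group sizes fixed in every bootstrap replicate, which matches the fixed one-versus-rest comparison.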