Deep active learning for Interictal Ictal Injury Continuum EEG patterns
Ge W, Jing J, An S, Herlopian A, Ng M, Struck AF, Appavu B, Johnson EL, Osman G, Haider HA, Karakis I, Kim JA, Halford JJ, Dhakar MB, Sarkis RA, Swisher CB, Schmitt S, Lee JW, Tabaeizadeh M, Rodriguez A, Gaspard N, Gilmore E, Herman ST, Kaplan PW, Pathmanathan J, Hong S, Rosenthal ES, Zafar S, Sun J, and Westover MB. Deep active learning for Interictal Ictal Injury Continuum EEG patterns. J Neurosci Methods 2021; 351:108966.
Journal of neuroscience methods
OBJECTIVES: Seizures and seizure-like electroencephalography (EEG) patterns, collectively referred to as "ictal interictal injury continuum" (IIIC) patterns, are commonly encountered in critically ill patients. Automated detection is important for patient care and to enable research. However, training accurate detectors requires a large labeled dataset. Active Learning (AL) may help select informative examples to label, but the optimal AL approach remains unclear.
METHODS: We assembled >200,000 h of EEG from 1,454 hospitalized patients. From these, we collected 9,808 labeled and 120,000 unlabeled 10-second EEG segments. Labels included 6 IIIC patterns. In each AL iteration, a Dense-Net Convolutional Neural Network (CNN) learned vector representations for EEG segments using the available labels; these representations were used to create a 2D embedding map. Nearest-neighbor label spreading within the embedding map was used to create additional pseudo-labeled data. A second Dense-Net was then trained using both real and pseudo-labels. We evaluated several strategies for selecting candidate points for experts to label next. Finally, we compared two methods for class balancing within queries: standard balanced-based querying (SBBQ) and high confidence spread-based balanced querying (HCSBBQ).
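The nearest-neighbor label spreading step described above can be illustrated with a minimal sketch. The abstract does not give the neighborhood size, the agreement rule, or the distance metric, so the function name `spread_labels` and the parameters `k` and `min_agreement` are assumptions made for illustration only, not the authors' implementation. The sketch takes points in a 2D embedding and gives an unlabeled point a pseudo-label only when enough of its nearest labeled neighbors agree.

```python
import numpy as np

def spread_labels(emb, labels, k=5, min_agreement=0.8):
    """Pseudo-label unlabeled points from their k nearest labeled neighbors.

    emb:    (n, 2) array of embedding coordinates.
    labels: (n,) integer array; class index for labeled points, -1 for unlabeled.
    k, min_agreement: hypothetical settings, not taken from the paper.
    Returns a copy of `labels` with confident unlabeled points filled in.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels, dtype=int)
    labeled_idx = np.flatnonzero(labels >= 0)
    unlabeled_idx = np.flatnonzero(labels < 0)
    out = labels.copy()
    if labeled_idx.size == 0 or unlabeled_idx.size == 0:
        return out

    k = min(k, labeled_idx.size)
    n_classes = int(labels[labeled_idx].max()) + 1

    # Euclidean distance from every unlabeled point to every labeled point.
    diff = emb[unlabeled_idx, None, :] - emb[None, labeled_idx, :]
    dist = np.linalg.norm(diff, axis=2)

    # Class labels of the k nearest labeled neighbors of each unlabeled point.
    nearest = np.argsort(dist, axis=1)[:, :k]
    neighbor_labels = labels[labeled_idx][nearest]

    for row, point in enumerate(unlabeled_idx):
        votes = np.bincount(neighbor_labels[row], minlength=n_classes)
        best = int(votes.argmax())
        # Keep the point unlabeled unless the neighbors mostly agree.
        if votes[best] / k >= min_agreement:
            out[point] = best
    return out

# Toy embedding: two labeled clusters, two nearby unlabeled points,
# and one unlabeled point lying between the clusters.
emb = [[0, 0], [0, 1], [1, 0],
       [10, 0], [10, 1], [9, 0],
       [0.2, 0.3], [9.8, 0.3], [5.2, 0.5]]
labels = [0, 0, 0, 1, 1, 1, -1, -1, -1]
pseudo = spread_labels(emb, labels, k=3, min_agreement=0.8)
print(pseudo.tolist())
```

In this toy setting, points inside a cluster inherit that cluster's label, while the point between clusters stays unlabeled because its neighbors disagree. A brute-force distance matrix is used for clarity; for 120,000 segments a spatial index would be the more practical choice.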
RESULTS: Our results show: 1) Label spreading increased convergence speed for AL. 2) All query criteria produced results similar to random sampling. 3) HCSBBQ query balancing performed best. Using label spreading and HCSBBQ query balancing, we were able to train models approaching expert-level performance across all pattern categories after obtaining ~7,000 expert labels.
CONCLUSION: Our results provide guidance regarding the use of AL to efficiently label large EEG datasets from critically ill patients.