Florian Schmid, Shahed Masoudian, Khaled Koutini, Gerhard Widmer,
"Knowledge Distillation From Transformers For Low-Complexity Acoustic Scene Classification"
in: Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022), 2022
Original Title:
Knowledge Distillation From Transformers For Low-Complexity Acoustic Scene Classification
Language of the Title:
English
Original Book Title:
Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022)
Original Abstract:
Knowledge Distillation (KD) is known for its ability to compress large models into low-complexity solutions while preserving high predictive performance. In Acoustic Scene Classification (ASC),
this ability has recently been exploited successfully, as underlined by the fact that three of the top four systems in the low-complexity ASC task of the DCASE'21 challenge relied on KD. Current KD solutions for ASC mainly use large-scale CNNs or specialist ensembles to
derive superior teacher predictions. In this work, we use the Audio Spectrogram Transformer model PaSST, pre-trained on Audioset, as a teacher model. We show how the pre-trained PaSST model can be properly trained downstream on the TAU Urban Acoustic
Scenes 2022 Mobile development dataset and how to distill the knowledge into a low-complexity CNN student. We study the effect of using teacher ensembles, using teacher predictions on extended audio sequences, and using Audioset as an additional dataset for
knowledge transfer. Additionally, we compare the effectiveness of Mixup and Freq-MixStyle to improve performance and enhance device generalization. The described system achieved rank 1 in the Low-Complexity ASC Task of the DCASE'22 challenge.
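The abstract does not spell out the distillation objective, but a standard formulation of the KD loss used in such teacher-student setups (Hinton-style, with a temperature-softened teacher distribution) can be sketched as follows. Function names, the temperature `T`, and the weighting `alpha` are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution,
    # exposing the teacher's "dark knowledge" about class similarities.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and KL divergence between the
    softened teacher and student distributions (illustrative sketch only)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures, as in standard KD practice.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]
                   + 1e-12)
    return float(np.mean(alpha * hard + (1 - alpha) * temperature**2 * kl))
```

In the setting the abstract describes, the teacher logits would come from the (possibly ensembled) pre-trained PaSST model and the student logits from the low-complexity CNN; the actual loss weighting and temperature used by the authors are given in the paper itself.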