Florian Schmid, Shahed Masoudian, Khaled Koutini, Gerhard Widmer,
"Distilling Knowledge For Low-Complexity Convolutional Neural Networks From a Patchout Audio Transformer",
in Detection and Classification of Acoustic Scenes and Events (DCASE2022 Challenge), Technical Report, 2022.
Original Title:
Distilling Knowledge For Low-Complexity Convolutional Neural Networks From a Patchout Audio Transformer
Title Language:
English
Original Book Title:
Detection and Classification of Acoustic Scenes and Events (DCASE2022 Challenge), Technical Report
Original Abstract:
In this technical report, we describe the CP-JKU team's submission for Task 1 Low-Complexity Acoustic Scene Classification of the DCASE 2022 challenge. We use Knowledge Distillation to teach low-complexity CNN student models from Patchout Spectrogram Transformer (PaSST) models. We use PaSST models pre-trained on AudioSet and fine-tune them on the TAU Urban Acoustic Scenes 2022 Mobile development dataset. We experiment with an ensemble of teachers, different receptive fields of the student models, and mixing frequency-wise statistics of spectrograms to enhance generalization to unseen devices. Finally, the student models are quantized so that inference computations use 8-bit integers, simulating the low-complexity constraints of edge devices.
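The abstract does not spell out the distillation objective; a minimal sketch of the standard soft-target distillation loss (temperature-scaled KL divergence between teacher and student outputs, as commonly used for teacher-student training) could look as follows. All function names and the temperature value here are illustrative assumptions, not taken from the report.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T yields a softer distribution.
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between the softened teacher and student distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

In practice this term is typically combined with a regular cross-entropy loss on the ground-truth labels; when student and teacher agree exactly, the distillation term is zero.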