Publication details

Citation:	Florian Schmid, Khaled Koutini, Gerhard Widmer, "Efficient Large-Scale Audio Tagging Via Transformer-to-CNN Knowledge Distillation", in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5, May 2023
Original title:	Efficient Large-Scale Audio Tagging Via Transformer-to-CNN Knowledge Distillation
Language of the title:	English
Original book title:	Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
Original abstract:	Audio Spectrogram Transformer models rule the field of Audio Tagging, outrunning previously dominating Convolutional Neural Networks (CNNs). Their superiority is based on the ability to scale up and exploit large-scale datasets such as AudioSet. However, Transformers are demanding in terms of model size and computational requirements compared to CNNs. We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex transformers. The proposed training schema and the efficient CNN design based on MobileNetV3 results in models outperforming previous solutions in terms of parameter and computational efficiency and prediction performance. We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of .483 mAP on AudioSet.
Language of the abstract:	English
Page reference:	pp. 1-5
Month of publication:	5
Year of publication:	2023
Number of pages:	5
DOI:	10.1109/ICASSP49357.2023.10096110
URL for further information:	https://ieeexplore.ieee.org/abstract/document/10096110
Reach:	international
Publication type:	Article / paper in conference proceedings (refereed)
Authors:	Florian Schmid, Khaled Koutini, Gerhard Widmer
Research units:	Institut für Computational Perception

Fields of science:	Computer Sciences (ÖSTAT:102) Artificial Intelligence (ÖSTAT:102001) Image Processing (ÖSTAT:102003) Information Systems (ÖSTAT:102015) Audiovisual Media (ÖSTAT:202002)
