Publication details

Citation:	Florian Schmid, Tobias Morocutti, Jonathan Greif, Gerhard Widmer, "Multi-Iteration Multi-Stage Fine-Tuning of Transformers for Sound Event Detection with Heterogeneous Datasets", in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024), 2024
Original title:	Multi-Iteration Multi-Stage Fine-Tuning of Transformers for Sound Event Detection with Heterogeneous Datasets
Language of the title:	English
Original book title:	Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)
Original abstract:	A central problem in building effective sound event detection systems is the lack of high-quality, strongly annotated sound event datasets. For this reason, Task 4 of the DCASE 2024 challenge proposes learning from two heterogeneous datasets, including audio clips labeled with varying annotation granularity and with different sets of possible events. We propose a multi-iteration, multi-stage procedure for fine-tuning Audio Spectrogram Transformers on the joint DESED and MAESTRO Real datasets. The first stage closely matches the baseline system setup and trains a CRNN model while keeping the pre-trained transformer model frozen. In the second stage, both CRNN and transformer are fine-tuned using heavily weighted self-supervised losses. After the second stage, we compute strong pseudo-labels for all audio clips in the training set using an ensemble of fine-tuned transformers. Then, in a second iteration, we repeat the two-stage training process and include a distillation loss based on the pseudo-labels, achieving a new single-model, state-of-the-art performance on the public evaluation set of DESED with a PSDS1 of 0.692. A single model and an ensemble, both based on our proposed training procedure, ranked first in Task 4 of the DCASE Challenge 2024.
Language of the abstract:	English
Year of publication:	2024
Number of pages:	5
URL for further information:	https://dcase.community/documents/workshop2024/proceedings/DCASE2024Workshop_Schmid_28.pdf
Reach:	international
Publication type:	Article / paper in conference proceedings (refereed)
Authors:	Florian Schmid, Tobias Morocutti, Jonathan Greif, Gerhard Widmer
Research units:	Institute of Computational Perception

Fields of science:	Computer Science (ÖSTAT:102), Artificial Intelligence (ÖSTAT:102001), Image Processing (ÖSTAT:102003), Information Systems (ÖSTAT:102015), Audiovisual Media (ÖSTAT:202002)
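
The training procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the averaging of ensemble outputs, the binary cross-entropy form of the distillation loss, and the use of that loss in both stages of the second iteration are all assumptions made for illustration. Only the stage order and the freezing of the transformer in stage 1 are taken from the abstract.

```python
from math import log


def training_plan(num_iterations=2):
    """List the training stages in the order the abstract describes them."""
    plan = []
    for iteration in range(1, num_iterations + 1):
        # Pseudo-labels only exist after the first iteration has finished.
        # Assumption: the distillation loss is then active in both stages.
        use_distillation = iteration > 1
        # Stage 1: train the CRNN, keep the pre-trained transformer frozen.
        plan.append({"iteration": iteration, "stage": 1,
                     "train_crnn": True, "train_transformer": False,
                     "distillation": use_distillation})
        # Stage 2: fine-tune both the CRNN and the transformer.
        plan.append({"iteration": iteration, "stage": 2,
                     "train_crnn": True, "train_transformer": True,
                     "distillation": use_distillation})
    return plan


def ensemble_pseudo_labels(model_outputs):
    """Average frame-wise class probabilities over all ensemble members.

    model_outputs[m][t][c] is the probability from model m for class c at
    frame t. Plain averaging is an assumed aggregation rule.
    """
    n_models = len(model_outputs)
    n_frames = len(model_outputs[0])
    n_classes = len(model_outputs[0][0])
    return [[sum(m[t][c] for m in model_outputs) / n_models
             for c in range(n_classes)]
            for t in range(n_frames)]


def distillation_loss(student, pseudo, eps=1e-7):
    """Mean binary cross-entropy between student outputs and soft
    pseudo-labels (an assumed form of the distillation loss)."""
    total, count = 0.0, 0
    for s_frame, p_frame in zip(student, pseudo):
        for s, p in zip(s_frame, p_frame):
            s = min(max(s, eps), 1 - eps)  # clamp to avoid log(0)
            total -= p * log(s) + (1 - p) * log(1 - s)
            count += 1
    return total / count
```

With the default of two iterations, `training_plan()` yields four stages, and only the last two carry the distillation term. The loss is lowest when the student's frame-wise outputs match the ensemble's pseudo-labels.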
