Matthias Dorfer, Gerhard Widmer,
"Training General-Purpose Audio Tagging Networks with Noisy Labels and Iterative Self-Verification."
In: Proceedings of workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2018
Original title:
Training General-Purpose Audio Tagging Networks with Noisy Labels and Iterative Self-Verification.
Title language:
English
Original book title:
Proceedings of workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)
Original abstract:
This paper describes our submission to the first Freesound general-purpose
audio tagging challenge carried out within the DCASE
2018 challenge. Our proposal is based on a fully convolutional
neural network that predicts one out of 41 possible audio class labels
when given an audio spectrogram excerpt as an input. What
makes this classification dataset and the task in general special is
the fact that only 3,700 of the 9,500 provided training examples are
delivered with manually verified ground truth labels. The remaining
non-verified observations are expected to contain a substantial
amount of label noise (up to 30-35% in the "worst" categories). We
propose to address this issue by a simple, iterative self-verification
process, which gradually shifts unverified labels into the verified,
trusted training set. The decision criterion for self-verifying a training
example is the prediction consensus of a previous snapshot of
the network on multiple short sliding window excerpts of the training
example at hand. On the unseen test data, an ensemble of
three networks trained with this self-verification approach achieves
a mean average precision (MAP@3) of 0.951. This is the second
best out of 558 submissions to the corresponding Kaggle challenge.
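
The self-verification criterion described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the window length, hop size, or how strong the consensus must be, so `window`, `hop`, and `min_agreement` are hypothetical parameters, and `toy_snapshot` is a made-up stand-in for a previously saved network snapshot that returns one probability per class.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one spectrogram time frame (values per frequency bin)
Predictor = Callable[[Sequence[Frame]], Sequence[float]]  # excerpt -> class probabilities


def sliding_windows(frames: Sequence[Frame], window: int, hop: int) -> List[Sequence[Frame]]:
    """Cut a spectrogram (a sequence of time frames) into fixed-length excerpts."""
    if len(frames) <= window:
        return [frames]  # clip shorter than one window: use it as a single excerpt
    return [frames[s:s + window] for s in range(0, len(frames) - window + 1, hop)]


def consensus_fraction(predict: Predictor, frames: Sequence[Frame], label: int,
                       window: int = 128, hop: int = 64) -> float:
    """Fraction of excerpts whose top predicted class equals the provided label."""
    votes = []
    for excerpt in sliding_windows(frames, window, hop):
        probs = predict(excerpt)
        votes.append(max(range(len(probs)), key=probs.__getitem__))  # argmax
    return sum(v == label for v in votes) / len(votes)


def self_verify(predict: Predictor, frames: Sequence[Frame], label: int,
                min_agreement: float = 1.0) -> bool:
    """Accept an unverified label if enough excerpts agree with it.

    min_agreement is an assumed threshold; 1.0 demands unanimous agreement.
    """
    return consensus_fraction(predict, frames, label) >= min_agreement


def toy_snapshot(excerpt: Sequence[Frame]) -> List[float]:
    """Hypothetical network snapshot that always favours class 2 of 41."""
    probs = [0.01] * 41
    probs[2] = 0.6
    return probs


clip = [[0.0] * 64 for _ in range(400)]  # 400 time frames, 64 frequency bins
print(self_verify(toy_snapshot, clip, label=2))  # unverified label matches the consensus
print(self_verify(toy_snapshot, clip, label=5))  # unverified label contradicts it
```

In an iterative scheme of the kind the abstract outlines, examples accepted by such a check would be moved into the trusted training set before the next training round, while rejected examples would remain unverified.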