Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification
Language of the presentation title:
English
Original conference title:
Detection and Classification of Acoustic Scenes and Events 2019
Language of the conference title:
English
Original abstract:
Distribution mismatches between the data seen at training time and at
application time remain a major challenge in all application areas
of machine learning. We study this problem in the context of machine
listening (Task 1b of the DCASE 2019 Challenge). We propose
a novel approach to learn domain-invariant classifiers in an
end-to-end fashion by enforcing equal hidden layer representations
for domain-parallel samples, i.e., time-aligned recordings from different
recording devices. No classification labels are needed for
our domain adaptation (DA) method, which makes the data collection
process cheaper. We show that our method improves the target
domain accuracy for both a toy dataset and an urban acoustic
scenes dataset. We further compare our method to Maximum Mean
Discrepancy-based DA and find it more robust to the choice of DA
parameters. Our submission to DCASE 2019 Task 1b, based on this
method, achieved 4th place in the team ranking.
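
To make the alignment idea concrete, the following is a minimal sketch, not the authors' published code. It assumes a PyTorch CNN split into a feature extractor and a classifier head, and combines a supervised loss on labeled source-device clips with an unsupervised MSE penalty that pulls the hidden representations of time-aligned parallel recordings together; the names (SceneCNN, training_step) and the lambda_da weight are illustrative assumptions.

    # Minimal sketch of device-invariance via parallel-sample alignment.
    # Hypothetical names; only the general idea follows the abstract.
    import torch
    import torch.nn as nn

    class SceneCNN(nn.Module):
        def __init__(self, n_classes: int = 10):
            super().__init__()
            # Feature extractor whose hidden representation is
            # aligned across recording devices.
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.classifier = nn.Linear(16, n_classes)

        def forward(self, x):
            h = self.features(x)
            return self.classifier(h), h

    def training_step(model, x_src, y_src, x_par_a, x_par_b, lambda_da=1.0):
        """One step: supervised loss on labeled source-device clips plus
        an unsupervised alignment loss on time-aligned parallel clips."""
        logits, _ = model(x_src)
        cls_loss = nn.functional.cross_entropy(logits, y_src)
        # The parallel pair needs no classification labels: only its
        # hidden representations are pushed together, which encourages
        # device-invariant features.
        _, h_a = model(x_par_a)
        _, h_b = model(x_par_b)
        da_loss = nn.functional.mse_loss(h_a, h_b)
        return cls_loss + lambda_da * da_loss

Because the alignment term is label-free, the parallel recordings can be collected without annotation, which is the cost advantage the abstract points to.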