Paul Primus, Gerhard Widmer,
"Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations"
in: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022), 2022
Original title:
Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations
Language of the title:
English
Original book title:
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022)
Original abstract:
The absence of large labeled datasets remains a significant challenge in many application areas of deep learning. Researchers and practitioners typically resort to transfer learning and data augmentation to alleviate this issue. We study these strategies in the context of audio retrieval with natural language queries (Task 6b of the DCASE 2022 Challenge). Our proposed system uses pre-trained embedding models to project recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close. We employ various data augmentation techniques on audio and text inputs and systematically tune their corresponding hyperparameters with sequential model-based optimization. Our results show that the augmentation strategies used reduce overfitting and improve retrieval performance.
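
The retrieval step implied by a shared audio-caption space can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional embeddings, the file names, and the helper names `cosine_similarity` and `rank_recordings` are hypothetical, and cosine similarity is assumed here as the closeness measure, which the abstract does not specify. In a real system the vectors would come from the pre-trained audio and text embedding models.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_recordings(query_embedding, audio_embeddings):
    """Return recording ids sorted from most to least similar to the query."""
    scores = {
        rec_id: cosine_similarity(query_embedding, emb)
        for rec_id, emb in audio_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hand-written embeddings (assumption): stand-ins for the outputs of
# an audio encoder and a text encoder that share one embedding space.
audio_embeddings = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.1, 0.9, 0.2],
    "traffic.wav": [0.0, 0.2, 0.9],
}
query_embedding = [0.2, 0.8, 0.1]  # e.g. the caption "rain falling on a roof"

ranking = rank_recordings(query_embedding, audio_embeddings)
print(ranking)
```

With these toy vectors the query lies closest to `rain.wav`, so that recording is ranked first. Training such a system amounts to adjusting the two encoders so that matching recording and caption pairs end up close under the chosen similarity.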