Publication details

Citation:	Paul Primus, Gerhard Widmer, "Fusing Audio and Metadata Embeddings Improves Language-based Audio": Proceedings of the 32nd European Signal Processing Conference (EUSIPCO), Lyon, France, 2024
Original title:	Fusing Audio and Metadata Embeddings Improves Language-based Audio
Title language:	English
Original book title:	Proceedings of the 32nd European Signal Processing Conference (EUSIPCO), Lyon, France
Original abstract:	Matching raw audio signals with textual descriptions requires understanding the audio's content and the description's semantics and then drawing connections between the two modalities. This paper investigates a hybrid retrieval system that utilizes audio metadata as an additional clue to understand the content of audio signals before matching them with textual queries. We experimented with metadata often attached to audio recordings, such as keywords and natural-language descriptions, and we investigated late and mid-level fusion strategies to merge audio and metadata. Our hybrid approach with keyword metadata and late fusion improved the retrieval performance over a content-based baseline by 2.36 and 3.69 pp. mAP@10 on the ClothoV2 and AudioCaps benchmarks, respectively.
Abstract language:	English
Year of publication:	2024
Number of pages:	5
URL for further information:	https://arxiv.org/abs/2406.15897
Scope:	international
Publication type:	Article / paper in conference proceedings (peer-reviewed)
Authors:	Paul Primus, Gerhard Widmer
Research units:	Institut für Computational Perception

Fields of science:	Computer Science (ÖSTAT:102), Artificial Intelligence (ÖSTAT:102001), Image Processing (ÖSTAT:102003), Information Systems (ÖSTAT:102015), Audiovisual Media (ÖSTAT:202002)
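The abstract reports results in mAP@10 and contrasts late with mid-level fusion, but it does not spell out either. The following Python sketch illustrates both ideas under explicit assumptions. Late fusion is modelled here as a weighted sum of separately computed, L2-normalized audio and metadata embeddings, which are then ranked against a text-query embedding by cosine similarity. mAP@10 is computed for the case where each text query has exactly one matching recording, so each query contributes the reciprocal rank of its target if it appears in the top 10, and 0 otherwise. The fusion rule, the `weight` parameter, all function names and the toy embeddings are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def late_fuse(audio_emb: Vector, meta_emb: Vector, weight: float = 0.5) -> List[float]:
    """Hypothetical late fusion: a weighted sum of the separately encoded,
    normalized audio and metadata embeddings (not necessarily the paper's rule)."""
    a, m = l2_normalize(audio_emb), l2_normalize(meta_emb)
    return l2_normalize([(1.0 - weight) * x + weight * y for x, y in zip(a, m)])


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(u), l2_normalize(v)))


def rank_items(query_emb: Vector, item_embs: List[Vector]) -> List[int]:
    """Indices of the items sorted by descending similarity to the text query."""
    scores = [cosine(query_emb, e) for e in item_embs]
    return sorted(range(len(item_embs)), key=lambda i: scores[i], reverse=True)


def map_at_k(rankings: List[List[int]], targets: List[int], k: int = 10) -> float:
    """mAP@k assuming exactly one relevant item per query: each query scores
    1/rank if its target is among the top k results and 0 otherwise."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        top_k = ranking[:k]
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)


# Toy example (hypothetical data): three recordings, each with an audio and a
# metadata embedding, and one text-query embedding per recording.
audio = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
meta = [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
queries = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.1, 0.0, 0.9]]
targets = [0, 1, 2]  # query i should retrieve recording i

fused = [late_fuse(a, m) for a, m in zip(audio, meta)]
rankings = [rank_items(q, fused) for q in queries]
score = map_at_k(rankings, targets, k=10)
print(round(score, 4))  # 0.8333
```

In the toy data the second query ranks its target second, so it contributes 1/2 and the mean over three queries is 2.5/3. Summing embeddings after separate encoding is only one simple form of late fusion; a mid-level variant would instead combine intermediate features of the two encoders, which this sketch does not attempt.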

fodok.jku.at


Johannes Kepler Universität (JKU) Linz, Altenbergerstr. 69, A-4040 Linz, Austria