B. Lehner, Gerhard Widmer,
"Improving Voice Activity Detection in Movies"
in: Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp. 2942-2946, 2015
Original title:
Improving Voice Activity Detection in Movies
Language of the title:
English
Original book title:
Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015)
Original abstract:
Voice Activity Detection in movies is a non-trivial and challenging
task. The different emotional states of the speakers, as
well as the variety of soundscapes and noises contribute to the
complexity of the task. In this paper, we propose a set of lightweight
features that are specifically designed to perform under
such conditions, while at the same time preventing confusions
of singing voice with speech. For evaluation, we use four full-length
movies, previously unseen to the system and painstakingly
annotated. We compare our detector to a state-of-the-art
reference system. The new approach performs better, yielding
just about half the Equal Error Rate (EER). Furthermore, since
the ground truth annotation task is extremely tedious, and to
help with advancing in this topic, we release the annotations of
all four movies to the research community.
Index Terms: Voice Activity Detection, Speech Detection
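
The abstract reports results as Equal Error Rate (EER), the operating point at which the false acceptance rate (non-speech frames classified as speech) equals the false rejection rate (speech frames missed). The following is a minimal sketch of how an EER could be computed from frame-level detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "speech", and the toy data are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Return (eer, threshold) for frame-level scores and binary labels.

    Assumptions (not from the paper): labels use 1 for speech and 0 for
    non-speech, and a frame is classified as speech when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both speech and non-speech frames")

    best = None
    # Sweep every observed score as a candidate decision threshold.
    for thr in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
        far = fp / n_neg  # false acceptance rate
        frr = fn / n_pos  # false rejection rate
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest;
        # report their mean as the EER estimate.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Hypothetical toy data: four frames, two of them speech.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
```

With finitely many frames the two error rates rarely coincide exactly, so the sketch reports the mean of the two rates at the threshold where they are closest. Interpolating between neighbouring thresholds is a common alternative.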