Bernhard Lehner, "Detecting the Presence of Singing Voice in Mixed Music Signals", 2018
Original title:
Detecting the Presence of Singing Voice in Mixed Music Signals
Language of the title:
English
Original abstract:
Singing Voice Detection is the problem of automatically identifying the parts of a polyphonic music recording where at least one person is singing. For humans, it does not seem to be a difficult task, regardless of the singer's specific voice characteristics, dynamics of articulation, language, and the instrumental background. From the perspective of a machine, however, this is a very complex problem. This is to a considerable degree due to the extreme diversity of vocal tone and style. Instruments that share similarities in sound production with the singing voice further contribute to the complexity of the matter, since they carry an inherent risk of being misclassified as vocals. Additionally, the audio signal to be analysed is often heavily distorted by instrumental accompaniment. At the same time, Singing Voice Detection would provide useful input for many practical applications that the music listening experiences of tomorrow can build upon, such as real-time tracking and synchronisation or automatic transcription of lyrics. In this thesis, I present the results of my approach to this task, with special attention paid to keeping the method lightweight and real-time capable. My work improves upon the state of the art and maintains good discriminative properties for instruments that resemble singing voice. Additionally, the resulting singing voice detector is robust to changes in loudness level. This is a very important property, and the implications of it not being fulfilled had not been discussed in the literature so far. That discussion, along with a strategy for analysing robustness in this regard, is another contribution of this thesis. In conclusion, this thesis advances the state of the art in Singing Voice Detection and fosters an understanding of the challenges and pitfalls related to developing and evaluating audio analysis algorithms in general.
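The abstract does not describe how robustness to loudness was analysed, so the following is only a minimal sketch of one way such a check could look, not the method used in the thesis. It assumes a detector is any function that maps a list of samples to one boolean decision per frame, rescales the input by several gains, and reports how many frame decisions stay the same. Both detectors, the synthetic signal, and all names (`energy_detector`, `relative_detector`, `gain_agreement`, `make_test_signal`) are hypothetical stand-ins introduced for illustration.

```python
import math
import random


def frame_signal(samples, frame_len):
    """Split a list of samples into non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def energy_detector(samples, frame_len=256, threshold=0.1):
    """Toy detector (assumption): flags a frame when its RMS exceeds a
    fixed absolute threshold, so its decisions depend on input level."""
    return [rms(f) > threshold for f in frame_signal(samples, frame_len)]


def relative_detector(samples, frame_len=256):
    """Toy detector (assumption): compares each frame's RMS with the
    geometric mean of the quietest and loudest frame, so a global gain
    scales both sides of the comparison equally."""
    levels = [rms(f) for f in frame_signal(samples, frame_len)]
    reference = math.sqrt(min(levels) * max(levels))
    return [level > reference for level in levels]


def gain_agreement(detector, samples, gains_db):
    """For each gain in dB, return the fraction of frame decisions that
    match the decisions made on the unscaled signal."""
    baseline = detector(samples)
    agreement = {}
    for gain in gains_db:
        factor = 10 ** (gain / 20)
        decisions = detector([factor * x for x in samples])
        same = sum(a == b for a, b in zip(baseline, decisions))
        agreement[gain] = same / len(baseline)
    return agreement


def make_test_signal(seed=0, frame_len=256, frames_per_segment=4, segments=8):
    """Synthetic stand-in for audio: alternating quiet and loud noise."""
    rng = random.Random(seed)
    samples = []
    for s in range(segments):
        amplitude = 0.5 if s % 2 else 0.05
        samples += [rng.uniform(-amplitude, amplitude)
                    for _ in range(frame_len * frames_per_segment)]
    return samples


if __name__ == "__main__":
    signal = make_test_signal()
    gains = [-20, 0, 20]
    print("fixed threshold:", gain_agreement(energy_detector, signal, gains))
    print("relative level: ", gain_agreement(relative_detector, signal, gains))
```

In this sketch the fixed-threshold detector changes its decisions when the signal is attenuated or amplified by 20 dB, while the level-relative one does not. With a real detector, the same loop could be run on annotated recordings and the agreement replaced by an accuracy measured against ground truth.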