Filip Korzeniowski,
"Harmonic Analysis of Musical Audio Using Deep Neural Networks."
10-2018
Original title:
Harmonic Analysis of Musical Audio Using Deep Neural Networks.
Language of the title:
English
Original abstract:
In this thesis, I consider the automatic extraction of harmonic information
from musical audio. Obtaining such information automatically is relevant not
only for theoretical analyses, but also for commercial applications such as music
tutoring programs or lead sheet generators. I focus on two aspects of harmony
(chords and the global key) and tackle the problem of extracting them using
deep neural networks.
My work on chord recognition constitutes the main part of this thesis. To
recognise chords in the audio, I first develop data-driven feature extraction methods
(or acoustic models) that outperform hand-engineered ones. I then focus
on modelling chord sequences, and show that doing so on a frame-by-frame
basis (as is common in existing chord recognition systems) prevents learning musical
relationships between chords, regardless of the complexity or power of the
sequence model. I also show that such models instead need to operate on higher-level
chord symbol sequences in order to be useful. I continue by systematically
exploring such chord sequence models based on recurrent neural networks and
show their superiority to finite-context models. Finally, I devise a probabilistic
model that integrates these chord sequence models with acoustic models using
various models of chord duration, and evaluate how the performance of each
model influences the final chord recognition results.
The second part of this thesis concerns key classification. Here, I develop a
convolutional neural network based on traditional key classification pipelines to
create a key classifier that performs better than existing, hand-designed methods.
I then evaluate how well the model generalises over datasets of different musical
genres (a problem existing systems have not solved), and propose adaptations in
training and network structure that enable learning a genre-agnostic model that
outperforms genre-specific models on many available datasets.