Richard Vogl, "Deep Learning Methods for Drum Transcription and Drum Pattern Generation", 2018
Original Title:
Deep Learning Methods for Drum Transcription and Drum Pattern Generation
Language of the Title:
English
Original Abstract:
This thesis is situated in the field of music information retrieval and addresses the tasks of automatic drum transcription and automatic drum pattern generation. Automatic drum transcription deals with the problem of extracting a symbolic representation of the notes played by drum instruments from an audio signal. Automatic drum pattern generation aims at generating novel, musically meaningful, and interesting rhythmic patterns involving several percussion instruments.

The first part of this thesis focuses on automatic drum transcription. Music transcription from audio is a hard task that can be challenging even for trained human experts. Challenges in drum transcription include the large variety of sounds within individual instrument types, as well as groups of similar-sounding instruments such as different types of cymbals or tom-toms of varying sizes. The contributions in the drum transcription part introduce end-to-end deep learning methods for this task. With these, a new state of the art is established on a variety of public drum transcription datasets, as well as in the MIREX drum transcription competition. Furthermore, two additional objectives are met: (i) adding meta information such as bar boundaries, meter, and local tempo to the transcripts, and (ii) increasing the number of instruments under observation. While traditionally only bass drum, snare drum, and hi-hat have been considered, this thesis covers up to 18 different instrument classes.

The second part of this thesis deals with automatic drum pattern generation. The goal is to generate patterns which are musically meaningful and indistinguishable from human-created ones, and at the same time are not trivial but interesting. Evaluating generative methods is non-trivial, since quality in this context is subjective. This issue is addressed by conducting qualitative and quantitative user studies for evaluation purposes. Two different models are proposed for drum pattern generation: restricted Boltzmann machines (RBMs) and generative adversarial networks (GANs). While RBMs are comparatively easy to train, GANs are more problematic in this respect and require more training data; on the other hand, GANs can better handle a greater variety of instruments and higher temporal resolutions. The need for data is met through two different approaches: (i) creating large-scale synthetic drum pattern datasets, and (ii) leveraging the drum transcription methods from the first part of the thesis to extract drum patterns from real audio.

Besides these methodological contributions, different user interfaces for drum pattern generation are implemented and evaluated in user studies. In addition, this thesis offers publicly available datasets and trained models for drum transcription as resources for the research community.
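To give a rough idea of what an end-to-end drum transcription model of this kind looks like, the following minimal sketch maps a log-magnitude spectrogram to per-frame activation curves for three drum classes and picks onsets from them. It is not the architecture from the thesis; the layer sizes, the three-class target set, and the peak-picking threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DrumTranscriber(nn.Module):
    """Minimal recurrent activation model: spectrogram frames in,
    per-frame, per-instrument onset activations out (sketch only)."""

    def __init__(self, n_bins=80, n_instruments=3, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_instruments)

    def forward(self, spec):                 # spec: (batch, frames, n_bins)
        h, _ = self.rnn(spec)
        return torch.sigmoid(self.out(h))    # (batch, frames, n_instruments)

def pick_onsets(activations, threshold=0.5):
    """Naive peak picking: keep frames where an activation exceeds the
    threshold and is a local maximum (placeholder for a real post-processing step)."""
    act = activations.squeeze(0)             # (frames, n_instruments)
    onsets = []
    for t in range(1, act.shape[0] - 1):
        for i in range(act.shape[1]):
            if act[t, i] > threshold and act[t - 1, i] <= act[t, i] >= act[t + 1, i]:
                onsets.append((t, i))         # (frame index, instrument index)
    return onsets

# Example: 100 frames of an 80-bin log-magnitude spectrogram (random stand-in data).
model = DrumTranscriber()
spectrogram = torch.rand(1, 100, 80)
print(pick_onsets(model(spectrogram))[:5])
```

In a trained model, the three output columns would correspond to bass drum, snare drum, and hi-hat activations, and the picked frames would be converted to note onsets in a symbolic transcript.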
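To make the pattern-generation side more concrete, here is a minimal NumPy sketch of a restricted Boltzmann machine over binary drum patterns (instruments x steps flattened into one visible vector), trained with one-step contrastive divergence and sampled via Gibbs steps. The grid size, hidden-unit count, learning rate, and toy training data are illustrative assumptions, not the settings or data used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
N_INSTRUMENTS, N_STEPS = 3, 16            # e.g. bass drum, snare, hi-hat over one bar
N_VISIBLE = N_INSTRUMENTS * N_STEPS
N_HIDDEN = 32

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# RBM parameters: weight matrix and visible/hidden biases.
W = 0.01 * rng.standard_normal((N_VISIBLE, N_HIDDEN))
b_v = np.zeros(N_VISIBLE)
b_h = np.zeros(N_HIDDEN)

def sample_h(v):
    p = sigmoid(v @ W + b_h)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_v(h):
    p = sigmoid(h @ W.T + b_v)
    return p, (rng.random(p.shape) < p).astype(float)

def cd1_update(batch, lr=0.05):
    """One step of contrastive divergence (CD-1) on a batch of flattened patterns."""
    global W, b_v, b_h
    ph0, h0 = sample_h(batch)
    pv1, v1 = sample_v(h0)
    ph1, _ = sample_h(v1)
    W += lr * (batch.T @ ph0 - v1.T @ ph1) / len(batch)
    b_v += lr * (batch - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

def generate(n_gibbs=200):
    """Generate one pattern by Gibbs sampling from a random starting state."""
    v = (rng.random(N_VISIBLE) < 0.5).astype(float)
    for _ in range(n_gibbs):
        _, h = sample_h(v)
        _, v = sample_v(h)
    return v.reshape(N_INSTRUMENTS, N_STEPS)

# Toy training data: random binary grids standing in for real drum patterns.
data = (rng.random((256, N_VISIBLE)) < 0.2).astype(float)
for _ in range(100):
    cd1_update(data[rng.choice(len(data), 32, replace=False)])
print(generate())
```

With real training patterns, either synthesized or transcribed from audio, the Gibbs chain would produce new one-bar drum grids in the style of the training set; a GAN-based generator would replace the sampling chain with a feed-forward generator network trained against a discriminator.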