Papers by Jake Drysdale

Many recent approaches to creative transformations of musical audio have been motivated by the success of raw audio generation models such as WaveNet, in which audio samples are modeled by generative neural networks. This paper describes a generative audio synthesis model for multi-drum translation based on a WaveNet denoising autoencoder architecture. The timbre of an arbitrary source audio input is transformed to sound as if it were played by various percussive instruments while preserving its rhythmic structure. Two evaluations of the transformations are conducted, based on the capacity of the model to preserve the rhythmic patterns of the input and on the audio quality as it relates to the timbre of the target drum domain. The first evaluation measures the rhythmic similarities between the source audio and the corresponding drum translations, and the second provides a numerical analysis of the quality of the synthesised audio. Additionally, a semi- and fully-automatic audio effect is proposed, in which the user may assist the system by manually labelling source audio segments or by using a state-of-the-art automatic drum transcription system prior to drum translation.
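The abstract does not give the model's hyperparameters, but the receptive-field arithmetic behind stacked dilated causal convolutions, the core building block of WaveNet-style architectures, can be sketched. The block and layer counts below are illustrative, not the paper's:

```python
def receptive_field(n_blocks, layers_per_block, kernel_size=2):
    """Receptive field (in samples) of stacked dilated causal
    convolutions, with dilation doubling at each layer inside a
    block, as in WaveNet-style models."""
    rf = 1
    for _ in range(n_blocks):
        for layer in range(layers_per_block):
            dilation = 2 ** layer
            rf += (kernel_size - 1) * dilation
    return rf

# e.g. 3 blocks of 10 layers with kernel size 2:
# each block adds 2^0 + 2^1 + ... + 2^9 = 1023 samples
print(receptive_field(3, 10))  # 3070
```

This illustrates why such models can condition each output sample on thousands of past samples while keeping the layer count modest.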

Journal of the Audio Engineering Society, Nov 2, 2022
The ability to perceptually modify drum recording parameters in a post-recording process would be of great benefit to engineers limited by time or equipment. In this work, a data-driven approach to post-recording modification of the dampening and microphone positioning parameters commonly associated with snare drum capture is proposed. The system consists of a deep encoder that analyzes audio input and predicts optimal parameters of one or more third-party audio effects, which are then used to process the audio and produce the desired transformed output. Furthermore, two novel audio effects are specifically developed to take advantage of the multiple-parameter learning abilities of the system. Perceptual quality of transformations is assessed through a subjective listening test, and an objective evaluation is used to measure system performance. Results demonstrate a capacity to emulate snare dampening; however, attempts to emulate microphone position changes were not successful.
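As a rough illustration of the analyze-then-process pipeline described above, the sketch below pairs a placeholder parameter predictor (the paper uses a trained deep encoder) with a toy exponential-decay "dampening" effect. All names, the heuristic mapping, and the effect itself are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def apply_dampening(audio, decay, sr=44100):
    """Toy dampening effect: impose an exponential decay envelope.
    `decay` in (0, 1]; larger values damp the tail more quickly."""
    t = np.arange(len(audio)) / sr
    return audio * np.exp(-decay * 50.0 * t)

def predict_parameters(audio):
    """Stand-in for the deep encoder: a hand-made heuristic mapping
    input energy to a decay parameter (placeholder only; the real
    system predicts parameters with a trained network)."""
    rms = np.sqrt(np.mean(audio ** 2))
    return {"decay": float(np.clip(rms, 0.1, 1.0))}

snare = np.random.randn(44100).astype(np.float32)  # 1 s dummy input
params = predict_parameters(snare)
damped = apply_dampening(snare, params["decay"])
```

The key design point carried over from the abstract is the separation of concerns: the encoder only emits effect parameters, and interchangeable (possibly third-party) effects do the actual signal processing.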


Proceedings of the 25th International Conference on Digital Audio Effects (DAFx20in22), 2022
Many electronic music (EM) genres are composed through the activation of short audio recordings of instruments designed for seamless repetition, or loops. In this work, loops of key structural groups such as bass, percussive, or melodic elements are labelled by the role they occupy in a piece of music through the task of automatic instrumentation role classification (AIRC). Such labels assist EM producers in the identification of compatible loops in large unstructured audio databases. While human annotation is often laborious, automatic classification allows for fast and scalable generation of these labels. We experiment with several deep-learning architectures and propose a data augmentation method for improving multi-label representation to balance classes within the Freesound Loop Dataset. To improve the classification accuracy of the architectures, we also evaluate different pooling operations. Results indicate that, in combination with the data augmentation and pooling strategies, the proposed system achieves state-of-the-art performance for AIRC. Additionally, we demonstrate how the proposed AIRC method is useful for analysing the structure of EM compositions through loop activation transcription.
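The abstract does not spell out the augmentation scheme; one plausible multi-label mixing strategy (overlay two loops and take the union of their role labels) can be sketched as follows. The role list and all names are illustrative:

```python
import numpy as np

ROLES = ["bass", "percussion", "melody", "chords", "fx"]  # assumed role set

def mix_loops(loop_a, labels_a, loop_b, labels_b):
    """One plausible multi-label augmentation: overlay two loops of
    equal length and take the element-wise union of their
    instrumentation-role label vectors."""
    mixed = 0.5 * (loop_a + loop_b)           # simple equal-gain overlay
    labels = np.maximum(labels_a, labels_b)   # element-wise OR of labels
    return mixed, labels

a = np.random.randn(22050); la = np.array([1, 0, 0, 0, 0])  # bass loop
b = np.random.randn(22050); lb = np.array([0, 1, 0, 0, 0])  # percussion loop
mixed, labels = mix_loops(a, la, b, lb)
print(labels.tolist())  # [1, 1, 0, 0, 0]
```

Mixing under-represented roles into existing loops is one way such a scheme could rebalance classes while yielding genuinely multi-label training examples.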

Neural audio synthesizers exploit deep learning as an alternative to traditional synthesizers that generate audio from hand-designed components, such as oscillators and wavetables. For a neural audio synthesizer to be applicable to music creation, meaningful control over the output is essential. This paper provides an overview of an unsupervised approach to deriving useful feature controls learned by a generative model. A system for generation and transformation of drum samples using a style-based generative adversarial network (GAN) is proposed. The system provides functional control of audio style features, based on principal component analysis (PCA) applied to the intermediate latent space. Additionally, we propose the use of an encoder trained to invert input drums back to the latent space of the pre-trained GAN. We experiment with three modes of control and provide audio results on a supporting website.
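PCA on an intermediate latent space can be sketched in the GANSpace style: sample many latents, push them through the mapping network, and take principal components of the resulting codes as edit directions. Here a fixed random linear map stands in for the trained mapping network, so the discovered directions are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 64  # assumed dimensionality

# Stand-in for the trained mapping network: a fixed linear map whose
# columns have unequal scale, so the intermediate space has dominant
# variance directions for PCA to find.
W_MAP = rng.standard_normal((LATENT_DIM, LATENT_DIM)) * np.linspace(2.0, 0.1, LATENT_DIM)

def pca_directions(n_samples=10_000, n_components=3):
    """Sample latents, map to the intermediate space, and return the
    top principal directions (unit-norm rows) as candidate controls."""
    z = rng.standard_normal((n_samples, LATENT_DIM))
    w = z @ W_MAP
    w_centered = w - w.mean(axis=0)
    _, _, vt = np.linalg.svd(w_centered, full_matrices=False)
    return vt[:n_components]

def edit(w, direction, strength):
    """Move an intermediate latent code along a discovered direction."""
    return w + strength * direction

dirs = pca_directions()
w0 = rng.standard_normal(LATENT_DIM) @ W_MAP
w_edited = edit(w0, dirs[0], strength=2.0)
```

In the real system, each principal direction would be auditioned by decoding edited codes through the GAN's synthesis network and listening for a consistent change in audio style.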

Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx), 2020
Recent advancements in generative audio synthesis have allowed for the development of creative tools for generation and manipulation of audio. In this paper, a strategy is proposed for the synthesis of drum sounds using generative adversarial networks (GANs). The system is based on a conditional Wasserstein GAN, which learns the underlying probability distribution of a dataset compiled from labeled drum sounds. Labels are used to condition the system on an integer value that can be used to generate audio with the desired characteristics. Synthesis is controlled by an input latent vector that enables continuous exploration and interpolation of generated waveforms. Additionally, we experiment with a training method that progressively learns to generate audio at different temporal resolutions. We present our results and discuss the benefits of generating audio with GANs along with sound examples and demonstrations.
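The conditioning and interpolation mechanics described above can be sketched as follows. The latent dimensionality, the class set, and the concatenation-based conditioning are assumptions for illustration; the paper's exact conditioning mechanism may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 100  # assumed latent size
N_CLASSES = 3     # assumed label set, e.g. kick, snare, cymbal

def make_input(z, class_id):
    """Concatenate a latent vector with a one-hot class condition,
    a common conditional-GAN input scheme (illustrative only)."""
    onehot = np.zeros(N_CLASSES)
    onehot[class_id] = 1.0
    return np.concatenate([z, onehot])

def interpolate(z_a, z_b, steps=8):
    """Linear interpolation in latent space: decoding each point
    yields a continuous path between two generated drum sounds."""
    ts = np.linspace(0.0, 1.0, steps)
    return [(1 - t) * z_a + t * z_b for t in ts]

z1, z2 = rng.standard_normal((2, LATENT_DIM))
path = interpolate(z1, z2)
inputs = [make_input(z, class_id=1) for z in path]
```

Each element of `inputs` would be fed to the generator; holding the class label fixed while walking the latent path is what makes the interpolation stay within one drum category.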
Thesis Chapters by Jake Drysdale

Deep Learning Methods for Sample-based Electronic Music, 2023
Sample-based electronic music (SBEM) encompasses various genres centred around the practice of sampling—the act of repurposing existing audio to create new music. Contemporary SBEM production involves navigating digital collections of audio, which include both libraries of samples and recorded music. The proliferation of digital music access, sample libraries, and online resource services has introduced challenges in navigating and managing these extensive collections of musical material. Selecting suitable samples from these sources is a meticulous and time-consuming task, requiring music producers to employ aesthetic judgement. Despite technological advancements, many SBEM producers still utilise laborious methods—established decades ago—for obtaining and manipulating music samples. This thesis proposes deep learning, a subfield of machine learning that develops algorithms to decipher intricate data relationships without explicit programming, as a potential solution. This research primarily explores the potential of deep learning models in SBEM, with a specific focus on developing automated tools for the analysis and generation of electronic music samples, towards enriching the creative experience for music producers.
To this end, a novel deep learning system designed for automatic instrumentation role classification in SBEM is introduced. This system identifies samples based on their specific roles within a composition—such as melody, bass, and drums—and exhibits versatility across various SBEM production tasks. Through a series of experiments, the capacity of the system to automatically label unstructured sample collections, generate high-level summaries of SBEM arrangements, and retrieve samples with desired characteristics from existing recordings is demonstrated. Additionally, a neural audio synthesis system that facilitates the continuous exploration and interpolation of sounds generated from a collection of drum samples is presented. This system employs a generative adversarial network (GAN), further modified to interact with the generated outputs. The evaluations highlight the effectiveness of the proposed conditional style-based GAN in generating a diverse range of high-quality drum samples. Various systematic approaches for interacting with the network and navigating the generative space are investigated, demonstrating novel methods of sample manipulation. Collectively, these contributions aim to foster further exploration and advancements at the intersection of deep learning and SBEM.