Guitar string separation is a novel and complicated task. Guitar notes are not pure steady-state signals; hence, we hypothesize that neither Non-Negative Matrix Factorization (NMF) nor Non-Negative Matrix Factor Deconvolution (NMFD) is optimal for separating them. Therefore, we separate steady-state and transient parts using Harmonic-Percussive Separation (HPS) as a preprocessing step. Then, we use NMF to factorize the harmonic part and NMFD to deconvolve the percussive part. We make use of a hexaphonic guitar dataset, which allows for objective evaluation. In addition, we compare several types of time-frequency mask and introduce an intuitive way to combine a binary mask with a ratio mask. We show that the HPS mask type has an effect on source estimation. Our proposed method achieved results comparable to NMF without HPS. Finally, we show that the optimal mask at the final separation stage depends on the estimation algorithm.
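A minimal sketch of the HPS-then-NMF stage described above, assuming librosa, a hypothetical input file, six NMF components and generic Wiener-style ratio masks; the NMFD step for the percussive part and the paper's specific binary/ratio mask combination are omitted.

```python
import numpy as np
import librosa

y, sr = librosa.load("guitar_mix.wav", sr=None)   # hypothetical input file
S = librosa.stft(y)

# Step 1: Harmonic-Percussive Separation as preprocessing.
H, P = librosa.decompose.hpss(S)

# Step 2: NMF on the harmonic magnitude spectrogram (one template per
# string is an illustrative assumption).
W, A = librosa.decompose.decompose(np.abs(H), n_components=6, sort=True)

# Step 3: Wiener-style ratio masks built from the NMF approximation.
eps = 1e-10
V = W @ A + eps
for k in range(W.shape[1]):
    Vk = np.outer(W[:, k], A[k])
    mask = Vk / V                       # ratio mask for component k
    source_k = librosa.istft(mask * H)  # time-domain estimate of one source
```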
Journal of The Audio Engineering Society, Apr 1, 1996
Recent work has shown that higher-order single-bit sigma-delta modulators suffer from low-level artifacts such as idle tones and noise modulation. Techniques that have been proposed to reduce or eliminate these errors include applying dither inside the one-bit quantiser loop and selecting a loop filter that makes the modulator chaotic. This paper compares the efficacy of these two approaches by simulating high-resolution sigma-delta modulators suitable for audio-conversion applications.
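To illustrate the setup, here is a minimal sketch of a single-bit sigma-delta modulator with optional TPDF dither added inside the quantiser loop. The paper simulates higher-order modulators; this first-order version and the dither amplitude are illustrative only.

```python
import numpy as np

def sigma_delta(x, dither=False, seed=0):
    rng = np.random.default_rng(seed)
    y = np.empty_like(x)
    integrator = 0.0
    for n, xn in enumerate(x):
        integrator += xn - (y[n - 1] if n > 0 else 0.0)
        # TPDF dither: sum of two uniform variables, injected before the quantiser.
        d = (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) if dither else 0.0
        y[n] = 1.0 if integrator + d >= 0.0 else -1.0
    return y

fs, f0 = 64 * 44100, 1000.0                 # oversampled rate and test tone
t = np.arange(fs // 10) / fs
x = 0.5 * np.sin(2 * np.pi * f0 * t)
bitstream = sigma_delta(x, dither=True)     # one-bit output stream
```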
Adversarial attacks refer to a set of methods that perturb the input to a classification model in order to fool the classifier. In this paper we apply different gradient-based adversarial attack algorithms on five deep learning models trained for sound event classification. Four of the models use mel-spectrogram input and one model uses raw audio input. The models represent standard architectures such as convolutional, recurrent and dense networks. The dataset used for training is the FSDKaggle2018 dataset, released for Task 2 of the DCASE 2018 challenge, and the models used are from participants of the challenge who open-sourced their code. Our experiments show that adversarial attacks can be generated with high confidence and low perturbation. In addition, we show that the adversarial attacks are very effective across the different models.
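As a concrete example of one gradient-based attack of the kind referred to above, a minimal FGSM sketch in PyTorch; `model`, the loss choice and `epsilon` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, label, epsilon=0.01):
    """Fast Gradient Sign Method: one step in the direction that raises the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()
```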
This paper reviews the derivative method and explores its capacity for estimating time-varying sinusoids with complicated parameter variations. The method is reformulated on a generalized signal model. We show that under certain arrangements the estimation task becomes solving a linear system, whose coefficients can be computed from discrete samples using an integration-by-parts technique. Previous derivative and reassignment methods are shown to be special cases of this generic method. We include a discussion of the continuity criterion of window design for the derivative method. The effectiveness of the method and the window design criterion are confirmed by test results. We also show that, thanks to the generalization, off-model sinusoids can be approximated by the derivative method with a sufficiently flexible model setting.
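For readers unfamiliar with derivative methods, the standard first-order relation they build on (the paper's generalized model extends this):

```latex
% For a sinusoid $s(t) = a(t)\,e^{j\phi(t)}$, dividing the derivative by the
% signal separates the log-amplitude slope from the instantaneous frequency:
\[
\frac{s'(t)}{s(t)} = \frac{a'(t)}{a(t)} + j\,\phi'(t),
\]
% so $\operatorname{Re}\{s'/s\}$ estimates the amplitude modulation and
% $\operatorname{Im}\{s'/s\}$ the instantaneous frequency $\phi'(t)$.
```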
An impulse response of an enclosed reverberant space is composed of three basic components: the direct sound, early reflections and late reverberation. While the direct sound is a single event that can be easily identified, the division between the early reflections and late reverberation is less obvious, as there is a gradual transition between the two. This paper explores two statistical measures that can aid in determining a point in time where the early reflections have transitioned into late reverberation. These metrics exploit the similarities between late reverberation and Gaussian noise that are not commonly found in early reflections. Unlike other measures, they require no prior knowledge about the room, such as its geometry or volume.
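One plausible Gaussianity-based measure of this kind (not necessarily either of the paper's two) is sliding-window excess kurtosis of the impulse response: early reflections are spiky (large positive kurtosis) while Gaussian-like late reverberation sits near zero. The window length below is an illustrative assumption.

```python
import numpy as np
from scipy.stats import kurtosis

def kurtosis_profile(ir, fs, win_ms=20.0):
    win = int(fs * win_ms / 1000)
    hops = range(0, len(ir) - win, win // 2)
    times = np.array([(h + win / 2) / fs for h in hops])
    # Fisher definition: excess kurtosis of Gaussian noise is ~0.
    k = np.array([kurtosis(ir[h:h + win]) for h in hops])
    return times, k

# The first window where the profile settles near zero suggests the
# transition from early reflections to late reverberation.
```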
Whilst musical transients are generally acknowledged as holding much of the perceptual information within musical tones, most research in sound analysis and synthesis tends to focus on the steady-state components of signals. A method is presented which separates the noisy transient information from the slowly time-varying steady-state components of musical audio. Improvements from adaptive thresholding and multiresolution analysis methods are then illustrated. It is shown that by analyzing the resulting transient information only, current onset detection algorithms can be improved considerably, especially for instruments with noisy attack information, such as plucked or struck strings. The idea is then applied to audio processing techniques that enhance or decrease the strength of note attack information. Finally, the transient extraction algorithm (TSS) is applied to a time-scaling implementation, where the transient and noise information is analyzed so that only steady-state regions are stretched, yielding considerably improved results.
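A minimal sketch of onset detection with an adaptive (moving-median) threshold on spectral flux; running it on the extracted transient signal rather than the full mix is the improvement described above. Window sizes and the bias term are illustrative parameters, not values from the paper.

```python
import numpy as np
import scipy.signal

def onsets(y, fs, n_fft=1024, hop=512, med_win=11, bias=0.1):
    f, t, Z = scipy.signal.stft(y, fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)
    # Half-wave rectified spectral flux: energy increases frame to frame.
    flux = np.maximum(np.diff(mag, axis=1), 0.0).sum(axis=0)
    # Adaptive threshold: local median plus a small global bias.
    thresh = scipy.signal.medfilt(flux, med_win) + bias * flux.max()
    peaks, _ = scipy.signal.find_peaks(flux, height=thresh)
    return t[1:][peaks]  # onset times in seconds
```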
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Physical models of rigid bodies are used for sound synthesis in applications from virtual environments to music production. Traditional methods, such as modal synthesis, often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by post-processing of their results. In this work, we present a novel end-to-end framework for training a deep neural network to generate modal resonators for a given 2D shape and material using a bank of differentiable IIR filters. We demonstrate our method on a dataset of synthetic objects, but train our model using an audio-domain objective, paving the way for physically-informed synthesisers to be learned directly from recordings of real-world objects.
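A minimal differentiable modal-resonator sketch in PyTorch. The paper renders modes with a bank of differentiable IIR filters; this sketch instead uses the closed-form impulse response of two-pole resonators (exponentially decaying sinusoids), which keeps gradients flowing to the mode frequencies, decays and gains. All parameter values are illustrative assumptions.

```python
import torch

def render_modes(freqs, decays, gains, fs=16000, dur=1.0):
    t = torch.arange(int(fs * dur)) / fs
    # (n_modes, n_samples): each row is one exponentially decaying partial.
    modes = gains[:, None] * torch.exp(-decays[:, None] * t) \
            * torch.sin(2 * torch.pi * freqs[:, None] * t)
    return modes.sum(dim=0)

freqs = torch.tensor([220.0, 452.0, 710.0], requires_grad=True)
decays = torch.tensor([3.0, 5.0, 9.0], requires_grad=True)
gains = torch.tensor([1.0, 0.5, 0.3], requires_grad=True)

y = render_modes(freqs, decays, gains)
target = torch.zeros_like(y)                 # stand-in for a target recording
loss = torch.mean((y - target) ** 2)         # audio-domain objective
loss.backward()                              # gradients reach the modal parameters
```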
An adversarial attack is an algorithm that perturbs the input of a machine learning model in an intelligent way in order to change the output of the model. An important property of adversarial attacks is transferability: it is possible to generate adversarial perturbations on one model and apply them to the input of a different model to fool its output. Our work focuses on studying the transferability of adversarial attacks in sound event classification. We are able to demonstrate differences in transferability properties from those observed in computer vision. We show that dataset normalization techniques such as z-score normalization do not affect the transferability of adversarial attacks, and that techniques such as knowledge distillation do not increase the transferability of attacks.
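A minimal sketch of how transferability can be measured: craft perturbations on a surrogate model and count how often they also fool a different target model. The attack function (e.g. the FGSM sketch shown earlier), models and data loader are assumed to exist.

```python
import torch

def transfer_rate(surrogate, target, loader, attack, epsilon=0.01):
    fooled, total = 0, 0
    for x, y in loader:
        x_adv = attack(surrogate, x, y, epsilon)   # crafted on the surrogate
        pred = target(x_adv).argmax(dim=1)         # evaluated on the target
        fooled += (pred != y).sum().item()
        total += y.numel()
    return fooled / total
```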
International Symposium/Conference on Music Information Retrieval, 2005
This paper proposes an algorithm for studying the spectral content of pitched sounds in real-world recordings. We assume that the 2nd-order difference, w.r.t. partial index, of a pitched sound's partial frequencies is bounded by some small positive value, rather than equal to 0 as in the perfectly harmonic case. Given a spectrum and a fundamental frequency f0, the algorithm searches the spectrum for partials that can be associated with f0 by dynamic programming. In Section 3 a background-foreground model is plugged into the algorithm to make it work with a reverberant background, such as in a piano recording. In Section 4 we illustrate an application of the algorithm in which a multipitch scoring machine, which involves special processing for close or shared partials, is coupled with a tree-searching method for the polyphonic transcription task. Results are evaluated at the traditional note level, as well as at a partial-based sub-note level.
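A simplified sketch of such a dynamic-programming partial search: choose one peak candidate per partial index so that the 2nd-order difference of the chosen frequencies stays within the bound. `cands[k]` lists candidate peak frequencies near partial k+1; the cost structure is illustrative, not the paper's exact formulation.

```python
def track_partials(cands, bound):
    K = len(cands)
    # State (i, j): peak index chosen at partials k-1 and k-2.
    best = {(i, j): (0.0, [cands[0][j], cands[1][i]])
            for j in range(len(cands[0])) for i in range(len(cands[1]))}
    for k in range(2, K):
        new = {}
        for (i, j), (cost, path) in best.items():
            for m, fm in enumerate(cands[k]):
                d2 = abs(fm - 2 * cands[k - 1][i] + cands[k - 2][j])
                if d2 > bound:
                    continue  # violates the bounded-2nd-difference assumption
                key, cand = (m, i), (cost + d2, path + [fm])
                if key not in new or cand[0] < new[key][0]:
                    new[key] = cand
        best = new
    return min(best.values())[1] if best else None

# Toy example: near-harmonic candidates around f0 = 110 Hz.
cands = [[110.0], [220.5, 218.0], [331.0, 330.2], [441.0]]
print(track_partials(cands, bound=5.0))
```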
We propose a novel method for detecting changes in the harmonic content of musical audio signals. Our method uses a new model for Equal Tempered Pitch Class Space. This model maps 12-bin chroma vectors to the interior space of a 6-D polytope; pitch classes are mapped onto the vertices of this polytope. Close harmonic relations such as fifths and thirds appear as small Euclidean distances. We calculate the Euclidean distance between analysis frames n + 1 and n - 1 to develop a harmonic change measure for frame n. A peak in the detection function denotes a transition from one harmonically stable region to another. Initial experiments show that the algorithm can successfully detect harmonic changes such as chord boundaries in polyphonic audio recordings.
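A minimal sketch of such a harmonic change detection function, assuming librosa (whose `tonnetz` feature implements a comparable 6-D tonal centroid mapping) and a hypothetical input file.

```python
import numpy as np
import librosa

y, sr = librosa.load("song.wav")                # hypothetical input file
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
tc = librosa.feature.tonnetz(chroma=chroma)     # 6 x n_frames tonal centroids

# Distance between frames n+1 and n-1 gives the detection value at frame n.
hcdf = np.linalg.norm(tc[:, 2:] - tc[:, :-2], axis=0)
# Peaks in `hcdf` indicate transitions between harmonically stable regions.
```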
The expressive nature of the voice provides a powerful medium for communicating sonic ideas, motivating recent research on methods for query by vocalisation. Meanwhile, deep learning methods have demonstrated state-of-the-art results for matching vocal imitations to imitated sounds, yet little is known about how well learned features represent the perceptual similarity between vocalisations and queried sounds. In this paper, we address this question using similarity ratings between vocal imitations and imitated drum sounds. We use a linear mixed effect regression model to show how features learned by convolutional auto-encoders (CAEs) perform as predictors of perceptual similarity between sounds. Our experiments show that CAEs outperform three baseline feature sets (spectrogram-based representations, MFCCs, and temporal features) at predicting the subjective similarity ratings. We also investigate how the size and shape of the encoded layer affects the predictive power of the learned features. The results show that preservation of temporal information is more important than spectral resolution for this application.
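A minimal sketch of the regression step: predicting similarity ratings from a feature-space distance with a random effect per participant, via statsmodels. The column names and toy data are illustrative assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "rating": [3.0, 4.5, 1.0, 2.5, 4.0, 1.5, 3.5, 2.0],   # subjective ratings
    "cae_distance": [0.8, 0.3, 1.6, 1.1, 0.4, 1.4, 0.6, 1.2],  # CAE embedding distance
    "participant": ["p1", "p1", "p1", "p1", "p2", "p2", "p2", "p2"],
})
model = smf.mixedlm("rating ~ cae_distance", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```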
Descriptions are often provided along with recommendations to help users' discovery. Recommending automatically generated music playlists (e.g. personalised playlists) introduces the problem of generating their descriptions. In this paper, we propose a method for generating music playlist descriptions, a task we call music captioning. In the proposed method, audio content analysis and natural language processing are adopted to utilise the information of each track.
Loss-gradients are used to interpret the decision making process of deep learning models. In this work, we evaluate loss-gradient based attribution methods by occluding parts of the input and comparing the performance of the occluded input to the original input. We observe that the occluded input has better performance than the original across the test dataset under certain conditions. Similar behaviour is observed in sound and image recognition tasks. We explore different loss-gradient attribution methods, occlusion levels and replacement values to explain the phenomenon of performance improvement under occlusion.
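A minimal sketch of the occlusion experiment in PyTorch: rank input elements by loss-gradient magnitude, replace the top fraction with a constant, then compare model performance before and after. The occlusion level and replacement value are illustrative parameters.

```python
import torch
import torch.nn.functional as F

def occlude_by_gradient(model, x, y, frac=0.1, fill=0.0):
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    attrib = x.grad.abs().flatten(1)              # per-element attribution
    k = int(frac * attrib.shape[1])
    idx = attrib.topk(k, dim=1).indices           # most attributed elements
    x_occ = x.detach().flatten(1).clone()
    x_occ.scatter_(1, idx, fill)                  # occlude with a constant value
    return x_occ.view_as(x)
```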
New Interfaces for Musical Expression, Jun 21, 2017
This paper presents the results of a study of piano pedalling techniques on the sustain pedal using a newly designed measurement system named Piano Pedaller. The system comprises an optical sensor mounted in the piano pedal bearing block and an embedded platform for recording audio and sensor data. This enables recording the pedalling gestures of real players and the piano sound under normal playing conditions. Using the gesture data collected from the system, the task of classifying these data by pedalling technique was undertaken using a Support Vector Machine (SVM). Results can be visualised in an audio-based score-following application to show pedalling together with the player's position in the score.
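A minimal sketch of the classification step: an SVM over fixed-length feature vectors derived from the pedal sensor signal. The stand-in features and class labels are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 16))                    # stand-in gesture feature vectors
y = rng.choice(["quarter", "half", "full"], size=200)  # pedalling technique labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```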
Feature extraction algorithms in Music Informatics aim at deriving statistical and semantic information directly from audio signals. These range from energies in several frequency bands to musical information such as key, chords or rhythm. Given the increasing diversity and complexity of features and algorithms in this domain, applications call for a common structured representation to facilitate interoperability, reproducibility and machine interpretability. We propose a solution relying on Semantic Web technologies that is designed to serve a dual purpose: (1) to represent computational workflows of audio features, and (2) to provide a common structure for feature data that enables the use of Linked Open Data principles and technologies in Music Informatics. The Audio Feature Ontology is based on an analysis of existing tools and the music informatics literature, which was instrumental in guiding the ontology engineering process. The ontology provides a descriptive framework for expressing different conceptualisations of the audio feature extraction domain and enables designing linked data formats for representing feature data. In this paper, we discuss important modelling decisions and introduce a harmonised ontology library consisting of modular interlinked ontologies that describe the different entities and activities involved in music creation, production and publishing.
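To make the idea of representing feature data as linked data concrete, a minimal rdflib sketch; the namespace URI and property names are placeholders, not the actual Audio Feature Ontology vocabulary.

```python
from rdflib import Graph, Namespace, Literal, URIRef

AFO = Namespace("http://example.org/afo/")     # placeholder namespace
g = Graph()

feat = URIRef("http://example.org/features/track1-mfcc")
g.add((feat, AFO.computedBy, Literal("vamp:qm-mfcc")))  # placeholder plugin id
g.add((feat, AFO.dimensions, Literal(20)))

print(g.serialize(format="turtle"))
```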
There is currently no agreement on common shared representations of audio features in the field of music information retrieval. The Audio Feature Ontology has been developed as part of a harmonised library of modular ontologies to solve the problem of interoperability between music related data sources. We demonstrate a software framework which combines this ontology and related Semantic Web technologies with data extraction and analysis software, in order to enhance audio feature extraction workflows.
In this demo paper, a system that suggests new practice material to music learners is presented. It is aimed at music practitioners of any skill level, playing any instrument, as long as they know how to play along with a chord sheet. Users need to select a number of chords in a web app, and are then presented with a list of music pieces containing those chords. Each of those pieces can then be played back while its chord transcription is displayed in sync with the music. This enables a variety of practice scenarios, ranging from following the chords in a piece to using the suggested music as a backing track to practise soloing over. We set out the various interface elements that make up this web application and the thinking behind them. Furthermore, we touch upon the algorithms used in the app. Notably, we discuss the automatic generation of chord transcriptions, which allows large amounts of music to be processed without human intervention, and the query resolution mechanism, which finds appropriate music based on the user input and transcription quality.
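A minimal sketch of the query-resolution idea: return pieces whose (automatically transcribed) chords are a subset of the user's selection, ranked by transcription quality. The data structures and scores are illustrative, not the app's actual implementation.

```python
pieces = {
    "Song A": {"chords": {"C", "G", "Am", "F"}, "quality": 0.9},
    "Song B": {"chords": {"C", "G"}, "quality": 0.7},
    "Song C": {"chords": {"E", "B7"}, "quality": 0.8},
}

def suggest(selected):
    # Keep pieces playable with the selected chords, best transcriptions first.
    hits = [(name, meta["quality"]) for name, meta in pieces.items()
            if meta["chords"] <= set(selected)]
    return [name for name, _ in sorted(hits, key=lambda h: -h[1])]

print(suggest(["C", "G", "Am", "F", "Dm"]))   # -> ['Song A', 'Song B']
```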
This paper introduces the Audio Effect Ontology (AUFX-O), building on previous theoretical models describing audio processing units and workflows in the context of music production. We discuss important conceptualisations of different abstraction layers, their necessity for successfully modelling audio effects, and their method of application. We present use cases concerning the use of effects in music production projects and the creation of audio effect metadata, facilitating a linked data service exposing information about effect implementations. By doing so, we show how our model facilitates knowledge sharing, reproducibility and analysis of audio production workflows.
International Symposium/Conference on Music Information Retrieval, 2008
We describe our recent achievements in interlinking several music-related data sources on the Semantic Web. In particular, we describe interlinked datasets dealing with Creative Commons content, editorial, encyclopedic, geographic and statistical data, along with queries they can answer and tools using their data. We describe our web services, providing on-demand access to content-based features linked with such data sources and information pertaining to their creation (including processing steps, applied algorithms, inputs, parameters or associated developers). We also provide a tool allowing such music analysis services to be set up and scripted in a simple way.
International Symposium/Conference on Music Information Retrieval, 2011
This paper introduces the Studio Ontology Framework for describing and sharing detailed information about music production. The primary aim of this ontology is to capture the nuances of record production by providing an explicit, application- and situation-independent conceptualisation of the studio environment. We may use the ontology to describe real-world recording scenarios involving physical hardware, or (post-)production on a personal computer. It builds on Semantic Web technologies and previously published ontologies for knowledge representation and knowledge sharing.