Academia.edu

Pitch Tracking

187 papers
1 follower
About this topic
Pitch tracking is the process of detecting and analyzing the fundamental frequency of a sound signal, typically in music or speech, to determine its pitch. This involves algorithms that can identify variations in frequency over time, enabling applications in music transcription, voice recognition, and audio processing.
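As a rough illustration of what "identifying variations in frequency over time" can look like in practice, the sketch below estimates one fundamental frequency per frame from the peak of the frame's autocorrelation. It is a minimal textbook baseline, not the method of any paper listed on this page; the function names, the 80-500 Hz search range, and the frame and hop sizes are all assumptions chosen for the example.

```python
import math

def estimate_f0(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Autocorrelation-peak estimate of f0 in Hz, or None for a silent/aperiodic frame."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]                 # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None
    min_lag = max(1, int(sample_rate / fmax))     # shortest period searched
    max_lag = min(int(sample_rate / fmin), n - 1) # longest period searched
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation, normalised by the frame energy.
        score = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

def track_pitch(signal, sample_rate, frame_len=400, hop=200):
    """Slide a frame over the signal and return one f0 estimate per frame."""
    return [estimate_f0(signal[start:start + frame_len], sample_rate)
            for start in range(0, len(signal) - frame_len + 1, hop)]

# Hypothetical test signal: 0.1 s at 200 Hz followed by 0.1 s at 400 Hz.
sr = 8000
signal = ([math.sin(2 * math.pi * 200 * t / sr) for t in range(800)] +
          [math.sin(2 * math.pi * 400 * t / sr) for t in range(800)])
print(track_pitch(signal, sr))
```

The resolution of this estimator is limited to whole-sample lags, and frames that straddle the change between the two tones give unreliable values; the more elaborate methods summarised on this page address exactly these kinds of weaknesses.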

Key research themes

1. How can auditory models improve accurate pitch segmentation and transcription in singing sequences?

This research area focuses on developing and evaluating auditory model-based transcription systems that can convert singing sequences into discrete pitch and duration pairs with minimized segmentation errors. Accurate segmentation and transcription are critical for applications like Query-by-Humming (QBH) systems, where matching sung queries to musical databases depends fundamentally on precise note boundary detection and pitch estimation. Challenges include reducing segmentation errors and accommodating variability caused by singing with or without lyrics.

Key finding: The study demonstrates that existing state-of-the-art transcription systems suffer from high segmentation error rates, up to 60%. By developing a new auditory model-based transcription system incorporating advanced acoustic... Read more
Key finding: The paper introduces a novel voicing detection and pitch estimation algorithm that leverages the multi-scale product of wavelet transform coefficients to robustly detect pitch periods and voicing segments in noisy speech.... Read more
Key finding: Proposes an innovative pitch extraction method based on Continuous Wavelet Transform (CWT) coefficients mean signal to improve voiced/unvoiced detection and pitch estimation, especially in challenging creaky voice regions.... Read more
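To make the idea of "discrete pitch and duration pairs" concrete, the following sketch turns a frame-wise f0 track into a list of (MIDI note, duration) pairs by quantising each frame to the nearest semitone and grouping consecutive frames with the same note. This is a naive baseline for illustration only, not the auditory model-based system described above; `hz_to_midi`, `segment_notes`, and the `min_frames` threshold are hypothetical names and values.

```python
import math

def hz_to_midi(f0_hz):
    """Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * math.log2(f0_hz / 440.0)))

def segment_notes(f0_track, hop_seconds, min_frames=2):
    """Group consecutive frames with the same quantised pitch into notes.

    f0_track holds one f0 value in Hz per frame, or None for unvoiced frames.
    Runs shorter than min_frames are discarded as likely glitches.
    """
    notes = []
    current, count = None, 0
    for f0 in list(f0_track) + [None]:        # trailing None flushes the last run
        midi = hz_to_midi(f0) if f0 else None
        if midi == current:
            count += 1
            continue
        if current is not None and count >= min_frames:
            notes.append((current, count * hop_seconds))
        current, count = midi, 1
    return notes

# Hypothetical f0 track: five frames near A3, a short pause, three frames near B3.
track = [220.0] * 5 + [None] * 2 + [246.94] * 3
print(segment_notes(track, hop_seconds=0.01))
```

Hard quantisation of this kind is sensitive to vibrato and pitch glides, which is one source of the segmentation errors this research theme tries to reduce.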

2. What advanced signal processing techniques enable robust and high-resolution pitch tracking in complex and noisy audio signals?

This theme investigates algorithmic advancements in pitch estimation that provide enhanced time-frequency resolution, noise robustness, and effective multi-pitch tracking. These techniques use innovative mathematical transforms, empirical mode decomposition, canonical correlation analysis, and statistical modeling to disambiguate pitch information from acoustically rich or degraded signals. The focus is on leveraging continuous pitch estimation and harmonic models to improve pitch tracking accuracy, essential for applications such as speech synthesis, music transcription, and robot audition.

Key finding: Develops a novel pitch detection technique utilizing a 'Fourier of Fourier' transform—applying two sequential Fourier transforms—to precisely identify the fundamental frequency in harmonic sounds, even when fundamentals are... Read more
Key finding: Introduces a pitch estimation method that combines Empirical Mode Decomposition (EMD) to generate Intrinsic Mode Functions with Canonical Correlation Analysis (CCA) to select relevant components. The approach reconstructs a... Read more
Key finding: Proposes a robust multi-pitch tracking algorithm for noisy speech that integrates an enhanced channel and peak selection method, a novel probabilistic integration of periodicity cues across frequency bands, and Hidden Markov... Read more
Key finding: Enhances the RAPT pitch tracking method by introducing an instantaneous normalized cross-correlation function computed from instantaneous harmonic parameters obtained via complex bandpass filtering. This results in smooth,... Read more
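The first key finding above mentions applying two sequential Fourier transforms. The sketch below illustrates the general idea under simplifying assumptions: the magnitude spectrum of a harmonic sound is a comb with teeth spaced f0 apart, so a second transform of that spectrum peaks at an index that reveals the spacing, even when the fundamental itself is absent. This is not the cited paper's algorithm; the function names and the 150-500 Hz search range are assumptions, and the range is deliberately narrow because multiples of the true peak index produce equally strong peaks (sub-harmonic ambiguity).

```python
import cmath
import math

def dft_magnitudes(values, num_bins):
    """Magnitudes of the first num_bins DFT bins (naive O(N^2) transform)."""
    n = len(values)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(values)))
            for k in range(num_bins)]

def double_fourier_f0(frame, sample_rate, fmin=150.0, fmax=500.0):
    """Estimate f0 from the periodicity of the magnitude spectrum."""
    n = len(frame)
    half = n // 2
    spectrum = dft_magnitudes(frame, half)               # first transform
    # A comb with spacing s bins peaks at index q = half / s in the second transform.
    q_min = max(1, int(half * sample_rate / (fmax * n)))
    q_max = min(half - 1, int(half * sample_rate / (fmin * n)))
    second = dft_magnitudes(spectrum, q_max + 1)         # second transform
    best_q = max(range(q_min, q_max + 1), key=lambda q: second[q])
    spacing_bins = half / best_q
    return spacing_bins * sample_rate / n                # bins -> Hz

# Hypothetical test tone: harmonics 2 to 5 of 200 Hz, with no energy at 200 Hz itself.
sr, n = 8000, 400
frame = [sum(math.sin(2 * math.pi * h * 200 * t / sr) for h in range(2, 6))
         for t in range(n)]
print(double_fourier_f0(frame, sr))
```

A practical implementation would use an FFT and some interpolation around the peak; the naive transform is used here only to keep the example dependency-free.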

3. How can integrated acoustic and music language models enable multi-pitch detection and voice assignment in polyphonic vocal music?

Research within this theme explores systems combining probabilistic acoustic models with musicological language models to simultaneously detect multiple concurrent pitches and assign detected pitches to individual voices or singers in polyphonic a cappella recordings. Such integration addresses challenges of pitch detection amidst overlapping harmonic sources and enables voice separation based on voice-leading rules and temporal continuity. The resulting methods facilitate transcription and analysis of complex vocal ensembles like chorales and quartets.

Key finding: The paper proposes a system combining spectrogram factorization acoustic models (PLCA) driven by a learned spectral template dictionary with hidden Markov music language models embodying voice-leading constraints to perform... Read more
Key finding: Presents a hybrid speech segregation approach that combines Hidden Markov Model (HMM)-based pitch tracking, computational auditory scene analysis (CASA), and medium-frame harmonic modeling to segregate co-channel speech from... Read more

All papers in Pitch Tracking

Publication in the conference proceedings of EUSIPCO, Bucharest, Romania, 2012
People identify powerfully with music: someone might say “that’s my song!” but they are unlikely to say “that’s my book!” or “that’s my picture!” A digital library of popular music therefore has the potential to be a compelling... more
Chinese is known as a syllabic and tonal language, and tone recognition plays an important role and provides very strong discriminative information for Chinese speech recognition [1]. Usually, the tone classification is based on... more
This paper introduces a new spectral representation-based pitch estimation method. Since pitch is never stationary during real conversations, but often undergoes changes because of intonation, the spectral representation is derived from... more
This paper proposes a new voicing detection and pitch estimation method that is particularly robust for noisy speech. This method is based on the spectral analysis of the speech multi-scale product. The multi-scale product (MP) consists... more
This paper describes a peak-tracking spectrum analyzer, called Parshl, which is useful for extracting additive synthesis parameters from inharmonic sounds such as the piano. Parshl is based on the Short-Time Fourier Transform (STFT),... more
This paper describes a polyphonic note detection system incorporating a simple masking technique that can accurately transcribe chords and polyphonic piano music. The system, developed in MATLAB, will take input files in .wav format. The... more
In this paper, a computationally efficient method is presented for estimating the parameters of harmonic sinusoidal signals, including the order, which is of particular importance for speech and audio signals. The signal is... more
A pitch detection/tracking strategy for solo bowed-string and wind musical instrumental recordings is presented. To avoid the missing fundamental problem, we adopted the greatest common divisor method and modified it with a... more
In this paper, the problem of legato pedalling technique detection in polyphonic piano music is addressed. We propose a novel detection method exploiting the effect of sympathetic resonance which can be enhanced by a legato-pedal onset.... more
Glottal instants, namely GCIs and GOIs, are useful in a wide variety of speech processing and biomedical applications. This paper presents the recent developments in the methodologies for glottal activity detection using EGG and speech... more
We prove that a finite (state and action spaces) semi-Markov decision process with limiting ratio average (undiscounted) payoff has an optimal pure semi-stationary policy (i.e., a semi-Markov policy independent of decision epoch count).... more
In this paper, the source-filter model of speech production is adapted to represent the speech signal as the superposition and convolution of a dynamic source and resonant modes. The aim is to increase the resolution of the... more
In this paper, the performance of an automatic speech recognition (ASR) system is improved with the help of pitch-dependent features and probability-of-voicing features. The pitch-dependent features are useful for tonal... more
We propose in this paper a spectral synthesis model to generate noisy sounds with independent control parameters for spectral density and spectral envelope. Algorithms that efficiently define these spectral properties from the... more
In this thesis a novel multiresolution approach for note detection in a polyphonic mix is proposed. The idea is to use a set of wavelets whose lengths are adapted to the theoretical fundamental period of musical notes. Using the typical... more
This paper describes a sound synthesis technique that modulates the coefficients of allpass filter chains using audio-rate frequencies. It was found that modulating a single allpass filter section produces a feedback AM–like spectrum, and... more
In this paper, we describe the BBN 2007 Mandarin Speech-to-Text system developed for the GALE Evaluation 2007. In comparison to the BBN 2006 Mandarin system, we achieved 25% relative reduction in character error rate on the most important... more
Music understanding is a process closely related to the knowledge and experience of the listener. The amount of knowledge required is relative to the complexity of the task in hand. This dissertation is concerned with the problem of... more
In this work, a new instantaneous fundamental frequency extraction method is presented, with the attention especially focused on its robustness for pathological voices processing. It is based on the Ensemble Empirical Mode Decomposition... more
This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a... more
Close to glottal closure instants (GCIs), the speech signal is expected to change its amplitude rapidly and, at GCIs, it is expected to have strong negative peaks. A novel algorithm that exploits these two properties for the estimation of... more
A new light is thrown on the Portnoff [1] speech signal timescale modification algorithm. It is shown in particular that the Portnoff algorithm easily accommodates expansion factors bigger than 2 without causing reverberation nor... more
The aim of the paper is to show a system engineered for automatic detection and correction of detuned singing. For this purpose, existing methods of fundamental frequency detection and pitch correction are reviewed. In addition, main... more
Although research on the use and effectiveness of visual feedback for teaching tone and intonation began more than thirty years ago, the technology for signal analysis and pitch extraction using microcomputers has only recently become... more
In this paper, we introduce a pitch detection algorithm that is particularly robust for telephone speech and prosodic modeling. The algorithm uses a logarithmically sampled spectral representation of speech, similar to that in the... more
The problem of music retrieval by sung query consists of building a machine capable of simulating the cognitive process of identifying a musical piece from a few sung notes of its melody. In this paper, the algorithms of pitch tracking,... more
This extended abstract details a submission to the Music Information Retrieval Evaluation eXchange in the Query by Singing/Humming task. The problem of query by singing consists of building a machine capable of simulating the cognitive... more
An effective multi-pitch tracking algorithm for noisy speech is critical for auditory processing. However, the performance of existing algorithms is not satisfactory. We have developed a robust algorithm for multi-pitch tracking of noisy... more
We present a robust algorithm for multi-pitch tracking of noisy speech. Our approach integrates an improved channel and peak selection method, a new integration method for extracting periodicity information across different frequency... more
Neural processes underlying pitch perception at the level of the cerebral cortex are influenced by language experience. We investigated whether early, pre-attentive stages of pitch processing at the level of the human brainstem may also... more
Neural encoding of pitch in the auditory brainstem is shaped by long-term experience with language. The aim herein was to determine to what extent this experience-dependent effect is specific to a particular language. Analysis of variance... more
This paper introduces the Vector Phaseshaping (VPS) synthesis technique, which extends the classic Phase Distortion method by providing flexible means to distort the phase of a sinusoidal oscillator. This is achieved by describing the... more
We propose a method for segmentation of pitch tracks for melody detection in polyphonic musical signals. This is an important issue for melody-based music information retrieval, as well as melody transcription. Past work in the field... more
In this paper, we propose building automated software that transcribes each note while the musician plays the instrument. The software will take the sound of the instrument as an input and will process the frequency... more
The death of Eduardo Lizalde (1929-2022) represents an irreplaceable loss for Mexican poetry. Throughout his literary career, he developed an almost inevitable affinity with the sonority of his voice, whether through... more
Background Children with pervasive developmental disorders (PDD), such as children with autism spectrum disorders (ASD), often show auditory processing deficits related to their overarching language impairment. Auditory training programs... more
Paper presented at the Eighth International Conference on Creative Content Technologies, held 20-24 March 2016 in Rome, Italy.
This paper presents a voiced/unvoiced classification algorithm for noisy speech signals that analyzes two acoustic features of the speech signal. Short-time energy and short-time zero-crossing rate are among the most distinguishable... more
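The two features named in the last abstract, short-time energy and short-time zero-crossing rate, are simple to compute per frame. The sketch below shows one common way to combine them: voiced speech tends to have high energy and few zero crossings, while unvoiced sounds tend to have lower energy and many crossings. It is a generic illustration, not the algorithm of the paper above; the function names and both thresholds are assumptions that would need tuning on real data.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_voiced(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Label a frame voiced when it is both energetic and slowly oscillating.

    Both thresholds are illustrative assumptions, not values from any paper.
    """
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)

# Hypothetical 30 ms frames at 8 kHz: a strong 150 Hz tone stands in for a vowel,
# a weak 3.5 kHz tone stands in for a fricative-like sound.
sr = 8000
vowel_like = [0.5 * math.sin(2 * math.pi * 150 * t / sr) for t in range(240)]
fricative_like = [0.05 * math.sin(2 * math.pi * 3500 * t / sr) for t in range(240)]
print(is_voiced(vowel_like), is_voiced(fricative_like))
```

Fixed thresholds of this kind degrade quickly in noise, which is why robust voicing detection remains an active topic in the work listed here.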