While previous generations of the MPEG multimedia standard have focused primarily on coding and transmission of content digitally sampled from the real world, MPEG-4 contains extensive support for structured, synthetic and synthetic/natural hybrid coding methods. An overview is presented of the "Structured Audio" and "AudioBIFS" components of MPEG-4, which enable the description of synthetic soundtracks, musical scores, and effects algorithms and the compositing, manipulation, and synchronization of real and synthetic audio sources. A discussion of the separation of functionality between the systems layer and the audio toolset of MPEG-4 is presented, and prospects for efficient DSP-based implementations are discussed.
Using musical knowledge to extract expressive performance information from audio recordings
Computational auditory scene analysis, Jun 1, 1998
A computer system is described which performs polyphonic transcription of known solo piano music by using high-level musical information to guide a signal-processing system. This process, which we term expressive performance extraction, maps a digital audio representation of a musical performance to a MIDI representation of the same performance using the score of the music as a guide. Analysis of the accuracy of the system is presented, and its usefulness both as a tool for music-psychology researchers and as an example of a musical-knowledge-based signal-processing system is discussed.
Algorithmic and Wavetable Synthesis in the MPEG-4 Multimedia Standard
The newly released MPEG-4 standard for multimedia transmission contains several novel tools for the low-bitrate coding of audio. Among these is a new codec called “Structured Audio” that allows sound to be transmitted in a synthetic description format and synthesized at the client. MPEG-4 Structured Audio contains facilities for both algorithmic synthesis, using a new software-synthesis language called SAOL, and wavetable synthesis, using a new format for the efficient transmission of banks of samples. We contrast the use of these techniques for various multimedia applications, discussing scenarios in which one is favored over the other, or in which they are profitably used together.
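The wavetable side of the contrast above can be illustrated with a minimal sketch: a stored single-cycle table is read at a frequency-dependent rate, with linear interpolation between table entries. This is purely illustrative and is not MPEG-4's normative sample-bank format; all names and parameters below are invented for the example.

```python
import math

def make_sine_table(size=1024):
    """Precompute one cycle of a sine wave as a wavetable."""
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def wavetable_oscillator(table, freq_hz, sample_rate, num_samples):
    """Generate samples by stepping through the table at a rate set by
    freq_hz, interpolating linearly between adjacent table entries."""
    size = len(table)
    phase = 0.0
    step = freq_hz * size / sample_rate  # table indices advanced per sample
    out = []
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        # linear interpolation between table[i] and the next entry (wrapping)
        out.append(table[i] * (1 - frac) + table[(i + 1) % size] * frac)
        phase = (phase + step) % size
    return out

table = make_sine_table()
samples = wavetable_oscillator(table, freq_hz=440.0, sample_rate=44100,
                               num_samples=100)
```

The attraction for transmission is that only the table and a few control parameters need be sent, rather than the rendered audio.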
We have created a link between the Sound Description Interchange Format (“SDIF”) and MPEG-4’s Structured Audio (“SA”) tools. We cross-code SDIF data into SA bitstreams, and write SA programs to synthesize this SDIF data. By making a link between these two powerful formats, both communities of users benefit: the SDIF community gets a fixed, standard synthesis platform that will soon be widespread, and the MPEG-4 community gets a set of powerful, robust analysis-synthesis tools. We have made the cross-coding tools available at no cost.
Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)
The application of a new technique for sound-scene analysis to the segmentation of complex musical signals is presented. This technique operates by discovering common modulation behavior among groups of frequency subbands in the autocorrelogram domain. The algorithm can be demonstrated to locate perceptual events in time and frequency when it is executed on ecological music examples taken directly from compact disc recordings. It operates within a strict probabilistic framework, which makes it convenient to incorporate into a larger signal-understanding testbed. Only within-channel dynamic signal behavior is used to locate events; therefore, the model stands as a theoretical alternative to methods that use pitch as their primary grouping cue. This segmentation algorithm is one processing element to be included in the construction of music perception systems that understand sound without attempting to separate it into components.
XRDS: Crossroads, The ACM Magazine for Students, 2000
Data compression, a fundamental aspect of today's digital world, is often cited as a triumph of basic research. Digital cellular telephones, open-source programming, Internet music, high-production video games, high-definition television, and Internet software distribution would be impossible without compression. In the future, new applications such as digital radio, TV-on-demand, interactive multimedia, and custom compact-disc kiosks will use advanced forms of compression.
A brief discussion presents some of the opportunities and challenges involved with creating metadata-centric businesses that bring Music Information Retrieval technologies to the marketplace. In particular, two related difficulties -- the difficulty of proving incremental value for new metadata systems, and the relative lack of fluidity in the marketplace for MIR -- are highlighted. Potential directions for resolving these issues are also discussed.
The MPEG-4 standard defines numerous tools that represent the state-of-the-art in representation, transmission, and decoding of multimedia data. Among these is a new type of audio standard, termed “Structured Audio”. The MPEG-4 standard for structured audio allows for the efficient, flexible description of synthetic music and sound effects, and the use of synthetic sound in synchronization with natural sound in interactive multimedia scenes. A discussion of the capabilities, technological underpinnings, and application of MPEG-4 Structured Audio is presented.
Musical content analysis through models of audition
The direct application of ideas from music theory and music signal processing has not yet led to successful musical multimedia systems. We present a research framework that addresses the limitations of conventional approaches by questioning their (often tacit) underlying principles. We discuss several case studies from our own research on the extraction of musical rhythm, timbre, harmony, and structure from complex audio signals; these projects have demonstrated the power of an approach based on a realistic view of human listening abilities. Continuing research in this direction is necessary for the construction of robust systems for music content analysis.
A comparison of two models for processing sound is presented: the perceptually-based pitch model of Meddis and Hewitt [6], and a vocoder model for rhythmic analysis by Scheirer [5]. Similarities in the methods are noted, and it is demonstrated that the pitch model is also adequate for extracting the tempo of acoustic source signals. The implications of this finding for perceptual models and signal processing systems are discussed. I. INTRODUCTION AND BACKGROUND Perceptual and computer analyses of rhythmic musical pulse have received increasing attention in recent years. The goal in this work is to understand human methods for, and realize computer analogues of, extracting from a piece of music a symbolic representation corresponding to a human listener's phenomenal experience of "beat". For the purposes of this paper, we shall define the "beat" or "pulse" of a piece of music as the phenomenal impulse train which defines a "tempo" for the...
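The tempo-extraction idea described above -- finding periodicity in a signal's rhythmic structure -- can be caricatured by autocorrelating an amplitude envelope and picking the peak lag in a plausible beat-period range. This is a much-simplified stand-in, not the Meddis-Hewitt pitch model or Scheirer's vocoder model; the synthetic envelope and all parameters are invented for illustration.

```python
def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation of a sequence up to max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def estimate_tempo(envelope, frame_rate_hz, min_bpm=60, max_bpm=180):
    """Pick the autocorrelation peak within the plausible beat-period
    range and convert the winning lag to beats per minute."""
    min_lag = int(frame_rate_hz * 60.0 / max_bpm)
    max_lag = int(frame_rate_hz * 60.0 / min_bpm)
    ac = autocorrelation(envelope, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])
    return 60.0 * frame_rate_hz / best_lag

# Synthetic envelope: an impulse every 0.5 s at 100 frames/s, i.e. 120 BPM.
frame_rate = 100
envelope = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
bpm = estimate_tempo(envelope, frame_rate)  # peaks at lag 50 -> 120.0 BPM
```

A real system would first compute the envelope from audio (e.g. subband rectification and smoothing) rather than assume it.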
A new technique for sound-scene analysis is presented. This technique operates by discovering common modulation behavior among groups of frequency subbands in the autocorrelogram domain. The analysis is conducted by first analyzing the autocorrelogram to estimate the amplitude modulation and period modulation of each channel of data at each time step, and then using dynamic clustering techniques to group together channels with similar modulation behavior. Implementation details of the analysis technique are presented, and its performance is demonstrated on a test sound.
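The two-stage procedure described above (per-channel modulation estimates, then grouping of channels with similar behavior) can be sketched in a few lines. The greedy distance-threshold clustering and the feature values below are illustrative assumptions, not the paper's autocorrelogram-domain analysis or its dynamic clustering method.

```python
def cluster_channels(features, threshold):
    """Greedy agglomerative grouping: each channel joins the first
    existing cluster whose centroid lies within `threshold` (Euclidean
    distance); otherwise it starts a new cluster."""
    clusters = []  # each entry: [centroid (list), member channel indices]
    for idx, feat in enumerate(features):
        for cluster in clusters:
            centroid, members = cluster
            dist = sum((a - b) ** 2 for a, b in zip(feat, centroid)) ** 0.5
            if dist < threshold:
                members.append(idx)
                # update centroid as the running mean of the members
                n = len(members)
                cluster[0] = [c + (f - c) / n for c, f in zip(centroid, feat)]
                break
        else:
            clusters.append([list(feat), [idx]])
    return [members for _, members in clusters]

# Invented per-channel features (am_rate_hz, period_modulation):
# channels 0-2 share slow modulation, channels 3-4 share fast modulation.
features = [(4.0, 0.1), (4.2, 0.12), (3.9, 0.09), (30.0, 0.5), (31.0, 0.48)]
groups = cluster_channels(features, threshold=2.0)  # -> [[0, 1, 2], [3, 4]]
```

Channels grouped this way would then be treated as belonging to one perceptual event.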
A computer system is described which performs polyphonic transcription of known solo piano music by using high-level musical information to guide a signal-processing system. This process, which we term expressive performance extraction, maps a digital audio representation of a musical performance to a MIDI representation of the same performance using the score of the music as a guide. Analysis of the accuracy of the system is presented, and its usefulness both as a tool for music-psychology researchers and as an example of a musical-knowledge-based signal-processing system is discussed. Thesis: Extracting Expressive Performance Information from Recorded Audio, by Eric David Scheirer; supervisor: Barry Vercoe, Professor of Media Arts and Sciences.
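Score-guided mapping from audio to MIDI requires lining up the known score with the performed audio. One standard technique for such alignment (illustrative here, not necessarily the thesis's method) is dynamic time warping over per-frame features; the toy pitch sequences below are invented for the example.

```python
def dtw_cost(score_frames, audio_frames, dist):
    """Minimum-cost monotonic alignment between two feature sequences,
    computed by the classic dynamic-programming recurrence."""
    n, m = len(score_frames), len(audio_frames)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(score_frames[i - 1], audio_frames[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # score frame held
                                 cost[i][j - 1],       # audio frame held
                                 cost[i - 1][j - 1])   # both advance
    return cost[n][m]

# Toy example: MIDI note numbers; the performance stretches the second note.
score = [60, 62, 64]
perf = [60, 62, 62, 64]
total = dtw_cost(score, perf, dist=lambda a, b: abs(a - b))  # -> 0.0
```

A zero total cost means the performance is a pure time-stretching of the score under this feature and distance.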
External Documentation and Release Notes for saolc
The saolc package is the reference software for the Structured Audio part of the MPEG-4 Audio standard (ISO 14496-3 Section 5). saolc provides non-real-time decoding of Structured Audio bitstreams, and demonstrates the proper functioning of a normative SA decoder. The structure of saolc is documented for implementors who wish to make use of the reference software, beginning at a high-level overview and proceeding to a list of important data structures and a module-by-module description. Bugs, extensions, and areas of non-conformance to the paper specification are documented. This documentation augments the internal documentation provided by comments in the code.
Extracting Expressive Performance Information from Recorded Music by Eric David Scheirer
A computer system is described which performs polyphonic transcription of known solo piano music by using high-level musical information to guide a signal-processing system. This process, which we term expressive performance extraction, maps a digital audio representation of a musical performance to a MIDI representation of the same performance using the score of the music as a guide. Analysis of the accuracy of the system is presented, and its usefulness both as a tool for music-psychology researchers and as an example of a musical-knowledge-based signal-processing system is discussed. Thesis supervisor: Barry Vercoe, Professor of Media Arts and Sciences. Reader: John Stautner, Director of Software Engineering, Compaq Computer Corporation.
A comparison of two models for processing sound is presented: the perceptually-based pitch model of Meddis and Hewitt (1991), and a vocoder model for rhythmic analysis by Scheirer. Similarities in the methods are noted, and it is demonstrated that the pitch model is also adequate for extracting the tempo of acoustic signals. The implications of this finding for perceptual models and signal processing systems are discussed.
Papers by Eric Scheirer