Academia.edu

Music Information Retrieval

2,411 papers
8,806 followers
About this topic
Music Information Retrieval (MIR) is an interdisciplinary field that focuses on the extraction, analysis, and organization of information from music data. It encompasses techniques from computer science, signal processing, and musicology to enable efficient searching, classification, and recommendation of music based on various attributes and features.

Key research themes

1. How can scalable content-based search frameworks be designed to efficiently retrieve music documents in diverse digital encodings?

This research area focuses on developing scalable and generic frameworks capable of content-based search across large music collections encoded in heterogeneous digital formats (such as audio recordings, symbolic scores in XML, MIDI, or specialized encodings). The challenge lies in representing complex, multifaceted musical content independently from encoding specifics and integrating feature extraction, indexing, and search/ranking within scalable architectures (e.g., leveraging text search engine principles). Such frameworks are crucial for enabling fast, sublinear search in massive music libraries, going beyond metadata-based retrieval to support queries on rich musical content elements (e.g., melodic patterns in specific parts).

Key finding: This paper introduces a general framework that decouples music content representation from encoding using a Music Content Model (MCM), supports extendible feature extraction, and integrates search/index/rank operations within...
Key finding: The MiDiLiB project surveyed various content-based retrieval techniques for both symbolic and audio-based music, emphasizing the lifecycle challenges including digitization, format choice, content analysis, and indexing. The...
Key finding: This review presents CB-MIR tasks focusing on extracting musical meta-information (genre, artist, vocal segmentation) directly from audio content without requiring manual annotations. It articulates the need for scalable,...
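The text-search-engine analogy running through this theme can be made concrete with a toy sketch: an inverted index over melodic interval n-grams, which gives sublinear lookup and transposition invariance. All names here are hypothetical illustrations, not APIs from the systems cited above.

```python
from collections import defaultdict

def interval_ngrams(pitches, n=3):
    """Encoding-independent melodic features: pitch-interval n-grams."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

class MelodyIndex:
    """Toy inverted index mapping each n-gram to the documents containing it."""
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, pitches):
        for gram in interval_ngrams(pitches):
            self.postings[gram].add(doc_id)

    def search(self, query_pitches):
        """Rank documents by the number of query n-grams they share."""
        scores = defaultdict(int)
        for gram in interval_ngrams(query_pitches):
            for doc_id in self.postings.get(gram, ()):
                scores[doc_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])

index = MelodyIndex()
index.add("doc_a", [60, 62, 64, 65, 67])   # C D E F G
index.add("doc_b", [60, 60, 67, 67, 69])   # C C G G A
best = index.search([62, 64, 65, 67])      # D E F G: same contour, transposed
print(best[0][0])  # doc_a matches the query's interval trigram
```

Because only matching postings lists are touched at query time, lookup cost scales with the query size and posting lengths rather than with the whole collection, which is the property these frameworks borrow from text search engines.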

2. What audio feature extraction and representation methods enable robust retrieval and classification of music from raw audio signals?

This theme addresses approaches that extract meaningful features directly from raw audio signals to support retrieval tasks such as genre classification, query-by-example, segmentation, and audio thumbnailing. The research emphasizes the design and evaluation of spectral, timbral, temporal, and statistical features (e.g., MFCCs, LPC, FFT) and their combinations, as well as advanced visualization techniques (e.g., TimbreGrams) and classifications. The challenge is to effectively capture musical characteristics from audio signals under conditions of noise, variability, and non-symbolic formats, surpassing transcription limitations.

Key finding: This paper develops an extensible audio retrieval system focused on feature-based audio analysis at small time scales (FFT, LPC, MFCC) and supports audio classification, segmentation, retrieval, and audio thumbnailing without...
Key finding: The authors propose novel user interfaces that extend beyond traditional query-by-example for audio retrieval by enabling users to synthesize or combine new audio queries based on content-derived parameters. This relies on...
Key finding: This study presents the merger of two prototype systems (Sonic Browser and MARSYAS), combining audio analysis, classification, segmentation, and novel visualization (TimbreSpaces) to provide a flexible 2D/2.5D audio browsing...
Key finding: This paper systematically investigates integration of short-time audio features (e.g., MFCCs) over longer time scales through statistical approaches (mean, variance) and proposes using autoregressive models for feature...
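The pipeline these findings describe — short-time spectral features integrated over longer time scales by mean and variance — can be sketched in pure Python. This is a minimal illustration with a naive DFT and a single timbral feature (spectral centroid); the function names are hypothetical, and real systems use FFT libraries and richer features such as MFCCs.

```python
import cmath, math

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum (fine for a sketch; use an FFT in practice)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N // 2)]

def spectral_centroid(frame, sr):
    """Timbral 'brightness': magnitude-weighted mean frequency of one frame."""
    mags = dft_magnitudes(frame)
    total = sum(mags) or 1.0
    return sum(k * sr / len(frame) * m for k, m in enumerate(mags)) / total

def track_features(signal, sr, frame_len=256):
    """Short-time features integrated over the track by mean and variance."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    cents = [spectral_centroid(f, sr) for f in frames]
    mean = sum(cents) / len(cents)
    var = sum((c - mean) ** 2 for c in cents) / len(cents)
    return mean, var

sr = 8000
low  = [math.sin(2 * math.pi * 250 * t / sr) for t in range(2048)]
high = [math.sin(2 * math.pi * 1500 * t / sr) for t in range(2048)]
m_low, _ = track_features(low, sr)
m_high, _ = track_features(high, sr)
print(m_low < m_high)  # the brighter signal has the higher mean centroid
```

The mean/variance pair collapses a variable-length sequence of frame features into a fixed-length vector that a classifier can consume, which is the basic statistical integration these papers evaluate and extend.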

3. How can melodic similarity and structural patterns be leveraged to identify cover versions and structural relationships between music recordings?

This theme investigates methodologies for detecting melodic similarity and structural characteristics to identify cover songs and versions that share an underlying structure despite different performances. It builds on symbolic representation extraction (e.g., sung notes) and self-similarity matrices to capture deeper musical relations, accommodating transformations like tempo variation, transposition, and accompaniment changes. These approaches enable content-based retrieval that transcends metadata and superficial audio matching, crucial for music version identification, copyright management, and indexing live or polyphonic recordings.

Key finding: The paper develops a method for identifying cover versions by extracting and comparing main vocal melodies from polyphonic audio, using preprocessing to remove accompaniment and tempo normalization. Experiments on 594...
Key finding: This work introduces structure fingerprints derived from self-similarity matrices as compact descriptors of a recording's structural properties, enabling comparison without complex alignment. Using kernel density estimation...
Key finding: The authors evaluate various time series representations of sung queries (pitch contours, sequences of notes, and novel pitch histogram sequences) combined with dynamic time warping variants for alignment. They show that...
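A minimal sketch of the alignment idea these findings share: dynamic time warping over mean-normalized pitch contours, which tolerates the tempo variation and transposition mentioned above. Function names are hypothetical; production systems add warping-path constraints and more careful normalization.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two pitch contours (in semitones)."""
    INF = float("inf")
    cost = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

def transposition_invariant(contour):
    """Subtract the mean pitch so versions in different keys compare fairly."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]

reference = transposition_invariant([60, 62, 64, 62, 60])
same_tune = transposition_invariant([65, 65, 67, 69, 69, 67, 65])  # up a 4th, slower
other     = transposition_invariant([60, 67, 55, 72, 50])
print(dtw_distance(reference, same_tune) < dtw_distance(reference, other))
```

The warping path is what absorbs tempo differences: a note held twice as long simply matches the same reference sample twice, at no extra cost beyond the pitch difference.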

All papers in Music Information Retrieval

Official Abstract Book for the 22nd International Conference of Music Analysis and Theory, held from October 2nd to 5th in Salerno. The event took place at the Conservatory of Salerno and was organized by the Italian Society of Music Theory...
This article provides details on the implementation and design of the 'Make Your Own Band' software suite, a suite of standalone software applications utilizing machine learning (ML) and music information retrieval (MIR) developed for use...
We present the Ultrasonic Consciousness Hypothesis, proposing that the systematic removal of ultrasonic frequencies (20-96kHz) through lossy audio compression since the 1990s may have inadvertently eliminated crucial emotional grounding...
Music is time-art. Even though we all live in many dimensions at once, most of us cannot think any further than the third dimension: up-down, left-right, back-front, are stable opposing parameters of space. Navigation is easy. However,...
As many acoustic signal processing methods, for example for source separation or noise canceling, operate in the magnitude spectrogram domain, the problem of reconstructing a perceptually good sounding signal from a modified magnitude...
The comprehension of English speaking skills is one of the major activities of language acquisition, as it gives feedback to learners. Some of the disadvantages of conventional assessment techniques, such as scoring and rule-based systems,...
This paper presents a comprehensive investigation of existing feature extraction tools for symbolic music and contrasts their performance to determine the set of features that best characterizes the musical style of a given music score....
In this work, we introduce musif, a Python package that facilitates the automatic extraction of features from symbolic music scores. The package includes the implementation of a large number of features, which have been developed by a...
Ligeti's etude Fanfares extends the technical skills demanded of the modern pianist and may incorporate completely new techniques. This etude, composed around 1985, presents some of the most technically demanding issues...
This paper documents a fully reproducible method for generating new musical compositions using the Law of Universal Mathematical Unity (LUMU). By merging two public domain sheet music scores through a recursive operator and rendering the...
This paper introduces a new compositional process based on transformations of previously existing material by segmentation of information located in a 2-dimensional cellular-space, the use of Walsh Functions as triggers, and recombinancy...
The tabla is a unique percussion instrument due to the combined harmonic and percussive nature of its timbre, and the contrasting harmonic frequency ranges of its two drums. This allows a tabla player to uniquely emphasize parts of the...
A Dhrupad vocal concert comprises a composition section that is interspersed with improvised episodes of increased rhythmic activity involving the interaction between the vocals and the percussion. Tracking the changing rhythmic density,...
This paper introduces the beta version of i-Berlioz, an interactive CAO system that generates orchestrations from verbal timbre descriptors. This system relies on machine learning models for timbre classification and generative...
In the context of networked music performance (NMP), temporal synchronization has seen widespread adoption, notably through protocols such as Ableton Link [1]. However, a significant gap remains in the domain of harmonic synchronization,...
Several adaptations of Transformers models have been developed in various domains since its breakthrough in Natural Language Processing (NLP). This trend has spread into the field of Music Information Retrieval (MIR), including studies...
We present a system for content-based retrieval of perceptually similar sound events in audio documents ('sound spotting'), using query by example. The system consists of three discrete stages: a front-end for feature extraction, a...
To integrate access to musicology's heterogeneous data sources so that they can be explored effectively and efficiently via one interface service; to deliver an optimally interactive approach to support this exploration; and to develop a...
Trained musicians intuitively produce expressive variations that add to their audience's enjoyment. However, there is little quantitative information about the kinds of strategies used in different musical contexts. Since the literal...
Curiosity has always led me to the most unusual explorations in music. Many times, when I open an instrument on the workshop bench or raise it to my lips to study, I ask myself what happens inside that sounding tube...
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or...
Music recordings often suffer from audio quality issues such as excessive reverberation, distortion, clipping, tonal imbalances, and a narrowed stereo image, especially when created in non-professional settings without specialized...
The AI music industry is growing, raising questions around how to protect and pay artists whose work is used to train generative AI models. Are the answers in the models themselves?
This study focuses on the exploration of the possibilities arising from the application of an NLP word-embedding method (Word2Vec) to a large corpus of musical chord sequences, spanning multiple musical periods. First, we analyse the...
Classification of musical audio signals according to expressed mood or emotion has evident applications to content-based music retrieval in large databases. Wrapper selection is a dimension reduction method that has been proposed for...
The perceptual attributes of timbre have inspired a considerable amount of multidisciplinary research, but because of the complexity of the phenomena, the approach has traditionally been confined to laboratory conditions, much to the...
Background: The automatic prediction of emotional content in music is nowadays a growing area of interest. Several algorithms have been developed to retrieve music features, and computational models using these features are continuously...
In response to the proposal of digitizing the entire back-run of several European audio archives, many research projects have been carried out in order to discover the technical issues involved in making prestigious audio documents...
Through the application of the "trigram melodic contour analysis" first described in [Yarman & Sethares et al. 2019, "An investigation of the role of diatonic functions in the seyir of Turkish makam music: case of 'Hicaz family'"] to the...
Studies on the perception of musical qualities (such as induced or perceived emotions, performance styles, or timbre nuances) make large use of verbal descriptors. Although many authors noted that particular musical qualities can hardly...
We propose a system for recognizing a singer based on past observations of the singer's voice qualities and their ability to sing a given song. Such a system could be useful to improve predictions in Query by Humming systems, or as...
With continuous improvements across many aspects of artificial intelligence, deep learning is rapidly making its way into the field of music. The research purpose of this paper is...
The goals of this project are the creation of a new dataset of sounds that belong to the domestic environment, called DomesticFSD2018, and to research on methods for the automatic classification of them. A Semi-Supervised approach is used...
My supervisor Eduardo Fonseca, whose expertise, diligence, patience and constant encouragement were crucial for going through with this thesis • My co-supervisor Frederic Font, for the valuable input • All the people of the MTG behind the...
Music is an art form whose medium is sound. It includes various attributes like rhythm, melody, timbre, etc. The term melody is a musicological concept based on the judgment of human listeners. Melody extraction from polyphonic music is a...
In this paper, the methodology used to recognize musical instruments is summarized. Recognition involves two phases, a training phase and a testing phase, the details of which are described. Music...
This study presents an automatic, computer-aided analytical method called Comparison Structure Analysis (CSA), which can be applied to different dimensions of music. The aim of CSA is first and foremost practical: to produce dynamic and...
during the years 2003-2010. I wish to express my sincere gratitude to everyone I have worked with in Europe, the United States and China during these years. First, I would like to express my gratitude to my supervisor and friend,...
This paper presents D'Accord Guitar, an innovative environment for learning, editing and performing music. D'Accord Guitar can be seen as an Instrumental Performance System (IPS). In order to improve musical notation completeness,...
I comment on Clark and Arthur's response to a YouTuber's claim of the death of melody, for which they used corpus analysis and statistical methods of computational musicology. While I basically appreciate the effort, I will also...
Scientists studying music and evolution often discuss similarities and differences between music, language, and bird song, but few studies have simultaneously compared these three domains quantitatively. One of the striking features often...
Melody identification is an important early step in music analysis. This paper presents a tool to identify the melody in each measure of a Standard MIDI File. We also share an open dataset of manually labeled music for researchers. We use...
Human Computer Music Performance (HCMP) is the study of music performance by live human performers and real-time computer-based performers. One goal of HCMP is to create a highly autonomous artificial performer that can fill the role of a...
musical performances and synchronize prestored computer music accompaniments. The third project is a system for analyzing the harmonic and rhythmic content... Interaction with computers in musical performances is very much limited by a lack...
Don't know the composer, performer, or title? Let the system match the theme you know to the song you want.
Probabilistic Latent Component Analysis (PLCA) is a tool similar to Non-negative Matrix Factorization (NMF), which is used to model non-negative data such as non-negative time-frequency representations of audio. In this paper, we put...
This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals. From the analysis of mixtures of sinusoids we obtain relationships between phases of successive time frames in the...
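The phase relationship between successive frames that such reconstruction methods exploit can be checked numerically: for a pure sinusoid at frequency f, the STFT phase in its bin advances by 2*pi*f*H/sr between frames a hop of H samples apart. The following self-contained sketch (naive DFT, hypothetical helper names) verifies this relation; it is an illustration of the principle, not the cited paper's algorithm.

```python
import cmath, math

def frame_phase(signal, start, n_fft, k):
    """Phase of DFT bin k for the length-n_fft frame starting at `start` (naive DFT)."""
    X = sum(signal[start + n] * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n in range(n_fft))
    return cmath.phase(X)

def wrap(x):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(x), math.cos(x))

sr, n_fft, hop = 8000, 256, 64
f = 1031.25                        # exactly bin 33 of a 256-point DFT at 8 kHz
k = round(f * n_fft / sr)
sig = [math.sin(2 * math.pi * f * t / sr) for t in range(1024)]

measured = frame_phase(sig, hop, n_fft, k) - frame_phase(sig, 0, n_fft, k)
predicted = 2 * math.pi * f * hop / sr   # phase-advance relation for a sinusoid
print(abs(wrap(measured - predicted)) < 1e-6)  # the two agree modulo 2*pi
```

Phase-reconstruction methods invert this logic: given magnitudes and a frequency estimate per bin, they propagate phases frame to frame using the same advance relation.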