Key research themes
1. How can scalable content-based search frameworks be designed to efficiently retrieve music documents in diverse digital encodings?
This research area focuses on developing scalable, generic frameworks capable of content-based search across large music collections encoded in heterogeneous digital formats (such as audio recordings, symbolic scores in XML, MIDI, or specialized encodings). The challenge lies in representing complex, multifaceted musical content independently of encoding specifics and in integrating feature extraction, indexing, and search/ranking within scalable architectures (e.g., leveraging text search engine principles). Such frameworks are crucial for enabling fast, sublinear search in massive music libraries, going beyond metadata-based retrieval to support queries on rich musical content elements (e.g., melodic patterns in specific parts).
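The text-search-engine analogy can be made concrete with a small sketch: melodies are reduced to n-grams of pitch intervals (a transposition-invariant, encoding-independent representation) and stored in an inverted index, so a query touches only the postings lists for its own n-grams rather than scanning the collection. All function names here are illustrative, not from any specific system.

```python
from collections import defaultdict

def interval_ngrams(pitches, n=3):
    """Turn a pitch sequence (MIDI note numbers) into n-grams of
    successive intervals; intervals make the terms transposition-invariant."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

def build_index(corpus):
    """Inverted index: each interval n-gram maps to the set of
    document ids that contain it, as in a text search engine."""
    index = defaultdict(set)
    for doc_id, pitches in corpus.items():
        for gram in interval_ngrams(pitches):
            index[gram].add(doc_id)
    return index

def search(index, query_pitches):
    """Rank documents by how many query n-grams they share; only the
    postings lists for the query's n-grams are visited (sublinear in
    collection size for selective queries)."""
    scores = defaultdict(int)
    for gram in interval_ngrams(query_pitches):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)
```

Because the index terms are intervals rather than absolute pitches, a query fragment transposed to another key still retrieves the original melody.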
2. What audio feature extraction and representation methods enable robust retrieval and classification of music from raw audio signals?
This theme addresses approaches that extract meaningful features directly from raw audio signals to support retrieval tasks such as genre classification, query-by-example, segmentation, and audio thumbnailing. The research emphasizes the design and evaluation of spectral, timbral, temporal, and statistical features (e.g., MFCCs, LPC, FFT-based descriptors) and their combinations, as well as advanced visualization techniques (e.g., TimbreGrams) and classification schemes. The challenge is to capture musical characteristics from audio signals under conditions of noise, variability, and non-symbolic formats, thereby sidestepping the limitations of automatic transcription.
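As a minimal, hedged illustration of such frame-based feature extraction (a simplified stand-in for a full MFCC pipeline), the sketch below computes two classic spectral descriptors per frame, the centroid ("brightness") and the 85% rolloff, and summarizes each with mean and standard deviation to form a compact feature vector. Parameter values and function names are illustrative assumptions.

```python
import numpy as np

def spectral_features(signal, sr=22050, frame=1024, hop=512):
    """Return [centroid mean, centroid std, rolloff mean, rolloff std]
    computed over Hann-windowed frames of a raw audio signal."""
    window = np.hanning(frame)
    centroids, rolloffs = [], []
    for start in range(0, len(signal) - frame + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
        power = spectrum ** 2
        total = power.sum() + 1e-12
        # Spectral centroid: power-weighted mean frequency of the frame.
        centroids.append((freqs * power).sum() / total)
        # Rolloff: lowest frequency below which 85% of the power lies.
        cumulative = np.cumsum(power)
        rolloffs.append(freqs[np.searchsorted(cumulative, 0.85 * total)])
    return np.array([np.mean(centroids), np.std(centroids),
                     np.mean(rolloffs), np.std(rolloffs)])
```

Vectors like this (typically alongside MFCCs and temporal features) feed standard classifiers for tasks such as genre classification, with no symbolic transcription step required.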
3. How can melodic similarity and structural patterns be leveraged to identify cover versions and structural relationships between music recordings?
This theme investigates methodologies for detecting melodic similarity and structural characteristics in order to identify cover songs and alternative versions that share underlying structure despite differing performances. It builds on symbolic representation extraction (e.g., sung notes) and self-similarity matrices to capture deeper musical relations, accommodating transformations such as tempo variation, transposition, and changes in accompaniment. These approaches enable content-based retrieval that transcends metadata and superficial audio matching, which is crucial for music version identification, copyright management, and indexing live or polyphonic recordings.
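The two core ingredients above, similarity matrices over frame features and invariance to transposition, can be sketched briefly. Assuming 12-dimensional chroma frames as input (an assumption, not a detail from the source), the code builds a cosine similarity matrix between two recordings and searches all twelve circular shifts of the chroma bins to absorb a key change; calling cross_similarity(a, a) yields the self-similarity matrix of a single recording.

```python
import numpy as np

def normalize(chroma):
    """L2-normalize each 12-dimensional chroma frame (one frame per row)."""
    norms = np.linalg.norm(chroma, axis=1, keepdims=True)
    return chroma / np.maximum(norms, 1e-12)

def cross_similarity(a, b):
    """Cosine similarity between every frame of a and every frame of b;
    cross_similarity(a, a) is the self-similarity matrix of a."""
    return normalize(a) @ normalize(b).T

def best_transposition(a, b):
    """Try all 12 circular shifts of b's chroma bins and keep the shift
    that maximizes total similarity -- a simple way to absorb the key
    transposition between a song and its cover."""
    scores = [cross_similarity(a, np.roll(b, k, axis=1)).sum()
              for k in range(12)]
    return int(np.argmax(scores))
```

Tempo differences, the other major transformation mentioned above, are typically handled on top of such matrices by alignment techniques (e.g., dynamic time warping along the matrix diagonal), which this sketch omits.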