In the last 20 years there has been an increasing need for an objective method for evaluating audio from a perceptual point of view. The prevalence of perceptual encoding in popular audio distribution models highlights the demand for software that can estimate quality without costly and time-consuming listening experiments, which have so far been the main means of evaluating perceptual audio quality. In addition, timbre has recently been the subject of many publications that have contributed to understanding the connection between the different metrics used to quantify it and their perceptual analogues. MFCCs have recently been shown to be closely related to the perception of sound, and the Echonest Analyzer API has proven particularly successful at measuring timbre in MIR (Music Information Retrieval) tasks; its efficacy and ease of use make it a perfect candidate for exploring timbre.
The project proposes a new approach to objective audio quality evaluation. To find the perceptual differences between two tracks computationally, timbre features are retrieved using the Echonest Analyzer, a perceptually based audio analysis service, and MFCCs, a perceptually relevant feature set used in speech recognition. Dynamic Time Warping, a distance measure derived from speech recognition research, is used in conjunction with the Euclidean distance between two vectors representing the first four statistical moments of the derived features to obtain a 6-dimensional feature set describing the dissimilarity between two tracks. These features are then used, together with labels obtained in listening tests, to train a system that uses K-Nearest Neighbour regression to predict quality. An experiment is designed to gather data for the training and validation of the system. The quality predictions are found to correlate with subjective ratings and are compared with the PEAQ standard, which is found to perform better. Finally, considerations are made about the verification process and about how this research can be taken forward.
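The dissimilarity and prediction steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the project's implementation: the function names, the Euclidean local cost inside the warping, the use of variance as the second moment, and the plain averaging of neighbour labels are all assumptions made for illustration, and the abstract does not specify how the six dissimilarity dimensions are composed. Each track is assumed to be a sequence of feature frames (for example MFCC vectors) stored as a 2-D array.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two feature sequences.

    a has shape (n, d) and b has shape (m, d). The local cost is the
    Euclidean distance between frames (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] holds the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def four_moments(x):
    """Mean, variance, skewness and kurtosis of each feature dimension."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    std = np.sqrt(var)
    # Constant dimensions get skewness and kurtosis of zero instead of NaN.
    safe_std = np.where(std > 0, std, 1.0)
    z = (x - mean) / safe_std
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0)
    return np.concatenate([mean, var, skew, kurt])


def moment_distance(a, b):
    """Euclidean distance between the moment vectors of two tracks."""
    return float(np.linalg.norm(four_moments(a) - four_moments(b)))


def knn_regress(train_x, train_y, query, k=3):
    """Predict a quality score as the mean label of the k nearest examples."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return float(train_y[nearest].mean())


if __name__ == "__main__":
    # Hypothetical data: random frames stand in for real timbre features.
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(40, 12))
    degraded = reference + rng.normal(scale=0.1, size=(40, 12))
    pair = [dtw_distance(reference, degraded), moment_distance(reference, degraded)]
    # Hypothetical training pairs and listening-test scores.
    train_x = [[0.0, 0.0], [10.0, 1.0], [40.0, 5.0]]
    train_y = [5.0, 4.0, 1.5]
    print(pair, knn_regress(train_x, train_y, pair, k=2))
```

In this sketch each feature set contributes two dissimilarity values per track pair, one from the warped alignment and one from the moment vectors; in practice the inputs would be the extracted timbre features and the labels would come from the listening tests.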
Papers by Gianni Massi