Perceptual audio quality assessment

About this topic

Perceptual audio quality assessment is the evaluation of audio signals based on human perception, focusing on how listeners experience sound quality. This field employs subjective testing methods and objective algorithms to quantify audio fidelity, often considering factors such as clarity, distortion, and overall listener satisfaction.

Key research themes

1. How can listener variability and task design influence the reliability of perceptual voice and audio quality assessments?

This research theme focuses on the sources of variability in human perceptual ratings of voice and audio quality, and how experimental design choices (including rating scales, tasks, and listener backgrounds) affect reliability and agreement among listeners. Understanding these factors is crucial for developing standardized and valid clinical and perceptual evaluation protocols that yield consistent and interpretable results, which underpin both subjective assessments and the validation of objective quality metrics.

Perceptual evaluation of voice quality: review, tutorial, and a framework for future research

by Bruce Gerratt

2015

Key finding: This paper presents a detailed theoretical framework attributing variability in clinical voice quality ratings to multiple sources, including listener backgrounds and biases, the nature of the rating task, and random error....

Web Audio Evaluation Tool: A framework for subjective assessment of audio

by David Moffat

2022

Key finding: This paper introduces a web-based tool (WAET) implemented via the Web Audio API to conduct perceptual listening tests with flexible test types and interfaces, remotely deployable without programming knowledge. The framework...

Microphone Handling Noise: Measurements of Perceptual Threshold and Effects on Audio Quality

by Bruno Fazenda

2022, PloS one

Key finding: This psychoacoustic study designed a representative listening test paradigm mimicking distracted (everyday) listening rather than analytical listening to measure perceived degradation due to microphone handling noise. By...

Influence of working memory and attention on sound-quality ratings

by Rainer Huber

2024, The Journal of the Acoustical Society of America

Key finding: This study demonstrated that individual cognitive differences, specifically working memory capacity and selective attention, significantly influence subjective sound quality ratings in older listeners with near-normal...

2. What objective and subjective methods effectively quantify perceptual audio quality across diverse applications, including speech and spatial audio?

This research theme investigates computational models, objective metrics, and subjective testing methodologies used for evaluating audio quality perception in various contexts, ranging from speech communication systems (VoIP) and digital audio broadcasting to spatial audio and ambisonics. The focus is on comparing and validating algorithmic metrics against listener ratings, improving real-time assessment techniques, and extending quality evaluation to emerging audio formats while incorporating perceptual and spatial localization components.

Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications

by Peter Pocta and

2015

Key finding: The paper combines subjective listening tests and objective quality models (PEAQ, POLQA Music) to assess the audio quality impact of typical lossy codecs in digital audio broadcasting and web-casting. Results show that low...

AMBIQUAL - a full reference objective quality metric for ambisonic spatial audio

by Miroslaw Narbutt

2022

Key finding: This work proposes AMBIQUAL, a novel full-reference metric for assessing spatial audio quality of Ambisonic B-format signals, evaluating both Listening Quality and Localization Accuracy. The metric extends ViSQOLAudio by...

CAQoE: A Novel No-Reference Context-Aware Speech Quality Prediction Metric

by Rajesh Kumar Dubey

2022, ACM Transactions on Multimedia Computing, Communications, and Applications

Key finding: This paper proposes CAQoE, a no-reference, context-aware speech quality metric designed for real-time VoIP applications under varying noise conditions. Unlike traditional metrics requiring reference signals, CAQoE initially...

Accuracy analysis on call quality assessments in voice over IP

by Jonathan Dunne

2023

Key finding: By experimentally comparing Perceptual Evaluation of Speech Quality (PESQ) and the E-Model under various network conditions and codecs in VoIP systems, this paper found discrepancies between off-line (PESQ) and real-time...
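The E-Model is a planning model rather than a signal measurement: it combines network and codec impairment factors into a transmission rating R, which ITU-T G.107 maps to an estimated MOS. That mapping is one way of putting its output on a scale comparable with listening-quality estimates such as PESQ scores. A minimal Python sketch of the R-to-MOS mapping follows, assuming R has already been computed; the function name is illustrative, and the impairment-factor arithmetic that produces R is not shown.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model transmission rating R to an estimated MOS.

    Piecewise mapping as given in ITU-T G.107: MOS is 1.0 for R <= 0,
    4.5 for R >= 100, and a cubic in R in between.
    """
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
```

For example, the commonly quoted default rating R = 93.2 maps to an estimated MOS of about 4.41.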

NISQE: Non-Intrusive Speech Quality Evaluator Based on Natural Statistics of Mean Subtracted Contrast Normalized Coefficients of Spectrogram

by SHAKEEL ZAFAR

2023, Sensors

Key finding: This paper develops a non-intrusive speech quality evaluator (NI-SQE) using natural scene statistics on mean-subtracted contrast normalized spectrogram features. By avoiding the need for a pristine reference, their method...
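In the natural-scene-statistics approach that this evaluator borrows from image quality assessment, each time-frequency coefficient S of the spectrogram is normalised by a local Gaussian-weighted mean and standard deviation, giving (S - mu) / (sigma + c). The NumPy sketch below computes these MSCN coefficients. The 7-tap window, sigma = 7/6 and c = 1 are assumptions carried over from image-domain work, not values taken from the paper; the spectrogram is assumed to be computed beforehand, and the statistical modelling of the coefficients that follows in NISQE is not shown.

```python
import numpy as np


def gaussian_kernel(sigma: float = 7 / 6, radius: int = 3) -> np.ndarray:
    """Normalised 1-D Gaussian kernel (7 taps by default)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Separable Gaussian smoothing with reflect padding; output keeps the input shape."""
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def mscn(spec: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Mean-subtracted contrast-normalised coefficients of a 2-D (freq x time) array."""
    spec = np.asarray(spec, dtype=float)
    k = gaussian_kernel()
    mu = smooth(spec, k)                       # local weighted mean
    var = smooth(spec ** 2, k) - mu ** 2       # local weighted variance
    sigma = np.sqrt(np.maximum(var, 0.0))      # clip tiny negatives from rounding
    return (spec - mu) / (sigma + c)
```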

3. How can computational modeling and pilot data aid the selection of stimuli and parameters to optimize perceptual audio and video quality studies?

This theme focuses on methodologies for selecting experimental stimuli, parameters, and degradation levels to maximize the perceptual discriminability and representativeness of audio and video quality assessments. It incorporates techniques such as perceptual similarity distances, multidimensional scaling, and statistical modeling to ensure even coverage of perceived quality ranges in subjective tests, thereby improving the robustness and interpretability of results. These design strategies impact both subjective test efficacy and objective metric validation.

Selecting stimuli parameters for video quality assessment studies based on perceptual similarity distances

by Amber Gislason-Lee

2025, Image Processing: Algorithms and Systems XIII

Key finding: This paper proposes a paired-comparison based parameter selection methodology where observers judge similarity in quality between parameter-modulated video stimuli. The approach uses classical multidimensional scaling on...
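Classical multidimensional scaling turns a matrix of judged dissimilarities into coordinates in a low-dimensional perceptual space, where stimuli can then be chosen to cover the quality range evenly. A minimal NumPy sketch is below. It assumes the paired-comparison judgements have already been aggregated into a symmetric dissimilarity matrix `d` with a zero diagonal; the `pick_evenly_spaced` helper is a hypothetical selection rule added for illustration, not the paper's procedure.

```python
import numpy as np


def classical_mds(d: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]     # keep the `dims` largest
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


def pick_evenly_spaced(coord: np.ndarray, k: int) -> list:
    """Indices of the stimuli closest to k evenly spaced points along one MDS axis.

    Hypothetical rule; it can return repeated indices if stimuli cluster unevenly.
    """
    targets = np.linspace(coord.min(), coord.max(), k)
    return [int(np.argmin(np.abs(coord - t))) for t in targets]
```

Eigenvector signs are arbitrary, so the axes may come out mirrored; only the distances between stimuli are meaningful.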

Subjective Assessment of Objective Image Quality Metrics Range Guaranteeing Visually Lossless Compression

by sonain jamil

2023, Sensors

Key finding: Combining subjective flicker tests with objective image quality metrics, this study proposes a methodology to determine objective metric thresholds guaranteeing visually lossless compression levels. Human observers performed...

Evaluation of two principal approaches to objective image quality assessment

by Pavel Slavík

2024, Proceedings. Eighth International Conference on Information Visualisation, 2004. IV 2004.

Key finding: This study compares two fundamental computational models of image quality assessment, the Visible Differences Predictor (error sensitivity based) and the Structural Similarity Index (structural similarity based), against...
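The Structural Similarity Index compares two images through their means, variances and covariance. The NumPy sketch below computes a single global SSIM value over the whole image. This is a simplification: the standard index is evaluated in local windows (typically an 11 x 11 Gaussian window) and then averaged. The constants K1 = 0.01 and K2 = 0.03 follow the usual defaults; the VDP side of the comparison is not sketched.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM computed once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2    # stabilises the luminance term
    c2 = (k2 * data_range) ** 2    # stabilises the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)
```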

All papers in Perceptual audio quality assessment

Sound Spotting - a Frame-Based Approach

by Richard Polfreman

2025, ISMIR

We present a system for content-based retrieval of perceptually similar sound events in audio documents ('sound spotting') using a query by example. The system consists of three discrete stages: a front-end for feature extraction, a...

Control of Loudness in Digital TV

by Thomas Lund

2022

To facilitate better consistency between programs and stations, ITU, EBU and ARIB have investigated the standardization of broadcast loudness. This paper examines some consequences of a global loudness standard with regard to metering and...

Fig 1. Dynamic Range Tolerance for consumers under different listening situations. While DTV in itself must be able to cover several consumer situations, other emerging digital broadcast platforms widen the dynamic range target even further.

Fig 3. Typical noise spectrum in a moving car with the windows closed (upper trace), and when idling (lower trace). Fig 3 shows spectral noise conditions inside a car [3]. Low frequency noise from the road-tire contact is the main source, as long as the windows are kept closed.

Fig 4. Weighting filters used in combination with Leq measures: A, B, C, D, M and RLB (green). The tests suggested that a relatively simple Leq measure close to a C weighting, labeled “Leq(RLB)”, was under certain conditions a good predictor of perceived loudness.

Fig 5. A dose approach to Loudness. Audio segment 1, 2 and 3 may produce the same dose measure, even though their level profiles over time are quite different.

Fig 6. A guide to reading the Loudness Model Evaluation Diagram of Fig 7. Finally, it should be noted that the idea of a perceptually based level calculation is not new. An aging but respectable measure such as “CBS Loudness” is still used successfully for automated level control [9]. This model has served as a de facto reference for objective loudness measurement in the broadcast community.

Fig 8. Example of Loudness Meter combining a realtime measure in the outer ring with a history in the “radar view”.

Routing internally at the station is based on linear digital audio, typically using AES/EBU and/or SDI transports. Fig 9. Example of Dolby LM100 meter measurement before and after automatic loudness correction during transmission. Challenging 20-second broadcast segments, butt-edited together over 5:30 minutes.

Fig. 7. Evaluation of different loudness models (names at the bottom) using a wide range of broadcast audio material [8]. Loudness models on the left of the chart agree better with human listeners than models on the right. Red marks at the top signify outlier audio segments, misjudged by more than 6 dB by a particular loudness model.

Stations using metadata only for film: normal content is transmitted using automatic realtime control of loudness and format (fixed metadata); feature films are transmitted with or without dynamic range correction (dynamic metadata).

Fig. 10. Three different ways of handling Loudness Control, Multichannel audio and Data Reduction in digital broadcast.

DTV transmission relies solely on metadata when it comes to loudness control and speech intelligibility. In Fig 10, drawing no. 2, the Ingest Gate (i2) is used to data-reduce imported programming and to inspect the metadata associated with it. Downstream of Ingest, metadata must always be available and preserved, which rules out analog transfers and sample rate converters. Routing internally at the station is based exclusively on data-reduced, synchronous digital audio. Data encoders and decoders are used for breakouts and monitoring. Audio/video synchronization needs special attention in designs where an arbitrary number of monitoring posts are needed. In production studios, metadata has to be attached to all programs. Production can be native mono, stereo or 5.1 as required. OB and live production can be incorporated using fixed metadata with appropriate upstream dynamics processing.

The main part of dynamic range translation and loudness control should be done at the station, leaving only smaller corrections to be performed at the consumer. Fig 11. Example of dynamic range re-mapping: From Home Theatre/DVD to Living Room listening conditions (Fig 1).

Fig 12. Example of dynamic range re-mapping: From Home Theatre/DVD to Living Room listening conditions (Fig 1). Fig 11 and Fig 12 show rational transfer characteristics that comply with the consumer's Dynamic Range Tolerance (DRT) without affecting levels already on target.

Fig 13. Example of multiband dynamic range re-mapping of a 5.1 feature film to domestic listening conditions (Fig 7). Black curve: Center channel. Orange curve: L, R, Ls, Rs.

Table 1. Typical noise levels measured by the author. All environments are realistic for broadcast consumption today.

In situations with significant background noise, such as inside various transports or urban environments (see Table 1), it is a challenge to get a wide dynamic range message across, be it music or spoken, without reproduction distortion being added or damaging the listener's ears. The latter is becoming important, as recent studies suggest that headphone levels may be set 5-10 dB above the same person's preference when listening through speakers (see Fig 2). If the same holds true in noisy environments, where iPods are often used, mobile platforms could pose a threat to hearing.

A New Set of Directional Weights for ITU-R BS.1770 Loudness Measurement of Multichannel Audio

by Hani Camille Yehia

2022

The ITU-R BS.1770 multichannel loudness algorithm performs a sum of channel energies with weighting coefficients based on azimuth and elevation angles of arrival of the audio signal. In its current version, these coefficients were...
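The weighted energy sum can be written as L_K = -0.691 + 10 log10(sum over i of G_i z_i), where z_i is the mean-square value of the K-weighted signal in channel i and G_i is that channel's directional weight. The Python sketch below uses the 5.1 weights given in ITU-R BS.1770 (1.0 for the front channels, 1.41 for the surrounds, LFE excluded). The new weights proposed in this paper are not reproduced here; they would simply replace the `weights` mapping. Names are illustrative.

```python
import math

# Channel weights G_i for a 5.1 layout as given in ITU-R BS.1770. The LFE channel
# is not included in the measurement, so it has no entry here.
BS1770_5_1_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def loudness_lkfs(mean_squares: dict, weights: dict = BS1770_5_1_WEIGHTS) -> float:
    """Combine per-channel mean squares z_i (of K-weighted signals) into LKFS.

    Channels without a weight (e.g. "LFE") are ignored.
    """
    total = sum(weights[ch] * z for ch, z in mean_squares.items() if ch in weights)
    if total <= 0.0:
        return float("-inf")  # digital silence
    return -0.691 + 10.0 * math.log10(total)
```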

On Some Biases Encountered in Modern Audio Quality Listening Tests: A Review

by Sławomir Zieliński

2022

A systematic review of typical biases encountered in modern audio quality listening tests is presented. The following three types of bias are discussed in more detail: bias due to affective judgments, response mapping bias, and interface...

On Some Biases Encountered in Modern Audio Quality Listening Tests (Part 2): Selected Graphical Examples and Discussion

by Sławomir Zieliński

2022, Journal of the Audio Engineering Society

This paper provides complementary data to the review of biases in audio quality listening tests by Zieliński et al. (2008) [1]. The paper presents selected illustrations of range equalizing bias, centering bias, stimulus spacing bias,...

Audio quality evaluation by experienced and inexperienced listeners

by Frederik Nagel

2022, The Journal of the Acoustical Society of America

Basic perceptual quality of coded audio material is commonly evaluated using ITU-R BS.1534 MUSHRA (Multiple Stimuli with Hidden Reference and Anchor) listening tests. MUSHRA guidelines call for experienced listeners. However, the majority...
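MUSHRA results are usually summarised per condition as a mean score on the 0 to 100 quality scale with a 95% confidence interval, so comparing experienced and inexperienced listeners amounts to computing these summaries separately for each listener group. The Python sketch below uses a normal approximation (z = 1.96) for simplicity; for the small panels typical of listening tests, a Student-t quantile would be the more careful choice. The function name is illustrative.

```python
import math
import statistics


def mushra_summary(scores: list, z: float = 1.96) -> tuple:
    """Mean and confidence-interval half-width for one condition's ratings.

    Normal approximation; the half-width is NaN if fewer than two ratings exist.
    """
    mean = statistics.fmean(scores)
    if len(scores) < 2:
        return mean, float("nan")
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width
```

Calling it once per condition and per listener group gives the means and error bars that such a comparison would plot side by side.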

Objective Evaluations of Synthesised Environmental Sounds

by David Moffat

2022

There are a range of different methods for comparing or measuring the similarity between environmental sound effects. These methods can be used as objective evaluation techniques, to evaluate the effectiveness of a sound synthesis method...

Metrics for Quantifying Loudness and Dynamics

by Earl Vickers

2021

(This material was originally intended as part of the article "The Loudness War: Background, Speculation and Recommendations" [1] but was removed for reasons of scope and to keep that article to a manageable length.) In this paper, I...

Strategies to Increase the Applicability of Methods for Objective Assessment of Audio Quality

by Jayme G A Barbedo

2021

The current ITU standard for objective assessment of audio quality, Perceptual Evaluation of Audio Quality (PEAQ), has some shortcomings that prevent its reliable use for a number of coding conditions and some kinds of signals. The...

by Nina Düvel

2021, Music & Science

Over the last decades, the simulation of musical instruments by digital means has become an important part of modern music production and live performance. Since the first release of the Kemper Profiling Amplifier (KPA) in 2011,...

Auditory Cues for Gestural Control of Multitrack Audio

by Josh Reiss

2017

This paper presents a study undertaken to evaluate user ratings on auditory feedback of sound source selection within a multi-track auditory environment where sound placement is controlled by a gesture control system. Selection...

Web Audio Evaluation Tool: A framework for subjective assessment of audio

by David Moffat and

2017

Perceptual listening tests are commonplace in audio research and a vital form of evaluation. While a large number of tools exist to run such tests, many feature just one test type, are platform dependent, run on proprietary software, or...

Figure 2: Box and whisker plot showing the aggregated numerical ratings of six stimuli by a group of subjects.

One of the key initial design parameters for WAET was to make the tool as open as possible to non-programmers. To this end, all of the user-modifiable options are included in a single XML document, referred to as the specification document, which can be written manually (or by modifying an existing document or template) or with the included test creator. The test creator can modify existing specification documents or generate new ones in an intuitive yet powerful HTML GUI. This simplifies the creation of elements by visualising the data structure with explanatory text.

Loudness management for home television viewing

by Amal Punchihewa

2016, 2012 IEEE International Instrumentation and Measurement Technology Conference Proceedings

This paper presents a dual-sensor-based management system for television viewing in the home environment, in compliance with the BS.1770 loudness measurement standard. The aim of the research was mainly to address the issue of sudden increases in loudness...

The research investigated and developed an adaptive loudness control system to be implemented at the user end to create a pleasurable television viewing environment. Fig. 1 shows a block diagram of the conceptual idea. The paper is organized as follows. First, the scope of the paper is established in the introduction, followed by Section II, which provides background: some history of the loudness problem, the current situation and the previous work that

Previous research reports that loudness depends on frequency and on the sound source, as shown in Fig. 4 and Fig. 5. Fig. 4 shows the equal-loudness contours, measured in phons [11]. Figure 5. Four different weighting curves for the loudness of different sound sources [4]

The Leq algorithm operates on two monophonic signals, L and R. Its main advantages are simplicity and scalability. It is simple because it consists entirely of basic signal processing blocks that can be implemented in the time domain on inexpensive hardware. It is scalable because the same processing is applied to each channel, so a meter can easily accommodate any number of channels from 1 to N. Fig. 7 shows the configuration of the dual-microphone loudness metering, which mimics the audio that a person in front of a television will hear. The audio captured by two microphones fixed to a model human head was processed using a pre-filter and an RLB filter to account for the human auditory system, modelled at a sampling frequency of 48 kHz. This sampling frequency was chosen because it is the common high-fidelity audio sampling frequency used in many digital media, such as digital video broadcasting and DVD. Fig. 5 shows weighting curves for different types of sound, such as aircraft sound. The captured sound needs to be pre-processed to account for the human auditory system [4] and equalised using a weighting curve; the resulting measure is known as Leq.

Figure 8. Response of the pre-filter used to account for the acoustic effects of the head [1]. Figure 7. Loudness computation based on the ITU standard using two microphones.

Figure 9. RLB weighting curve based on weighted curves for loudness [1]. The coefficient values shown in Table I apply at a sampling rate of 48 kHz. If the sampling rate differs from 48 kHz, different coefficient values should be chosen to provide the same frequency response that the specified filter provides at 48 kHz. The tests performed show that the performance of the algorithm is not affected by small variations in these coefficients.

Figure 10. Schematic diagram of the RLB and pre-filter as a second-order IIR filter. Both the pre-filter and the RLB weighting filter can be drawn as the digital IIR structure shown in Fig. 10; these filter coefficients are for a sampling rate of 48 kHz. After the pre-filtering and the RLB filtering, the mean-square energy of each channel i over the measurement interval (integration period) T is obtained using the formula:

$$z_i = \frac{1}{T}\int_0^T y_i^2\,dt$$

where y_i is the filtered signal of channel i.
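The two filter stages and the mean-square step can be sketched in Python as follows. The coefficients are the 48 kHz values published in ITU-R BS.1770 for the pre-filter (high shelf) and the RLB high-pass. The sketch assumes 48 kHz input, two channels summed with unit weights (matching the two-microphone set-up in this paper) and a single ungated integration over the whole signal; later revisions of BS.1770 add gating. Function names are illustrative.

```python
import numpy as np

# ITU-R BS.1770 K-weighting coefficients, valid only for fs = 48 kHz.
# Stage 1: pre-filter (high shelf modelling the acoustic effect of the head).
PRE_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
PRE_A = (1.0, -1.69065929318241, 0.73248077421585)
# Stage 2: RLB weighting (second-order high-pass).
RLB_B = (1.0, -2.0, 1.0)
RLB_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad(x, b: tuple, a: tuple) -> np.ndarray:
    """Second-order IIR filter in direct form I (a[0] is assumed to be 1)."""
    samples = np.asarray(x, dtype=float).tolist()
    y = np.zeros(len(samples))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(samples):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def k_weighted_mean_square(x) -> float:
    """z = (1/T) * integral of y^2, with y the pre-filtered, RLB-weighted signal."""
    y = biquad(biquad(x, PRE_B, PRE_A), RLB_B, RLB_A)
    return float(np.mean(y ** 2))


def loudness_two_channel(left, right) -> float:
    """Ungated loudness in LKFS for two channels weighted equally (G = 1)."""
    z = k_weighted_mean_square(left) + k_weighted_mean_square(right)
    return float(-0.691 + 10.0 * np.log10(z))
```

As a sanity check, BS.1770 states that a full-scale 1 kHz sine in one front channel reads -3.01 LKFS: a full-scale sine has a mean square of 0.5, and the -0.691 constant offsets the K-weighting gain at 1 kHz. Passing such a tone as `left` and silence as `right` should land close to that value.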

Figure 11. Loudness measurement of a television commercial: (top) original signal above -23 LKFS; (bottom) blue signal kept below -23 LKFS.

Considerations when calibrating program material stimuli using LUFS

by Malachy Ronan and

2016

While the LUFS standard was originally developed for broadcast applications, it offers a convenient means of calibrating program material stimuli to an equal loudness level, while remaining in a multichannel format. However, this...

Temp

by Puneet Bhatnagar

2016

In this age of DTV systems, which allow a wide dynamic range, we easily find media content that is not within the comfort zone of most listeners. With the advent of various objective loudness measurement techniques and compliance...

PRELIMINARY GUIDELINES FOR SUBJECTIVE EVALUATION OF AUDIO SOURCE SEPARATION ALGORITHMS

by Mark D Plumbley

2016

Evaluating audio source separation algorithms means rating the quality or intelligibility of separated source signals. While objective criteria fail to account for all auditory phenomena so far, precise subjective ratings can be...

Exploration of Timbre features as analytic tools for sound quality perception

by Gianni Massi

2016

In the last 20 years there has been an increasing need for an objective method for evaluating audio from a perceptual point of view. The prevalence of perceptual encoding in popular audio distribution models highlights the demand for software that can estimate quality without organising the costly and time-consuming listening experiments that have so far been the main means of evaluating perceptual audio quality. In addition, timbre has recently been the subject of many publications that have helped clarify the connection between different metrics for quantifying it and their perceptual analogues. MFCCs have recently been shown to be closely related to the perception of sound, and the Echonest Analyzer API has proven particularly successful at measuring timbre in MIR (Music Information Retrieval) tasks; its efficacy and ease of use make it a perfect candidate for exploring timbre.
The project proposes a new approach to objective audio quality evaluation in which, to compute the perceptual differences between two tracks, timbre features are retrieved using the Echonest Analyzer, a perceptually based audio analysis service, and MFCCs, a perceptually relevant feature set used in speech recognition. A distance measure from speech recognition research, Dynamic Time Warping, is used together with the Euclidean distance between two vectors representing the first four statistical moments of the derived features, yielding a 6-dimensional feature set that describes the dissimilarity between two tracks. These features, together with labels obtained in listening tests, are used to train a system that predicts quality with K-Nearest Neighbour regression. An experiment is designed to gather data for training and validating the system. The quality predictions are found to correlate with subjective ratings and are compared with the PEAQ standard, which is found to perform better. Finally, the verification process and ways to take this research forward are discussed.
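The building blocks named in this abstract (DTW between frame-level feature sequences, a Euclidean distance between vectors of the first four statistical moments, and K-Nearest Neighbour regression on the resulting dissimilarity features) can be sketched in NumPy as below. This is an illustrative sketch, not the project's code: feature extraction (MFCCs, Echonest timbre) is assumed to happen elsewhere, the exact composition of the 6-dimensional feature set is not reproduced, and all function names are hypothetical.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic Time Warping distance between two (frames x dims) feature sequences."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)       # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])   # Euclidean frame distance
            acc[i, j] = local + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def moment_vector(x: np.ndarray) -> np.ndarray:
    """Mean, variance, skewness and kurtosis of each feature dimension, concatenated."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    z = (x - mu) / (np.sqrt(var) + 1e-12)
    return np.concatenate([mu, var, (z ** 3).mean(axis=0), (z ** 4).mean(axis=0)])


def knn_regress(train_x: np.ndarray, train_y: np.ndarray,
                query: np.ndarray, k: int = 3) -> float:
    """Predict a quality score as the mean label of the k nearest training examples."""
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return float(train_y[nearest].mean())
```

For one reference/test pair, two candidate dissimilarity features would be `dtw_distance(ref, test)` and `np.linalg.norm(moment_vector(ref) - moment_vector(test))`; stacking such features for many pairs alongside their listening-test scores gives `train_x` and `train_y`.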

Measuring Dynamics: Comparing and Contrasting Algorithms for the Computation of Dynamic Range

by Jon Boley

2015

There is a consensus among many in the audio industry that recorded music has grown increasingly compressed over the past few decades. Some industry professionals are concerned that this compression often results in poor audio quality...
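Two simple dynamic-range measures of the kind such comparisons cover are sketched below in NumPy: the crest factor (peak-to-RMS ratio in dB) and the spread between low and high percentiles of short-term RMS levels. The second is only loosely modelled on the EBU loudness range (LRA), which takes the 10th and 95th percentiles of 3 s short-term loudness. This sketch omits K-weighting, window overlap and gating, so it is not LRA itself, and neither function is claimed to be one of the algorithms compared in the paper.

```python
import numpy as np


def crest_factor_db(x) -> float:
    """Peak-to-RMS ratio of a signal in dB (about 3.01 dB for a pure sine)."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    return float(20.0 * np.log10(np.max(np.abs(x)) / rms))


def level_spread_db(x, fs: int, win_s: float = 3.0,
                    lo: float = 10.0, hi: float = 95.0) -> float:
    """Difference between the hi-th and lo-th percentiles of windowed RMS levels (dB).

    Non-overlapping rectangular windows, no frequency weighting, no gating.
    """
    x = np.asarray(x, dtype=float)
    n = int(win_s * fs)
    levels = [10.0 * np.log10(np.mean(x[i:i + n] ** 2) + 1e-12)
              for i in range(0, len(x) - n + 1, n)]
    if not levels:
        raise ValueError("signal is shorter than one analysis window")
    return float(np.percentile(levels, hi) - np.percentile(levels, lo))
```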

Strategies to Increase the Applicability of Methods for Objective Assessment of Audio Quality

by Jayme G A Barbedo

2013, 116th Convention of the Audio Engineering Society

Automated Tonal Balance Enhancement for Audio Mastering Applications

by Konstantinos Drossos and

2013, Proceedings of the 134th Audio Engineering Society Convention

Modern audio mastering procedures include selective equalisation of specific frequency bands for the tonal enhancement of unmastered material. This process is mostly based on musical scores or listening information, like the musical...