
ITS Audio Quality Research Program

The ITS Audio Quality Research Program addresses selected open questions in digital speech and audio quality assessment, enhancement, compression, and transmission. Our contributions are most easily appreciated via our Publications & Talks page. Here are some of the more recent highlights:

WAWEnets: We have completed the simplification, unification, regularization, further training, and more thorough analysis of the ITS Wideband Audio Waveform Evaluation Networks (WAWEnets). WAWEnets are fully convolutional neural networks that operate directly on wideband audio waveforms to produce evaluations of those waveforms. Example evaluation scales include overall speech quality, intelligibility, and noisiness. WAWEnets are no-reference networks because they do not require “reference” (original or undistorted) versions of the waveforms they evaluate. Our work has leveraged 334 hours of speech in 13 languages, more than two million full-reference target values, and more than 93,000 subjective mean opinion scores.

Our most recent paper provides full details, and our codebase is available as well. We first introduced WAWEnets at ICASSP 2020 in this paper.

Bursty packet losses or bit errors: Our technical memorandum and supporting code are available. They result from our work on the mathematical links between Gilbert-Elliott model parameters (the 2-, 3-, and 4-parameter cases) and the resulting packet loss or bit error statistics. These links allow one to set parameters to obtain desired average loss rates, average burst lengths, loss covariances, and other statistics. The code can estimate models and parameters from loss patterns and can generate error patterns dictated by model parameters or error statistics.
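To make the parameter-to-statistics link concrete, here is a minimal Python sketch (standard library only) of a Gilbert-Elliott loss generator. It is not the ITS code, and every function name in it is hypothetical. It assumes a common parameterization that may differ from the memorandum's: p is the probability of moving from the Good state to the Bad state, r the probability of moving from Bad to Good, k the probability of no loss in Good, and h the probability of no loss in Bad; k = 1 and h = 0 give the 2-parameter model. In that 2-parameter case the average loss rate is p/(p + r) and the average burst length is 1/r, so those two statistics can be inverted directly. Burst statistics for the 3- and 4-parameter cases are more involved and are not covered here.

```python
import random


def simulate_ge(n, p, r, k=1.0, h=0.0, seed=None):
    """Return n loss indicators (1 = lost, 0 = received).

    Assumed parameterization (an illustration, not necessarily the memo's):
    p = P(Good -> Bad), r = P(Bad -> Good),
    k = P(no loss | Good), h = P(no loss | Bad).
    k = 1, h = 0 is the 2-parameter model; k = 1 alone gives 3 parameters.
    """
    rng = random.Random(seed)
    # Start in the stationary distribution so early samples are unbiased.
    bad = rng.random() < p / (p + r)
    losses = []
    for _ in range(n):
        received = rng.random() < (h if bad else k)
        losses.append(0 if received else 1)
        # State transition before the next packet or bit.
        if bad:
            bad = rng.random() >= r  # stay Bad with probability 1 - r
        else:
            bad = rng.random() < p   # enter Bad with probability p
    return losses


def expected_loss_rate(p, r, k=1.0, h=0.0):
    """Stationary average loss rate of the model above."""
    pi_bad = p / (p + r)
    return (1.0 - pi_bad) * (1.0 - k) + pi_bad * (1.0 - h)


def two_param_from_stats(loss_rate, mean_burst):
    """Invert the 2-parameter model: loss rate = p/(p+r), mean burst = 1/r."""
    r = 1.0 / mean_burst
    p = loss_rate * r / (1.0 - loss_rate)
    if not (0.0 < p <= 1.0 and 0.0 < r <= 1.0):
        raise ValueError("requested statistics are not reachable")
    return p, r


def measure(losses):
    """Empirical loss rate and mean burst length (mean run of consecutive 1s)."""
    n_lost = sum(losses)
    n_bursts = sum(1 for i, x in enumerate(losses)
                   if x == 1 and (i == 0 or losses[i - 1] == 0))
    return n_lost / len(losses), (n_lost / n_bursts if n_bursts else 0.0)


if __name__ == "__main__":
    p, r = two_param_from_stats(0.10, 4.0)
    # Empirical statistics should land near (0.10, 4.0).
    print(measure(simulate_ge(200_000, p, r, seed=1)))
```

Requesting a 10% loss rate with an average burst length of 4 gives r = 0.25 and p ≈ 0.0278; a long simulated pattern then reproduces both statistics to within sampling error.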

Optimal Frame Durations: We presented our work on optimal frame durations for the separation of audio signals at the IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP 2021). The paper "Optimal Frame Duration for Oracle Audio Signal Separation is Determined by Joint Minimization of Two Antagonistic Artifacts" was one of the papers nominated for the Best Paper Award at the conference. The work shows that the optimal processing frame duration in oracle binary masking and oracle magnitude restoration is determined by joint minimization of two antagonistic artifacts: temporal blurring (which increases with frame duration) and log-spectral-error change per unit time (which decreases with frame duration). These effects are related to the stationarity of the signals, but saying that “stationarity determines optimal frame duration” falls far short of describing the true nature and complexity of the interaction. The supporting demonstration and code are available on a separate page: Audio Demos for Frame Duration Study.
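In oracle binary masking, the clean target and the interference are both known, so each time-frequency bin of the mixture can be kept or discarded according to which source dominates it. The following minimal numpy sketch shows where the frame duration enters: it sets the STFT frame length in samples. The square-root periodic Hann window, 50% overlap, and all function names are assumptions made for illustration, not the setup of the MMSP 2021 study, and the sketch does not compute the temporal-blurring or log-spectral-error measures.

```python
import numpy as np


def frame_length(frame_ms, fs):
    """Frame duration in ms -> even frame length in samples (needed for 50% overlap)."""
    return 2 * int(round(frame_ms * fs / 2000.0))


def _window(n):
    # Square root of a periodic Hann window. Applied at analysis and synthesis,
    # the product is a periodic Hann window, which sums to 1 at 50% overlap.
    return np.sqrt(np.hanning(n + 1)[:n])


def stft(x, n):
    hop = n // 2
    # Pad so every original sample is covered by exactly two frames.
    pad_end = hop + (-len(x)) % hop
    xp = np.concatenate([np.zeros(hop), x, np.zeros(pad_end)])
    n_frames = (len(xp) - n) // hop + 1
    frames = np.stack([xp[i * hop:i * hop + n] for i in range(n_frames)])
    return np.fft.rfft(frames * _window(n), axis=1)


def istft(spec, n, length):
    hop = n // 2
    frames = np.fft.irfft(spec, n=n, axis=1) * _window(n)
    out = np.zeros(hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n] += frame  # overlap-add
    return out[hop:hop + length]


def oracle_binary_mask(target, interference, fs, frame_ms):
    """Keep mixture bins where the known target magnitude exceeds the interference."""
    n = frame_length(frame_ms, fs)
    mixture = target + interference
    mask = np.abs(stft(target, n)) > np.abs(stft(interference, n))
    return istft(mask * stft(mixture, n), n, len(mixture))
```

Calling `oracle_binary_mask` on the same target and interference at several values of `frame_ms` (for example 8, 32, and 128 ms) is one way to explore how separation quality varies with frame duration.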

DNN Speech: We investigated how to measure the quality of DNN-generated speech. Well-established full-reference objective quality tools reliably give unrealistically low quality scores to speech that has high subjective quality. We identified the causes and proposed solutions. Our results were presented at QoMEX 2021 (paper).

Input Speech Quality: We proposed that the output signal of a speech device under test (DUT) can contain information about the quality of the speech at the DUT input as well as the quality of the speech at the DUT output. We designed a proof-of-concept experiment that shows this is indeed possible and presented the work at QoMEX 2021 (paper).

The Bigger Picture: The quality of speech sent over a telecommunication system depends on a variety of factors, such as the background noise in the environment, the algorithms used to digitally enhance and code the speech signal, the bandwidth used in transmitting the speech signal, and others. The ITS Audio Quality Research Program supports community-wide efforts towards robust and adaptable telecommunication speech services and equipment with high quality and intelligibility.

In this program, we identify and address open issues in these areas, and we also develop and characterize algorithmic innovations. In particular, we seek to improve tools and techniques for quantitatively characterizing the user experience of speech quality and speech intelligibility, both through subjective testing and by means of signal processing algorithms.