The ITS Audio Quality Research Program addresses selected open questions in digital speech and audio quality assessment, enhancement, compression, and transmission. Our contributions are most easily appreciated via our Publications & Talks page. Here are some recent highlights:
Bursty packet losses or bit errors: We are finalizing our work on the mathematical links between the Gilbert-Elliott model parameters (2-, 3-, and 4-parameter cases) and the resulting packet loss or bit error statistics. These links allow one to set parameters to obtain desired average loss rates, average burst lengths, loss covariances, etc. A technical memorandum and supporting code are nearly finished. The code can estimate models and parameters from loss patterns and can generate error patterns dictated by model parameters or error statistics.
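As a rough illustration of such links, the sketch below covers only the simplest 2-parameter case, assuming a two-state chain in which every packet is lost in the bad state and none is lost in the good state. Under that assumption the stationary loss rate is p/(p+r) and the mean burst length is 1/r, where p and r are the good-to-bad and bad-to-good transition probabilities. The function names are hypothetical, and this is not the ITS code mentioned above.

```python
import random

def gilbert_params(loss_rate, mean_burst_len):
    """Map a target loss rate and mean burst length to (p, r).

    Assumes the 2-parameter case: loss probability 1 in the bad state
    and 0 in the good state.
    """
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be strictly between 0 and 1")
    if mean_burst_len < 1.0:
        raise ValueError("mean_burst_len must be at least 1")
    r = 1.0 / mean_burst_len               # P(bad -> good)
    p = r * loss_rate / (1.0 - loss_rate)  # P(good -> bad)
    if p > 1.0:
        raise ValueError("targets are not reachable with a valid p")
    return p, r

def generate_losses(p, r, n, seed=None):
    """Generate n loss indicators (1 = lost, 0 = received)."""
    rng = random.Random(seed)
    bad = rng.random() < p / (p + r)  # draw the first state from the stationary distribution
    pattern = []
    for _ in range(n):
        pattern.append(1 if bad else 0)
        if bad:
            bad = rng.random() >= r   # stay in the bad state with probability 1 - r
        else:
            bad = rng.random() < p    # enter the bad state with probability p
    return pattern

def loss_stats(pattern):
    """Return (empirical loss rate, empirical mean burst length)."""
    bursts, run = [], 0
    for lost in pattern:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    rate = sum(pattern) / len(pattern)
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return rate, mean_burst

p, r = gilbert_params(loss_rate=0.1, mean_burst_len=4.0)
rate, mean_burst = loss_stats(generate_losses(p, r, 200_000, seed=1))
```

For a long pattern, the empirical rate and mean burst length should land close to the requested targets. The 3- and 4-parameter cases add per-state loss probabilities and are not covered by this sketch.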
WAWEnets: We have unified, regularized, and analyzed the ITS Wideband Audio Waveform Evaluation Networks (WAWEnets). This progress is described in a new technical paper, and we continue to update the codebase. WAWEnets are no-reference, waveform-based convolutional neural network architectures that emulate full-reference speech quality or speech intelligibility values as well as subjective scores. We first introduced WAWEnets at ICASSP 2020 in this paper.
Optimal Frame Durations: We presented our work addressing optimal frame durations for separation of audio signals at the IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP 2021). The paper "Optimal Frame Duration for Oracle Audio Signal Separation is Determined by Joint Minimization of Two Antagonistic Artifacts" was one of the papers nominated for the Best Paper Award at the conference. The work shows that optimal processing frame duration in oracle binary masking and oracle magnitude restoration is determined by joint minimization of two antagonistic artifacts: temporal blurring (which increases with frame duration) and log-spectral-error change per unit time (which decreases with frame duration). These effects are related to the stationarity of the signals, but saying that “stationarity determines optimal frame duration” falls far short of describing the true nature and complexity of the interaction. The supporting demonstration and code are available on a separate page: Audio Demos for Frame Duration Study.
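To make the role of frame duration concrete, here is a minimal sketch of oracle binary masking, assuming non-overlapping rectangular frames and a naive DFT so that the example stays self-contained. It is not the method or code from the paper; the function names and the frame_len parameter are illustrative only. "Oracle" means the mask is built from the known target and interferer rather than estimated from the mixture.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def oracle_binary_mask(target, interferer, frame_len):
    """Recover the target from target + interferer, frame by frame.

    A time-frequency bin of the mixture is kept only where the target
    magnitude exceeds the interferer magnitude. Trailing samples that
    do not fill a whole frame are dropped.
    """
    if len(target) != len(interferer):
        raise ValueError("signals must have equal length")
    estimate = []
    for start in range(0, len(target) - frame_len + 1, frame_len):
        t_spec = dft(target[start:start + frame_len])
        i_spec = dft(interferer[start:start + frame_len])
        # The DFT is linear, so the mixture spectrum is the sum of the two.
        mix_spec = [a + b for a, b in zip(t_spec, i_spec)]
        masked = [m if abs(a) > abs(b) else 0.0
                  for m, a, b in zip(mix_spec, t_spec, i_spec)]
        estimate.extend(idft(masked))
    return estimate

# Two tones that fall exactly on different DFT bins for 16-sample frames.
n = 64
target = [math.cos(2 * math.pi * 2 * t / 16) for t in range(n)]
interferer = [0.5 * math.cos(2 * math.pi * 5 * t / 16) for t in range(n)]
estimate = oracle_binary_mask(target, interferer, frame_len=16)
```

In this toy case the two tones occupy separate bins, so the mask recovers the target almost exactly. With real signals, rerunning the function over a range of frame_len values is one way to explore how the choice of frame duration affects the separated output.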
DNN Speech: We investigated the problem of measuring the quality of speech generated by deep neural networks (DNNs). Such speech can have high subjective quality, yet well-established full-reference objective quality tools reliably give it unrealistically low quality scores. We identified causes and proposed solutions. Our results were presented at QoMEX 2021 (paper).
Input Speech Quality: We proposed that speech device-under-test (DUT) output signals can contain information about the DUT speech input quality as well as the DUT speech output quality. We designed a proof-of-concept experiment that shows this is indeed possible and presented the work at QoMEX 2021 (paper).
The Bigger Picture: The quality of speech sent over a telecommunication system depends on a variety of factors, such as the background noise in the environment, the algorithms used to digitally enhance and code the speech signal, the bandwidth used in transmitting the speech signal, and others. The ITS Audio Quality Research Program supports community-wide efforts towards robust and adaptable telecommunication speech services and equipment with high quality and intelligibility.
In the program, we identify and address open issues in these areas, and we also develop and characterize algorithm innovations. In particular, we seek to improve tools and techniques for quantitatively characterizing the user experience of speech quality and speech intelligibility, both through subjective testing and by means of signal processing algorithms.