The ITS Audio Quality Research Program addresses selected open questions in digital speech and audio quality assessment, enhancement, compression, and transmission. Our contributions are most easily appreciated via our Publications & Talks page. In addition, here are some recent highlights:
WAWEnets: We are currently unifying and regularizing the Wideband Audio Waveform Evaluation Networks (WAWEnets) that we introduced at ICASSP 2020 (paper, software). WAWEnets are no-reference waveform-based convolutional neural network architectures that can accurately estimate speech quality or speech intelligibility. Our ongoing work involves additional speech signals, full-reference objective quality and intelligibility target values, and subjective test scores.
Optimal Frame Durations: We will present our recent work addressing optimal frame durations for separation of audio signals at the IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP 2021). The supporting demonstration and code are available on a separate page: Audio Demos for Frame Duration Study. The paper "Optimal Frame Duration for Oracle Audio Signal Separation is Determined by Joint Minimization of Two Antagonistic Artifacts" shows that optimal processing frame duration in oracle binary masking and oracle magnitude restoration is determined by joint minimization of two antagonistic artifacts: temporal blurring (which increases with frame duration) and log-spectral-error change per unit time (which decreases with frame duration). These effects are related to the stationarity of the signals, but saying that “stationarity determines optimal frame duration” falls far short of describing the true nature and complexity of the interaction.
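For readers unfamiliar with oracle binary masking, the following is a minimal sketch of the general technique with an adjustable frame length. It is not the code from the paper or the demo page. The function name `oracle_binary_mask_separate`, the square-root Hann window, the 50% frame overlap, and the "keep a bin when the target magnitude exceeds the interferer magnitude" rule are all assumptions chosen for illustration.

```python
import numpy as np


def oracle_binary_mask_separate(target, interferer, frame_len):
    """Estimate `target` from the mixture target + interferer using an
    oracle (ideal) binary mask. Hypothetical helper for illustration only.

    frame_len is the frame length in samples and must be even.
    """
    if frame_len % 2 != 0:
        raise ValueError("frame_len must be even")
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    n = len(target)
    hop = frame_len // 2  # assumed 50% overlap

    # Zero-pad so every original sample is covered by two full frames.
    pad_end = frame_len + (-n) % hop

    def pad(x):
        return np.pad(x, (frame_len, pad_end))

    padded_len = frame_len + n + pad_end
    n_frames = 1 + (padded_len - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]

    # Square-root periodic Hann window, used for analysis and synthesis.
    # The product of the two is a periodic Hann window, whose copies
    # sum to one at 50% overlap.
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len))

    def spectrum(x):
        return np.fft.rfft(pad(x)[idx] * window, axis=1)

    spec_target = spectrum(target)
    spec_interferer = spectrum(interferer)
    spec_mixture = spectrum(target + interferer)

    # Oracle mask: keep a time-frequency bin only where the target dominates.
    mask = np.abs(spec_target) > np.abs(spec_interferer)

    # Apply the mask to the mixture and resynthesize by overlap-add.
    frames = np.fft.irfft(mask * spec_mixture, n=frame_len, axis=1) * window
    out = np.zeros(padded_len)
    for k in range(n_frames):
        out[k * hop:k * hop + frame_len] += frames[k]
    return out[frame_len:frame_len + n]
```

Calling this function with several values of `frame_len` on the same pair of signals, and comparing each estimate with the known target, is one simple way to explore how frame duration affects separation. The paper and demo page describe the measures and signals actually used in the study.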
DNN Speech: We recently investigated the issue of measuring the quality of DNN-generated speech. Speech with high subjective quality is reliably given unrealistically low quality scores by well-established full-reference objective quality tools. We identified causes and proposed solutions. Our results were presented at QoMEX 2021 (paper).
Input Speech Quality: We proposed that speech device-under-test (DUT) output signals can contain information about the DUT speech input quality as well as the DUT speech output quality. We designed a proof-of-concept experiment that shows this is indeed possible and presented the work at QoMEX 2021 (paper).
The Bigger Picture: The quality of speech sent over a telecommunication system depends on a variety of factors, such as the background noise in the environment, the algorithms used to digitally enhance and code the speech signal, the bandwidth used in transmitting the speech signal, and others. The ITS Audio Quality Research Program supports community-wide efforts towards robust and adaptable telecommunication speech services and equipment with high quality and intelligibility.
In the program, we identify and address open issues in these areas, and we also develop and characterize algorithm innovations. In particular, we seek to improve tools and techniques for quantitatively characterizing the user experience of speech quality and speech intelligibility, both through subjective testing and by means of signal processing algorithms.