Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020), Barcelona, Spain, April 2020, pp. 331-335
WAWEnets: A No-Reference Convolutional Waveform-Based Approach to Estimating Narrowband and Wideband Speech Quality
Cite This Publication
Andrew A. Catellier and Stephen D. Voran, “WAWEnets: A No-Reference Convolutional Waveform-Based Approach to Estimating Narrowband and Wideband Speech Quality,” in Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020), Barcelona, Spain, April 2020, pp. 331-335.
Andrew A. Catellier and Stephen D. Voran
Abstract: Building on prior work, we have developed a no-reference (NR) waveform-based convolutional neural network (CNN) architecture that can accurately estimate speech quality or intelligibility of narrowband and wideband speech segments. These Wideband Audio Waveform Evaluation Networks, or WAWEnets, achieve very high per-speech-segment correlation (ρ_seg ≥ 0.92, RMSE ≤ 0.38) to established full-reference quality and intelligibility estimators (PESQ, POLQA, PEMO, STOI) based on over 17 hours of speech from 127 previously unseen talkers speaking in 13 different languages, which is just 10% of our total data. NR correlations at this level across such a broad scope are unprecedented. This achievement was made possible by using full-reference estimates as training targets so that WAWEnets could learn implicit undistorted speech models and exploit them to produce accurate NR estimates.
Keywords: wideband; speech quality; no reference (NR); speech intelligibility; convolutional neural network (CNN)
For technical information concerning this report, contact:
Stephen D. Voran
Institute for Telecommunication Sciences
(303) 497-3839
svoran@ntia.gov
Disclaimer: Certain commercial equipment, components, and software may be identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the equipment or software identified is necessarily the best available for the particular application or uses.
For questions or information on this or any other NTIA scientific publication, contact the ITS Publications Office at ITSinfo@ntia.gov or 303-497-3572.