Andrew A. Catellier and Stephen D. Voran

Abstract: In this white paper, we describe a new convolutional framework for waveform evaluation, WEnets, and use it to build a Narrowband Audio Waveform Evaluation Network, or NAWEnet. NAWEnet is single-ended (no-reference) and was trained three separate times to emulate PESQ, POLQA, and STOI, achieving testing correlations of 0.95, 0.92, and 0.95, respectively, when trained on only 50% of the available data and tested on 40%. Stacks of 1-D convolutional layers and non-linear downsampling learn which features are important for quality or intelligibility estimation. This straightforward architecture simplifies the interpretation of its inner workings and paves the way for future investigations into higher sample rates and accurate no-reference subjective speech quality predictions.
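
As a rough illustration of the idea of stacking 1-D convolutions with non-linear downsampling, the following is a minimal sketch and not the NAWEnet architecture itself. It assumes a single channel, random (untrained) kernels, a ReLU activation, and max pooling as the downsampling step; the function names, kernel size, pool width, and number of layers are all hypothetical choices made for this example.

```python
import random

def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation of a signal with one kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Element-wise rectifier."""
    return [max(0.0, x) for x in xs]

def max_pool(xs, width):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + width]) for i in range(0, len(xs) - width + 1, width)]

def stack(signal, kernels, pool_width):
    """Apply convolution, ReLU, and max pooling once per kernel."""
    out = signal
    for kernel in kernels:
        out = max_pool(relu(conv1d(out, kernel)), pool_width)
    return out

# Hypothetical input: 256 random samples standing in for an audio waveform.
random.seed(0)
waveform = [random.uniform(-1.0, 1.0) for _ in range(256)]

# Three hypothetical layers, each with one untrained 5-tap kernel.
kernels = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]

features = stack(waveform, kernels, pool_width=2)
print(len(waveform), "->", len(features))
```

Each layer shortens the sequence, so a long waveform is progressively reduced to a small set of features. In a trained network the kernel values would be learned, and a final stage would map the remaining features to a single quality or intelligibility estimate.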

Keywords: speech quality; no reference (NR); speech intelligibility; CNN; neural nets

For technical information concerning this report, contact:

Stephen D. Voran
Institute for Telecommunication Sciences
(303) 497-3839
svoran@ntia.gov

Disclaimer: Certain commercial equipment, components, and software may be identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the equipment or software identified is necessarily the best available for the particular application or uses.

For questions or information on this or any other NTIA scientific publication, contact the ITS Publications Office at ITSinfo@ntia.gov or 303-497-3572.
