Stephen D. Voran

Abstract:

Perceived speech quality is most directly measured by subjective listening tests. These tests are often slow and expensive, and numerous attempts have been made to supplement them with objective estimators of perceived speech quality. These attempts have found limited success, primarily in analog and higher-rate, error-free digital environments where speech waveforms are preserved or nearly preserved. The objective estimation of the perceived quality of highly compressed digital speech, possibly with bit errors or frame erasures, has remained an open question. We report our findings regarding two essential components of objective estimators of perceived speech quality: perceptual transformations and distance measures. A perceptual transformation modifies a representation of an audio signal in a way that is approximately equivalent to the human hearing process. A distance measure reflects the magnitude of a perceived distance between two perceptually transformed signals.

We then describe a new objective estimation approach that uses a simple but effective perceptual transformation and a distance measure that consists of a hierarchy of measuring normalizing blocks. Each measuring normalizing block integrates two perceptually transformed signals over some time or frequency interval to determine the average difference across that interval. This difference is then normalized out of one signal, and is further processed to generate one or more measurements. The resulting new estimators, and several established estimators, are thoroughly evaluated and compared in Part II of this paper. Hierarchical structures of measuring normalizing blocks, or other structures of measuring normalizing blocks, may also address open issues in perceived audio quality estimation, layered speech or audio coding, automatic speech or speaker recognition, audio signal enhancement, and other areas.
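
As a rough illustration of the measure-then-normalize idea described above, the following Python sketch implements one hypothetical block. It is not the report's implementation. The function name, the frame-by-band list layout, the choice to normalize the test signal, and the split of the difference into positive and negative parts are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed layout: signal[frame][band], already perceptually transformed.
Signal = List[List[float]]


def measuring_normalizing_block(
    reference: Signal,
    test: Signal,
    frames: range,
    bands: range,
) -> Tuple[Signal, Tuple[float, float]]:
    """Hypothetical single block: measure, then normalize, over one interval."""
    cells = [(t, f) for t in frames for f in bands]
    if not cells:
        raise ValueError("interval must contain at least one cell")

    # Integrate both signals over the interval and take the average difference.
    avg_diff = sum(test[t][f] - reference[t][f] for t, f in cells) / len(cells)

    # Normalize that difference out of one signal (assumed here: the test signal).
    normalized = [row[:] for row in test]  # copy so the input is left unchanged
    for t, f in cells:
        normalized[t][f] -= avg_diff

    # Further processing (an assumption of this sketch): report the positive
    # and negative parts of the average difference as two measurements.
    measurements = (max(avg_diff, 0.0), max(-avg_diff, 0.0))
    return normalized, measurements


# Toy data: the test signal sits 0.5 above the reference everywhere.
reference = [[1.0, 2.0], [3.0, 4.0]]
test = [[1.5, 2.5], [3.5, 4.5]]
normalized, measurements = measuring_normalizing_block(
    reference, test, range(2), range(2)
)
```

In a hierarchy, the normalized signal returned by one block could be passed to further blocks that cover narrower time or frequency intervals, so that each level measures only what the coarser levels did not already remove.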

Keywords: distance measures; perceived speech quality; bit errors; objective estimation; audio signal; frame erasures; highly compressed digital speech; human hearing process; measuring normalizing block technique; objective estimators; perceptual

For technical information concerning this report, contact:

Stephen D. Voran
Institute for Telecommunication Sciences
(303) 497-3839
svoran@ntia.gov

Disclaimer: Certain commercial equipment, components, and software may be identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the equipment or software identified is necessarily the best available for the particular application or uses.

For questions or information on this or any other NTIA scientific publication, contact the ITS Publications Office at ITSinfo@ntia.gov or 303-497-3572.
