
Research Spotlight: Speech Intelligibility

December 2016

Speech intelligibility is one of the primary requirements that the National Public Safety Telecommunications Council (NPSTC) Broadband Working Group defined for mission critical voice services, such as those to be delivered over the new nationwide public safety broadband network that the First Responder Network Authority (FirstNet) is charged with deploying. The NPSTC requirements begin with “The listener MUST be able to understand [what is being said] without repetition.”

For years, ITS has conducted various types of subjective testing in tightly controlled laboratory conditions to sort through the myriad emerging telecommunications options and find those that sound better or work better in some respect. Where this work has been directed towards intelligibility, it has been done through ITS’s participation in the Public Safety Communications Research (PSCR) program, a joint effort with the National Institute of Standards and Technology (NIST), and with the involvement of those who are directly affected: public safety practitioners. A particular focus has been intelligibility in the presence of background noise, yielding comparative intelligibility results for new digital speech and audio codecs. The work has now expanded to include the condition of the communication network itself.

A report issued in November 2016 describes comparative intelligibility results for new digital speech and audio codecs under different conditions of radio access network (RAN) degradation. Characterizing the relationship between the condition of the RAN and intelligibility is particularly important for mission critical voice because the events that stress the RAN may very well be events that also have critical intelligibility requirements.

One public safety example would be an escalating event that requires additional personnel to report to the scene. As more and more first responders share radio resources on the scene, those resources come under increasing stress. As they are stressed, the voice data stream can be corrupted and packets or frames of data can be lost. Voice codecs use various mechanisms to compensate for packet loss or frame erasure. The more successfully they do this, the more “robust” they are and the more likely it is that the listener will be able to understand the message.

The test results published in NTIA Technical Report TR-17-522: Intelligibility of Selected Speech Codecs in Frame-Erasure Conditions can inform codec selection for mission critical voice applications, as well as the design, provisioning, and adaptation of these services and the underlying network. Most importantly, these results can allow those engineering activities to be driven by the critical user experience factor—speech intelligibility.