The Audio Quality Research Program periodically conducts listening experiments to support various research efforts. These experiments provide information on the perceived quality that results from different coding and transmission schemes and impairments. The Program is well equipped to conduct these experiments. The laboratory facility includes two sound-isolated rooms. Other equipment includes digital audio tape recorders, compact disc players, digital audio encoders and decoders, a spectrum analyzer, signal generators, level meters, mixers, amplifiers, processors, speakers, and microphones. Workstations equipped with 24-bit digital-to-analog and analog-to-digital converters provide additional flexibility for signal conversion and processing.
Listening experiments, also known as subjective tests, are the most direct way to measure perceived speech or audio quality. These experiments take many forms, but they always include the presentation of an auditory stimulus and the recording of a subject's response. The subject is often referred to as the "listener." It is important that the experimental environment be closely controlled, especially with regard to background noise. Sound-isolated rooms are often used to gain the required level of control. The auditory stimuli are often short (5- to 30-second) passages of speech or music. Stimuli may be played through loudspeakers, headphones, telephone handsets, or other devices, as appropriate for a given experiment. The type of experiment also determines the listener responses that are recorded. As with all human subject experiments, variability of responses across subjects is to be expected. Analysis of listening experiments often involves measures of central tendency and measures of dispersion, both of which are inherently linked to the number of subjects in the experiment.
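As a rough illustration of that link, assume N subjects each give a numerical response x_i to the same stimulus. The symbols are illustrative and are not taken from any particular Recommendation. One common choice of measures is the sample mean for central tendency and the sample standard deviation for dispersion; the standard error of the mean then shows the dependence on N explicitly:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}, \qquad
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{N}}
```

Under these assumptions, quadrupling the number of subjects roughly halves the standard error, provided s stays about the same.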
For some specific classes of listening experiments, the International Telecommunication Union (ITU) has provided guidance for the experimenter. Some of the relevant Recommendations include:
- P.800: "Methods for subjective determination of transmission quality"
- P.830: "Subjective performance assessment of telephone-band and wideband digital codecs"
- P.831: "Subjective performance evaluation of network echo cancellers"
- BS.1116-1: "Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems"
When listening experiments are used to evaluate telephony equipment and services, the following conditions often apply. The number of subjects is usually between 20 and 100. Experiments are generally divided into sessions that last for 30 to 45 minutes, and subjects are allowed a short break between sessions. Most experiments will have between one and four such sessions. Auditory stimuli are often one or more Harvard phonetically balanced sentences. Absolute category rating (ACR) experiments are popular. In ACR experiments, the subject places each stimulus in a category. The most common response scale for ACR experiments is the mean opinion score (MOS) scale. When this scale is used, subjects rate the speech quality of each stimulus as excellent, good, fair, poor, or bad. For analysis purposes, these five descriptors are then associated with the integers 5, 4, 3, 2, and 1, respectively.
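The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed analysis procedure: the function name `mean_opinion_score` and the sample votes are hypothetical, and only the descriptor-to-integer mapping comes from the text.

```python
from statistics import mean, stdev

# Descriptor-to-integer mapping as given in the text.
SCALE = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "bad": 1}


def mean_opinion_score(responses):
    """Return (mean score, sample standard deviation, count) for one stimulus.

    `responses` is a list of category labels, one per subject.
    The function name and return shape are illustrative assumptions.
    """
    scores = [SCALE[label.strip().lower()] for label in responses]
    if not scores:
        raise ValueError("at least one response is required")
    # The sample standard deviation needs at least two responses.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread, len(scores)


# Hypothetical votes from six subjects for a single stimulus.
votes = ["good", "fair", "good", "excellent", "poor", "good"]
mos, spread, n = mean_opinion_score(votes)
print(f"MOS = {mos:.2f}, standard deviation = {spread:.2f}, N = {n}")
```

An unrecognized label raises a `KeyError` here, which is a deliberate choice in this sketch so that data-entry mistakes are noticed rather than silently dropped.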
Other important considerations in the design and conduct of listening experiments include: the selection of subjects (hearing acuity, age, sex, prior knowledge, expectations), subject fatigue, selection of auditory stimuli (age, sex, language of speakers, types of music or other stimuli, appropriate balance of auditory stimuli), and the randomization of auditory stimuli to prevent order effects.
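One simple way to randomize presentation order is to give each subject an independently shuffled copy of the stimulus list. The sketch below assumes that approach; the function name, stimulus labels, and subject identifiers are hypothetical, and other designs (such as balanced orderings) may be preferred for a given experiment.

```python
import random


def presentation_orders(stimuli, subject_ids, seed=0):
    """Return a dict mapping each subject to a shuffled stimulus order.

    A fixed seed makes the orders reproducible, so they can be logged
    and regenerated later. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    orders = {}
    for subject in subject_ids:
        order = list(stimuli)   # copy so the master list is unchanged
        rng.shuffle(order)      # independent random order per subject
        orders[subject] = order
    return orders


# Hypothetical stimulus labels and subject identifiers.
stimuli = ["cond_A", "cond_B", "cond_C", "cond_D"]
orders = presentation_orders(stimuli, ["s01", "s02", "s03"])
for subject, order in orders.items():
    print(subject, order)
```

Each subject hears every stimulus exactly once, and the order varies from subject to subject, so position effects are not tied to any one stimulus.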