Audiovisual Quality Assessment

Video quality assessment research typically uses silent videos to eliminate confounding variables. However, the human visual system does not work in isolation: what we see is shaped by information from other senses, such as hearing and touch.

McGurk Effect

This video demonstrates how the eyes and ears work together to understand speech. This is called audiovisual perception or cross-modal perception. Audio of a woman saying "ba ba ba" has been dubbed over video of the same woman saying "da da da." Your perception of what she is saying changes as you open and close your eyes.

Subtitles are not provided, because the McGurk effect is an audiovisual effect.

Audio-Visual Quality Integration

Project lead: Arthur Webster

The perceived quality of an audiovisual sequence is heavily influenced by both the quality of the audio and the quality of the video. The question then arises as to the relative importance of each factor, and whether a generally applicable regression model for predicting audiovisual quality can be devised. ITS and other labs conducted a series of experiments that compared different methods of combining an audio-only mean opinion score (MOS) and a video-only MOS to predict the audiovisual MOS. See this publication for details.

The most important overall conclusion is that only the cross term (audio-only MOS × video-only MOS) is needed to predict the overall audiovisual MOS. This provides us with a simple and reasonably accurate model. One missing factor is the impact of audiovisual synchronization errors on audiovisual quality (e.g., lip synchronization).
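
As an illustration of what a cross-term model might look like, the following is a minimal sketch that fits audiovisual MOS ≈ alpha + beta × (audio-only MOS × video-only MOS) by ordinary least squares. The intercept term, the function names, and all of the scores are assumptions made for this example. They are not the coefficients or data from the experiments described above.

```python
def fit_cross_term_model(mos_audio, mos_video, mos_av):
    """Least-squares fit of mos_av ~ alpha + beta * (mos_audio * mos_video).

    Hypothetical helper: the intercept-plus-cross-term form is an assumption.
    """
    # The only predictor is the cross term (audio MOS times video MOS).
    x = [a * v for a, v in zip(mos_audio, mos_video)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(mos_av) / n
    # Closed-form simple linear regression on the cross term.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, mos_av))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def predict_av_mos(mos_audio, mos_video, alpha, beta):
    """Predict the audiovisual MOS for one clip from its two single-modality scores."""
    return alpha + beta * mos_audio * mos_video


# Synthetic, made-up scores on a 1-to-5 scale (not measured data).
mos_audio = [1.5, 2.0, 3.0, 4.0, 4.5, 2.5]
mos_video = [2.0, 3.5, 1.5, 4.0, 3.0, 4.5]
# Audiovisual scores generated from alpha = 1.0, beta = 0.15 so the fit can be checked.
mos_av = [1.0 + 0.15 * a * v for a, v in zip(mos_audio, mos_video)]

alpha, beta = fit_cross_term_model(mos_audio, mos_video, mos_av)
predicted = predict_av_mos(3.0, 3.0, alpha, beta)
```

Because the synthetic audiovisual scores are generated from the model itself, the fit recovers the chosen coefficients. With real subjective data the fit would be approximate, and a model of this form would still not account for synchronization errors.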

Video MOS vs Speech MOS

This white paper explains differences in how mean opinion scores (MOS) are calculated for video, speech, and audiovisual quality subjective tests. 

Publications on Audiovisual Quality