IEEE Transactions on Multimedia, vol. 17, no. 12, December 2015, pp. 2210–2224
Lucjan Janowski; Margaret H. Pinson
Abstract: How accurately can people use the absolute category rating (ACR) 5-level scale? Put another way, how repeatable are an individual subject's scores? Several subjective experiments have asked subjects to rate the same sequences more than once. Analyses indicate that no subject exactly repeated their earlier scores for these sequences. We would like to better understand this imperfection. This paper uses ACR subjective video quality tests to explore the precision of subjective ratings. To make formal measurements possible, we propose a theoretical subject model, which is the main contribution of this paper. The proposed subject model identifies three major factors that influence accuracy: subject bias, subject inaccuracy, and stimulus scoring difficulty. These appear to be separate random effects, and their existence helps explain why no subject was able to repeat their scores perfectly. There are three key consequences. First, subject scoring behavior includes a random component that spans approximately half of the rating scale. Second, the sensitivity and accuracy of most subjective analyses can be improved if subject scores are normalized by removing subject bias. Third, to some extent, multiple subjects can be replaced with a single subject who rates each sequence multiple times.
Keywords: design of experiments; mean opinion score; video quality assessment; quality of experience (QoE); subjective ratings; subject model
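To make the three factors and the bias-normalization idea concrete, here is a minimal sketch under stated assumptions. It is a hypothetical illustration, not the model defined in the paper: it assumes each score is the stimulus's true quality plus an additive per-subject bias plus Gaussian noise whose standard deviation combines a per-subject inaccuracy and a per-stimulus difficulty, with the result rounded and clipped to the 1 to 5 ACR scale. The function names, distributions, and parameter values are all invented for illustration. Each subject's bias is then estimated as that subject's mean deviation from the per-stimulus mean opinion score (MOS) and subtracted.

```python
import random
import statistics


def simulate_scores(n_subjects=24, n_stimuli=40, seed=1):
    """Hypothetical data generator (an assumption, not the paper's model):
    score = true quality + subject bias + noise, where the noise standard
    deviation combines subject inaccuracy and stimulus difficulty.
    Scores are rounded and clipped to the 1..5 ACR scale."""
    rng = random.Random(seed)
    quality = [rng.uniform(1.5, 4.5) for _ in range(n_stimuli)]
    bias = [rng.gauss(0.0, 0.4) for _ in range(n_subjects)]
    inaccuracy = [rng.uniform(0.3, 0.8) for _ in range(n_subjects)]
    difficulty = [rng.uniform(0.0, 0.4) for _ in range(n_stimuli)]
    scores = []
    for i in range(n_subjects):
        row = []
        for j in range(n_stimuli):
            sd = (inaccuracy[i] ** 2 + difficulty[j] ** 2) ** 0.5
            raw = quality[j] + bias[i] + rng.gauss(0.0, sd)
            row.append(min(5, max(1, round(raw))))
        scores.append(row)
    return quality, bias, scores


def estimate_bias(scores):
    """Estimate each subject's bias as the mean deviation of that
    subject's scores from the per-stimulus MOS."""
    n_subjects, n_stimuli = len(scores), len(scores[0])
    mos = [statistics.fmean(scores[i][j] for i in range(n_subjects))
           for j in range(n_stimuli)]
    return [statistics.fmean(scores[i][j] - mos[j] for j in range(n_stimuli))
            for i in range(n_subjects)]


def remove_bias(scores, est_bias):
    """Normalize scores by subtracting each subject's estimated bias."""
    return [[s - b for s in row] for row, b in zip(scores, est_bias)]


def pooled_within_stimulus_variance(scores):
    """Average, over stimuli, of the across-subject (population) variance."""
    n_stimuli = len(scores[0])
    return statistics.fmean(
        statistics.pvariance([row[j] for row in scores]) for j in range(n_stimuli)
    )


if __name__ == "__main__":
    _, _, raw = simulate_scores()
    normalized = remove_bias(raw, estimate_bias(raw))
    print("before:", pooled_within_stimulus_variance(raw))
    print("after: ", pooled_within_stimulus_variance(normalized))
```

With this particular estimator, the estimated biases average to zero, so the per-stimulus MOS is unchanged by normalization. The pooled within-stimulus variance can only decrease, because it drops by the mean of the squared estimated biases. This is one simple way to show why removing subject bias can sharpen an analysis. The paper's own estimation procedure may differ.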
To request a reprint of this report, contact:
For technical information concerning this report, contact:
Margaret H. Pinson
Institute for Telecommunication Sciences
Disclaimer: Certain commercial equipment, components, and software may be identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the equipment or software identified is necessarily the best available for the particular application or uses.