Video Quality Research Topics and White Papers
Improved Methods for Video Quality Subjective Testing
ITS contributes to efforts within the Video Quality Experts Group (VQEG) and ITU-T Study Group 12 to improve best practices for video quality subjective testing. ITS has conducted various studies to understand subjective test methods that ask questions like "what environmental factors matter in subjective testing" and "how do methods need to be modified for new video technologies."
History of ITU-T Rec. P.910 and P.913
By Margaret Pinson, July 2025
By 2014, the techniques described in ITU-R Rec. BT.500 and ITU-T Rec. P.910 had become obsolete. They did not reflect the change from cathode ray tube (CRT) televisions to digital monitors and mobile devices. ITU-T Rec. P.913 was created to provide experimental new subjective methods for digital monitors and mobile devices. ITU-T Rec. P.910 was left as-is, both to preserve the trusted subjective methods and because of concerns that voting members would block the addition of less mature methods into P.910. Once international experts gained confidence in the P.913 methods, ITU-T Study Group 12 merged P.913, P.910, and P.911 (audiovisual test methods) into a single standard (P.910) and discontinued both P.911 and P.913.
As a point of clarity, ITU-T Rec. P.910 contains subjective test methods suitable for most modern video systems. P.910 offers a large number of options because of the wide variety of video applications, displays, use cases, and research questions. ITU-R Rec. BT.500 focuses on the needs of broadcasters, who use much higher bandwidths and bits-per-pixel.
Circa 2015, Subject Screening
This white paper provides an overview of subject screening techniques. MATLAB® code implementing the techniques is provided. Since this white paper was written, Zhi Li (Netflix) led a collaborative effort within VQEG to develop improved methods for subject screening. Those methods appear in ITU-T Rec. P.910.
Analyses of Subjective Test Methods
The following publications showcase ITS research into an improved understanding of video quality subjective test methods. Most of these studies were only possible due to international collaborations and insights from discussions at VQEG meetings.
- Margaret H. Pinson, “The Precision and Repeatability of Media Quality Comparisons: Measurements and New Statistical Methods,” Journal Article, February 2023
- Pablo Pérez, Lucjan Janowski, Narciso García, and Margaret H. Pinson, “Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR),” Journal Article, July 2021
- Margaret H. Pinson, “Confidence Intervals for Subjective Tests and Objective Metrics That Assess Image, Video, Speech, or Audiovisual Quality,” Technical Report NTIA TR-21-550, October 2020
- Lucjan Janowski, Ludovic Malfait, and Margaret H. Pinson, “Evaluating Experiment Design with Unrepeated Scenes for Video Quality Subjective Assessment,” Journal Article, June 2019
- Lucjan Janowski and Margaret H. Pinson, “The Accuracy of Subjects in a Quality Experiment: A Theoretical Subject Model,” Journal Article, December 2015
- Lucjan Janowski and Margaret H. Pinson, “Subject Bias: Introducing a Theoretical User Model,” Conference Paper, September 2014
- Margaret H. Pinson, Marc Sullivan, and Andrew A. Catellier, “A new method for immersive audiovisual subjective testing,” Conference Paper, January 2014
- Margaret H. Pinson, Marcus Barkowsky, and Patrick Le Callet, “Selecting Scenes for 2D and 3D Subjective Video Quality Tests,” Journal Article, August 2013
- Margaret H. Pinson et al., “Subjective and Objective Evaluation of an Audiovisual Subjective Dataset for Research and Development,” Conference Paper, July 2013
- Joel Dumke, “Visual acuity and task-based video quality in public safety applications,” Conference Paper, February 2013
- Margaret H. Pinson, Karen Sue Boyd, Jessica Hooker, and Kristina Muntean, “How To Choose Video Sequences For Video Quality Assessment,” Conference Paper, January 2013
- Margaret H. Pinson et al., “The Influence of Subjects and Environment on Audiovisual Subjective Tests: An International Study,” Journal Article, October 2012
- Margaret H. Pinson and Stephen Wolf, “Techniques for Evaluating Objective Video Quality Models Using Overlapping Subjective Data Sets,” Technical Report NTIA TR-09-457, November 2008
- Margaret H. Pinson and Stephen Wolf, “Comparing Subjective Video Quality Testing Methodologies,” Conference Paper, July 2003
- Margaret H. Pinson and Stephen Wolf, “An Objective Method for Combining Multiple Subjective Data Sets,” Conference Paper, July 2003
- Stephen D. Voran, “An Iterated Nested Least-Squares Algorithm for Fitting Multiple Data Sets,” Technical Memorandum NTIA TM-03-397, October 2002
- Edwin L. Crow, “Methods for Analysis of Inter-laboratory Video Performance Standard Subjective Test Data,” Technical Contribution, March 1994
Analysis of Cameras, Monitors, and Codecs
The following publications showcase ITS research into an improved understanding of specific aspects of video quality subjective methods, including the impact of use case, device, and camera capture. Most of these publications describe experimental methods used to create video quality datasets.
- Michele A. Saad et al., “Image Quality of Experience: A Subjective Test Targeting the Consumer’s Experience,” Conference Paper, February 2016
- Andrew A. Catellier and Margaret H. Pinson, “Characterization of the HEVC Coding Efficiency Advance Using 20 Scenes, ITU-T Rec. P.913 Compliant Subjective Methods, VQM, and PSNR,” Conference Paper, December 2015
- Michele A. Saad et al., “Impact of Camera Pixel Count and Monitor Resolution on Perceptual Image Quality,” Conference Paper, August 2015
- Andrew A. Catellier, Margaret H. Pinson, William J. Ingram, and Arthur A. Webster, “Impact of Mobile Devices and Usage Location on Perceived Multimedia Quality,” Conference Paper, July 2012
- Gregory W. Cermak, Margaret H. Pinson, and Stephen Wolf, “The Relationship Among Video Quality, Screen Resolution, and Bit Rate,” Journal Article, June 2011
- Margaret H. Pinson, Stephen Wolf, and Gregory W. Cermak, “HDTV Subjective Quality of H.264 vs. MPEG-2, With and Without Packet Loss,” Journal Article, March 2010
- Marcus Barkowsky, Margaret H. Pinson, Romuald Pépion, and Patrick Le Callet, “Analysis of Freely Available Subjective Dataset for HDTV including Coding and Transmission Distortions,” Conference Paper, January 2010
- Margaret H. Pinson and Stephen Wolf, “The impact of monitor resolution and type on subjective video quality testing,” Technical Memorandum NTIA TM-04-412, March 2004
- Stephen Wolf, “Color correction matrix for digital still and video imaging systems,” Technical Memorandum NTIA TM-04-406, December 2003
- Wael Ashmawi, Roch Guerin, Stephen Wolf, and Margaret H. Pinson, “On the Impact of Policing and Rate Guarantees in Diff-Serv Networks: A Video Streaming Application Perspective,” Conference Paper, August 2001
Reduced Reference (RR) Video Quality Metrics
A Quick History of ITS Research on RR Metrics
By Margaret Pinson, July 2025
The video quality research project began as a line item in the U.S. Department of Commerce 1989 budget. Television broadcasters were assessing analog video links by pointing their camera at static prints, like the Porta-Pattern resolution chart shown below. These static images did not work for the nascent digital video technology. The research proposal was inspired by prior ITS research into user-oriented performance evaluation of data communication and audio quality.
The initial research goal was to develop replacement metrics that could be used in real time on in-service systems. Therefore, research focused on reduced reference (RR) metrics. In practice, most people used these RR metrics as full reference (FR) metrics, due to the difficulty of accessing the original and impaired video streams simultaneously.
ITS research on RR metrics is best known for the NTIA General model (2002), which is referred to as Video Quality Model (VQM) in literature. In 2011, ITS released an updated video quality model with variable frame delay (VQM_VFD). The VQM software repository provides open source software for these and several other RR and FR metrics.
Nearly half of ITS research on RR metrics was devoted to detecting and removing impairments that people cannot see and that therefore do not impact subject ratings. Examples include dynamic changes to temporal alignment, spatial shifts up to 10%, spatial scaling up to 10%, multiplicative changes to luma gain, additive changes to luma offset, and invalid pixels at the edge of the video that are hidden in a television's overscan region. If not removed, these calibration impairments will cause RR and FR metrics to produce random values.
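To make the luma gain and offset calibration concrete: because gain is multiplicative and offset is additive, the relationship between the impaired and original luma is approximately linear, so both can be estimated with a least-squares line fit and then inverted. The sketch below illustrates the idea in Python with NumPy; the function names are illustrative and not taken from the VQM software, and a real calibration stage would also handle spatial and temporal misalignment first.

```python
import numpy as np

def estimate_gain_offset(original, impaired):
    """Estimate luma gain (multiplicative) and offset (additive) by a
    least-squares fit of the model: impaired = gain * original + offset."""
    x = np.asarray(original, dtype=float).ravel()
    y = np.asarray(impaired, dtype=float).ravel()
    gain, offset = np.polyfit(x, y, 1)  # returns [slope, intercept]
    return gain, offset

def remove_gain_offset(impaired, gain, offset):
    """Invert the estimated gain and offset so a quality metric
    compares the impaired video on the original luma scale."""
    return (np.asarray(impaired, dtype=float) - offset) / gain
```

For example, a frame passed through a system with gain 1.1 and offset 5 is restored to the original luma values after `remove_gain_offset`, so the invisible level shift no longer contaminates the metric's quality score.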
Early video codecs (1988 to around 2010) were particularly prone to calibration problems. These issues affected all video codecs, from software-only implementations to hardware video coders with analog elements. We cannot comment on the likelihood of calibration problems in modern video codecs, since ITS research on RR metrics was discontinued in 2011.
For more insights into calibration, see this report. However, be aware that these early codecs exhibited very different temporal patterns than modern video codecs. Today, video delay is typically constant, with large but infrequent changes due to rebuffering events. When the algorithms in this report were developed, the video system delay and the frequency of frame updates both changed dynamically in response to changes in the video's motion and spatial complexity. These delay patterns can be observed by downloading the T1A1 dataset or the Video Quality Experts Group (VQEG) Full Reference Phase II dataset from the Consumer Digital Video Library (CDVL).
Early ITS research on video quality also included sponsoring VQEG, sponsoring CDVL, developing the spatial information (SI) and temporal information (TI) metrics in ITU-T Rec. P.910 (see this ITU Contribution), involvement in VQEG metric validation tests (both as a metric proponent and as an independent lab), and supporting international standards development.
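The SI and TI measures mentioned above are simple enough to sketch directly. Per ITU-T Rec. P.910, SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over time of the spatial standard deviation of successive frame differences. The following minimal Python/NumPy sketch (helper names are illustrative, and it is not the ITS reference implementation) computes both for a list of grayscale frames:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def _filter3x3(frame, kernel):
    """2D correlation with a 3x3 kernel over the valid interior region."""
    h, w = frame.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * frame[i:i + h - 2, j:j + w - 2]
    return out

def si_ti(frames):
    """SI and TI per ITU-T Rec. P.910 for a sequence of 2D luma arrays.

    SI = max over time of std(Sobel gradient magnitude of each frame).
    TI = max over time of std(difference between successive frames).
    """
    si_values, ti_values = [], []
    prev = None
    for frame in frames:
        f = np.asarray(frame, dtype=float)
        gx = _filter3x3(f, SOBEL_X)
        gy = _filter3x3(f, SOBEL_Y)
        si_values.append(np.hypot(gx, gy).std())
        if prev is not None:
            ti_values.append((f - prev).std())
        prev = f
    return max(si_values), (max(ti_values) if ti_values else 0.0)
```

A scene with sharp edges yields a high SI, and a scene with rapid motion yields a high TI; a static, flat-gray clip yields zero for both, which is why SI/TI are useful for checking that a set of test scenes spans a range of coding difficulty.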
Circa 1996, Motivation for Digital Video Quality Metrics
This white paper from the late '90s explains the need for new video quality metrics, as video delivery changed from analog to digital.
1996, Insights into Video Quality Metrics from Early ITS Research
This set of white papers describes ITS video quality metric research from 1989 to 1996. The algorithms described in these white papers were submitted to ATIS for the T1A1 validation test (1993 to 1994). These ideas matured into the VQM software released between 2000 and 2005. For more information on the T1A1 validation effort, including videos and subject ratings, see the video quality project data.
Circa 2010, Detecting Large Edges with the Spatial Information (SI) Filter
This white paper describes a spatial information (SI) filter developed by NTIA/ITS. MATLAB® code is provided. The SI filter detects long edges and estimates edge angle.
Circa 2010, Description of Video Quality Metric (VQM) Software
This white paper describes the RR video quality metrics released by ITS between 2000 and 2010. The white paper gives an overview of the VQM software tools in the VQM software GitHub repository.
Circa 2008, Proof-of-concept RR Metric Software Overview
The vision behind RR video quality metrics was to enable real-time video quality assessment by deploying two probes: one extracting features from the source video and another extracting features from the impaired video. These low-bandwidth features would be collected at a single location for analysis.
This line of research culminated in the Command Line Video Quality Metric (CVQM) software. This white paper describes how to use the CVQM software, and the VQM software repository provides the code. CVQM demonstrates how to logically split the ITS metrics into an RR implementation. The other code in the VQM repository was developed for research purposes, so both video streams must be available to a single computer. This white paper is the only manual for the CVQM software.
The CVQM code is the best starting point for people who want to understand, use, or port the ITS RR metrics. Compared to the batch processing software (in sub-folder bvqm) and the library of code we used internally for research (in sub-folder its_video), the CVQM code was significantly cleaned up.
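The RR split described above can be illustrated with a toy example: each probe reduces every frame to a small feature vector, and only those features travel over the low-bandwidth channel to a central comparison step. The features and score below are deliberately simplistic placeholders, not the actual ITS RR features; the point is the architecture, sketched here in Python with NumPy.

```python
import numpy as np

def extract_features(frame, grid=4):
    """Probe side: reduce one luma frame to a grid x grid feature vector
    (here, per-block standard deviation). Illustrative only -- the real
    ITS RR features are far more elaborate."""
    f = np.asarray(frame, dtype=float)
    h, w = f.shape
    bh, bw = h // grid, w // grid
    return np.array([f[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].std()
                     for i in range(grid) for j in range(grid)])

def compare_features(src_feats, imp_feats):
    """Central side: turn the two low-bandwidth feature streams into a
    distortion score (root-mean-square feature difference)."""
    return float(np.sqrt(np.mean((np.asarray(src_feats)
                                  - np.asarray(imp_feats)) ** 2)))
```

With a 4x4 grid, each frame is summarized by 16 numbers regardless of resolution, which is what makes real-time, in-service monitoring of both ends of a video link feasible.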
2012, Video Quality Metrics Tutorial
By Margaret Pinson, 2012
This 2012 Video Quality Metrics Tutorial uses Microsoft PowerPoint® 2010 slides with audio. Start the slideshow to hear the audio. Subtitles are provided in the notes section for each page. Microsoft provides support on ways to view a PowerPoint presentation for Windows® on their website.
This tutorial provides an overview of the RR video quality metrics and other algorithms available in the VQM software, including guidance on which calibration options and models suit different purposes. Other topics covered are the different types of models, mapping multiple subjective datasets onto a single scale, objective video quality model validation, and the Consumer Digital Video Library.
RR Metric Publications
Most of the video quality project's publications from 1989 to 2011 focus on RR metric development. The list below identifies key publications that describe final algorithms and completed lines of research.
- Margaret H. Pinson and Stephen Wolf, “A New Standardized Method for Objectively Measuring Video Quality,” Journal Article, September 2004
- Stephen Wolf and Margaret H. Pinson, “Video Quality Measurement Techniques,” Technical Report NTIA TR-02-392, June 2002
- Margaret H. Pinson and Stephen Wolf, “Low Bandwidth Reduced Reference Video Quality Monitoring System,” Conference Paper, January 2005
- Margaret H. Pinson and Stephen Wolf, “Video scaling estimation technique,” Technical Memorandum NTIA TM-05-417, January 2005
- Margaret H. Pinson and Stephen Wolf, “Reduced Reference Video Calibration Algorithms,” Technical Report NTIA TR-08-433b, November 2007
- Stephen Wolf, “A No Reference (NR) and Reduced Reference (RR) Metric for Detecting Dropped Video Frames,” Technical Memorandum NTIA TM-09-456, October 2008
- Stephen Wolf, “A No Reference (NR) and Reduced Reference (RR) Metric for Detecting Dropped Video Frames,” Conference Paper, January 2009
- Stephen Wolf and Margaret H. Pinson, “Reference Algorithm for Computing Peak Signal to Noise Ratio (PSNR) of a Video Sequence with a Constant Delay,” Technical Contribution, February 2009
- Stephen Wolf, “A Full Reference (FR) Method Using Causality Processing for Estimating Variable Video Delays,” Technical Memorandum NTIA TM-10-463, October 2009