Subject screening algorithms detect and discard subjects whose scores appear to be unreliable. MATLAB® code implementing several algorithms is available below.
To Screen Subjects...
A subject's scores can be erroneous for a multitude of reasons, including failures on the part of the subject, the experimenter, the rating method, the video playback system, or the rating recording system. Many experimenters err on the side of discarding valid subjects. The goal is to eliminate all subjects whose data might be invalid. The motivation is to retain only subjects who are able to detect just noticeable differences and can rate video sequences consistently.
Currently, all automated subject screening algorithms apply this philosophy. They rely upon thresholds, which are essentially educated guesses. These thresholds may need to be adjusted for new types of experiments.
...Or Not To Screen Subjects
Discarding subjects can make it appear that you are doctoring your data to fit your hypothesis. Many psychologists believe it is inappropriate to discard any subject, unless the subject clearly misunderstood the rating task or the task was too difficult. Examples of subjects who misunderstand are someone who rates whether or not they like the video content when asked to rate the video quality, or someone who applies the scale in reverse (e.g., marks "excellent" when meaning "bad" and vice versa). An example of a subject who finds the task too difficult is someone who rates all sequences identically (e.g., "excellent" quality).
The philosophy is that if we cannot explain why that subject scored differently, then we must assume that the differences are genuine and need to be included in the data analysis. To apply this philosophy, either keep all subjects or examine the data manually for obvious problems. Each discarded subject must be justified in the experiment report.
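The manual examination described above can be supported by a small script that only flags candidates and leaves the decision to the experimenter. The following is a minimal sketch in Python, not part of the MATLAB code offered on this page. The function name, the data layout (one list of ratings per subject, in the same sequence order), and the two checks (identical ratings, negative correlation with the other subjects) are assumptions chosen to mirror the examples in the text.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns None when either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def flag_for_manual_review(scores):
    """scores: dict subject -> list of ratings (hypothetical layout).
    Returns dict subject -> reason, for subjects worth examining by hand."""
    flags = {}
    for subject, ratings in scores.items():
        if len(set(ratings)) == 1:
            flags[subject] = "rated all sequences identically"
            continue
        # Compare against the mean of the OTHER subjects, sequence by sequence.
        others = [r for s, r in scores.items() if s != subject]
        others_mean = [mean(column) for column in zip(*others)]
        r = pearson(ratings, others_mean)
        if r is not None and r < 0:
            flags[subject] = "negatively correlated (possible reversed scale)"
    return flags

# Made-up ratings on a 5-point scale, five sequences.
scores = {
    "s1": [5, 4, 3, 2, 1],
    "s2": [5, 5, 3, 2, 1],
    "s3": [4, 4, 3, 1, 1],
    "s4": [1, 2, 3, 4, 5],  # scale apparently applied in reverse
    "s5": [5, 5, 5, 5, 5],  # no discrimination between sequences
}
flags = flag_for_manual_review(scores)
for subject, reason in flags.items():
    print(subject, "->", reason)
```

A flag is a prompt to look at the subject's data and session notes; it is not by itself a justification for discarding the subject.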
And a Compromise
Alternatively, analyze the data twice: once with subject screening and once without. The experiment report may emphasize the analysis performed on the screened data, yet also mention any opposing conclusions reached by the non-screened data analysis. This compromise approach is advisable when subject screening eliminates a large fraction of subjects. A large fraction of discarded subjects will cause some researchers to doubt the validity of an experiment.
The experiment report should mention exactly why each subject was discarded.
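As an illustration of analyzing the data twice, the sketch below computes mean opinion scores (MOS) with and without a set of screened-out subjects and reports the fraction discarded. It is written in Python and is only a sketch: the function name, the data layout, and the choice of which subject is screened out are hypothetical.

```python
from statistics import mean

def mos(scores, exclude=()):
    """Mean opinion score per sequence, ignoring any subjects in `exclude`."""
    kept = [ratings for subject, ratings in scores.items() if subject not in exclude]
    return [mean(column) for column in zip(*kept)]

# Made-up ratings: four subjects, three sequences.
scores = {
    "s1": [5, 3, 1],
    "s2": [4, 3, 2],
    "s3": [5, 4, 1],
    "s4": [1, 3, 5],
}
screened_out = {"s4"}  # hypothetical outcome of a screening algorithm

mos_all = mos(scores)
mos_screened = mos(scores, exclude=screened_out)
fraction_discarded = len(screened_out) / len(scores)

print("MOS, all subjects:     ", mos_all)
print("MOS, screened subjects:", mos_screened)
print("fraction discarded:    ", fraction_discarded)
```

Reporting both sets of values side by side lets the reader see whether any conclusion depends on the screening step.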
MATLAB Code for Popular Subject Screening Algorithms
Following is a list of popular subject screening techniques. MATLAB code for the automated techniques is available here. This code may be used for any purpose, commercial or non-commercial. Please contact Margaret Pinson if you find any bugs or errors in this code.
ITU-R Rec. BT.500 Annex 2 Clause 2.3.1
ITU-R Rec. BT.500 Annex 2 Clause 2.3.1 recommends a technique for screening Double Stimulus Impairment Scale (DSIS) and Double Stimulus Continuous Quality Scale (DSCQS) data. However, this technique has been applied to tests conducted with other methods. This technique discards subjects whose ratings disagree frequently with other subjects. This technique can only be used with scores that have a normal distribution. Note that because the BT.500 technique analyzes agreement, it might discard a subject who scores consistently higher or lower than other subjects, despite agreeing on the ranking of sequences by quality.
ITU-R Rec. BT.500 Annex 2 Clause 2.3.2 recommends a technique for Single Stimulus Continuous Quality Evaluation (SSCQE); however, this algorithm is not implemented in the MATLAB code.
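To make the agreement-based idea behind Clause 2.3.1 concrete, here is a minimal Python sketch of the procedure as it is commonly summarized: for each presentation, a kurtosis coefficient decides how wide the acceptance band is, each subject's scores outside the band are counted on the high and low sides, and a subject is rejected when outliers are frequent and roughly balanced between the two sides. The constants (2 and the square root of 20 standard deviations, 5% and 0.3) reflect that common summary and should be checked against the Recommendation itself; the function name and data layout are assumptions, and this is not the MATLAB implementation offered on this page.

```python
from statistics import mean, stdev

def bt500_style_screen(scores, reject_fraction=0.05, balance_limit=0.3):
    """scores: dict subject -> list of ratings, one per presentation (assumed layout).
    Returns the set of subjects this sketch would discard."""
    subjects = list(scores)
    n_presentations = len(scores[subjects[0]])
    high = dict.fromkeys(subjects, 0)  # scores far above the panel mean
    low = dict.fromkeys(subjects, 0)   # scores far below the panel mean

    for k in range(n_presentations):
        column = [scores[s][k] for s in subjects]
        m = mean(column)
        m2 = sum((u - m) ** 2 for u in column) / len(column)
        if m2 == 0:
            continue  # every subject gave the same score; nobody is an outlier
        m4 = sum((u - m) ** 4 for u in column) / len(column)
        beta2 = m4 / m2 ** 2  # kurtosis coefficient
        # Narrow band when the scores look roughly normal, wide band otherwise.
        width = 2.0 if 2.0 <= beta2 <= 4.0 else 20 ** 0.5
        sd = stdev(column)
        for s in subjects:
            u = scores[s][k]
            if u >= m + width * sd:
                high[s] += 1
            elif u <= m - width * sd:
                low[s] += 1

    rejected = set()
    for s in subjects:
        outliers = high[s] + low[s]
        if outliers == 0:
            continue
        frequent = outliers / n_presentations > reject_fraction
        balanced = abs(high[s] - low[s]) / outliers < balance_limit
        if frequent and balanced:
            rejected.add(s)
    return rejected

# Made-up data: ten subjects, four presentations. Subject s0 alternates
# between far-too-low and far-too-high scores relative to the panel.
scores = {"s0": [1, 5, 1, 5]}
for i in (1, 2, 3):
    scores[f"s{i}"] = [3, 3, 3, 3]
for i in (4, 5, 6):
    scores[f"s{i}"] = [4, 2, 4, 2]
for i in (7, 8, 9):
    scores[f"s{i}"] = [5, 1, 5, 1]

rejected = bt500_style_screen(scores)
print(sorted(rejected))
```

In this made-up panel only s0 is rejected: its scores fall outside the band in every presentation, twice on the high side and twice on the low side.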
ITU-R BT.1788 (SAMVIQ) Annex 2 Clauses 3.2, 3.3 and 3.4 in series
ITU-R BT.1788, also known as SAMVIQ, requires that subjects apply a stable and coherent method when rating degradations of quality. The rejection criteria use both Pearson correlation and Spearman rank correlation. This technique rejects subjects whose scores do not correlate with those of other subjects (i.e., who rank impairments differently).
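The sketch below shows, in Python, one way the two correlations could be computed for each subject and combined. Several details are assumptions made for illustration and are not taken from the text above: each subject is compared with the mean score of all subjects, the smaller of the two correlations is used, and the threshold of 0.85 is a hypothetical parameter. The Recommendation defines its own rejection rule, which should be consulted before using anything like this.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; 0.0 when undefined (zero variance), an assumption."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def screen_by_correlation(scores, threshold=0.85):
    """Reject subjects whose min(Pearson, Spearman) against the panel mean
    falls below `threshold` (hypothetical value and combination rule)."""
    panel_mean = [mean(column) for column in zip(*scores.values())]
    rejected = set()
    for subject, ratings in scores.items():
        r = min(pearson(ratings, panel_mean), spearman(ratings, panel_mean))
        if r < threshold:
            rejected.add(subject)
    return rejected

# Made-up scores on a 0-100 scale, five conditions.
scores = {
    "a": [90, 70, 50, 30, 10],
    "b": [85, 75, 55, 25, 15],
    "c": [95, 65, 45, 35, 5],
    "d": [40, 80, 20, 60, 50],  # ranks the conditions differently
}
rejected = screen_by_correlation(scores)
print(sorted(rejected))
```

Using the rank correlation alongside the linear correlation makes the check sensitive to subjects who order the impairments differently, even when their scores happen to span a similar range.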
VQEG HDTV Test Plan, Annex I
The Video Quality Experts Group (VQEG) HDTV Phase I Test Plan includes a method for screening subjects in Annex I (page 37). The rejection criteria test the consistency of scores using Pearson correlation on a per-clip basis. This technique rejects subjects whose scores do not correlate with those of other subjects (e.g., who rank impairments differently). The thresholds were chosen to be appropriate for Absolute Category Rating (ACR) tests.
Note that if the threshold were to be adjusted to be very low (e.g., 0.30), then the VQEG HDTV test plan rejection criteria would probably only eliminate subjects who did not understand the task.
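The effect of the threshold can be illustrated with a short Python sketch of per-clip Pearson screening. The comparison target (the mean score of all subjects for each clip) and the higher threshold of 0.75 are assumptions used only for illustration; the test plan is the authority on both. The value 0.30 is the "very low" example mentioned above.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; 0.0 when undefined (zero variance), an assumption."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def screen_per_clip(scores, threshold):
    """Reject subjects whose per-clip scores correlate with the panel's
    per-clip mean below `threshold`."""
    panel_mean = [mean(column) for column in zip(*scores.values())]
    return {s for s, ratings in scores.items()
            if pearson(ratings, panel_mean) < threshold}

# Made-up ACR ratings (1-5), six clips.
scores = {
    "s1": [5, 4, 4, 3, 2, 1],
    "s2": [5, 5, 4, 3, 2, 1],
    "s3": [4, 4, 3, 3, 1, 1],
    "s4": [5, 4, 3, 2, 2, 1],
    "s5": [2, 4, 3, 5, 3, 2],  # noisy but not reversed
    "s6": [1, 2, 2, 3, 4, 5],  # appears to have misunderstood the task
}

rejected_strict = screen_per_clip(scores, threshold=0.75)  # illustrative value
rejected_lenient = screen_per_clip(scores, threshold=0.30)
print("threshold 0.75:", sorted(rejected_strict))
print("threshold 0.30:", sorted(rejected_lenient))
```

With these made-up ratings the stricter threshold removes both the noisy subject and the reversed one, while the lenient threshold removes only the subject whose scores run opposite to the panel.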
VQEG MM Test Plan, Annex VI
The Video Quality Experts Group (VQEG) Multimedia Phase I Test Plan includes a method for screening subjects in Annex VI (page 57). The rejection criteria test the consistency of scores using Pearson correlation both on a per-clip basis and after averaging scores across all scenes associated with one impairment (i.e., per-HRC or Hypothetical Reference Circuit). This technique rejects subjects whose scores do not correlate with those of other subjects (e.g., who rank impairments differently). The thresholds were chosen to be appropriate for ACR tests.
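The per-HRC step can be sketched in Python as follows: each subject's scores are averaged over all scenes that share an HRC, and the resulting per-HRC values are correlated with the panel's per-HRC means. How the per-clip and per-HRC correlations are combined, and the threshold values, are defined by the test plan. The sketch assumes, purely for illustration, that a subject is rejected only when both correlations fall below their thresholds, and the values 0.75 and 0.80 are hypothetical parameters.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; 0.0 when undefined (zero variance), an assumption."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def per_hrc_means(ratings, clips):
    """Average ratings over all scenes sharing an HRC; HRCs in sorted order."""
    by_hrc = {}
    for (scene, hrc), rating in zip(clips, ratings):
        by_hrc.setdefault(hrc, []).append(rating)
    return [mean(by_hrc[hrc]) for hrc in sorted(by_hrc)]

def screen_clip_and_hrc(scores, clips, clip_threshold=0.75, hrc_threshold=0.80):
    """Reject a subject when BOTH correlations are below their thresholds
    (assumed combination rule and hypothetical threshold values)."""
    panel_clip = [mean(column) for column in zip(*scores.values())]
    panel_hrc = per_hrc_means(panel_clip, clips)
    rejected = set()
    for subject, ratings in scores.items():
        r_clip = pearson(ratings, panel_clip)
        r_hrc = pearson(per_hrc_means(ratings, clips), panel_hrc)
        if r_clip < clip_threshold and r_hrc < hrc_threshold:
            rejected.add(subject)
    return rejected

# Made-up design: two scenes, three HRCs, ACR ratings (1-5).
clips = [("sceneA", "hrc1"), ("sceneA", "hrc2"), ("sceneA", "hrc3"),
         ("sceneB", "hrc1"), ("sceneB", "hrc2"), ("sceneB", "hrc3")]
scores = {
    "s1": [5, 3, 1, 4, 3, 2],
    "s2": [5, 4, 2, 5, 3, 1],
    "s3": [4, 3, 1, 5, 3, 1],
    "s4": [2, 3, 4, 1, 3, 5],  # ranks the HRCs in the opposite order
}
rejected = screen_clip_and_hrc(scores, clips)
print(sorted(rejected))
```

Averaging over scenes removes scene-dependent variation, so the per-HRC correlation focuses on whether the subject ranks the impairments in the same order as the panel.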