Perceptual Image Distortion

Perceptual Image Distortion, First IEEE International Conference on Image Processing, vol 2, pp 982-986, November 1994


Background

Many imaging and image processing methods are evaluated by how closely their output resembles a given reference image. Examples include image data compression, dithering algorithms, and flat-panel display and printer design. In all of these cases, the human visual system is the judge of image fidelity. Nevertheless, most evaluations use the mean squared error (MSE) or root mean squared error (RMSE) between the two images as a measure of visual distortion. These measures are popular largely because of their analytical tractability. It has long been accepted, however, that MSE (or RMSE) predicts perceived distortion poorly. The following paradoxical example illustrates the problem.

The top two images on the right were created by adding different types of distortions to the original image; the original image is shown below them. The root mean squared error (RMSE) between each distorted image and the original was computed. The RMSE is the square root of the average squared difference between each pixel in the distorted image and its counterpart in the original image.
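The RMSE definition above can be written out directly. The following is a minimal sketch, assuming grayscale images stored as lists of rows of pixel values; the function name `rmse` and the tiny 2x2 example images are illustrative and do not come from the paper.

```python
import math

def rmse(distorted, original):
    """Root mean squared error between two equally sized grayscale images.

    Images are lists of rows of pixel values (an assumed layout).
    """
    if len(distorted) != len(original):
        raise ValueError("images must have the same number of rows")
    total = 0.0
    count = 0
    for row_d, row_o in zip(distorted, original):
        if len(row_d) != len(row_o):
            raise ValueError("rows must have the same length")
        for d, o in zip(row_d, row_o):
            total += (d - o) ** 2  # squared difference for one pixel pair
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return math.sqrt(total / count)  # square root of the mean squared difference

# Illustrative 2x2 images: squared differences are 9, 0, 0 and 16.
original = [[10, 20], [30, 40]]
distorted = [[13, 20], [30, 36]]
print(rmse(distorted, original))  # → 2.5
```

Note that the measure depends only on the pixel-wise differences, not on where in the image they occur or on what image content surrounds them.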

The RMSE between the first distorted image and the original is 8.5 while the RMSE between the second distorted image and the original is 9.0. Although the RMSE of the first image is less than that of the second, the distortion introduced in the first image is more visible than the distortion added to the second. Thus, the root mean squared error is a poor indicator of perceptual image fidelity.

RMSE = 8.5

RMSE = 9.0

Original

Model of Perceptual Image Distortion

We have developed a perceptual distortion measure based on a model of spatial pattern detection. Empirical results from spatial pattern detection experiments are directly relevant to developing measures of image integrity. In a typical spatial pattern detection experiment, the contrast of a visual stimulus (called the target) is adjusted until it is just barely detectable. Threshold contrasts of the target are measured over a range of spatial frequencies, mean luminances, and spatial extents. In some experiments (called contrast masking experiments), the target is superimposed on a background pattern (called the masker). In other experiments (called luminance masking experiments), the target is superimposed on a brief, bright, uniform background. In either case (contrast or luminance masking), the contrast of the target is adjusted (while the masker is held fixed) until the target is just barely detectable. Typically, a target is harder to detect (i.e., a higher contrast is required) in the presence of a masker. A model that predicts spatial pattern detection is useful in image processing applications because the same roles appear there. In image compression, for example, the quantization error plays the role of the target and the original image plays the role of the masker.

Our model consists of three main parts: a retinal component, a cortical component, and a detection mechanism. The retinal component is responsible for contrast sensitivity and its dependence on mean luminance masking. The cortical component accounts for contrast masking. To compute perceptual image distortion, the reference and distorted images are passed through these two stages of the model independently. At this point, the images have been normalized for the differential sensitivities of the human visual system. The final (detection mechanism) component of the model compares these two normalized images to give a measure of image fidelity. The final result is an image representing the probability of perceiving a distortion at each position in the distorted image.
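The data flow described above (two shared normalization stages applied to each image independently, followed by a comparison) can be sketched as follows. This is only a structural illustration: the stage functions `toy_retinal`, `toy_cortical`, and `toy_detect` are hypothetical stand-ins chosen for brevity and are not the computations used in the paper.

```python
import math

def distortion_map(reference, distorted, retinal, cortical, detect):
    """Pass both images through the same two stages, then compare per pixel."""
    ref_norm = cortical(retinal(reference))
    dist_norm = cortical(retinal(distorted))
    return [
        [detect(r, d) for r, d in zip(ref_row, dist_row)]
        for ref_row, dist_row in zip(ref_norm, dist_norm)
    ]

# Hypothetical stand-ins below; they are NOT the paper's stages.
def toy_retinal(image):
    # Convert luminance to contrast about the image mean (assumes a nonzero mean).
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[(p - mean) / mean for p in row] for row in image]

def toy_cortical(contrast):
    # Compressive nonlinearity: large contrasts are attenuated.
    return [[c / (1.0 + abs(c)) for c in row] for row in contrast]

def toy_detect(ref_value, dist_value, sensitivity=10.0):
    # Map the normalized difference to a value in [0, 1).
    return 1.0 - math.exp(-sensitivity * abs(ref_value - dist_value))

reference = [[100.0, 100.0], [100.0, 120.0]]
distorted = [[100.0, 100.0], [100.0, 140.0]]
result = distortion_map(reference, distorted, toy_retinal, toy_cortical, toy_detect)
for row in result:
    print(row)
```

Passing the stages in as functions keeps the sketch honest about what is known: only the order of the stages and the per-position output come from the description above.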

Model Predictions of Visible Distortion

The leftmost image below is the original image. The two images to the right of it were created by adding different types of distortions to the original image. The root mean squared error (RMSE) between the first distorted image and the original is smaller than the RMSE between the second distorted image and the original. Even so, the distortion is more visible in the first distorted image.

The images directly below each distorted image are the predictions by the perceptual image distortion model. Lighter areas indicate regions where the distortion is more visible while darker areas indicate regions where the distortion is less visible. The model correctly predicts that the first distorted image is more visibly distorted than the second.

Original

Model Predictions of JPEG Compressed Images

To further validate the model's performance, we applied the model to JPEG compressed images. The original image was compressed using the JPEG algorithm at different quality settings. The model was then used to predict the visibility of the distortion between each compressed image and the original.

Each vertically stacked pair of images below shows a JPEG-compressed image together with the model's prediction of the amount of visible distortion relative to the original.

The image compressed at a quality setting of 80 is virtually indistinguishable from the original. The model's prediction corroborates this observation. The average distortion value computed by the model is 1.2, which indicates that the distortion is slightly above threshold (threshold is set at 1.0). The image compressed at a quality setting of 20 is slightly deteriorated while the image compressed at a quality setting of 10 shows marked blocking artifacts. The model's predictions agree with these trends fairly well.

JPEG qual. setting = 80, RMSE = 9.5, PDM = 1.2

JPEG qual. setting = 20, RMSE = 11.4, PDM = 5.7

JPEG qual. setting = 10, RMSE = 12.9, PDM = 9.8
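The caption values above can be compared on a common footing by expressing each measure relative to its value at quality 80. The short sketch below does only that arithmetic; the variable names and the helper `ratios_to_first` are illustrative, and the numbers are copied from the captions.

```python
# (JPEG quality, RMSE, PDM) copied from the three captions above.
results = [(80, 9.5, 1.2), (20, 11.4, 5.7), (10, 12.9, 9.8)]
THRESHOLD = 1.0  # detection threshold stated in the text

def ratios_to_first(values):
    """Express each value relative to the first one in the list."""
    return [v / values[0] for v in values]

rmse_ratios = ratios_to_first([r for _, r, _ in results])
pdm_ratios = ratios_to_first([p for _, _, p in results])

for (quality, _, pdm), r_ratio, p_ratio in zip(results, rmse_ratios, pdm_ratios):
    print(f"quality {quality:>2}: RMSE x{r_ratio:.2f}, PDM x{p_ratio:.2f}, "
          f"PDM is {pdm / THRESHOLD:.1f} times threshold")
```

Between quality 80 and quality 10 the RMSE rises by roughly 36 percent, while the PDM rises by roughly a factor of eight, which is the contrast the histograms in the next section show per pixel.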

Error Histograms of JPEG Compressed Images

The top graph on the right plots a histogram of the squared error differences for individual pixels. The bottom graph plots a histogram of the perceptual distortion predictions of the model for individual pixels. Both histograms have been normalized so that the vertical axis represents fractions of the total number of pixels. In the legends of both graphs, "left image" corresponds to the image compressed with a JPEG quality setting of 80; "middle image" and "right image" refer to the images compressed with JPEG quality settings of 20 and 10 respectively.

The histograms of squared error differences for the different compressed images are very similar to one another. The histograms of perceptual distortion predictions of the different compressed images are dramatically different from one another. It is clear, for example, that the model predicts that the images compressed at quality settings of 20 and 10 (the "middle" and "right" images) are more severely distorted than the image compressed at a quality setting of 80 (the "left" image).
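The normalization used for both histograms (bin counts divided by the total number of pixels) can be sketched as below. This is a minimal illustration with assumed names (`normalized_histogram`, `squared_errors`), an assumed equal-width binning, and a tiny made-up image pair; the bin settings used for the actual graphs are not given in the text.

```python
def squared_errors(distorted, original):
    """Per-pixel squared differences, flattened into one list."""
    return [
        (d - o) ** 2
        for row_d, row_o in zip(distorted, original)
        for d, o in zip(row_d, row_o)
    ]

def normalized_histogram(values, num_bins, max_value):
    """Fraction of non-negative `values` in each equal-width bin over [0, max_value]."""
    counts = [0] * num_bins
    bin_width = max_value / num_bins
    for v in values:
        # Values at or above max_value are clamped into the last bin.
        index = min(int(v / bin_width), num_bins - 1)
        counts[index] += 1
    total = len(values)
    return [c / total for c in counts]  # fractions sum to 1

# Illustrative 2x2 images: squared errors are [9, 0, 0, 16].
original = [[10, 20], [30, 40]]
distorted = [[13, 20], [30, 36]]
errors = squared_errors(distorted, original)
fractions = normalized_histogram(errors, num_bins=4, max_value=16)
print(fractions)  # → [0.5, 0.0, 0.25, 0.25]
```

Because each bar is a fraction of all pixels, histograms from different images can be overlaid and compared directly, as in the two graphs described above.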
