
Discriminability Measures for Predicting Readability

Lauren F. V. Scharff, Albert J. Ahumada, Jr.*, Alyson L. Hill
Stephen F. Austin State University, Box 13046, Nacogdoches, TX 75962
*NASA Ames Research Center, Moffett Field, CA 94035-1000

Presented at the 1999 IS&T / SPIE Electronic Imaging Symposium, January 24-29, San Jose, CA.

Published in B. E. Rogowitz and T. N. Pappas, eds., Human Vision and Electronic Imaging IV, SPIE Proc. Vol. 3644, paper 27, 1999.

A subsequent paper which examines predictability of readability using spatial-frequency-filtered textures has since been published in Optics Express.

ABSTRACT

Several discriminability measures were correlated with reading speed over a range of screen backgrounds. Reading speed was measured using a search task in which observers tried to find one of three words in a short paragraph of black text. There were four background patterns (one plain, three periodic) combined with three colors (gray, blue, yellow) at two intensities. The text contrast had a small positive correlation with speed (r = .43). Background RMS contrast showed a stronger, negative correlation (r = -.78). Text energy in the spatial frequency bands corresponding to lines and letters also showed strong relationships (r = -.80 and r = -.80, respectively). A general procedure for constructing a masking index from an image discrimination model is described and used to generate two example indices: a global masking index, based on a single filter model combining text contrast and background RMS contrast, and a spatial-frequency-selective masking index. These indices did not lead to better correlations than those of the RMS measures alone (r = .79 and .78, respectively), but they should lead to better correlations when there are larger variations in text contrast and masking patterns.

1. INTRODUCTION

The increased use of computer-presented text displays has increased the interest in predicting their readability. Readers may leave displays which are not easy to read, and poor readability can slow processing. While most displays have text presented on plain backgrounds, an increasing number use textured backgrounds, especially on the Internet. In addition to background texture, several other factors have been shown to influence the readability of text displays, e.g., contrast, polarity, foreground and background colors, font style, line spacing, and text and margin widths.¹ Although such information is useful, it does not necessarily allow a designer to predict how a new combination of such variables will influence the readability of a display. Also, text display designers may not be able to assess readability subjectively when making their design choices; correlations between subjective preference ratings and readability measures are consistently near zero.^{2-4}

Although they did not provide metrics to predict the readability of text as a function of display parameters, researchers have investigated letter discrimination and identification as a function of such variables as spatial frequency and noise.^{5, 6} In particular, Solomon and Pelli⁶ measured the effect that selectively low- or high-passed additive noise had on letter identification. They showed that letter identification seemed to depend mainly on the text contrast passed by a mid-frequency band-pass filter (between 1.5 and 6 cycles per letter). These results suggest that only those background textures which contain the critical band-pass frequencies will lead to strong masking. However, there is a difference between the Solomon and Pelli stimuli and webpage text: a webpage background is combined with the text using a multiplicative, rather than additive, combination rule.

This paper uses two approaches to predicting the readability of black text placed on various backgrounds. In the first, several image measures were correlated with reading speed. The second approach correlates reading speed with indices developed from image discrimination models which have been used to predict target detectability in complex backgrounds. A general method for generating indices from models is presented and two indices are derived and tested. The first index is derived from an image discrimination model with global masking. It turns out to be a simple combination of the text contrast and the RMS contrast of the background. The second uses the Cortex Transform^{7, 8} to create an index with spatial-frequency-selective masking.

2. AN EXPERIMENT MEASURING READABILITY

The experiment used a 4 (texture) x 3 (color) x 2 (saturation/lightness) design. Full details of the methods and results are reported in Hill and Scharff¹. The twenty-four conditions were each repeated six times, leading to a total of 144 actual trials.

2.1 Apparatus and Stimuli

Macintosh Power PC 7200/120 computers were used to create and run the experiment. The text portions of the stimuli were created in B/C Power Laboratory (an experiment application), which was also used to present the stimuli and collect the data. The color and saturation of the textured backgrounds were set in Adobe Photoshop. Viewing distance was controlled by a chin rest.

There were three background colors, and within each color, two saturation/lightness settings. The RGB values were as follows: Gray (204, 204, 204), Lt. gray (250, 250, 250), Blue (20, 20, 204), Lt. blue (66, 66, 250), Yellow (204, 204, 20), Lt. yellow (250, 250, 66). Average luminance values (cd/m²) for each color condition were: Gray (65.6), Lt. gray (84.8), Blue (12.3), Lt. blue (25.1), Yellow (60.1), Lt. yellow (80.1), Black (3.0).

The textured backgrounds were taken from a popular web page dedicated to supplying free graphical backgrounds to designers.⁹ We selected three backgrounds with a range of texture sizes that designers might choose to use with black text (i.e. black text was legible). See Figure 1(a-d) for plain and textured background examples. The textures have a period of 72 pixels horizontally and vertically. The final, textured background size was 15.5 cm x 12.7 cm (18.36 x 16.83 deg at a viewing distance of 476 mm). Each textured background was centered at the top of the screen. Heavy black lines on the left and right separated each textured background from the surrounding white background.

(a) Plain (b) Fine (c) Medium (d) Coarse

Figure 1 (a-d) Textured backgrounds used in current experiment.

Black text was placed on top of the textured backgrounds. Other variables were set to maximize readability:^{2, 3} the font was 12 point (6 pixels per letter) Times New Roman, and the text blocks (10.2 cm x 12.7 cm) were centered at the top of the screen, leaving a 2.5 cm margin on either side.

The text excerpts were from a newspaper. A text block to be read contained 99-101 words. A target word ("triangle", "circle", or "square") was placed randomly within each text block. At the bottom of each screen there were three black geometric shapes (circle, square, and triangle) that corresponded to each of the three possible target words. These 1 cm x 1 cm shapes were spaced 3.5 cm apart and centered below the textured area.

2.2 Procedure

Fifty-two participants completed the experiment. The data from thirty-four low-error-rate participants were included in the analyses. All participants except two, the experimenters, were naive to the hypothesis. All participants had self-reported 20/20 or corrected to 20/20 vision.

Participants were instructed to scan the text and find a target shape word ("triangle", "square", or "circle"). Once they found the target word, they clicked (using the mouse pointer) on the corresponding shape at the bottom of the screen. The start of each trial was self-paced, and each trial ended when the participant clicked the target-word shape. Participants were instructed to respond as quickly and accurately as possible.

2.3 Results

The data were sorted by each condition for each participant and the median for each was calculated. The data from participants with an overall accuracy rate of at least 95% were used in the analyses, and of those, only reaction times from correct responses were used.

Results of a 3-way, within-groups ANOVA showed several significant effects and interactions; see Figure 2 for means of all conditions. A Tukey HSD analysis of the main effect of texture (F(3,99) = 3.80, p < .05) indicated that the plain background was read significantly faster than the medium-textured background, although responses were slower for all textured backgrounds than for the plain background.

Color also significantly affected reaction times (F(2,66) = 20.74, p < .05), in that the yellow and gray backgrounds were read significantly faster than the blue background. There was no significant main effect for lightness/saturation.

Figure 2. Three-way interaction, from Hill and Scharff¹

These main effects were modified by interactions. The color x saturation interaction (F(2,66) = 3.50, p < .05) showed that light blue backgrounds were read significantly more slowly than all others except dark blue, and dark blue backgrounds were read significantly more slowly than the light gray and light yellow ones.

The significant three-way interaction (F(6,198) = 3.86, p < .05) indicated that the major effects were due to a significant slowing of search times when using the light-blue, medium-textured background and to relatively fast search times when using the dark-gray, plain-textured background.

As seen in Figure 2, texture did not change search times with yellow (dark or light) backgrounds or light gray backgrounds. There are large effects when using the blue backgrounds. Finally, within the dark-gray background conditions, the plain background was much faster than any of the textured backgrounds. However, the plain-textured dark-gray was not significantly faster than the plain-textured light-gray.

3. PREDICTING READABILITY

As mentioned above, two approaches were used to predict readability of the different stimuli combinations: image measure regressions and indices generated from image discrimination models.

3.1 Image Measure Regressions

The specific image measures used included text contrast, background RMS contrast, and background RMS contrast in four spatial frequency bands that roughly segregate contrast energy corresponding to lines, short and long words, and letters. The text contrast was defined as

C_T = (L_B - L_T) / L_B,

where L_B is the average background luminance and L_T is the luminance of the text. The background RMS contrast was defined as

C_RMS = L_RMS / L_B ,

where

L_RMS = ((Σ_i (L_i - L_B)²) / n)^0.5,

and where the summation is over all pixels, L_i is the luminance of the ith pixel, and n is the number of pixels. Four spatial frequency bands were created using filters with a rectangular spatial frequency response and a uniform orientation response. The spatial frequency range corresponding to identification of letters (between 1.5 and 6 cycles per letter, cpl) was determined from Solomon and Pelli.⁶ The filters selected adjacent octaves, with the high-frequency cutoff for the highest spatial frequency band (letters) equal to the Nyquist limit (0.5 cycles per pixel = 12 cycles/deg = 3 cpl).
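The two contrast definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the background is supplied as a flat list of pixel luminances in cd/m²; the function names `text_contrast` and `rms_contrast` and the two-level example texture are hypothetical and are not taken from the experiment (only the black text luminance of 3.0 cd/m² comes from Section 2.1).

```python
import math

def text_contrast(background_mean, text_luminance):
    """C_T = (L_B - L_T) / L_B."""
    return (background_mean - text_luminance) / background_mean

def rms_contrast(pixels):
    """C_RMS = L_RMS / L_B, where L_RMS is the RMS deviation from the mean L_B."""
    n = len(pixels)
    mean = sum(pixels) / n
    l_rms = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    return l_rms / mean

# Hypothetical two-level texture: half the pixels at 40, half at 60 cd/m^2.
background = [40.0, 60.0] * 8
l_b = sum(background) / len(background)   # mean background luminance L_B
c_t = text_contrast(l_b, 3.0)             # black text at 3.0 cd/m^2
c_rms = rms_contrast(background)
```

A plain background gives C_RMS = 0, so under these definitions only textured backgrounds contribute background contrast.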

Figure 3. Scatter plots of the relationships between reading speed and text contrast (top) and background RMS contrast (bottom).

Average latencies were converted to reading speed estimates in words per second by dividing 50, half the number of words per display, by the latency in seconds. Figure 3 shows reading speed with respect to text contrast and background RMS contrast. The text contrast had a small positive correlation with speed (r = .43); background RMS contrast showed a stronger, negative correlation (r = -.78). Notice that dark blue backgrounds have the lowest text contrasts, but that the three light blue, textured backgrounds have large background contrast variations. This finding provides an explanation for the slow search times for the light blue backgrounds. Text energy in the spatial frequency bands corresponding to lines (0.1875-0.375 cpl) and letters (1.5-3 cpl) also showed strong relationships (r = -.80 and r = -.80, respectively), although they were not significantly better than the spatial frequency bands corresponding to short (0.75-1.5 cpl) and long words (0.375-0.75 cpl) (r = -.72 and r = -.66, respectively).
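The conversion from latency to reading speed and the correlation with an image measure can be sketched as follows. This is only an illustration: the latencies and RMS contrasts below are made-up numbers, not the experimental data, and `reading_speed` and `pearson_r` are hypothetical helper names.

```python
import math

WORDS_SEARCHED = 50  # half of the roughly 100 words per display

def reading_speed(latency_s):
    """Estimated reading speed in words per second."""
    return WORDS_SEARCHED / latency_s

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up condition means, for illustration only.
latencies = [5.0, 6.25, 8.0, 10.0]     # seconds
c_rms = [0.0, 0.05, 0.10, 0.20]        # background RMS contrast
speeds = [reading_speed(t) for t in latencies]
r = pearson_r(c_rms, speeds)           # negative: more contrast, slower reading
```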

3.2 Metrics based on Image Discriminability Models

Image discriminability models have been developed to predict the visibility of the difference between two similar images. They take two images as input, and output a prediction of the number of Just Noticeable Differences (JNDs) between them. The first computational model for two-dimensional images was developed by Watson¹⁰. A major application of these models has been as image quality metrics, an application in which the two images are often an original image and a reconstructed version following image compression, and the model predicts the visibility of the compression artifacts. Here we propose an adaptation of these models to predict readability of text on different backgrounds by regarding the text as an "artifact" whose visibility is masked by the background. We assume that in general, the easier it is to detect the text on the background, the easier it will be to read it.

To simplify our metrics we will base them on linearizable discrimination models.^{11, 12} In these models, one of the two luminance images is considered to be the original or background image, B_L, and the other is the background-with-text image, T_L. Throughout the rest of the paper, bold characters will represent lists of values, and operations upon these lists will indicate the corresponding element-wise operations.

The first step is to convert the two luminance images to contrast images using the mean luminance of the background image L_B.

T_C = (T_L - L_B)/L_B,

B_C = (B_L - L_B)/L_B.

Next, each contrast image is converted into a list of visual features using linear transformations,

T_V = V(T_C),

B_V = V(B_C).

For example in Watson¹⁰ each element of the feature list is the cross-correlation of the contrast image with a Gabor weighting function having a particular position, spatial frequency, orientation, and phase. Then visual features masked by the background are computed using a masking function which has two inputs, the first being the masked features and the second the masking features,

T_M = M(T_V,B_V),

B_M = M(B_V,B_V).

Finally, the differences between the two masked-visual-feature lists are computed and aggregated using a Minkowski distance metric

d = D_M(T_M - B_M),

where

D_M(X) = (Σ_i |x_i|^b)^(1/b).

The sum is over the elements x_i of the list X, and b is referred to as the Minkowski summation exponent.
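As a small sketch of the Minkowski aggregation just defined, assuming the masked-feature differences are available as a plain list of numbers (the name `minkowski_length` and the example values are illustrative):

```python
def minkowski_length(values, b):
    """D_M(X) = (sum_i |x_i|**b) ** (1/b)."""
    return sum(abs(x) ** b for x in values) ** (1.0 / b)

diff = [3.0, -4.0]                 # hypothetical masked-feature differences
d2 = minkowski_length(diff, 2)     # b = 2 gives the Euclidean length
d4 = minkowski_length(diff, 4)     # larger b weights the largest element more

# For a binary list with n ones, the length is n ** (1/b).
ones_length = minkowski_length([1] * 16, 4)
```

As b grows, the result approaches the largest absolute element, so the exponent controls how strongly the aggregate is dominated by the most visible differences.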

In linearizable models, the masking function M is linear in the list of features being masked (T_M - B_M = M(T_V - B_V, B_V)), the visual features are linear (T_V - B_V = V(T_C - B_C)), and the contrast calculation is linear (T_C - B_C = (T_L - B_L)/L_B), so

d = D_M(M(V((T_L - B_L)/L_B), B_V)).

This equation says that for these models, one can define a target luminance image as the difference between the two images and that the visibility can be computed as the Minkowski length of the visual representation of the target masked by the background.

In our application B_L will be the background image and T_L will be the background with the text. If we let T be the text indicator image having the value 0 where there is no text and 1 where there are text pixels, we see that

T_L = T L_T + (1 - T) B_L,

where L_T is the luminance of the black text. The difference image is then

T_L - B_L = (1 - T) B_L + T L_T - B_L = -T (B_L - L_T).

This equation says that the difference image is zero outside the text, and inside the text it is the difference between the text level and the background. The latter image can be regarded as having two parts

B_L - L_T = (B_L -L_B) + (L_B - L_T),

the difference of the background and its mean (change in texture) and the difference of the mean background and the text level (change in luminance). The first component can contribute to detectability (i.e. letters defined by texture can be detected¹³), but for computational simplicity we assume it does not significantly contribute to readability. Therefore, we remove it from the signal component, leaving

d = C_T D_M(M(V(-T), B_V)),

where C_T = (L_B - L_T)/L_B is the contrast of the text based on the background mean luminance as before. This equation says that we can compute a detectability index for the text in a background by computing the detectability of full contrast text in that background and scaling it by the text contrast.
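The text indicator construction and the split of the difference image into a texture part and a luminance part can be checked numerically. The sketch below uses a hypothetical four-pixel background and indicator; the names `composite`, `texture_part`, and `luminance_part` are illustrative.

```python
def composite(indicator, background, text_luminance):
    """T_L = T * L_T + (1 - T) * B_L, element by element."""
    return [t * text_luminance + (1 - t) * b
            for t, b in zip(indicator, background)]

background = [40.0, 60.0, 40.0, 60.0]    # B_L, hypothetical luminances
indicator = [0, 1, 1, 0]                 # T, 1 where a text pixel lies
l_t = 3.0                                # L_T, black text luminance
l_b = sum(background) / len(background)  # L_B, mean background luminance

with_text = composite(indicator, background, l_t)
difference = [tl - bl for tl, bl in zip(with_text, background)]

# The two parts of -T (B_L - L_T): texture change and luminance change.
texture_part = [-t * (b - l_b) for t, b in zip(indicator, background)]
luminance_part = [-t * (l_b - l_t) for t in indicator]
```

Only `luminance_part` is kept in the signal component, which is why the index reduces to the detectability of full-contrast text scaled by C_T.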

The detectability index depends strongly on the size of the text sample used. We form a readability index that has the dimensions of contrast by dividing this index by the Minkowski length of the unmasked, full contrast text. This equivalent masked contrast is the contrast for the unmasked text (masking features B_V = 0) that would give the same detectability. It is given by

C_M = C_T D_M(M(V(-T), B_V)) / D_M(M(V(-T), 0)).

A final simplification we shall make in our readability indices based on image discrimination models is to assume a flat contrast sensitivity function. We assume that the reader is sitting close enough that the frequencies relevant to reading the text are in the optimal visual range (about 6 cpd) or lower.

3.3 A Global Masking Index

A single filter, image discrimination model with global RMS contrast masking generates an index combining text contrast and background RMS contrast. This model has been used to predict the detectability of targets in natural and noisy backgrounds.^{11, 12, 14} The masking function in this model is the same for all the features, so it essentially assumes that the masking contrast energy is uniform over the target region and similar to the target in spatial frequency. The visual feature list for this model is the contrast image filtered by a contrast sensitivity filter. Since we are dropping that filter here, the visual feature list is the contrast image,

T_V = T_C,

B_V = B_C.

The text masking function for this model is

M(-T, B_C) = -s T / (1 + (C_RMS / C₂)²)^0.5,

where C_RMS is the background RMS contrast as above, C₂ is the contrast masking threshold, whose value (0.05) was determined in the work referenced above, and s is a contrast sensitivity parameter.

For our binary text case, the discriminability index turns out to be

d = s n_T^(1/b) C_T / (1 + (C_RMS / C₂)²)^0.5,

where n_T is the number of text pixels, so that n_T^(1/b) is the Minkowski length of the full-contrast text image. Our readability index eliminates the size of the text target and the contrast sensitivity, giving the effective luminance contrast C_M of the masked text as

C_M = C_T / (1 + (C_RMS / C₂)²)^0.5.

The correlation for this index (r = .79) was not significantly stronger than that of C_RMS alone (r = -.78), but the index should lead to better predictions when there are larger variations in text contrast.
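The global masking index is simple enough to compute directly from the two image measures. A minimal sketch, assuming C_T and C_RMS have already been measured; the function name and the example contrasts are hypothetical, while the threshold C₂ = 0.05 is the value quoted above.

```python
import math

C2_GLOBAL = 0.05  # contrast masking threshold quoted in the text

def global_masking_index(c_t, c_rms, c2=C2_GLOBAL):
    """C_M = C_T / sqrt(1 + (C_RMS / C2)**2)."""
    return c_t / math.sqrt(1.0 + (c_rms / c2) ** 2)

# Hypothetical conditions: same text contrast, increasing background contrast.
plain = global_masking_index(0.95, 0.0)       # no masking: C_M equals C_T
textured = global_masking_index(0.95, 0.05)   # C_RMS = C2: C_M = C_T / sqrt(2)
busy = global_masking_index(0.95, 0.20)       # stronger masking, lower C_M
```

Because the index combines both measures in one number, it allows the trade-off between text contrast and background contrast to be expressed, which separate regressions on each measure do not.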

3.4 A Frequency-Selective Masking Index

To predict the effect of background masking when the spatial frequency content of the background varies, a spatial-frequency-selective masking model can be used to compute the readability index C_M. To test this concept on our data we used the Cortex Transform model used by Rohaly, Ahumada, and Watson.¹² In this model the visual feature list V is formed by using the Cortex Transform. In our case this list had 20 images of simulated cortical units (5 octave spatial frequency bands and 4 orientation bands) and a high frequency and a low frequency residue. The transform was subsampled for spatial frequency but not for orientation, giving a visual feature list 6.3 x 128 x 128 elements long (our 72 x 72 text and background images were padded with zeros to obtain an image size that was a power of two). The masking for this model is limited to within-feature masking. The masking function is given by

M(V(-T), B_V) = -s V(T) / (1 + (|B_V| / C₂)²)^0.5,

where | | indicates the list of the absolute values of the individual elements. In this case, to be consistent with the earlier work, we set C₂ = 0.07. The equivalent masked threshold contrast was then computed using a Minkowski summation exponent of b = 4.

The correlation for this index (r = .78) was essentially the same as for the global masking model index, but it should lead to better predictions when the background contrast variations vary in their spatial position and spatial frequency coincidence with those of the text.
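The within-feature masking computation can be sketched independently of the Cortex Transform itself. The code below assumes the feature lists V(T) and B_V have already been produced by some linear transform; the transform is not implemented here, and the short feature lists are invented to show the effect of coincidence between text and background energy. The sensitivity s and the sign cancel in the ratio, so they are omitted.

```python
import math

def minkowski_length(values, b):
    """D_M(X) = (sum_i |x_i|**b) ** (1/b)."""
    return sum(abs(x) ** b for x in values) ** (1.0 / b)

def selective_masking_index(c_t, text_features, background_features,
                            c2=0.07, b=4):
    """Equivalent masked contrast C_M with within-feature masking.

    text_features stands for V(T) and background_features for B_V; both are
    assumed to come from the same linear transform.
    """
    masked = [v / math.sqrt(1.0 + (abs(m) / c2) ** 2)
              for v, m in zip(text_features, background_features)]
    return c_t * minkowski_length(masked, b) / minkowski_length(text_features, b)

text_v = [1.0, 0.0, 0.5, 0.0]     # hypothetical feature responses to the text
off_band = [0.0, 0.3, 0.0, 0.3]   # background energy where the text has none
on_band = [0.3, 0.0, 0.3, 0.0]    # same energy coinciding with the text

c_m_off = selective_masking_index(0.9, text_v, off_band)
c_m_on = selective_masking_index(0.9, text_v, on_band)
```

In this toy case the two backgrounds have identical total energy, yet only the one whose features coincide with those of the text lowers C_M, which is the situation in which a frequency-selective index is expected to outperform the global one.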

4. DISCUSSION

Each of the above approaches to predicting readability led to similar correlations with reading speed. Thus, if someone wanted to determine the cost-benefit of texture and contrast choices, we would currently recommend using the simpler, global masking index. The image measure regressions indicate that text contrast alone is a poor predictor of reading speed, and that RMS contrast energy in the background better predicts reading speed. However, the image-measures-regression approach does not allow trade-off calculations between the two measures. The results are similar to those of Rohaly et al.^{11, 12} in that the more complex index did not significantly increase the predictability.

When we determined the contrast energy in spatial frequency regions roughly corresponding to letters (based on Solomon and Pelli ⁶ ), small and large words, and lines, we found that the different textures did not predominantly contain energy in any of these bands. So, although the correlations with reading speed were slightly higher for the spatial frequencies corresponding to letters and lines, no firm conclusions can be made at this time about the relative contributions of the different bands.

Each of the above approaches was applied post hoc to data collected using stimuli that were not optimally created to differentiate between the approaches. It is likely that if there had been larger variations both in background contrast and text contrast, then the global masking index would have led to higher correlations with reading speed than the regressions using single image measures. Further, if the background variations had included larger variations within different spatial frequency bands, then the frequency-selective masking index should show the best correlations with reading speed.

The equations at the beginning of the section on model-indices provide a method of using models other than the two considered above. Models can be chosen allowing for the trade-off between accuracy and model complexity. For example, if backgrounds are being considered that vary in their spatial coverage, but not in their spatial frequency content, a single filter model with local spatial masking might be appropriate.¹⁵ If variations in spatial frequency content are an issue, but computational complexity is an important consideration, Watson's image discrimination model based on the Discrete Cosine Transform (DCT) could be the best choice.^{16, 17} If accuracy is of major importance and the computational advantages of linearization are not important, a full, nonlinear masking model might be the best choice.¹⁸ An additional complexity many discrimination models include is discriminability from color variations.^{19, 20} However, we think that the reading task is less dependent upon color variations than the typical discrimination task because rapid scanning induces higher temporal frequencies.

ACKNOWLEDGEMENTS

This work was supported in part by NASA RTOP 548-50-12.

REFERENCES

1. A. L. Hill and L. F. V. Scharff, "Readability of computer displays as a function of colour, saturation, and background texture," Proceedings for the Second International Conference for Engineering Psychology and Cognitive Ergonomics, (in press).

2. A. Hill and L. V. Scharff, "Readability of screen displays with various foreground/background color combinations, font styles, and font types," Proceedings of the Eleventh National Conference on Undergraduate Research, Vol. II, pp. 742-746, 1997.

3. M. Youngman and L. V. Scharff, "Text width and border space influences on readability of GUIs," Proceedings of the Twelfth National Conference on Undergraduate Research, Vol. II, pp. 786-789, 1998.

4. B. Parker and L. V. Scharff, "Influences of contrast sensitivity on text readability in the context of a GUI," http://hubel.sfasu.edu/research/agecontrast.html, 1997.

5. D. H. Parish and G. Sperling, "Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination," Vision Research 31, pp. 1399-1415, 1991.

6. J. A. Solomon and D. G. Pelli, "The visual filter mediating letter identification," Nature 369, pp. 395-397, 1994.

7. A. B. Watson, "The Cortex Transform: rapid computation of simulated neural images," Computer vision, graphics, and image processing 39, pp. 311-327, 1987.

8. A. B. Watson, "Efficiency of an image code based on human vision," J. Opt. Soc. Amer. A 4, pp. 2401-2417, 1987.

9. G. Schorno, "Texture tiles," http://mars.ark.com/~gschorno/tiles/index.html, 1996.

10. A. B. Watson, "Detection and recognition of simple spatial forms," in O. J. Braddick and A. C. Sleigh, eds., Physical and Biological Processing of Images, pp. 100-114, Berlin: Springer-Verlag, 1983.

11. A. J. Ahumada, Jr., A. M. Rohaly, and A. B. Watson, "Models of human image discrimination predict object detection in natural backgrounds," in B. Rogowitz and J. Allebach, eds., Human Vision, Visual Processing, and Digital Display IV, SPIE Proc. 2411, pp. 355-362, 1995.

12. A. M. Rohaly, A. J. Ahumada, Jr., and A. B. Watson, "Object detection in natural backgrounds predicted by discrimination performance and models," Vision Research 37, pp. 3225-3235, 1997.

13. D. Regan and X. H. Hong, "Recognition and detection of texture-defined letters," Vision Research 34, pp. 2403-2407, 1994.

14. B. L. Beard and A. J. Ahumada, Jr., "Image discrimination models predict detection in fixed but not random noise," J. Opt. Soc. Amer. A 14, pp. 2471-2476, 1997.

15. A. J. Ahumada, Jr. and B. L. Beard, "A simple vision model for inhomogeneous image quality assessment," J. Morreale, ed., SID Digest 29 (Society for Information Display: Santa Ana, CA) Paper 40.1, 1998.

16. A. B. Watson, "DCTune: A technique for visual optimization of DCT quantization matrices for individual images," J. Morreale, ed., SID Digest 24 (Society for Information Display: Santa Ana, CA) pp. 946-949, 1993.

17. A. B. Watson, "DCT quantization matrices visually optimized for individual images," B. Rogowitz and J. Allebach, eds., Human Vision, Visual Processing, and Digital Display IV, SPIE Proc. 1913, (SPIE: Bellingham, WA) pp. 202-216, 1993.

18. A. B. Watson and J. A. Solomon, "A model of visual contrast gain control and pattern masking," J. Opt. Soc. Amer. A 14, pp. 2379-2391, 1997.

19. X. Zhang and B. A. Wandell, "A Spatial Extension of CIELAB for Digital Color Image Representation," J. Morreale, ed., SID Digest 27 (Society for Information Display: Santa Ana, CA) pp. 731-734, 1996.

20. X. Zhang, J. Farrell, and B. A. Wandell, "Application of S-CIELAB: A spatial extension to CIELAB," V. R. Algazi, S. Ono, and A. G. Tescher, eds., Very High Resolution and Quality Imaging II, SPIE Proc. 3025, (SPIE: Bellingham, WA) Paper 17, 1997.