Effects of Rhythmic Sound Rates on a Visual Counting Task

Kristina Davis

Stephen F. Austin State University

Faculty Advisor, Lauren F. V. Scharff, Ph.D.

This paper was published in the Psi Chi Journal of Undergraduate Research


Interactions between visual and auditory sense modalities were examined in this experiment. Eighty participants were tested on a visual counting task accompanied by an auditory stimulus, which was either slow (1 beat/sec), medium (3 beats/sec), fast (5 beats/sec), or white noise (static). Attention to the auditory stimulus was manipulated to determine its effects on the expected cross-modal interactions. Displays of both asymmetrical and symmetrical configurations of 13 and 15 dots were arranged in both near (1.27 cm) and far (5.08 cm) proximities. Cross-modal effects were found for the medium sound rate, such that response times were longer when participants attended to the auditory stimuli. Interactions between saccadic eye movements, auditory stimuli, and display configurations are thought to explain this effect.



Sensory information converges in the brain to create a complete perception of the environment. Due to human reliance on vision, the visual system has been studied in detail. However, the precise processes used by the brain to integrate visual input with input from the other senses are not well understood. Studying simple visual and auditory tasks and the ways in which they interact can clarify the perceptual processes involved in cross-modal tasks.

Counting is a basic process that usually relies on visual input, although information from other senses can also be used for counting. Common counting activities (e.g., counting money) most often occur in an environment where information from multiple senses is present. The influences that extraneous sensory information exerts on the visual counting process may help us to understand more complex cross-modal processing.

Quantification of objects has been proposed to occur through different mental activities, which often comprise multiple cognitive steps. Lechelt (1971) showed that quantifying up to seven objects is almost automatic; he refers to this process as "subitizing". More specifically, subitizing is defined as the immediate recognition of quantity, requiring no purposeful serial counting, for seven or fewer objects. Lechelt distinguished subitizing from two other types of mental activities that are used in determining number: estimating and counting. Estimating is multiple subitizing and adding to reach an approximate quantity. Counting is the serial process of determining precise quantities.

Research on the cognitive processes involved in serial counting has suggested that there are multiple steps in the counting process. According to Lassaline and Logan (1993), any act of quantification requires spatial indexing, mapping (to known numbers or quantities), and responding. Towse and Hitch (1997) propose that the counting process uses both visual partitioning and verbal labeling of objects, processes that are comparable to those described by Lassaline and Logan. Towse and Hitch assert that problems in counting (e.g., an increase in errors or slow response times) may be due to the simultaneous performance of different cognitive processes.

In addition to the previously mentioned cognitive processes involved in counting, memory also appears to play an important role in the counting process. Lassaline and Logan (1993) determined that memory is necessary for counting to take place because it plays a role in the three processes of indexing, mapping, and responding. Towse and Hitch (1997) assert that the role of memory in counting is to ensure that each object is counted only once. Inability to remember which items have been counted will hinder the process of accurate counting.

Other studies of counting have examined the more physical aspects of counting by studying physiological functions and behaviors linked with the visual system that might influence counting. Kowler and Steinman (1977) studied the use of saccades (rapid involuntary eye movements used in fixation) in visual counting tasks. They found that saccades aided the counting processes only when dissimilar objects were organized into distinct groups. However, when objects were in close proximity and highly similar, saccades hindered the counting process because they interfered with the kinesthetic methods of distinguishing between the counted objects and the objects that needed to be counted. For objects placed far apart, the movement of the eyes offered a consciously detectable muscular sensation, which aided the counting process by allowing one to count the eye movements that occur. Consequently, Kowler and Steinman were able to show that the interaction between saccades and the spatial arrangements of stimuli is influential in the counting process.

In addition to object proximity, item arrangement can affect the counting process. For example, Noro (1980) studied the counting of black dots on a white screen using both symmetrical and randomly arranged patterns. The relationship between number of dots and response time was examined. For random patterns, a linear relationship occurred; increases in counting time directly related to increases in number of dots for patterns containing four or more dots. Symmetrical arrangements did not exhibit the same linear relationship between number of dots and counting time. Noro's study also found that the number of saccades used for counting symmetrically arranged patterns was more variable than the number of saccades used for counting randomly arranged patterns. This suggests that physiological processes (eye movements) used in counting can be directly affected by the layout of dots in a display.

The role of cognitive and physiological processes in visual counting raises questions concerning the influence of other sense modalities on the visual counting process. Although no studies have specifically investigated cross-modal interactions with counting, much research has been dedicated to studying cross-modal interactions among the five senses. For example, researchers have investigated interactions between vision and touch, the influence of color on taste and smell, and the interactions between bright tones and colored words (Goldstein, 1999). More relevant to the current study is the research that has been done concerning cross-modal interactions between vision and hearing. Bolia, D'Angelo, and McKinley (1999) found that when directional auditory and visual stimuli were congruent, the auditory stimuli enhanced visual processing by decreasing three-dimensional search times. Similarly, Lewis (1971) studied interference effects between auditory and visual stimuli in a visual recall task and found that directionally congruent auditory and visual stimuli enhanced recall, while incongruent auditory distracters interfered with recall of the visual stimuli. Other studies have shown that visual and auditory stimuli may provide differing influences in cross-modal tasks. For instance, an asymmetrical interactive relationship was found between auditory and visual stimuli for a categorization task (Ben-Artzi and Marks, 1995; Melara and O'Brien, 1987). More specifically, their visual stimuli were found to influence auditory perception more strongly than auditory stimuli influenced visual perception. However, neither Ben-Artzi and Marks nor Melara and O'Brien provided an auditory reference for the perception tasks, and this omission made the pitch placement (high or low) of the auditory stimuli more ambiguous relative to the physical/spatial placement (high or low) of the visual stimulus, which always had the monitor screen edges as a visual reference.

In contrast, Wilkerson and Scharff (2000) implemented an auditory as well as a visual reference for the cross-modal perception tasks. With a mid-pitch auditory tone reference prior to each trial, less asymmetry occurred between the classification of visual and auditory tasks, although there were still cross-modal interactions. These studies exemplify the interactive relationship of auditory and visual information and their influence on perception.

The current study was designed to determine how three rates of rhythmic sound (i.e., slow, medium, or fast) and white noise (i.e., static) would affect counting of objects that are placed in near or far proximity for both symmetrical and asymmetrical arrangements. Previous research has independently studied both proximity (Kowler & Steinman, 1977) and arrangement (Noro, 1980) but not the possible interactions obtained between the two types of configurations. Further, no known previous research has investigated cross-modal influences on counting tasks.

Based on the assumption that a certain internal rhythm is inherent in the serial counting process, I expected more interference by rhythmic auditory sounds when compared to the interference created by white noise distraction. Furthermore, the rhythm rate was expected to interact with proximity and arrangement due to the incongruity between internal counting speed (determined by spatial arrangement and proximity) and the external influence on counting speed (provided by the sound rates). Specifically, the external rhythm was expected to create interference when display arrangements predisposed an opposing internal counting rhythm. For example, a slow rhythmic sound was expected to increase response times for counting when dots were placed close together (for which eye movements were expected to occur more rapidly). Conversely, the fast rhythmic sound was expected to decrease response times for close dots, but cause increased response times when dots were far apart due to the interference between the expected slow eye movements and the fast rhythm.

Finally, attention to the auditory stimulus was systematically manipulated in order to determine if attending to the auditory stimulus might increase the cross-modal interference effects. In the attention conditions, possible interference was created by the necessity of simultaneously performing two tasks using different sense modalities. When attention to the auditory stimuli was not explicitly instructed, interference effects were expected to decrease due to the ability to selectively attend to visual information (i.e., participants might be able to "tune out" the auditory stimulus).




One hundred and two participants were tested in this study. Participants were obtained through sign-up sheets in the Psychology department and received course credit for participating. Each participant gave informed consent, and each was given debriefing forms upon completion of the session.


This study was a 4 sound rate (white, fast, medium, slow) x 2 attention instructions (attend, no attend) x 2 proximity (near, far) x 2 arrangement (symmetrical, asymmetrical) mixed factorial design. The within variables were proximity and arrangement, so that all participants were tested using both the near and far proximities in both the asymmetrical and symmetrical arrangements.

The between variables were sound rate and instructions. Thus, each participant was tested with only one type of auditory stimulus, and the two groups for each type of auditory stimulus received different sets of instruction regarding attention to the auditory stimuli. Half of the participants in each sound condition were not given specific instructions regarding the sound, and the other half were told to attend to the sound by listening for changes in volume or rhythm during the session. No actual sound changes occurred for any of the groups.
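The structure of the mixed factorial design can be made concrete with a short enumeration. The sketch below is illustrative only: the variable names are hypothetical, and the figure of 10 participants per between-subjects group is taken from the analysis reported later in the paper.

```python
from itertools import product

# Between-subjects factors: each participant experiences one combination.
SOUND_RATES = ["white", "fast", "medium", "slow"]
ATTENTION = ["attend", "no attend"]

# Within-subjects factors: every participant sees all combinations.
PROXIMITY = ["near", "far"]
ARRANGEMENT = ["symmetrical", "asymmetrical"]

# Taken from the reported analysis (10 participants per sound group
# for each instruction type).
PARTICIPANTS_PER_GROUP = 10

between_groups = list(product(SOUND_RATES, ATTENTION))
within_cells = list(product(PROXIMITY, ARRANGEMENT))
total_participants = len(between_groups) * PARTICIPANTS_PER_GROUP

print(f"{len(between_groups)} between-subjects groups")
print(f"{len(within_cells)} within-subjects display types per participant")
print(f"{total_participants} participants in the analyzed sample")
```

Under these assumptions the enumeration yields 8 between-subjects groups and 4 within-subjects display types, which is consistent with the 80 participants whose data were analyzed.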

The display order for each participant was randomized by the presentation software (BC Powerlab for Macintosh). In addition, specific key press instructions were used to counterbalance for handedness. Half the participants in each group were instructed to press the P button when they counted 13 dots and the Q button when they counted 15 dots. The other participants were instructed to do the opposite by pressing P for 15 and Q for 13.

Apparatus and Stimulus

The displays consisted of either 13 or 15 black dots that were vertical ovals with an approximate length of 1.27 cm. Quantities of 13 and 15 were chosen to prevent subitizing (which occurs when counting quantities of seven or lower). All displays were created using the drawing function in BC Powerlab software for Macintosh.

The dot proximity (edge-to-edge) was either "near" (1.27 cm) or "far" (5.08 cm), and dots were displayed in both asymmetrical and symmetrical arrangements. The dots were placed on a white background that covered the entire viewing area on an Apple Color Plus 14" monitor. Figure 1 shows samples of experimental displays in the near and far proximities, the asymmetrical and symmetrical arrangements, and the 13 and 15 dot configurations. For each of the four combinations of proximity and arrangement, there were 10 trials for both the 13- and 15-dot displays. There were also 3 practice trials, for a total of 83 trials per participant.
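The trial count can be checked by building the trial list explicitly. This is a minimal sketch, not the BC Powerlab procedure: the function name, the dictionary keys, and the use of Python's shuffle to stand in for the software's randomized display order are all assumptions made for illustration. The configuration of the practice trials is not described, so they are only counted.

```python
import random
from itertools import product

TRIALS_PER_CELL = 10   # per proximity x arrangement x dot-count combination
PRACTICE_TRIALS = 3    # configuration not specified in the paper

def build_trials(seed=None):
    """Return a shuffled list of experimental trials for one participant."""
    rng = random.Random(seed)
    cells = product(["near", "far"], ["asymmetrical", "symmetrical"], [13, 15])
    trials = [
        {"proximity": proximity, "arrangement": arrangement, "dots": dots}
        for proximity, arrangement, dots in cells
        for _ in range(TRIALS_PER_CELL)
    ]
    rng.shuffle(trials)  # stands in for the software's randomized order
    return trials

trials = build_trials(seed=0)
total_trials = PRACTICE_TRIALS + len(trials)
print(len(trials), "experimental trials;", total_trials, "trials in total")
```

With four display types, two dot counts, and 10 trials each, the list holds 80 experimental trials, and adding the 3 practice trials gives the 83 trials reported.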

Figure 1. Reduced size sample displays as presented individually on an Apple Color Plus 14" monitor. Actual stimuli were enlarged when presented in the experiment. a) Far proximity, asymmetrical arrangement, 13 dots; b) Near proximity, asymmetrical arrangement, 15 dots; c) Near proximity, symmetrical arrangement, 13 dots; d) Far proximity, symmetrical arrangement, 15 dots.

The rhythmic sound rates were generated using a Lowery Synthesizer model V-120 (Victor Company, Tokyo) placed at the front of the room. The rhythmic sound consisted of an alternating drumbeat with cymbal, which was the most distinct rhythmic sound available on the synthesizer. Identical sounds were used for all rhythm conditions, with only the speed of beats changed to create slow, medium, and fast rhythms. In the slow rhythm condition, the sound rate was set at one beat per second and was audible at a range of 44 dB SPL to 52 dB SPL, corresponding to the back and front of the testing room. The medium sound rate was three beats per second with a range of 45 dB SPL to 52 dB SPL, and the fast sound rate was five beats per second, ranging from 46 dB SPL to 54 dB SPL. The white noise (static) was generated with an off-frequency radio setting and had a loudness range of 58 dBB to 72 dBB.


There were 10 participants per condition, run in groups of two to 15 participants depending on availability. Each student was seated at a computer, was given key press instructions, and then signed informed consent forms. For the groups requiring attention to the auditory stimuli, participants were instructed to listen for changes in the volume or rhythm of the auditory stimulus during the session. The other groups were told only that a noise would be playing as they performed the counting task. All participants were instructed to use only a visual counting method and were explicitly instructed not to use kinesthetic behaviors (i.e., movements of their hands or the mouse) to aid them in the counting process. Other than the difference in attention instructions and the noise conditions, all participants performed the same task and received identical instructions.

The auditory stimulus was started and participants began the test by reading the instructions on the computer screen. Participants were instructed to count the dots on the screen as quickly and accurately as possible and report their answers according to the key press instructions they were given.

The first three trials were practice trials. After each of the practice trials, the correct answer was displayed. A blank screen followed each display (including the practice trials), which provided an opportunity for participants to ask questions between trials and allowed for self-paced testing by the participants. All stimulus displays were exhibited to the participant until a key press was made, then response time was recorded. No response times were recorded for the practice trials. Average total testing time was approximately 12 minutes.

After completing the visual counting task, the groups that were instructed to attend to the sound were asked to complete a written report describing any changes that they detected in the sound during the session. For both volume and rhythm rate, participants were asked to report the detection of changes and the number of changes that may have occurred during the session.


Of the 102 participants tested, data from 80 participants were used in the analysis. Data from 16 participants were eliminated due to high error rates (over 10% error), and an additional 6 participants were randomly eliminated to leave 10 participants per sound group for each instruction type. Response times were sorted by proximity (near and far), arrangement (asymmetrical and symmetrical), and number of dots (13 and 15) for each participant, and the median of each condition was determined; medians were used to reduce the influence of outliers within individual conditions for each participant. Then the medians within each proximity and arrangement for the 13-dot and 15-dot conditions were averaged together for each participant. These means were analyzed using a 4 (sound type) x 2 (attention instructions) x 2 (proximity) x 2 (arrangement) mixed Analysis of Variance.
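The data reduction described above (a median per dot count, then the mean of the two medians) can be sketched as follows. The function name and the response times are hypothetical and serve only to show how a median limits the influence of a single slow trial; they are not data from the study.

```python
from statistics import mean, median

def condition_score(rts_13, rts_15):
    """Median response time for each dot count, then the mean of the two medians."""
    return mean([median(rts_13), median(rts_15)])

# Hypothetical response times (in seconds) for one participant in one
# proximity x arrangement cell. The 12.0 s trial is an invented outlier.
rts_13 = [5.2, 5.6, 5.4, 12.0, 5.5]
rts_15 = [6.1, 6.3, 6.0, 6.2, 6.4]

score = condition_score(rts_13, rts_15)
print(f"score for this cell: {score:.2f} s")

# Participant bookkeeping reported in the text.
tested, high_error, randomly_removed = 102, 16, 6
analyzed = tested - high_error - randomly_removed
print(f"participants analyzed: {analyzed}")
```

In this made-up example the 13-dot median is 5.5 s even though one trial took 12.0 s, so the cell score reflects the typical trials rather than the outlier.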

Two significant main effects and several interactions were found. Near proximity (1.27 cm) configurations (M = 6.1 s) required significantly more time to count than the far (5.08 cm) configurations (M = 5.9 s), F (1, 72) = 4.97, p < .05. Instructions to attend to the auditory stimuli (M = 6.4 s) significantly increased response times as compared to conditions where attention was not specified (M = 5.6 s), F (1, 72) = 7.12, p < .01.

The main effects must be interpreted in light of the interactions. A significant interaction occurred between sound rate and attention, F (3, 72) = 2.83, p < .05. Tukey's HSD post hoc analysis revealed that response times increased for the medium level sound regardless of configuration when participants were instructed to attend to the auditory stimuli, but attention led to no significant differences for the other sound rates (See Figure 2). This is contrary to the hypothesis that attention would cause increased interference for all sound types. An interaction between arrangement and proximity was also obtained, F (1, 72) = 4.93, p < .05. Response times for the asymmetrical arrangement were more affected by dot proximity than were the response times for the symmetrical arrangement. Specifically, Tukey's HSD post hoc analysis showed that for the asymmetrical arrangement, near-proximity response times were significantly longer than the far-proximity response times for that arrangement, whereas proximity did not affect the response times for the symmetrical arrangement. This makes sense in light of the possibility of saccadic interference that was found for similar arrangements by Kowler and Steinman (1977).

Figure 2. Mean response times (in seconds) and standard error bars for each sound type (fast, medium, slow, white) in both the attending and non-attending conditions. While the response times in the attention conditions were always longer, this difference was significant only for the medium sound rate.

Finally, a significant four-way interaction was also obtained, F (3, 72) = 2.83, p < .05. Figure 3 shows a plot of means for each condition, and the standard errors for each are presented in Table 1. Response times were longer for the medium sound rate when participants were instructed to attend to the sound. Tukey's HSD post hoc analysis showed that this result is exaggerated within the near-proximity, asymmetrical-arrangement conditions and the far-proximity, symmetrical-arrangement conditions. When attention to the auditory stimulus was not specified, the medium sound rate was significantly associated with the shortest response times of all the sound types. In addition, there was a tendency for the variance for medium sound rate conditions without attention instructions to be smaller than in all other conditions. As seen in Figure 3, attention to the auditory stimuli for the slow and fast sound rates appears to produce slight increases in response times; however, these increases were not significant. In general, the white noise conditions were least affected by attention to the auditory stimuli.

Figure 3. Graph of the mean response times (in sec) for each sound type for each configuration type in both the attending and non-attending conditions. For each configuration condition, the medium sound rate conditions showed significantly longer counting times when attention to the auditory stimulus was instructed. In contrast, not attending showed significant decreases in counting time for the medium sound rate. (See Table 1 for SE values of each condition.)

Table 1: Standard error of the means for each sound type in each display type for both attending and non-attending conditions.


No Attention

Sound type: White Noise (static); Fast (5 b/s); Med (3 b/s); Slow (1 b/s)

Of the groups instructed to attend to the sound, all but two participants reported change(s) in either rhythm or volume despite the fact that no actual changes occurred.


The results of this study indicate that a cross-modal interaction occurred between the auditory and visual stimuli, although much of the effect was dependent on attention to the auditory stimulus. Further, the dot configurations of the displays created both individual effects on counting as well as interactive effects between the different configuration types.

In support of past research (Kowler & Steinman, 1977), the near proximity configurations produced longer counting times for all conditions. Further inspection of the interaction between proximity and arrangement suggests that this proximity effect is driven by the large difference between response times for the near and far proximities in the asymmetrical but not the symmetrical arrangement. An arrangement effect was also found by Noro (1980); however, in his study, the dot proximity within the arrangements was not controlled. The results of the present study show that proximity had an influence on counting time, and the asymmetrical arrangements were more strongly affected by the proximity of the objects to be counted.

More specifically, the asymmetrical arrangement in the near proximity condition produced the longest counting times. This result was most likely due to saccadic interference that is known to occur in close proximity arrangements of similar objects (Kowler & Steinman, 1977) coupled with greater difficulty counting asymmetrical configurations when the number of dots is greater than four (Noro, 1980). However, saccadic interference cannot be firmly established in the current study because saccades were not measured.

Based on the studies of Kowler and Steinman (1977) and Noro (1980), the shortest response times should have been for the symmetrical arrangement in the far proximity. Contrary to these expectations, however, the shortest response times were seen for the asymmetrical arrangements in the far proximity configuration. This effect could be due to counting strategy. Some participants commented that counting the symmetrical patterns was easier when a grouping strategy was implemented (e.g., the dots are grouped and then added), whereas counting the asymmetrical configurations was not aided by this strategy. In Noro's study, the asymmetrical configuration showed an increase in counting times as the number of dots increased, but the same trend was not seen in the symmetrical arrangements. This result, too, could be a product of differing counting strategies used for symmetrical versus asymmetrical arrangements. The group-and-add strategy may be more closely related to Lechelt's (1971) estimating process than to serial counting. Perhaps the grouping and estimating strategy is actually less time efficient than the serial counting strategy.

Finally, attention to the auditory stimuli produced a systematic increase in counting time for all conditions. Because the groups who received no instruction to attend did not need to retain any information about the auditory stimuli, they could easily ignore the noise in much the same way that a person would ignore the radio while reading. The groups that were told to attend to the sound had to divide their attention between listening and visual counting. Thus, it is not surprising that the necessity of simultaneously performing two tasks hindered one or both processes. Towse and Hitch (1997) also found that performing multiple cognitive tasks created detriments in at least one of the tasks.

On inspection of the interaction between sound rate and attention, significant attention effects were seen for the medium sound rate only. The medium sound rate conditions led to the shortest response times when no attention was given to the auditory stimuli, but they showed the longest response times for all conditions in which the auditory stimuli were attended. This effect may be explained by the timing of the saccadic eye movements as the eyes move from item to item within a display. Saccades themselves last 20 to 100 ms, whereas the fixations between them require approximately 200 ms (Matlin & Foley, 1997). The physiological constraints of these saccadic movements allow for approximately three to four fixations within 1 sec. Thus, for the fast (5 beats per sec) and white noise (static) conditions, the sound rates were inconsistent with any naturally occurring saccadic eye movements. If participants had voluntarily matched the slow rate (1 beat per sec), they would have slowed their counting, which was contrary to the testing instructions. However, the medium sound rate (3 beats per sec) approximately corresponded to the number of saccadic movements per second in a scanning task (i.e., 3 to 4 per sec).
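The estimate of three to four fixations per second follows from the durations cited above. The sketch below works through that arithmetic under one simplifying assumption that is not stated in the paper: each scanning cycle is treated as one saccade followed by one fixation, with the two durations simply added. The names are hypothetical.

```python
FIXATION_MS = 200             # approximate fixation duration cited in the text
SACCADE_MS_RANGE = (20, 100)  # saccade duration range cited in the text
BEAT_RATES = {"slow": 1, "medium": 3, "fast": 5}  # beats per second

def whole_cycles_per_second(saccade_ms, fixation_ms=FIXATION_MS):
    """Complete saccade-plus-fixation cycles that fit within 1000 ms."""
    return 1000 // (saccade_ms + fixation_ms)

# Long saccades give the fewest cycles, short saccades the most.
fewest = whole_cycles_per_second(max(SACCADE_MS_RANGE))
most = whole_cycles_per_second(min(SACCADE_MS_RANGE))

# Which beat rates fall inside the estimated fixation rate range?
matches = {name: fewest <= rate <= most for name, rate in BEAT_RATES.items()}

print(f"fixations per second: {fewest} to {most}")
print(matches)
```

Under this assumption, only the medium rate of 3 beats per second falls inside the estimated range of 3 to 4 fixations per second; the slow and fast rates fall outside it.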

For the medium sound rate conditions in which no attention was instructed, the response times may have reached an optimal speed because the sound rate and the rate of saccadic eye movements were able to become synchronized. The neural mechanisms that control the rate of saccades may have been influenced by the external rhythm (sound rate), thereby contributing to the synchronicity of the eye movements and the sound rate. Alternatively, during the medium sound rate conditions where attention was instructed, there was a need to perform two different tasks in which the stimuli induced behaviors that occurred at similar rhythms or frequencies. This requirement may have caused increased counting times due to greater interference between the tasks. Because both tasks required the same rhythm of behavior, attending to both auditory and visual stimuli may have required more effort to keep the two tasks separated.

In conclusion, the sound rates did not uniformly interfere with visual counting. Although the medium sound rate did produce some of the expected interference effects, the effects were influenced by attention and display configuration. The effect for the medium sound rate, which is presumably created by the physiological correlation between eye movements and attention to the auditory stimulus, suggests neural interactions between high level processes (attention and counting) and low level processes (eye movements).

Other high-level processes are likely to influence the perceptual processing for the tasks in this study. One example would be memory, as proposed by Towse and Hitch (1997) and Lassaline and Logan (1993). In addition, the possibility that grouping strategies were used when counting the symmetrical arrangements could mean that, for these displays, the serial counting process was not measured. The use of grouping strategies may decrease the need for memory in a counting task (much like chunking in a recall task), thereby changing the cognitive processes used. Thus, a grouping strategy may influence the role of memory in counting. Further, this alternate strategy for counting may have affected accuracy by exchanging speed for increased accuracy. In the current study, accuracy was not analyzed and excessively inaccurate data were discarded; therefore, any effects of possible alternate counting strategies on accuracy could not be determined.

Future research is needed to inspect the cognitive counting processes between arrangement types in more detail prior to inspecting the cross-modal influences. The possibility that variable counting strategies are used for different display types needs to be clarified before the cross-modal influences on these strategies are investigated further. In addition, the possible interaction between high- and low-level processing found in this study indicates that other high-level processes are likely to influence counting. Exploration of these possible influences on counting may lead to better research when examining the cross-modal influences on counting.

Finally, changes in the stimuli could produce results different from those obtained in this study. Using different types of sound (e.g., changes in pitch or volume, or length and type of pulse) could change the perception of the auditory stimulus, thereby producing different interaction effects. The characteristics of the visual stimulus could be manipulated as well. For example, displays containing different object types could change the effects of object proximity seen in the current study. As noted by Kowler and Steinman (1977), saccadic interference in near proximity configurations was increased by the object similarity within a display. Therefore, changing the objects in the display would likely influence the effects of dot proximity on a counting task.

Further research into the cross-modal effects on counting is needed to fully elucidate the influences that may occur during the task. The direct application of the knowledge of cross-modal influences may be limited. However, it is precisely this type of perceptual-cognitive information that is required to more fully understand the ways the human brain functions in a multi-sensory environment.


Ben-Artzi, E. & Marks, L.E. (1995). Visual-auditory interaction in speeded classification: Role of stimulus difference. Perception & Psychophysics, 57 (8), 1151-1162.

Bolia, R. S., D'Angelo, W. R. & McKinley, R. L. (1999). Aurally aided visual search in three-dimensional space. Human Factors, 41 (4), 664-669.

Chute, D., & Westall, R. (1996). PowerLaboratory a B/C Product. Version 1.0, Maclaboratory, Inc.: Devon, PA.

Goldstein, E. B. (1999). Sensation and Perception, 5th ed. Pacific Grove, CA: Brooks/Cole Publishing Company.

Kowler, E. & Steinman, R. M. (1977). The role of small saccades in counting. Vision Research, 17(1), 141-146.

Lassaline, M. E. & Logan, G. D. (1993). Memory-based automaticity in the discrimination of visual numerosity. Journal of Experimental Psychology: Learning, Memory, & Cognition, 19 (3), 561-581.

Lechelt, E. C. (1971). Spatial numerosity discrimination as contingent upon sensory and extrinsic factors. Perception & Psychophysics, 10 (3), 180-184.

Lewis, J. L. (1971). Activation of "logogens" in an audio-visual word task. Dissertation Abstracts, 32 (1-B), 590.

Matlin, M. W. & Foley, H. J. (1997). Sensation and Perception, 4th ed. Boston: Allyn & Bacon.

Melara, R.D. & O'Brien, T.P. (1987). Interaction between synesthetically corresponding dimensions. Journal of Experimental Psychology: General, 116 (4), 323-336.

Noro, K. (1980). Determination of counting time in visual inspection. Human Factors, 22 (1), 43-55.

Towse, J. N. & Hitch, G. J. (1997). Integrating information in object counting: A role for a central coordination process? Cognitive Development, 12 (3), 393-422.

Wilkerson, A. & Scharff, L. (2000). Expanding cross-modal research using auditory glides and stereoscopic depth. http://hubel.sfasu.edu/research/cm.html Date accessed: July 2001.