Bottom-Up and Top-Down Processes in Visual Perception

Human cognition, and perception in particular, is a two-way process. On the one hand, information from the outside world acts on our senses: visual, auditory, gustatory, and other receptors. On the other hand, what we perceive is shaped by our knowledge, accumulated experience, needs, and attitudes, that is, by our readiness to see or perceive something.

1. Two Classes of Processes

In cognitive psychology (a field that interprets human cognition using the metaphor of information processing by a technical device, like a computer), it is common to talk about two classes of processing. On one side, there are data-driven processes—those led by incoming information. On the other, there are schema-driven processes, which rely on information already stored in the system’s memory.

From a biological perspective, which has also become widespread in cognitive research, these are called bottom-up processes (moving from the sensory organs through subcortical structures to the cerebral cortex) and top-down processes (moving from the cortical areas, especially the frontal cortex, toward lower structures).

2. Helmholtz’s Theory

The dual nature of our perception was noticed long before modern cognitive research began. In the mid-19th century, Hermann von Helmholtz, while analyzing visual illusions, proposed distinguishing between the primary image (what arises in the mind solely due to sensory input) and the representational image—our knowledge that allows us to recognize an object as such. In the final perceptual image, these are combined. For example, when we see an apple or a house, these processes are inseparable. The differences become apparent in situations where the stimulus is the same, but the perceived images differ, or when our past experience leads us to misinterpret an object’s properties.

Take the Ponzo illusion, also known as the “railroad tracks illusion”: the two horizontal lines are objectively the same length, and their retinal images are the same size, but we see them as different because the converging “rails” create linear perspective. We know that closer objects appear larger and distant ones smaller, so our visual system adjusts what we perceive based on this knowledge.
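
The size-distance reasoning behind the Ponzo illusion can be sketched numerically. This is a minimal sketch assuming a simplified size-distance scaling rule (perceived size grows with retinal size times perceived distance); it is not Helmholtz's own formulation, and the bar size and the distances the “rails” are taken to suggest are hypothetical numbers.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle (in degrees) subtended by an object of the given size at the given distance."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


def perceived_size(angle_deg, assumed_distance):
    """Size inferred from a visual angle, assuming the object lies at assumed_distance.

    A simplified size-distance scaling rule, used here as an illustrative assumption.
    """
    return 2 * assumed_distance * math.tan(math.radians(angle_deg) / 2)


# Two physically identical bars (10 cm) at the same real distance (1 m):
# they produce the same retinal image.
angle = visual_angle_deg(size=0.10, distance=1.0)

# Hypothetical depth reading from the converging "rails": the upper bar
# is interpreted as lying 1.5 times farther away than the lower one.
near_estimate = perceived_size(angle, assumed_distance=1.0)
far_estimate = perceived_size(angle, assumed_distance=1.5)

print(f"retinal angle for both bars: {angle:.2f} deg")
print(f"lower bar, read as near: {near_estimate:.3f} m")
print(f"upper bar, read as far:  {far_estimate:.3f} m")
```

The same retinal angle paired with different assumed distances yields different size estimates; in this toy rule the ratio of the estimates simply equals the ratio of the assumed distances, so the upper bar is judged 1.5 times larger.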

3. Top-Down Processes in the Regulation of Cognition

Over the past hundred years, psychologists have created and popularized many ambiguous images. The most famous is the “profiles and vase” figure by the Danish psychologist Edgar Rubin. Shown two black profiles on a white background, a person can see either two faces looking at each other or a vase, but not both at once, even though the stimulus itself does not change. The famous Dutch artist M.C. Escher often used ambiguous and physically impossible, yet visually processable, images in his artwork.

Even without ambiguous images, we can make recognition errors simply because of our readiness or desire to see something. Suppose I show you a round orange object. If I am holding it as I come out of a grocery store, you will likely see an orange; if I hold it on a tennis court, you will likely see a tennis ball, even though the object is the same.

This uncertainty, or underdetermination, of the external stimulus is the main condition that activates top-down regulation of cognition: our knowledge, experience, and attitudes.
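
One common way to formalize this role of context, offered here as an illustrative assumption rather than the author's own model, is Bayesian inference: the stimulus supplies a likelihood, the context supplies a prior (expectations), and the percept is the most probable interpretation. The sketch encodes the orange/tennis-ball example; every probability is hypothetical, and the likelihoods are set equal to represent an underdetermined stimulus.

```python
# Hypothetical numbers throughout; this is a toy Bayesian sketch, not a fitted model.

# Likelihood: how well the image of a round orange object fits each interpretation.
# Equal values model an underdetermined stimulus: the image alone cannot decide.
LIKELIHOOD = {"orange": 0.5, "tennis ball": 0.5}

# Prior: what the viewer expects to see in each context.
PRIORS = {
    "grocery store": {"orange": 0.9, "tennis ball": 0.1},
    "tennis court": {"orange": 0.1, "tennis ball": 0.9},
}


def posterior(context):
    """Combine the bottom-up likelihood with the top-down prior (Bayes' rule)."""
    unnormalized = {obj: LIKELIHOOD[obj] * PRIORS[context][obj] for obj in LIKELIHOOD}
    total = sum(unnormalized.values())
    return {obj: p / total for obj, p in unnormalized.items()}


def percept(context):
    """The interpretation with the highest posterior probability."""
    post = posterior(context)
    return max(post, key=post.get)


for context in PRIORS:
    print(f"{context}: {percept(context)} {posterior(context)}")
```

Because the likelihoods are equal, the posterior simply reproduces the prior, so the same stimulus is read as an orange near the store and as a tennis ball on the court. Making one likelihood much larger (a clear, unambiguous image) would let the bottom-up signal override the context.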

4. Challenging Perception Conditions

There are many examples of challenging perception conditions in which top-down processes manifest. One is stimulus incompleteness: if a person is partly hidden behind a chair, you still perceive them as whole, even though part of their image is missing on the retina. Another is brief presentation, when an object flashes and disappears, or when something else appears immediately after it, a phenomenon psychologists call “masking.” Overload is a third: visual objects change rapidly (as on a scrolling news ticker), or many objects appear in the visual field at once.

In such cases, top-down information processing comes into play: to recognize an object without complete information, a person must rely on what is already stored in memory.

5. The Word Superiority Effect

One of the most striking examples of top-down processes in cognition, described in the late 19th century, is the so-called “word superiority effect.” This effect means that if a person is shown a set of letters organized into a word for a very short time, or under noisy or masked conditions, they can recognize about twice as many letters as if the letters were presented in a random order.

This phenomenon was first described in 1886 by the American psychologist James McKeen Cattell, who had studied psychology in Germany. In the 1970s, the effect was studied with a different experimental paradigm: researchers compared recognition of a letter within a word with recognition of the same letter presented alone. People turned out to recognize a letter better within a word, especially when it was shown briefly or followed by a “mask” (such as a row of hashes, #####) that interferes with recognition. Surprisingly, it is easier to see a letter surrounded by others that form a word than to see the same letter by itself.

This phenomenon has been explained in several ways. The most widely supported explanation was proposed by the American psychologists David Rumelhart and James McClelland, who developed a neural network approach to cognition. They modeled cognition as the interaction of several layers of simple, interconnected elements that activate one another. In their model, one layer of elements detects parts of letters (circles, lines, arcs), a second recognizes whole letters, and a third recognizes words. Each feature element is connected to the letters that contain that feature, and each letter is connected to the words that contain it in that position. If a word is shown with one letter obscured, the remaining letters activate the word layer, which in turn sends feedback down to the letter layer. Even if we cannot see the letter clearly, the right word is activated, and this top-down influence helps us identify the missing letter.
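
The feedback loop between letters and words can be illustrated with a toy simulation. This is a minimal sketch, not McClelland and Rumelhart's actual model: it collapses the feature layer into a direct “evidence” input per letter, omits the within-layer inhibition their model includes, and uses a hypothetical four-word lexicon and hypothetical parameters (`rate`, `feedback`, `steps`). It compares a degraded final letter shown inside “CA?” with the same degraded letter shown alone.

```python
# Hypothetical mini-lexicon; real models use thousands of words.
WORDS = ["CAT", "COT", "BAT", "CAN"]
LENGTH = 3


def run(evidence, steps=20, rate=0.2, feedback=0.3):
    """Toy interactive activation between a letter layer and a word layer.

    evidence: one dict per letter position, mapping candidate letters to
    bottom-up support in [0, 1] (the feature layer is collapsed into this input).
    """
    letters = [dict(pos) for pos in evidence]  # letter-layer activations
    words = {w: 0.0 for w in WORDS}            # word-layer activations
    for _ in range(steps):
        # Bottom-up: each word moves toward the mean activation of its letters
        # in their positions (letters absent from a position count as 0).
        for w in WORDS:
            support = sum(letters[p].get(ch, 0.0) for p, ch in enumerate(w)) / LENGTH
            words[w] += rate * (support - words[w])
        # Top-down: each candidate letter receives feedback from every word that
        # has that letter in that position, on top of its bottom-up evidence.
        for p in range(LENGTH):
            for ch in letters[p]:
                top_down = sum(a for w, a in words.items() if w[p] == ch)
                target = min(1.0, evidence[p][ch] + feedback * top_down)
                letters[p][ch] += rate * (target - letters[p][ch])
    return letters, words


def identify(letters, p):
    """Report the most active candidate letter at position p."""
    return max(letters[p], key=letters[p].get)


# The last letter is degraded: its visible strokes fit "T" and "F" equally well.
in_word = [{"C": 1.0}, {"A": 1.0}, {"T": 0.3, "F": 0.3}]  # "CA?" shown as a word
alone = [{}, {}, {"T": 0.3, "F": 0.3}]                   # the same letter shown by itself

word_letters, word_words = run(in_word)
alone_letters, _ = run(alone)

print("in word:", identify(word_letters, 2), word_letters[2])
print("alone:  ", identify(alone_letters, 2), alone_letters[2])
print("most active word:", max(word_words, key=word_words.get))
```

In this toy, “T” ends up far more active, and further ahead of “F”, when the other letters spell a word; with noise or a mask added, a larger margin would translate into more correct reports, which is the qualitative pattern of the word superiority effect.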

6. Ways of Processing Information

The word superiority effect does not occur under all conditions. For example, when we presented words letter by letter at a rate of about ten letters per second, a special kind of attentional error appeared: people missed some elements of the sequence. If a person watches a stream of ten black letters shown one after another and is asked to name the two gray letters in it, they can easily name the first, but they are much less likely to report the second if it appears within half a second after the first. This is called the “attentional blink.”
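
The timing in this paradigm reduces to simple arithmetic: at about ten letters per second each letter occupies roughly 100 ms, so “within half a second” covers the next five positions. The sketch below is a hypothetical illustration of that arithmetic, not the authors' experimental software; `make_stream` and `t2_in_blink` are invented helper names, and the blink rule is deliberately crude.

```python
import random
import string

# Timing taken from the text; everything else is an illustrative assumption.
ITEM_MS = 1000 // 10      # about ten letters per second -> 100 ms per letter
BLINK_WINDOW_MS = 500     # second target "within half a second" of the first


def make_stream(length=10, seed=None):
    """Random letter stream with two target positions (shown in gray in a real trial)."""
    rng = random.Random(seed)
    letters = rng.sample(string.ascii_uppercase, length)
    t1, t2 = sorted(rng.sample(range(length), 2))
    return letters, t1, t2


def t2_in_blink(t1, t2):
    """Crude rule: T2 is at risk if it appears within the blink window after T1."""
    lag_ms = (t2 - t1) * ITEM_MS
    return 0 < lag_ms <= BLINK_WINDOW_MS


letters, t1, t2 = make_stream(seed=1)
print("".join(letters), "targets at positions", t1, "and", t2)
for lag in range(1, 8):
    print(f"lag {lag}: {lag * ITEM_MS} ms after T1, in blink window: {t2_in_blink(0, lag)}")
```

With these numbers, a second target one to five positions after the first (100 to 500 ms) falls inside the window. The rule is cruder than real data: for example, a second target that immediately follows the first is often reported well, a pattern known as “lag-1 sparing.”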

We decided to test whether these attentional “gaps” would persist if the letters, one of which might be skipped, formed a meaningful word (we used Russian words, for example those meaning “petal” and “broom”). Interestingly, if a person is not told that a word is being presented letter by letter, they do not notice it, and the attentional gaps remain. But if they are told to look for a word, they make no mistakes, even when the sequence consists of random letters.

In this case, we are dealing with a different kind of top-down process. It’s not just the influence of stored experience on what we can perceive, but also the influence of the way we are used to working with information. If told, “Read the words,” we can catch rapidly presented letters, even if they don’t form a word. If told, “Name as many letters as possible,” we won’t notice the word, and will skip letters where attentional limitations (like the “attentional blink”) occur.

7. Strategies for Solving Perceptual Tasks

These ways of organizing the visual system’s work are called strategies for solving perceptual tasks. We use the word “task” because, under challenging perception conditions, the image is not given automatically: we need to discern, see, or notice it. By applying different strategies to the same task, we can achieve fundamentally different results and, for example, avoid or fail to avoid attention errors.

Thus, top-down influences on visual information processing come in at least two forms. On one hand, they are our past experience, expectations, and attitudes. On the other, they are the strategies we use to solve visual tasks, which we ultimately organize based on our past experience. Importantly, these strategies are not ready-made; we have to apply, build, and actively search for them—possibly using hints provided to us.