The miracle of spatial filters

Once upon a time, students of auditory perception made a neat discovery: the inner ear effectively performs a Fourier analysis on the complex vibrations of the tympanic membrane. Frequency is a natural feature of the auditory stimulus, both distal – the vibrating object – and proximal – the vibrating membranes of the ear. Vision scientists lifted this concept wholesale, and applied it to the visual stimulus – which, however, lacks frequency characteristics both proximally and distally. Specifically, they decided that the visual system performs a Fourier analysis of the 2D pattern of intensities of photons striking the surface of the retina. Why? How? These issues are never addressed, though the action is said to take place at anatomically early points in the process, most popularly V1.

Part of the reason for locating these putative Fourier-style events at V1 is that the idea arose at a time when V1 was thought to be the sum total of the visual system. Neurons in this part of the brain are said to perform a Fourier analysis within their receptive fields, on the following basis: Each has a preference for a particular sine-wave pattern of light and dark across the retina, as well as for particular orientations of such repeating patterns. These patterns are not, of course, explicit features of the retinal stimulation; they need to be derived as part of a set of patterns that, when superimposed, will reproduce the actual (non-repeating) pattern of a particular stimulation event. Yet, somehow, these “detectors” in V1 are able to intuit that their preferred frequency is one of the collection of frequency patterns into which the total pattern of retinal stimulation within their receptive field could be decomposed, if one were inclined to perform the relevant mathematical acrobatics. This is really amazing; they’re “detecting” a pattern that is not present in the stimulus, except in a purely abstract sense, and doing so without benefit of the total analysis of the image that would normally be required. The V1 neuron, it is said, is of the nature of a “detector.” Thus, it doesn’t analyze the image into conceptual Fourier components, it merely detects their presence in principle, should they be implicitly part of the relevant set.
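For readers who want to see what the "relevant mathematical acrobatics" amount to, here is a minimal sketch in Python with NumPy. Everything in it is an illustrative assumption, not anything taken from the literature discussed here: the 64-sample window, the made-up luminance profile (a bright bar on a sloping background), and the variable names. It only shows that a non-repeating pattern can be rewritten as a set of sinusoids that, superimposed, give the pattern back; it says nothing about whether neurons do any such thing.

```python
import numpy as np

# Hypothetical 1D luminance profile across a receptive field (illustrative
# values only): a bright bar sitting on a gently sloping background.
n = 64
x = np.arange(n)
luminance = 0.2 + 0.005 * x
luminance[20:29] += 0.6

# Discrete Fourier transform: one complex coefficient per spatial frequency
# (0 to n/2 cycles per window).
coeffs = np.fft.rfft(luminance)
amplitudes = np.abs(coeffs) / n

# Superimposing all the sinusoidal components recovers the original pattern.
reconstructed = np.fft.irfft(coeffs, n=n)
max_error = np.max(np.abs(reconstructed - luminance))

# Count the components whose amplitude is not negligible.
n_components = int(np.sum(amplitudes > 1e-6))

print("largest reconstruction error:", max_error)
print("components with non-negligible amplitude:", n_components, "of", len(coeffs))
```

Note that the computation needs every sample in the window at once to obtain any single coefficient, which is the "total analysis of the image" referred to above.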

As a “detector,” these special V1 neurons must have a template to make the match; there appears to be a homunculus smuggled into these ingenious neurons. But wait; isn’t there a simpler way? Can’t these “detection” capabilities be achieved on the basis of the neuron’s connectivities with the cells of the retina? If the set of cells to which the V1 neuron is ultimately connected fire in just the pattern that is specified – homuncularly expected – by the detector, then it will fire a lot, and this will “signal” (to another homunculus) the presence of the pattern. But of course, this will never happen, because as mentioned above, the thing being detected is never actually present in the retinal pattern, but only a potential product of a particular type of analysis that, unlike in the case of audition, is not a natural feature of the physics and physiology of the visual stimulus. The problem is so deep that the advocates of spatial frequency detectors never do try to explain how the trick is done; they just take it on faith.

Paradoxically, this dogma coexists with another, the notion of V1 neurons as line or edge detectors. The notion is a holdover from the early data and interpretations of Hubel and Wiesel, who are still the go-to reference for this type of claim, though their data were anything but clear-cut. However, a sharp edge apparently requires an infinite series of patterns to be described in terms of spatial frequency components. This is apparently why, while often citing Hubel and Wiesel to support an “orientation detector” claim, the stimuli most frequently employed are simple Gabor patches, which shade smoothly and slowly from light to dark, thus allowing this inconvenient fact to be side-stepped.
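The contrast between a sharp edge and a Gabor patch can be checked numerically. The sketch below is a one-dimensional illustration under assumptions of my own choosing: a 256-sample window, a square wave standing in for a sharp light/dark edge, and a Gabor profile with 8 cycles per window under a Gaussian envelope of 32 samples. The helper names `harmonic_amplitudes` and `band_fraction` are hypothetical. For an ideal square wave the odd harmonics have amplitude 4/(πk), which shrinks with k but never reaches zero, whereas the Gabor's energy sits in a narrow band around its carrier frequency.

```python
import numpy as np

n = 256
x = np.arange(n)

# A sharp light/dark edge repeated once per window: a square wave (+1 / -1).
square = np.where(x < n // 2, 1.0, -1.0)

# A Gabor profile: a cosine grating under a Gaussian envelope.
# The carrier (8 cycles per window) and sigma (32 samples) are arbitrary.
cycles, sigma = 8, 32.0
envelope = np.exp(-0.5 * ((x - n / 2) / sigma) ** 2)
gabor = envelope * np.cos(2 * np.pi * cycles * (x - n / 2) / n)

def harmonic_amplitudes(signal):
    """Sinusoid amplitude per frequency (valid for 1..n/2-1 cycles per window)."""
    return 2.0 * np.abs(np.fft.rfft(signal)) / len(signal)

def band_fraction(signal, lo, hi):
    """Fraction of spectral energy in components lo..hi inclusive."""
    energy = np.abs(np.fft.rfft(signal)) ** 2
    return energy[lo:hi + 1].sum() / energy.sum()

square_amps = harmonic_amplitudes(square)

# Odd harmonics of the edge next to the ideal 4/(pi*k) series.
for k in (1, 3, 5, 7, 51, 101):
    print(k, square_amps[k], 4 / (np.pi * k))

# Share of energy in the band 4..12 cycles per window for each stimulus.
print("Gabor:", band_fraction(gabor, 4, 12))
print("edge: ", band_fraction(square, 4, 12))
```

The sampled edge only has a finite number of components because it has been discretized; the continuous edge needs the whole infinite series.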

How did the neurons-as-spatial-filters story come about? As indicated above, part of the motivation was the analogy with audition. Also attractive was the analogy to electronic circuits – the neurons are often described as “band-pass filters,” which sounds very scientific. The go-to reference for the spatial filter story is still Campbell and Robson (1968), with seven pages in Google Scholar for 2017 alone. They collected some data (on themselves) and interpreted them in these terms, in a vague sort of way. Not only are the conclusions rather unsatisfactory and speculative (e.g. “Thus it seems that we cannot satisfactorily model the over-all visual system by a simple peak detector following a spatial filter…As a modification of this theory we may assume…Thus we may suppose…”); the interpretations of the data are based on very sketchy underlying assumptions, most notably the assumption that certain features of certain percepts directly reflect the activities of particular sets of neurons at particular, anatomically early, layers of the visual system. This assumption is not defensible, or at least needs to be defended on the basis of a complete description of how this is achieved in the context of the whole system. Teller (1984) challenged the idea on logical grounds, but a less pedantic Graham (1992; 2011) saved the situation by boldly asserting that under certain conditions, a “miracle” happens and the brain becomes transparent down to whatever level investigators have in mind. It is thinkers like Graham who have moved the field forward, creating the basic foundations on which it rests today. Her boldness extends to acknowledging that there is currently no conceivable functional reason why the visual system should perform these truly miraculous acts. It just does. The position is similar in its heroism to that of Daryl Bem, who defended a different miracle in the face of all the laws of physics. Facts are facts, after all.

Having established, in their minds, that V1 neurons act as “spatial filters,” investigators set out to generate evidence consistent with this notion. They did this with extraordinary success – even at a time when they believed V1 to form the sum total of the visual system, and interpreted data accordingly. But again, in retrospect this didn’t matter, as apparently they had serendipitously hit upon methods that rendered the brain transparent (these results/interpretations were what led Graham to posit transparency), such that it might just as well have consisted only of V1 neurons (as they understood them to behave). Specifically, it turned out that when you flash a Gabor patch really, really quickly, you trigger the homuncular-clairvoyant-frequency-detecting-Fourier analyzing-neurons in V1, whereupon all their connections become inoperative except certain special ones that lead directly to consciousness. These methods (discussed in more detail in a separate post) involved forcing observers to choose one of two options via button presses, guessing if necessary, thus keeping the data clean and unperturbed by actual perceptual experience (dirty facts), and enabling fitting to simple mathematical models.

This is why you see so many Gabor patches at vision science talks. Unfortunately, when you don’t use the precision tool of Gabors, even forced-choice methods and all the goodwill in the world may not make the data fit the story even crudely. If the frequency patterns supposedly detected by the “tuned” detector/Fourier analyzers are supposed to reach consciousness directly, then why does this happen only when the image being viewed is a Gabor patch? It should be clear that the myth can only be maintained by careful curating of experimental conditions, as well as a great deal of gullibility. But the failures in all other conditions have to be addressed somehow, and that is why we are in a new great phase of this pseudoscience, the “natural stimulus” phase (about which more will be discussed in a separate post). There is no definition of “natural”; in practice it includes very weird manmade objects, and vague references to the “statistics” of the image.

n.b. A relevant article is Westheimer (2001) The Fourier Theory of Vision, discussing both the history and logical weakness of the concept.













9 thoughts on “The miracle of spatial filters”

  1. “Vision scientists lifted this concept wholesale, and applied it to the visual stimulus – which, however, lacks frequency characteristics both proximally and distally” – why does a visual stimulus lack frequency characteristics? A uniformly blue sky will have very different spatial frequency components from a forest with plenty of trees, branches and leaves. You argue that this “is not a natural feature of the physics and physiology”. However, a set of trees in front of a blue sky will give rise to a very different spatial frequency pattern of photons than the blue sky alone. Please clarify. How is this not a natural feature of the physics?


    1. The question is whether, given a pattern of photons on the retina corresponding to light from trees, etc, the visual system redescribes that pattern in terms of the collection of simple sine-wave patterns that, if superimposed, would produce that pattern. In other words, does it perform a Fourier analysis of the patterns of light and dark (and color?). No one has ever explained how it could do this, nor why it should do this. What use would it be to take a group of points corresponding to light from the tree, or leaves, and analyze it into different types of groups that have no relationship to the shape of the tree, and are useless in allowing us to organize the retinal stimulation into the shape of the tree?


  2. Why it should do this? One possible answer is that representations of a signal in the frequency domain can be more efficient than in the spatiotemporal domain. In short, the tree can be represented with fewer action potentials in the frequency domain, while still being recognizable as a tree, than in the spatiotemporal domain. Most image and video compression algorithms exploit this insight. How it could do this? I’m not a vision scientist, but neurons that are selective for certain spatiotemporal frequencies could do the job. What use would it be to “analyze [a group of points] into different types of groups that have no relationship to the shape of the tree”? I don’t understand the question. The representation in the frequency domain is equivalent to the representation in the spatiotemporal domain, it’s just a different way of representing it. I’d still be curious to know your answer to my question in the first comment – why do you think that a visual stimulus lacks frequency characteristics? I seem to fundamentally misunderstand your premise.


    1. I’m not a mathematician, but from what I’ve read, analyzing a pattern into superimposed spatial frequencies is not a trivial problem. And vision requires that points be organized into the groups that correspond to the objects we need to see. There’s no signal of a tree; the visual system has to infer a tree via organizing individual points of stimulation of the retina, in an implicitly inferential process. The visual stimulus lacks frequency characteristics in the sense that the patterns of light and dark typically aren’t neatly repeating; they’re incidental to whatever objects are reflecting light to our eyes at the moment. They have a certain kind of order but not the sine-wave type. No one has ever proposed a mechanism for how the visual system could perform a Fourier analysis, or how such an analysis could let us see a tree, which may be out there and may be in our minds but it’s not on the retina, not in the “signal.”

