Once upon a time, students of auditory perception made a neat discovery: the inner ear effectively performs a Fourier analysis on the complex vibrations of the tympanic membrane. Frequency is a natural feature of the auditory stimulus, both distal – the vibrating object – and proximal – the vibrating membranes of the ear. Vision scientists lifted this concept wholesale and applied it to the visual stimulus, which, however, lacks frequency characteristics both proximally and distally. Specifically, they decided that the visual system performs a Fourier analysis of the 2D pattern of photon intensities striking the surface of the retina. Why? How? These questions are never addressed, though the action is said to take place at anatomically early points in the process, most popularly V1.
Part of the reason for locating these putative Fourier-style events at V1 is that the idea arose at a time when V1 was thought to be the sum total of the visual system. Neurons in this part of the brain are said to perform a Fourier analysis within their receptive fields, on the following basis: each has a preference for a particular sine-wave pattern of light and dark across the retina, as well as for particular orientations of such repeating patterns. These patterns are not, of course, explicit features of the retinal stimulation; they need to be derived as part of a set of patterns that, when superimposed, will reproduce the actual (non-repeating) pattern of a particular stimulation event. Yet, somehow, these “detectors” in V1 are able to intuit that their preferred frequency is one of the collection of frequency patterns into which the total pattern of retinal stimulation within their receptive field could be decomposed, if one were inclined to perform the relevant mathematical acrobatics. This is really amazing; they're “detecting” a pattern that is not present in the stimulus, except in a purely abstract sense, and doing so without benefit of the total analysis of the image that would normally be required. The V1 neuron, it is said, is a kind of “detector.” Thus, it doesn't analyze the image into conceptual Fourier components; it merely detects their presence in principle, should they be implicitly part of the relevant set.
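To make concrete what the “mathematical acrobatics” involve, here is a minimal sketch, not anyone's model of V1. It assumes a 1D strip of made-up luminance values as a stand-in for the 2D retinal pattern, uses a naive discrete Fourier transform to derive the set of sinusoidal components, and then superimposes all of them to recover the original strip. Every component spans the whole strip; none of them, on its own, is the pattern that was actually there.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: one complex coefficient per frequency k."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def component(coeffs, k, n):
    """The k-th complex sinusoid (for real input, k and n - k pair up into a cosine)."""
    c = coeffs[k]
    return [c * cmath.exp(2j * math.pi * k * t / n) / n for t in range(n)]


def reconstruct(coeffs):
    """Superimpose every component; the sum reproduces the original samples."""
    n = len(coeffs)
    total = [0j] * n
    for k in range(n):
        for t, value in enumerate(component(coeffs, k, n)):
            total[t] += value
    # Imaginary parts cancel for real input; keep the real part.
    return [v.real for v in total]


# Hypothetical luminance profile (arbitrary units), invented for illustration.
profile = [0.2, 0.2, 0.9, 0.9, 0.9, 0.1, 0.4, 0.7]
coeffs = dft(profile)
rebuilt = reconstruct(coeffs)

nonzero = sum(1 for c in coeffs if abs(c) > 1e-9)
print(f"{nonzero} nonzero components needed to rebuild {len(profile)} samples")
print("rebuilt matches original:", all(abs(a - b) < 1e-9 for a, b in zip(profile, rebuilt)))
```

The decomposition exists only once the whole strip has been analyzed; the components are outputs of that computation, not features sitting in the input.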
As a “detector,” each of these special V1 neurons must have a template against which to make the match; there appears to be a homunculus smuggled into these ingenious neurons. But wait: isn't there a simpler way? Can't these “detection” capabilities be achieved on the basis of the neuron's connections with the cells of the retina? If the set of retinal cells to which the V1 neuron is ultimately connected fires in just the pattern that is specified (homuncularly expected) by the detector, then it will fire a lot, and this will “signal” (to another homunculus) the presence of the pattern. But of course, this will never happen, because, as mentioned above, the thing being detected is never actually present in the retinal pattern; it is only a potential product of a particular type of analysis that, unlike in the case of audition, is not a natural feature of the physics and physiology of the visual stimulus. The problem is so deep that the advocates of spatial frequency detectors never even try to explain how the trick is done; they just take it on faith.
Paradoxically, this dogma coexists with another: the notion of V1 neurons as line or edge detectors. The notion is a holdover from the early data and interpretations of Hubel and Wiesel, who are still the go-to reference for this type of claim, though their data were anything but clear-cut. However, a sharp edge apparently requires an infinite series of sine-wave components to be described in spatial frequency terms. This is apparently why investigators, while often citing Hubel and Wiesel to support an “orientation detector” claim, most frequently employ simple Gabor patches, which shade smoothly and slowly from light to dark, thus allowing this inconvenient fact to be side-stepped.
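The difference between an edge and a Gabor patch in spectral terms can be checked directly. The sketch below is illustrative only and rests on assumptions not in the text: a 1D, 64-sample window treated as periodic (so the “edge” is really one cycle of a square wave), a Gabor with a carrier of 4 cycles per window under a Gaussian envelope of standard deviation 8 samples, and an arbitrary cutoff at twice the carrier frequency. It reports what share of each stimulus's non-DC energy lies above that cutoff.

```python
import cmath
import math

N = 64  # samples across the window, treated as one period (an assumption)


def dft(samples):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def high_frequency_fraction(samples, cutoff):
    """Share of non-DC spectral energy in bins above `cutoff` cycles per window."""
    energy = [abs(c) ** 2 for c in dft(samples)]
    n = len(energy)
    non_dc = sum(energy[1:])
    # For real input, bins k and n - k represent the same spatial frequency.
    high = sum(e for k, e in enumerate(energy) if k > 0 and min(k, n - k) > cutoff)
    return high / non_dc


# Sharp luminance edge: light half, dark half.
edge = [1.0 if t < N // 2 else 0.0 for t in range(N)]

# Hypothetical Gabor patch: cosine carrier under a Gaussian envelope.
CARRIER_CYCLES = 4
SIGMA = 8.0
gabor = [
    math.exp(-((t - N / 2) ** 2) / (2 * SIGMA**2))
    * math.cos(2 * math.pi * CARRIER_CYCLES * (t - N / 2) / N)
    for t in range(N)
]

cutoff = 2 * CARRIER_CYCLES
edge_share = high_frequency_fraction(edge, cutoff)
gabor_share = high_frequency_fraction(gabor, cutoff)
print(f"edge:  {edge_share:.4f} of energy above {cutoff} cycles/window")
print(f"gabor: {gabor_share:.2e} of energy above {cutoff} cycles/window")
```

The edge keeps a few percent of its energy above the cutoff, spread over every odd harmonic with amplitudes falling off roughly as 1/k (in the continuous case the series never terminates), while the Gabor's energy stays bunched around its carrier and essentially none of it lies above the cutoff.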
How did the neurons-as-spatial-filters story come about? As indicated above, part of the motivation was the analogy with audition. Also attractive was the analogy to electronic circuits: the neurons are often described as “band-pass filters,” which sounds very scientific. The go-to reference for the spatial filter story is still Campbell and Robson (1968), with seven pages of Google Scholar citations for 2017 alone. They collected some data (on themselves) and interpreted it in these terms, in a vague sort of way. Not only are the conclusions rather unsatisfactory and speculative (e.g. “Thus it seems that we cannot satisfactorily model the over-all visual system by a simple peak detector following a spatial filter…As a modification of this theory we may assume…Thus we may suppose…”); the interpretations of the data are based on very sketchy underlying assumptions, most notably the assumption that certain features of certain percepts directly reflect the activities of particular sets of neurons at particular, anatomically early, layers of the visual system. This assumption is not defensible, or at least needs to be defended on the basis of a complete description of how this is achieved in the context of the whole system. Teller (1984) challenged the idea on logical grounds, but a less pedantic Graham (1992; 2011) saved the situation by boldly asserting that under certain conditions, a “miracle” happens and the brain becomes transparent down to whatever level investigators have in mind. It is thinkers like Graham who have moved the field forward, creating the basic foundations on which it rests today. Her boldness extends to acknowledging that there is currently no conceivable functional reason why the visual system should perform these truly miraculous acts. It just does. The position is similar in its heroism to that of Daryl Bem, who defended a different miracle in the face of all the laws of physics. Facts are facts, after all.
Having established, in their minds, that V1 neurons act as “spatial filters,” investigators set out to generate evidence consistent with this notion. They did this with extraordinary success, even at a time when they believed V1 to be the sum total of the visual system and interpreted data accordingly. But again, in retrospect this didn't matter, as apparently they had serendipitously hit upon methods that rendered the brain transparent (these results and interpretations were what led Graham to posit transparency), such that it might just as well have consisted only of V1 neurons (as they understood them to behave). Specifically, it turned out that when you flash a Gabor patch really, really quickly, you trigger the homuncular-clairvoyant-frequency-detecting-Fourier-analyzing neurons in V1, whereupon all their connections become inoperative except certain special ones that lead directly to consciousness. These methods (discussed in more detail in a separate post) involved forcing observers to choose one of two options via button presses, guessing if necessary, thus keeping the data clean and unperturbed by actual perceptual experience (dirty facts), and enabling fitting to simple mathematical models.
This is why you see so many Gabor patches at vision science talks. Unfortunately, when you don't use the precision tool of Gabors, even forced-choice methods and all the goodwill in the world may not make the data fit the story even crudely. If the frequency patterns supposedly picked up by the “tuned” detectors/Fourier analyzers reach consciousness directly, then why does this happen only when the image being viewed is a Gabor patch? It should be clear that the myth can only be maintained by careful curation of experimental conditions, as well as a great deal of gullibility. But the failures in all other conditions have to be addressed somehow, and that is why we are in a new great phase of this pseudoscience, the “natural stimulus” phase (more on this in a separate post). There is no definition of “natural”; in practice it includes very weird manmade objects, and vague references to the “statistics” of the image.
n.b. A relevant article is Westheimer (2001), “The Fourier Theory of Vision,” which discusses both the history and the logical weakness of the concept.