This comment was just rejected by PubPeer. Why?

 

Image source: https://www.google.com/search?q=censorship&client=firefox-b-d&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjFopHXhZzkAhWDzKQKHbzwAiEQ_AUIESgB&biw=1297&bih=507#imgrc=vG8ssZ1qZ_JrRM:


Is this April 2019 Science article an example of fake visual neuroscience?


In the past two posts I’ve tried to explain why prevailing methods in visual neuroscience amount to a fake science even less demanding than astrology. A recent Science article by Stringer, Pachitariu, Steinmetz, Reddy, Carandini & Harris (2019) seems a perfect example of such methods. The article is titled “Spontaneous behaviors drive multidimensional, brainwide activity.”

Stringer et al. (2019) wave some objects around in the lab while recording from roughly 10,000 neurons, then mine their data for coincidences between (their partial descriptions of) those external events and the electrical activity they have recorded.

As is well known, even if those external events had been random numbers from a random number generator, correlations would be found.

In other words, there’s no necessary, rational link between Stringer et al’s experimental conditions and the “data” they collect. The same methodological principle could be used to support any thesis whatsoever, e.g. to identify supposed psychics in our midst.

Big data doesn’t help, either; it just makes things worse, as Calude and Longo (2016) recently showed in a paper titled The Deluge of Spurious Correlations in Big Data.
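
Just to make the cheapness of such coincidences concrete, here is a minimal Python sketch – my own illustration, with made-up numbers, not a reconstruction of the authors’ pipeline – in which one stream of pure noise stands in for the “external events,” thousands of streams of pure noise stand in for the recorded neurons, and we simply count how many pass the usual significance screen.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_timepoints = 500
stimulus = rng.normal(size=n_timepoints)  # pure noise standing in for the "external events"

for n_neurons in (100, 1_000, 10_000):
    neurons = rng.normal(size=(n_neurons, n_timepoints))  # pure noise "recordings"
    hits = sum(pearsonr(stimulus, neuron)[1] < 0.05 for neuron in neurons)
    print(f"{n_neurons:>6} noise neurons: {hits} 'significant' correlations at p < .05")

# Roughly 5% of the comparisons come out "significant" by construction, so the
# absolute number of spurious hits grows with the size of the data set.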

The hallmark of scientific practice is, of course, an investigator’s ability to show a tight, necessary link between theory and experimental conditions, and between experimental conditions and results. Here, that crucial connection is completely lacking. Their method, in other words, does not allow the authors to distinguish between chance and necessity.

Relatedly: As Gary Smith explains in The AI Delusion, the principal component analysis (PCA) technique used by Stringer et al is a tool for data reduction whose outputs – the “components” – need have no predictive value:

“A goal of summarizing data is very different from a goal of using the data to make predictions… the principal components are chosen based on the statistical relationships [in the sample] among the explanatory variables, with no consideration whatsoever of what these components will be used to predict. For example, a person’s birth month or favorite candy might end up being included among the principal components used to predict whether someone will be involved in a car accident. Moreover, if the principal components are based on a data-mining exploration of hundreds or thousands of variables, it is virtually certain that the components will include nonsense. Once again, Big Data is the problem, and principal components is not the solution.”

To avoid confusion, it should be noted that Smith is using the word “predict” in the normal, forward-looking sense, not in the neuroscience newspeak, post hoc manner of Stringer et al. (2019) (see below).
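
A minimal sketch of Smith’s point (again my own illustration, not Stringer et al.’s analysis): the leading principal component is determined by whichever predictor happens to have the most variance, with no reference at all to the outcome it might later be used to “predict.”

import numpy as np

rng = np.random.default_rng(1)
n = 1_000
signal = rng.normal(size=n)            # modestly related to the outcome
nuisance = 100 * rng.normal(size=n)    # huge variance, unrelated to the outcome
outcome = signal + 0.1 * rng.normal(size=n)

X = np.column_stack([signal, nuisance])
Xc = X - X.mean(axis=0)
# Principal components are eigenvectors of the predictors' covariance matrix;
# the outcome never enters the computation.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
top_pc = eigvecs[:, np.argmax(eigvals)]
print("top PC loadings on [signal, nuisance]:", np.round(top_pc, 3))
print("correlation of top PC score with outcome:",
      np.round(np.corrcoef(Xc @ top_pc, outcome)[0, 1], 3))
# The first component is essentially the nuisance variable and carries almost
# no information about the thing we supposedly want to predict.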

The “thousands of variables” here correspond to the 10,000-plus putative neurons being recorded from. They constitute only a small subset of a highly integrated system involving billions or trillions of synapses. The idea that meaningful inferences about how such a complex system works, when its basic functional principles are as yet unknown, may be drawn via random correlation-fishing beggars belief.

Correlation-fishing is also, naturally, the basis of the literature Stringer et al inappropriately cite.

They state, for example, that “The firing of sensory cortical [] neurons correlates with behavioral variables such as locomotion…,” citing DiPoppa et al (2018). But the claims of DiPoppa et al were arrived at via straightforward p-hacking.

The discovery of such correlations in a sample of data is, again, no basis for making causal neuroscientific claims (as pointed out recently by Mehler and Kording (2018)), due to the obvious problem of massive confounding. (One of Mehler and Kording’s main points was the impropriety of employing causal language – like the term “drive” used in the title of the present paper – to describe correlation-fished neuron-stimulus associations as though they implied a causal relationship). And such associations are known not to replicate.

Stringer et al also tell us that:

“[N]eurons’ responses to classical grating stimuli revealed robust orientation tuning as expected in visual cortex (fig. S1).”

To someone who has studied this literature closely, this statement reads like a lie. Claims of orientation tuning have always been correlation-fished, exactly in the way we could identify psychics based on a series of lucky guesses of the results of dice rolls. If we go to Stringer et al’s figure S1, the situation becomes quite clear:

“Orientation tuning curves of the 400 most tuned neurons in each experiment (as assessed by orientation selectivity index)…” As in DiPoppa et al, neurons that happen to be firing at highish rates (according to some arbitrary criterion) coincidentally with the presentation of the “stimulus” are defined as tuned, and their firing is causally attributed to the “stimulus.” Practitioners of such methods seem to be totally unaware of the massive confounds involved.
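
To see why this is selection rather than discovery, here is a small sketch using one common resultant-vector definition of the orientation selectivity index (which may or may not be the exact index used in fig. S1; the numbers are illustrative): every simulated neuron fires at the same flat expected rate, yet ranking by OSI and keeping the “400 most tuned” still yields a population of apparently selective cells.

import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_orientations, n_trials = 10_000, 8, 10
orientations = np.linspace(0, np.pi, n_orientations, endpoint=False)

# Every neuron has the same flat expected rate: there is no tuning to find.
rates = rng.poisson(lam=5.0, size=(n_neurons, n_orientations, n_trials)).mean(axis=2)

# Orientation selectivity index via the circular resultant vector.
vec = (rates * np.exp(2j * orientations)).sum(axis=1)
osi = np.abs(vec) / rates.sum(axis=1)

top400 = np.argsort(osi)[-400:]  # "the 400 most tuned neurons"
print("median OSI, all neurons:", np.round(np.median(osi), 3))
print("median OSI, top 400    :", np.round(np.median(osi[top400]), 3))
# Align each selected neuron's curve on its own "preferred" orientation and
# average, and you get a tidy-looking population tuning curve out of pure noise.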

Finally, I have to note the reference to “classical grating stimuli.” The only meaning of “classical” here is to indicate stimuli that have been used continuously for at least fifty years, so that the correlation-fishing nature of the neuron-stimulus correlations will not be obvious. The method could just as (in)validly be used to identify “kitty-tuned” neurons. Even more plausibly, perhaps, given the utter absurdity of the rationale underlying the use of gratings.

Why It’s Easier To Be A Neuroscientist Today Than An Astrologer

Screen Shot 2019-06-26 at 2.58.57 PM

Believe it or not, by today’s standards, the demands made on astrologers are more stringent than those made on “neuroscientists.”

Sure, neuroscientists today use lots of high-tech equipment and fussy, complicated techniques; but they’ve arranged things so that they will always (seem to) turn out the way they want; so that their experiments can never prove their most basic assumptions wrong, even if they are.

If you’re an astrologer, you believe that people’s personality traits are determined by the stars. If you’re a Gemini, you’re a certain way; a Taurus, another; and so on. Lots of times, these “predictions” come true – no wonder Joe has a hot temper, he’s an Aries! But they can also be challenged; Shannon is hot-tempered, too, which is strange, because she was born under a milder star. Of course, astrologers can always resort to more detailed astrological analyses to rationalize apparent discrepancies; but at each step, their “predictions” may be falsified or challenged. The lack of reliability of astrologers’ predictions is one reason we don’t let them publish in scholarly journals.

Now, imagine this:

All records of birth dates disappear, as well as memories. The astrologers step in; they can fix it! They ask everyone for a detailed bio, and perform analyses based on their astrological assumptions about the connection between the stars and personality. Approximate dates of birth are then assigned based on the results of this analysis; if the analysis says you’re a Pisces, then we’ll presume you were born in February or March, and so on.

Note the difference between this scenario and the previous one. In the previous case, astrologers’ assumptions could be shown to be wrong, based on their failure to make accurate predictions. In this one, the assumptions are taken as true a priori, and the analyses simply lie on top of them. The assumption, in other words, that having a particular personality/behavior is caused by a particular alignment of the stars at the moment of your birth is used to label/define you as an Aquarian, etc. What if your future behavior doesn’t align with the label assigned? Well, in that case the astrologers, as mentioned above, are allowed to keep the label, but argue that the discrepancy is due to other, complicating factors.

As I discussed in the previous post, neuroscientists do pretty much the same thing: They assume that the “personality” of a particular neuron (part of a network with billions or trillions of connections) falls into a small number of simple categories of “preference,” and that an instance of a coincidence in time between a neuron’s high activity and the presence of an exemplar of that “preferred” category licenses them to label the neuron (post hoc) as, e.g., an “orientation detector” with a particular “tuning curve.” The fact that such findings do not replicate (the coincidences don’t repeat) is treated as “variability” in neural activity due to complicating factors. The “orientation preference” assumption, in other words, is carved in stone, and violations are explained away.

That this technique may be used to support any assumptions, even the most untenable, is evidenced dramatically by the continued claims of the existence of “spatial filters.” In general, the small set of claimed “preferences” of “visual” neurons are historical artifacts dating back many decades. They survive because they are never challenged.

Going back to our astrologers: Let’s imagine that, having been given license to treat their assumptions about birth and stars as true, astrologers then decided to expand their research program. They could, for example, ask questions about the role of star sign in determining success in various professions. They could collect data on professional success, employing various parameters, and perform linear regressions to find whether it’s better to be a Virgo or a Libra if you want to be a neuroscientist. Or they could dig deeper to see whether there is an interaction between the sign of a student and that of their PI. Naturally, they would couch their results in probability terms, employing Bayesian “default priors” to fit in with the current zeitgeist.

Note, again, the astrologers would be taking no risks here; again, their underlying assumptions are not on the line. They aren’t required to test them by making any predictions about the results of their investigation; they simply describe certain arbitrary parameters of their sample, with whatever mathematical techniques and assumptions they choose to assess them. To data-mine/p-hack their sample, in other words. This is what neuroscientists are doing when they collect “data” and then mine it for correlations with “behaviors” on certain “tasks,” etc.

Will such correlations found post hoc apply in general? As a rule, they don’t.

This is the case with the analogous practices in neuroscience, as many have acknowledged. They include Konrad Kording of UPenn, speaking at Waterloo Brain Day 2017. Addressing the issue of why generalization studies of correlation-fished results are “never, or almost never” performed, he replied:

“I’ll tell you why… All my generalization studies fail, almost all of them, both in psychophysics and in data analysis.”

If our astrologers were, as a result of their inability to achieve reliable results via correlation-fishing built on arbitrary assumptions, to engage in years of earnest discussions about their field experiencing “replication” or “reproducibility” crises, to found (and fund) a “Center for Reproducible Astrology,” and still to continue on with business as usual… they would be acting like neuroscientists in good standing.

FURTHER READING FROM THE BLOG

Contemporary Neuroscience Depends on Outright p-hacking

Bondy, Haefner & Cumming Base Their Post Hoc Correlational Study on Correlations They Say (Correctly) Don’t Exist

It Is Bullshit: None of it Replicates

Neuroscience Newspeak, Or How to Publish Meaningless Facts

The Miracle of Spatial Filters

Why Correlational Studies Are Fake Science

Nature Neuroscience Starts Year Strong With Correlation-Fishing from Yale, Mount Sinai

 

Why “Correlational Studies” Are Fake Science

The brain as the “neuroscientists’” crystal ball: they see what they want to see.

It seems that the dominant practice in “visual neuroscience” today is to take some “stimulus,” wave it in front of a human or animal subject, and record brain activity. Correlations in time between this activity (as defined by some arbitrary metric, e.g. averaging over arbitrary time intervals) and exposure to the “stimulus” are then described as “responses” to the “stimulus.”

The metrics are ad hoc and flexible. In Lau et al (2019), for example, we have this:

Responses that fell above the top 2.5%, or below the bottom 2.5%, of this distribution were considered significantly excitatory or inhibitory respectively.

Even the neurons to which the “data” are supposed to correspond are “putative:”

…neurons with waveforms that had an interval of 0.5 ms or less and a trough/peak amplitude ratio of >0.5 were designated as putative PV neurons.

Do you see what’s going on here? The expectation that there exist certain neurons that “respond” to whatever investigators imagine their “preference” to be is, in the circumstances, a sure-fire prediction. There will always be more or less electrical activity at any given brain location at any given moment. We could “link” highish or lowish points of activity with any external event we like – it’s a low-to-no-risk operation. The method doesn’t punish you for being wrong, for not understanding anything about your system. If you want to rack up a relatively higher number of coincidental correlations, you simply use a lenient significance criterion, such as the p < .05 threshold (understandably) still very popular in the “neuroscience” literature.
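
Here is roughly what that operation looks like, in a deliberately stripped-down sketch of my own (not Lau et al.’s code), using the 2.5% cut-offs quoted above on nothing but noise:

import numpy as np

rng = np.random.default_rng(3)
null_distribution = rng.normal(size=100_000)   # stand-in for a shuffled/baseline null
lo, hi = np.percentile(null_distribution, [2.5, 97.5])

responses = rng.normal(size=5_000)             # one noise "response" per neuron
excitatory = responses > hi
inhibitory = responses < lo
print(f"'excitatory': {excitatory.sum()}, 'inhibitory': {inhibitory.sum()} "
      f"out of {responses.size} noise neurons")
# About 5% of the neurons get labeled one way or the other by construction -
# no stimulus, and no understanding of the system, required.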

And voila – Nature paper, Science paper, Neuron paper, Current Biology paper, etc.

The procedure is exactly like trying to discover psychics among a group of people. First, you assume that some people are psychic. Then, you choose a decision criterion – on what basis will you classify certain people as psychic? You could, for example, ask them to guess the number on a playing card without looking, and classify the ones that got a certain proportion correct as psychic. The idea that some people are psychic wouldn’t need to be true for us to be able to classify some of our subjects as such. It wouldn’t matter that the idea violates the known laws of physics.
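
The psychic-hunting version of the same arithmetic, with all numbers made up for illustration:

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(4)
n_people, n_trials, p_chance = 1_000, 50, 1 / 13   # guess the rank of a hidden card

hits = rng.binomial(n_trials, p_chance, size=n_people)   # everyone guesses at chance
criterion = binom.ppf(0.95, n_trials, p_chance)          # "significantly above chance"
print(f"{(hits > criterion).sum()} of {n_people} chance guessers classified as 'psychic'")
# The procedure manufactures "psychics" whether or not psychic powers exist,
# just as post hoc response criteria manufacture "responsive" neurons.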

Similarly, it doesn’t matter that the idea of neurons as “detectors” “signalling” things (the notion is implicitly homuncular) via highish firing rates violates basic logic and known facts; post hoc correlation-fishing doesn’t care about fact, doesn’t care about logic, doesn’t care about truth. It’s a racket.

 

Image credit: Screenshot taken from video by shaihulud. 

Signal detection theory’s “Ideal observer”: A device for obscuring empirical failure.

Mr. Perfect: The Ideal Observer

The “signal detection theory” approach, despite the self-evident absurdity of its assumptions, forms part of the bone structure of contemporary neuroscience. The most recent example I’ve examined is a bioRxiv preprint from the Churchland lab at Cold Spring Harbor, and it got me thinking again about how the whole scheme works.

The title of the paper is “Lapses in perceptual judgments reflect exploration.” The term “lapses” (like the term “perceptual judgments”) is part of SDT terminology. Unpacking it helps reveal the way this cargo cult (and I’m not being dramatic) functions.

SDT’s concept of “lapse rate” is intrinsically tied to SDT’s concept of the “ideal observer.”  It took me a while to appreciate the role of this device; thinking about the Churchland paper helped bring it home.

The key to a successful pseudoscience – and SDT, with its sixty-year run so far, is certainly successful – is to immunize its assumptions from the challenges and insults of reality. This involves, first, a willful blindness to logical contradictions and contradictions with fact – but that is not the topic of this post – and, second, an arrangement of experimental practices and analytical techniques that eternally ensure a superficial consistency with “prediction.”

One of the safest ways, of course, to ensure that your predictions are consistent with the results of experiment is to make predictable predictions.

The basic “prediction” of SDT is that data will have a sigmoid shape. Obviously, data having such a form can mean a million different things – it depends on the conditions, the manipulations, etc. For SDT’ers studying the brain, it reflects the effects of very specific assumptions about brain function (which I’ve described in the link above).

The sigmoid shape of the data is easily achieved by setting up a situation where subjects are required to make a binary choice, and in which giving the “correct” answer becomes more and more difficult. If you set up such a situation, and then refer to the stimuli that produce correct responses as the “preferred (by neurons) stimuli” and the more ambiguous stimuli as “noisy” and the stimulus set as a whole as “sensory evidence” consisting of different “stimulus levels” then congratulations, you can be an SDT’er in good standing! Your results may now be interpreted via the assumptions of SDT.

Wait, there’s another step. Because subjects are being forced to make a choice – regardless of whether that choice reflects any relevant experience (due to stimulus ambiguity) – SDT practitioners realized that subjects might be guessing! If subjects are guessing, SDT’ers reason, then the proportion of “correct” answers will be 50%. So, prior to their experiments, SDT’ers adjust their basic stimuli and conditions to produce some particular level of correct choices, somewhere between 65% and 80%. Raposo et al (2012), for example (a paper to which the “Lapse rate” paper refers us for detailed methods), states that:

“The amplitude of the events was adjusted for each subject so that on the single-sensory trials performance was 70–80% correct and matched for audition and vision.”

For some reason, such figures are treated by SDT’ers as not subject to statistical uncertainty – that is, they never seem to report having conducted a test to show that their 75% (or whatever) figure is statistically different from chance.
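
For what it’s worth, such a check takes a couple of lines. Here is what it might look like for a 75%-correct figure against 50% guessing, with a made-up trial count:

from scipy.stats import binomtest

n_trials, n_correct = 40, 30                     # 75% correct; trial count is invented
result = binomtest(n_correct, n_trials, p=0.5)   # two-sided test against guessing
ci = result.proportion_ci(confidence_level=0.95)
print(f"p = {result.pvalue:.4f}, 95% CI for proportion correct: "
      f"[{ci.low:.2f}, {ci.high:.2f}]")
# Even when the figure clears chance, the interval around "75%" is wide at
# realistic trial counts - which is exactly why the test ought to be reported.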

Now we’re all prepped to ensure we’ll get sigmoid-ish curves in our “experiments,” and will be free to apply the neurons-as-noisy-detectors, binary-probability-distribution-comparison explanation to our data.

BUT. Even with such conscientious preparation, data don’t tend to come out quite in line with SDT prediction. Something about the tails… Even turning a blind eye to the basic absurdity of its assumptions, empirical failures might make an SDT’er stop and think. And indeed, they did… and concluded that the theory was right and the data were wrong. Fortunately, math comes to the rescue to correct the curves, in the form of Mr. Perfect, the “ideal observer.”

The ideal observer is an imaginary subject who would produce the results SDT “predicts,” and thus (superficially) license the claim that subjects’ choices are based on comparing firing rates of individual neurons with distributions of potential firing rates of these neurons. (The fact that this scenario doesn’t correspond to any aspect of subjects’ experience doesn’t seem to matter).

Unlike the ideal observer, real subjects are sloppy, biased, whatever, and tend to “lapse,” producing undesirable curves. As the Churchland team writes:

“In practice, the shapes of empirically obtained psychometric curves do not perfectly match the ideal observer [i.e. don’t match prediction] since they asymptote at values that are less than 1 or greater than 0. This is a well known phenomenon in psychophysics (Wichmann and Hill, 2001), requiring two additional lapse parameters to precisely capture the asymptotes.”

A few free parameters later, the assumptions of SDT with respect to the basic mechanism underlying the “data” remain intact. Now, proposals for additional mechanisms may be layered on top to “explain” deviations in the “data.”
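
For the record, the standard parameterization behind those “two additional lapse parameters” (the form discussed by Wichmann and Hill, 2001) looks like this; the parameter values below are purely illustrative:

import numpy as np

def psychometric(x, alpha, beta, gamma, lam):
    """Sigmoid with threshold alpha and slope beta, plus lower/upper lapse rates."""
    F = 1.0 / (1.0 + np.exp(-(x - alpha) / beta))   # the "ideal" sigmoid
    return gamma + (1.0 - gamma - lam) * F          # lapses pull the asymptotes inward

x = np.linspace(-3, 3, 7)                           # "stimulus levels"
ideal = psychometric(x, alpha=0.0, beta=0.5, gamma=0.0, lam=0.0)
lapsing = psychometric(x, alpha=0.0, beta=0.5, gamma=0.05, lam=0.08)
print("ideal observer:", np.round(ideal, 3))        # asymptotes at ~0 and ~1
print("with lapses   :", np.round(lapsing, 3))      # asymptotes at ~0.05 and ~0.92
# The two extra free parameters absorb the mismatch between data and "prediction"
# without the underlying assumptions ever being put at risk.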

Notice that it doesn’t matter whether the assumptions underlying the “ideal observer” concept are true or false. They’re never tested, just taken for granted, with mathematical fixes taking care of prediction failures, which occur despite strenuous preliminary stage-managing.

SDT’ers are apparently not confident enough about the lapse rate concept for it to rate a mention on Wikipedia, and discussion is very patchy. The Churchland team runs with it, though, and proposes a mechanism for the deviations of their “data” from prediction. The proposal is, of course, silly, but that’s not the subject of this post.


Adelson & Movshon’s (1982) “Phenomenal coherence of moving visual patterns,” a classic in an ongoing pseudoscientific tradition in motion perception


Adelson & Movshon’s 1982 Nature paper, “Phenomenal coherence of moving visual patterns,”  cited to this day, is a good illustration of the difference between science and pseudoscience.

The brand of pseudoscience I will be discussing here has several basic features.

First, it draws general inferences from post hoc observations of special cases while studiously ignoring cases that contradict these inferences, as well as logical inconsistencies among its theoretical assumptions. The chief characteristic of real science – the ability to make forward-looking predictions that put its assumptions to the test – is lacking. Pseudoscience is a riskless game. When stories built around a set of special cases are perpetuated and elaborated over generations, then we are dealing with a pseudoscientific tradition.

Before going on to show that Adelson & Movshon and their successors are part of such a pseudoscientific tradition, I want to offer an imaginary analogue of how they operate.

Imagine that Martians come to Earth and discover a cylindrical object floating in a pond filled with fish. It happens to be what we call a cork. They then proceed to construct an account of why this object floats. Their explanation hinges on certain features of the particular object – e.g. that it is cylindrical, that it has a little nick on one side, that it is tan-colored, that it has a particular texture, that it contains a molecule with a particular structure – and also on certain features of the pond, e.g. that it contains fish. They write up their report, successfully submit it to Martian Nature and advance in their careers.

Naturally, an investigation of a wider set of objects and conditions would expose the Martians’ mistake in treating incidental features of the cork and pond as causal factors underlying the phenomenon of interest. To protect their pet “theory,” the Martians make sure that future studies of flotation always revolve around corks and ponds, on which they make more and more post hoc observations referencing more and more incidental features and around which they construct ever more elaborate ad hoc stories. When a colleague asks, “What about that floating light bulb over there?” they pretend not to hear, or smugly describe their ad hoc theories as “partial explanations.”

It should be obvious that constructing robust, general explanations of why objects float would require much more time, effort, ingenuity and a wider field of view than concocting casual explanations based on easily observed or incidental features of particular cases while failing to examine these explanations critically or acknowledge contradictory evidence. It should also be evident that the stories constructed by these Martians will not figure additively in the construction of the true explanation.

Why the obsession with Gabor patches?

If you’ve ever wondered why vision science has for decades seemed almost exclusively interested in stimuli consisting of circular areas of light and dark stripes – the famous “Gabor patches” – it is for this reason: to provide perpetual, if thinly cosmetic, cover for the simplistic, fragile, irrational ad hoc stories built around these forms.

The story Adelson & Movshon are offering about motion perception in this paper is such a story.

They begin their abstract by stating that:

When a moving grating is viewed through an aperture, only motion orthogonal to its bars is visible…

The statement is false; it only applies to a narrow set of conditions. A more complete picture is described in a 1935 text cited by Adelson & Movshon:

“There exists a tendency for line motion to be perpendicular to line orientation, and also a tendency to perceive motion along one of the cardinal directions of space. Above all, however, the perceived direction of motion runs as parallel as possible to the direction of the edge of the aperture that the line happens to intersect. If the two aperture edges that are intersected simultaneously have different orientations, then the line pursues an intermediate direction; being a subject of psychophysical ‘self-organization’…” (Wallach, 1935).

Wallach’s is a much more subtle and complex description with profound and difficult theoretical implications.

 Adelson and Movshon prefer the more easily digestible, ad hoc version.

Their claim is descriptively correct when the “aperture” is circular. If we were to change the shape of the aperture – to make it rectangular, for example – then Wallach’s general statement would remain true, but A & M’s would fail. To put it another way: Wallach’s claim applies as a general principle over all known cases; Adelson & Movshon’s claim is a description of a special case. General hypotheses founded on it cannot be seriously entertained as explanatory. They are not robust. Scientists respect the phenomena, take on challenges, and actively look for weaknesses in their accounts; pseudoscientists avoid challenges and turn a blind eye to contradictions.

The use of circles and the false statements about the nature of aperture motion continue. We may find the same unqualified claim about orthogonality in a review on motion perception and the brain by Born & Bradley (2005), who state simply:

A moving edge seen through an aperture appears to move perpendicularly to itself…

A very recent example of a study employing circular apertures and treating orthogonal motion as the general case is Junxiang et al (2019) in the Journal of Neuroscience.

So, a false, simplistic assumption about motion perception has been imported into the neuroscience age and serves as the basis for explanations about brain function; and since neuroscience as a whole has adopted the post hoc/ad hoc approach to theorizing – as a result suffering from a replication crisis that shows no signs of abating – and has adopted Orwellian language to hide its barrenness, this false assumption remains safe in its pseudoscientific cocoon, and the pseudoscientific tradition it underpins remains strong and has even colonized new lands.

More on Gabors

The stimuli Adelson and Movshon employ (and which are employed generally under the label “Gabor patches”) are not “gratings” consisting of simple lines or bars, with solid black areas alternating with solid white areas. Their luminance varies gradually from light to dark.

The use of these patterns is linked to another pseudoscientific notion, this one quite bizarre, irrational and, of course, unsupported. It is the idea, to quote Adelson and Movshon, that “visual analyzing mechanisms selective for…spatial frequency exist at a relatively early stage in the visual pathway.”

The idea that mechanisms analyzing “spatial frequency” of patterns exist at any stage of the visual process is patently absurd. I’ve addressed the reasons why in a separate blog post; among them is the fact that there is no possible utility to such a function, nor is there any conceivable mechanism by which it could be achieved, given that this “analysis” first requires synthesis of the point stimulation on the retina.

The assumption was protected by, first and foremost, being deaf to reason and restricting experimental activity to manipulations of the features of this narrow category of patterns, describing effects post hoc, and drawing ad hoc inferences presented as general principles.

This is exactly what Adelson and Movshon do here. They are drawing various technical-sounding inferences by acting like the Martians messing around with corks and ponds and pretending to make general discoveries about the nature of floating objects. If you pointed to a floating feather, or a floating light bulb, or a floating corpse, they would be at an utter loss for words. Similarly, if you pointed Adelson and Movshon, even today, to known facts of motion perception – you can read about them in an online translation of a chapter in an 82-year-old text (Metzger, 1937, translation made available by Brandeis University) – these practitioners of pseudoscience would be at a total loss. They would do what they’ve always done: simply look away. It’s worked well so far.

Update #1: Pseudoscientists never lose

It occurred to me after finishing this post that there is a very well-known effect that contradicts the “orthogonal motion” claim, and this is the barberpole illusion. Again, it’s the nature of pseudoscience that gross contradictions go unnoticed or politely ignored. 

Update #2: Pseudoscientists never lose redux

Two recent articles illustrate the way that the dogma of “spatial frequency tuning” is protected by its followers. In “Mechanisms of Spatiotemporal Selectivity in Cortical Area MT” (2019/Neuron), Pawar et al describe how this selectivity is contingent on a variety of stimulus “features”:

“…even interactions among basic stimulus dimensions of luminance contrast, spatial frequency, and temporal frequency strongly influence neuronal selectivity. This dynamic nature of neuronal selectivity is inconsistent with the notion of stimulus preference as a stable characteristic of cortical neurons.”

Even when results are all over the place, the spatial tuning concept remains in place – it is now merely described as “unstable.” The idea that neurons are “signalling” spatial frequency via their firing rate but that that firing rate is contingent on a bunch of other factors is even more senseless than the simpler notion.

In “Single-Neuron Perturbations Reveal Feature-Specific Competition in V1,” (2019/Nature), Chettih & Harvey also find instability in their desired correlations: 

“Precise levels of decoding accuracy were variable from experiment to experiment, depending on the number and tuning of imaged cells as well as overall signal quality. …This is of note because the tuning bias also causes different grating orientations to be more or less likely to be matched to the tuning preferences of photostimulated neurons.”

Again, the fundamental tenet that neurons are tuned to “spatial frequency,” as irrational as it is, is never questioned, despite needing to be qualified beyond recognition.

Naturally, the data in both papers are correlation-fished and fitted via assumptions – such as linear models, Gaussian “priors” – chosen because they make the math easy, not because of any rationale. None of the authors seems to have considered how their mathematical acrobatics and probability functions can illuminate how we see that elephant over there.


Nature Neuroscience Starts Year Strong with Correlation-Fishing from Yale, Mount Sinai


I just wanted to highlight another of my many PubPeer comments. The paper is called “Neural computations of threat in the aftermath of combat trauma,” and it’s by P. Homan and ten other authors, all but one affiliated with either Yale University or New York’s Mount Sinai. I’m picking on the big names, for obvious reasons. The connection with medical issues makes the scientifically invalid nature of the project especially disturbing. My edited PubPeer comment is reproduced below. (To anyone who might object that my comments are redundant, I’ll admit it, it’s largely true. All neuroscience pubs today are largely the same paper; all post hoc, all analyzed casually and formulaically. The goal is to make a little noise over it.)

This is a correlational study. As such, it is subject to the stricture that “correlation doesn’t mean causation.” There is no way around this; no amount or kind of statistical analysis can make post hoc observations of correlations, regardless of p-value, reliable, because they do not and cannot identify confounds.

As Mehler and Kording (2018) have noted, and as is done by Homan et al, practitioners in the field of neuroscience routinely and inappropriately employ causal language to describe correlational findings. From Mehler and Kording:

“[C]ausal statements us[e] filler verbs (e.g. to drive, alter, promote) as a form of storytelling (Gomez-Marin, 2017); …this is a major mistake, which [causes] the field to misrepresent its findings.” In other words, to lie.

For example, if investigators observe that during a point in a procedure when a particular stimulus x was present (or a particular neuron x or brain area x was especially active), activity was simultaneously (or successively) higher in neuron y, they may describe neuron y, or brain area y, as having been “driven” by the stimulus, neuron, etc. This is inappropriate; it is part and parcel of the post hoc fallacy to overlook the fact that, in a complex system under complex conditions, both internal and external, there is a universe of confounds that we could have pointed to as our “causal” explanation. Here is how Mehler and Kording put it:

“Our argument about the impossibility to obtain causality from functional connectivity [I think the term functional here is misused] …rests on a simple consideration of the factors that are known to make causal inference theoretically impossible…we record only few of the neurons when recording spikes, or a few projections of neural activities in imaging. We have no convincing reasons to assume that the observed dimensions should be more important than the unobserved. …if the bulk of causal interactions happen from and within unobserved dimensions, then the correlations between observed variables are simply epiphenomena. Correlation is not causation, regardless the mathematical sophistication we use when calculating it. Causal inference algorithms that work with observational data are generally built on the assumption of causal sufficiency, which boils down to there being no unobserved confounders (although see Ranganath and Perotte, 2018; Wang and Blei, 2018). … Recording only few variables in a densely interacting causal system generally renders causal inference impossible (Jonas Peters et al., 2017; Pearl, 2009a). When analyzing spike data, there are far more unobserved variables than observed variables. When we record a few hundred neurons (Stevenson and Kording, 2011), the number of recorded neurons is a vanishingly small subset of all neurons. We have little reason to assume that the recorded neurons are much more important than the un-recorded neurons. …the confounding signal should be many orders of magnitude more important than those coming from observed data. ….When analyzing imaging data such as fMRI, or LFP, EEG, or MEG, there are also far more unobserved variables than observed variables. Within each signal source, we can, in some abstraction, observe the sum of neural activity. …The signals that we can measure are arbitrarily low-dimensional relative to the full dimensionality of communication in the brain. As such we are still in the situation where we have a number of confounders that is many orders of magnitude larger than the number of measured variables. This again puts us into the domain where causal inference should be impossible.
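
The core of that argument fits in a few lines. In this toy sketch (mine, not Mehler and Kording’s), the two recorded “neurons” never influence one another at all; they merely share an input that was never recorded:

import numpy as np

rng = np.random.default_rng(5)
n_timepoints = 2_000
unrecorded_input = rng.normal(size=n_timepoints)   # the confounder nobody measured

neuron_a = unrecorded_input + 0.5 * rng.normal(size=n_timepoints)
neuron_b = unrecorded_input + 0.5 * rng.normal(size=n_timepoints)

r = np.corrcoef(neuron_a, neuron_b)[0, 1]
print(f"correlation between the two recorded neurons: r = {r:.2f}")
# Seeing only neuron_a and neuron_b, nothing in that correlation tells you whether
# a drives b, b drives a, or (as here) neither drives the other.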

The type of storytelling Mehler and Kording describe is the type of storytelling we find in Homan et al. Their language throughout implies causality:

“enhanced sensitivity to prediction errors was partially mediated by the striatal associability computations,”

“we found that the neural computations that were shaped by these altered prediction-error weights”

“decreased neural tracking of value in the amygdala…”

“These results provide evidence for distinct neuro-computational contributions to PTSD symptoms.” No, they really don’t.

It’s not as though the authors are really unaware of what they’re doing (being unaware in itself would be bad enough):

“While our results do not allow us to draw causal inferences, our data do support the notion that veterans may develop more severe PTSD symptoms in response to altered neural computation of value and associability in several brain regions.”

The above statement is frankly paradoxical. The authors initially admit that their results “do not allow us to draw causal inferences;” and immediately contradict themselves by suggesting causal inferences may be drawn from their data. Of course, these inferences are inevitably trivial, because atheoretical, correlational studies produce nothing more than a hodge-podge of uninterpretable outcomes, allowing only the most trivial general interpretations:

“These results provide evidence for distinct neurocomputational contributions to PTSD symptoms.” Would anyone imagine that processes underlying different (or even the same!) symptoms of any human experience would be identical? The bar is so low no study was even necessary to achieve it. This is also why the title of the paper is so uninformative.

Homan et al’s post hoc analyses of their observational data also involve major ad hoc assumptions. For example:

“Following classic computational learning theory(31), we assumed a deterministic learning model and a probabilistic observation model to describe the generation of our data.”

The use of terms like “classic” as arguments for the validity of procedures is widespread and just as inappropriate as treating correlations as causal relationships. Never-validated notions aren’t like fine wine – they don’t mature with age. On the contrary, even corroborated scientific hypotheses tend to lose their shine as time goes by. In addition, what we know about living processes makes it a sure thing that brain processes are not “deterministic.” So readers may draw their own conclusions about the thinking that preceded data collection and analysis here. (I would add that major assumptions should be aired and justified in the exposition of a paper, not buried in the methods section as though they were a trivial matter.) Earlier in the paper we also encounter a reference to a “prominent” learning “theory” – again, popularity is commonly used as a proxy for empirical validity.

As it happens, the citation given for the “classic learning theory” seems inappropriate as well, having been published in 2016. The article is titled “POSTTRAUMATIC STRESS SYMPTOMS AND AVERSION TO AMBIGUOUS LOSSES IN COMBAT VETERANS.” Relevant quote: “Behavioral economics provides a framework for studying these mechanisms at the behavioral level. Here, we utilized this framework…” So the “classic” learning paradigm for this brain study was apparently lifted straight out of economics.

We also learn that the authors “used a hybrid version of the Pearce– Hall learning model to estimate the computations performed during associative threat learning (7–9) and how the behavioral and neural tracking of these computations relate to PTSD symptom severity.”

“…we used a hybrid version…” With what rationale? It is truly remarkable that a “scientific” study exploring the most complex object known (the brain) can simply pull a “model” out of a hat, subject correlational data to it, and publish the resulting “story” in a major journal as though it meant something.

“The threshold for this analysis was set at P < 0.05, two-tailed.”

The use of the p<.05 criterion has been almost universally censured for producing too many false positives. (Of course all “findings” in a correlational study must be treated as false positives, but at least for appearances the criterion could be tightened…but perhaps that would produce too few false positives….).

“The relationship between latent learning parameters (see below) and PTSD symptomatology was estimated with a linear regression model.”

As Kording has also noted, “The brain is non-linear and recurrent and we use techniques that have been developed for linear systems and we just use them because we can download the code and it’s easy.” Is this why Homan et al used it? It would be nice to know if there was a better reason.
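
To make the worry concrete, here is a toy sketch (not Homan et al.’s analysis): feed a plainly nonlinear relationship to a linear regression and it still hands back tidy coefficients and a p-value, with nothing in the output flagging that the model form was wrong.

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(6)
x = np.linspace(-2, 2, 200)
y = x**2 + 0.2 * rng.normal(size=x.size)   # a plainly nonlinear relationship

fit = linregress(x, y)
print(f"slope = {fit.slope:.3f}, p = {fit.pvalue:.3f}, r^2 = {fit.rvalue**2:.3f}")
# The linear model captures essentially none of the structure (slope ~ 0, r^2 ~ 0),
# yet the procedure runs to completion and returns numbers ready to be reported.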

“Data distribution was assumed to be normal but this was not formally tested.” Good enough.

“We assumed the likelihood of each trial’s SCR Sn to be an independent and identically distributed Gaussian distribution around a mean determined by value, associability or the combination of both value and associability…” Why not? In a scientific study you’re allowed to assume anything you want.

“For the Rescorla–Wagner model, individual parameters were assumed to be drawn from group-level normal distributions. Normal and half-Cauchy distributions were used for the priors of the group-level means and standard deviations, respectively.” Why not? Anything goes.

….. Looking a little more closely at the results section, the data-fishing quality of this paper really shines through. It appears that it began with complete failure; no differences were found with respect to task performance:

“Irrespective of symptoms, veterans show successful reversal learning. Combat-exposed veterans (N=54 participants) successfully acquired and reversed threat conditioning, as assessed by the differential SCR (face A versus face B) in the two phases of the task (Fig. 1b). To test for a potential relationship between threat reversal and PTSD symptoms, we used a linear regression with threat reversal index as predictor and CAPS scores as the outcome. Reversal index was calculated by subtracting stimulus discrimination in reversal (that is, face A minus face B) from stimulus discrimination in acquisition (Fig. 1b). Controlling for irrelevant variables (age and gender), the regression revealed no significant relationship between symptoms and reversal learning (β=0.02, t(50)=0.13, two-tailed, P = 0.894). We also did not find evidence that PTSD symptoms were related to stimulus discrimination during threat acquisition only (β=0.03, t(52)=0.22, two-tailed, P=0.827) or during the reversal phase only (β = 0.02, t(52) = 0.12, two-tailed, P = 0.901). Additional ways of categorizing veterans as highly and mildly affected did not reveal any significant results(see Methods, ‘Sample characteristics’). These results motivate the [a post hoc fishing expedition using casually-adopted ad hoc assumptions via] a computational approach that could potentially reveal latent learning differences across individuals exposed to combat trauma.” Potentially? Even that’s too strong.

And so it began.