Adelson & Movshon’s (1982) “Phenomenal coherence of moving visual patterns,” a classic in an ongoing pseudoscientific tradition in motion perception


Adelson & Movshon's 1982 Nature paper, "Phenomenal coherence of moving visual patterns," still cited to this day, is a good illustration of the difference between science and pseudoscience.

The brand of pseudoscience I will be discussing here has several basic features.

First, it draws general inferences from post hoc observations of special cases while studiously ignoring cases that contradict these inferences, as well as logical inconsistencies within and among its theoretical assumptions. The chief characteristic of real science – the ability to make forward-looking predictions that put its assumptions to the test – is lacking. Pseudoscience is a riskless game. When stories built around a set of special cases are perpetuated and elaborated over generations, then we are dealing with a pseudoscientific tradition.

Before going on to show that Adelson & Movshon and their successors are part of such a pseudoscientific tradition, I want to offer an imaginary analogue of how they operate.

Imagine that Martians come to Earth and discover a cylindrical object floating in a pond filled with fish. It happens to be what we call a cork. They then proceed to construct an account of why this object floats. Their explanation hinges on certain features of the particular object – e.g. that it is cylindrical, that it has a little nick on one side, that it is tan-colored, that it has a particular texture, that it contains a molecule with a particular structure – and also on certain features of the pond, e.g. that it contains fish. They write up their report, successfully submit it to Martian Nature and advance in their careers.

Naturally, an investigation of a wider set of objects and conditions would expose the Martians' mistake in treating incidental features of the cork and pond as causal factors underlying the phenomenon of interest. To protect their pet "theory," the Martians make sure that future studies of flotation always revolve around corks and ponds, on which they make more and more post hoc observations referencing more and more incidental features and around which they construct ever more elaborate ad hoc stories. When a colleague asks, "What about that floating light bulb over there?" they pretend not to hear, or smugly describe their ad hoc theories as "partial explanations."

It should be obvious that constructing robust, general explanations of why objects float would require much more time, effort, ingenuity and a wider field of view than concocting casual explanations based on easily observed or incidental features of particular cases, while failing to examine these explanations critically or to acknowledge contradictory evidence. It should also be evident that the stories constructed by these Martians will not figure additively in the construction of the true explanation.

Why the obsession with Gabor patches?

If you've ever wondered why vision science has for decades seemed almost exclusively interested in stimuli consisting of circular areas of light and dark stripes – the famous "Gabor patches" – this is the reason: to provide perpetual, if thinly cosmetic, cover for the simplistic, fragile, irrational ad hoc stories built around these forms.

The story Adelson & Movshon are offering about motion perception in this paper is such a story.

They begin their abstract by stating that:

When a moving grating is viewed through an aperture, only motion orthogonal to its bars is visible…

The statement is false; it only applies to a narrow set of conditions. A more complete picture is described in a 1935 text cited by Adelson & Movshon:

There exists a tendency for line motion to be perpendicular to line orientation, and also a tendency to perceive motion along one of the cardinal directions of space. Above all, however, the perceived direction of motion runs as parallel as possible to the direction of the edge of the aperture that the line happens to intersect. If the two aperture edges that are intersected simultaneously have different orientations, then the line pursues an intermediate direction; being a subject of psychophysical 'self-organization'…" (Wallach, 1935).

Wallach’s is a much more subtle and complex description with profound and difficult theoretical implications.

 Adelson and Movshon prefer the more easily digestible, ad hoc version.

Their claim is descriptively correct when the “aperture” is circular. If we were to change the shape of the aperture – to make it rectangular, for example – then Wallach’s general statement would remain true, but A & M’s would fail. To put it another way: Wallach’s claim applies as a general principle over all known cases; Adelson & Movshon’s claim is a description of a special case. General hypotheses founded on it cannot be seriously entertained as explanatory. They are not robust. Scientists respect the phenomena, take on challenges, and actively look for weaknesses in their accounts; pseudoscientists avoid challenges and turn a blind eye to contradictions.
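
To see the underlying stimulus ambiguity for yourself – and why the perceived direction has to be supplied by something other than the stimulus itself, such as the shape of the aperture that Wallach emphasized – here is a minimal numerical sketch (my own illustration, not taken from either paper): for a one-dimensional grating, any two velocities that share the same component perpendicular to the bars produce physically identical frames.

```python
import numpy as np

# A vertical sinusoidal grating translated by two different velocities that share
# the same component perpendicular to the bars. Because the grating is constant
# along its bars, the parallel component leaves every frame unchanged.
spatial_freq = 0.1                      # cycles per pixel (illustrative value)
X, Y = np.meshgrid(np.arange(200), np.arange(200))

def grating(xx, yy):
    # Luminance depends only on the coordinate perpendicular to the bars.
    return np.sin(2 * np.pi * spatial_freq * xx)

def frame(t, vx, vy):
    # The whole pattern translated by (vx*t, vy*t).
    return grating(X - vx * t, Y - vy * t)

frames_a = [frame(t, vx=2.0, vy=0.0) for t in range(10)]   # purely horizontal motion
frames_b = [frame(t, vx=2.0, vy=5.0) for t in range(10)]   # strongly oblique motion

print(all(np.allclose(a, b) for a, b in zip(frames_a, frames_b)))   # True
```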

The use of circles and the false statements about the nature of aperture motion continue. We may find the same unqualified claim about orthogonality in a review on motion perception and the brain by Born & Bradley (2005), who state simply:

A moving edge seen through an aperture appears to move perpendicularly to itself…

A very recent example of a study employing circular apertures and treating orthogonal motion as the general case is Junxiang et al (2019/Journal of Neuroscience).

So, a false, simplistic assumption about motion perception has been imported into the neuroscience age, where it serves as the basis for explanations of brain function. And since neuroscience as a whole has adopted the post hoc/ad hoc approach to theorizing – suffering as a result from a replication crisis that shows no signs of abating – and has adopted Orwellian language to hide its barrenness, this false assumption remains safe in its pseudoscientific cocoon, and the pseudoscientific tradition it underpins remains strong and has even colonized new lands.

More on Gabors

The stimuli Adelson and Movshon employ (and which are employed generally under the label "Gabor patches") are not "gratings" consisting of simple lines or bars, with solid black areas alternating with solid white areas; their luminance varies gradually and smoothly from light to dark.
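
For concreteness, here is a minimal sketch (mine, purely illustrative) of how such a stimulus is typically constructed: a sinusoidal luminance grating multiplied by a Gaussian window – hence "Gabor patch."

```python
import numpy as np

def gabor_patch(size=256, spatial_freq=0.05, orientation_deg=45.0, sigma=40.0, phase=0.0):
    """Sinusoidal grating windowed by a Gaussian envelope (values roughly in [-1, 1])."""
    half = size // 2
    x, y = np.meshgrid(np.arange(-half, half), np.arange(-half, half))
    theta = np.deg2rad(orientation_deg)
    # Coordinate along the grating's direction of modulation
    u = x * np.cos(theta) + y * np.sin(theta)
    grating = np.sin(2 * np.pi * spatial_freq * u + phase)   # smooth light/dark cycle
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))       # circular Gaussian window
    return grating * envelope

patch = gabor_patch()
print(patch.shape, round(patch.min(), 2), round(patch.max(), 2))
```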

The use of these patterns is linked to another pseudoscientific notion, this one quite bizarre, irrational and, of course, unsupported. It is the idea, to quote Adelson and Movshon, that "visual analyzing mechanisms selective for…spatial frequency exist at a relatively early stage in the visual pathway."

The idea that mechanisms analyzing “spatial frequency” of patterns exist at any stage of the visual process is patently absurd. I’ve addressed the reasons why in a separate blog post; among them is the fact that there is no possible utility to such a function, nor is there any conceivable mechanism by which it could be achieved, given that this “analysis” first requires synthesis of the point stimulation on the retina.

The assumption was protected, first and foremost, by being deaf to reason and by restricting experimental activity to manipulations of the features of this narrow category of patterns, describing effects post hoc, and drawing ad hoc inferences presented as general principles.

This is exactly what Adelson and Movshon do here. They are drawing various technical-sounding inferences by acting like the Martians messing around with corks and ponds and pretending to make general discoveries about the nature of floating objects. If you pointed to a floating feather, or a floating  light bulb, or a floating corpse, they would be at an utter loss for words. Similarly, if you pointed Adelson and Movshon, even today, to known facts of motion perception – you can read about them in an online translation of a chapter in an 82-year-old text (Metzger, 1937, translation made available by Brandeis University) – these practitioners of pseudoscience would be at a total loss. They would do what they’ve always done, simply look away. It’s worked well so far.

Update #1: Pseudoscientists never lose

It occurred to me after finishing this post that there is a very well-known effect that contradicts the “orthogonal motion” claim, and this is the barberpole illusion. Again, it’s the nature of pseudoscience that gross contradictions go unnoticed or politely ignored. 

Update #2: Pseudoscientists never lose redux

Two recent articles illustrate the way that the dogma of "spatial frequency tuning" is protected by its followers. In "Mechanisms of Spatiotemporal Selectivity in Cortical Area MT" (2019/Neuron), Pawar et al describe how this selectivity is contingent on a variety of stimulus "features":

“…even interactions among basic stimulus dimensions of luminance contrast, spatial frequency, and temporal frequency strongly influence neuronal selectivity. This dynamic nature of neuronal selectivity is inconsistent with the notion of stimulus preference as a stable characteristic of cortical neurons.”

Even when results are all over the place, the spatial tuning concept remains in place – it is now merely described as “unstable.” The idea that neurons are “signalling” spatial frequency via their firing rate but that that firing rate is contingent on a bunch of other factors is even more senseless than the simpler notion.

In “Single-Neuron Perturbations Reveal Feature-Specific Competition in V1,” (2019/Nature), Chettih & Harvey also find instability in their desired correlations: 

“Precise levels of decoding accuracy were variable from experiment to experiment, depending on the number and tuning of imaged cells as well as overall signal quality. …This is of note because the tuning bias also causes different grating orientations to be more or less likely to be matched to the tuning preferences of photostimulated neurons.”

Again, the fundamental tenet that neurons are tuned to “spatial frequency,” as irrational as it is, is never questioned, despite needing to be qualified beyond recognition.

Naturally, the data in both papers is correlation-fished and fitted via assumptions – such as linear models, Gaussian “priors” – chosen because they make the math easy, not because of any rationale. None of the authors seems to have considered how their mathematical acrobatics and probability functions can illuminate how we see that elephant over there.

Nature Neuroscience Starts Year Strong with Correlation-Fishing from Yale, Mount Sinai


I just wanted to highlight another of my many PubPeer comments. The paper is called "Neural computations of threat in the aftermath of combat trauma," and it's by P. Homan and ten other authors, all but one affiliated with either Yale University or New York's Mount Sinai. I'm picking on the big names, for obvious reasons. The connection with medical issues makes the scientifically invalid nature of the project especially disturbing. My edited PubPeer comment is reproduced below. (To anyone who might object that my comments are redundant, I'll admit it: it's largely true. All neuroscience pubs today are largely the same paper; all post hoc, all analyzed casually and formulaically. The goal is to make a little noise over it.)

This is a correlational study. As such, it is subject to the stricture that "correlation doesn't mean causation." There is no way around this; no amount or kind of statistical analysis can make post hoc observations of correlations, regardless of p-value, reliable, because they do not and cannot identify confounds.

As Mehler and Kording (2018) have noted, and as is done by Homan et al, practitioners in the field of neuroscience routinely and inappropriately employ causal language to describe correlational findings. From Mehler and Kording:

“[C]ausal statements us[e] filler verbs (e.g. to drive, alter, promote) as a form of storytelling (Gomez-Marin, 2017); …this is a major mistake, which [causes] the field to misrepresent its findings.” In other words, to lie.

For example, if investigators observe that during a point in a procedure when a particular stimulus x was present (or a particular neuron x or brain area x was especially active), activity was simultaneously (or successively) higher in neuron y, they may describe neuron y as having been "driven" by the stimulus, neuron or brain area x. This is inappropriate; it is part and parcel of the post hoc fallacy to overlook the fact that, in a complex system under complex conditions, both internal and external, there is a universe of confounds that we could have pointed to as our "causal" explanation. Here is how Mehler and Kording put it:

“Our argument about the impossibility to obtain causality from functional connectivity [I think the term functional here is misused] …rests on a simple consideration of the factors that are known to make causal inference theoretically impossible…we record only few of the neurons when recording spikes, or a few projections of neural activities in imaging. We have no convincing reasons to assume that the observed dimensions should be more important than the unobserved. …if the bulk of causal interactions happen from and within unobserved dimensions, then the correlations between observed variables are simply epiphenomena. Correlation is not causation, regardless the mathematical sophistication we use when calculating it. Causal inference algorithms that work with observational data are generally built on the assumption of causal sufficiency, which boils down to there being no unobserved confounders (although see Ranganath and Perotte, 2018; Wang and Blei, 2018). … Recording only few variables in a densely interacting causal system generally renders causal inference impossible (Jonas Peters et al., 2017; Pearl, 2009a). When analyzing spike data, there are far more unobserved variables than observed variables. When we record a few hundred neurons (Stevenson and Kording, 2011), the number of recorded neurons is a vanishingly small subset of all neurons. We have little reason to assume that the recorded neurons are much more important than the un-recorded neurons. …the confounding signal should be many orders of magnitude more important than those coming from observed data. ….When analyzing imaging data such as fMRI, or LFP, EEG, or MEG, there are also far more unobserved variables than observed variables. Within each signal source, we can, in some abstraction, observe the sum of neural activity. …The signals that we can measure are arbitrarily low-dimensional relative to the full dimensionality of communication in the brain. As such we are still in the situation where we have a number of confounders that is many orders of magnitude larger than the number of measured variables. This again puts us into the domain where causal inference should be impossible.

The type of storytelling Mehler and Kording describe is the type of storytelling we find in Homan et al. Their language throughout implies causality:

“enhanced sensitivity to prediction errors was partially mediated by the striatal associability computations,”

“we found that the neural computations that were shaped by these altered prediction-error weights”

“decreased neural tracking of value in the amygdala…”

“These results provide evidence for distinct neuro-computational contributions to PTSD symptoms.” No, they really don’t.

It’s not as though the authors are really unaware of what they’re doing (being unaware in itself would be bad enough):

“While our results do not allow us to draw causal inferences, our data do support the notion that veterans may develop more severe PTSD symptoms in response to altered neural computation of value and associability in several brain regions.”

The above statement is frankly paradoxical. The authors first admit that their results "do not allow us to draw causal inferences," and then immediately contradict themselves by suggesting that causal inferences may be drawn from their data. Of course, these inferences are inevitably trivial, because atheoretical, correlational studies produce nothing more than a hodge-podge of uninterpretable outcomes, allowing only the most trivial general interpretations:

“These results provide evidence for distinct neurocomputational contributions to PTSD symptoms.” Would anyone imagine that processes underlying different (or even the same!) symptoms of any human experience would be identical? The bar is so low no study was even necessary to achieve it. This is also why the title of the paper is so uninformative.

Homan et al’s post hoc analyses of their observational data also involve major ad hoc assumptions. For example:

“Following classic computational learning theory(31), we assumed a deterministic learning model and a probabilistic observation model to describe the generation of our data.”

The use of terms like "classic" as arguments for the validity of procedures is widespread and just as inappropriate as treating correlations as causal relationships. Never-validated notions aren't like fine wine – they don't mature with age. On the contrary, even corroborated scientific hypotheses tend to lose their shine as time goes by. In addition, what we know about living processes makes it a sure thing that brain processes are not "deterministic." So readers may draw their own conclusions about the thinking that preceded data collection and analysis here. (I would add that major assumptions should be aired and justified in the exposition of a paper, not buried in the methods section as though they were a trivial matter.) Earlier in the paper we also encounter a reference to a "prominent" learning "theory" – again, popularity is commonly used as a proxy for empirical validity.

As it happens, the citation given for the "classic learning theory" seems inappropriate as well, having been published in 2016. The article is titled "Posttraumatic stress symptoms and aversion to ambiguous losses in combat veterans." Relevant quote: "Behavioral economics provides a framework for studying these mechanisms at the behavioral level. Here, we utilized this framework…" So the "classic" learning paradigm for this brain study was apparently lifted straight out of economics.

We also learn that the authors “used a hybrid version of the Pearce– Hall learning model to estimate the computations performed during associative threat learning (7–9) and how the behavioral and neural tracking of these computations relate to PTSD symptom severity.”

“…we used a hybrid version…” With what rationale? It is truly remarkable that a “scientific” study exploring the most complex object known (the brain) can simply pull a “model” out of a hat, subject correlational data to it, and publish the resulting “story” in a major journal as though it meant something.
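
For readers wondering what such a "hybrid" amounts to in practice, here is a minimal sketch of the standard formulation found in this literature – a Rescorla–Wagner value update whose learning rate is scaled by an "associability" term that tracks recent surprise. The parameter values are my own illustrative choices, not Homan et al's.

```python
import numpy as np

def hybrid_pearce_hall(outcomes, kappa=0.3, eta=0.5, v0=0.5, alpha0=1.0):
    """Trial-by-trial value (V) and associability (alpha) for one stimulus.

    V[t+1]     = V[t] + kappa * alpha[t] * delta[t],   delta[t] = outcome[t] - V[t]
    alpha[t+1] = eta * |delta[t]| + (1 - eta) * alpha[t]
    """
    V, alpha = v0, alpha0
    values, associabilities = [], []
    for r in outcomes:
        values.append(V)
        associabilities.append(alpha)
        delta = r - V                                   # prediction error
        V = V + kappa * alpha * delta                   # associability-weighted update
        alpha = eta * abs(delta) + (1 - eta) * alpha    # "surprise" tracks |PE|
    return np.array(values), np.array(associabilities)

# Acquisition (shock on every trial) followed by reversal (no shock):
outcomes = np.array([1] * 10 + [0] * 10)
V, assoc = hybrid_pearce_hall(outcomes)
print(V.round(2))
print(assoc.round(2))
```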

“The threshold for this analysis was set at P < 0.05, two-tailed.”

The use of the p<.05 criterion has been almost universally censured for producing too many false positives. (Of course all “findings” in a correlational study must be treated as false positives, but at least for appearances the criterion could be tightened…but perhaps that would produce too few false positives….).

“The relationship between latent learning parameters (see below) and PTSD symptomatology was estimated with a linear regression model.”

As Kording has also noted, "The brain is non-linear and recurrent and we use techniques that have been developed for linear systems and we just use them because we can download the code and it's easy." Is this why Homan et al used it? It would be nice to know if there was a better reason.

“Data distribution was assumed to be normal but this was not formally tested.” Good enough.

“We assumed the likelihood of each trial’s SCR Sn to be an independent and identically distributed Gaussian distribution around a mean determined by value, associability or the combination of both value and associability…” Why not? In a scientific study you’re allowed to assume anything you want.

“For the Rescorla–Wagner model, individual parameters were assumed to be drawn from group-level normal distributions. Normal and half-Cauchy distributions were used for the priors of the group-level means and standard deviations, respectively.” Why not? Anything goes.

….. Looking a little more closely at the results section, the data-fishing quality of this paper really shines through. It appears that it began with complete failure; no differences were found with respect to task performance:

“Irrespective of symptoms, veterans show successful reversal learning. Combat-exposed veterans (N=54 participants) successfully acquired and reversed threat conditioning, as assessed by the differential SCR (face A versus face B) in the two phases of the task (Fig. 1b). To test for a potential relationship between threat reversal and PTSD symptoms, we used a linear regression with threat reversal index as predictor and CAPS scores as the outcome. Reversal index was calculated by subtracting stimulus discrimination in reversal (that is, face A minus face B) from stimulus discrimination in acquisition (Fig. 1b). Controlling for irrelevant variables (age and gender), the regression revealed no significant relationship between symptoms and reversal learning (β=0.02, t(50)=0.13, two-tailed, P = 0.894). We also did not find evidence that PTSD symptoms were related to stimulus discrimination during threat acquisition only (β=0.03, t(52)=0.22, two-tailed, P=0.827) or during the reversal phase only (β = 0.02, t(52) = 0.12, two-tailed, P = 0.901). Additional ways of categorizing veterans as highly and mildly affected did not reveal any significant results(see Methods, ‘Sample characteristics’). These results motivate the [a post hoc fishing expedition using casually-adopted ad hoc assumptions via] a computational approach that could potentially reveal latent learning differences across individuals exposed to combat trauma.” Potentially? Even that’s too strong.

And so it began.

 

Marinescu, Lawlor & Kording (2018) Roll Out Correlation’s New Clothes

In addition to his boxers, the King is plausibly dressed in a dark purple velvet, fur-trimmed robe.

A recent article by Marinescu, Lawlor & Kording (2018) in Nature Human Behaviour – "Quasi-experimental causality in neuroscience and behavioural research" – is hard to beat as an illustration of the uncoupling of contemporary science from the constraints of reason, fact and even language.

The authors' purpose is to introduce readers to three analytical techniques that, according to them, will allow researchers to use post hoc correlational observations to infer causal relationships. This is intriguing, because, as everyone knows, and as Kording himself has observed in his 2017 Waterloo Brain Day lecture, "correlational studies mean preciously little" if results don't "generalize" – going on to say that, in his experience, such studies almost never generalize.

It has been known since ancient times that post hoc observations don’t produce reliable predictions. For some reason, this was forgotten in recent decades, at least in practice. This amnesia led, predictably, to widespread and widely-discussed “replication” or “reproducibility” crises across all disciplines – it turns out that even contemporary science is not immune to the post hoc problem, that even a Nature-published Harvard study showing a correlation, say, between eating pink jelly beans and higher SAT scores might not actually mean that eating pink jelly beans will result in higher SAT scores.

Despite all of this, it appears that the imagination of contemporary practitioners can stretch no further than the correlational method (the hypothetico-deductive method, a proven tool of discovery, is not even entertained). We must find some tricks to make correlations reliable. Such tricks are what Marinescu et al are supposed to be proposing. Unfortunately, their examples and arguments are false without exception; all they achieve, in the end, is a linguistic trick – redefining correlations as "causal."

I’ll take the methods one by one.

The first is the “Regression Discontinuity Design.”

The authors tell us it is based on "thresholds." Making it clear from their example that the "thresholds" in question can be completely arbitrary, they go on to describe an experiment by Thistlethwaite, who, to study the effect of academic recognition on students, rewarded with certificates of merit and public recognition those who achieved test scores higher than a certain designated value. He then observed their future achievements, to see if they were correlated with the rewards.

The study contains many confounds. Marinescu et al claim that they may be managed because: “As we approach the threshold score from either side, the students will become arbitrarily [?] similar….This strategy, which is based on the idea that samples just above and just below the threshold are nearly indistinguishable, is the basis of RDD.”
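
To make the technique concrete, here is a minimal RDD-style calculation on simulated data (the threshold, the "award effect" and the bandwidth are arbitrary choices of mine, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
score = rng.uniform(0, 100, n)                 # test score (the "running variable")
award = (score >= 70).astype(float)            # certificate awarded above the threshold
# Simulated later outcome: depends on score, on the award, and on noise
outcome = 0.05 * score + 2.0 * award + rng.normal(0, 1, n)

# Naive RDD estimate: mean outcome just above vs. just below the threshold
bandwidth = 2.0
just_below = outcome[(score >= 70 - bandwidth) & (score < 70)]
just_above = outcome[(score >= 70) & (score < 70 + bandwidth)]
print("estimated award effect:", round(just_above.mean() - just_below.mean(), 2))
# Comes out near the simulated effect of 2.0 -- but only because the simulation
# itself builds in the assumption under dispute: that students on either side of
# the cutoff differ in nothing except the award.
```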

Are they right in saying this technique effectively removes the problem of confounds/sampling error? It is trivial to show that this is not the case.

We can easily imagine different combinations of underlying factors to be responsible for two students receiving even identical scores: One might be very smart but have studied little; another may be less gifted and have studied a lot. One may have gotten a lot of help from parents, or tutors; another none. One may have ambition with respect to the test topics, another not. One may not have been feeling well that day; may not have slept well, may be lovestruck; may test badly, or well. Marinescu et al seem to be saying that these confounders will be randomly distributed near the “threshold.” There is simply no rational basis for this claim.

To indicate the potential usefulness of their method, the authors claim that "thresholds exist widely in human behaviour and neuroscience…" It seems odd, then, that they offer no valid examples. In the Thistlethwaite example, we are obviously dealing with an artificial, not a natural, threshold. The other example provided refers to neurons; they claim that "The translation of neural drive to spiking has a firing threshold." There is no empirical basis for this claim.

The second technique treated is the “Difference-in-Difference” method.

Marinescu et al's description makes it clear that it is similar to the first in terms of an (in)ability to manage confounds, except by simply assuming that they don't exist. This is admitted by the authors: "The DiD approach naturally comes with its own assumptions and caveats…Most importantly, it assumes that the two groups are chosen such that they are similarly affected by relevant and perhaps unmeasured factors…" Again, it seems the authors are purporting to solve the confounding problem by simply assuming a lack of confounds.
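
For concreteness, the arithmetic of a DiD estimate in its simplest form (numbers invented for illustration):

```python
# Mean outcome for a "treated" and an "untreated" group, before and after
# some intervention (values made up for illustration).
treated_before, treated_after = 10.0, 15.0
control_before, control_after = 9.0, 11.0

# DiD: the treated group's change minus the control group's change.
did = (treated_after - treated_before) - (control_after - control_before)
print("difference-in-differences estimate:", did)   # 3.0

# Reading this 3.0 as the causal effect of the intervention requires the
# "parallel trends" assumption: that absent the intervention, both groups
# would have changed by the same amount -- i.e., that no unmeasured factor
# affected one group differently from the other.
```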

The third technique is “Instrumental variables.”

Again, the authors seem blind to obvious confounds in their example. They describe a study whose purpose was to determine the effects of maternal smoking on infant birth weight. Investigators decided that the correlation between tobacco taxes and birth weight would be a good proxy for the correlation of interest (better than looking directly at smoking habits and birth weights, apparently!). The rationale provided is that the tax rate criterion would eliminate the socio-economic status confound, and thus the health-of-the-mother confound. Is the underlying assumption that taxes "arguably affects smoking" reasonable? Does it eliminate sampling error? Does it reduce uninformative variability in the data? Does it shift a correlational relationship into the category of a causal relationship? The answer is clearly no. For one thing, the effect of taxes on cigarette consumption would clearly depend on socio-economic and health status. A poor mother might sacrifice food for cigarettes, or medicine for cigarettes, or suffer from withdrawal, etc., as a result of taxes; a rich mother might not change her habits at all. "Instrumental variable" seems to be simply a magic word that makes your correlations reliable: variables may be confounded; instrumental variables, by fiat, are not.
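
And for completeness, here is what the instrumental-variable calculation itself amounts to – a minimal two-stage least squares sketch on simulated data (variable names and coefficients are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000
tax = rng.normal(0, 1, n)                     # the proposed "instrument"
confound = rng.normal(0, 1, n)                # e.g. socio-economic / health status
smoking = -0.5 * tax + 0.8 * confound + rng.normal(0, 1, n)
birth_weight = -1.0 * smoking + 1.5 * confound + rng.normal(0, 1, n)

# Stage 1: predict smoking from the instrument alone.
b1 = np.polyfit(tax, smoking, 1)
smoking_hat = np.polyval(b1, tax)
# Stage 2: regress the outcome on the predicted (instrumented) smoking.
b2 = np.polyfit(smoking_hat, birth_weight, 1)
print("IV estimate of smoking effect:", round(b2[0], 2))   # close to -1.0

# The estimate is "clean" here only because the simulation stipulates that the
# tax affects birth weight solely through smoking and is unrelated to the
# confound -- exactly the untestable assumption the method takes on faith.
```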

For all three methods, the authors effectively acknowledge that they are not fit to purpose. In all cases, procedures rest on faith-based assumptions.

For the RDD, Marinescu et al refer to statistical methods for “checking whether the assumptions of RDD are valid…” and note that there are “many statistical issues to consider for the RDD…” Using statistical tests to test the validity of empirical assumptions always leads to an infinite regress, as each test in turn makes its own untestable assumptions.

With respect to the DiD, we learn, as quoted above, that the approach “naturally comes with its own assumptions and caveats…it assumes that the two groups are chosen such that they are similarly affected by relevant and perhaps unmeasured factors.” Again, it sounds like confounds in correlational studies are being “removed” merely by an act of faith. Why not apply this faith-based logic to any correlational method? Again, it is suggested that more statistical tests (again with their own untested assumptions) might be used to “support” the assumption that there are no relevant confounds.

With respect to the “Instrumental Variables” approach, we’re told that to perform an analysis we “first identify the independent and dependent variables [and] find, through an understanding of the system, another variable that can serve as an instrument…that only affects the independent variable.” Again, we are just allowed to assume that any relationship between the instrumental variable, the variable of interest, and our outcome is uncorrupted by confounding. As ever, “the Instrumental Variable approach also has its assumptions and caveats;” it assumes we can “exclude that the instrument affects the outcome other than through the independent variable.” But this assumption is not testable…so we assess it “on plausibility grounds”!! Plausibility criteria mean we casually assume knowledge about the thing we’re supposed to be investigating. Again, we are talking about acts of faith, not efforts at scientific reason and rigor.

So Marinescu et al have made a weak attempt to promote correlation as causality, apparently unaware that causal explanations are far more than merely reliable correlations, even if these were possible to find. Causal explanations are what allow us to bring about reliable correlations between conditions and effects; they rely on solid arguments – not quasi-arguments – and are subjected to rigorous empirical tests, not tests of "plausibility."

Russ Poldrack’s mixed messages to early career researchers

In a recent colloquium titled "The Costs of Reproducibility" (published in Neuron), Russ Poldrack focuses his attention on early career researchers who wish to do actual science, as opposed to the kind of activity that drove him and his colleagues to their academic perches.

To me, Poldrack sounds like a politician who feels your pain but doesn't really intend to help – someone who wants to appear to be on board with the – now unavoidable – demand that something be done about a field so broken it cannot generate reproducible effects, but who wants to do this without putting any real pressure on established players who have exploited, and continue to exploit, low standards of evidence to achieve a constant stream of (irreproducible) publications. He feels his colleagues' pain even more than that of ECRs.

An anonymous PubPeer contribution to the article thread, written after the original version of this blog post, aptly responded to a quote from Poldrack’s article:

“”However, there is also a great deal of concern about the potential impact of adopting these best practices on their careers, given that career incentives are not yet aligned with these practices.”

A terrifying sentence, since ‘best practices’ is here defined as “conducting reproducible research”. One could uncharitably render this sentence as saying “There is great concern among early years researchers that conducting actual scientific research could harm their careers in the field of Neuroscience””

As one of the founders of the “Stanford Center for Reproducible Neuroscience,” (yes, reproducible, not the other kind), funded by the Laura and John Arnold Foundation, Poldrack should be at the forefront of the movement to end the shabby practices that have rendered all of the expensive technological toys used in neuroscience scientifically impotent. He should be the one leading by example, censuring colleagues and journals when they stray. He should be challenging corruption at its most potent – at the top, starting with his own practices.

Instead, he chooses to put the onus on the weakest, most powerless group – early career researchers, would-be academics only a tiny proportion of whom have any chance of making it into a relatively safe harbor, and who are being "bullied (and educated) into bad science" by the ones with power.

The last line of the abstract says it all:

“I highlight the ways in which ECRs can achieve their career goals while doing better science and the need for established researchers to support them in these efforts.”

In other words, it's the ECRs who need to take the lead in improving the science; "established researchers" need only "support them." We're talking about the same established researchers who have performed, promoted and profited from bad practice, and continue to do so. Why isn't Poldrack "highlighting" the things they can do to improve science directly, rather than mildly urging their juniors to take the lead? Do established researchers get to go on with the bad science that got them their careers, while only "supporting" juniors trying to do better?

Which established researcher will “support” junior colleagues whose ethical stance only makes them look bad? Which junior researcher wants to take the chance of being kicked to the curb?

To be fair, in the first sentence of his very last paragraph, Poldrack does manage to suggest that his colleagues take some initiative – no, “aspire” to take some initiative – to improve themselves:

“Finally, we need to lead by example, aspiring to demonstrate for our trainees the kind of integrity that Feynman spoke of as the sine qua non of science.”

Finally. But in the next breath he pivots back to put the responsibility on the people with no money and no power (to quote comedian and social commentator Jimmy Dore), concluding that “it’s our responsibility to do everything we can to advance the careers of ECRs who are focused on scientific integrity.”

“Everything we can” turns out to be not much at all:

“When ECRs ask why they should have hope that their reproducible practices will pay off, I can only say that I am hopeful that the world is changing. Researchers of my generation, who are responsible for many hiring and tenure decisions at this point, are becoming increasingly aware of the reproducibility problem, and this is starting to flavor our decisions. However, it’s also worth realizing how challenging this kind of change is and admitting that there are no guarantees.”

Can ECRs compete with Russ Poldrack and still do good science?

Poldrack’s credibility as a spokesman for good science is undermined by his own commitments to weak practice.

Let's say, for example, a young researcher takes to heart the decades of sturm und drang over too-permissive, false-positive-producing p-values and the recent advice of 72 academics that, if adopting null-hypothesis testing, investigators use a significance criterion of p<.005. This researcher will obviously take longer to achieve a publishable result than someone like, say, Poldrack, who exploits the p<.05 level, still accepted at all the top journals. (Why?) In fact, not only does Poldrack not espouse a stricter standard, not only does he routinely exploit the permissive standard in his own work…In discussing the problem of "low power" in this article – "a problem throughout neuroscience (Button et al., 2013)" – he slips in this:

"[L]et's say…we are only measuring random noise. There will still be some number of false positives, which in standard null hypothesis testing is controlled at some low rate (usually 0.05)."

With this casual, deliberate and self-serving conflation of "standard" practice with good practice, Poldrack reveals his high-sounding references to scientific integrity to be mere window-dressing.

Poldrack continues:

“However, these are the only positive results that will occur, meaning that all the positive findings are false. As power increases, the proportion of true positive to false positive results increases, such that greater statistical power gives one greater confidence in the positive findings of a study.”

Is this true? Not in the case of his "random noise" example: if you are only measuring noise, there are no true positives for added "power" to reveal – every positive finding is a false positive, arriving at roughly the nominal rate no matter how large the sample. The confusion surrounding the substitution of post hoc statistical analyses for proper scientific arguments and predictions is infinite…
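
To make this concrete, here is a minimal simulation (mine, not Poldrack's) of "only measuring random noise" under standard null hypothesis testing: about 5% of tests come out "significant" at p < .05, every one of them a false positive, and increasing the sample size does nothing to change it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 2000

for n_per_group in (20, 200, 2000):                # increasing "power"
    false_positives = 0
    for _ in range(n_experiments):
        a = rng.normal(0, 1, n_per_group)          # pure noise: no real effect exists
        b = rng.normal(0, 1, n_per_group)
        _, p = stats.ttest_ind(a, b)
        false_positives += p < 0.05
    print(n_per_group, false_positives / n_experiments)   # ~0.05 at every sample size
```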

At any rate, Poldrack’s advice is to increase sample sizes, but, apparently, to keep the permissive p<.05 standard.

If all else fails, try theory

Bigger data is Poldrack’s main idea, but he also gives a nod to theory:

“Thus, another move for trainees without the ability to collect sufficient data is to pivot to theory.”

Apparently, Poldrack believes the field has been zipping along (irreproducibly) without theory all this time. This isn't actually true. Though there have been no valid theories, the (irreproducible) correlation-fishing that characterizes the field today is founded on specific assumptions about neural function, such that this fishing is treated as the mere uncomplicated counting of interpretable effects. That these interpretations are false is how we must interpret the failures to replicate, let alone generalize, results.

Poldrack acknowledges that researchers “like myself” built careers on “questionable practices;” but then he immediately lets those researchers off the hook by pretending they couldn’t have known:

“I would also point out that science is always a moving target, and that it’s inevitable that some of our practices will be found to be lacking as science moves forward.”

Understanding the need for valid, testable and tested theories, and the problem of confounds (which neuroscientists only now seem to be discovering, to judge from a recent paper by Mehler and Kording), is not the result of the forward movement of science; it has been understood for centuries. It's just more difficult than faking it, and our profit-based incentives favor faking it.

Hope for change; no guarantees, p<.05. Also, get a lot of followers on Twitter because “it will almost certainly help your chances of getting a second look that could help bump you onto the short list.”

Inspired and inspiring advice.

Neuroscience Newspeak, Or How to Publish Meaningless Facts

The Texas sharp-shooter fitting his data to his model: A post hoc definition of “sharp-shooting” (Cartoon by Dirk-Jan Hoek)

Mainstream neuroscience has long abandoned conceptual thinking in favor of isolated "correlation studies" that, as Konrad Kording has pointed out, mean "preciously little." At the same time, the language used to describe these futile exercises conveys an entirely different and misleading picture. Below I list some of the terms coopted and implicitly redefined by the field as essentially their opposites.

Predict

Among the most basic of these is "predict." We all know what it means in normal usage: to assert that some event will happen in the future. A successful prediction is one that actually comes to pass after we have claimed it will.

In neuroscience (and not only neuroscience), "predict" has acquired the opposite meaning. It means that you have, after the fact, "linked" your experimental conditions with your data via algorithms employing multiple free parameters and various untested, ad hoc assumptions, so that, for a particular study and only that study, you can plug in one of your independent variables and it will spit out (even if ever so approximately) your dependent variable. (This is my rough description; I believe it is accurate in principle.)

The “prediction,” in other words, is entirely retrospective and the methods to achieve it Procrustean. Not surprisingly, it is widely understood that these fitted “models” are not even capable of predicting the results of identical experiments, let alone generalizing with respect to any derived principles. Again, this is distinctly unsurprising; but Mehler and Kording (2018) feel the need to point out at length, for the benefit of contemporary neuroscientists (another euphemism, to tell the truth), that correlation doesn’t license claims about causation when confounds are legion.
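
Here is a minimal illustration (my own, with pure noise standing in for data) of the gap between this retrospective sense of "predict" and the ordinary one: a model with enough free parameters "predicts" the data it was fitted to quite well, and fails entirely on a repeat of the same experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_params = 50, 20
X = rng.normal(size=(n, n_params))            # arbitrary "regressors"
y_fit = rng.normal(size=n)                    # pure noise "data"
y_new = rng.normal(size=n)                    # a repeat of the same (noise) experiment

beta, *_ = np.linalg.lstsq(X, y_fit, rcond=None)   # fit 20 free parameters

def r_squared(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("in-sample 'prediction':", round(r_squared(y_fit, X @ beta), 2))   # comfortably above 0
print("actual prediction:     ", round(r_squared(y_new, X @ beta), 2))   # near zero or negative
```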

Explain

The word "explain" is similarly used in neuroscience to denote the ability of a fitted algorithm to link (very approximately) conditions to data for one particular experiment; no concepts, principles, or references to mechanism are involved. This is clearly a very impoverished, barren definition of "explain."

Both "predict" and "explain" imply that investigators have uncovered a reliable structure to phenomena, the latter involving hypotheses describing unseen mechanisms, leading to a new ability to control events and produce formerly unpredicted/unpredictable outcomes. This is clearly not a fair description of post hoc correlation-fishing.

“Drive, alter, promote, etc”

Mehler and Kording (2018) list additional terms used inappropriately to imply that products of data-mining reflect causal relations:

"The current publication system almost forces authors to make causal statements using filler verbs (e.g. to drive, alter, promote) as a form of storytelling (Gomez-Marin, 2017); without such a statement they are often accused of just collecting meaningless facts. In the light of our discussion this is a major mistake, which incites the field to misrepresent its findings."

So the current publication system almost forces authors to lie. (Going the rest of the way is up to them). One might note that if authors’ activities were actually discovering principles of brain function, they wouldn’t need to lie. They should be grateful that the current publication system tolerates this cheap linguistic cover-up.

Causality

The term “causality” itself is being exploited and misused to make statistical techniques appear to have value in discerning the way things work:

"Anil Seth, a pioneer of Granger Causality approaches to neuroscience, explicitly states on Twitter 'I am NOT saying that Granger Causality (GC) indicates causality in the colloquial sense of "if I did A then B would happen"' (Seth, 2018)." (Quoted in Mehler & Kording, 2018)

It’s a good thing Seth clarified this on Twitter; better yet if he didn’t lie by implicitly redefining an important term. What he refers to as “the colloquial sense” is the dictionary definition of causal relationships, not some slang used on the streets.

Computer scientist Judea Pearl, considered a pioneer of AI and "credited for developing a theory of causal and counterfactual inference based on structural models," also redefines "causality" in a manner that drains it of its meaning – for him it means asking "counterfactual questions – what if we did things differently?" This is not a "why" question in the normal sense of the word. The (unanswered) why problem is just transferred to "why does whatever happens if we do things differently happen?" (So I guess we can add "why" to this list.)

Neuroscience

Even the term "neuroscience" has become hollowed out. I say this because there is an institute at Stanford University called the "Center for Reproducible Neuroscience." This implies that the findings of "neuroscience" aren't necessarily reproducible. But "irreproducible" science isn't really science in any meaningful sense. In addition, "neuron," "neural," etc. now apparently refer to some element of computer software/hardware:

From Twitter:

“Interesting to see that after fine-tuning to ImageNet lots of neurons become dog detectors due to dataset bias.”

“Highly tuned neurons [eg, strong ‘cat detectors‘] in Deep Neural Networks are not super important to object classification.”

No real neuron need apply for the position of "neuron." The term "neural" is similarly misused.

Miscellaneous…

Double-Blind: The above misuse of words may seem like nuanced subterfuges compared to the lie being told by authors investigating the effects of mind-altering drugs.

In the Methods section of their recent article in the Journal of Neuroscience, Gabay et al (2018) describe their study of the mind-altering drug Ecstasy as follows:

“This study followed a double-blind, placebo-controlled, cross-over, counter-balanced design.”

But in their Discussion section, they point out that:

“A limitation of the current study is the use of an inactive placebo. Given the clear subjective effects of MDMA, participants became aware that they had been given the active compound.”

They might as well just have said, "We were lying when we called this study 'double-blind'; our participants knew what was up." The study was hyped by Nature as a "Research highlight."

The literature is full of studies of MDMA calling themselves “double-blind.”

This is called “scientific peer review.”

Direct Evidence: Durschmid et al (2018) pump up their language to announce in Cerebral Cortex that they have found “Direct Evidence for Prediction Signals in Frontal Cortex…” I don’t even know how to interpret this term; is there really any such thing as direct evidence? In any case, here the authors employ post hoc data analysis with liberal selection of data, which seems, on the contrary, the most indirect and non-credible of methods.

“It is bullshit. None of it replicates.”

Konrad Kording explains why neuroscience today is in trouble

In a recent (and soon-thereafter deleted) tweet, Konrad Kording of UPenn observed that he had yet to meet a neuroscientist who knew what they were doing. The comment struck a chord with colleagues.

Wei Ji Ma of NYU responded:

“I would say that in a way all of us are faking. Nobody in science really knows what they’re doing, so most confidence is facade. See through it. Better yet, let’s as a community be open about it. And don’t select yourself out. Thoughts?” (I responded by asking Ma what he, personally, was faking about, but got no reply.)

Eric Jonas of UC Berkeley had some related thoughts:

“I think the biggest challenge I face professionally is always feeling like I’m doing a poor job. Even in the areas that I’m an “expert”, I know where all the bodies are buried. Wei is right, we’re all clueless, it comes with the job. We should talk about it more.”

Jennifer Lee of NYU observed:

“Ambitious people seem fine with contributing knowingly to field noise, which, in neuro, seems to mean couching conceptually impoverished or messy results in extremely technical jargon (and presenting the story with confidence).”

In an unrelated tweet, Hanna Isolatus of the University of Bristol shared the following tragi-comic chat she overheard between two guys about a neuroscience paper in Nature:

“It is bullshit, none of it replicates”

“I tried it and I fucked up the mice”

“I know, I fucked up so many mice”

“It doesn’t replicate.” “It is bullshit.”

Adam J. Calhoun of Princeton University also tweeted his discomfort with the general situation in neuroscience:

“Have to admit though I feel like the “how do we analyze neural populations” answer is still always super unsatisfying. Feels like something big is missing.”

(Yes, they’re missing valid concepts about neural function).

But not to worry; Kording followed up his original deleted tweet with this light-hearted afterthought:

“To be clear, there are some people that are much better at doing science while clueless than others.”

Luke Sjulson, of the Albert Einstein College of Medicine, agreed, incidentally confirming Jennifer Lee’s observations:

“Neuroscience and psychiatry are similar in that confidence comes not from knowing what you are doing, but from becoming comfortable with not knowing what you are doing.” (Some of the tragic consequences of psychiatrists’ comfort may be appreciated here.)

In sum, a neuroscientist today is either a con(fidence) man – at ease with not knowing what they’re doing – or a tortured soul who wants to talk about it. (I suggest degree programs include group therapy sessions for ambitious but sensitive students.)

Kording is a mixed case. He wants to have it both ways – to be honest about problems while continuing to participate in and condone practices he understands (though perhaps not fully) are pointless. In a Waterloo Brain Day lecture last year, he pretty much put a torch to the most common practices and assumptions of neuroscience, e.g. the ubiquitous measurements of neurons' supposed "tuning curves." He observed that the post hoc correlation-fishing exercises that are used to claim tuning mean "preciously little" if you can't show that they generalize – but that generalization studies are almost never done. Why?

"I'll tell you why…All my generalization studies fail, almost all of them, both in psychophysics and in data analysis."

(That’s not surprising; it’s well known that neuroscientists trying to repeat measurements of “tuning” fail even under identical conditions).

He also observed that:

"The brain is non-linear and recurrent and we use techniques that have been developed for linear systems and we just use them because we can download the code and it's easy."

These are not just techniques; they incorporate assumptions, which in science are the province of theories. These techniques, in other words, whose implicit assumptions everyone knows have no connection with reality, currently constitute the de facto theories of neuroscience. And Kording has to explain to his professional audience that “we need non-trivial theories” (bypassing the fact that the implicit assumptions are not trivial). It shouldn’t need explaining that science consists of theories that have been corroborated and continue to be tested, and that a science without valid, “non-trivial” theories isn’t a science; it’s just math.

But Kording was still comfortable recently tweeting about:

“New approximate algorithms allow running GLMs [Generalized Linear Models] on *really* big neural datasets.”
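
To see how low the barrier actually is – "we just use them because we can download the code and it's easy" – here is a minimal sketch of the kind of exercise being celebrated: a Poisson GLM fitted to simulated spike counts (the stimulus, the "tuning" and the parameters are all invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_trials = 500
stimulus = rng.uniform(-1, 1, n_trials)          # some one-dimensional stimulus value
rate = np.exp(0.5 + 1.2 * stimulus)              # invented "tuning" of the fake neuron
spike_counts = rng.poisson(rate)

X = sm.add_constant(stimulus)                    # design matrix: [1, stimulus]
fit = sm.GLM(spike_counts, X, family=sm.families.Poisson()).fit()
print(np.round(fit.params, 2))                   # recovers roughly [0.5, 1.2]
# A few lines, no theory required -- which is exactly the point being made above.
```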

Can you fix institutional corruption from the inside, when so many, you included, have so much to lose? Can you be both transparent, open to criticism, and keep your job?

Many of Kording’s colleagues obviously don’t think so. Ilana Witten of Princeton University blocked me on Twitter when I replied to a tweet about her recent publication with a link to my comment on PubPeer. I can no longer even read her tweets, let alone respond to them. (Twitter has a mute function where you can prevent the latter without the former). I’ve been similarly blocked by Matteo Carandini, Pascal Wallisch, Michael Shadlen, Jonathan Winawer, Gunnar Blohm, all but the last of whom I’ve critiqued on PubPeer. In no case was there any warning. Such denial of information seems the opposite of the scientific ethos, but it’s the internet expression of the blockade on critical Letters to the Editor long-enforced by journals – a censorship that made PubPeer so revolutionary.

Contemporary neuroscience depends on outright p-hacking: DiPoppa, Ranson, Krumin, Pachitariu, Carandini, Harris (2018).

p-hacking

As I've explained in previous posts, the core assumption of contemporary neuroscience – that the neural code is based on homuncular interpretations of the highest relative firing rates of feature-detecting neurons – though multiply paradoxical (on top of everything else, it implies an infinite number of neurons) and empirically untenable, is kept alive through post hoc analyses that render it immune to falsification. (Such methods cannot, however, render it rational.)

In the interests of making the situation very clear, I offer a recent article by DiPoppa et al (2018) in the journal Neuron. What these authors do in order to be able to claim to have located “size-tuned” neurons is simple p-hacking at p<.05:

“We defined a neuron to have significant size tuning if it passed in at least one of the two locomotion conditions …a one-way ANOVA test (p<.05) comparing the mean visual responses to different stimuli…”

It seems straightforward enough. This practice is not considered kosher in psychology; the standards of neuroscience are apparently much lower.
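
A minimal simulation of the criterion in question (pure noise; the numbers of sizes and trials are my own illustrative choices) shows what "significant in at least one of two conditions at p < .05" buys you: a neuron that responds to nothing at all gets declared "size tuned" roughly twice as often as the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_neurons, n_sizes, n_trials = 5000, 6, 12

tuned = 0
for _ in range(n_neurons):
    significant_somewhere = False
    for _condition in range(2):                          # two locomotion conditions
        # Pure-noise "responses" to each stimulus size: no tuning whatsoever.
        responses = [rng.normal(0, 1, n_trials) for _ in range(n_sizes)]
        _, p = stats.f_oneway(*responses)                # one-way ANOVA across sizes
        if p < 0.05:
            significant_somewhere = True
    tuned += significant_somewhere

print(tuned / n_neurons)   # ~0.10: about double the nominal 5% false positive rate
```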

Another example of the p-hacking type analysis on which this and virtually every other neuro article is based:

"To study the circuit underlying such interactions, we imaged these four types of cells in mouse primary visual cortex (V1)…Capturing the effects of locomotion, however, required allowing it to increase feedforward synaptic weights and modulate recurrent weights."

The “capturing” is all post hoc, and the adjustments made all based on the untested, simplistic, entirely implausible theory of neural function. Are the products of this fishing (or trapping) expedition replicable? Can we even define what a replication should look like?

Conceptually, no one who is remotely cognizant of the complexities of perceptual organization could suggest that “size” is “encoded” via the firing rates of individual neurons.

Perceived size per se, even after all the other problems are solved, is a product of perceptual organization, highly contingent on the situation on the entire light-sensitive surface; consider the moon illusion, for example, where the same simple form may look quite small or extremely large depending on whether it is high in the sky or near the horizon. Or consider the simple trick photos where, for example, by aligning your thumb and forefinger around the sun you make it look the size of a pea. Which neuron is firing, the big-sun one or the pea-size one?

And before we can talk about size we need to talk about how the individual points of stimulation are grouped into bounded areas and converted to perceived 3D shapes with perceived 3D distance relationships. Obviously, the processes underlying these achievements do not and cannot amount to tabulations and correlations of neurons firing "signals" individually – they are highly dynamic, self-organizing processes that we haven't begun to understand. (And if all our neurons are busy "signalling" features x, y, z – e.g. every size of every object perceived at any given moment – how are any left for these processes?) So, how much sense does it make (a) to ignore this complexity, (b) to presume that, after it's all done, the process includes and concludes with an individual neuron (or a bunch of them) somehow "representing" size via its firing rate, and (c) to suppose that, after all that, the information needs to be (assuming it were possible) additionally converted to a firing rate? Wouldn't that be redundant?

…Oops, I think they did it again

Afterthought: Given that the team are p-hacking their way to size-tuning claims, shouldn’t they at least apply a correction for multiple comparisons? How many neurons would drop out as a result?