“Science is rooted in the will to truth. With the will to truth it stands or falls.” – M. Wertheimer
“I don't think that full articles devoted to close, critical readings of other researchers' work have a very important role to play in vision science.” – Anonymous reviewer
“Gestalt” is a fashionable buzzword in the vision research community. The people invoking it tend to have little to no understanding of the groundbreaking ideas the term represents, which they badly misrepresent. This is the case in Karlovich & Wallisch (2021), who misuse Gestalt citations to cover the theoretical gap left by a dominant vision research tradition uninterested in and incapable of addressing problems of shape and organization in vision.
In a recent article introducing a new illusion, Karlovich & Wallisch (2021) include the following statement:
“Kanizsa’s triangle also illustrates a different principle, by which organisms interpret scenes in ways that… are probabilistically most plausible (Koffka, 1935).”
The reference to “probabilistic plausibility” is vague. Elsewhere, the authors refer to “probabilistic inference,” which is a little less vague. Are they saying that “organisms interpret scenes” according to some probability function applicable to “scenes”? That would be absurd, since every “scene” we experience is unique and novelty awaits around every corner. We are even able to experience “impossible” figures.
Based on the preprint (Karlovich and Wallisch, 2020) of this article (on which I have commented on PubPeer), it appears the authors are, indeed, defining “probabilistic plausibility” in terms of the frequency with which a “scene,” or aspects of it, have been experienced. Referring to the perception of the Kanizsa triangle figure, they state:
“In this case, it is more likely that common shapes (circles and triangles) are in this scene than 3 uncommon shapes (pacmen) that are arranged in this fashion by sheer coincidence.”
Are black discs covered by white triangles more common in your world than “pacman” shapes? Probably not if you’ve played a lot of Pac-Man – or even a little. It wouldn’t matter anyway; you’d still see the Kanizsa triangle. As I pointed out in my PubPeer comment, the book cited by the authors – Kanizsa’s 1979 “Organization in Vision” – contains, in addition to the triangle demo, a number of other original figures designed precisely to rebut both the “common occurrence” and the related “coincidence” arguments. (He falsifies the coincidence argument in figures 12.11, 12.13a,b and 12.26a,b, and the common-occurrence argument in figures 2.27 and 2.28.)
Although Karlovich & Wallisch removed the offending sentence from their final product, I doubt their theoretical position has been refined – only made less explicit. The vagueness of the probabilistic claim is its strength: it seems to make sense. Yet even when we put it in the most reasonable possible terms – that we are simply making the best “guess” (so to speak) given the retinal point “information” – it explains nothing. It is impossible to prove whether the physical counterpart of a perceived object is (or has been) more or less “probable,” or even to explain how one might go about assessing this. More importantly, it avoids the scientific question of the nature of the processes that transform this point “information” into a unique, fully articulated, 3D mental representation – “probable” or not. This is the problem the Gestaltists first recognized and then tackled, in groundbreaking fashion. How the visual system performs its amazing feats, and whether the results of these feats meet some probabilistic criterion, are two entirely different questions. The first has deep physiological and psychological significance and heuristically potent implications; the second can never be more than an untestable, content-free verbalism.
In making their case, the Gestaltists went to great pains to counter the perennially popular but facile “experience” explanations of perception (see e.g. Koffka, 1935 pp. 155-159, “Experimental disproof of the empiristic theory”). They showed, over and over, that such pseudoexplanations are logically and empirically (in the scientific sense) groundless. Such explanations have not become better or more viable with age, as Gilchrist (2003) made clear in his review of Purves and Lotto’s (2003) empiricist offering.
Such explanations are, moreover, antithetical to the Gestaltists’ main thesis regarding the complex dynamic physiological principles underlying organization in vision. (It is worth noting that these ideas are closely echoed by the most sophisticated current thinking on the nature of living organisms, up to and including the nature of conscious processes; see, e.g., Deacon, 2011.) Of course, the Gestaltists failed in their crusade against the simplistic, additive, probabilistic, stimulus-response concepts that continue to dominate and that have, as ever, led to an impasse – the “replication/reproducibility crisis.” (Visual neuroscience, for example, has a severe replication problem, generally masked by the normalization of p-hacking via irreproducible “analysis pipelines.”)
Given that the “probabilistic plausibility” trope represents the Gestaltists’ bête noire, and given that Karlovich & Wallisch evidently have never taken a particular interest in the relevant literature, why did they choose to cite Koffka by way of a theoretical gloss in introducing their illusion?
I presume they needed some references relevant to shape perception for their introduction. The psychophysical tradition that dominates today has only one-dimensional, stimulus-response concepts (“line detectors,” etc.) and correlational, “probabilistic” concepts (e.g. “natural scene statistics”) to offer – none of which make any principled contact with the problem of perceiving shape, and none of which left Karlovich & Wallisch with a relevant citation. So Gestalt references had to fill the gap, rebranded in terms of the structure-blind, statistical, “data-driven” concepts the founders fought so hard against.
The Gestalt program was shattered by the Nazis and went to America for burial. It has now been dug up to be used as a prop by stranded probabilists like Wallisch parading as experts in visual perception and…Gestalt principles (link is to interview in Reyes, 2021). First time tragedy, second time farce.
Deacon, T., 2011. Incomplete Nature. New York: W. W. Norton & Company.
Gilchrist, A., 2003. Looking backward. Nature Neuroscience, 6(6), p.550.
Kanizsa, G., 1979. Organization in Vision. New York: Praeger.
Karlovich, M.W. and Wallisch, P., 2020. Introducing the scintillating starburst: Illusory ray patterns from spatial coincidence detection. PsyArXiv.
Karlovich, M.W. and Wallisch, P., 2021. Scintillating starbursts: Concentric star polygons induce illusory ray patterns. i-Perception, 12(3), 20416695211018720.
Koffka, K., 1935. Principles of Gestalt Psychology. New York: Harcourt, Brace and Company.
Purves, D. and Lotto, B., 2003. Why We See What We Do: An Empirical Theory of Vision. Sunderland, MA: Sinauer Associates.
Reyes, M., 2021. Want art you can’t look away from? Popular Science.
I’m publishing a brief email exchange I had with the new Editor-in-Chief of the Journal of Experimental Psychology: Human Perception and Performance concerning a recently published article. In my email I point out that the article promotes a “theory” that has failed on any number of grounds, as I well knew, having published, in addition to critiques, a simple, falsifying test. I had involved myself in these rather thankless and time-consuming tasks because the issue was relevant to me; bullshit science takes up all the air in the room, and it has taken up all the air in vision science for decades. The authors of this “theory” had continued repeating the same tropes for years without addressing the problems, reviewers simply giving them a pass. I would critique them on PubPeer, but since the site began censoring my posts a year ago I’m never sure whether a comment will go up. This last one was taking a long time (it eventually did post), and out of frustration I emailed the EiC – not because I expected anything (I frankly didn’t expect a reply) but simply to share the information. Her reply brought back shades of Susan Fiske – indifference to the facts (re-framed as mere accusations from a bruised ego), contempt for the messenger, sarcasm, arrogance. Below is my email and her reply:
Dear Dr. Gauthier,
JEP:HPP recently published an article by Radonjic & Gilchrist (“Large depth effects on lightness in the absence of a large luminance range”). In relating various effects to the “anchoring theory,” the authors neglect to mention that the anchoring theory’s proposed explanation of simultaneous contrast has failed to pass a simple test (Maniatis, 2015, Journal of Vision) whose validity has been acknowledged by Gilchrist. They also fail to note that the explanation had earlier been shown to entail clear inconsistencies (Maniatis, 2014, Vision Science), a fact acknowledged by the senior author in his published response. I have to assume JEP reviewers were aware of these publications or findings – Maniatis (2015) has been downloaded well over a thousand times (in addition to being open-access), and Maniatis (2014) was referred to by Gilchrist in his ECVP 2014 Perception lecture. These facts must be relevant to readers in evaluating any claims by Radonjic & Gilchrist with respect to “anchoring theory’s” validity as an explanatory tool. Yet, time and again, even the identical claim falsified by Maniatis (2015) has been allowed into the literature. In the present case, the authors state that:
Anchoring theory proposes that (a) the visual system segregates the image into functional units for lightness computation (frameworks), largely based on depth and shadow boundaries and (b) the lightness of a surface is a weighted average of its lightness value computed within its local framework (surfaces it is immediately grouped with) and the global framework (entire image).
Again, in practice the local/global “frameworks” proposed have uncontroversially failed as explanation in anchoring theory’s central paradigm, the classic simultaneous contrast demonstration. (It is trivial to say that lightness depends generally on some type of grouping principles or “frameworks,” as this has been clear for at least 100 years. The same might be said of the claimed “qualitative consistency” with the “scale normalization principle,” as the relative nature of perceived lightness in general is a classic finding of the Gestalt era).
It seems to me contrary to scientific ethics to conceal relevant information in a publication in a major journal.
Dear Dr. Maniatis,
thanks for writing. This work was peer-reviewed by several experts in the field, and revised and eventually [accepted] based on their comments. I am the first to recognize that peer-review is not perfect. And I personally have wondered why my opinion was not recognized in published articles in my field many times. It can be frustrating. There is room for disagreement in science, and there are many mechanisms that offer the opportunity to debate. However, I don’t believe anyone here has been concealing information here – I honestly think making such accusations isn’t in anyone’s best interest. I’m sorry I can’t offer you much to appease you – it’s just not my job to adjudicate the specifics of this disagreement. I look forward to reading more about your work, and feel free to send your best research to JEP:HPP.
Dear Dr. Gauthier,
We’re not talking about a matter of opinion, or a belief, or an accusation, or a matter for debate, or about me, but about a simple, relevant fact, of which Gilchrist is well aware, and which he did, in fact, conceal. A failed theory is being falsely presented as a going concern. Now, you’re also aware of this fact. Interestingly, it seems to bother you far less than my temerity in pointing it out. The usual priorities. (I’m curious, by the way, about the “many mechanisms that offer the opportunity to debate,” to which you refer).
“Peer review isn’t perfect,” so feel free to send us your best bullshit. I think Isabel’s response perfectly illustrates why we need open PPPR.
Post publication peer review (PPPR) site PubPeer was created for the purpose of enabling discussion of published articles in a manner accessible to the community. And why not? Everyone knows that the success of scientific research hinges on open, critical discussion.
On the other hand…what could be more threatening than open commentary to a research community lacking confidence in the quality of its offerings?
The threat was real, especially because PubPeer allowed the names of commenters to be concealed if they so desired. It was well known that criticism of their betters by early-career researchers amounted to career suicide, which kept them in line nicely. Meanwhile, the Letters to the Editor (LtE) category of article, originally designed to enable criticism of published papers, had been effectively neutered, such Letters being almost impossible to publish – reflexively rejected on the basis of tone, of lacking “new data” (i.e. not being a regular article), or of failing to “move science forward.” Add to this the acknowledged fact that contemporary researchers were (and are) failing to meet the bare minimum requirement for their productions to be considered scientific – that their experiments be replicable – and it becomes clear that free critical commentary couldn’t have been more necessary to, or less welcomed by, the research establishment.
Attacks against PubPeer came in various forms that included a court case brought by an academic who lost a lucrative position at a university thanks to anonymous posts. He demanded PubPeer unmask the critic. What difference could the name of the critic make…well, you know. Fortunately, PubPeer won that case.
Next, a journal editor, flustered by the loss of narrative control, smeared criticism on the site as “Vigilante Science,” declaring it a dire threat to the scientific process. PubPeer’s founders defended their ground in a blog post aptly titled “Vigilant Scientists.” “We believe,” they wrote, that
a greater problem, which PubPeer can help to address, is the flood of low-quality, overinterpreted and ultimately unreliable research being experienced in many scientific fields…we believe it is imperative that all possible users of published research be made aware of potential problems as fully and as quickly as possible….The central mission of PubPeer is to facilitate this exchange of information. We…aim to remove barriers and discouragements to commenting.
Apparently parrying complaints that the mere existence of critical comments could damage researchers’ reputations and prospects, they also observed that:
scientists should be able to explain and defend the work they have chosen to publish. And in reality no competent scientist would experience the slightest difficulty in defending their work, if it is defensible…
People who rush to judgment on the sole basis that a comment about some minor detail exists on PubPeer have only themselves to blame. If they are scientists they should definitely know better, and we actively advise readers to form their own opinion of comments.
PubPeer’s original FAQs similarly emphasized that it wasn’t the moderators’ job to ensure the validity of comments, only that they be factual and publicly verifiable.
PP does not review comments scientifically…so factual comments conforming to our guidelines may still be wrong, misguided or unconvincing. For this reason we insist that readers… make up their own minds about comment content.
Signs of Trouble
I was an intensive user of PubPeer, after learning, like many others before me, that the LtE category of submissions was effectively defunct. The site virtually never moderated my posts; once, when I inquired about a post that had been taken down, I was politely told that this had occurred in error and that I was generally very good about following their guidelines.
My strategy with respect to the issues I was concerned with was to target as many of the offending articles as I possibly could. This initially involved vision science, and inevitably led to neuroscience due to the close connection between the two.
For example, I would explain, article after article, why the popular “spatial filter” concept was a non-starter from the get-go. My first blog post was on this issue and intended to save me having to endlessly repeat myself. Comments were almost never rebutted; and when they were, responses were ineffective and more than a little revealing. Authors often preferred to communicate via email rather than openly responding on the site.
Another of my pet peeves was the use of arbitrary statistical assumptions in the context of post hoc correlation-fishing, a normalized approach in neuroscience and other fields. My interest in this topic ultimately put me openly at odds with a PubPeer insider and member of this research community, Boris Barbour. A Twitter conversation with him foretold the censorship and bizarre rule changes that were to come.
The views Boris expressed in this conversation were diametrically opposed to the original philosophy of PubPeer. When it came to arbitrary assumptions, he took the view that authors should be given the benefit of the doubt:
One addition. When you wrote that comment, I'm guessing you didn't know what the influence of those choices was. Did you assume the worst? That the authors had given them no thought or even that they were favourable to the authors' hypothesis?
If you made those comments only because the parameter choices were not justified specifically to your satisfaction, but without further insight from you as to the alternatives, yes that would be the "purely formal" part. As to their importance to you, you didn't explain.
Boris’s comments are in reference to my challenging arbitrary (and untestable) “priors,” a staple of post hoc “Bayesian” analyses. Notice the principle of his objections: he puts the burden on the critic to show that their arbitrariness is a problem, rather than on authors to support, either by argument or evidence, theoretical assumptions that will have a dispositive influence on their conclusions. Readers are supposed to give authors the benefit of the doubt (we shouldn’t “assume the worst”), to trust that the missing rationale exists in the authors’ minds – and that it is sound. Also of note is the indication, in the second tweet, that the untestable prior-based approach – even if akin to p-hacking – is non-negotiable: a choice must be made among alternative “priors.” Boris would apparently be able to live with a critic making any type of argument of this nature, as this would imply buy-in to the approach, rather than a root-and-branch challenge. This reminded me of the situation with LtEs, in which critics might occasionally be allowed to quibble over alternative versions of dominant paradigms or conventional assumptions, but serious criticism of those paradigms or assumptions as such, however sound, was non grata.
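To make the dispositive influence of priors concrete, here is a toy beta-binomial sketch. The data and both priors are entirely my own invention, not taken from any paper under discussion; the point is only that two defensible-sounding “priors” applied to the same data yield opposite conclusions about whether a rate exceeds one half.

```python
# Two analysts, same hypothetical data, different "priors":
# the conclusion flips. Nothing here is from a real study.
successes, trials = 6, 20

def posterior_mean(prior_a, prior_b):
    """Conjugate beta-binomial update: a Beta(a, b) prior plus
    s successes and f failures gives a Beta(a+s, b+f) posterior,
    whose mean is (a+s) / (a+s+b+f)."""
    a = prior_a + successes
    b = prior_b + (trials - successes)
    return a / (a + b)

flat = posterior_mean(1, 1)      # "uninformative" Beta(1,1) prior
strong = posterior_mean(30, 10)  # strong prior favoring high rates

print(round(flat, 3), round(strong, 3))  # 0.318 0.6
```

Analyst one concludes the rate is well below 0.5; analyst two concludes it is well above. Unless the choice of prior is justified by argument or evidence, the “result” is a property of the analyst, not the data.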
And now, the same goes for PubPeer. Soon after my Twitter conversation with Boris, and without warning, I was grey-listed, and it became almost impossible for me to post the kind of comments that had previously gone unchallenged and been described as following guidelines very well. When I asked for the reason, I was merely told that guidelines were “evolving.” Evidently, it was important to begin applying the relevant censorship rules before they had been formally introduced. Many months went by without any changes to the site’s guidelines justifying the new moderation.
Finally, this past May, a rambling, repetitive, fine-print set of points was appended to the site’s FAQs, wholly contradicting the spirit of the moderation policy expressed in the excerpt quoted above and, at one point, even the letter. Whereas it was originally the reader’s prerogative, not the moderators’, to judge whether a factual comment was “wrong, misguided or unconvincing,” the position was now that:
Comments that are obviously erroneous or unclear will be rejected, in particular in the context of a series of misguided or potentially malicious comments.
(The two opposing statements briefly co-existed on the site until, perhaps in consequence of tweets I posted flagging the contradiction, the earlier section was excised.)
The vagueness of the language in this excerpt, typical of much of the 482-word addendum, is striking. How clear, how convincing, how relevant or appropriate a comment is are largely subjective judgments; and malice is a state of mind inaccessible to moderators, potential malice not even that. The statement erases all transparency from the moderation process and gives moderators complete, black-box latitude to reject any comment falling outside their personal comfort zone. The architects of the guidelines understand this; it’s not a bug, but a feature of the new, improved PubPeer:
We acknowledge that this is potentially arbitrary, because it depends on the moderators’ expertise.
It doesn’t only depend on moderators’ expertise, which is obviously limited given the wide range of topics – the entire literature – covered by PubPeer. It depends on the moderators’ feelings. What could better illustrate the arrogance and lack of principle of the new PubPeer? And what could be the reason for this total rejection of the site’s original commitment to “remov[ing] barriers and discouragements to commenting” in favor of opaque and arbitrary barriers to commenting – for responding to complaints about unjustifiable censorship by formally asserting a principle of arbitrary moderation?
What else is in the fine print?
Given the open-ended license-to-censor endowed by the 38-word segment quoted above, more additions seem redundant. They’re worth examining nonetheless, as a window into administrators’ attitudes and as attempts at self-justification. (About half the text involves image issues, which I won’t address).
Conspiracy theories and disinformation will be blocked, including links to external sources containing such content.
References to “conspiracy theories and disinformation” are just as vague as references to “unclear” or “misguided” posts. The term conspiracy theory is typically used to discredit, a priori, a point of view unsanctioned by official sources, regardless of evidence for or against; the term disinformation typically refers to false information disseminated by governments for propaganda purposes. On what basis are judgments on what counts as conspiracy theory or disinformation to be made by PubPeer moderators? We already know they may be “arbitrary;” will they also be independent? Do moderators’ political views now constitute a criterion of moderation?
General comments should be carefully linked to the specific article, and we may limit campaigns making such comments on many papers.
No objection to requiring comments to be linked (carefully!) to the article being critiqued; but the limit on the number of papers to which a relevant comment may be appended by a particular “campaigning” individual is mystifying. What could be the justification for such a stipulation? How many is too many, and why? I’ve been subjected to this restriction, most strikingly when I tried to flag a clearly false assumption promulgated over decades in the case of a recent article by vision science celebrity George Sperling. The comment was just as valid in this case as it was in previous ones, but no way, no how will PubPeer allow it to post. (I’ve been compiling all rejected comments in a separate blog post). In effect, this policy immunizes users of conventional practices from criticism.
Criticism of articles simply for using Bayesian statistics is not considered useful. For such articles, it remains acceptable to discuss alternative choices of prior substantively.
“…not considered useful…remains acceptable…” The stipulation is a reflection of the views expressed in Barbour’s tweets. The problem with “Bayesian” statistics is that there is no objective means of choosing among “alternatives.” This is a very useful point that cannot be made often enough; there is no general defense available to practitioners. The idea that “scientists should be able to explain and defend the work they have chosen to publish” is apparently no longer considered “useful” in the case of this widespread, conventional practice. Protection of indefensible but conventional practices seems clearly to have been on the minds of the moderators when making the adjustments to their FAQs.
Criticism of specific analysis and modelling methods should explain why they are unreasonable/erroneous and significant. This requires engagement with the purpose, design and methods of the article. Appending a brain-dump to a random quotation is insufficient.
This, again, is a barrier to comments with general relevance. It was my practice to flag general issues (like the spatial filtering issue mentioned above) with links to the related blog posts. I don’t know what qualifies, in PubPeer moderators’ minds, as a “brain-dump.” I also have no doubt that my blog posts were on their minds when they deployed this nasty, dismissive term, revealing a hostility previously unexpressed because unjustifiable, but now formalized as (hopelessly opaque) policy.
This is not a homework site. Although requests for explanation can be acceptable, you should provide evidence of having tried to understand the relevant sections of the paper (and, if appropriate, the cited literature) and also of understanding the significance of the issue raised. Put another way, we prefer comments that provide expert insight over those that display ignorance.
“We prefer…” Once again, the guideline is vague and, above all, subjective. Who cares what moderators prefer if they can’t articulate – and defend – their reasons adequately? Are they really going to censor comments for not rising to what they personally view as the level of “expert insight”? There can be little doubt that the kinds of views that would pass muster in this context would typically be those respecting conventional wisdom. It also appears that the principle that it wasn’t the site’s responsibility to protect readers from giving too much weight to “minor” or unworthy comments no longer applies. Like Caesar’s wife, establishment productions were apparently deemed to “be above suspicion” and to “avoid attracting negative attention or scrutiny.”
Is it the tone?
PubPeer actually calls itself “the online Journal Club;” it was supposed to be a place where readers of the literature could participate in discussion of scientific papers as they saw fit, free of the filter of moderator preferences, interests, levels of expertise, biases, emotions, intuitions, etc. And authors could respond as they saw fit. The job of filtering and negotiation was supposed to fall mainly to users of the site and the authors of critiqued articles, supposed to be able to “defend their work if it is defensible.” In my experience, they never rose to the challenge, allies perpetually whining about tone while assiduously avoiding discussion of substance. The following exchange on a PubPeer thread I initiated tells the story in a nutshell:
Notiochelidon Pileata: This reviewer consistently makes aggressive and ill-informed comments on PubPeer. The tone here is completely unnecessary.
Pluchea Kelleri: Can Notiochelidon Pileata cite clear specific examples where Lydia M. Maniatis’ comments have been “uninformed”?
I’ve only found Lydia’s comments to be concise, appropriate and, most importantly, correct. I’d ask that she not change her tone at all.
(Notiochelidon declined to elaborate).
For reasons as yet undisclosed, PubPeer has jettisoned its original philosophy in favor of its opposite – something worse than the broken review system it was supposed to replace, where at least reviewers had to put some effort into excusing their rejections of critical Letters.
What happened? As discussed, PubPeer was under pressure from the start to protect fragile conventions from public discussion and fragile reputations from challenge, with establishment figures like Susan Fiske bloviating about “methodological terrorists;” but they didn’t fold. Now they have.
Whistleblowers always lose
We should perhaps, also look to the experience of biomedical researcher Paul Brookes to understand what happened to PubPeer. Brookes was the architect of the now-shuttered blog science-fraud.org. The site was basically a form of PubPeer with only one commenter; according to a 2014 Science magazine article, Brookes “cited 275 papers as having apparent problems, such as undisclosed but noticeable slicing of gels, duplication of bands, or the unacknowledged reuse of images.” “Out of 275 papers discussed, there have so far been 16 retractions and 47 corrections…I found that the public papers have a seven to eight-fold higher level of corrections and retractions.” Brookes often relied on tips from anonymous colleagues.
I’d say Brookes’ activity constituted an important service to the community. But as he well understood, this isn’t how they see it; as soon as his cover was blown, “Brookes stopped posting to his blog, removed all the materials already posted, and confirmed his identity the next day.” The potential impact on his career was dire:
“I am 41 years old, so I have another 25 years of this to go before I retire. I have to continue to get grants, to publish papers, and obviously if there are people out there who are upset with me, then maybe they will review my grants badly, maybe they will review my papers badly. The potential for retaliation is there; there is really no way to get around this.”
Brookes also received legal threats, and thanks to zero support from his university, had to hire his own attorney. As mentioned earlier, PubPeer was actually sued unsuccessfully, with all of the financial burdens and loss of time that entails.
It’s clear that PubPeer doesn’t intend to – because it cannot – enforce its vague, subjective rules in any evenhanded way. They are merely a pretext for choking off commentary by particular individuals, who are first grey-listed, then monitored closely. If you have questions about why a comment was removed, they merely point you to their FAQs page – clearly no response at all.
The title and text of this post are part of an attempt to clarify and amplify a point I’ve been hammering on in previous posts, i.e. that neuroscience, as it is practiced today, is a pseudoscience, largely because it relies on post hoc correlation-fishing. For this reason, studies (so-called) have no path to failure the first time they are performed, and always fail the second.
As previously detailed, practitioners simply record some neural activity within a particular time frame; describe some events going on in the lab during the same time frame; then fish around for correlations between the events and the “data” collected. Correlations, of course, will always be found. Even if, instead of neural recordings and “stimuli” or “tasks,” we simply used two sets of random numbers, we would find correlations due to chance alone. What’s more, the bigger the dataset, the more chance correlations we’ll turn up (Calude & Longo, 2016). So this type of exercise will always yield “results”; and since all we’re called on to do is count and correlate, there’s no way we can fail. Maybe some of our correlations are “true,” i.e. represent reliable associations; but we have no way of knowing, and in the case of complex systems it’s extremely unlikely. It’s akin to flipping a coin a number of times, recording the results, and building fancy algorithms linking, say, the third throw with the sixth and the hundredth, or describing some involved pattern between odd and even throws. The possible constructs, or “models,” we could concoct are endless. But if you repeat the flips, your results will certainly be different, and your algorithms invalid.
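The coin-flip argument can be run in a few lines of Python. This is a deliberately stripped-down sketch of my own devising (the channel counts, time window, and threshold are arbitrary, and it is not anyone’s actual analysis pipeline): correlate pure-noise “channels” with a pure-noise “task” regressor, pick out the ones that cross a threshold, then draw fresh noise and watch a different set of channels light up.

```python
import numpy as np

N_TIMEPOINTS = 200  # samples in the recording window (arbitrary)
N_CHANNELS = 500    # number of pure-noise "neurons" (arbitrary)

def significant_channels(seed, threshold=0.15):
    """Correlate each pure-noise channel with an independent
    pure-noise 'task' regressor and return the set of channels
    whose |r| exceeds the (arbitrary) threshold."""
    rng = np.random.default_rng(seed)
    neural = rng.standard_normal((N_CHANNELS, N_TIMEPOINTS))
    task = rng.standard_normal(N_TIMEPOINTS)
    r = np.array([np.corrcoef(ch, task)[0, 1] for ch in neural])
    return set(np.flatnonzero(np.abs(r) > threshold))

first = significant_channels(seed=1)   # the "study"
second = significant_channels(seed=2)  # the "replication": fresh noise

print(len(first), len(second), len(first & second))
```

Both runs produce a comfortable crop of “responsive” channels, but the two sets barely overlap: the findings are real correlations in the sample and pure accidents of the noise, which is exactly why such results vanish on replication.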
Which is why the popular type of study I’ve just described is known not to replicate. And while a lot of ink has been spilled (not least in the pages of Nature) over the ongoing “replication crisis” in neuroscience; while we even have a “Center for Reproducible Neuroscience” at Stanford; while paper after paper has pointed out the barrenness of the procedure (Jonas & Kording’s (2017) “Can a neuroscientist understand a microprocessor?” was a popular one); while the problems with post hoc inferences have been known to philosophers and scientists for hundreds of years; the technique remains the dominant one. As Konrad Kording has admitted, practitioners get around the non-replication problem simply by avoiding doing replications.
So there you have it; a sure-fire method for learning…nothing.
By a happy accident, I was able to slip into PubPeer the type of comment that moderators have, for several months now, been routinely censoring without explanation. What happened next shows quite clearly that these comments aren’t being censored because they lack relevance or substance, but because they hit too close to the mark.
In reading over a paper on hippocampal activity by Chen et al. (2019) in Current Biology, I had difficulty finding any reference to sample sizes. I thought this was odd, and posted a brief comment about it on PubPeer. Because I’m grey-listed (which happened without warning or explanation), my comments never post right away, if at all. Soon after, I realized that the authors had actually provided sample sizes in figure captions, whereupon I deleted (or so I thought) the comment from my still-awaiting-moderation post. But it ended up posting. So I edited it, modifying it to ask a different sample-size-related question.
I was also curious about how the behavioral data were collected – I didn’t think the text gave enough information on certain issues. The article says that the data are available on request, so I emailed Steve Ramirez, the senior and corresponding author, to make the request.
Surprisingly, his initial response didn’t refer to this request at all; rather, it consisted of a reaction to my PubPeer comment, as follows:
Thank you so much for your email and question! We posted the individual N, stats, and so on in our figure legends, and chose our N values for the histology and behavior based on previous engram papers that demonstrated such N provided sufficient statistical power (e.g. Liu et al, Nature, 2012; Denny et al. Neuron, 2014; Tanaka et al. Neuron, 2014). These N values and the corresponding stats were also taken as a standard for circuit level / behavioral optogenetic papers (e.g. Tye et al. Nature, 2012; Stuber et al. Nature, 2011) in which we compare across animals for histology and utilize, for instance, a T-test, or across animals and across light on-off-on-off epochs for behavioral data and utilize two-way anovas with repeated measures.
That said, I’d be more than happy to help in any capacity hereafter and thank you again! Other groups have analyzed their data with both similar and diverging sets of statistics and corresponding justifications for such analyses that, too, have yielded pleasantly nuanced results that I’m always thrilled to chat about and brainstorm over. I hope you had a wonderful Thanksgiving and upcoming holiday season as well!
I didn’t find his answer very satisfactory – all he was doing was passing the buck – but what I really wanted was the dataset, so I simply thanked him for his comments and asked again, saying:
Thanks for your reply, I’d also be happy to chat about the issues you mention, but in this email I was just asking for the info that, at the end of your article, under “Data and software availability,” you say can be made available:
“For full behavioral datasets and cell counts, please contact the Lead Contact, Dr. Steve Ramirez (firstname.lastname@example.org).”
Absolutely! Are there any in particular I can send your way? They’ll all be straightforward and annotated excel files too to make life easier — always happy to share and help!
I replied that I’d be interested in the behavioral data.
Meanwhile, having been able to edit my PubPeer comment once, and in light of Steve’s enthusiastic reaction to my first one, I went back in and added some more comments. These edits apparently flew under the moderators’ radar; they typically would have nipped such comments in the bud (as in fact they did a little while later).
Steve responded enthusiastically both to my request and to my new comments, which he evidently found valuable. What follows are his complete responses, which included excerpts from my PubPeer comments (which I’ve placed in italics) and his replies. I’ve bolded a few sections for emphasis, and added some reactions.
Absolutely. I’ll send [the dataset] over shortly (on my commute to work) when I’m back on my work laptop, and in the meantime I’d be very happy to clarify some points raised on pubpeer — thank you so much for the comments, as these always help us to continue to perform as rigorous of science as possible. Very much appreciated!
Steve delayed, then left the country without sending the dataset, assuring me when I followed up that he would send it when he got back. I think he may have realized that I intended to look at it critically. (After not receiving it I emailed Current Biology, but it looks like they’ve decided not to respond either). He did, however, respond to my comments point by point:
The authors say that: “No statistical methods were used to determine sample size; the number of subjects per group were based on those in previously published studies and are reported in figure captions.” To which previously published studies the researchers are referring, and on what basis do they consider those studies’ sample sizes to be valid? If they (or their editors) don’t feel that sample sizes should be selected based on some type of statistical test, then why mention this issue at all? If it is important, then the reference to other, unnamed previous publications is rather inadequate.
Addressed in previous email – thank you again! [See above]
As mentioned earlier, all the email does is pass the buck – and if you dig back you find that there’s no there there, either.
“Exploration of the context while off Dox increased eYFP-expressing (eYFP+) cells in both the dorsal and ventral DG relative to on-Dox controls (Figures 1G, 1H, 1J, and 1K). The following day, mice that explored the same context showed a significant increase in the number of overlapping eYFP+ (i.e., cells labeled by the 1st exposure) and c-Fos+ cells (i.e., cells labeled by the 2nd exposure) in the dorsal but not ventral DG (relative to chance overlap)…”
How is “chance” ascertained here? The brain is highly condition-sensitive – to both external and internal events – in ways we don’t understand. There’s really no possible “no context” condition; and there was no “explored a different context” control. When the mice are returned to their normal cages, they are also returning to familiar territory; when they are being moved from one place to another, these may also be familiar experiences. All of this affects the brain. Given our level of ignorance about the brain and the countless confounds, I don’t see what the authors could validly be using as their “chance” baseline.
I 100% agree that chance is a tricky thing when it comes to the brain, since it’s a statistical measure applied to a system as complex as the brain, in which we don’t know what true chance would look like. So in that sense, we took the next best approach and utilized statistical chance, i.e. the odds of a set of cells being labeled by one fluorophore (N[number of cells labeled] / Total number of cells in the area) multiplied by the odds a set of cells are labeled by the second flourophore (N[number of cells labeled] / Total number of cells in the area), and we use the resulting number as statistical chance. We think of it as the brain’s way of flipping a coin twice and landing at heads twice, but you’re spot on that there’s no a priori reason that brain indeed operates in such a quantitative manner.
What Ramirez is admitting here is truly, and I mean truly, astonishing. He’s saying that two sets of measurements are being compared on the assumption that these measurements of a bunch of neurons – whichever bunch of neurons we choose to record from at whatever time we choose to record from them – out of the billions of neurons in an organ we don’t understand – will be distributed in the same random, decontextualized way the results of a series of coin flips would be distributed. We have no reason for making the assumption – but hey, we’ll just do it anyway! Steve doesn’t seem to think there’s anything wrong with that, or anything embarrassing about admitting it.
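For concreteness, here is the arithmetic behind the “statistical chance” baseline Ramirez describes, in a few lines of Python (the cell counts are hypothetical, invented purely to illustrate the calculation):

```python
# Hypothetical counts, purely to illustrate the arithmetic Ramirez describes.
total_cells = 1000   # cells counted in the imaged region
eyfp_labeled = 150   # cells tagged during the first exposure
cfos_labeled = 120   # cells tagged during the second exposure

p_eyfp = eyfp_labeled / total_cells  # 0.15
p_cfos = cfos_labeled / total_cells  # 0.12

# "Chance" overlap = product of the two labeling probabilities,
# as if the two labels were independent coin flips per cell.
chance_overlap = p_eyfp * p_cfos                        # 0.018, i.e. 1.8%
expected_overlap_cells = chance_overlap * total_cells   # 18 cells

print(f"chance overlap: {chance_overlap:.1%} (~{expected_overlap_cells:.0f} cells)")
```

The multiplication is where the coin-flip assumption lives: it treats labeling by the two fluorophores as statistically independent events – which is precisely the assumption Ramirez concedes has no a priori justification.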
“Together, these data demonstrate that the dorsal DG is reactivated following retrieval of both a neutral or aversive context memory, whereas cells in the ventral DG show reactivation only in a shock-paired environment.” Sentences like this should raise alarm bells. They are passive descriptions of correlations in a sample examined post hoc – they don’t corroborate or falsify predictions made before examining the data – and so can’t count as evidence of causal connections. In other words, data analyzed in this way can’t be taken to “demonstrate” anything. Claims are as speculative after the experiment as they were before it.
I again totally agree here and thank you for the great point! I have no idea what causality truly would look like in the brain or what a ground truth looks like when it comes to a principle of the brain, and we hesitate to use the word “causality” for that reason. I believe most of our data our correlative or can be interpreted as a result of a given perturbation, but this by no means has to equate to causal. I do believe that term gets thrown around a lot these days with optogenetic / chemogenetic studies, and the reality is that once we perturb an area and networks respond accordingly, perhaps causality can be observed as a brain-wide phenomenon which we’re just started to test out thankfully.
He has no idea how the brain works (“what causality would truly look like in the brain”), yet in the paper he is making assertions about how one thing affects another – assertions about causal relationships. He doesn’t seem to see the contradiction.
In addition: “Post hoc analyses (Newman-Keuls) were used to characterize treatment and interaction effects, when statistically significant (alpha set at p < 0.05, two-tailed).” According to graphpad.com, maker of Prism, one of the software packages employed in this study: “It is difficult to articulate exactly what null hypotheses the Newman-Keuls test actually tests, so difficult to interpret its results.” And from the same source: “Although the whole point of multiple comparison post tests is to keep the chance of a Type I error in any comparison to be 5%, in fact the Newman-Keuls test doesn’t do this.”
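The general worry behind the GraphPad warnings – that multiple uncorrected pairwise comparisons inflate the family-wise error rate far above the nominal 5% – is easy to demonstrate by simulation. The sketch below (my own illustration, not the Newman-Keuls procedure itself) runs all ten pairwise Welch t-tests on five groups of pure noise and counts how often at least one comparison comes out “significant”:

```python
import random
import statistics

def welch_t(a, b):
    # Welch's t statistic for two independent samples.
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

random.seed(2)
n_experiments, n_groups, n_per_group = 500, 5, 10
T_CRIT = 2.101  # ≈ two-tailed 5% cutoff for ~18 degrees of freedom

false_alarms = 0
for _ in range(n_experiments):
    # Five groups drawn from the SAME distribution: any "effect" is noise.
    groups = [[random.gauss(0, 1) for _ in range(n_per_group)]
              for _ in range(n_groups)]
    # All 10 pairwise comparisons, each tested at the nominal 5% level.
    pairs = [(i, j) for i in range(n_groups) for j in range(i + 1, n_groups)]
    if any(abs(welch_t(groups[i], groups[j])) > T_CRIT for i, j in pairs):
        false_alarms += 1

print(f"family-wise false-positive rate: {false_alarms / n_experiments:.0%}")
```

With uncorrected tests the family-wise false-positive rate lands well above the nominal 5% – which is the failure mode the GraphPad documentation warns the Newman-Keuls test does not properly control.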
“Acute stimulation of a fear memory via either the dorsal or the ventral DG drove freezing behavior and promoted place avoidance (Figures 2I, 2J, 2L, and 2M). Acute stimulation in the female exposure groups promoted place preference but did not affect fear behavior (Figures 2I, 2J, 2L, and 2M).”
While the hippocampus clearly has a role in enabling memory formation and/or retrieval, it cannot possibly contain memories. Even if we take Chen et al.’s report at face value, all we would be able to say is that certain cells display a certain type of activity when the mouse has a sharp fear reaction, and also may instigate an acute fear reaction when stimulated. The claim that this activity is causing the mouse to experience mental imagery corresponding to some particular “context” – out of the infinite number and variety of contexts a normal mouse may experience in its lifetime – is a bridge too far. There aren’t enough cells in the hippocampus to accommodate all possible “contexts.” The authors need to be a little more modest in their claims.
I couldn’t agree more! I personally think that memories are a distributed brain-wide phenomenon in which circuits and networks utilize spatial-temporal codes to process information, as opposed to having a memory localized to a single X-Y-Z coordinate point. Even within the hippocampus with over 1M cells, the permutations possible of a defined set of cells utilizing a temporal code to process contexts is an astronomically, perhaps wonderfully, big number of experiences that it can be involved in — not to say that therefore memories are located in the hippocampus because it technically can process them, but that it’s contribution to enabling numerous memories I don’t believe has an upper limit that we know of, but this is total speculation on my end! However, I do actually believe that our set of experiments beginning with Liu et al. 2012 up to 2019 really hint that we can partly predict what the animal’s internal “experience” (used very loosely here) is actually like (see Joselyn et al. 2015 Nature Neuroscience) for a fantastic review. In short, we “tag” cells active during a defined period of time, say, exposure to context A, and we’ve done numerous experiments that suggest these cells are specific to that environment with minimal “noise”, i.e. without other contexts that the animal experiences spilling over, given the time period of our tagging system. And when we manipulate these sets of cells, the animals show fear responses specific to that context, i.e. Figure 3 of Chen et al., that suggests that at least some aspects of context A are coming back “online,” which dovetails which previous data from a 2013 false memory paper we had as well. 
That said, I’m fully on board that we’re not at the stage where we can say with certainty what the animal’s mental imagery looks like, though a handful of papers from the Deisseroth lab recently have hinted that we can really force a mental “image” to come back online and force the animal to behave as those it’s experiencing that image. In our hands, we believe that stimulating these cells in the hippocampus has a sort of domino effect in which downstream circuits become activated and this ultimately leads to memory recall, and that the hippocampus is a key node involved in bringing the brain-wide networks involved in memory back online. So it’s not that the memory is located in the hippocampus, it’s more that the hippocampus contains a set of cells which, when activated, are sufficient to activate memory recall by engaging the rest of the systems in the brain involved in that discrete experience too.
Notice that he never addresses the basic point about problems with the Newman-Keuls test, raised right at the top. Notice also that his claims are far more vague and speculative than his published paper makes them sound.
I hope this helps and thank you again for the fantastic back and forth!
I was actually avoiding getting into a back and forth until I got the dataset. I waited three days, then wrote this:
Just following up on the dataset request.
Thanks for your responses; I think it would be useful if I incorporated them into my PubPeer comments.
I waited several more days in case he wanted to object to the use of his comments in the PubPeer thread, but received no further response. I went ahead and posted his replies on PubPeer. I was truly amazed by them – not because I didn’t already know the score when it comes to contemporary neuroscience practice, but because I couldn’t believe how casually he admitted what to me was obvious malpractice. He apparently took me for an insider with (compromised) skin in the game, and dropped his guard. I began tweeting right and left, hoping to raise just a tiny bit of the concern and indignation I feel over this state of affairs. PubPeer got wind of my post and duly removed both my original comments (no part of which they allow to be reposted) and Steve’s cheerful, appreciative responses. I suspect Steve or allies got in touch with them and, despite his private candor, made sure PubPeer readers, and outsiders in general, wouldn’t learn about neuroscience’s coin-flipping approach to science.