I believe the so-called Bayesian practices represent a de facto attempt to legitimize ineffective scientific practice that is unable to reduce uncertainty (i.e. confounds in the data) to acceptable levels, but still wants to present a facade of progress. For robust findings with heuristic power they substitute endless discussions about how probable a result is and whose probability estimate should be believed; but since probability estimates are not testable, the discussion just goes on and on, generating papers but not knowledge. These are discussions that at best, at their highest level of rationality, lead to a Humean dead end. I recently commented on a couple of papers authored or co-authored by Andrew Gelman, who is perhaps a more enlightened Bayesian but who, as I see it, is still too fond of uncertainty. Below is an edited/expanded version of PubPeer comments I made on Gelman and Shalizi (2013), “Philosophy and the practice of Bayesian statistics,” as well as a shorter comment on Feldman (2017), “What are the ‘true’ statistics of the environment?”
Gelman and Shalizi (2013) take a stand against “Bayesian” statistics’ “subjective prior,” which is good, as the subjective prior is ridiculous. However, their view of science as essentially a counting game – the key characteristic of the “Bayesian” school on behalf of which they advocate (albeit with caveats) – is a view that drains scientific research of its potential to add to knowledge of the world.
Statistical models have their place. However, the emphasis on statistics has become culturally conjoined with conceptual and logical laziness – with counting without giving too much consideration to what, exactly, is being counted; the results of such practice are given the honorary title of “data.”
Despite their rejection of the subjective prior, Gelman and Shalizi, as Bayesians, require a prior probability to plug into their formulas. That means they need to take into account all of the forces and all of the factors in the universe that led up to this moment, and given current conditions, take a position on the probability of outcome X. Obviously, there is no rational way to do this. But, they argue, it doesn’t matter much what prior you choose, because the prior is testable, and adjustable.
How is it testable? By taking the data already generated and running a sampling simulation, plugging the results into the formula and checking the match. But this means that all the factors that influenced the outcome remain implicit, unanalyzed, unacknowledged, possibly unknown, uncritically mashed together in the statistical blender. And this doesn’t correct for potential sampling error.
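For readers who have not seen this kind of check, here is a minimal sketch of what I take the procedure to be. Everything specific in it is my own assumption for illustration, not taken from Gelman and Shalizi: the toy data, the i.i.d. Bernoulli model with a flat Beta(1, 1) prior, the choice of "number of switches" as the quantity to compare, and the function names `count_switches` and `posterior_predictive_pvalue`.

```python
import random


def count_switches(seq):
    """Count adjacent positions where the value changes (0 -> 1 or 1 -> 0)."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def posterior_predictive_pvalue(data, n_rep=2000, seed=0):
    """Fraction of simulated datasets with at most as many switches as the data.

    Assumed model (illustrative only): i.i.d. Bernoulli(theta) outcomes with a
    flat Beta(1, 1) prior, so the posterior is Beta(1 + k, 1 + n - k).
    """
    rng = random.Random(seed)
    n, k = len(data), sum(data)
    observed = count_switches(data)
    as_extreme = 0
    for _ in range(n_rep):
        # Draw a parameter value from the posterior fitted to the same data.
        theta = rng.betavariate(1 + k, 1 + n - k)
        # Simulate a replicated dataset of the same size under the model.
        replicated = [1 if rng.random() < theta else 0 for _ in range(n)]
        if count_switches(replicated) <= observed:
            as_extreme += 1
    return as_extreme / n_rep


if __name__ == "__main__":
    # Hypothetical data: one long streak of each outcome.
    streaky = [1] * 10 + [0] * 10
    print(posterior_predictive_pvalue(streaky))
```

A value near zero says only that the fitted model rarely reproduces the streakiness of the data it was fitted to. Note that every number in the sketch comes from the original sample; no new observations and no statement of conditions enter at any point.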
Here’s what would be a better test (assuming it was worth testing “priors,” which it isn’t): Test it on a new sample. How would this sample be chosen? We would have to specify conditions. How would we know what conditions to specify? The actual confounds in any situation are infinite; we would have to narrow them down. We would have to ask the original data collectors what we should control for. This information is not contained in the prior, which is just numbers, not qualities. Our description of our prior, in other words, would need to contain a footnote about the conditions to which it applies. (And for new data to apply to the previous estimate, we would have to ensure that conditions, implicit or explicit, remained unchanged).
What if we took a stab at replication with a new dataset, and the distribution of data reflecting the particular correlation of interest were compatible with our previous prior assumption? Would this mean that our prior was “correct?” Would this mean that the next sample we tested would also work this way? Would this mean that factors that weren’t implicated in our present sampling won’t radically affect future sampling distributions? How would we know? What if the sample turned up a different prior value? Would this mean that our previous selection was not correct, or that we got a different outcome by… chance? But we calculated the prior based on “data!” Does this mean anything at all? Will it allow us to predict any future events with any confidence?
Probability distributions are probabilistic, measured in a small slice of time. Blindly taking the events from that small slice of time and rehashing them (sorry if I’m misunderstanding simulation, please correct me), doesn’t tell us why we got a particular distribution in the first place or that we should expect it to hold in the future. So, in contrast to what Gelman and Shalizi are claiming, the choice of prior probability isn’t testable. It can only be inferred as a matter of induction, which the authors say they’re against. (Plus, the argument seems circular. We need the prior probability to generate the numbers that we need to validate it.)
The problem here is that scientists use numbers to test theories, but theories aren’t about numbers. Theories consist of ideas, arguments, creative solutions to real-world problems, referring to real-world situations, that are as yet unresolved. Scientists don’t sit around taking stabs at untestable-in-principle “priors.” They actually predict what they expect with certainty to happen, not what they think might happen, with probability x. When they succeed, the scientific project may make one of its magical leaps forward.
In other words, a genuine finding in science happens when someone says, in this and this place, at this time, under these specific conditions, you will observe X. If you do observe X, then, assuming that X was otherwise very unlikely to be observed, and assuming that your grounds for expecting it (your arguments/theory) are rational and empirically sound, then we may use the arguments to generate new predictions about other things that we predict, in light of the theory, should definitely happen, and test them. This definiteness of prediction applies even in the context of measurement error; the assumption, for example, that errors are random leads to definite predictions about true values, whether or not they are for practical reasons currently testable.
If all science did was to estimate probabilities (on god knows what basis), then not observing X at that place at that time under those conditions would, unless the probability was assumed to be 1, be meaningless, whether or not a Bayesian somewhere chose to alter their belief or selection of prior.
The key to scientific discovery is creative thinking about possibility, using facts (not usually distributions) and going beyond them, making bold guesses (Feynman says colorfully that you need “imagination in a straitjacket”); not looking backward at confounded data and fitting them to statistical distributions. This is why science is risky, and its successes highly improbable in prospect. Whereas fitting data is a sure thing.
Say I want to construct a projectile that will intercept a heavenly body that may float across my sky at some moment. Success will presuppose many complex assumptions with practical implications; and a hit will validate those assumptions, at least for the time being. Even a close call will have value, because of the way it will reflect on those underlying, unobservable assumptions.
The current statistical approach to science is, conversely, risk-free; Bayesians would throw up a million projectiles, count the hits, compare the distribution to their imagined priors, adjust the priors, whatever. But the hits and misses won’t teach them anything, or help them predict anything (in the way our scientist can) because they occur randomly, not by design, not on the basis of a theory, which they understand and test. This seems to be the nature of the current approach to science in many domains. It should be evident that no new knowledge may come of it, except by isolated and uninterpretable luck.
To take a risk-free example cited in the article, we may look at the study on “red states” vs “blue states” by Gelman et al (2010). The goal of this study is stated as follows:
“Income inequality in the United States has risen during the past several decades. Has this produced an increase in partisan voting differences between rich and poor?”
So, basically, do rich people vote differently than poor people?
It should be obvious that the question is very vague, regardless of approach. It should be obvious that answering it entails imagining and taking into account a great number of confounds.
In fact, the authors’ question is a little more specific than initially stated; it refers to the U.S., and to the two main political parties, and to the individual States, and to a particular time frame. Still, confounds are necessarily legion.
The question seems to imply an expectation; rich vote differently from poor. (After all, the authors probably wouldn’t ask, do brunettes vote differently from blondes?) So the authors think there might be a difference. On what basis do they think this? It isn’t particularly clear from the conceptually-vague, barely-there introduction. (They also seem to have misunderstood the Democratic Party, and the interests which it serves).
Strikingly, the authors are up-front about their lack of interest in forming an organized, control-enabling argument to justify going to the trouble to collect voting data (as well as readers’ trouble in reading it):
“We offer no sweeping story here; instead, we share some statistical observations on inequality and voting over time in the country as a whole and in the states…” (“Sweeping story” is apparently Bayesian for “hypothesis-which-allows-us-to-control-for-specific-factors-we-consider-relevant-and-thus-would-render-our-data-interpretable.”)
The results are highly predictable, “revealing patterns that suggest complex connections between inequality, geography, and partisan voting in the United States.”
Well, thanks for sharing, but I could have told you that **the situation was complex** without collecting or analyzing any data at all… If I had to put a number to it, I would say the chances were 100%. Not much new knowledge there.
At the end, they conclude that:
“Income predicts vote choice about as well now as it did 30 years ago, but with a new geographic pattern. In poor states, income is associated with Republican voting much more than before, while in many rich states, the relation between income and vote choice is nearly zero.”
So, income never predicted vote choice very well, it still doesn’t, and we don’t know why, but here are the stats for this time and this place.
The short version of the story is that the authors made a prediction – there will be a general correlation between income inequality and voting patterns – that was not borne out, at least not in any regular, interpretable way. This failure is due to conceptual laziness: instead of doing the work to consider the problem more deeply, they made a casual, crude prediction and produced muddy, confounded, hardly informative “data,” which they generously offer us as a free gift.
Science is about thinking up rational stories about things we can’t observe, and that we can’t count, that enable successful predictions about things we can observe, if we know where and how to look. It’s not about counting and sharing blindly collected “data.” Who cares what the “prior” of a successful, or an unsuccessful, well-founded prediction is supposed to be? The point is to improve the theory so it makes definite predictions. We’re trying to radically reduce uncertainty, not engage in endless, fruitless, philosophical-in-the-worst-sense-of-the-word discussions about how to measure it.
People involved in science understand that you need theories to find the useful data to begin with. One of my favorite quotes is from Darwin, who said that without a theory you might as well count the stones on Brighton Beach. Similarly, Leonardo wrote that: “He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may cast.” “Practice” here is data-collection with poor theoretical/methodological preparation.
“Statistical hypotheses” are hypotheses without a theory behind them. As such they’re just a crapshoot, as Leonardo and Darwin understood. Which is ironic, because Bayesianism passes itself off as no-risk; just collect more “data” and you’ll get nearer the probability-truth-number. As J. Gallant said in a recent PubPeer conversation, in science, “First, you measure.” I speculate that the spread of “Bayesian” pseudoscientific techniques is correlated with the increased and suffocating control over science by risk-averse bureaucrats and businessmen.
The wool’s been pulled over their eyes, though.
The authors claim to tilt toward hypothesis-testing in the Popperian style:
“Popper tried to say how science ought to work…”
No, he tried to explain how science actually works, when it works. You can agree or disagree with his analysis, but the point was to explain the type of practice that produces progress.
“We have generally found Popper’s ideas on probability and statistics to be of little use and will not discuss them here.”
Have you discussed them elsewhere? To casually dismiss the views of a serious thinker like Popper as though they were beneath consideration doesn’t seem very responsible.
An old blog post by Gelman on Popper and Bayes indicates to me that he hasn’t grasped Popper’s insights. He says:
“Our progress in applied modeling has fit the Popperian pattern pretty well: we build a model out of available parts and drive it as far as it can take us, and then a little farther.”
By “available parts” I assume Gelman is talking about available facts; but scientific “models” go beyond available facts. (As a Bayesian, he may not even be referring to facts about the natural world, but disembodied numerical values drained of reference, what Bayesians and others often refer to as “data.”) The “Popperian pattern,” as understood by working scientists such as Feynman, is to make smart guesses that go well beyond the available information; Popper emphasized especially the value of bold guesses, more likely than not to fail but highly fruitful when successful. It was not a philosophy of available parts (facts) being lumped together, but of creative leaps of faith based on rational arguments. (I need to reread Kuhn, but it’s my impression that the two, Popper and Kuhn, are not actually as different as people think. Failed hypotheses being replaced by fundamentally different ones are revolutionary moments in science; and it is also a fact that older hypotheses predicted many otherwise unsuspected facts, including facts that were their downfall. This is one of the surprising things about the hypothetico-deductive process, and why Popper described fruitful hypotheses that fail (i.e. all hypotheses) as approximations to truth.)
That arguments about probabilities end up running around in circles is illustrated by a recent article by Feldman (2017), “What are the ‘true’ statistics of the environment?” He concludes:
“In Conventional Wisdom, cognitive agents can achieve optimal inference by adopting a statistical model that is close to the true probabilities governing the environment as possible, and they are relentlessly driven by evolution toward such a model. In the subjectivist framework advocated here, distinct observers form an interconnected network of partially overlapping but distinguishable belief systems, none of whom has special claim to the truth. On this view—as in traditional Bayesian philosophy—“true” probabilities are not accessible and play no role. To speak of certain environmental probabilities as objectively true—no matter how accustomed many of us are to speaking that way—is a fallacy.”
Bayesians have been groping their way to a Humean epistemology in the context of which, in the words of Bertrand Russell, the man who believes that he is a poached egg is to be condemned solely on the basis that he is in the minority.
Having read Feldman’s concluding remarks, what is the point of reading the arguments leading up to them? It’s simply a case of “Your guess is as good as mine.”
Science does have a special claim to truth, as evidenced by its success in controlling natural phenomena. Otherwise, why do we make a distinction between scientific and other beliefs? What is the basis for this distinction?