Anyone who’s ever taken an introductory psych course has probably heard the story of “Clever Hans,” the horse who, for a time, was credited with possessing intellectual gifts far above his pay grade. He was, to all appearances, able to perform arithmetic, guess at the composers of melodies, string together letters to make words and sentences, spell out the name of the painter of a picture, and so on, all by pawing at the ground with his hoof an appropriate number of times. (Letters were given number tags.)
A natural suspicion that the whole performance was a hoax was challenged, first by the experience of a well-known zoologist, who proceeded to ask Hans the questions himself and still managed to elicit correct answers, and then by the investigation of a scientific commission set up by the German board of education, which reported that it could discern no signals, intentional or unintentional, by which Hans might be being alerted to the correct response.
Eventually, however, the biologist and psychologist Oskar Pfungst and colleagues, on the basis of careful experiments, were able to demonstrate that the impossible was not, in fact, happening: Hans was, indeed, receiving subtle, unintentional cues indicating the correct answers.
How did they achieve this? Not, in the first instance, by observing yet more instances. Their first move was to prove, by a simple test, that Hans was getting outside help in answering the questions.
Specifically, they showed that when the questioners themselves did not know the answers to the questions, or were screened off from the horse’s view, Hans’s performance plummeted:
With blinkers on, Hans’s performance was impaired, but he was still able to produce correct responses with some frequency.
When larger blinkers were employed, or a tent separated him from his questioner, performance collapsed. (The investigators had surmised, from Hans’s strenuous efforts to see his questioner, that the normal blinkers had allowed glimpses and thus some degree of success in response, and so had proceeded to this refinement.)
Similarly, if the questioner did not know the answer, Hans’s performance was poor. And still, even in this case, it was sometimes better than it should have been if Hans were not figuring things out for himself. It turned out to be necessary that no one in the room know the right answers for Hans to fail completely.
It thus became apparent that his questioners had, unbeknownst to themselves, been signaling the horse when to stop hoofing. Yet, so subtle were the clues, that even after this link between questioner and horse had been thus carefully demonstrated, clued-in, eagle-eyed spectators of the Hans show continued to insist that no such signs existed! (Hans’s trainer, meanwhile, at first stunned by the horse’s failures when unable to view his questioner, soon recovered his faith in the horse’s abilities and remained “as ardent an exponent of the belief in the horse’s intelligence as he had ever been.”)
In order to ascertain the source of Hans’s information, Pfungst conducted a careful series of experiments. Eschewing “natural conditions” (so popular in today’s vision science practice), he felt it necessary to set up special, controlled conditions:
The observations on the horse under ordinary conditions would have been quite insufficient for arriving at a decision as to the tenability of the several possible explanations. For this purpose experimentation with controlled conditions was necessary.
An unusually acute observer, Pfungst noticed that:
As soon as the experimenter had given a problem to the horse, he, involuntarily, bent his head and trunk slightly forward and the horse would then put the right foot forward and begin to tap, without, however, returning it each time to its original position. As soon as the desired number of taps was given, the questioner would make a slight upward jerk of the head. Thereupon the horse would immediately swing his foot in a wide circle, bringing it back to its original position. (This movement, which in the following exposition we shall designate as “the back step”, was never included in the count.) Now, after Hans had ceased tapping, the questioner would raise his head and trunk to their normal position. This second, far coarser movement was not the signal for the back-step, but always followed it. But whenever this second movement was omitted, Hans, who had already brought back his foot to the original position and had thereby put it out of commission, as it were, would give one more tap with his left foot.
These observations required unusual powers of discernment on the part of the observer, both because they were very minute, and because they were mixed in with others; in the case of the trainer, that very vivacious gentleman made sundry accompanying movements and was constantly moving back and forth. To abstract from these the essential and really effective movements was truly difficult. Other questioners had their own behavioral quirks. There was also the question of the timing of the cues – did they really come before, and not after, Hans made his decision to halt his count?
Again, it was understood by the investigators that the observations needed corroboration via carefully controlled experiments, and that such corroboration entailed demonstrating a virtually perfect correlation between the hypothesized influences and the outcomes – and thus virtually perfect potential control over those outcomes, producing them “at pleasure.”
If it was true that these movements of the questioner guided the horse in his tapping, then the following must be shown: First, that the same movements were observed in Mr. von Osten in every case of successful response; secondly, that they recurred in the same order or with only slight individual changes in the case of all who were able to obtain successful responses from the horse, and that they were absent or occurred at the wrong time in all cases of unsuccessful response. Furthermore, it was observed that it was possible to bring about unsuccessful reactions on the part of the horse as soon as the movements were voluntarily suppressed, and conversely, that by voluntarily giving the necessary signs the horse might be made to respond at pleasure; so that anyone who possessed the knowledge of the proper signs could thereby gain control over the process of response on the part of the horse.
Pfungst was clearly not interested in merely observing outcomes and modelling their probability distributions, but in actively testing hypotheses that would lead to perfect predictability. He successfully carried out his program, achieving control (and thus predictive accuracy) over Hans’ performance.
It’s worth reading about how ingeniously and carefully the investigators explored, via a series of questions and answers, all the facets and variations of Hans’ performance. And it’s worth noting, finally, that the ultimately successful project hinged on a series of testable conjectures that, step by step, refuted any vestiges of the notion that Hans was answering the questions under his own steam. Even the (eventually) key question – “what visual cues was Hans using?” – did not emerge from the initial data, but was the product of insightful speculation and targeted tests.
How would a “Bayesian” have approached the problem of Hans?
How would a “Bayesian” have approached the problem of Hans’ intelligence? Would access to the very latest “Bayesian” techniques and automated packages have helped achieve, or even accelerate, Pfungst’s discoveries? Is discovery even part of the Bayesian program?
From my understanding of “Bayesianism” (and I would ask anyone to please correct me if I’m wrong), one would begin with a numerical estimate of one’s prior belief.
Belief in what? That horses are capable of doing sums? Of guessing composers? Etc? Or that Hans the specific individual horse is capable of such achievements? Or that he is not capable? Or that his trainer is feeding him the answers? Or that his trainer is feeding him the answers visually? The choice of which question to assign a prior to is perhaps obvious to a Bayesian, but not to me. So I’ll go with this: “How probable is it that Hans the individual horse is capable of doing sums in his head?”
We consult our feelings and prior information, and pick a number. This is our “prior.” We’ll assume it starts out pretty low. We might want to write it down so as not to forget it, because we’ll need to plug it in for our future calculations.
Now, according to Andrew Gelman, “Bayesian inference is conservative in that it goes with what is already known, unless the new data force a change.”
What is the new data, in this case? It could only be the results of Hans’ public performances.
In these performances, Hans always answered the questions correctly. In other words, these data are undeviatingly positive. The more successful performances you were to observe, the more your belief in Hans’ abilities should increase, and the more your prior will tend to be “swamped” by the new data. (If you are a lazy Bayesian, and forego observation, your prior may be less labile.)
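The updating just described can be sketched with a conjugate Beta-Binomial model. Everything here – the prior parameters, the number of performances – is invented for illustration; the point is only to show the mechanical “summing up” under discussion:

```python
# Hypothetical sketch of Beta-Binomial updating: a Beta(alpha, beta) prior
# over Hans's probability of answering correctly, updated on counts of
# correct and incorrect answers. All numbers are invented for illustration.

def posterior_mean(alpha, beta, correct, incorrect):
    """Posterior mean of a Beta(alpha, beta) prior after observing
    `correct` successes and `incorrect` failures (conjugate update)."""
    return (alpha + correct) / (alpha + beta + correct + incorrect)

# A skeptical prior: Beta(1, 99), prior mean 0.01.
print(posterior_mean(1, 99, 0, 0))    # 0.01

# After 200 public performances, all answered correctly:
print(posterior_mean(1, 99, 200, 0))  # 0.67 -- the prior is being "swamped"
```

Each uniformly positive observation pushes the estimate higher, exactly as the text says: the more shows you attend, the less your original prior matters.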
Keeping an open mind, you continue to wait for, or actively seek out, new data. You go to more shows, read more news stories, checking to see if Hans’ performance is holding up. You keep updating.
You read the Commission’s report. Positive, positive, positive. You update again. Your belief probability should be pretty high by now – unless you’re a skeptic (a complication that may present a computational challenge, which will not be dealt with here).
You look at Pfungst’s data, and Hans’ mixed success rate there. You factor this new data into your probability estimate of Hans’ ability to do sums, which previous positive data had raised up; your estimate should probably fall, but it’s your decision. You keep an open mind, waiting on any new data that might rain down at any moment, or actively searching it out near and far.
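Continuing the hypothetical Beta-Binomial sketch (again with invented counts, not Pfungst’s actual figures): fold in a batch of mixed results and the estimate drifts down only gradually, because the machinery weighs every answer alike, regardless of the conditions under which it was produced:

```python
# Continuing the hypothetical Beta-Binomial sketch (all counts invented).
# Start from the posterior after 200 uniformly correct public performances
# under a Beta(1, 99) prior -- i.e. Beta(201, 99) -- then fold in mixed
# results from controlled tests: say, 30 correct and 70 incorrect.

def posterior_mean(alpha, beta, correct, incorrect):
    return (alpha + correct) / (alpha + beta + correct + incorrect)

mixed = posterior_mean(201, 99, 30, 70)
print(round(mixed, 4))  # 0.5775 -- the estimate falls, but only gradually
```

Note that the calculation cannot distinguish a failure under a carefully screened condition from a failure on an off day; both are just one more tick in the “incorrect” column.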
It’s also, I suppose, possible that your prior knowledge caused you to discount all of this data, so your probability belief stays low…But as a Bayesian, is it really your role to question or interpret the data, as opposed to simply summing it up? How does rational interpretation figure into Stan algorithms? These complications will be discussed more below, but they are not typically of concern to Bayesians…What is of concern is that a number label be attached to their beliefs based on the data that are chosen for consumption and distributional analysis. The number is the thing. Statistical packages are now available to assist those not comfortable with the necessary mathematical operations on the data. These operations will help all who wish to achieve a probability estimate with which they feel more or less comfortable. Because it’s subjective, you don’t have to justify it to anyone; and agreement with Stan’s output is also optional. We each have our own truth, and that’s ok.
What’s the problem?
It should be obvious, first, that bare-naked references to “data” are vapid. “Data” – in this case, correct answers vs. incorrect answers from Hans’ performances, from the investigations of the commission, and from the various tests (under various conditions) conducted by Pfungst during his process of discovery – could pile up ad infinitum. Even though Bayesians claim not to be “frequentists,” they actually employ data in a frequentist fashion, except that they include an initial term that is made up by each individual, the “prior” probability of…whatever. These frequency distributions of bare-naked data are supposed to inform an individual’s current probability belief. But this belief, arrived at in this way, can be of no theoretical value, i.e. of no interest to anyone interested in actually solving the problem at hand.
The events (here, correct/incorrect answers) that count as “data” are never directly linked to the truth or falsity of a hypothesis – here, the hypothesis that Hans is (or, alternatively, is not) thinking for himself. The same data, in this case, are consistent with either view, and relative frequencies of outcomes have nothing to say about the relative truth of either one. The data are wholly contingent on conditions; conditions count, but not in a Bayesian/frequentist way. They count in a logical way. When Hans gets things wrong with blinkers on, we might surmise that they are acting as a distraction, especially as he was still performing relatively well. Or we might interpret this as an incomplete restriction on his view of cues, and try again with bigger blinkers. Is the application of bigger blinkers data-driven? How about in the case of the later tests, carefully designed and controlled to test various possibilities?
It should be clear that the positive data from the original public demonstrations of Hans’ abilities should not count in the same way that the negative data from certain of Pfungst’s experiments count. In other words, they cannot meaningfully be simply summed and organized into a probability distribution. The two kinds of data – “positive” and “negative” – should not be pitted against each other as though they were coin tosses. Their value depends on the creative and logical interpretation (Feynman’s “imagination in a straitjacket”) of previous-data-plus-conditions, leading to new theory-inspired tests and interpretations of new data-plus-conditions.
Anyone who chooses to calibrate their beliefs on the basis of simple summing of the bare-naked data, without references to conditions and their theoretical implications, will obviously never achieve the control over outcomes that was achieved by Pfungst in the case of Hans. What’s more, the probability belief achieved in this way would fail to reflect the distribution of the responses under an infinite number of conditions, including when the investigator undertook to completely obstruct Hans’ access to revealing signals. This product of Bayesian inference, in other words, like all products of Bayesian inference, would have no real-world value.
The next question is, would it be appropriate to call theory-based selection and control of conditions, “data-driven”? If not, then science is not “data-driven.”
According to Techopedia, “Data driven is an adjective used to refer to a process or activity that is spurred on by data, as opposed to being driven by mere intuition or personal experience. In other words, the decision is made with hard empirical evidence and not speculation or gut feel.”
So, not theory, not “mere intuition…experience, speculation.” Just bare-naked data.
Without a theory, as Darwin observed, you might as well count the stones on Brighton Beach. Yet the “data-driven,” “data-mining,” blind correlation-seeking (multivariate analysis) framework has come to dominate (defending an intellectually vacant psychology unable to achieve reliable predictions) to the point that a “Bayesian” cottage industry has sprung up to turn the resulting confusion into a probability estimate the gullible can believe in. Wasn’t that a function of religion – to produce a (false) sense of control over things that were not under our control (because beyond our understanding)? Weird.
Short version: The probability of “the truth of a hypothesis” is supposed, by Bayesians, to be judged on the basis of the probability of certain events supposed to signal the action of the hypothesized forces or principles; but the probability of any observable event (like Hans’s responses) is wholly contingent on conditions. The frequency of the events may be altered ad lib. Thus any reference to the probability of an event – a “datum” – must contain reference to the specific conditions under which it arose. Explanation, i.e. inference as to the factors dispositive to the outcome, requires speculating about what those factors might be and removing confounds, such that the outcome may be predicted with overwhelming confidence. Otherwise, the procedure is impotent in predicting outcomes, in controlling outcomes, and in using that control to generate new kinds of events (or series of events) with virtually zero probability of occurring prior to the selection and control of special conditions – and in so doing, corroborating predictions and, provisionally, the assumptions that led to them.
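The point about conditions can be made concrete with a toy calculation (the counts are invented): the same pooled frequency is compatible with wildly different condition-specific frequencies, so a single probability number that ignores conditions describes no actual condition at all:

```python
# Toy illustration (counts invented): the same pooled success rate can mask
# radically different condition-specific rates, so a probability estimate
# that ignores conditions describes neither condition.

by_condition = {
    "questioner visible, knows answer": (90, 100),  # (correct, trials)
    "questioner screened off": (6, 100),
}

pooled = (sum(c for c, _ in by_condition.values())
          / sum(n for _, n in by_condition.values()))
print(pooled)  # 0.48 -- a single number that describes neither condition
```

The 0.90 and 0.06 condition-specific rates are what carry the explanatory weight; the pooled 0.48 is exactly the kind of bare-naked summary the text objects to.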
Continuing random thoughts
When, for example, the orbit of Uranus was shown not to quite agree with Newton’s prediction, the procedure was not simply to say, well, I guess we have to adjust our probability estimates for the locations of Uranus, or for our belief in Newton’s laws, or whatever. The response was to speculate (!!) about what could be going on. Was Newton’s theory wrong…or was there another body influencing the motion in accordance with Newton’s theory…? On investigation in light of this speculation, Neptune was discovered. Was this data-driven?