By Pascal Michaillat*, Brown University
It is commonly believed that the lack of experimental evidence typical in the social sciences slows but does not prevent the replacement of existing theories by newer, better ones. A simple model of scientific research and promotion challenges that belief, however. In the model, scientists are slightly biased toward other scientists with similar beliefs—a well-documented behavior—and as a result, when a science lacks evidence to discriminate between theories, better theories may never be adopted.
Knowledge is an ever-advancing frontier, with newer and better paradigms replacing the paradigms that, until then, were the newest and best. This process of scientific discovery moves forward on the strength of evidence. But in some fields, it is harder to obtain evidence than in others. For instance, it is much harder to collect experimental evidence in medicine and in the social sciences than in the natural sciences.
Such lack of experimental evidence, many economists believe, may slow but will not prevent the adoption of newer, better theories. The belief, which dates back to Milton Friedman’s essay on “The Methodology of Positive Economics” (1953), is that, as empirical evidence accumulates, worse theories will eventually be “weeded out”. It is a comforting belief; but is it also true?
In a recent paper, George Akerlof and I evaluate this belief using a simple model of scientific research and promotion. The model allows us to conduct a thought experiment. Imagine that two theories, one better and one worse, are competing for acceptance by the scientific community. Under which conditions will the better theory eventually prevail?
In modeling science, we follow the description of the scientific process that Thomas Kuhn gives in “The Structure of Scientific Revolutions” (1962). There are two paradigms, one giving a better description of the world than the other. Scientists adhere to one or the other paradigm, and scientific inquiry occurs in an environment fashioned after the tenure system in academia: advisees trained by tenured scientists hope to become tenured scientists themselves. The strength of a paradigm is measured by the fraction of tenured scientists adhering to it; a paradigm is weeded out as more and more tenured scientists adhere to the other paradigm.
To mirror the environment in which scientific production is evaluated, we allow scientists to be slightly biased toward other scientists with similar beliefs. This type of bias is widespread in human communities. It corresponds to what sociologists call “homophily” and what social psychologists call “intergroup bias”. The bias has also been observed in science. Scientists have been found to favor others from the same school of thought at every level of academic evaluation: hiring, award of grants and honors, conference invitations, tenure evaluations, and so on.
“[…] when a science lacks evidence to discriminate between theories, even with only a slight amount of bias, there is a risk that inferior paradigms may prevail.”
Using this model, we find that when a field lacks evidence to discriminate between theories, even a slight amount of bias puts it at risk of settling on an inferior paradigm. Specifically, when scientific tests lack power, or when tests play little role in determining who is admitted into the fellowship of established scientists, the chances of getting trapped in an inferior paradigm are high. Lack of power does not just slow scientific progress; combined with even the slightest bias, it may bring progress to a halt.
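The mechanism can be made concrete with a small simulation. The sketch below is not the model from the paper but a stylized stand-in built on assumptions of my own for illustration: `n` tenured slots; each period one incumbent retires; candidates are trained by randomly drawn incumbents and inherit their advisor’s paradigm; with probability `power` a scientific test settles the tenure case, admitting only work in the better paradigm; otherwise a randomly drawn tenured evaluator decides, favoring a candidate of their own paradigm with a slight bias `bias`.

```python
import random


def run_tenure_model(n=50, init_better=5, power=0.1, bias=0.05,
                     periods=20000, seed=None):
    """Stylized contest between a better and a worse paradigm.

    Tracks the number of tenured scientists holding the better
    paradigm and returns their final share (0.0 or 1.0 once one
    paradigm has been weeded out entirely).
    """
    rng = random.Random(seed)
    better = init_better  # tenured scientists on the better paradigm
    for _ in range(periods):
        if better in (0, n):
            break  # one paradigm has been weeded out
        # A randomly chosen tenured scientist retires.
        if rng.random() < better / n:
            better -= 1
        if better == 0:
            break  # the better paradigm's last adherent just retired
        # Candidates, each trained by a random incumbent, are
        # evaluated one by one until someone is granted tenure.
        while True:
            cand_better = rng.random() < better / (n - 1)
            if rng.random() < power:
                # A decisive test: only better-paradigm work passes.
                admitted = cand_better
            else:
                # No decisive test: a biased evaluator decides.
                eval_better = rng.random() < better / (n - 1)
                same_paradigm = (cand_better == eval_better)
                p_admit = 0.5 + bias if same_paradigm else 0.5 - bias
                admitted = rng.random() < p_admit
            if admitted:
                if cand_better:
                    better += 1
                break
    return better / n
```

Under these assumptions, when `power` is near zero the dynamics are driven almost entirely by homophily: whichever paradigm holds the majority is favored at every tenure decision, so a better paradigm that starts with few adherents tends to be weeded out. As `power` rises, the test overrides the bias often enough that the better paradigm reliably takes over. The parameter names and replacement rule here are illustrative choices, not the paper’s.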
History shows that worse paradigms have indeed persisted, and at society’s cost. Two examples illustrate such persistence and its costs. The first example is from medicine: the persistence of bloodletting in the 19th century. In the 1830s Pierre-Charles-Alexandre Louis, a practicing physician in Paris, ran an early form of randomized controlled trial on pneumonia patients. The first group was treated with bloodletting in the first four days of the disease; the other group, with bloodletting in days five to nine. Louis found a much higher fatality rate for those with early treatment, which was difficult to explain if bloodletting was as beneficial to pneumonia patients as it was thought to be. Yet Louis’ results did not have a significant effect on practice. Neither did later, more conclusive findings by John Hughes Bennett at the Edinburgh Royal Infirmary. The reason is that physicians were very averse to statistical methods. Medical historian John Harley Warner culled doctors’ letters and found that physicians viewed themselves as professionals with clinical duties toward their patients. For them, it was a denial of clinical duty to base judgment in individual cases on statistical samples of unknown patients in different locales and in different circumstances. In 19th-century medicine, physicians’ criteria for promotion rested on a candidate’s ability to carry out existing medical practice, not on the candidate’s adoption of procedures that passed high-power scientific tests.
The second example is from macroeconomics: the nonadoption of underconsumption theory in the early 20th century. In 1887, Uriel Crocker, a Boston lawyer, published an article in the first issue of the Quarterly Journal of Economics regarding the possibility of excess supply during economic downturns. This proposition went against the existing paradigm of the time—that supply creates its own demand—and drew sharp opposition. Harvard professor Silas MacVane responded to the article with a comment that concluded the proposed theory was “absurd”. As a result, Crocker’s idea was forgotten. It was not until the Great Depression delivered a powerful test of the old paradigm that economists stopped dismissing underconsumption theory—rediscovered by Keynes in the “General Theory” (1936)—as absurd. But Crocker never became known as Keynes’ precursor; in fact, Crocker’s article has been cited only three times according to Google Scholar (one of those citations being ours). Thus, a new, better paradigm languished for almost half a century in the absence of high-power scientific tests to distinguish between the new paradigm and the old.
The natural sciences have been immensely successful in the past 500 years. Our model suggests that two features have played an important role in their continuous progress. First, the natural sciences have made remarkable discoveries of high-power tests capable of distinguishing between true and false paradigms. Second, established scientists have been committed to admitting into their ranks those whose work respects the findings of high-power tests, insofar as they are available.
“[…] Can we be certain that our biased beliefs, expressed at every stage of academic life (…) do not prevent newer, better theories from being picked up as they become available?”
Those of us working in the social sciences, then, have two things to consider. First, the development and adoption of new empirical methods that increase the power of scientific tests should be applauded and encouraged further. In economics, the adoption of randomized controlled trials, administrative datasets, and laboratory experiments aims at increasing power and is moving the field in the right direction. Nevertheless, it seems unlikely that our methods will ever match the power of those employed in the natural sciences: we will never be able to stick people in a Large Hadron Collider. Furthermore, some fields are inherently in a more difficult situation than others. Anthropology works with small populations, so high-power tests will by nature be hard to design. In economics, it is hard to imagine running experiments with entire countries, which places a bound on the power of tests in macroeconomics.
Second, can we be certain that our biased beliefs, expressed at every stage of academic life—when we review journal articles or grant applications, send seminar and conference invitations, write letters of recommendation, make hiring and promotion decisions—do not prevent newer, better theories from being picked up as they become available? Whether or not we know it, we may have homophilous tendencies and occasionally express them. But as our model suggests, even a small amount of such bias can have large consequences. What then can we do to help the better paradigm prevail in our respective fields?
*Pascal Michaillat is an assistant professor in economics at Brown University. His research is mostly in macroeconomics, with a focus on unemployment and policies related to unemployment. Working in macroeconomics, however, makes one wonder what makes a good model and why some models are used more than others. Some of his research tries to address these questions.