On the poverty of quantitative analyses
Posted: 2024-02-15 · Last updated: 2024-02-27
How can quantitative analyses give rise to policy? Only through rigorous a priori reasoning. Let me give you an example. Suppose you have a huge data set on living conditions in a city. You run a multiple linear regression of rent on income and other variables. Suppose further that all the assumptions for this regression are met—in other words, your estimates are BLUE. Your model is completely valid. You obtain the following result:
$$ \text{Rent} = \alpha + \beta \, \text{Income} + \gamma Z + \varepsilon $$
where, for the sake of argument, $\alpha = 0$, $\beta = 0.3$, and $\gamma$ collects the coefficients on the control variables $Z$, which are not of interest to the discussion.
In this case, $\beta$ represents the average treatment effect of a unit increase in Income on Rent.
For example, if my Income changed from €1,000 to €2,000, ceteris paribus, my Rent would be expected to increase by €300.
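To make the arithmetic concrete, here is a minimal sketch in Python (all data simulated, all numbers hypothetical): the data-generating process uses $\beta = 0.3$, ordinary least squares recovers it, and the fitted coefficient translates a €1,000 change in Income into an expected change in Rent of about €300.

```python
# Minimal illustration with simulated, entirely hypothetical data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
income = rng.uniform(1_000, 5_000, size=n)                  # monthly Income in EUR
z = rng.normal(size=n)                                      # a stand-in control variable Z
rent = 0.3 * income + 50 * z + rng.normal(0, 100, size=n)   # true alpha = 0, beta = 0.3

# Ordinary least squares: Rent = alpha + beta * Income + gamma * Z
X = np.column_stack([np.ones(n), income, z])
(alpha_hat, beta_hat, gamma_hat), *_ = np.linalg.lstsq(X, rent, rcond=None)

print(f"beta_hat ~ {beta_hat:.3f}")                         # close to 0.3
print(f"expected Rent change for +EUR 1,000 Income: ~EUR {beta_hat * 1_000:.0f}")
```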
What can we conclude from this? Surprisingly little. Even though the regression is entirely valid by assumption, our coefficient estimates do not tell a story by themselves.
How does the causal effect of Income on Rent come to pass? There are many possible mechanisms.
One story is that everyone's landlord is also their employer. It may very well be the case that there is a rule in the employment contract stating that 30% of Income is to be paid as rent. In that case, there is a very direct relationship between Income and Rent.
Many other stories are conceivable. It could be that the government pays a stipend equal to 30% of one's Income for housing, and that it is prohibited to pay more or less.
However, the relationship need not be that mechanical. Another story is that higher Incomes lead people to move to nicer housing. Recall that $\beta$, under our assumptions, represents the average treatment effect, not everyone's. In this story, we could conclude that on average, people spend about 30% of their Income (and thus, of their change in Income) on Rent.
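These stories are observationally equivalent to the regression. A second small sketch (again with purely hypothetical numbers) shows that a mechanical 30%-of-Income contract rule and a choice-based story, in which people merely tend, on average, to spend about 30% of their Income on housing, produce essentially the same estimated $\beta$; nothing in the regression output distinguishes them.

```python
# Two hypothetical data-generating processes, one regression coefficient.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
income = rng.uniform(1_000, 5_000, size=n)

# Story 1: a contract rule -- Rent is mechanically set to 30% of Income.
rent_rule = 0.30 * income

# Story 3: voluntary choice -- each person spends a different share of
# Income on housing, but the shares average out to roughly 30%.
share = rng.normal(0.30, 0.08, size=n).clip(0.05, 0.60)
rent_choice = share * income + rng.normal(0, 50, size=n)

def ols_slope(y, x):
    """Slope from an OLS regression of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"beta_hat under the contract rule: {ols_slope(rent_rule, income):.3f}")   # 0.300
print(f"beta_hat under voluntary choice:  {ols_slope(rent_choice, income):.3f}") # ~0.30
```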
The regression does not tell us the underlying mechanism. Why is that important? The reason is very simple. We cannot form policy recommendations based solely on quantitative analyses, even if these analyses are entirely valid. Clearly, the policy recommendations are vastly different: In the first story, 30% could be seen as a substantial chunk of Income that is in essence (re)captured by the employer. In the second story, we would see a stark differential between the system described and the system still commonly observed in many Western countries; we would simply need more information. In the third story, however—the story applicable to many Western countries—the 30% comes about as a byproduct of the voluntary choices of individuals. It is not that my landlord observes my Income and determines Rent accordingly. Indeed, it is I who determine my Rent, because I can move should my Income change.
Yet, as “big data” and “machine learning” have become popular, it has become important to understand this inherent limitation of quantitative analyses: they tell us nothing about process. But process is crucially important for economic analyses and the derivation of policy recommendations. This becomes even clearer if we observe that $\beta$ changes over time. Should $\beta$ later rise to 0.4, we would not be able to conclude that anything is wrong with that without understanding the mechanism underlying the causal connection between Rent and Income. In the first and second stories, we could be rightly concerned about the change. But in the third story, the increase in $\beta$ could be the result of a change in individual preferences and choices. Clearly, the factors leading to a change in preferences cannot all be measured in $Z$—they cannot be kept constant over time. And this does not even address the fact that Rent is determined through supply and demand, and that we have implicitly assumed supply remains constant!
Revolutions in the credibility of quantitative analyses have, unfortunately, obscured the deep, implicit and often unmeasurable processes underlying causal relationships. This is not to say that quantitative analyses don't matter or cannot improve. As positive descriptions of reality, they have great relevance. But no matter how sophisticated, they cannot by themselves be used for policy. It is important we understand that quantitative analyses have a contextual poverty to them that cannot be overcome except through a rigorous a priori qualitative exploration of the processes and mechanisms at play. The point here is a very simple one; but it is a profound real-life challenge of empirical research.
I am sure that empirical researchers do not miss this issue. Indeed, their papers typically feature substantial qualitative discussions of the context of their study. This post addresses policymakers who seek to improve the scientific backing of their proposals. It turns out that this is surprisingly difficult. Policymakers must construct institutions within which humans engage in exchange. That some configurations of these institutions—such as culture—“don't matter” (i.e., form part of the error term) is always just an assumption. Even if we can attribute a change in Rent to a change in Income, this does not mean we understand the situation, and thus we are in no epistemic position to copy the institution studied to some other place, or to make changes to it, and expect the same results.
There is another point here. Who’s to decide that Rent is what matters? Policymakers endogenously select the outcome variables that they find important. Policymakers do not, and cannot, optimize some utilitarian welfare function. Even if we knew the institutional conditions leading to the emergence of our coefficient estimates, there is no inherent connection to policy. Surely, if I presented an (entirely valid) study showing that the number of polka-dotted socks owned by individuals in a city falls when water consumption increases, everyone would understand that it cannot translate into policy. Yet, when we hear about Rent, Income, CO₂ levels, inflation, COVID infection rates, leisure activities and so on, we attach some degree of goodness or badness to them. We need to be aware, however, that policymakers are able to pick not just studies that conform to their ideas, but also outcome variables. In addition to the study on polka-dotted socks, another policymaker could present a study showing that the likelihood of being a Genesis fan increases in water consumption, and another could plausibly demonstrate that water consumption increases purchases of videotapes. There is no inherent ranking of these outcome variables, and disputes about which “facts” should matter cannot be resolved by science. They are unavoidably resolved by political institutions. Science, by itself, is neither a challenge to nor an argument for policy. It does not resolve the question of priorities.