Your R code may not be yours
Posted: 2025-11-22 · Last updated: 2025-11-22
tl;dr: Your R analysis code likely has to be licensed under the GNU GPL. My r-snippets README tells you how to proceed for maximum legal compliance.
I have a confession: Until recently, I believed my research code was mine to license however I pleased. Then I discovered a significant legal gray area that should concern anyone sharing R code publicly.
While preparing a replication package, I stumbled onto a contested question in open source licensing: If your R code loads certain popular GPL-licensed packages, can you legally license it however you want? The answer is uncertain, but the safer and more logical interpretation suggests you cannot, strictly speaking.
Many of the most popular R packages used in econometrics and empirical research are licensed under the GPL (GNU General Public License). The sandwich package is perhaps the most important case. It’s utterly ubiquitous for robust standard errors. But it’s far from alone: lmtest, plm, fixest, AER, ivreg, and MASS are all GPL-licensed. These aren’t obscure packages. These are veritable workhorses of empirical economics.
Now, if these packages were licensed under the LGPL (Lesser GPL), there would be no controversy. The LGPL explicitly permits linking to libraries without viral copyleft effects. But they aren’t. They’re licensed under the GPL, and this is where things get complicated.
According to the Free Software Foundation’s interpretation of the GPL, code that loads and uses GPL libraries is a “combined” or derived work and must itself be licensed under the GPL. Under this reading, that library(sandwich) at the top of your analysis script makes your entire file a GPL work. Your carefully crafted LICENSE file declaring your replication code as CC0 or MIT is potentially invalid. You may have inadvertently contravened copyright law.
There is no settled case law on whether loading a GPL library in an interpreted language creates a derivative work under copyright law. This matters enormously, and there are two competing theories:
The FSF’s position: The GPL FAQ states that when an interpreter provides “bindings” to GPL facilities, the interpreted program is “effectively linked” to those facilities. Under this view, library(sandwich) creates a combined work that must be distributed under GPL terms. This interpretation has some logical force: if there were no difference between GPL and LGPL for interpreted languages, why would LGPL exist? Under the idea of “software freedom” espoused by the FSF, it simply cannot make a difference whether the language is interpreted or not, as the GPL is supposed to protect users’ rights. These rights are paradoxically best protected by a maximally infectious copyleft license.
The R community’s position: The well-known “R Packages” book by Hadley Wickham states it’s their “personal opinion that the license of your package doesn’t need to be compatible with the licenses of R packages that you merely use by calling their exported R functions.” The R Foundation clarified in 2009 that R code doesn’t need to be GPL-licensed just because it uses R. Thousands of CRAN packages use MIT licenses despite likely depending on GPL packages, suggesting widespread acceptance of this interpretation. Certainly it is correct that the mere use of R, the programming language itself, does not impose a particular license on R code, but interfacing too closely with R-internal APIs may (see below).
Which interpretation is correct? Legally, we don’t know. There’s been no court case. But here’s the critical point: the FSF’s interpretation is safer from a compliance perspective. If you want to be conservative about license compliance, treating your code as GPL when it loads GPL packages is the less risky choice. (Indeed, the risk of violating the GPL is then zero, though it is a pity that we cannot license our code even more liberally.) The FSF’s position is also logically more convincing:
Perhaps I can use an economic analogy: if the GPL-licensed work you use is highly substitutable (you could swap in another implementation without rewriting your code), the GPL does not apply to your use. But if it is not highly substitutable, the GPL applies. In other words, if your code is so narrow that it relies on a particular (GPL-licensed) implementation, it is a derivative work. Copyright determinations involve many more factors (creativity, expression vs. idea, transformative use), so this is a heuristic rather than a legal test. Nonetheless, because analysis code often builds on particular, highly specialized implementations, it is likely to constitute a derivative work.
There is an important exception: if you’re only using your code privately or within your organization, the GPL doesn’t restrict you. The copyleft provisions only apply when you distribute the code to others. But when you submit a replication package to a journal, post your code on GitHub, or share your analysis with another scientist, you’re distributin’.
I suspect there are thousands of replication packages sitting in journal data repositories right now with licensing terms that might be problematic under the stricter interpretation. Many were prepared by researchers who carefully chose licenses for their code, possibly unaware that this legal question even exists, and some of that code is released into the “public domain” or under CC0 when it might need to be GPL.
Are these researchers violating copyright law? Under the FSF’s interpretation, possibly. Under the R community’s interpretation, no. The legal ambiguity around interpreted languages means there’s room for disagreement, even among sophisticated users of open source software. That uncertainty itself is a problem when we want clear terms for code reuse.
When I discovered this issue, I had to make a choice for my r-snippets repository. These snippets use (excellent) GPL packages. I could have relied on the permissive interpretation, but I opted for the conservative approach: I updated the README to license them under GPL.
For my own replication package, we’re taking the same conservative approach, even though I tend to prefer more permissive licenses. (It depends on context, though.)
Let me be very clear about one thing: If you’re writing an R package meant to be used as a library by others, strongly consider LGPL instead of GPL.
The LGPL (GNU Lesser General Public License) was designed precisely to avoid this ambiguity. It allows your library to remain free and open source while explicitly permitting users to link to it without their code becoming a derived work. No legal uncertainty, no competing interpretations. This works as long as your library doesn’t bind too closely to R-internal APIs. If your library uses standard features of R the language, LGPL is fine.
I’ve done exactly this with some of my own software (unrelated to R): uproot is licensed under the LGPL specifically to avoid creating this problem for users. If you’re writing a library that’s meant to be called by other people’s code (which is basically the definition of an R package), the LGPL removes this ambiguity. It is fundamentally a question about where to draw the line, and the LGPL draws it more tightly around the library itself than the GPL does.
The one piece of good news: the GPL applies to code, not to data or to output generated by that code. You can still license your datasets under CC0 or CC-BY or whatever terms are appropriate. The GPL doesn’t “infect” your data, only your code. My r-snippets README contains further information.
Licensing really matters. And open science depends on clarity about reuse rights. The current situation creates uncertainty: If I find your replication package with a LICENSE file saying MIT but your code loads GPL packages, what am I supposed to conclude? That you’re following the permissive interpretation? That you’re unaware of the question? That you researched it and made a conscious choice?
The GPL is not a bad license. It has served the free software community well for decades, and many people prefer its strong copyleft provisions. I do too, in some cases. But in the context of interpreted languages and package dependencies, its requirements are unclear. If you’re going to use GPL code (and if you’re doing econometrics in R, you almost certainly are), you should at least understand that this legal uncertainty exists and decide which interpretation you’re comfortable with.
So check your code. Check your licenses. If you’re loading GPL packages, you face a choice: follow the conservative FSF interpretation and license your code as GPL, or rely on the R community’s permissive interpretation. Given the uncertainty, the GPL approach is arguably safer. But whichever you choose, you should make that choice consciously.
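If you want a quick first pass at that check, here is a minimal R sketch (not taken from my r-snippets repository) that reads the License field of every package loaded in the current session and flags strong copyleft licenses. The helper names gpl_dependencies() and is_strong_gpl() are hypothetical, and the regular expression is only a heuristic: it flags any mention of GPL or AGPL that is not part of “LGPL”, so a dual-licensed package such as “GPL-2 | MIT” is flagged too. Read each flagged license yourself before drawing conclusions.

```r
# Minimal sketch: flag loaded packages whose License field mentions GPL/AGPL.
# Heuristic only: "LGPL" on its own is not flagged; any other "GPL" mention is.

# Hypothetical helper: TRUE if a license string mentions GPL or AGPL
# (a "GPL" not directly preceded by "L"). NA licenses yield FALSE.
is_strong_gpl <- function(license) {
  grepl("(^|[^L])GPL", license)
}

# Hypothetical helper: tabulate licenses for a set of packages,
# by default every namespace loaded in the current R session.
gpl_dependencies <- function(pkgs = loadedNamespaces()) {
  licenses <- vapply(pkgs, function(p) {
    # packageDescription() returns NA if the package or field is missing;
    # as.character() keeps the result a character(1) for vapply().
    as.character(suppressWarnings(
      utils::packageDescription(p, fields = "License")
    ))
  }, character(1))
  data.frame(
    package = pkgs,
    license = unname(licenses),
    gpl = is_strong_gpl(unname(licenses)),
    stringsAsFactors = FALSE
  )
}
```

For example, after running your analysis script (say, one that calls `library(sandwich)`), `deps <- gpl_dependencies(); deps[deps$gpl, ]` lists the flagged packages, including ones pulled in indirectly. Base packages such as stats report a License of “Part of R” plus the version number, so the heuristic does not flag them, which matches the view that using R itself does not impose a license on your code.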
Your R code might not be yours to license freely. The law is unclear, but now you know the question exists.