source("renv/activate.R")
Sys.setenv(RENV_DOWNLOAD_FILE_METHOD = getOption("download.file.method"))
options(repos = c(CRAN = "https://cloud.r-project.org"))
Organizing, managing, and executing reproducible research projects is hard because there are so many possible reasons why a project cannot be reproduced.
- No data? Not reproducible.
- No code? Not reproducible.
- Software is unavailable? Not reproducible.
- Computational environment is unspecified? Maybe.
- Software versions are unknown? Eventually no.
Just when we think we have one of these threats to reproducibility under control, some unexpected development or complication throws it all back into doubt. I think this is especially true for someone like me: an intermediate R user who has no background in computer science or programming other than what I have learned on my own in the last couple of years. This post reflects on learning to use renv as part of a reproducible research workflow and on how I handled a couple of problems along the way.
Welcome renv
The renv package solves (more or less) the software versioning issue. With a couple of commands, all R package versions are recorded in a file, allowing the project's R package environment to be rebuilt from scratch. I have used renv for several months now and appreciate how much it has simplified installing packages and staying current with updates. It has also streamlined the storage of R packages on my computer.
In a nutshell, here’s how it works. Once renv is installed, init() creates a lockfile (renv.lock) that holds package version information and details about how packages were installed. For example, if a package is installed from GitHub, that is recorded along with the username that owns the repository. When the project is shared, restore() reads the lockfile and installs all packages exactly as they are recorded. The snapshot() function records additional packages in the lockfile as they are installed in the project, and status() reports packages that need to be installed or added to the lockfile. It’s really that easy!
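The four functions above can be sketched in the order they are typically used. This is a minimal sketch that assumes renv is already installed. The wrapper names (start_project, record_packages, rebuild_library) are hypothetical and only label who runs what and when; in practice each call is simply typed at the console from the project directory.

```r
# Author, once, when starting the project:
start_project <- function() {
  renv::init()       # creates renv.lock and a private project library
}

# Author, after installing or updating packages:
record_packages <- function() {
  renv::status()     # reports differences between the library and the lockfile
  renv::snapshot()   # writes the current package versions to renv.lock
}

# Collaborator, after obtaining the project and its renv.lock:
rebuild_library <- function() {
  renv::restore()    # installs the package versions recorded in renv.lock
}
```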
First Headache
Learning to use renv and incorporating it into my workflow wasn’t without a couple of headaches. At the start, I couldn’t use renv::install(), which works similarly to utils::install.packages() but is more flexible and intuitive. I would run renv::install("somepackage") and get an error to the effect that “package ‘somepackage’ is not available”. Yet utils::install.packages(), the “base” package installation function (utils is part of the R distribution), worked just fine.
Apparently many others have had this problem. What it came down to for me is that R and renv were using different download methods. These can be checked with getOption("download.file.method") for R and renv:::renv_download_method() for renv. For me (on Windows), the two methods involved were curl and libcurl. These are closely related: curl is a command-line tool and libcurl is the library behind it, both created and maintained by the cURL (Client for URLs) project to enable internet file transfers.
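A quick way to compare the two settings side by side is sketched below. It relies on renv:::renv_download_method(), which is an internal (unexported) renv function and may change between versions, so the call is wrapped in tryCatch() and falls back to NA if renv or the function is unavailable. The capabilities("libcurl") check is an extra not mentioned above; it only reports whether this build of R supports libcurl.

```r
# Download method R is configured to use. If this option is unset (NULL),
# download.file() falls back to method = "auto".
r_method <- getOption("download.file.method")

# Whether this build of R supports libcurl at all.
has_libcurl <- capabilities("libcurl")

# Download method renv would use. Internal function, so guard the call.
renv_method <- tryCatch(
  renv:::renv_download_method(),
  error = function(e) NA_character_
)

cat("R:      ", if (is.null(r_method)) "(unset)" else r_method, "\n")
cat("renv:   ", renv_method, "\n")
cat("libcurl:", has_libcurl, "\n")
```

If the first two lines of output differ, that mismatch is the situation described above.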
I resolved this problem by including one line in my RStudio project’s .Rprofile file: Sys.setenv(RENV_DOWNLOAD_FILE_METHOD = "libcurl"). Because it is in the project’s .Rprofile, this command runs every time the project is loaded in RStudio. A more robust solution, which I use now, retrieves the download method currently used by R: Sys.setenv(RENV_DOWNLOAD_METHOD = getOption("download.file.method")) (see this Stack Overflow question).
Second Headache
That first problem seemed to happen again some months later. Again, renv::install() returned an error stating that the package was not available. This time, the repository being targeted by renv was the issue. I don’t know what caused it, but I am guessing some update in the backend of renv or its dependencies was responsible.
The call getOption("repos") returns the repositories currently in use, which for me was something like https://packagemanager.posit.co/cran/. I resolved the problem by manually setting the “repos” option in my project’s .Rprofile file. Thus, after solving these two headaches, I start new RStudio projects using renv with a .Rprofile containing the three lines shown at the top of this post.
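An annotated sketch of such a .Rprofile follows. It keeps the same three steps (activate renv, align the download method, set the repository) and adds two defensive guards that are assumptions of this sketch rather than part of the original file: a file.exists() check before sourcing the activation script, and a NULL check, since Sys.setenv() needs a character value and getOption("download.file.method") can be unset outside RStudio.

```r
# .Rprofile in the project root; runs each time the project is opened.

# 1. Activate renv. The file.exists() guard is an addition for this sketch.
if (file.exists("renv/activate.R")) source("renv/activate.R")

# 2. Make renv use the same download method as R.
#    local() keeps the temporary variable out of the global environment.
local({
  method <- getOption("download.file.method")
  if (!is.null(method)) Sys.setenv(RENV_DOWNLOAD_FILE_METHOD = method)
})

# 3. Point package installation at a known CRAN mirror.
options(repos = c(CRAN = "https://cloud.r-project.org"))
```

After restarting the session, getOption("repos") should show the CRAN entry set in step 3.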
Unanticipated Benefits
The above problems were really not too hard to resolve, even for someone with a limited software knowledge base like me. The documentation and support provided by Kevin Ushey, principal developer of renv, and the rest of the Posit team really are superb. I have no doubt that this package will remain stable and functional going forward.
As I have adopted renv into my normal project workflow, I have discovered a few benefits beyond supporting research reproducibility. First, the renv::install() and renv::update() functions are very smooth and intuitive, much better than utils::install.packages(). In particular, I run renv::update() every week or month to automatically update all project packages, including renv itself! Both functions are faster than the utils function and provide more informative errors.
A second benefit is the efficient caching of R packages on my computer. After the first hundred or so installed packages, the disk space required becomes noticeable. Before renv, I occasionally installed packages twice in different locations and accumulated different versions over time. renv keeps a common cache for the device, which it then links to from individual renv projects. This means that a given package version is only ever installed once. My computer only has a 200 GB hard drive, so saving a GB here or there is very nice.
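To see where that shared cache lives and roughly how much space it takes, something like the following can be used. It is a rough sketch: dir_size_gb() is a hypothetical helper written for this example, renv::paths$cache() is assumed to return the cache location (only the first path is used if several are configured), and the total counts files as they appear on disk.

```r
# Total size of all files under a directory, in gigabytes (1024^3 bytes).
dir_size_gb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.size(files), na.rm = TRUE) / 1024^3
}

# Location of renv's global package cache (NA if renv is unavailable).
cache_dir <- tryCatch(renv::paths$cache()[1], error = function(e) NA_character_)

if (!is.na(cache_dir) && dir.exists(cache_dir)) {
  cat("Cache:", cache_dir, "\n")
  cat("Size: ", round(dir_size_gb(cache_dir), 2), "GB\n")
}
```

Scanning a large cache can take a little while, since every file is listed.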
A third benefit is the quick installation of new-to-me packages. Often dependencies are already built and stored in my cache, so all renv has to do is link to the cache for that dependency. Thus, downloads and installations from binaries are minimized and new packages are ready to use in mere moments.
I think all scientists and researchers who use R and who are committed to research reproducibility should adopt renv into their workflow.