Which R packages do scientists use?

An analysis of 19,000 R scripts hosted by The Dataverse Project

Author

Vincent Arel-Bundock

Published

January 13, 2023

I downloaded 19,162 R scripts from 4,539 projects hosted by The Dataverse Project. This notebook reports usage statistics for R packages in this large sample of real-life scientific applications.

To download data from Dataverse, I adapted a script from Trisovic et al. (2022) and wrote original Python code. Then, I used the renv::dependencies() function from the renv package for R (Ushey, 2022) to extract the names of R packages used in each script (see footnote 1).
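As a rough illustration of the extraction step, the sketch below runs renv::dependencies() on a single script. The script contents and file name are hypothetical and are written to a temporary directory; this is not the code used to process the Dataverse files, only a minimal example of the kind of output renv returns.

```r
# Hypothetical script, written to a temporary file for illustration only
script = file.path(tempdir(), "example_script.R")
writeLines(c(
    "library(data.table)",
    "p = ggplot2::ggplot(mtcars)"
), script)

# renv scans the file without running it.
# tryCatch() is a defensive assumption: it returns NULL if the call fails outright.
deps = tryCatch(
    renv::dependencies(script, quiet = TRUE, progress = FALSE),
    error = function(e) NULL
)

# One row per detected dependency; the Package column holds the names
pkgs = sort(unique(deps$Package))
pkgs
```

Repeating this over every script and stacking the results would give one row per script and package, which is the shape assumed by the summaries further down.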

WARNING: This was a very quick job and I did very little quality control on the data. Please take all this with a grain of salt.

Trisovic, A., Lau, M.K., Pasquier, T. et al. A large-scale study on research code quality and execution. Sci Data 9, 60 (2022). https://doi.org/10.1038/s41597-022-01143-6

Ushey K (2022). renv: Project Environments. R package version 0.16.0, https://rstudio.github.io/renv/.

Number of projects and packages over time

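# packages inferred from the functions used below (an assumption; the setup chunk is not shown)
library(data.table)
library(anytime)
library(ggplot2)
library(patchwork)  # provides `+` for combining ggplot objects

# keep entries dated 2014 or later
dat = dat[date > anytime("2013-12-31 UTC")]
#> Warning in check_tzones(e1, e2): 'tzone' attributes are inconsistent

# bucket each entry by month, anchored on the 15th
dat[, month := anytime(format(date, "%Y-%m-15"))]

# distinct projects per month
projects = dat[, .(N = length(unique(dataset_id))), by = "month"]
# rows (package uses) per month
packages = dat[, .N, by = "month"]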

p1 = ggplot(projects, aes(month, N)) + 
    geom_line() +
    labs(x = "", y = "", title = "Projects")
p2 = ggplot(packages, aes(month, N)) + 
    geom_line() +
    labs(x = "", y = "", title = "Packages")
p1 + p2

Number of projects uploaded to Dataverse and number of packages used in those projects, by month.
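To make the difference between the two series concrete, here is a small sketch on made-up data. The toy table and its values are hypothetical, and it assumes that dat holds one row per package detected in a script. It uses as.Date() rather than anytime() for the monthly bucket, which is a substitution made here to keep the example free of time zones.

```r
library(data.table)

# Hypothetical toy data: one row per package detected in a script
toy = data.table(
    dataset_id = c("A", "A", "B", "C", "C", "C"),
    Package    = c("ggplot2", "dplyr", "ggplot2", "lme4", "ggplot2", "dplyr"),
    date       = as.Date(c("2020-01-03", "2020-01-03", "2020-01-20",
                           "2020-02-11", "2020-02-11", "2020-02-11"))
)

# bucket dates by month, anchored on the 15th
toy[, month := as.Date(format(date, "%Y-%m-15"))]

# distinct projects per month
projects = toy[, .(N = uniqueN(dataset_id)), by = "month"]
# rows per month: every package use counts, including repeats across projects
packages = toy[, .N, by = "month"]
```

In this toy example, January has two projects and three package uses, while February has one project and three package uses.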

Usage statistics for R packages loaded in at least two projects

# count only one use per project
dat_count = dat[, .(date = min(date)), by = c("dataset_id", "Package")]

dat_count = dat_count[, .(`Number of times loaded` = .N), by = "Package"]
dat_count = dat_count[order(-`Number of times loaded`)]
dat_count = dat_count[`Number of times loaded` > 1]

DT::datatable(dat_count, options = list(pageLength = 50), rownames = FALSE, width = 300)
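The effect of counting only one use per project can be seen in a small sketch on made-up data. The toy table below is hypothetical, and n_projects is an illustrative column name, not one used in the analysis above.

```r
library(data.table)

# Hypothetical toy data: project A loads ggplot2 in two different scripts
toy = data.table(
    dataset_id = c("A", "A", "A", "B", "B", "C"),
    Package    = c("ggplot2", "ggplot2", "dplyr", "ggplot2", "lme4", "dplyr")
)

# collapse to one row per project-package pair
per_project = unique(toy[, .(dataset_id, Package)])

# count projects per package, most used first
counts = per_project[, .(n_projects = .N), by = "Package"][order(-n_projects)]

# keep packages used in more than one project
counts[n_projects > 1]
```

Without the deduplication step, ggplot2 would be counted three times here instead of twice, so a single project with many scripts could inflate a package's rank.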

Footnotes

  1. renv was not able to parse every script.