Which R packages do scientists use?
An analysis of 19,000 R scripts hosted by The Dataverse Project
January 13, 2023
pkgs =
> renv ggplot2 anytime DT here parallel patchwork
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> data.table
> TRUE
dat =
n_projects = 4539
n_scripts = 19162
# n_projects = length(Sys.glob("~/Dropbox/research/dataverse/archive/*"))
# n_scripts = length(Sys.glob("~/Dropbox/research/dataverse/archive/**/*R"))
I downloaded 19,162 R scripts from 4,539 projects hosted by The Dataverse Project. This notebook reports usage statistics for R packages in this large sample of real-life scientific applications.
To download data from Dataverse, I adapted a script from Trisovic et al. (2022) and wrote original Python code. Then, I used the renv::dependencies() function from the renv package for R (Ushey, 2022) to extract the names of R packages used in each script.[1]
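As a rough illustration of the extraction step, the sketch below runs renv::dependencies() on a single script. The temporary file and its contents are hypothetical stand-ins for a downloaded Dataverse script, and the argument choices (quiet output, ignored parse errors) are assumptions rather than the settings used for this notebook.

```r
library(renv)

# Hypothetical script standing in for one file downloaded from Dataverse
script <- tempfile(fileext = ".R")
writeLines(c("library(ggplot2)", "library(data.table)"), script)

# Static analysis of the file: returns a data frame with one row per detected
# dependency, including a `Package` column. Parse errors are ignored here
# (an assumption) so that one unparseable script does not stop a batch run.
deps <- renv::dependencies(script, quiet = TRUE, progress = FALSE,
                           errors = "ignored")

pkgs <- sort(unique(deps$Package))
print(pkgs)
```

Applying the same call to every script in a project and stacking the results would give one row per script and package, which can then be reduced to one row per project and package.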
WARNING: This was a very quick job and I did very little quality control on the data. Please take all this with a grain of salt.
Trisovic, A., Lau, M.K., Pasquier, T. et al. A large-scale study on research code quality and execution. Sci Data 9, 60 (2022). https://doi.org/10.1038/s41597-022-01143-6
Ushey, K. (2022). renv: Project Environments. R package version 0.16.0. https://rstudio.github.io/renv/
Number of projects and packages over time
dat = dat
dat
> dataset_id Package date
>
> 1: 46760 car 2015-07-29 11:32:37
> 2: 46890 epiR 2016-03-11 17:54:53
> 3: 46890 irr 2016-03-11 17:54:53
> 4: 46890 austin 2016-03-11 17:54:53
> 5: 46935 MCMCpack 2016-03-11 17:33:12
> ---
> 41818: 6789403 sjstats 2022-12-06 10:41:05
> 41819: 6789403 srvyr 2022-12-06 10:41:05
> 41820: 6789403 tidyverse 2022-12-06 10:41:05
> 41821: 6789403 weights 2022-12-06 10:41:05
> 41822: 6789403 writexl 2022-12-06 10:41:05
> dataset_citation
>
> 1: Simons, Joseph; Mallinson, Daniel J., 2015, ""Replication data for: Party Control and Perverse Effects in Majority-Minority Districting: Replication Challenges When Using DW-NOMINATE"", https://doi.org/10.7910/DVN/28763, Harvard Dataverse, V1, UNF:6:COakdf2t21U/4QgnYTB5cQ== [fileUNF]
> 2: Dolezal, Martin; Ennser-Jedenastik, Laurenz; Müller, Wolfgang C.; Winkler, Anna Katharina, 2016, ""Replication data for: Analyzing Manifestos in their Electoral Context: A New Approach Applied to Austria, 2002–2008"", https://doi.org/10.7910/DVN/27864, Harvard Dataverse, V1
> 3: Dolezal, Martin; Ennser-Jedenastik, Laurenz; Müller, Wolfgang C.; Winkler, Anna Katharina, 2016, ""Replication data for: Analyzing Manifestos in their Electoral Context: A New Approach Applied to Austria, 2002–2008"", https://doi.org/10.7910/DVN/27864, Harvard Dataverse, V1
> 4: Dolezal, Martin; Ennser-Jedenastik, Laurenz; Müller, Wolfgang C.; Winkler, Anna Katharina, 2016, ""Replication data for: Analyzing Manifestos in their Electoral Context: A New Approach Applied to Austria, 2002–2008"", https://doi.org/10.7910/DVN/27864, Harvard Dataverse, V1
> 5: Schwarz, Daniel; Traber, Denise; Benoit, Kenneth, 2016, ""Replication data for: Estimating Intra-Party Preferences: Comparing Speeches to Votes"", https://doi.org/10.7910/DVN/27702, Harvard Dataverse, V1, UNF:6:lzjSYjrMMhScH7QDurVAAw== [fileUNF]
> ---
> 41818: Stefkovics, Adam, 2022, ""Global warming vs. climate change frames. Revisiting framing effects based on new experimental evidence collected in 30 European countries"", https://doi.org/10.7910/DVN/OYWFB9, Harvard Dataverse, V1, UNF:6:q7NQHWV34EwjSer8wMmT+g== [fileUNF]
> 41819: Stefkovics, Adam, 2022, ""Global warming vs. climate change frames. Revisiting framing effects based on new experimental evidence collected in 30 European countries"", https://doi.org/10.7910/DVN/OYWFB9, Harvard Dataverse, V1, UNF:6:q7NQHWV34EwjSer8wMmT+g== [fileUNF]
> 41820: Stefkovics, Adam, 2022, ""Global warming vs. climate change frames. Revisiting framing effects based on new experimental evidence collected in 30 European countries"", https://doi.org/10.7910/DVN/OYWFB9, Harvard Dataverse, V1, UNF:6:q7NQHWV34EwjSer8wMmT+g== [fileUNF]
> 41821: Stefkovics, Adam, 2022, ""Global warming vs. climate change frames. Revisiting framing effects based on new experimental evidence collected in 30 European countries"", https://doi.org/10.7910/DVN/OYWFB9, Harvard Dataverse, V1, UNF:6:q7NQHWV34EwjSer8wMmT+g== [fileUNF]
> 41822: Stefkovics, Adam, 2022, ""Global warming vs. climate change frames. Revisiting framing effects based on new experimental evidence collected in 30 European countries"", https://doi.org/10.7910/DVN/OYWFB9, Harvard Dataverse, V1, UNF:6:q7NQHWV34EwjSer8wMmT+g== [fileUNF]
> month
>
> 1: 2015-07-15
> 2: 2016-03-15
> 3: 2016-03-15
> 4: 2016-03-15
> 5: 2016-03-15
> ---
> 41818: 2022-12-15
> 41819: 2022-12-15
> 41820: 2022-12-15
> 41821: 2022-12-15
> 41822: 2022-12-15
projects = dat
packages = dat
p1 = +
+
p2 = +
+
p1 + p2
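The code for the two panels did not survive in this rendering, so here is a minimal sketch of one way to get monthly counts from a table shaped like `dat`. The toy rows are copied from the first rows printed above; mapping each timestamp to the 15th of its month matches the `month` column shown there, but counting distinct projects and packages per month is an assumption about what the panels display.

```r
library(data.table)

# Toy subset copied from the first rows of the printed table
toy <- data.table(
  dataset_id = c(46760, 46890, 46890, 46890, 46935),
  Package    = c("car", "epiR", "irr", "austin", "MCMCpack"),
  date       = as.POSIXct(c("2015-07-29 11:32:37", "2016-03-11 17:54:53",
                            "2016-03-11 17:54:53", "2016-03-11 17:54:53",
                            "2016-03-11 17:33:12"), tz = "UTC")
)

# Collapse each timestamp to the middle of its month
toy[, month := as.Date(format(date, "%Y-%m-15"))]

# Distinct projects and distinct packages observed in each month
monthly <- toy[, .(n_projects = uniqueN(dataset_id),
                   n_packages = uniqueN(Package)),
               by = month][order(month)]
print(monthly)
```

A table like `monthly` can be passed to ggplot2 twice, once per count column, and the two plots combined with patchwork.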
Usage statistics for R packages loaded at least twice
# count only one use per project
dat_count = dat
dat_count = dat_count
dat_count = dat_count
dat_count = dat_count
DT::
-
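The counting rule in the comment above ("count only one use per project") can be sketched as follows. The toy table is hypothetical, and reading "loaded at least twice" as "used in at least two projects" is an assumption; the original pipeline may differ in its details.

```r
library(data.table)

# Hypothetical toy data with the same key columns as `dat`
toy <- data.table(
  dataset_id = c(1, 1, 1, 2, 2, 3),
  Package    = c("ggplot2", "ggplot2", "car", "ggplot2", "car", "MASS")
)

# One row per project-package pair, so repeated loads within a project count once
toy_unique <- unique(toy[, .(dataset_id, Package)])

# Number of projects using each package; keep packages seen in two or more projects
toy_count <- toy_unique[, .(n_projects = .N), by = Package]
toy_count <- toy_count[n_projects >= 2][order(-n_projects, Package)]
print(toy_count)
```

Deduplicating before counting keeps a project with many scripts from inflating the totals for the packages it happens to use.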
[1] renv was not able to parse every script.