Suppose you have a server that sits behind a firewall and, for security reasons, cannot make external http(s) requests. You have R running on this server and you need to install a set of packages. The simple approach of
install.packages("<pkg-name>", repos = "<favorite cran mirror>")
is not an option since the server has no access to the CRAN repository.
Another option would be to download the source files (.tar.gz files) from CRAN or BioConductor, transfer those files to the server via FTP, and then install the packages via
install.packages("<path to pkg-name_version.tar.gz>", repos = NULL, type = "source")
This approach works well, with one big exception: the dependencies of the package may not be on the server. How do you get the source files for all the dependencies of your package? What about the dependencies of the dependencies, and the dependencies of the dependencies of the dependencies? Simply put, how do you install R packages on a machine that is not allowed to make external http(s) requests?
Here is how I approached this problem. On my local machine, which has internet access, I ran a script (shown and explained in detail below) that downloads all the dependencies, the dependencies of the dependencies, and so on, from both CRAN and BioConductor, and generates a makefile to install the packages in the correct order, i.e., in an order such that the dependencies are met.
When the script has finished, the source files and the makefile can be transferred to the server that lacks external http(s) request authority. Running the makefile will install the packages, and is an easy way to track and report install errors.
We need to define what constitutes a dependency. In a package's DESCRIPTION file, the packages listed under the fields Depends, Imports, and LinkingTo are what we will consider dependencies. Suggests and Enhances are omitted as they are not needed for the package to work.
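As a rough illustration of what one of these fields contains, here is a minimal sketch that pulls the package names out of a Depends, Imports, or LinkingTo string. The helper name extract_pkg_names is hypothetical and is not part of the script, which relies on tools::package_dependencies for this work. The sketch drops version requirements and the R entry, since that entry constrains the R version rather than naming a package.

```r
# Hypothetical helper (not part of the original script): extract package
# names from a single dependency field of a DESCRIPTION file.
extract_pkg_names <- function(field) {
  if (is.na(field)) return(character(0))
  entries <- strsplit(field, ",", fixed = TRUE)[[1]]
  entries <- sub("\\([^)]*\\)", "", entries)  # drop "(>= x.y.z)" requirements
  entries <- trimws(entries)                  # also strips embedded newlines
  entries <- entries[nzchar(entries)]
  setdiff(entries, "R")                       # "R" is a version constraint
}

# Depends field of gRain as listed by available.packages()
gRain_depends <- extract_pkg_names("R (>= 3.0.2), methods, gRbase (>= 1.7-2)")
print(gRain_depends)
```

Note that methods is still returned here; base and recommended packages are filtered out in a later step.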
Build Dependencies
An R script, build-dep-list.R, has been written and is expected to be evaluated from the command line via
Rscript --vanilla build-dep-list.R [pkg1] [pkg2] [...] [pkgn]
where pkg1 is the name of the first known package to download, pkg2 the second known package to download, …, and pkgn the nth package to download. The script will download all the dependencies for pkg1, ..., pkgn, the dependencies of the dependencies, and so on. The script will also generate a makefile to help with the installation of the packages, ordering the installs so that the install of pkg1, ..., pkgn will not error.
The full script can be found on my github page. The script will be broken up into pieces here with additional detail and explanation.
When I develop scripts that I expect to be evaluated in the terminal, I start the script with a check of interactive(). In an interactive session we set the variables to values needed for testing and development; otherwise we use the command line arguments to define the values of the variables. This could also be edited so that the expected evaluation would be done in an interactive session. For this work we will have the character vector OUR_PACKAGES to store the names of the packages we want/need to install.
if (interactive()) {
  OUR_PACKAGES <- c("graph", "gRbase", "gRain", "jsonlite", "plotly", "SHELF",
                    "rjson", "svglite", "magrittr")
} else {
  OUR_PACKAGES <- commandArgs(trailingOnly = TRUE)
}
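One thing the snippet above does not guard against is the script being called with no package names at all. A minimal sketch of such a check follows; the function get_requested_packages is a hypothetical addition, not something in the original script.

```r
# Hypothetical helper: tidy the command line arguments and fail early with a
# usage message if no package names were supplied.
get_requested_packages <- function(args = commandArgs(trailingOnly = TRUE)) {
  args <- unique(trimws(args))
  args <- args[nzchar(args)]
  if (length(args) == 0L) {
    stop("usage: Rscript --vanilla build-dep-list.R [pkg1] [pkg2] [...] [pkgn]",
         call. = FALSE)
  }
  args
}

# Duplicated names are collapsed, order of first appearance is kept.
requested <- get_requested_packages(c("svglite", "magrittr", "svglite"))
print(requested)
```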
We also need to define the repositories which we will query for the packages. We’ll use RStudio’s CRAN mirror and the repository for BioConductor.
# Repositories to look for packages
CRAN <- "https://cran.rstudio.com/"
BIOC <- "https://bioconductor.org/packages/release/bioc/"
Now, let’s look into the packages. Packages are classified into three priority classes: “base”, “recommended”, and NA. The “base” packages are part of every R installation, and the “recommended” packages are included in any standard installation of R. All other packages have a Priority of NA.
ipkgs <- utils::installed.packages()
ipkgs[ipkgs[, "Priority"] %in% "base", "Package"]
## base compiler datasets graphics grDevices grid
## "base" "compiler" "datasets" "graphics" "grDevices" "grid"
## methods parallel splines stats stats4 tcltk
## "methods" "parallel" "splines" "stats" "stats4" "tcltk"
## tools utils
## "tools" "utils"
ipkgs[ipkgs[, "Priority"] %in% "recommended", "Package"]
## boot class cluster codetools foreign
## "boot" "class" "cluster" "codetools" "foreign"
## KernSmooth lattice MASS Matrix mgcv
## "KernSmooth" "lattice" "MASS" "Matrix" "mgcv"
## nnet rpart spatial survival
## "nnet" "rpart" "spatial" "survival"
Some packages will have dependencies on the “base” and/or “recommended” packages. We will need to know these packages and omit them from the packages we will need to download and install.
base_pkgs <-
  unname(utils::installed.packages()[utils::installed.packages()[, "Priority"] %in% c("base", "recommended"), "Package"])
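The same selection can probably be written more compactly with the priority argument of utils::installed.packages(). This is an alternative sketch rather than what the script does; the object names base_pkgs_alt and base_pkgs_orig are introduced here only for the comparison.

```r
# Alternative sketch: let installed.packages() filter on the Priority field.
base_pkgs_alt <-
  unique(rownames(utils::installed.packages(priority = c("base", "recommended"))))

# The selection as written in the script, calling installed.packages() once.
ipkgs <- utils::installed.packages()
base_pkgs_orig <-
  unname(ipkgs[ipkgs[, "Priority"] %in% c("base", "recommended"), "Package"])

# Both approaches should select the same set of packages.
same_set <- setequal(base_pkgs_alt, base_pkgs_orig)
print(same_set)
```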
Next step, get a list of the available packages from CRAN and BioConductor. The return from available.packages is a matrix with all the information we will need about the packages.
available_pkgs <- available.packages(repos = c(CRAN, BIOC))
str(available_pkgs)
## chr [1:13659, 1:17] "A3" "abbyyR" "abc" "abc.data" "ABC.RAP" ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:13659] "A3" "abbyyR" "abc" "abc.data" ...
## ..$ : chr [1:17] "Package" "Version" "Priority" "Depends" ...
available_pkgs[available_pkgs[, "Package"] %in% OUR_PACKAGES,
               c("Package", "Version", "Depends", "Imports", "LinkingTo",
                 "Repository")]
## Package Version
## gRain "gRain" "1.3-0"
## gRbase "gRbase" "1.8-3"
## jsonlite "jsonlite" "1.5"
## magrittr "magrittr" "1.5"
## plotly "plotly" "4.7.1"
## rjson "rjson" "0.2.15"
## SHELF "SHELF" "1.3.0"
## svglite "svglite" "1.2.1"
## graph "graph" "1.56.0"
## Depends
## gRain "R (>= 3.0.2), methods, gRbase (>= 1.7-2)"
## gRbase "R (>= 3.0.2), methods"
## jsonlite "methods"
## magrittr NA
## plotly "R (>= 3.2.0), ggplot2 (>= 2.2.1)"
## rjson "R (>= 3.1.0)"
## SHELF "R (>= 3.3.1)"
## svglite "R (>= 3.0.0)"
## graph "R (>= 2.10), methods, BiocGenerics (>= 0.13.11)"
## Imports
## gRain "igraph, graph, magrittr, functional, Rcpp (>= 0.11.1)"
## gRbase "graph, igraph, magrittr, Matrix, RBGL, Rcpp (>= 0.11.1)"
## jsonlite NA
## magrittr NA
## plotly "tools, scales, httr, jsonlite, magrittr, digest, viridisLite,\nbase64enc, htmltools, htmlwidgets (>= 0.9), tidyr, hexbin,\nRColorBrewer, dplyr, tibble, lazyeval (>= 0.2.0), crosstalk,\npurrr, data.table"
## rjson NA
## SHELF "ggplot2, grid, shiny, stats, graphics, tidyr, MASS, ggExtra"
## svglite "Rcpp, gdtools (>= 0.1.6)"
## graph "stats, stats4, utils"
## LinkingTo
## gRain "Rcpp (>= 0.11.1), RcppArmadillo, RcppEigen, gRbase (>=\n1.8-0)"
## gRbase "Rcpp (>= 0.11.1), RcppArmadillo, RcppEigen"
## jsonlite NA
## magrittr NA
## plotly NA
## rjson NA
## SHELF NA
## svglite "Rcpp, gdtools, BH"
## graph NA
## Repository
## gRain "https://cran.rstudio.com/src/contrib"
## gRbase "https://cran.rstudio.com/src/contrib"
## jsonlite "https://cran.rstudio.com/src/contrib"
## magrittr "https://cran.rstudio.com/src/contrib"
## plotly "https://cran.rstudio.com/src/contrib"
## rjson "https://cran.rstudio.com/src/contrib"
## SHELF "https://cran.rstudio.com/src/contrib"
## svglite "https://cran.rstudio.com/src/contrib"
## graph "https://bioconductor.org/packages/release/bioc/src/contrib"
In this example we see that the packages listed in OUR_PACKAGES, except the graph package, can be downloaded from CRAN. graph and at least one of its dependencies, BiocGenerics, will need to be downloaded from BioConductor.
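To see at a glance which repository each package comes from, the package names can be grouped by the Repository column. The sketch below uses a small hand-built matrix (pkg_info, a hypothetical stand-in holding three rows copied from the output above) so that it runs without network access; the same split() call should work on the corresponding columns of available_pkgs.

```r
# Stand-in for a subset of available_pkgs, built by hand from the output above.
pkg_info <- matrix(
  c("gRain",   "https://cran.rstudio.com/src/contrib",
    "svglite", "https://cran.rstudio.com/src/contrib",
    "graph",   "https://bioconductor.org/packages/release/bioc/src/contrib"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Repository")))

# Group the package names by the repository that provides them.
by_repo <- split(pkg_info[, "Package"], pkg_info[, "Repository"])
print(by_repo)
```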
The next step in building the list of dependencies and a script for installing them is done in the following while loop. We start with a character vector pkgs_to_download, which is initially equivalent to OUR_PACKAGES. We will iterate through this vector, appending the dependencies in order. We use the tools::package_dependencies function to look up the direct dependencies of one package at a time; because every appended dependency is looked up in turn, we also collect the dependencies of dependencies, and so on.
In the while loop we get a list of the dependencies for a package, stored in the deps object. We omit any of the base and recommended packages from the deps object and then append deps to the pkgs_to_download vector in the position immediately to the right of the current package being looked up. When the indexer i is incremented, the next package to be considered will be the first dependency. This process continues until all the dependencies have been explored. Lastly, we reverse the order of the elements of pkgs_to_download so that we have the packages listed in a useful install order, i.e., pkgs_to_download[1] should be installed before pkgs_to_download[2], etc. After reversing the order of the elements of pkgs_to_download we look only at the unique elements. By default, the first occurrence of an element will be kept and the repeated elements will be omitted. By reversing the order and then taking the unique values, the deepest level of dependency will be retained for a specific package.
pkgs_to_download <- OUR_PACKAGES
i <- 1L
while(i <= length(pkgs_to_download)) {
  deps <-
    unlist(tools::package_dependencies(packages = pkgs_to_download[i],
                                       which = c("Depends", "Imports", "LinkingTo"),
                                       db = available_pkgs,
                                       recursive = FALSE),
           use.names = FALSE)
  deps <- deps[!(deps %in% base_pkgs)]
  pkgs_to_download <- append(pkgs_to_download, deps, i)
  i <- i + 1L
}
pkgs_to_download <- unique(rev(pkgs_to_download))
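The ordering logic can be checked without network access on a toy dependency graph. In this sketch the package names (app, plot, io, color, core), the toy_deps list, and the install_order function are all made up for illustration; a plain named list stands in for the lookups that tools::package_dependencies performs in the script.

```r
# Hypothetical dependency graph: each name is a package, each value holds its
# direct dependencies.
toy_deps <- list(
  app   = c("plot", "io"),
  plot  = c("core", "color"),
  io    = "core",
  color = "core",
  core  = character(0)
)

# Same expand-then-reverse-then-unique approach as the while loop above.
install_order <- function(targets, graph) {
  pkgs <- targets
  i <- 1L
  while (i <= length(pkgs)) {
    deps <- graph[[pkgs[i]]]
    pkgs <- append(pkgs, deps, after = i)  # insert right after the current package
    i <- i + 1L
  }
  unique(rev(pkgs))
}

ord <- install_order("app", toy_deps)
print(ord)
# [1] "core"  "io"    "color" "plot"  "app"

# Every package should appear after all of its direct dependencies.
order_is_valid <- all(vapply(names(toy_deps), function(p) {
  all(match(toy_deps[[p]], ord) < match(p, ord))
}, logical(1)))
print(order_is_valid)
```

Keeping the first occurrence in the reversed vector is the same as keeping the last occurrence in the expanded vector, and a package's dependencies always follow it there, which is why each dependency ends up ahead of the packages that need it.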
If you are having a difficult time envisioning what the above does, let’s look at an example for the dplyr package. In this example we’ll print out the list of dependencies at each step through the while loop. Note that packages such as Rcpp will be assessed multiple times, but the final list will only have Rcpp listed once.
dplyr_dependencies <- "dplyr"
i <- 1L
while(i <= length(dplyr_dependencies)) {
  cat("\ni =", i, "\nLooking up dependencies for", dplyr_dependencies[i], "\n")
  deps <-
    unlist(tools::package_dependencies(packages = dplyr_dependencies[i],
                                       which = c("Depends", "Imports", "LinkingTo"),
                                       db = available_pkgs,
                                       recursive = FALSE),
           use.names = FALSE)
  deps <- deps[!(deps %in% base_pkgs)]
  dplyr_dependencies <- append(dplyr_dependencies, deps, i)

  cat(dplyr_dependencies[i], "dependencies:", paste(deps, collapse = ", "),
      "\ndplyr_dependencies =", paste(dplyr_dependencies, collapse = ", "), "\n")
  i <- i + 1L
}
##
## i = 1
## Looking up dependencies for dplyr
## dplyr dependencies: assertthat, bindrcpp, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
## dplyr_dependencies = dplyr, assertthat, bindrcpp, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 2
## Looking up dependencies for assertthat
## assertthat dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 3
## Looking up dependencies for bindrcpp
## bindrcpp dependencies: Rcpp, bindr, plogr
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 4
## Looking up dependencies for Rcpp
## Rcpp dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 5
## Looking up dependencies for bindr
## bindr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 6
## Looking up dependencies for plogr
## plogr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 7
## Looking up dependencies for glue
## glue dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 8
## Looking up dependencies for magrittr
## magrittr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 9
## Looking up dependencies for pkgconfig
## pkgconfig dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 10
## Looking up dependencies for rlang
## rlang dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 11
## Looking up dependencies for R6
## R6 dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 12
## Looking up dependencies for Rcpp
## Rcpp dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 13
## Looking up dependencies for tibble
## tibble dependencies: cli, crayon, pillar, rlang
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, crayon, pillar, rlang, BH, plogr
##
## i = 14
## Looking up dependencies for cli
## cli dependencies: assertthat, crayon
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 15
## Looking up dependencies for assertthat
## assertthat dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 16
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 17
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 18
## Looking up dependencies for pillar
## pillar dependencies: cli, crayon, rlang, utf8
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 19
## Looking up dependencies for cli
## cli dependencies: assertthat, crayon
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 20
## Looking up dependencies for assertthat
## assertthat dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 21
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 22
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 23
## Looking up dependencies for rlang
## rlang dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 24
## Looking up dependencies for utf8
## utf8 dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 25
## Looking up dependencies for rlang
## rlang dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 26
## Looking up dependencies for BH
## BH dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 27
## Looking up dependencies for plogr
## plogr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
dplyr_dependencies <- unique(rev(dplyr_dependencies))
dplyr_dependencies
## [1] "plogr" "BH" "rlang" "utf8" "crayon"
## [6] "assertthat" "cli" "pillar" "tibble" "Rcpp"
## [11] "R6" "pkgconfig" "magrittr" "glue" "bindr"
## [16] "bindrcpp" "dplyr"
Now that we have pkgs_to_download, a character vector of package names that we need to download, we can use the download.packages function to do so. The object dwnld_pkgs is a two-column matrix with the name and file path to the source file for each package.
# Download the needed packages into the pkg-source-files directory
unlink("pkg-source-files/*")
dir.create("pkg-source-files/", showWarnings = FALSE)
dwnld_pkgs <-
  download.packages(pkgs = pkgs_to_download,
                    destdir = "pkg-source-files",
                    repos = c(CRAN, BIOC),
                    type = "source")
head(dwnld_pkgs)
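download.packages skips, with a warning, any package it cannot find in the repositories, so it may be worth confirming that every requested package has a row in the returned matrix before building the makefile. The following is a minimal sketch of such a check; find_missing_downloads and the toy inputs are hypothetical and use file names taken from the makefile example in this post.

```r
# Hypothetical helper: requested packages with no row in the matrix returned
# by download.packages() (first column holds the package names).
find_missing_downloads <- function(requested, downloaded) {
  setdiff(requested, downloaded[, 1])
}

# Toy inputs standing in for pkgs_to_download and dwnld_pkgs.
toy_requested  <- c("BH", "Rcpp", "gdtools")
toy_downloaded <- matrix(
  c("BH",   "pkg-source-files/BH_1.66.0-1.tar.gz",
    "Rcpp", "pkg-source-files/Rcpp_0.12.15.tar.gz"),
  ncol = 2, byrow = TRUE)

missing_pkgs <- find_missing_downloads(toy_requested, toy_downloaded)
if (length(missing_pkgs) > 0L) {
  message("not downloaded: ", paste(missing_pkgs, collapse = ", "))
}
```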
The last step for the script, run on a machine with external http(s) request authority, is to build a makefile to install all the needed packages. I prefer the makefile over a bash script because of the default error handling that make provides: make stops at the first command that exits with an error, whereas a bash script by default carries on to the next command.
cat("all:\n",
paste0("\tR CMD INSTALL ", dwnld_pkgs[, 2], "\n"),
sep = "",
file = "makefile")
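To inspect what this cat() call produces without touching a real makefile, the same call can be pointed at a temporary file and read back. The sketch below uses a hand-built two-row matrix (toy_dwnld, a hypothetical stand-in for dwnld_pkgs) and a tempfile() path; both are assumptions for illustration only.

```r
# Stand-in for dwnld_pkgs: package name in column 1, source file in column 2.
toy_dwnld <- matrix(
  c("magrittr", "pkg-source-files/magrittr_1.5.tar.gz",
    "BH",       "pkg-source-files/BH_1.66.0-1.tar.gz"),
  ncol = 2, byrow = TRUE)

# Write to a temporary file instead of "makefile" in the working directory.
makefile_path <- tempfile("makefile")
cat("all:\n",
    paste0("\tR CMD INSTALL ", toy_dwnld[, 2], "\n"),
    sep = "",
    file = makefile_path)

makefile_lines <- readLines(makefile_path)
print(makefile_lines)
```

Each recipe line begins with a tab character ("\t"), which make requires; spaces in its place would cause make to reject the file.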
For this example, the first several lines of the makefile are:
all:
	R CMD INSTALL pkg-source-files/magrittr_1.5.tar.gz
	R CMD INSTALL pkg-source-files/BH_1.66.0-1.tar.gz
	R CMD INSTALL pkg-source-files/withr_2.1.1.tar.gz
	R CMD INSTALL pkg-source-files/Rcpp_0.12.15.tar.gz
	R CMD INSTALL pkg-source-files/gdtools_0.1.6.tar.gz
	R CMD INSTALL pkg-source-files/svglite_1.2.1.tar.gz
Note that magrittr is the last package in the OUR_PACKAGES object and has no dependencies, and thus is the first package installed. The svglite package is the second to last package in OUR_PACKAGES and it will be installed after its dependencies BH, withr, Rcpp, and gdtools are installed.
Installing on the Remote Machine
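Now that the source files have been downloaded and the makefile generated, move the pkg-source-files directory and the makefile to the remote machine and run the makefile. Because the makefile refers to the source files by relative path, run make from the directory that contains both the makefile and the pkg-source-files directory. If the makefile fails, there might be some system dependencies that need to be updated.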
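Once make has finished on the remote machine, a quick check from within R can confirm that the requested packages are present. This is a minimal sketch; not_installed is a hypothetical helper and the package names passed to it are placeholders for the contents of OUR_PACKAGES.

```r
# Hypothetical helper: which of the given packages are not in any library
# on the current library path?
not_installed <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Placeholder input; on the server this would be the OUR_PACKAGES vector.
still_missing <- not_installed(c("stats", "utils"))
print(still_missing)
```

An empty result means every package was found; any names returned point to installs that failed or were skipped.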
Download the script and/or contribute
The build-dependency-list.R file can be found on my github page.