doFuture: An Overview on using Foreach to Parallelize via the Future Framework
Henrik Bengtsson
Source: vignettes/doFuture-1-overview.md.rsp
TL;DR
To run foreach() in parallel, install the R packages doFuture and futurize, and call:
library(futurize)
plan(multisession)
y <- foreach(x = 1:4, y = 1:10) %do% {
  z <- x + y
  slow_sqrt(z)
} |> futurize()
That’s it - easy!
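The examples in this document call slow_sqrt(), which is not defined here. To run them, a stand-in is needed. The definition below is a hypothetical helper, assumed for illustration only: it pauses briefly to mimic an expensive computation and then returns the square root.

```r
# Hypothetical stand-in for slow_sqrt(); not part of doFuture or futurize.
# The delay length is an arbitrary assumption chosen for demonstration.
slow_sqrt <- function(x, delay = 0.5) {
  Sys.sleep(delay)  # simulate a slow computation
  sqrt(x)
}

slow_sqrt(16)
```

With a helper like this in place, the benefit of running iterations in parallel becomes visible as a shorter total run time.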
Introduction
The foreach
package implements a map-reduce API with functions
foreach() and times() that provide us with
powerful methods for iterating over one or more sets of elements with
options to do it in parallel.
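As a minimal sketch of the map-reduce idea, the following runs foreach() sequentially with %do%. It assumes only that the foreach package is installed. When several iteration variables are given, foreach() steps through them together and stops at the shortest one, and the .combine argument controls how the per-iteration results are reduced.

```r
library(foreach)

# Map step only: results are returned as a list, one element per iteration.
# x has 4 elements and y has 10, so only 4 iterations are run.
y_list <- foreach(x = 1:4, y = 1:10) %do% {
  x + y
}

# Map + reduce: .combine = c concatenates the results into a vector.
y_vec <- foreach(x = 1:4, y = 1:10, .combine = c) %do% {
  x + y
}

length(y_list)
y_vec
```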
The future package provides a generic API for using futures in R. A future is a simple yet powerful mechanism to evaluate an R expression and retrieve its value at some point in time. Futures can be resolved in many different ways depending on which strategy is used. You can resolve them sequentially, in parallel on your local computer, on remote computers, in the cloud, on a high-performance compute (HPC) cluster, or via any future backend available.
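To make the notion of a future concrete, here is a small sketch using future() and value() from the future package. The choice of two workers and the computed expression are assumptions made for illustration.

```r
library(future)

# Resolve futures in background R sessions on the local machine
plan(multisession, workers = 2)

# Create a future; the expression starts evaluating in the background
f <- future({
  Sys.sleep(1)
  sqrt(16)
})

# Retrieve the value; this blocks until the future is resolved
v <- value(f)
v

# Return to sequential processing and shut down the workers
plan(sequential)
```

Changing the plan() call is all it takes to switch backends; the code that creates and collects the future stays the same.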
The doFuture package provides a bridge between foreach and the future parallelization framework. Specifically, the doFuture package provides three alternatives for using futures with foreach:
1. y <- foreach(...) %do% { ... } |> futurize()
2. y <- foreach(...) %dofuture% { ... }
3. registerDoFuture() + y <- foreach(...) %dopar% { ... }
Alternative 1: futurize() (recommended)
The first alternative (recommended) uses
futurize() of the futurize package.
An example is:
library(futurize)
plan(multisession)
y <- foreach(x = 1:4, y = 1:10) %do% {
  z <- x + y
  slow_sqrt(z)
} |> futurize()
This alternative is the recommended and cleanest way to let
foreach() parallelize via the future framework, especially
if you start out from scratch. All you need to remember is to pipe the
call to futurize(), and, yes, it is correct to use
%do% here. In addition to multisession,
parallelization can be done via any compliant future backend.
Identification of globals, random number generation (RNG), and error
handling are handled the same way as elsewhere in the future ecosystem.
We recommend using futurize() because it is consistent
with how we parallelize lapply() and
purrr::map() using futurize. With
futurize(), you do not have to load
doFuture explicitly; doFuture serves
futurize() under the hood.
See help("futurize", package = "futurize") for more
details and examples on this approach.
Alternative 2: %dofuture%
The second alternative (formerly recommended), which uses
%dofuture%, avoids having to call
registerDoFuture(). The %dofuture% operator
behaves more consistently than %dopar%,
e.g. there is a single set of foreach arguments instead of one set per
possible adapter. An example is:
library(doFuture)
plan(multisession)
y <- foreach(x = 1:4, y = 1:10) %dofuture% {
  z <- x + y
  slow_sqrt(z)
}
This alternative was the recommended way to let
foreach() parallelize via the future framework, but now we
recommend using futurize() instead, especially if you start
out from scratch.
See help("%dofuture%", package = "doFuture") for more
details and examples on this approach.
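As a sketch of how options are passed with %dofuture%, the following requests parallel-safe random number generation through the .options.future argument. The worker count and the loop body are assumptions for illustration; consult the help page above for the full set of supported options.

```r
library(doFuture)  # also attaches foreach and future

plan(multisession, workers = 2)

# seed = TRUE asks the future framework to give each iteration its own
# statistically sound random number stream
y <- foreach(x = 1:4, .options.future = list(seed = TRUE)) %dofuture% {
  x + rnorm(1)
}

length(y)

plan(sequential)
```

Without seed = TRUE, drawing random numbers inside the loop typically triggers a warning from the future framework about unreliable RNG.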
Alternative 3: registerDoFuture() + %dopar%
The third alternative is based on the traditional
foreach approach where one registers a foreach adapter
to be used by %dopar%. A popular adapter is
doParallel::registerDoParallel(), which parallelizes on the
local machine using the parallel package. The doFuture package
provides registerDoFuture(), which parallelizes using the
future package, meaning any future-compliant parallel
backend can be used.
An example is:
library(doFuture)
registerDoFuture()
plan(multisession)
y <- foreach(x = 1:4, y = 1:10) %dopar% {
  z <- x + y
  slow_sqrt(z)
}
This alternative is useful if you already have a lot of R code that
uses %dopar% and you just want to switch to using the
future framework for parallelization. Using
registerDoFuture() is also useful when you wish to use the
future framework with packages and functions that use
foreach() and %dopar% internally but do
not yet support futurize(), e.g. NMF.
See help("registerDoFuture", package = "doFuture") for
more details and examples on this approach.
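The case of code that uses %dopar% internally can be sketched as follows. The function legacy_sqrt() is a hypothetical stand-in for a third-party function you cannot modify; it is not taken from NMF or any other package.

```r
library(doFuture)  # also attaches foreach and future

# Hypothetical stand-in for a package function that uses %dopar% internally
legacy_sqrt <- function(xs) {
  foreach(x = xs, .combine = c) %dopar% {
    sqrt(x)
  }
}

# Route every %dopar% call, including the one inside legacy_sqrt(),
# through the future framework
registerDoFuture()
plan(multisession, workers = 2)

y <- legacy_sqrt(c(1, 4, 9, 16))
y

plan(sequential)
```

The function itself is untouched; only the registered adapter and the plan decide where its iterations run.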