Unleashing the power of Apple Silicon for R: Parallel processing on M1/M2
Parallel computation is a big deal in machine learning (ML), especially when working with large datasets, complex models (like deep neural networks), or intensive tasks like hyperparameter tuning. Parallel computation refers to executing multiple calculations or processes simultaneously.
In ML, this helps speed up training and inference by utilizing multiple processing units (CPU cores, GPUs, TPUs, clusters). For my (small) projects in ML, I run everything on a single machine, an M2 MacBook Air, and write my code in R. Calculations can be excruciatingly slow; however, many tasks are embarrassingly parallel. For example, models created during resampling are independent of each other and can be fit simultaneously without issue. Right now I’m running everything sequentially, but I want to switch to parallel processing to speed things up. In this post, I will review popular methods that are available in R to go parallel on MacBooks equipped with M1 and M2 chips.
Apple’s M1 and M2 chips are custom-designed ARM-based processors that mark Apple’s move away from Intel. These chips power Macs and iPads, offering significant boosts in performance and efficiency.
M1 Chip (2020):
- First Apple Silicon chip.
- 5nm process, 8-core CPU (4 performance + 4 efficiency), up to 8-core GPU.
- Unified memory architecture (up to 16 GB).
- High performance per watt; fanless in MacBook Air.
- Found in MacBook Air, MacBook Pro 13”, Mac Mini, and iMac 24”.
M2 Chip (2022):
- Second-generation chip.
- Improved 5nm design, up to 18% faster CPU and 35% faster GPU than M1.
- Supports up to 24 GB of unified memory.
- Enhanced media engine and ProRes acceleration.
- Used in updated MacBook Air, MacBook Pro 13”, and newer Macs.
For computations conducted on a single computer, the number of possible worker processes can be determined with the parallel package:
# The number of physical cores in the hardware:
parallel::detectCores(logical = FALSE)
# The number of possible independent processes that can
# be simultaneously used:
parallel::detectCores(logical = TRUE)
My MacBook Air has:
- 4 Performance Cores (P-cores)
- 4 Efficiency Cores (E-cores)
When training models, especially with resampling (like cross-validation or bootstrapping), I want consistency and high throughput, which is best achieved on the performance cores. So I use 4 workers:
- Choosing 4 workers matches the number of P-cores. macOS decides which core each process runs on, so this does not strictly pin the work to the P-cores, but it avoids depending on the slower E-cores.
- It avoids overloading the system, leaving the E-cores and other system resources free for background tasks or RStudio.
Using all 8 logical cores (4P + 4E) can lead to inconsistent performance, because efficiency cores are slower and not ideal for heavy computational tasks.
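As a rough sketch, the worker count could also be chosen programmatically instead of being hard-coded. The helper choose_workers() below is hypothetical (it is not part of any package), and the sysctl key hw.perflevel0.physicalcpu is assumed to exist only on Apple Silicon Macs with a recent macOS; everywhere else the code falls back to detectCores().

```r
library(parallel)

# Hypothetical helper: cap the worker count and never return less than 1.
choose_workers <- function(n_available, cap = 4L) {
  if (length(n_available) != 1L || is.na(n_available)) return(1L)
  max(1L, min(as.integer(n_available), as.integer(cap)))
}

# Ask macOS for the number of performance cores.
# Assumption: this sysctl key exists on Apple Silicon only; on other
# systems the query is skipped or fails quietly and p_cores stays NA.
p_cores <- NA_integer_
if (Sys.info()[["sysname"]] == "Darwin") {
  out <- tryCatch(
    suppressWarnings(
      system2("sysctl", c("-n", "hw.perflevel0.physicalcpu"),
              stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  if (length(out) == 1L) p_cores <- suppressWarnings(as.integer(out))
}

# Fall back to the physical core count reported by the parallel package.
n_available <- if (is.na(p_cores)) detectCores(logical = FALSE) else p_cores
n_workers <- choose_workers(n_available)
n_workers
```

The resulting n_workers can then be passed wherever the examples in this post use the literal 4.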
Overview
R supports three popular parallel processing methods that work well on M1/M2 MacBooks. Programming hints on how to set up/activate and deactivate/reset each of the three methods are provided in the table below.
| Method | Setup / Activate | Deactivate / Reset |
|---|---|---|
| doParallel | library(doParallel); cl <- makeCluster(4); registerDoParallel(cl) | stopCluster(cl) |
| doMC | library(doMC); registerDoMC(cores = 4) | registerDoSEQ() |
| furrr + future | library(furrr); library(future); plan(multisession, workers = 4) | plan(sequential) |
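One practical detail behind the "Deactivate / Reset" column: if the parallel code fails midway, the cleanup call may never run and the worker processes linger. Below is a minimal sketch of a guard, using only the base parallel package (the kind of cluster that doParallel registers) and a hypothetical wrapper named run_on_cluster().

```r
library(parallel)

# Hypothetical wrapper: start workers, run the job, and always shut the
# workers down again, even if the job throws an error.
run_on_cluster <- function(xs, f, workers = 2L) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl), add = TRUE)  # runs on normal exit and on error
  parLapply(cl, xs, f)
}

squares <- run_on_cluster(1:4, function(x) x^2)
unlist(squares)
```

The same idea carries over to the other backends: registering the reset call (registerDoSEQ() or plan(sequential)) with on.exit() inside a function keeps the session from staying in parallel mode by accident.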
The table below summarizes the key pros and cons of each method.
| Method | Pros | Cons |
|---|---|---|
| doParallel | ✅ Cross-platform (Mac/Linux/Windows) ✅ Works with foreach, caret, tidymodels | ❌ Verbose setup ❌ Full R sessions increase memory |
| doMC | ✅ Simple & fast on macOS ✅ Lightweight (uses forked processes) | ❌ Not Windows-compatible ❌ Not safe for GUI/Shiny apps |
| furrr + future | ✅ Clean tidyverse integration ✅ Works with future_map() ✅ Scales to cloud | ❌ More overhead ❌ Full R sessions |
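The "forked processes" versus "full R sessions" distinction in the table can be illustrated with the base parallel package, used here as a stand-in for what doMC and the cluster or multisession backends do internally (an assumption made for illustration; the variable offset is invented). Forked workers inherit the parent's objects, while fresh sessions start empty and need objects sent to them explicitly.

```r
library(parallel)

offset <- 10  # an object living in the parent R session

# Forked workers (Unix/macOS only) are copies of the parent process,
# so they can see `offset` without any extra work.
forked <- mclapply(1:3, function(i) i + offset, mc.cores = 2L)

# PSOCK workers are fresh R sessions that start empty, so `offset`
# has to be exported to them before it can be used.
cl <- makeCluster(2L)
clusterExport(cl, "offset", envir = environment())
psock <- parLapply(cl, 1:3, function(i) i + offset)
stopCluster(cl)

# Both routes should give the same values.
identical(unlist(forked), unlist(psock))
```

This is one reason the forked route feels lighter: nothing has to be copied up front. The price is that forking does not exist on Windows and is commonly discouraged inside GUI sessions.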
Example 1 (The Ames housing dataset)
The Ames housing dataset is a well-known real estate dataset used for regression modeling, especially as an improved alternative to the Boston Housing dataset. It contains detailed information about residential homes in Ames, Iowa, sold between 2006 and 2010.
Here’s a clean and minimal setup to load the essential libraries for working with the Ames Housing dataset using a tidymodels regression workflow.
Code
# Data handling and wrangling
library(tidyverse)
# For the Ames Housing dataset
library(modeldata)
# Modeling and resampling
library(tidymodels)
tidymodels_prefer()
The dataset can be loaded easily in R using the modeldata package.
Code
data(ames, package = "modeldata")
ames <- ames %>% mutate(Sale_Price = log10(Sale_Price))
Suppose a regression model is to be fit to the log10-transformed sale prices (Sale_Price). In this post, we will focus on a small subset of the predictors available in the Ames housing dataset:
- The neighborhood (Neighborhood, qualitative): physical locations within Ames city limits
- The gross above-grade living area (Gr_Liv_Area, continuous): the standard measure for determining the amount of space in residential properties
- The year built (Year_Built, discrete): original construction date
- The type of building (Bldg_Type, nominal): type of dwelling
I set up my regression modeling using tidymodels, including:
- Data splitting
- Recipe for preprocessing
- Model specification
- Workflow combining recipe + model
My workflow is evaluated with v-fold cross-validation in the tidymodels framework. By passing strata = Sale_Price to initial_split() and vfold_cv(), I use stratified sampling, which helps keep the distribution of the response variable (Sale_Price) balanced across splits.
The recipe defines the preprocessing steps applied to the dataset before modeling. The step_*() functions do not execute anything immediately; the recipe is only a specification of what should be done.
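To make the "specification only" point concrete, here is a minimal sketch that uses the built-in mtcars data as a stand-in for Ames (an assumption made to keep the example small). The recipe holds the instructions, prep() estimates them on the training data, and bake() applies them.

```r
library(recipes)

# Stand-in data: mtcars instead of the Ames data.
rec <- recipe(mpg ~ disp + cyl, data = mtcars)
rec <- step_log(rec, disp, base = 10)

# Nothing has been transformed yet; prep() prepares the steps
# using the training data.
prepped <- prep(rec, training = mtcars)

# bake() applies the prepared steps; new_data = NULL returns the
# processed training set.
baked <- bake(prepped, new_data = NULL)

# The baked column should match a manual log10 transformation.
all.equal(baked$disp, log10(mtcars$disp))
```

Inside a workflow these two calls happen automatically for each resample, which is why the post never calls prep() or bake() directly.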
Code
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
set.seed(1004)
ames_folds <- vfold_cv(ames_train, v = 10, strata = Sale_Price)
ames_rec <-
  recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type,
         data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>%
  step_dummy(all_nominal_predictors())
I fit a random forest model to the training set using the ranger engine, which uses the ranger package for fast computation. Random forests are very powerful: they can learn complex patterns in the data with high accuracy. One big advantage is that they don’t need much preprocessing, so they’re easy to use. The tradeoff is that they can be slower to train, especially on large datasets.
Code
rf_model <- rand_forest(trees = 1000) %>%
  set_engine("ranger") %>%
  set_mode("regression")
rf_wflow <- workflow() %>%
  add_recipe(ames_rec) %>%
  add_model(rf_model)
The control_resamples() function is used to customize what is saved during cross-validation. save_pred = TRUE saves the out-of-sample predictions from each fold, so we can analyze or plot them later; save_workflow = TRUE stores the workflow alongside the results, which is helpful if we want to inspect the model or preprocessing steps afterward.
Code
keep_pred <- control_resamples(save_pred = TRUE, save_workflow = TRUE)
Sequential approach
Code
set.seed(1003)
start_time <- Sys.time()
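rf_res <- rf_wflow %>%
  fit_resamples(resamples = ames_folds, control = keep_pred)
end_time <- Sys.time()
metrics <- collect_metrics(rf_res)
# Store the mean RMSE and R-squared of each approach
performance <- list()
performance[[1]] <- metrics %>%
  filter(.metric %in% c("rmse", "rsq")) %>%
  pull(mean)
# Store the elapsed time in seconds. difftime() with explicit units avoids
# parsing the printed value, whose unit switches to minutes for long runs.
duration <- numeric()
duration[1] <- as.numeric(difftime(end_time, start_time, units = "secs"))
Approach using doParallel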
Code
library(doParallel)
set.seed(1003)
cl <- makeCluster(4)
registerDoParallel(cl)
start_time <- Sys.time()
rf_res <- rf_wflow %>%
  fit_resamples(resamples = ames_folds, control = keep_pred)
end_time <- Sys.time()
stopCluster(cl)
metrics <- collect_metrics(rf_res)
performance[[2]] <- metrics %>%
  filter(.metric %in% c("rmse", "rsq")) %>%
  pull(mean)
# Elapsed time in seconds, with explicit units
duration[2] <- as.numeric(difftime(end_time, start_time, units = "secs"))
Approach using doMC
Code
set.seed(1003)
library(doMC)
registerDoMC(cores = 4)
start_time <- Sys.time()
rf_res <- rf_wflow %>%
  fit_resamples(resamples = ames_folds, control = keep_pred)
end_time <- Sys.time()
registerDoSEQ()
metrics <- collect_metrics(rf_res)
performance[[3]] <- metrics %>%
  filter(.metric %in% c("rmse", "rsq")) %>%
  pull(mean)
# Elapsed time in seconds, with explicit units
duration[3] <- as.numeric(difftime(end_time, start_time, units = "secs"))
Approach using furrr + future
Code
library(furrr)
library(future)
set.seed(1003)
# fit_resamples() picks up the active future plan by itself.
# Assumption: a tune release with future support (1.2.0 or later);
# older releases relied on foreach backends instead.
plan(multisession, workers = 4)
start_time <- Sys.time()
rf_res <- rf_wflow %>%
  fit_resamples(resamples = ames_folds, control = keep_pred)
end_time <- Sys.time()
plan(sequential)
metrics <- collect_metrics(rf_res)
performance[[4]] <- metrics %>%
  filter(.metric %in% c("rmse", "rsq")) %>%
  pull(mean)
# Elapsed time in seconds, with explicit units
duration[4] <- as.numeric(difftime(end_time, start_time, units = "secs"))
Comparative analysis
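Each approach above repeats the same Sys.time() bookkeeping. As a sketch, that pattern could be folded into a small helper; time_secs() is a hypothetical name and not part of the original code, and the short sleep merely stands in for the call to fit_resamples().

```r
# Hypothetical helper: evaluate an expression and report elapsed seconds.
time_secs <- function(expr) {
  start <- Sys.time()
  value <- force(expr)  # the lazily passed expression is evaluated here
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  list(value = value, seconds = elapsed)
}

# Toy usage: a short sleep stands in for the resampling call.
res <- time_secs({
  Sys.sleep(0.2)
  42
})
res$seconds
```

With the workflow from this post, the call would look like time_secs(fit_resamples(rf_wflow, resamples = ames_folds, control = keep_pred)), with the fitted object in the value element and the timing in seconds.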
Training time (in seconds) for each method is shown below.
| Method | Time, s |
|---|---|
| doMC | 6.06 |
| doParallel | 7.94 |
| furrr + future | 10.26 |
| sequential | 13.34 |
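The timings are easier to compare as speedups over the sequential run. The short calculation below copies the numbers from the table; the efficiency figure assumes the 4 workers used throughout the post.

```r
# Timings in seconds, copied from the table above
timings <- c(
  "doMC"           = 6.06,
  "doParallel"     = 7.94,
  "furrr + future" = 10.26,
  "sequential"     = 13.34
)

# Speedup relative to the sequential run
speedup <- timings[["sequential"]] / timings

# Parallel efficiency: speedup divided by the number of workers (4 here)
efficiency <- speedup[names(speedup) != "sequential"] / 4

round(speedup, 2)
round(efficiency, 2)
```

On these numbers doMC is about 2.2 times faster than the sequential run, doParallel about 1.7 times, and the future-based setup about 1.3 times. All of them fall well short of the ideal factor of 4, which is plausible for a job this small, where starting workers and moving data take up a noticeable share of the total time.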
Recommended core use on M1/M2 Macs
Use 4 workers, matching the number of performance cores. macOS still schedules the processes itself, but this choice offers a good balance between performance and system responsiveness.
Summary
- Use doMC for lightweight, fast, Mac-only workflows.
- Use doParallel if you need Windows compatibility or more control.
- Use furrr + future for a modern, tidyverse-friendly setup and future scalability.
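The platform side of this summary can be sketched with the base parallel package alone. The helper make_workers() is hypothetical: it picks forked workers where the operating system supports them and falls back to socket-based workers (fresh R sessions) on Windows.

```r
library(parallel)

# Hypothetical helper: forked workers on Unix-like systems (including macOS),
# socket-based workers on Windows, where forking is not available.
make_workers <- function(n = 2L) {
  type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"
  makeCluster(n, type = type)
}

cl <- make_workers(2L)
doubled <- parSapply(cl, 1:3, function(i) i * 2)
stopCluster(cl)
doubled
```

A cluster created this way can also be passed to registerDoParallel(). Keep in mind the caveat from the pros and cons table: forked workers are not a good fit for GUI or Shiny sessions.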