Unleashing the power of Apple Silicon for R: Parallel processing on M1/M2

machine learning

Parallel computation is a big deal in machine learning (ML), especially when working with large datasets, complex models (such as deep neural networks), or computationally intensive tasks like hyperparameter tuning. At its core, parallel computation means executing multiple calculations or processes simultaneously. In ML, this can significantly speed up training and inference by leveraging multiple processing units, such as CPU cores, GPUs, TPUs, or computing clusters. In this post, I’ll review some of the most popular methods available in R for running computations in parallel.

Author

Angelo Maria Sabatini

Published

April 22, 2025

For my (small) ML projects, I run everything on a single machine, an M2 MacBook Air, and write my code in R. Calculations can be excruciatingly slow; however, many tasks are embarrassingly parallel. For example, the models created during resampling are independent of each other and can be fit simultaneously without issue. Right now I run everything sequentially, but I want to switch to parallel processing to speed things up. In this post, I review popular methods available in R for going parallel on MacBooks equipped with M1 and M2 chips.

Apple M1 and M2 Macs: A quick overview

Apple’s M1 and M2 chips are custom-designed ARM-based processors that mark Apple’s move away from Intel. These chips power Macs and iPads, offering significant boosts in performance and efficiency.

M1 Chip (2020):

First Apple Silicon chip.
5nm process, 8-core CPU (4 performance + 4 efficiency), up to 8-core GPU.
Unified memory architecture (up to 16 GB).
High performance per watt; fanless in MacBook Air.
Found in MacBook Air, MacBook Pro 13”, Mac mini, and iMac 24”.

M2 Chip (2022):

Second-generation chip.
Improved 5nm design, up to 18% faster CPU and 35% faster GPU than M1.
Supports up to 24 GB of unified memory.
Enhanced media engine and ProRes acceleration.
Used in updated MacBook Air, MacBook Pro 13”, and newer Macs.

For computations on a single computer, the parallel package reports how many cores are available, which sets an upper bound on the number of useful worker processes:

# The number of physical cores in the hardware:
parallel::detectCores(logical = FALSE)

# The number of logical cores (hardware threads) that can be
# used simultaneously; M1/M2 chips have no hyperthreading,
# so this equals the number of physical cores:
parallel::detectCores(logical = TRUE)

My MacBook Air has:

4 Performance Cores (P-cores)
4 Efficiency Cores (E-cores)
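`parallel::detectCores()` does not distinguish performance cores from efficiency cores. A minimal sketch that asks macOS directly, assuming Apple Silicon on a recent macOS release that exposes the `sysctl` keys `hw.perflevel0.physicalcpu` (P-cores) and `hw.perflevel1.physicalcpu` (E-cores). `core_types()` is a hypothetical helper name, and it returns `NA` on other systems:

```r
# Hypothetical helper: count P-cores and E-cores on Apple Silicon.
# Assumes the sysctl keys hw.perflevel0.physicalcpu (performance cores)
# and hw.perflevel1.physicalcpu (efficiency cores) exist on this macOS;
# returns NA on other systems or when a key is missing.
core_types <- function() {
  read_key <- function(key) {
    if (Sys.info()[["sysname"]] != "Darwin") return(NA_integer_)
    out <- tryCatch(
      suppressWarnings(
        system2("/usr/sbin/sysctl", c("-n", key), stdout = TRUE, stderr = FALSE)
      ),
      error = function(e) character(0)
    )
    # out[1] is NA when sysctl printed nothing; as.integer() keeps it NA
    suppressWarnings(as.integer(out[1]))
  }
  c(performance = read_key("hw.perflevel0.physicalcpu"),
    efficiency  = read_key("hw.perflevel1.physicalcpu"))
}

core_types()
```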

When training models, especially with resampling (such as cross-validation or bootstrapping), I want consistent, high throughput, which is best achieved on the performance cores. So:

Choosing 4 workers matches the number of P-cores; macOS decides where each process runs (R cannot pin workers to specific cores), but with only 4 busy workers the scheduler can keep the heavy work on the P-cores
It avoids overloading the system, leaving the E-cores and other resources free for background tasks and RStudio

Using all 8 cores (4P + 4E) can lead to inconsistent performance: efficiency cores are slower, so workers that land on them finish their share later, and the whole resampling run waits for the slowest worker.

Overview

Three popular R parallel processing methods work well on M1/M2 MacBooks. The table below shows how to activate and deactivate (reset) each of them.

Method	Setup / Activate	Deactivate / Reset
doParallel	`library(doParallel)`; `cl <- makeCluster(4)`; `registerDoParallel(cl)`	`stopCluster(cl)`; `registerDoSEQ()`
doMC	`library(doMC)`; `registerDoMC(cores = 4)`	`registerDoSEQ()`
furrr + future	`library(furrr)`; `library(future)`; `plan(multisession, workers = 4)`	`plan(sequential)`

The table below summarizes the key pros and cons of each method.

Method	Pros	Cons
doParallel	✅ Cross-platform (Mac/Linux/Windows) ✅ Works with `foreach`, `caret`, `tidymodels`	❌ Verbose setup ❌ Full R sessions increase memory
doMC	✅ Simple & fast on macOS ✅ Lightweight (uses forked processes)	❌ Not Windows-compatible ❌ Not safe for GUI/Shiny apps
furrr + future	✅ Clean tidyverse integration ✅ Works with `future_map()` ✅ Scales to cloud	❌ More overhead ❌ Full R sessions
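The fork-versus-full-session distinction in the table comes from base R's parallel package, which both doMC (through its multicore functions) and doParallel (through socket clusters) build on. A small sketch: forked workers inherit objects from the main session, while socket workers start empty and need them exported. The object name `big_vector` is illustrative, and forking with `mclapply()` is unavailable on Windows:

```r
library(parallel)

# An object that lives in the main R session (illustrative)
big_vector <- rnorm(1e5)

# Forked workers (the mechanism doMC uses): each child is a copy of the
# parent process, so big_vector is visible without exporting it.
# Forking is not available on Windows.
fork_means <- mclapply(1:4, function(i) mean(big_vector) + i, mc.cores = 2)

# Socket (PSOCK) workers (makeCluster()'s default type, used with doParallel):
# each worker is a fresh R session, so objects must be exported explicitly.
cl <- makeCluster(2)
clusterExport(cl, "big_vector", envir = environment())
psock_means <- parLapply(cl, 1:4, function(i) mean(big_vector) + i)
stopCluster(cl)

# Both approaches compute the same values
all.equal(fork_means, psock_means)
```

Because forked workers share the parent's memory pages until they modify them, they start faster and use less memory; the price is that forking is unavailable on Windows and can misbehave inside GUI front ends.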

Example 1: The Ames housing dataset

The Ames housing dataset is a well-known real estate dataset used for regression modeling, especially as an improved alternative to the Boston Housing dataset. It contains detailed information about residential homes in Ames, Iowa, sold between 2006 and 2010.

Here’s a clean and minimal setup to load the essential libraries for working with the Ames Housing dataset using a tidymodels regression workflow.


# Data handling and wrangling
library(tidyverse)

# For the Ames Housing dataset
library(modeldata)

# Modeling and resampling
library(tidymodels)
tidymodels_prefer()

The dataset can be loaded easily in R using the modeldata package.


data(ames, package = "modeldata")
ames <- ames %>% mutate(Sale_Price = log10(Sale_Price))

Suppose a regression model is to be fit to the log10-transformed sale prices (Sale_Price). In this post, we focus on a small subset of the predictors available in the Ames housing dataset:

Neighborhood (qualitative): physical locations within Ames city limits
Gr_Liv_Area (continuous): gross above-grade living area, the standard measure of space in residential properties
Year_Built (discrete): original construction date
Bldg_Type (nominal): type of dwelling

I set up my regression modeling using tidymodels, including:

Data splitting
Recipe for preprocessing
Model specification
Workflow combining recipe + model

I evaluate the workflow with v-fold cross-validation in the tidymodels framework. By passing strata = Sale_Price to both initial_split() and vfold_cv(), I use stratified sampling, which helps keep the distribution of the outcome (Sale_Price) similar across splits. Because Sale_Price is numeric, rsample stratifies on quantile-based bins of it (quartiles by default).
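To make the idea concrete for a numeric outcome, here is a minimal base-R sketch of a stratified 80/20 split: the outcome is cut into quartile bins and 80% of each bin is sampled. It approximates what rsample does by default but omits details such as pooling of very small bins; `stratified_split()` is a hypothetical name, not an rsample function:

```r
# Hypothetical helper: stratified split on a numeric outcome `y`.
# Assumption: mirrors rsample's default of binning a numeric stratification
# variable into quartiles; rsample's pooling of small bins is omitted.
stratified_split <- function(y, prop = 0.8, breaks = 4) {
  # Quantile-based bin edges (quartiles when breaks = 4)
  cuts   <- quantile(y, probs = seq(0, 1, length.out = breaks + 1), na.rm = TRUE)
  strata <- cut(y, breaks = unique(cuts), include.lowest = TRUE)
  # Row indices grouped by bin
  by_bin <- split(seq_along(y), strata)
  # Sample the same proportion of rows within every bin
  train <- lapply(by_bin, function(i) i[sample.int(length(i), floor(prop * length(i)))])
  sort(unlist(train, use.names = FALSE))
}

set.seed(502)
y  <- rexp(1000)            # a skewed, simulated outcome
tr <- stratified_split(y)   # training-set row indices
summary(y[tr])
summary(y[-tr])
```

The two summaries should show similar quartiles, which is the property stratification is meant to protect.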

The recipe specifies, through step_*() functions, the preprocessing applied to the data before modeling. Creating it does not execute anything; it is only a specification, and the steps are estimated and applied when the recipe is used, for example inside fit_resamples().


set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price) 
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

set.seed(1004)
ames_folds <- vfold_cv(ames_train, v = 10, strata = Sale_Price) 

ames_rec <- 
  recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type,
         data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>% 
  step_dummy(all_nominal_predictors())
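For intuition about what the two steps produce once the recipe is applied, here is a base-R analogue on a toy data frame. The values and the object names `toy` and `baked` are hypothetical, and `model.matrix()` stands in for `step_dummy()`, which by default also drops the first factor level as the reference:

```r
# Toy data with hypothetical values, mimicking two of the Ames predictors
toy <- data.frame(
  Gr_Liv_Area = c(1000, 1500, 2500),
  Bldg_Type   = factor(c("OneFam", "Duplex", "OneFam"))
)

# Analogue of step_log(Gr_Liv_Area, base = 10)
baked <- data.frame(Gr_Liv_Area = log10(toy$Gr_Liv_Area))

# Analogue of step_dummy(): one 0/1 column per level except the first
# (reference) level "Duplex". model.matrix() names the column
# "Bldg_TypeOneFam", whereas step_dummy() would use "Bldg_Type_OneFam".
dummies <- model.matrix(~ Bldg_Type, data = toy)[, -1, drop = FALSE]
baked <- cbind(baked, dummies)
baked
```

In the recipe itself, these operations are prepared on each analysis set inside fit_resamples(), so the assessment sets do not influence the preprocessing.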

I evaluate a random forest model on the training set with cross-validation, using the ranger engine, which relies on the ranger package for fast computation. Random forests are very powerful: they can learn complex patterns in the data with high accuracy. One big advantage is that they need little preprocessing, so they are easy to use. The tradeoff is that they can be slow to train, especially on large datasets.


rf_model <- rand_forest(trees = 1000) %>% 
    set_engine("ranger") %>% 
    set_mode("regression")

rf_wflow <- workflow() %>% 
    add_recipe(ames_rec) %>% 
    add_model(rf_model)
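Resampling is not the only place to parallelize. parsnip runs the ranger engine single-threaded by default (`num.threads = 1`), but ranger can grow trees on several threads itself. A sketch of that alternative, assuming resampling then runs sequentially so the total stays at 4 threads; combining it with 4 resampling workers would start 16 threads on an 8-core chip:

```r
library(parsnip)

# Parallelism inside each model instead of across folds: ranger grows the
# 1000 trees on 4 threads. Assumes fit_resamples() then runs sequentially,
# so the total stays at 4 threads, matching the number of P-cores.
rf_threaded <- rand_forest(trees = 1000) %>%
  set_engine("ranger", num.threads = 4) %>%
  set_mode("regression")

rf_threaded
```

Which level of parallelism is faster depends on the data and model size; timing both, as done in this post, is the reliable way to choose.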

The control_resamples() function customizes what is saved during cross-validation. save_pred = TRUE keeps the out-of-fold predictions from each fold, so we can analyze or plot them later; save_workflow = TRUE attaches the workflow to the results object, which is helpful if we want to inspect the preprocessing steps or model specification afterward.


keep_pred <- control_resamples(save_pred = TRUE, save_workflow = TRUE)

Sequential approach


set.seed(1003) 

start_time <- Sys.time() 

rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 

end_time <- Sys.time() 

metrics <- collect_metrics(rf_res) 
performance <- list()
performance[[1]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

# Elapsed wall-clock time, always expressed in seconds
duration <- numeric()
duration[1] <- as.numeric(difftime(end_time, start_time, units = "secs"))
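The `Sys.time()` bookkeeping can also be wrapped once and reused. A minimal sketch using base R's `system.time()`, whose `elapsed` component is wall-clock time; that is the relevant number for parallel runs, since the CPU times of the main R process do not include all the work done by separate worker processes. `timed()` is a hypothetical helper:

```r
# Hypothetical helper: evaluate an expression, returning its value together
# with the elapsed wall-clock time in seconds.
timed <- function(expr) {
  value <- NULL
  # Lazy evaluation: `expr` is evaluated inside system.time(), and the
  # assignment stores the result in this function's `value`
  elapsed <- system.time(value <- expr)[["elapsed"]]
  list(value = value, seconds = unname(elapsed))
}

res <- timed({ Sys.sleep(0.2); 42 })
res$seconds
```

For example, `res <- timed(fit_resamples(rf_wflow, resamples = ames_folds, control = keep_pred))` would leave the resampling results in `res$value` and the time in `res$seconds`.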

Approach using doParallel


library(doParallel) 

set.seed(1003) 

cl <- makeCluster(4) 
registerDoParallel(cl) 

start_time <- Sys.time() 
rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 
end_time <- Sys.time()

stopCluster(cl)
registerDoSEQ()  # drop the stopped cluster so foreach falls back to sequential

metrics <- collect_metrics(rf_res)  
performance[[2]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

duration[2] <- as.numeric(difftime(end_time, start_time, units = "secs"))
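Because each backend must be reset even if fitting fails, the activate/run/deactivate pattern can be wrapped with `on.exit()`. A minimal sketch in base R; `with_backend()` is a hypothetical helper, and R's lazy evaluation ensures that `code` runs only after `setup()`:

```r
# Hypothetical helper: activate a backend, evaluate `code`, and always
# deactivate the backend afterwards, even if `code` throws an error.
with_backend <- function(setup, teardown, code) {
  setup()
  on.exit(teardown(), add = TRUE)
  code  # lazy evaluation: `code` is evaluated only here, after setup()
}

# Tiny demonstration with placeholder setup/teardown steps
with_backend(
  setup    = function() message("backend on"),
  teardown = function() message("backend off"),
  code     = sum(1:10)
)  # returns 55 after printing both messages
```

For the doParallel case, `setup` would call `registerDoParallel(cl)` and `teardown` would call `stopCluster(cl)` followed by `registerDoSEQ()`.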

Approach using doMC


set.seed(1003) 

library(doMC) 
registerDoMC(cores = 4) 

start_time <- Sys.time() 

rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 

end_time <- Sys.time()

registerDoSEQ()

metrics <- collect_metrics(rf_res)  
performance[[3]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

duration[3] <- as.numeric(difftime(end_time, start_time, units = "secs"))

Approach using furrr + future


library(furrr) 
library(future) 

set.seed(1003) 

plan(multisession, workers = 4) 

start_time <- Sys.time() 

rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 

end_time <- Sys.time()

plan(sequential)

metrics <- collect_metrics(rf_res)  
performance[[4]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

duration[4] <- as.numeric(difftime(end_time, start_time, units = "secs"))
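In the block above, furrr's own functions are never called directly. furrr becomes useful in your own code, where `future_map()` and its typed variants are parallel drop-in replacements for purrr's `map()` family. A small sketch with 2 workers:

```r
library(future)
library(furrr)

plan(multisession, workers = 2)         # 2 background R sessions
squares <- future_map_dbl(1:4, ~ .x^2)  # like purrr::map_dbl(), but parallel
plan(sequential)                        # reset to sequential evaluation

squares
```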

Comparative analysis

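Wall-clock time (in seconds) for 10-fold cross-validation of the 1000-tree random forest is shown below for each method, from a single run of each block on my M2 MacBook Air.

Timing results
Method	Time (s)
doMC	6.06
doParallel	7.94
furrr + future	10.26
sequential	13.34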
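To put the timings in relative terms, the speedup of each backend is the sequential time divided by its own time. A short sketch using the numbers from the table:

```r
# Timings (seconds) from the table above
timings <- c(sequential = 13.34, doParallel = 7.94, doMC = 6.06,
             `furrr + future` = 10.26)

# Speedup relative to the sequential run
speedup <- round(timings[["sequential"]] / timings, 2)
speedup
```

Even the fastest backend reaches about 2.2 times the sequential speed with 4 workers, below the ideal factor of 4; the gap plausibly reflects overheads such as starting workers and moving data to them.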

Recommended core use on M1/M2 Macs

Use 4 workers, matching the number of performance cores (macOS, not R, decides which cores actually run them). This choice offers a good balance between performance and system responsiveness.
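To keep the same script usable on machines with fewer cores, the worker count can be capped at 4. A minimal sketch; `n_workers` is an illustrative name, and `na.rm = TRUE` covers the case where `detectCores()` cannot determine the count and returns `NA`:

```r
# Use at most 4 workers, and never more than the cores detected
n_workers <- min(4L, parallel::detectCores(logical = FALSE), na.rm = TRUE)
n_workers
```

The result can then be passed as, for example, `plan(multisession, workers = n_workers)` or `makeCluster(n_workers)`.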

Summary

Use doMC for lightweight, fast, Mac-only workflows.
Use doParallel if you need Windows compatibility or more control.
Use furrr + future for a modern, tidyverse-friendly setup and future scalability.
