Unleashing the power of Apple Silicon for R: Parallel processing on M1/M2

machine learning
Parallel computation is a big deal in machine learning (ML), especially when working with large datasets, complex models (such as deep neural networks), or computationally intensive tasks like hyperparameter tuning. At its core, parallel computation means executing multiple calculations or processes simultaneously. In ML, this can significantly speed up training and inference by leveraging multiple processing units, such as CPU cores, GPUs, TPUs, or computing clusters. In this post, I’ll review some of the most popular methods available in R for running computations in parallel.
Author

Angelo Maria Sabatini

Published

April 22, 2025

Parallel computation is a big deal in machine learning (ML), especially when working with large datasets, complex models (like deep neural networks), or intensive tasks like hyperparameter tuning. Parallel computation refers to executing multiple calculations or processes simultaneously.

In ML, this helps speed up training and inference by utilizing multiple processing units (CPU cores, GPUs, TPUs, clusters). For my (small) ML projects, I run everything on a single machine, an M2 MacBook Air, and write my code in R. Calculations can be excruciatingly slow; however, many tasks are embarrassingly parallel. For example, the models fit during resampling are independent of each other and can be fit simultaneously. Right now I’m running everything sequentially, but I want to switch to parallel processing to speed things up. In this post, I review popular methods available in R for going parallel on MacBooks equipped with M1 and M2 chips.

Apple M1 and M2 Macs: A quick overview

Apple’s M1 and M2 chips are custom-designed ARM-based processors that mark Apple’s move away from Intel. These chips power Macs and iPads, offering significant boosts in performance and efficiency.

M1 Chip (2020):

  • First Apple Silicon chip.
  • 5nm process, 8-core CPU (4 performance + 4 efficiency), up to 8-core GPU.
  • Unified memory architecture (up to 16 GB).
  • High performance per watt; fanless in MacBook Air.
  • Found in MacBook Air, MacBook Pro 13”, Mac Mini, and iMac 24”.

M2 Chip (2022):

  • Second-generation chip.
  • Improved 5nm design, up to 18% faster CPU and 35% faster GPU than M1.
  • Supports up to 24 GB of unified memory.
  • Enhanced media engine and ProRes acceleration.
  • Used in updated MacBook Air, MacBook Pro 13”, and newer Macs.

For computations run on a single computer, the number of available cores, and hence the number of worker processes worth starting, can be queried with the parallel package:

# The number of physical cores in the hardware:
parallel::detectCores(logical = FALSE)

# The number of logical cores, i.e. the number of independent
# processes that can be used simultaneously:
parallel::detectCores(logical = TRUE)

My MacBook Air has 8 cores: 4 performance (P) cores and 4 efficiency (E) cores.

When training models, especially with resampling (such as cross-validation or bootstrapping), I want consistency and high throughput, which is best achieved on the performance cores. Using all 8 cores (4P + 4E) can lead to inconsistent performance, because the efficiency cores are slower and not suited to heavy computational tasks. So, in the examples that follow, I use 4 workers, matching the number of performance cores.
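The worker count can also be derived rather than hard-coded. The sketch below is one possible way to do it, not an established recipe: count_performance_cores() is a hypothetical helper, and it assumes that macOS on Apple Silicon reports the number of performance cores under the sysctl key hw.perflevel0.physicalcpu. On any other system, or if the query fails, it falls back to half of the detected cores.

```r
# Hypothetical helper: number of performance cores, with a fallback.
# Assumption: the sysctl key hw.perflevel0.physicalcpu is available
# (recent macOS on Apple Silicon).
count_performance_cores <- function() {
  n_all <- parallel::detectCores(logical = FALSE)
  if (is.na(n_all)) n_all <- 2L
  fallback <- max(1L, as.integer(n_all) %/% 2L)

  if (Sys.info()[["sysname"]] != "Darwin") return(fallback)

  out <- tryCatch(
    suppressWarnings(
      system2("sysctl", c("-n", "hw.perflevel0.physicalcpu"),
              stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  n <- suppressWarnings(as.integer(out[1]))
  if (length(n) == 1L && !is.na(n) && n >= 1L) n else fallback
}

n_workers <- count_performance_cores()
n_workers
```

Note that matching the worker count to the number of performance cores does not pin the workers to those cores; macOS still decides where each process runs.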

Overview

R supports three popular parallel processing methods that work well on M1/M2 MacBooks. Programming hints on how to set up/activate and deactivate/reset each of the three methods are provided in the table below.

Method | Setup / Activate | Deactivate / Reset
doParallel | library(doParallel); cl <- makeCluster(4); registerDoParallel(cl) | stopCluster(cl)
doMC | library(doMC); registerDoMC(cores = 4) | registerDoSEQ()
furrr + future | library(furrr); library(future); plan(multisession, workers = 4) | plan(sequential)
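Whatever the backend, the pattern is the same: start workers, do the work, release the workers. A minimal sketch of that pattern, using only the base parallel package (the same kind of cluster that doParallel registers), might look as follows. run_with_cluster() is a hypothetical helper introduced here for illustration; on.exit() makes sure the cluster is stopped even if the work fails.

```r
library(parallel)

# Hypothetical helper: start a cluster, run `fun` on it, always stop it.
run_with_cluster <- function(n_workers, fun) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # teardown runs even on error
  fun(cl)
}

# Usage: square four numbers on two workers
squares <- run_with_cluster(2, function(cl) {
  parLapply(cl, 1:4, function(i) i^2)
})
unlist(squares)
```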

The table below summarizes the key pros and cons of each method.

Method | Pros | Cons
doParallel | ✅ Cross-platform (Mac/Linux/Windows); ✅ Works with foreach, caret, tidymodels | ❌ Verbose setup; ❌ Full R sessions increase memory
doMC | ✅ Simple & fast on macOS; ✅ Lightweight (uses forked processes) | ❌ Not Windows-compatible; ❌ Not safe for GUI/Shiny apps
furrr + future | ✅ Clean tidyverse integration; ✅ Works with future_map(); ✅ Scales to cloud | ❌ More overhead; ❌ Full R sessions
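The difference between "forked processes" and "full R sessions" can be seen with the base parallel package alone. The sketch below is an illustration under my own naming (big_offset is a made-up variable): forked workers, the mechanism doMC relies on, inherit the parent's workspace, while workers in a socket cluster are fresh R sessions to which objects must be shipped explicitly. Forking is not available on Windows, so the first call assumes macOS or Linux.

```r
library(parallel)

big_offset <- 100  # an object living in the parent session

# Forked workers see the parent's objects without any copying step
forked <- mclapply(1:3, function(i) i + big_offset, mc.cores = 2)

# Socket-cluster workers start empty: export what they need
cl <- makeCluster(2)
clusterExport(cl, "big_offset", envir = environment())
psock <- parLapply(cl, 1:3, function(i) i + big_offset)
stopCluster(cl)

identical(unlist(forked), unlist(psock))
```

This is also why the socket-based methods cost more memory and startup time: every worker loads its own copy of the packages and data it needs.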

Example 1 (The Ames housing dataset)

The Ames housing dataset is a well-known real estate dataset used for regression modeling, often as an alternative to the Boston Housing dataset. It contains detailed information about residential homes in Ames, Iowa, sold between 2006 and 2010.

Here’s a clean and minimal setup to load the essential libraries for working with the Ames Housing dataset using a tidymodels regression workflow.

Code
# Data handling and wrangling
library(tidyverse)

# For the Ames Housing dataset
library(modeldata)

# Modeling and resampling
library(tidymodels)
tidymodels_prefer()

The dataset can be loaded easily in R using the modeldata package.

Code
data(ames, package = "modeldata")
ames <- ames %>% mutate(Sale_Price = log10(Sale_Price))

Suppose a regression model is to be fit to the log10-transformed sale prices (Sale_Price). In this post, we will focus on a small subset of the predictors available in the Ames housing dataset:

  • The neighborhood (Neighborhood, qualitative): physical location within Ames city limits
  • The gross above-grade living area (Gr_Liv_Area, continuous): the standard measure of the amount of space in a residential property
  • The year built (Year_Built, discrete): original construction date
  • The type of building (Bldg_Type, nominal): type of dwelling

I set up my regression modeling using tidymodels, including:

  • Data splitting
  • Recipe for preprocessing
  • Model specification
  • Workflow combining recipe + model

I evaluate the workflow with v-fold cross-validation in the tidymodels framework. By passing strata = Sale_Price to both initial_split() and vfold_cv(), I use stratified sampling, which helps keep the distribution of the response variable (Sale_Price) balanced across splits.

The recipe defines the preprocessing steps applied to the dataset before modeling, via step_*() functions without immediately executing them; it is only a specification of what should be done.

Code
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price) 
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

set.seed(1004)
ames_folds <- vfold_cv(ames_train, v = 10, strata = Sale_Price) 

ames_rec <- 
  recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type,
         data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>% 
  step_dummy(all_nominal_predictors())
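Because a recipe is only a specification, nothing is computed until it is prepped and baked. The sketch below shows this on a tiny made-up data frame (toy, with invented column names price, area and type) rather than on the Ames data, applying the same two steps as above.

```r
library(recipes)

# Made-up data, for illustration only
toy <- data.frame(
  price = c(5.1, 5.3, 5.0, 5.4),
  area  = c(1000, 1500, 900, 2000),
  type  = factor(c("A", "B", "A", "C"))
)

# Specification only: no data are transformed here
toy_rec <- recipe(price ~ area + type, data = toy) %>%
  step_log(area, base = 10) %>%
  step_dummy(all_nominal_predictors())

# prep() estimates what the steps need; bake() applies them
toy_prepped <- prep(toy_rec, training = toy)
toy_baked   <- bake(toy_prepped, new_data = NULL)
toy_baked
```

When the recipe is part of a workflow, as in this post, these two calls are made for you inside each resample.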

I fit a random forest model to the training set using the ranger engine, which uses the ranger package for fast computation. Random forests are very powerful — they can learn complex patterns in the data with high accuracy. One big advantage is that they don’t need much preprocessing, so they’re easy to use. The tradeoff is that they can be slower to train, especially on large datasets.

Code
rf_model <- rand_forest(trees = 1000) %>% 
    set_engine("ranger") %>% 
    set_mode("regression")

rf_wflow <- workflow() %>% 
    add_recipe(ames_rec) %>% 
    add_model(rf_model)

The control_resamples() function is used to customize what is saved during cross-validation. save_pred = TRUE saves the out-of-sample predictions from each fold, so we can analyze or plot them later; save_workflow = TRUE attaches the workflow to the results object, which is helpful if we want to inspect the model or preprocessing steps afterward.

Code
keep_pred <- control_resamples(save_pred = TRUE, save_workflow = TRUE)
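To see what these two options give back, here is a small self-contained sketch. It does not use the Ames workflow: it assumes a plain linear regression on the built-in mtcars data, chosen only because it runs quickly.

```r
library(tidymodels)

set.seed(1)
toy_folds <- vfold_cv(mtcars, v = 5)
toy_ctrl  <- control_resamples(save_pred = TRUE, save_workflow = TRUE)

toy_res <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg()) %>%
  fit_resamples(resamples = toy_folds, control = toy_ctrl)

# save_pred = TRUE: one out-of-sample prediction per row of the data
toy_preds <- collect_predictions(toy_res)

# save_workflow = TRUE: the workflow can be pulled back out of the results
toy_wflow <- extract_workflow(toy_res)
```

Each row of mtcars lands in exactly one assessment set, so toy_preds has as many rows as the data.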

Sequential approach

Code
set.seed(1003) 

start_time <- Sys.time() 

rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 

end_time <- Sys.time() 

# Cross-validated RMSE and R-squared (means over the 10 folds)
metrics <- collect_metrics(rf_res) 
performance <- list()
performance[[1]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

# Elapsed time, with the units fixed to seconds
duration <- c()
duration[1] <- as.numeric(difftime(end_time, start_time, units = "secs"))
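The start/stop timing pattern is repeated for every method in this post, so it could be wrapped in a small helper. The function time_secs() below is a hypothetical convenience of my own, not part of any package; it relies on R's lazy evaluation, so the expression is only run once the clock has started.

```r
# Hypothetical helper: evaluate an expression and report elapsed seconds
time_secs <- function(expr) {
  start <- Sys.time()
  value <- force(expr)  # the expression is evaluated here, not earlier
  seconds <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  list(value = value, seconds = seconds)
}

# Usage with a stand-in for a slow computation
out <- time_secs({
  Sys.sleep(0.2)
  42
})
out$value
out$seconds
```

Fixing the units matters because the difference of two Sys.time() values picks its own units, and a run longer than a minute would otherwise be reported in minutes.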

Approach using doParallel

Code
library(doParallel) 

set.seed(1003) 

cl <- makeCluster(4) 
registerDoParallel(cl) 

start_time <- Sys.time() 
rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 
end_time <- Sys.time()

stopCluster(cl)

# Cross-validated RMSE and R-squared (means over the 10 folds)
metrics <- collect_metrics(rf_res)  
performance[[2]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

# Elapsed time, with the units fixed to seconds
duration[2] <- as.numeric(difftime(end_time, start_time, units = "secs"))
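Outside of tidymodels, a registered doParallel backend is used through foreach loops. The sketch below shows the bare mechanism on a trivial task; as far as I understand, it is the same mechanism a foreach-aware function relies on once a backend has been registered. Two workers are used here only to keep the example light.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# Iterations are independent, so %dopar% can send them to different workers
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)
registerDoSEQ()  # point foreach back to sequential execution

squares
```

Calling registerDoSEQ() after stopCluster() avoids leaving foreach registered to a cluster that no longer exists.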

Approach using doMC

Code
set.seed(1003) 

library(doMC) 
registerDoMC(cores = 4) 

start_time <- Sys.time() 

rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 

end_time <- Sys.time()

registerDoSEQ()

# Cross-validated RMSE and R-squared (means over the 10 folds)
metrics <- collect_metrics(rf_res)  
performance[[3]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

# Elapsed time, with the units fixed to seconds
duration[3] <- as.numeric(difftime(end_time, start_time, units = "secs"))

Approach using furrr + future

Code
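library(furrr) 
library(future) 

set.seed(1003) 

# fit_resamples() is not a furrr function; recent versions of tune
# pick up whichever future plan is active
plan(multisession, workers = 4) 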

start_time <- Sys.time() 

rf_res <- rf_wflow %>% 
    fit_resamples(resamples = ames_folds, control = keep_pred) 

end_time <- Sys.time()

plan(sequential)

# Cross-validated RMSE and R-squared (means over the 10 folds)
metrics <- collect_metrics(rf_res)  
performance[[4]] <- metrics %>%
    filter(.metric %in% c("rmse", "rsq")) %>%
    pull(mean)

# Elapsed time, with the units fixed to seconds
duration[4] <- as.numeric(difftime(end_time, start_time, units = "secs"))
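In the code above, the future plan does the work and furrr is never called directly. furrr comes into its own when you map your own function over a list of independent tasks. A minimal sketch, assuming a toy bootstrap of the mean of mtcars$mpg (my own example, unrelated to the Ames model):

```r
library(future)
library(furrr)

plan(multisession, workers = 2)

# Eight independent bootstrap replicates, spread over the workers.
# furrr_options(seed = ...) gives reproducible parallel random numbers.
boot_means <- future_map_dbl(
  1:8,
  function(i) mean(sample(mtcars$mpg, replace = TRUE)),
  .options = furrr_options(seed = 123)
)

plan(sequential)

boot_means
```

The call reads like purrr::map_dbl(); only the plan() lines decide whether it runs in parallel.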

Comparative analysis

Training time (in seconds) for each method is shown below.

Timing results

Method | Time, s
doMC | 6.06
doParallel | 7.94
furrr + future | 10.26
sequential | 13.34
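These timings can be turned into speedup (sequential time divided by parallel time) and parallel efficiency (speedup divided by the number of workers, 4 here). The snippet below only does arithmetic on the numbers reported above; they come from a single run each, so the results are indicative rather than a benchmark.

```r
sequential_s <- 13.34
parallel_s   <- c(doMC = 6.06, doParallel = 7.94, "furrr + future" = 10.26)
n_workers    <- 4

speedup    <- sequential_s / parallel_s
efficiency <- speedup / n_workers

round(data.frame(time_s = parallel_s, speedup, efficiency), 2)
```

With 4 workers the ideal speedup would be 4; doMC reaches roughly 2.2, doParallel about 1.7 and furrr + future about 1.3. A plausible reading is that the fixed costs of starting workers and moving data take up a large share of such a short job.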

Summary

  • Use doMC for lightweight, fast, Mac-only workflows.
  • Use doParallel if you need Windows compatibility or more control.
  • Use furrr + future for a modern, tidyverse-friendly setup and future scalability.

Parallel icons created by juicy_fish - Flaticon