Parallel computing with R

I discovered the snowfall package some time ago, by chance, while trying to iteratively compute the area of intersecting polygons. Although I cleared all workspace variables at the end of each iteration, memory usage kept growing until R crashed. Since I could not solve the problem directly, I decided to work around it by finding a way to open a new “R session” (on a new core) at each iteration, execute a function (in my case, computing the area of the intersection between polygons), and then close the session. This is exactly what the snowfall package offers, along with many functions for running calculations in parallel.

We can for example use the function sfApply, which is a parallel version of apply, to find the ith largest value in each row of a matrix. First, we need to load the package and define the function imax returning the ith maximum value of a vector x.

library(snowfall)

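# Return the i-th largest value of the vector x
imax <- function(x, i) {
  if (i > length(x)) {
    # Fail with an error: print() would return the message string as the result
    stop("i must not exceed length(x)")
  } else {
    sort(x, decreasing = TRUE)[i]
  }
}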
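Before going parallel, it can help to check the function serially on a small input. The sketch below is only an illustration: the vector v is made up, and imax is restated in a compact form without the length check so that the snippet runs on its own.

```r
# Compact restatement of imax (no length check), for illustration only
imax <- function(x, i) sort(x, decreasing = TRUE)[i]

# Hypothetical test vector; sorted in decreasing order it reads 9, 7, 4, 1
v <- c(4, 9, 1, 7)

imax(v, 1)  # largest value: 9
imax(v, 3)  # third largest value: 4
```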

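We can now apply the function imax to each row of a random matrix M. Thanks to snowfall, the task can be executed in parallel on several CPUs: sfInit starts the worker processes, sfApply distributes the rows among them, and sfStop shuts them down.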

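# 100 x 100 matrix of uniform random numbers
M <- matrix(runif(10000), 100, 100)

# Start 3 workers, apply imax to each row (margin 1), then stop the workers
sfInit(parallel = TRUE, cpus = 3)
res <- sfApply(M, 1, imax, i = 3)
sfStop()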
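One way to gain confidence in the parallel result is to compare it with the serial apply. The following is a minimal sketch, assuming snowfall is installed and at least two cores are available. The names serial_res and parallel_res, the seed, and the compact restatement of imax are choices made for this illustration.

```r
library(snowfall)

# Compact restatement of imax (no length check), so the snippet is self-contained
imax <- function(x, i) sort(x, decreasing = TRUE)[i]

set.seed(1)  # arbitrary seed, only to make the matrix reproducible
M <- matrix(runif(10000), 100, 100)

# Serial reference: third largest value of each row
serial_res <- apply(M, 1, imax, i = 3)

# Same computation distributed over 2 worker processes
sfInit(parallel = TRUE, cpus = 2)
parallel_res <- sfApply(M, 1, imax, i = 3)
sfStop()

# Both approaches should yield the same 100 values
isTRUE(all.equal(serial_res, parallel_res))
```

For a matrix this small, the cost of starting the workers will likely outweigh any speed-up; the parallel version pays off when each call does substantial work.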