A rant

April, 2019

If you’re making claims about performance, please illustrate them with compelling examples.

FastR?

R is a wildly popular system for data exploration, analysis and visualization. It partially derives from and extends the S language developed at Bell Labs by Becker and Chambers.¹ Most users run the open-source implementation developed and maintained by the R foundation² (that I call GNU R below). But other implementations exist, importantly including the successor of the original S implementation now owned by TIBCO.³ Indeed, because R (and S and SPLUS before it) is backed by a well-defined and open language specification,⁴ many implementations of the language are available including renjin⁵ and fastr,⁶ and several others.

Despite primarily using GNU R from the R project, I really appreciate alternate implementations of the language. The alternatives expand availability of R across many settings (like running inside Java VMs and within other applications like databases, etc.) as well as pushing the frontiers of the language by adding new features, correcting bugs and other problems, and improving performance.

The FastR project⁷ is particularly impressive. That implementation runs on the Graal VM⁸, making R inter-operable with most JVM and LLVM languages⁹–this basically means that R can share data with code written in other languages without copying. The Graal VM also provides good performance generally, and does a pretty amazing job of maintaining low-level (C/Fortran/C++) R package compatibility. Fastr is worthy of serious consideration, especially if you primarily run JVM-based applications and want them to interact easily with R.

FastR is fast, but…

The FastR project is a technological tour de force (along with GraalVM and related components). Its name reflects perhaps its most advertised feature, performance. FastR can exhibit significantly better performance than GNU R for certain programs, and I do think that this is an important feature. But I’m not sure that the number of those cases is really very large.

Indeed GNU R is often criticized for bad performance and a lot of alternative languages like Julia, or C++ via the elegant Rcpp package, or compilers and VMs like FastR are advertised as superior to R for performance. This is a criticism that I find at least partially unfounded.

If you’re reading this, you’ve probably heard that all you need to do to get good performance with languages like R is to think in “vectorized” terms about your R programs. If you’ve haven’t read Patrick Burns’ “The R Inferno”¹⁰, then you should read that now!

Well, it’s true! At least in part. But the term “vectorization” could convey the impression that you somehow need to think differently or unusually about your program. More often, the so-called “vectorized” approach is the more natural and intuitive approach. Consider, for instance, matrix multiplication. Which approach seems more intuitive?

z = x %*% y

or,

m = nrow(x)
n = ncol(x)
p = ncol(y)
z = matrix(0, m, p)
for(i in 1:m) {
  for(j in 1:n) {
    for(k in 1:p) {
      z[i, k] = z[i, k] + x[i, j] * y[j, k]
    }
  }
}

The matrix multiplication operator syntax is not only more succinct and natural, it’s more abstract: a uniform syntax for efficient computation involving sparse, dense, and otherwise structured matrices.

In spite of this, I often see rather silly advertisements for performance using similar examples. The FastR documentation presents this example of “unparalleled performance”:¹¹

set.seed(1)
x = matrix(runif(1000000), 1000, 1000)

mutual_R = function(joint_dist) {
  joint_dist = joint_dist / sum(joint_dist)
  mutual_information = 0
  num_rows = nrow(joint_dist)
  num_cols = ncol(joint_dist)
  colsums = colSums(joint_dist)
  rowsums = rowSums(joint_dist)
  for(i in seq_along(1:num_rows)) {
    for(j in seq_along(1:num_cols)) {
      temp = log((joint_dist[i,j] / (colsums[j] * rowsums[i])))
      if(!is.finite(temp)){
        temp = 0
      }
      mutual_information = mutual_information + joint_dist[i,j] * temp
    }
  }
  mutual_information
}

For comparison, the FastR example presents a C++ version using Rcpp:

#include <RcppArmadillo.h>
#include <cmath>

//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;

// [[Rcpp::export]]
double mutual_cpp(arma::mat joint_dist){
  joint_dist = joint_dist / sum(sum(joint_dist));
  double mutual_information = 0;
  int num_rows = joint_dist.n_rows;
  int num_cols = joint_dist.n_cols;
  arma::mat colsums = sum(joint_dist, 0);
  arma::mat rowsums = sum(joint_dist, 1);
  for(int i = 0; i < num_rows; ++i){
    for(int j = 0; j <  num_cols; ++j){
      double temp = log((joint_dist(i, j) / (colsums[j] * rowsums[i])));
      if(!std::isfinite(temp)){
        temp = 0;
      }
      mutual_information += joint_dist(i, j) * temp;
    }
  }
  return mutual_information;
}

// [[Rcpp::export]]
List mutual_test(arma::mat joint_dist){
  return List::create(Named("sum") = sum(joint_dist));
}

Yikes!

Observe that this example simply divides the entries of the scaled joint_dist matrix by the outer product of two vectors, defined by the row sums and the column sums of the matrix (literally using the definition of outer product expressed as a for loop). With this observation, the above example can be equivalently written:

mutual_R2 = function(joint_dist) {
  a = joint_dist / sum(joint_dist)
  b = log(a / (rowSums(a) %o% colSums(a)))
  b[is.infinite(b)] = 0
  sum(a * b)
}

Not only is the outer-product code simpler and easier to follow but it’s pretty fast! Let’s compare these versions on my little chromebook laptop:

library(Rcpp)
sourceCpp("r_mutual.cpp")

print(system.time({m1 = mutual_R(x)}))
#   user  system elapsed 
#  1.420   0.008   1.426 
print(system.time({m2 = mutual_R2(x)}))
#   user  system elapsed 
#  0.140   0.000   0.143
print(system.time({m3 = mutual_cpp(x)}))
#   user  system elapsed 
#  0.120   0.004   0.125 

print(m1 - m2)
# [1] 8.049117e-16
print(m1 - m3)
# [1] -2.775558e-17

The outer product approach is about as fast as the Rcpp implementation, and as it turns out also about the same speed as mutual_R() run under FastR.

This is not a rant about FastR.

I don’t want this brief note to come off too harshly, but I see silly examples like the one above semi-often. FastR is truly an impressive achievement. There are certainly cases where either FastR or Rcpp not only come in handy, but are almost essential to get the best performance in R. But examples like the one above just don’t show their possible performance gains in the best light.

A rant

FastR?

FastR is fast, but…

This is not a rant about FastR.

References