A brief note on visualizing confusion matrices

I sometimes find myself creating and inspecting confusion matrices. A confusion matrix simply tallies predicted class (along rows, say) against actual class (along columns) for the observations in a test set. Such matrices arise, for instance, when evaluating a multiclass classifier. A diagonal matrix (all off-diagonal entries zero) is a visually appealing way to represent a perfect prediction, that is, a 1:1 correspondence between predicted and actual classes.
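To make the layout concrete, here is a minimal sketch using base R's table function. The labels are made up for illustration and do not come from any real model.

```r
# Hypothetical toy labels, invented for illustration only
actual    <- factor(c("a", "a", "b", "b", "c", "c"))
predicted <- factor(c("a", "b", "b", "b", "c", "a"))

# Rows are predicted classes, columns are actual classes
cm <- table(predicted, actual)
print(cm)

# Correct predictions sit on the diagonal; everything else is a miss
n_correct <- sum(diag(cm))
n_missed  <- sum(cm) - n_correct
```

Here two of the six observations land off the diagonal: one "a" predicted as "b", and one "c" predicted as "a".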

Especially in situations with lots of classes, a confusion matrix whose largest values in each row sit on or near the diagonal makes it easy to see at a glance how much the matrix deviates from a diagonal matrix, that is, how far off it is from a perfect prediction. This is particularly true for heat map-like visualizations of confusion matrices.

R’s widely used and wonderful caret package (https://cran.r-project.org/package=caret) includes a confusionMatrix function. Despite the nice supplementary statistics it provides, it doesn’t permute the confusion matrix in a nice way. (It also requires that the prediction and actual data be factors with the same levels, which I find a bit irritating.)

This note defines a simple function that uses a version of Kuhn’s famous Hungarian algorithm (https://en.wikipedia.org/wiki/Hungarian_algorithm), provided by Kurt Hornik’s excellent clue package (https://cran.r-project.org/package=clue), to solve for a row permutation that makes nice confusion matrices.

For simplicity, I’ll illustrate the idea with the tiny built-in iris dataset, which has only three classes. But this method is geared toward visualizing larger confusion matrices with lots of classes.

Here is the permutation function (inspired by the R mailing list reference described in the function documentation below):

require(clue)

## Loading required package: clue

#' Find a (row) permutation of A that minimizes `norm(A[p,]-B, 'F')`.
#' @param A a matrix
#' @param B (optional) defaults to a diagonal matrix conformable with A
#' @return A permutation vector for the rows of A.
#' @note
#' If A is not square some attempt is made to produce a reasonable permutation
#' (see code). Uses the linear-sum assignment problem (LSAP) solver in the
#' "clue" package. This routine is useful for permuting confusion matrices.
#' See  http://r.789695.n4.nabble.com/reordering-of-matrix-rows-to-maximize-the-sum-of-the-diagonal-tt2062867.html#a2065679
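perm_min <- function(A, B) {
  N <- nrow(A)  # original row count, used to drop padded rows at the end
  # Pad with zero columns or zero rows so that A is square
  if (ncol(A) < nrow(A)) {
    A <- cbind(A, matrix(0, nrow(A), nrow(A) - ncol(A)))
  }
  if (nrow(A) < ncol(A)) {
    A <- rbind(A, matrix(0, ncol(A) - nrow(A), ncol(A)))
  }
  if (missing(B)) B <- diag(1, nrow(A))
  n <- nrow(A)
  # D[j, i] is the squared distance between row j of B and row i of A
  D <- matrix(NA, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      D[j, i] <- sum((B[j, ] - A[i, ])^2)
    }
  }
  # Assign one row of A to each row position of B at minimum total cost
  i <- c(solve_LSAP(D))
  i[i <= N]  # discard indices that point at padded rows
}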

And here is a very simple unsupervised clustering/classification example. It projects the first four columns of iris onto their first two principal components, and then clusters the projected points with kmeans to predict classes. For comparison, the actual classes are in the fifth column of the iris data frame, called ‘Species’.

set.seed(1)
projection <- svd(scale(iris[,1:4], center=TRUE))$u[, 1:2]   # 2-d PCA
prediction <- kmeans(projection, centers=3)[["cluster"]]

# Now compare kmeans clusters with true classes:
(confusion <- as.matrix(table(prediction, iris[["Species"]])))

##           
## prediction setosa versicolor virginica
##          1      1         37        14
##          2      0         13        36
##          3     49          0         0

Note that the largest entries of the confusion matrix are not on the diagonal :( So let’s permute the rows to try to move them there…

i <- perm_min(confusion)
confusion[i,]

##           
## prediction setosa versicolor virginica
##          3     49          0         0
##          1      1         37        14
##          2      0         13        36

Much nicer! See how the larger elements of each row/column are permuted to near the diagonal?
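As a sanity check, a matrix this small can simply be brute-forced over all 3! = 6 row orderings. The sketch below is base R only. The confusion matrix is typed in by hand from the output above, and all_perms is a hypothetical helper written for this note, not part of clue or any other package. It also illustrates that, for a square matrix and the default identity B, minimizing the Frobenius distance is the same as maximizing the sum of the diagonal.

```r
# Confusion matrix copied by hand from the kmeans output above
A <- matrix(c( 1, 37, 14,
               0, 13, 36,
              49,  0,  0), nrow = 3, byrow = TRUE)

# Hypothetical helper: every ordering of v (only sensible for tiny inputs)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    for (p in all_perms(v[-k])) out[[length(out) + 1]] <- c(v[k], p)
  }
  out
}

perms <- all_perms(1:3)
# Frobenius distance to the identity, and diagonal sum, for each row ordering
frob  <- sapply(perms, function(p) norm(A[p, ] - diag(3), "F"))
trace <- sapply(perms, function(p) sum(diag(A[p, ])))

best <- perms[[which.min(frob)]]
print(best)                               # the ordering c(3, 1, 2)
print(which.min(frob) == which.max(trace))
```

The brute-force winner agrees with the ordering found by perm_min. Exhaustive search is hopeless beyond a handful of classes, of course, which is where the LSAP solver earns its keep.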

Caret’s confusionMatrix function does not nicely permute the matrix and, somewhat irritatingly, requires that the prediction and actual values be factors with the same levels…

library(caret)

## Loading required package: ggplot2

## Loading required package: lattice

tryCatch(
  confusionMatrix(prediction, iris[["Species"]]),
  error = print
)

## <simpleError: `data` and `reference` should be factors with the same levels.>

arrrgh…

confusionMatrix(factor(prediction), factor(as.integer(iris[["Species"]])))

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  1  2  3
##          1  1 37 14
##          2  0 13 36
##          3 49  0  0
## 
## Overall Statistics
##                                          
##                Accuracy : 0.0933         
##                  95% CI : (0.052, 0.1516)
##     No Information Rate : 0.3333         
##     P-Value [Acc > NIR] : 1              
##                                          
##                   Kappa : -0.36          
##                                          
##  Mcnemar's Test P-Value : <2e-16         
## 
## Statistics by Class:
## 
##                      Class: 1 Class: 2 Class: 3
## Sensitivity          0.020000  0.26000   0.0000
## Specificity          0.490000  0.64000   0.5100
## Pos Pred Value       0.019231  0.26531   0.0000
## Neg Pred Value       0.500000  0.63366   0.5050
## Prevalence           0.333333  0.33333   0.3333
## Detection Rate       0.006667  0.08667   0.0000
## Detection Prevalence 0.346667  0.32667   0.3267
## Balanced Accuracy    0.255000  0.45000   0.2550
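The dismal 9% accuracy above says more about the arbitrary numbering of kmeans clusters than about the clustering itself, since cluster 1 is compared with level 1 and so on. Here is a rough base R sketch of the accuracy before and after reordering the rows. The matrix is typed in by hand from the output above, the ordering c(3, 1, 2) is the one perm_min returned earlier, and accuracy is a small helper defined here for illustration.

```r
# Confusion matrix copied by hand from the output above
confusion <- matrix(c( 1, 37, 14,
                       0, 13, 36,
                      49,  0,  0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(prediction = c("1", "2", "3"),
                                    actual = c("setosa", "versicolor", "virginica")))

# Illustrative helper: fraction of observations on the diagonal
accuracy <- function(m) sum(diag(m)) / sum(m)

acc_raw  <- accuracy(confusion)                 # 14 / 150
acc_perm <- accuracy(confusion[c(3, 1, 2), ])   # 122 / 150
print(round(c(raw = acc_raw, permuted = acc_perm), 4))
```

The permuted figure is the accuracy under the best one-to-one matching of clusters to species, which is usually the number one wants when the cluster labels carry no meaning of their own.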

The code and associated files are on sr.ht:

https://git.sr.ht/~bwlewis/rgems