R package caret: confusionMatrix with a missing category

I am using the function confusionMatrix in the R package caret to calculate some statistics for some data I have. I have been putting my predictions as well as my actual values into the table function to get the table to be used in the confusionMatrix function as so:

table(predicted,actual)

However, there are multiple possible outcomes (e.g. A, B, C, D), and my predictions do not always represent all the possibilities (e.g. only A, B, D). The resulting output of the table function does not include the missing outcome and looks like this:

    A    B    C    D
A  n1   n2   n3   n4
B  n5   n6   n7   n8  
D  n9  n10  n11  n12
# Note how there is no corresponding row for `C`.

The confusionMatrix function can't handle the missing outcome and gives the error:

Error in !all.equal(nrow(data), ncol(data)) : invalid argument type

Is there a way I can use the table function differently to get the missing rows with zeros or use the confusionMatrix function differently so it will view missing outcomes as zero?

As a note: Since I am randomly selecting my data to test with, there are times that a category is also not represented in the actual result as opposed to just the predicted. I don't believe this will change the solution.

3 solutions

#1

You can use union to give both factors the same levels:

library(caret)

# Sample Data
predicted = c(1,2,1,2,1,2,1,2,3,4,3,4,6,5) # Levels 1,2,3,4,5,6
reference = c(1,2,1,2,1,2,1,2,1,2,1,3,3,4) # Levels 1,2,3,4

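# Shared levels, sorted so rows and columns appear in order
u <- sort(union(predicted, reference))
# Named `tab` to avoid masking base R's t()
tab <- table(factor(predicted, u), factor(reference, u))
confusionMatrix(tab)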
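
To see why shared levels help, here is a minimal base R sketch that does not need caret. The class labels A to D come from the question; the data vectors are made up for illustration.

```r
# Hypothetical class labels, following the question's example
classes   <- c("A", "B", "C", "D")
predicted <- c("A", "B", "D", "A", "B", "D")   # "C" is never predicted
actual    <- c("A", "B", "C", "D", "B", "A")

# Without shared levels the table has 3 rows and 4 columns (not square)
print(dim(table(predicted, actual)))

# With shared levels every class gets a row and a column,
# and classes that never occur are filled with zeros
tab <- table(factor(predicted, levels = classes),
             factor(actual,    levels = classes))
print(tab)
```

The same idea covers a class that is missing from the actual values: it simply shows up as an all-zero column.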

#2

First note that confusionMatrix can be called as confusionMatrix(predicted, actual) in addition to being called with table objects. However, the function throws an error if predicted and actual (both regarded as factors) do not have the same number of levels.

This (and the fact that the caret package threw an error at me because it doesn't get its dependencies right in the first place) is why I'd suggest creating your own function:

# Create a confusion matrix from the given outcomes, whose rows correspond
# to the predicted and the columns to the actual classes.
# Assumes the classes are coded as positive integers 1, 2, ..., n.
createConfusionMatrix <- function(act, pred) {
  # You've mentioned that neither actual nor predicted may give a complete
  # picture of the available classes, hence:
  numClasses <- max(act, pred)
  # Split the predictions by actual class. Using a factor with all levels
  # keeps a column even for classes that never occur in `act`.
  groups <- split(pred, factor(act, levels = seq_len(numClasses)))
  # Count the predictions per class within each group (one column per group).
  sapply(groups, tabulate, nbins = numClasses)
}

# Generate random data since you've not provided an actual example.
actual    <- sample(1:4, 1000, replace=TRUE)
predicted <- sample(c(1L,2L,4L), 1000, replace=TRUE)

print( createConfusionMatrix(actual, predicted) )

which will give you:

      1  2  3  4
[1,] 85 87 90 77
[2,] 78 78 79 95
[3,]  0  0  0  0
[4,] 89 77 82 83
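
If you would rather stay with caret, the two-vector call mentioned at the top of this answer should also work once both factors are given the same levels. This is only a sketch: the labels and data are made up, and the caret call is skipped when the package is not installed.

```r
# Hypothetical class labels and data, for illustration only
classes   <- c("A", "B", "C", "D")
predicted <- factor(c("A", "B", "D", "A", "B", "D"), levels = classes)
actual    <- factor(c("A", "B", "C", "D", "B", "A"), levels = classes)

# Both factors now carry all four levels, even though "C" is never predicted
stopifnot(identical(levels(predicted), levels(actual)))

# Only call caret if it is available
if (requireNamespace("caret", quietly = TRUE)) {
  cm <- caret::confusionMatrix(data = predicted, reference = actual)
  print(cm$table)
}
```

Some per-class statistics may come back as NA for a class that is never predicted, since they involve a division by zero.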

#3

I had the same problem and here is my solution:

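tab <- table(my_prediction, my_real_label)
if (nrow(tab) != ncol(tab)) {
  # Classes that occur in the real labels but were never predicted
  missings <- setdiff(colnames(tab), rownames(tab))
  # One all-zero row per missing class, labelled with that class
  missing_mat <- matrix(0L, nrow = length(missings), ncol = ncol(tab),
                        dimnames = list(missings, colnames(tab)))
  tab <- rbind(as.matrix(tab), missing_mat)
  # Reorder the rows to match the columns so each row keeps its own label
  tab <- as.table(tab[colnames(tab), , drop = FALSE])
}

my_conf <- confusionMatrix(tab)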

Cheers Cankut
