I am using the function `confusionMatrix` in the R package caret to calculate some statistics for some data I have. I have been passing my predictions and my actual values to the `table` function to get the table that `confusionMatrix` expects, like so:
table(predicted,actual)
However, there are multiple possible outcomes (e.g. A, B, C, D), and my predictions do not always represent all the possibilities (e.g. only A, B, D). The resulting output of the `table` function does not include the missing outcome and looks like this:
    A   B   C   D
A  n1  n2  n3  n4
B  n5  n6  n7  n8
D  n9 n10 n11 n12
# Note how there is no corresponding row for `C`.
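A minimal reproduction with made-up labels (hypothetical values, not my real data) shows the missing row. It only uses base R, so it should run without caret:

```r
# Hypothetical labels: the class "C" never shows up among the predictions.
actual    <- c("A", "B", "C", "D", "A", "C")
predicted <- c("A", "B", "A", "D", "B", "D")

tab <- table(predicted, actual)
print(tab)

# The table is not square: 3 predicted classes versus 4 actual classes.
print(dim(tab))
```

The rows are A, B and D only, while the columns cover A through D.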
The `confusionMatrix` function can't handle the missing outcome and gives the error:
Error in !all.equal(nrow(data), ncol(data)) : invalid argument type
Is there a way I can use the `table` function differently to get the missing rows filled with zeros, or use the `confusionMatrix` function differently so that it treats missing outcomes as zero?
As a note: Since I am randomly selecting my data to test with, there are times that a category is also not represented in the actual result as opposed to just the predicted. I don't believe this will change the solution.
3 Solutions
#1
15
You can use `union` to make sure both factors share the same set of levels:
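library(caret)
# Sample data
predicted <- c(1,2,1,2,1,2,1,2,3,4,3,4,6,5) # Levels 1,2,3,4,5,6
reference <- c(1,2,1,2,1,2,1,2,1,2,1,3,3,4) # Levels 1,2,3,4
# Use the sorted union of all observed values as the common set of levels
u <- sort(union(predicted, reference))
# Named `tab` rather than `t`, so that base R's t() is not masked
tab <- table(factor(predicted, u), factor(reference, u))
confusionMatrix(tab)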
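The same idea carries over to character labels. Here is a minimal base R sketch, assuming the full set of classes is known up front (the vector `classes` and the sample labels are hypothetical):

```r
classes   <- c("A", "B", "C", "D")   # assumed: full set of possible outcomes
actual    <- c("A", "B", "C", "D", "A", "C")
predicted <- c("A", "B", "A", "D", "B", "D")

# Giving both factors the same explicit levels keeps unobserved classes
# in the table as all-zero rows or columns.
tab <- table(
  predicted = factor(predicted, levels = classes),
  actual    = factor(actual,    levels = classes)
)
print(tab)
```

The resulting table is square with identical row and column names, which is the shape `confusionMatrix` needs.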
#2
5
First note that `confusionMatrix` can be called as `confusionMatrix(predicted, actual)` in addition to being called with `table` objects. However, the function throws an error if `predicted` and `actual` (both regarded as `factor`s) do not have the same number of levels.
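As a rough sketch of the factor-based call, assuming caret is installed and that the full set of classes is known (the `classes` vector and the labels are made up for illustration):

```r
classes   <- c("A", "B", "C", "D")   # assumed: full set of possible outcomes
actual    <- factor(c("A", "B", "C", "D", "A", "C"), levels = classes)
predicted <- factor(c("A", "B", "A", "D", "B", "D"), levels = classes)

# Both factors carry the same four levels, even though "C" is never predicted.
stopifnot(identical(levels(predicted), levels(actual)))

# Only attempt the caret call when the package is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  cm <- caret::confusionMatrix(data = predicted, reference = actual)
  print(cm$table)
}
```

Setting the levels explicitly on both factors is what keeps the level counts equal.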
This limitation, and the fact that the caret package threw an error for me because its dependencies were not set up correctly in the first place, is why I'd suggest creating your own function:
# Create a confusion matrix from the given outcomes, whose rows correspond
# to the predicted and whose columns correspond to the actual classes.
# Classes are assumed to be coded as positive integers 1..numClasses.
createConfusionMatrix <- function(act, pred) {
  # You've mentioned that neither actual nor predicted may give a complete
  # picture of the available classes, hence:
  numClasses <- max(act, pred)
  # Sort predicted and actual by the actual class.
  ord <- order(act)
  pred <- pred[ord]
  act <- act[ord]
  # Explicit levels keep a column for classes that never occur in `act`.
  act <- factor(act, levels = seq_len(numClasses))
  sapply(split(pred, act), tabulate, nbins=numClasses)
}
# Generate random data since you've not provided an actual example.
actual <- sample(1:4, 1000, replace=TRUE)
predicted <- sample(c(1L,2L,4L), 1000, replace=TRUE)
print( createConfusionMatrix(actual, predicted) )
which will give you something like the following (the exact counts vary from run to run because the data are random):
      1  2  3  4
[1,] 85 87 90 77
[2,] 78 78 79 95
[3,]  0  0  0  0
[4,] 89 77 82 83
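Once you have a square matrix like this, summary statistics are easy to derive by hand. The following is only a sketch: the function is repeated in condensed form so that the block runs on its own, it passes `act` through `factor()` with explicit levels (an assumption that also keeps classes absent from the actual values), and the seed is arbitrary:

```r
# Rows = predicted classes, columns = actual classes (classes coded 1..k).
createConfusionMatrix <- function(act, pred) {
  numClasses <- max(act, pred)
  sapply(split(pred, factor(act, levels = seq_len(numClasses))),
         tabulate, nbins = numClasses)
}

set.seed(42)  # arbitrary seed, only to make the sketch repeatable
actual    <- sample(1:4, 1000, replace = TRUE)
predicted <- sample(c(1L, 2L, 4L), 1000, replace = TRUE)

cm <- createConfusionMatrix(actual, predicted)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
# Recall per actual class: diagonal divided by the column totals.
recall <- diag(cm) / colSums(cm)

print(accuracy)
print(recall)
```

Because class 3 is never predicted, its row is all zeros and its recall is 0.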
#3
0
I had the same problem and here is my solution:
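tab <- table(my_prediction, my_real_label)
if (nrow(tab) != ncol(tab)) {
  # Classes that occur in the labels but were never predicted
  missings <- setdiff(colnames(tab), rownames(tab))
  # One all-zero row per missing class, named after that class
  missing_mat <- matrix(0L, nrow = length(missings), ncol = ncol(tab),
                        dimnames = list(missings, colnames(tab)))
  tab <- rbind(as.matrix(tab), missing_mat)
  # Reorder the rows to match the columns so that no row gets mislabeled
  tab <- as.table(tab[colnames(tab), , drop = FALSE])
}
my_conf <- confusionMatrix(tab)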
Cheers Cankut