I've trained a linear regression model with R's caret package. I'm now trying to generate a confusion matrix and keep getting the following error:
Error in confusionMatrix.default(pred, testing$Final) : the data and reference factors must have the same number of levels
library(caret)

EnglishMarks <- read.csv("E:/Subject Wise Data/EnglishMarks.csv",
                         header=TRUE)
inTrain<-createDataPartition(y=EnglishMarks$Final,p=0.7,list=FALSE)
training<-EnglishMarks[inTrain,]
testing<-EnglishMarks[-inTrain,]
# The next two lines refer to treeFit and testdata, which are not defined in this snippet
# predictionsTree <- predict(treeFit, testdata)
# confusionMatrix(predictionsTree, testdata$catgeory)
modFit<-train(Final~UT1+UT2+HalfYearly+UT3+UT4,method="lm",data=training)
pred<-format(round(predict(modFit,testing)))
confusionMatrix(pred,testing$Final)
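A side note on the last two lines: format() returns character strings padded to a common width, so a prediction that rounds to a one-digit value would come out as " 5" rather than "5". A minimal sketch with made-up numbers (raw_pred is a hypothetical stand-in for predict(modFit, testing)) shows the difference, and suggests that as.integer(round(...)) may be a cleaner way to turn the regression output into whole marks:

```r
# Hypothetical raw regression output, standing in for predict(modFit, testing)
raw_pred <- c(84.6, 5.2, 87.4)

# format() pads every element to a common width, so 5 becomes " 5"
padded <- format(round(raw_pred))
print(padded)

# as.integer(round()) keeps plain integers with no padding
clean <- as.integer(round(raw_pred))
print(clean)
```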
The error occurs when generating the confusion matrix. The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are given below; they should be the same. Any help would be greatly appreciated, as it's driving me crazy!
> str(pred)
chr [1:148] "85" "84" "87" "65" "88" "84" "82" "84" "65" "78" "78" "88" "85" "86" "77" ...
> str(testing$Final)
int [1:148] 88 85 86 70 85 85 79 85 62 77 ...
> levels(pred)
NULL
> levels(testing$Final)
NULL
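levels() returning NULL means neither object is a factor yet: pred is a character vector and testing$Final is an integer vector. If each one is turned into a factor on its own, its levels come only from the values it happens to contain, so the two level sets need not agree. A small sketch with hypothetical toy values (not the asker's data):

```r
# Hypothetical toy stand-ins for pred and testing$Final
pred <- c("85", "84", "87", "65")
final <- c(88L, 85L, 86L, 70L)

# Neither vector is a factor, so levels() gives NULL for both
print(levels(pred))
print(levels(final))

# Converting each vector separately derives levels from its own values only
pred_f <- factor(pred)
final_f <- factor(final)
print(levels(pred_f))
print(levels(final_f))
print(identical(levels(pred_f), levels(final_f)))
```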
2 answers
#1
Run table(pred) and table(testing$Final). You will see that at least one value in the testing set is never predicted (i.e. it never appears in pred). That is what "different number of levels" means. There is an example of a custom-made function to get around this problem here.
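The check described above can be sketched with base R only. The vectors here are hypothetical toy values; setdiff() lists the marks that occur on one side but never on the other, which are exactly the values that make the two level sets differ:

```r
# Hypothetical toy data: predicted and observed marks
pred <- c(85L, 84L, 87L, 85L)
final <- c(88L, 85L, 86L, 84L)

# Frequency of each value among the predictions and in the test set
print(table(pred))
print(table(final))

# Observed marks that never appear among the predictions
never_predicted <- setdiff(final, pred)
print(never_predicted)

# Predicted marks that never occur in the test set
never_observed <- setdiff(pred, final)
print(never_observed)
```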
However, I found that this trick works fine:
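# Use every integer between the smallest and largest observed mark as the shared level set
lv <- min(testing$Final):max(testing$Final)
table(factor(pred, levels = lv),
      factor(testing$Final, levels = lv))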
It should give you exactly the same confusion matrix as with the function.
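To see what the trick does, here is a sketch on hypothetical toy vectors (final stands in for testing$Final). Because both factors share the same level set lv, the resulting table is square, and its diagonal counts the exact hits:

```r
# Hypothetical toy data: predicted and observed marks
pred <- c(85L, 84L, 87L, 85L)
final <- c(85L, 84L, 86L, 88L)

# Shared level set spanning the observed range, as in the answer above
lv <- min(final):max(final)

# Rows are predictions, columns are actual marks
cm <- table(Predicted = factor(pred, levels = lv),
            Actual = factor(final, levels = lv))
print(cm)
print(dim(cm))
print(sum(diag(cm)))
```

One caveat, assuming predictions can fall outside the observed range: factor() turns any value not in lv into NA, and table() drops NA by default, so such predictions would silently vanish from the matrix. Building the levels from the values in both vectors avoids that.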
#2
Something like the following seems to work for me. The idea is similar to that of @nayriz:
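# 1:148 is just a level set presumably wide enough to cover every mark here; 148 is
# the test-set size, not a mark range, so adjust it to the values your data can take
confusionMatrix(
  factor(pred, levels = 1:148),
  factor(testing$Final, levels = 1:148)
)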
The key is to make sure the factor levels match.
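One way to guarantee matching levels without hard-coding a range is to build them from the values that actually occur in both vectors. align_levels() below is a hypothetical helper, not part of caret, and the toy vectors are made up; the final call is left as a comment because it assumes caret is installed:

```r
# Hypothetical helper: converts both vectors to factors sharing one level set
align_levels <- function(pred, obs) {
  # Union of the values seen in either vector, sorted numerically
  lv <- sort(unique(c(as.integer(pred), as.integer(obs))))
  list(pred = factor(as.integer(pred), levels = lv),
       obs  = factor(as.integer(obs), levels = lv))
}

# Hypothetical toy data: character predictions, integer observations
pred <- c("85", "84", "87", "65")
final <- c(88L, 85L, 86L, 70L)

aligned <- align_levels(pred, final)
print(levels(aligned$pred))
print(identical(levels(aligned$pred), levels(aligned$obs)))

# With caret loaded, the aligned factors can then be passed on:
# caret::confusionMatrix(aligned$pred, aligned$obs)
```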