Error in confusionMatrix: the data and reference factors must have the same number of levels

Date: 2022-05-07 19:17:20

I've trained a tree model with the R caret package. I'm now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

The error occurs when generating the confusion matrix. The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are given below; they should be the same. Any help would be greatly appreciated, as it's driving me crazy!

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"       

6 answers

#1


1  

Maybe your model never predicts one of the factor levels. Use the table() function instead of confusionMatrix() to see if that is the problem.
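A minimal sketch of that check in base R, using made-up toy data in place of the real `testdata` and predictions (the values and the three-level factor are assumptions for illustration only). It also compares the level counts of the objects exactly as they would be passed to confusionMatrix(). One thing worth checking in the question as posted: the failing call passes `testdata$catgeory`, while str() and levels() inspect `testdata$category`. If no column has the misspelled name, `$` returns NULL, which has zero levels, and that alone could explain the mismatch.

```r
# Toy stand-ins for the real objects; names and values are hypothetical.
testdata <- data.frame(
  category = factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
)
predictions <- factor(c("a", "b", "b", "a"), levels = c("a", "b", "c"))

# Cross-tabulate predictions against the reference.
# A row of zeros marks a class the model never predicted.
tab <- table(predicted = predictions, actual = testdata$category)
print(tab)

# Compare the level counts of exactly what gets passed on.
length(levels(predictions))        # 3
length(levels(testdata$category))  # 3

# A misspelled column name silently yields NULL, which has no levels.
is.null(testdata$catgeory)         # TRUE
length(levels(testdata$catgeory))  # 0
```

If the last two lines report TRUE and 0 on the real data, correcting the column name in the confusionMatrix() call is the first thing to try.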

#2


1  

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata,na.action = na.pass)
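To see why the na.action setting matters, here is a small base R sketch, assuming the prediction step drops incomplete rows by default. It uses model.frame() on a hypothetical data frame rather than caret itself, so it only illustrates the mechanism: na.omit shortens the data, while na.pass keeps one row per input row.

```r
# Hypothetical new data with one missing predictor value.
newdata <- data.frame(x = c(1, NA, 3), y = c("a", "b", "a"))

# na.omit drops the incomplete row; na.pass keeps every row.
kept_omit <- model.frame(~ x, data = newdata, na.action = na.omit)
kept_pass <- model.frame(~ x, data = newdata, na.action = na.pass)

nrow(kept_omit)  # 2
nrow(kept_pass)  # 3
```

When rows are dropped, the prediction vector ends up shorter than the reference column, so the two can no longer be compared element by element.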

#3


0  

The length mismatch you're running into is probably due to NAs in the training set. Either drop the incomplete cases, or impute so that no values are missing.
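Both options can be sketched in base R. The data frame below is hypothetical, and median imputation is only one simple choice, picked here for illustration; it is not necessarily what the answer had in mind.

```r
# Hypothetical training data with missing values in both columns.
train <- data.frame(
  amount = c(10, NA, 30, 40),
  category = factor(c("a", "b", NA, "a"))
)

# Option 1: keep only rows with no missing values.
train_complete <- train[complete.cases(train), ]
nrow(train_complete)  # 2

# Option 2: impute the numeric predictor with its median, then drop
# rows whose outcome is still missing (the outcome is not imputed).
train_imputed <- train
train_imputed$amount[is.na(train_imputed$amount)] <-
  median(train$amount, na.rm = TRUE)
train_imputed <- train_imputed[!is.na(train_imputed$category), ]
nrow(train_imputed)  # 3
```

Dropping rows is simpler but discards data; imputing keeps more rows at the cost of introducing estimated values.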

#4


0  

I had the same issue and fixed it by dropping the incomplete rows right after reading the data file, like so:

data = na.omit(data)

Thanks, all, for the pointers!

#5


-1  

Try using:

confusionMatrix(table(predictionsTree, testdata$category))

That worked for me.
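A base R sketch of building such a table so that it is square, with the same classes in the same order on both axes. This is, as far as I know, what the table method of confusionMatrix() expects, but treat that as an assumption and check the caret documentation. The vectors are made-up examples, and the accuracy line is computed by hand rather than with caret.

```r
# Hypothetical predictions and reference with differing level sets.
pred   <- factor(c("a", "b", "a", "a"))
actual <- factor(c("a", "b", "c", "a"))

# Re-level both factors onto the union of their levels.
common <- union(levels(pred), levels(actual))
tab <- table(
  predicted = factor(pred, levels = common),
  actual    = factor(actual, levels = common)
)
print(tab)

# Overall accuracy: correct predictions sit on the diagonal.
accuracy <- sum(diag(tab)) / sum(tab)
accuracy  # 0.75
```

Note that going through table() can hide the underlying problem rather than fix it, so it is still worth finding out why the level counts differed.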

#6


-2  

There might be missing values in testdata. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error, and this fixed it for me.

testdata <- testdata[complete.cases(testdata),]
