confusionMatrix for logistic regression in R

I want to calculate two confusion matrices for my logistic regression, one using my training data and one using my testing data:

logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))

I set the threshold of the predicted probability at 0.5:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      train$LoanStatus_B == 1))

The code above works well for my training set. However, when I use the test set:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      test$LoanStatus_B == 1))

it gives me this error:

Error in table(predict(logitMod, type = "response") >= 0.5, test$LoanStatus_B == : all arguments must have the same length

Why is this? How can I fix this? Thank you!

1 Answer

#1

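I think the problem is in the call to predict: you forgot to provide the new data. Without newdata, predict returns the fitted values for the training set, so the result has one entry per training row and its length does not match test$LoanStatus_B. Also, you can use the function confusionMatrix from the caret package to compute and display confusion matrices, and you don't need to call table on your results before that call.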
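To see where the length mismatch comes from, here is a minimal sketch in base R. The data are simulated and the names fit, p_fitted and p_test are made up for illustration; only the row counts matter.

```r
set.seed(1)

# Simulated stand-ins for the real data: 100 training rows, 40 test rows
train <- data.frame(LoanStatus_B = rbinom(100, 1, 0.5), b = rnorm(100))
test  <- data.frame(LoanStatus_B = rbinom(40, 1, 0.5),  b = rnorm(40))

fit <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))

# Without newdata, predict() returns fitted values for the training rows
p_fitted <- predict(fit, type = "response")

# With newdata, predict() returns one probability per row of the test set
p_test <- predict(fit, newdata = test, type = "response")

length(p_fitted)  # same as nrow(train)
length(p_test)    # same as nrow(test)
```

Passing p_fitted together with test$LoanStatus_B to table() is what triggers "all arguments must have the same length".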

Here, I created a toy dataset that includes a representative binary target variable and then I trained a model similar to what you did.

train <- data.frame(LoanStatus_B = as.numeric(rnorm(100)>0.5), b= rnorm(100), c = rnorm(100), d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))

Now you can predict on your data (for example, your training set) and then call confusionMatrix(), which takes two arguments:

your predictions
the observed classes

library(caret)
# Use your model to make predictions. In this example newdata = training set; replace it with your test set.
pdata <- predict(logitMod, newdata = train, type = "response")

# Use caret to compute a confusion matrix.
# Recent caret versions expect both arguments to be factors with the same levels.
confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)),
                reference = factor(train$LoanStatus_B, levels = c(0, 1)))

Here are the results (the toy data is random and no seed is set, so your numbers will differ):

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 66 33
         1  0  1

               Accuracy : 0.67            
                 95% CI : (0.5688, 0.7608)
    No Information Rate : 0.66            
    P-Value [Acc > NIR] : 0.4625
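
For the test set in the question, the same pattern applies with newdata = test. Below is a rough sketch that uses only base R, so it runs without caret. The helper make_data, the predictors x1 and x2, and the simulated data are assumptions for illustration, not the real loan data.

```r
set.seed(42)

# Hypothetical helper that simulates a binary target and two predictors
make_data <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  data.frame(LoanStatus_B = rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2)),
             x1 = x1, x2 = x2)
}
train <- make_data(200)
test  <- make_data(80)

logitMod <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))

# Predict on the held-out rows, then apply the 0.5 threshold
p_test <- predict(logitMod, newdata = test, type = "response")
pred <- factor(as.numeric(p_test >= 0.5), levels = c(0, 1))
obs  <- factor(test$LoanStatus_B, levels = c(0, 1))

# Fixed factor levels keep the table 2 x 2 even if one class is never predicted
cm <- table(Prediction = pred, Reference = obs)
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
print(accuracy)
```

The same pred and obs factors can be passed to caret's confusionMatrix(data = pred, reference = obs) if you want the full set of statistics.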
