I would like to use glmnet to fit a binomial logistic regression model. I could use either caret or the glmnet package directly. Let's take data(BinomialExample) as an example and run the following code, where I've implemented both:
#rm(list = ls(all.names = TRUE))
library(glmnet)
library(caret)
data(BinomialExample)
# recode the 0/1 response as a two-level factor (caret's classProbs needs valid level names)
y[y==0] <- "low"
y[y==1] <- "high"
y <- as.factor(y)
#split data into training & validation sets
set.seed(1)
splitSample <- createDataPartition(y, p = 0.8, list = FALSE)
training_expression <- x[splitSample,]
training_phenotype <- y[splitSample]
validation_expression <- x[-splitSample,]
validation_phenotype <- y[-splitSample]
#####################
##GLMNET with CARET##
#####################
eGrid <- expand.grid(.alpha = seq(0.1, 0.9, by = 0.1), .lambda = seq(0, 1, by = 0.01))
Control <- trainControl(verboseIter = TRUE, classProbs = TRUE, summaryFunction = twoClassSummary, method = "cv")
set.seed(1)
netFit <- train(x = training_expression, y = training_phenotype, method = "glmnet",
                metric = "ROC", tuneGrid = eGrid, trControl = Control)
netFitPerf <- getTrainPerf(netFit)
trainROC <- netFitPerf[,1]
trainSens <- netFitPerf[,2]
trainSpec <- netFitPerf[,3]
trainAlpha <- netFit$bestTune[,1]
trainLambda <- netFit$bestTune[,2]
print(sprintf("ROC: %s Sens: %s Spec: %s Alpha: %s Lambda: %s", round(trainROC,2), round(trainSens,2), round(trainSpec,2), round(trainAlpha,2),round(trainLambda,2)))
predict_validation <- predict(netFit, newdata = validation_expression)
confusionMatrix(predict_validation,validation_phenotype)
######################
#GLMNET without CARET#
######################
set.seed(1)
elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial", type.measure = "class", nfolds=10, alpha=0.5, nlambda = 100)
plot(elasticnet)
predict_validation <- predict(elasticnet, newx = validation_expression, s = c(elasticnet$lambda.min), type = "class")
confusionMatrix(predict_validation,validation_phenotype)
As you can see, if I use the caret package I can easily print the ROC, sensitivity, and specificity of the model. However, I was not able to find a similar way to print ROC, Sens, and Spec when I use glmnet directly without caret. Is there a similar way to get these metrics?
Thanks for your help!
1 Answer
#1
You can get the values you want from various objects produced by your glmnet workflow. For example, if you do
cm = confusionMatrix(predict_validation,validation_phenotype)
then cm$byClass includes Specificity and Sensitivity:
cm$byClass
Sensitivity Specificity Pos Pred Value Neg Pred Value Prevalence
0.8181818 1.0000000 1.0000000 0.8000000 0.5789474
Detection Rate Detection Prevalence Balanced Accuracy
0.4736842 0.4736842 0.9090909
Likewise, you can get lambda from elasticnet$lambda.min and alpha from gsub(".*alpha = ([0-9]\\.[0-9]*).*","\\1",deparse(elasticnet$glmnet.fit$call)[2]) (although there may be a better way than that monstrous piece of code). Actually, since the alpha value is an input to the function, you don't even need to extract it. However, if you cross-validate on alpha in addition to lambda, you'd need to use a loop to try out multiple alpha values, and then you'd need some way to extract the alpha value of the best model. If you decide to include alpha in the cross-validation, be sure to read the Details section of cv.glmnet.
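A minimal sketch of such a loop (my own illustration, not from the original answer) might fix the fold assignments with foldid, as the Details section of cv.glmnet recommends when tuning alpha, and keep the alpha/lambda pair with the lowest CV misclassification error:
set.seed(1)
# fix the fold assignments so every alpha is evaluated on the same CV splits
foldid <- sample(rep(1:10, length.out = length(training_phenotype)))
alphas <- seq(0.1, 0.9, by = 0.1)
cv_fits <- lapply(alphas, function(a)
  cv.glmnet(training_expression, training_phenotype, family = "binomial",
            type.measure = "class", foldid = foldid, alpha = a))
# pick the alpha whose best lambda gives the lowest CV misclassification error
best <- which.min(sapply(cv_fits, function(fit) min(fit$cvm)))
best_alpha  <- alphas[best]
best_lambda <- cv_fits[[best]]$lambda.min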
For the AUC of the ROC curve, cv.glmnet will give you that, but you'd need to use type.measure="auc" instead of type.measure="class", which would change how the best model is selected. Also, with this particular data sample, you need to use fewer CV folds, but that might not be an issue with your real data. For example:
elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial",
type.measure = "auc", nfolds=5, alpha=0.5, nlambda = 100)
Then, to get the AUC:
elasticnet$cvm[which(elasticnet$lambda==elasticnet$lambda.min)]
or
max(elasticnet$cvm)
If you want to calculate the AUC without using AUC to select the best model, you might have to calculate that yourself or use a pre-existing function for this, such as auc from the pROC package.
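As a rough sketch of that last option (my addition, assuming the pROC package is installed), note that roc() needs predicted probabilities rather than class labels, so predict with type = "response":
library(pROC)
# predicted probabilities (not class labels) on the validation set
prob_validation <- predict(elasticnet, newx = validation_expression,
                           s = elasticnet$lambda.min, type = "response")
# ROC curve and AUC against the true validation labels
roc_obj <- roc(validation_phenotype, as.numeric(prob_validation))
auc(roc_obj)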