I need to build a table containing the TPR and FPR values as well as precision and recall. I am using the roc_curve and precision_recall_curve functions from the sklearn.metrics package in Python. My problem is that each function gives me a different vector of thresholds, and I need a single one so that I can merge the values as columns of one table. Could anyone help me?
Thanks in advance
1 solution
#1
The threshold values have two major differences.
- The orders are different. `roc_curve` returns thresholds in decreasing order, while `precision_recall_curve` returns them in increasing order.
- The numbers are different. In `roc_curve`, `n_thresholds = len(np.unique(probas_pred))`, while in `precision_recall_curve`, `n_thresholds = len(np.unique(probas_pred)) - 1`. The latter does not include the smallest threshold value from `roc_curve`. At the same time, the last precision and recall values are 1. and 0. respectively, with no corresponding threshold. Therefore, the numbers of items for tpr, fpr, precision and recall are the same.
So, back to your question: how do you make a table that includes tpr, fpr, precision and recall with their corresponding thresholds? Here are the steps:
- Discard the last precision and recall values
- Reverse the precision and recall values
- Compute the precision and recall values corresponding to the lowest threshold value from the thresholds of `roc_curve`
- Put all the values into the same table
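One possible way to put these steps into practice is sketched below. It is a variation rather than a literal transcription: instead of reversing arrays and patching in the missing row by position, it drops the final precision/recall pair and then joins the two results on the threshold value with pandas, which avoids depending on the exact array lengths of a given scikit-learn version. The helper name `metrics_table`, the toy data, and the use of `drop_intermediate=False` are assumptions made for this example. Thresholds that appear in only one of the two curves (such as the extra first threshold of `roc_curve`) are dropped by the inner join.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve, roc_curve


def metrics_table(y_true, y_score):
    """Hypothetical helper: one row per threshold shared by both curves."""
    fpr, tpr, roc_thr = roc_curve(y_true, y_score, drop_intermediate=False)
    precision, recall, pr_thr = precision_recall_curve(y_true, y_score)

    roc_df = pd.DataFrame({"threshold": roc_thr, "tpr": tpr, "fpr": fpr})
    # Discard the last precision/recall pair, which has no threshold.
    pr_df = pd.DataFrame(
        {"threshold": pr_thr, "precision": precision[:-1], "recall": recall[:-1]}
    )

    # Align by threshold value; rows present in only one curve are dropped.
    table = roc_df.merge(pr_df, on="threshold", how="inner")
    # Present the rows in decreasing threshold order, as roc_curve does.
    return table.sort_values("threshold", ascending=False).reset_index(drop=True)


# Toy labels and scores, made up purely for illustration.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.30, 0.35, 0.40, 0.55, 0.70, 0.80, 0.90])

table = metrics_table(y_true, y_score)
print(table)
```

A quick sanity check on the result is that the `tpr` and `recall` columns agree on every row, since both measure the fraction of positives scored at or above the threshold.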