Area under the precision-recall curve in R, or other summary quantities

Time: 2021-07-16 09:05:39

I plan to use the precision-recall plot (PR plot) to compare models. See the attached figure (partial screenshot, sorry!) below. I have the true positives, true negatives, false positives and false negatives at hand, and I need a single summary quantity for each model. Here are my questions:

  1. The area under the PR curve (AUC) is the first candidate, but I don't know how to calculate it in R. I do NOT want to use a package like ROCR, because I wrote all of the code myself and would like to compute this from the quantities I already have. There seem to be many ways to do it; I would like to know which one is the easiest to implement.

  2. Another quantity is the F-measure (the traditional F-measure or balanced F-score), which combines precision and recall as their harmonic mean. Is it better than the AUC in #1, or do the two describe different things? Moreover, since I have many recall and precision values, how can I calculate a single F-measure in this case (see the figure below)?

Thank you!

1 solution

#1


3  

To calculate the AUC of a curve, you can use a numerical integration function such as trapz() in the caTools package, which applies the trapezoidal rule.

auc <- trapz(recall, precision)
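Since the question asks for a version without packages, here is a minimal base R sketch of the same trapezoidal rule. The function name `pr_auc_trapz` and the example values are made up for illustration; it assumes `recall` and `precision` are equal-length numeric vectors with one pair per cutoff.

```r
# Trapezoidal area under a PR curve, base R only (illustrative sketch).
pr_auc_trapz <- function(recall, precision) {
  stopifnot(length(recall) == length(precision), length(recall) >= 2)
  ord <- order(recall, -precision)   # sort the points from left to right
  r <- recall[ord]
  p <- precision[ord]
  # Each trapezoid: width in recall times the mean of its two heights
  sum(diff(r) * (head(p, -1) + tail(p, -1)) / 2)
}

# Made-up values, not taken from the question's figure
recall    <- c(0.0, 0.25, 0.5, 0.75, 1.0)
precision <- c(1.0, 0.9, 0.8, 0.6, 0.5)
auc <- pr_auc_trapz(recall, precision)
```

Note that this only integrates over the recall range you supply, so curves that stop at different recall values are not directly comparable.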

The F-score is the harmonic mean of precision and recall at a given cutoff value. In your case you would get one F-score per cutoff, and therefore many F-scores for each curve, so on its own it would not summarize the curve the way you want.
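To make the "one F-score per cutoff" point concrete, here is a small base R sketch. The helper `f_score` and the example values are hypothetical. Taking the maximum F1 over all cutoffs is one possible way to reduce the curve to a single number; it is an assumption of this sketch, not something the answer recommends.

```r
# F-score for each (precision, recall) pair; beta = 1 gives the balanced F1.
f_score <- function(precision, recall, beta = 1) {
  denom <- beta^2 * precision + recall
  # Define the score as 0 when precision and recall are both 0
  ifelse(denom == 0, 0, (1 + beta^2) * precision * recall / denom)
}

# Made-up values, one pair per cutoff
recall    <- c(0.25, 0.5, 0.75, 1.0)
precision <- c(0.9, 0.8, 0.7, 0.5)

f1     <- f_score(precision, recall)  # one F1 per cutoff
best   <- which.max(f1)               # index of the best cutoff
max_f1 <- f1[best]                    # a single (optimistic) summary value
```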

The AUC describes the performance of the model across all possible cutoffs on the model's continuous output. The F-score describes a model at one particular cutpoint; it is more a way to combine recall and precision into a single statistic.
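As an illustration of how a PR curve sweeps over the model's continuous output, the sketch below computes one (precision, recall) point per cutoff. The names `pr_points`, `scores` and `labels` are hypothetical. It assumes 0/1 labels with at least one positive, and that an observation is predicted positive when its score is at or above the cutoff.

```r
# One (precision, recall) point per distinct score used as a cutoff.
pr_points <- function(scores, labels) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  t(sapply(cutoffs, function(k) {
    pred <- scores >= k                 # predicted positive at this cutoff
    tp <- sum(pred & labels == 1)
    fp <- sum(pred & labels == 0)
    fn <- sum(!pred & labels == 1)
    c(cutoff = k, precision = tp / (tp + fp), recall = tp / (tp + fn))
  }))
}

# Made-up scores and true labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
labels <- c(1, 1, 0, 1, 0, 0)
pts <- pr_points(scores, labels)        # matrix: one row per cutoff
```

Each row of `pts` has its own F-score, while the area under the points summarizes all rows at once.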

Be careful when explaining it, though. "AUC" usually refers to the area under the ROC curve, which is built from sensitivity and specificity, so state explicitly that yours is the area under the PR curve.
