accuracy_score: ValueError: Can't handle mix of binary and continuous

I'm using linear_model.LinearRegression from scikit-learn as a predictive model. The model itself works fine, but I have a problem evaluating the predicted results with the accuracy_score metric. This is my true data:

array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

And this is my predicted data:

array([ 0.07094605,  0.1994941 ,  0.19270157,  0.13379635,  0.04654469,
    0.09212494,  0.19952108,  0.12884365,  0.15685076, -0.01274453,
    0.32167554,  0.32167554, -0.10023553,  0.09819648, -0.06755516,
    0.25390082,  0.17248324])

My code:

accuracy_score(y_true, y_pred, normalize=False)

And this is the error message:

"ValueError: Can't handle mix of binary and continuous"

Any help? Thank you.

4 Solutions

#1

accuracy_score(y_true, y_pred.round(), normalize=False)

If you prefer more control over the threshold, use (y_pred > threshold).astype(int) instead of y_pred.round(), where threshold is the value that separates the two classes.
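As a minimal sketch of the two suggestions above, using the arrays from the question: the value 0.19 for threshold is only an illustrative assumption, not a recommended cutoff.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Arrays copied from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Option 1: round to the nearest integer (an implicit threshold of 0.5).
n_correct_round = accuracy_score(y_true, y_pred.round(), normalize=False)

# Option 2: pick the threshold explicitly (0.19 is a hypothetical value).
threshold = 0.19
y_label = (y_pred > threshold).astype(int)
n_correct_threshold = accuracy_score(y_true, y_label, normalize=False)

print(n_correct_round, n_correct_threshold)
```

Note that every prediction here is below 0.5, so rounding labels all samples as class 0; that is one reason an explicit threshold may be preferable.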

#2

The problem is that the true y is binary (zeros and ones), while your predictions are not. You probably generated continuous scores (probability-like values) rather than class predictions, hence the error :) Try generating class membership instead, and it should work!
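One way to obtain class membership directly is to use a classifier instead of a regressor. The sketch below assumes LogisticRegression as a stand-in and uses hypothetical toy data, because the question does not show its features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical toy data: one feature, binary labels.
X = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

y_pred = clf.predict(X)               # class labels (0 or 1)
y_proba = clf.predict_proba(X)[:, 1]  # continuous probabilities of class 1

# accuracy_score accepts the labels; passing y_proba would raise the same error.
acc = accuracy_score(y, y_pred)
print(acc)
```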

#3

accuracy_score is a classification metric; you cannot use it for a regression problem.

The scikit-learn documentation lists the available regression metrics.
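If the continuous output is what you want to evaluate, a regression metric can be applied to it directly. The following sketch uses the arrays from the question with three metrics from sklearn.metrics; which one suits your problem is a judgment call.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Arrays copied from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

mse = mean_squared_error(y_true, y_pred)   # average squared error
mae = mean_absolute_error(y_true, y_pred)  # average absolute error
r2 = r2_score(y_true, y_pred)              # coefficient of determination

print(mse, mae, r2)
```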

#4

Maybe this helps someone who finds this question:

As JohnnyQ already pointed out, the problem is that you have non-binary values (neither 0 nor 1) in your y_pred, i.e. if you add

print(((y_pred != 0.) & (y_pred != 1.)).any())

you will see True in the output. (The expression checks whether any value is neither 0 nor 1.)
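A related check, offered as a sketch: scikit-learn's type_of_target utility reports how an array is interpreted, and as far as I know it is the same kind of classification that the error message refers to. The arrays are the ones from the question.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Arrays copied from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

true_type = type_of_target(y_true)             # integer 0/1 labels
pred_type = type_of_target(y_pred)             # non-integer floats
rounded_type = type_of_target(y_pred.round())  # floats that hold whole numbers

print(true_type, pred_type, rounded_type)
```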

You can see your non-binary values using:

# Assumes y_pred is a pandas DataFrame with a 'score' column
non_binary_mask = (y_pred['score'] != 1) & (y_pred['score'] != 0)
non_binary_values = y_pred[non_binary_mask]
non_binary_idxs = non_binary_values.index

A print statement can output the derived variables above.

Finally, this function can clean your data of all non-binary entries:

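def remove_unlabelled_data(X, y):
    # Assumes X and y are pandas objects sharing the same index,
    # and that y has a 'score' column holding the labels.
    non_binary_mask = (y['score'] != 1) & (y['score'] != 0)
    drop_indexes = y[non_binary_mask].index
    return X.drop(drop_indexes), y.drop(drop_indexes)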
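A short usage sketch for the function above, with hypothetical toy DataFrames (the 'feature' column and the values are made up for illustration; the function is repeated so the example runs on its own):

```python
import pandas as pd

def remove_unlabelled_data(X, y):
    # Drop every row whose label in y['score'] is neither 0 nor 1.
    drop_indexes = X[(y['score'] != 1) & (y['score'] != 0)].index
    return X.drop(drop_indexes), y.drop(drop_indexes)

# Hypothetical data: row 2 carries a non-binary label (0.5).
X = pd.DataFrame({'feature': [0.1, 0.2, 0.3, 0.4]})
y = pd.DataFrame({'score': [0, 1, 0.5, 1]})

X_clean, y_clean = remove_unlabelled_data(X, y)
print(X_clean)
print(y_clean)
```

Keep in mind that dropping rows changes which samples are evaluated, so this only makes sense when the non-binary entries are genuinely invalid labels rather than continuous predictions.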
