I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It fits and predicts without problems, but I can't evaluate the predicted results with the accuracy_score metric. This is my true data:
array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
And this is my predicted data:
array([ 0.07094605, 0.1994941 , 0.19270157, 0.13379635, 0.04654469,
0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
0.25390082, 0.17248324])
My code:
accuracy_score(y_true, y_pred, normalize=False)
And this is the error message:
"ValueError: Can't handle mix of binary and continuous"
Any help? Thank you.
4 Answers
#1
9
accuracy_score(y_true, y_pred.round(), normalize=False)
If you prefer more control over the threshold, use (y_pred > threshold).astype(int) instead of y_pred.round(), where threshold is the value that separates the two classes.
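A minimal sketch of both variants, using the arrays from the question. The cut-off of 0.15 is a hypothetical value chosen only for illustration; a real threshold would have to be picked from the data.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Arrays copied from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Variant 1: round to the nearest integer (an implicit threshold of 0.5).
y_rounded = y_pred.round().astype(int)

# Variant 2: explicit threshold (0.15 is an assumed, illustrative value).
threshold = 0.15
y_label = (y_pred > threshold).astype(int)

# normalize=False returns the number of correct predictions,
# the default (normalize=True) returns the fraction.
n_correct_rounded = accuracy_score(y_true, y_rounded, normalize=False)
n_correct = accuracy_score(y_true, y_label, normalize=False)
fraction = accuracy_score(y_true, y_label)
print(n_correct_rounded, n_correct, fraction)
```

Note that every prediction in the question is below 0.5, so rounding labels all samples as class 0; an explicit threshold at least lets some samples be assigned to class 1.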
#2
1
The problem is that the true y is binary (zeros and ones), while your predictions are not. You probably generated probabilities rather than class predictions, hence the error :) Try generating class membership instead, and it should work!
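One way to get class membership directly is to fit a classifier instead of a regressor. The sketch below swaps in LogisticRegression, which the question does not use, and trains it on synthetic data (X and y here are made up for illustration, since the question does not show its features). It also shows the difference between predict, which returns labels, and predict_proba, which returns probabilities. Strictly speaking, LinearRegression outputs are unbounded real values rather than probabilities, which is why some of them are negative in the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic, hypothetical data: two features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)

labels = clf.predict(X)             # class labels (0 or 1)
proba = clf.predict_proba(X)[:, 1]  # estimated probability of class 1

# accuracy_score accepts the labels; passing proba would raise the
# "mix of binary and continuous" error from the question.
acc = accuracy_score(y, labels)
print(acc)
```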
#3
1
accuracy_score is a classification metric; you cannot use it for a regression problem.
You can see the available regression metrics here
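As a rough illustration, a few of the regression metrics in sklearn.metrics can be applied to the question's arrays as they are, without converting the predictions to labels. Which metric is appropriate depends on the problem; the three below are only examples.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Arrays copied from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

mse = mean_squared_error(y_true, y_pred)   # mean of squared residuals
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute residuals
r2 = r2_score(y_true, y_pred)              # coefficient of determination

print(mse, mae, r2)
```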
#4
1
Maybe this helps someone who finds this question:
As JohnnyQ already pointed out, the problem is that you have non-binary (neither 0 nor 1) values in your y_pred, i.e. when adding
print(((y_pred != 0.) & (y_pred != 1.)).any())
you will see True in the output. (The command checks whether any value is neither 0 nor 1.)
You can see your non-binary values (assuming y_pred is a pandas DataFrame with a score column) using:
non_binary_values = y_pred[(y_pred['score'] != 1) & (y_pred['score'] != 0)]
non_binary_idxs = y_pred[(y_pred['score'] != 1) & (y_pred['score'] != 0)].index
A print statement can output the derived variables above.
Finally, this function can clean your data of all non-binary entries:
def remove_unlabelled_data(X, y):
    # Rows whose label in y['score'] is neither 0 nor 1 (X and y must share an index).
    drop_indexes = X[(y['score'] != 1) & (y['score'] != 0)].index
    return X.drop(drop_indexes), y.drop(drop_indexes)
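A small usage sketch of the function above. The two DataFrames are hypothetical toy data (the feature column name and all values are made up for illustration) and are assumed to share the same index, which the boolean mask relies on.

```python
import pandas as pd

def remove_unlabelled_data(X, y):
    # Rows whose label in y['score'] is neither 0 nor 1.
    drop_indexes = X[(y['score'] != 1) & (y['score'] != 0)].index
    return X.drop(drop_indexes), y.drop(drop_indexes)

# Hypothetical toy data: row 2 carries a non-binary label.
X = pd.DataFrame({'feature': [0.1, 0.2, 0.3, 0.4]})
y = pd.DataFrame({'score': [0, 1, 0.5, 1]})

X_clean, y_clean = remove_unlabelled_data(X, y)
print(X_clean)
print(y_clean)
```

Keep in mind that this drops rows rather than converting them, so it suits data with missing or invalid labels more than continuous model output, where thresholding is the more usual fix.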