I'm a big fan of mlxtend's plot_decision_regions function (http://rasbt.github.io/mlxtend/#examples, https://*.com/a/43298736/1870832).
It accepts an X (just two columns at a time), a y, and a (fitted) classifier object clf, and then provides a pretty awesome visualization of the relationship between model predictions, true y-values, and a pair of independent variables.
A couple of restrictions: X and y have to be numpy arrays, and clf needs to have a predict() method. Fair enough. My problem is that in my case, the classifier clf I would like to visualize has already been fitted on a Pandas DataFrame...
import numpy as np
import pandas as pd
import xgboost as xgb
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; set before importing pyplot
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt

# Create arbitrary dataset for example
df = pd.DataFrame({'Planned_End': np.random.uniform(low=-5, high=5, size=50),
                   'Actual_End': np.random.uniform(low=-1, high=1, size=50),
                   # randint excludes `high`, so this draws classes 0, 1, 2
                   # (np.random.random_integers is deprecated)
                   'Late': np.random.randint(low=0, high=3, size=50)})

# Fit a Classifier to the data
# This classifier is fit on the data as a Pandas DataFrame
X = df[['Planned_End', 'Actual_End']]
y = df['Late']
clf = xgb.XGBClassifier()
clf.fit(X, y)
So now when I try to use plot_decision_regions, passing X and y as numpy arrays...
# Plot Decision Region using mlxtend's awesome plotting function
plot_decision_regions(X=X.values,
                      y=y.values,
                      clf=clf,
                      legend=2)
I (understandably) get an error saying that the model can't find the column names of the dataset it was trained on:
ValueError: feature_names mismatch: ['Planned_End', 'Actual_End'] ['f0', 'f1']
expected Planned_End, Actual_End in input data
training data did not have the following fields: f1, f0
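For context, the mismatch can be reproduced in miniature without XGBoost. The sketch below (made-up five-row data, only pandas and numpy) shows that .values keeps the numbers but drops the column labels, which is presumably why the model falls back to placeholder names like f0 and f1. It also shows that the labels can be re-attached afterwards.

```python
import numpy as np
import pandas as pd

# Small made-up frame with the same column names as in the question.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Planned_End": rng.uniform(-5, 5, size=5),
    "Actual_End": rng.uniform(-1, 1, size=5),
})

arr = X.values  # plain ndarray: the data survives, the column labels do not
has_labels = hasattr(arr, "columns")

# Re-attaching the labels gives back a frame equal to the original.
restored = pd.DataFrame(arr, columns=X.columns)
round_trip_ok = restored.equals(X)

print(has_labels, round_trip_ok)
```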
In my actual case, switching away from training our model on Pandas DataFrames would be a big deal. Is there a way to still produce decision-region plots for a classifier trained on a Pandas DataFrame?
1 Answer
Try converting X and y to numpy arrays before fitting:
X = df[['Planned_End', 'Actual_End']].values
y = df['Late'].values
and then fit and plot with those same arrays:
clf = xgb.XGBClassifier()
clf.fit(X, y)
plot_decision_regions(X=X,
                      y=y,
                      clf=clf,
                      legend=2)
Alternatively, keep X and y as DataFrame/Series and pass X.values and y.values to both fit and plot_decision_regions.
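If refitting on arrays is not an option, another possibility is to leave the fitted model alone and put a thin adapter in front of it that rebuilds a DataFrame before calling predict(). The sketch below is only an illustration: DataFrameInputAdapter and NameCheckingStub are hypothetical names, and the stub stands in for a model that rejects input without its training column names, so the example runs without xgboost or mlxtend.

```python
import numpy as np
import pandas as pd


class DataFrameInputAdapter:
    """Hypothetical helper: lets a classifier fitted on a DataFrame
    accept plain numpy arrays in predict()."""

    def __init__(self, clf, columns):
        self.clf = clf
        self.columns = list(columns)

    def predict(self, X):
        # Rebuild a DataFrame with the training column names, then delegate.
        frame = pd.DataFrame(np.asarray(X), columns=self.columns)
        return np.asarray(self.clf.predict(frame))


class NameCheckingStub:
    """Stand-in for a model that insists on its training column names."""

    def fit(self, X, y):
        self.columns_ = list(X.columns)
        return self

    def predict(self, X):
        if list(getattr(X, "columns", [])) != self.columns_:
            raise ValueError("feature_names mismatch")
        # Toy rule, only so the example runs: class 1 when Actual_End > 0.
        return (X["Actual_End"] > 0).astype(int).to_numpy()


rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Planned_End": rng.uniform(-5, 5, size=50),
    "Actual_End": rng.uniform(-1, 1, size=50),
})
y = pd.Series(rng.integers(0, 3, size=50), name="Late")

clf = NameCheckingStub().fit(X, y)          # fitted on a DataFrame
adapter = DataFrameInputAdapter(clf, X.columns)
preds = adapter.predict(X.values)           # numpy in, DataFrame reaches the model
```

With the real objects, the intended call would be plot_decision_regions(X=X.values, y=y.values, clf=DataFrameInputAdapter(clf, X.columns), legend=2). Whether mlxtend accepts an object that exposes only predict() may depend on the installed version, so treat this as untested against mlxtend itself.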