ValueError: Number of features of the model must match the input. Model n_features is 9 and input n_features is 2

Time: 2022-02-18 20:27:22

I am trying to implement an isolation forest for 9 input features. I used the example from http://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py

My train and test sets have 9 features, so I am creating X_train and X_test with the same number of features:

>>> X.shape
(100, 9)
>>> X_train.shape
(200, 9)

My code:

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Generate train data
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([b1, b2, c],
           ["training observations",
            "new regular observations", "new abnormal observations"],
           loc="upper left")
plt.show()

But I am getting this error:

---------------------------------------------------------------------------

ValueError: Number of features of the model must match the input. Model n_features is 9 and input n_features is 2

In my case, the error says the model's n_features is 9 while the input's n_features is 2.

Any input on what I am missing here?

1 Solution

#1


Even though you've fit a model with 9 features, the plotting section of the code still assumes only two dimensions, as was the case in the example you're working from:

# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

Look at the shape of the np.c_[...] array being passed into clf.decision_function():

np.c_[xx.ravel(), yy.ravel()].shape
(2500, 2)

You're getting the error because clf expects input with 9 features (columns), but the array you're passing has only 2.
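
If a 2-D picture is still wanted, one possible workaround (not part of the original example, and only a sketch) is to plot a 2-D slice of the 9-D decision function: vary the first two features over the grid and hold the other seven at fixed values. The choice of fill values below (the per-feature training median, stored in the hypothetical name `fill`) is an assumption, and the resulting picture depends entirely on it.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Same 9-feature training data as in the question
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# 2-D grid over the first two features only
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
grid_2d = np.c_[xx.ravel(), yy.ravel()]           # shape (2500, 2)

# Assumption: hold features 2..8 at their training medians.
# Any other fixed values would give a different slice.
fill = np.median(X_train, axis=0)                 # shape (9,)
grid_9d = np.tile(fill, (grid_2d.shape[0], 1))    # shape (2500, 9)
grid_9d[:, :2] = grid_2d                          # overwrite the two plotted features

# Now the input has 9 columns, so decision_function accepts it
Z = clf.decision_function(grid_9d).reshape(xx.shape)
print(grid_2d.shape, grid_9d.shape, Z.shape)
```

Z can then be passed to plt.contourf(xx, yy, Z, ...) as in the original plotting code. Keep in mind that this shows only one slice through the 9-D space, not the whole model.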

The classifier itself should still be usable without any trouble. You can still call the decision_function() and predict() methods, for example, but you won't be able to plot all 9 dimensions using the code you're working from; it was only designed to plot in 2-D. Even running np.meshgrid() with 9 dimensions at 50 points per axis would require 50**9 (roughly 2 x 10^15) grid points per output array, and will almost certainly throw a MemoryError. See this discussion for more on that.
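
To see why a full 9-D grid is out of reach, here is a rough back-of-the-envelope calculation. It assumes the same 50 points per axis as the plotting code and 8-byte float64 values (what np.linspace returns by default); the variable names are just for illustration.

```python
points_per_axis = 50      # as in np.linspace(-5, 5, 50)
n_features = 9
bytes_per_float64 = 8

# np.meshgrid returns one full-size array per axis
n_grid_points = points_per_axis ** n_features
bytes_per_array = n_grid_points * bytes_per_float64
total_bytes = bytes_per_array * n_features

print(f"grid points per array: {n_grid_points:,}")
print(f"memory per array:      {bytes_per_array / 1e15:.1f} PB")
print(f"memory for all arrays: {total_bytes / 1e15:.1f} PB")
```

That is on the order of petabytes, far beyond the memory of any ordinary machine.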

Trying to plot 9-D space isn't going to be very helpful here, anyway. You might focus instead on more standard visual representations of classifier strength, like ROC curves or even a good old-fashioned confusion matrix.
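
As a minimal sketch of that kind of evaluation, the snippet below reuses the synthetic data from the question. Ground-truth labels are available here only because the data is generated: X_test is treated as inliers (+1) and X_outliers as outliers (-1), matching the values predict() returns. The names X_eval, y_true, cm and auc are hypothetical and not from the original code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.RandomState(42)

# Same synthetic data as in the question
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Evaluation set with known labels: +1 = inlier, -1 = outlier
X_eval = np.r_[X_test, X_outliers]
y_true = np.r_[np.ones(len(X_test), dtype=int),
               -np.ones(len(X_outliers), dtype=int)]

y_pred = clf.predict(X_eval)             # hard labels in {+1, -1}
scores = clf.decision_function(X_eval)   # higher score = more normal

# Rows are true classes, columns are predicted classes, ordered [inlier, outlier]
cm = confusion_matrix(y_true, y_pred, labels=[1, -1])
auc = roc_auc_score(y_true, scores)      # the larger label (+1) is the positive class

print(cm)
print(f"ROC AUC: {auc:.3f}")
```

For a full ROC curve rather than a single number, sklearn.metrics.roc_curve(y_true, scores) returns the false and true positive rates that can be plotted against each other.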
