ValueError: Number of features of the model must match the input (model n_features vs. input n_features)

Time: 2022-02-18 20:27:22

I am trying to implement an isolation forest for 9 input features, using the example from ble/plot_isolation_forest.html#sphx-glr-auto-example -plot- plot-isolation-forest-py

My train and test sets have 9 features, so I am creating X_train and X_test with the same number of features.


(100, 9)
 >> X_train.shape
(200, 9)

My code:



import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Generate train data
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z)

b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([b1, b2, c],
           ["training observations",
            "new regular observations", "new abnormal observations"],
           loc="upper left")

But I am getting this error:



ValueError: Number of features of the model must match the input. Model n_features is 9 and input n_features is 2

In my case the error reads: Model n_features is 9 and input n_features is 2


Any input on what I am missing here?


1 solution



Even though you've fit a model with 9 features, the plotting section of the code is still presuming only two dimensions, as was the case in the example you're working off of:


# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

Look at the shape of the np.c_[...] array being passed into clf.decision_function():


np.c_[xx.ravel(), yy.ravel()].shape
(2500, 2)

You're getting the error because clf was fit on 9 features and expects input with 9 columns, but the grid you're passing in has only 2 columns.
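To see the mismatch directly, here is a minimal sketch that rebuilds the model from the question and compares the feature count of the training data with that of the plotting grid. The variable names n_model, n_grid, and raised are illustrative only. The exact wording of the error message depends on the scikit-learn version, so the sketch only checks that a ValueError is raised.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Same 9-feature training data as in the question
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# The plotting grid from the example only has two columns (xx and yy)
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
grid = np.c_[xx.ravel(), yy.ravel()]

n_model = X_train.shape[1]  # features the model was fit on
n_grid = grid.shape[1]      # features in the array passed to decision_function
print("model features:", n_model, "grid features:", n_grid)

# Passing the 2-column grid to the 9-feature model fails
try:
    clf.decision_function(grid)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Comparing X.shape[1] against the training data before calling predict() or decision_function() is a cheap way to catch this kind of problem early.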


The classifier itself is still usable without any trouble. You can still call decision_function() and predict() on 9-feature input, for example, but you won't be able to plot all 9 dimensions using the code you're going off of - it was only designed to plot in 2-D. Even running np.meshgrid() with 9 dimensions will almost certainly throw a MemoryError, since 50 points along each of 9 axes is 50^9 (about 1.95 x 10^15) grid points - see this discussion for more on that.
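If you still want a contour picture, one workaround (my own suggestion, not something the scikit-learn example does) is to plot a 2-D slice: vary two features over the grid and hold the other seven fixed at some reference point. The sketch below assumes features 0 and 1 are the ones to vary and uses the first training row as the reference point; both choices are arbitrary and the name anchor is hypothetical. It also works out the size of the full 9-D grid for comparison.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# A full 9-D grid at 50 points per axis is far too large to allocate
n_points = 50 ** 9
bytes_needed = n_points * 9 * 8  # 9 float64 columns per grid point
print(f"{n_points:.2e} grid points, roughly {bytes_needed:.1e} bytes")

# 2-D slice: vary features 0 and 1, hold the other seven at a reference point
anchor = X_train[0]  # assumption: any representative 9-feature row would do
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
grid = np.tile(anchor, (xx.size, 1))  # shape (2500, 9)
grid[:, 0] = xx.ravel()
grid[:, 1] = yy.ravel()

# The grid now has 9 columns, so the model accepts it
Z = clf.decision_function(grid).reshape(xx.shape)
print(grid.shape, Z.shape)
```

Z can then be passed to plt.contourf(xx, yy, Z) as in the 2-D example. Keep in mind that the picture only describes the model along that one slice, so it will change with the choice of reference point.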


Trying to plot 9-D space isn't going to be very helpful here, anyway. You might focus instead on more standard visual representations of classifier strength, like ROC curves or even a good old-fashioned confusion matrix.
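As a rough sketch of that kind of evaluation, the code below scores the model on the question's synthetic test data. It relies on an assumption that only holds for generated data like this: we know X_test rows are inliers and X_outliers rows are anomalies, so ground-truth labels are available. The names X_eval, y_true, cm, and auc are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Evaluation set with known labels: 1 = inlier, -1 = outlier,
# matching the values returned by IsolationForest.predict()
X_eval = np.r_[X_test, X_outliers]
y_true = np.r_[np.ones(len(X_test), dtype=int),
               -np.ones(len(X_outliers), dtype=int)]

# Confusion matrix: rows are true labels, columns are predictions
y_pred = clf.predict(X_eval)
cm = confusion_matrix(y_true, y_pred, labels=[1, -1])
print(cm)

# ROC AUC from the continuous scores (higher score = more normal),
# with inliers treated as the positive class
scores = clf.decision_function(X_eval)
auc = roc_auc_score((y_true == 1).astype(int), scores)
print("ROC AUC:", auc)
```

With real data you would need labeled anomalies for this to work; without labels, inspecting the distribution of decision_function() scores is about as far as you can go.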



