Number of features of the model must match the input?

Time: 2022-02-18 20:26:58

I'm trying to use a RandomForestClassifier on some data I have. The code is below:

print train_data[0,0:20]
print train_data[0,21::]
print test_data[0]

print 'Training...'
forest = RandomForestClassifier(n_estimators=100)
forest = forest.fit( train_data[0::,0::20], train_data[0::,21::] )

print 'Predicting...'
output = forest.predict(test_data)

but this generates the following error:

ValueError: Number of features of the model must match the input. Model n_features is 3 and input n_features is 21

The output from the first three print statements is:

[   0.            0.            0.            0.            1.            0.
    0.            0.            0.            0.            1.            0.
    0.            0.            0.           37.7745986  -122.42589168
    0.            0.            0.        ]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  1.  0.]
[   0.            0.            0.            0.            0.            0.
    0.            1.            0.            0.            1.            0.
    0.            0.            0.            0.           37.73505101
 -122.3995877     0.            0.            0.        ]

I had assumed that the data was in the correct format for my fit/predict calls, but it is erroring out on the predict. Can anyone see what I am doing wrong here?

1 Answer

#1


The model is trained on train_data[0::,0::20], which I think is a mistake: 0::20 is a step slice that takes every 20th column (why skip the features in between?), which is why the model ends up with only 3 features. Based on the debug prints at the top of your code, it should be train_data[0::,0:20] instead.
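
To see the difference between the two slices, here is a small sketch on plain column indices. The 59-column width is my assumption, inferred from your prints (20 feature values, one skipped column, and 38 values from train_data[0,21::]); the variable names are only for illustration.

```python
# Column indices of a hypothetical 59-column training matrix
# (assumed layout: 20 feature columns + 1 skipped column + 38 label columns).
columns = list(range(59))

strided = columns[0::20]   # start at 0, then take every 20th column
first_20 = columns[0:20]   # columns 0 through 19

print(strided)        # [0, 20, 40]
print(len(strided))   # 3, matching "Model n_features is 3" in the error
print(len(first_20))  # 20
```

NumPy applies the same start:stop:step rule to each axis, so train_data[0::,0::20] would keep only columns 0, 20 and 40.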

Also, it seems that the last column represents the labels in both train_data and test_data. When predicting, you might want to pass test_data[:, :20] instead of test_data when calling the predict function.
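
Putting both changes together, a minimal sketch could look like the following. The arrays are random stand-ins, not your data: I am assuming 59 training columns (features in 0 to 19, one-hot labels from column 21 on) and 21 test columns, as your prints suggest. It is written for Python 3, so print is a function.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Hypothetical stand-ins for train_data and test_data (shapes are assumptions).
train_data = rng.rand(50, 59)
train_data[:, 21:] = 0
# Give every row exactly one active label column, i.e. one-hot labels.
train_data[np.arange(50), 21 + rng.randint(0, 38, size=50)] = 1
test_data = rng.rand(10, 21)

X_train = train_data[:, 0:20]   # first 20 columns as features
y_train = train_data[:, 21:]    # label columns, same slice as in the question
X_test = test_data[:, 0:20]     # same 20 feature columns at predict time

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
output = forest.predict(X_test)

# Feature counts now agree between fit and predict.
print(X_train.shape[1], X_test.shape[1], output.shape)
```

The key point is that whatever column slice you pass to fit, the same columns have to be passed to predict.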
