A small XGBoost learning example

Time: 2022-09-09 07:40:02
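# xgboost
import numpy as np
import xgboost as xgb

# data_train, data_watch and data_predict are pandas DataFrames;
# in data_train and data_watch the last column is the label.
# The prediction set can be taken straight from pandas; columns 0-1 are skipped
# (presumably identifier fields) and the remaining columns are the features.
data_predict2 = data_predict.iloc[:, 2:]
# Training-set features and their labels (.ix was removed in pandas 1.0, so use .iloc)
dtrain = xgb.DMatrix(data_train.iloc[:, :-1], label=data_train.iloc[:, -1], missing=np.nan)
# Watch (validation) set monitored during training; same layout as the training set
dwatch = xgb.DMatrix(data_watch.iloc[:, :-1], label=data_watch.iloc[:, -1], missing=np.nan)
# Convert the prediction DataFrame to a DMatrix
dtest = xgb.DMatrix(data_predict2, missing=np.nan)

# Alternative DMatrix pair built from the stacked 2-hour data (not used in the training below)
dtrain2 = xgb.DMatrix(data_2hour_stack, label=data_2hour_label_stack, missing=np.nan)
dtest2 = xgb.DMatrix(data_2hour_test_stack, missing=np.nan)
# Parameter settings
param = {'verbosity': 0,                   # replaces the deprecated 'silent': 1
         'eta': 0.1,
         'nthread': 8,
         'objective': 'reg:squarederror',  # replaces the deprecated 'reg:linear'
         'eval_metric': 'rmse'
         }
circle = 1000  # number of boosting rounds
watchlist = [(dtrain, 'train'), (dwatch, 'watch')]
xgb_model = xgb.train(param, dtrain, num_boost_round=circle, evals=watchlist)
predict_xgb = xgb_model.predict(dtest)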

Fixing training and test sets whose feature dimensions (maximum feature index) differ:
"feature_names mismatch": an analysis of the XGBoost error
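XGBoost raises this error at prediction time when the features of the test DMatrix (their names and order) do not match those the model was trained on. A minimal sketch of one common fix, assuming both sets are pandas DataFrames rather than libsvm files: reindex the test columns to the training columns. The frames below are hypothetical; filling absent columns with NaN (which XGBoost treats as missing) is a design choice, not the only option.

```python
import pandas as pd

# Hypothetical frames: the test set has its columns in a different order,
# carries an extra column ("extra") and lacks a training column ("f2").
X_train = pd.DataFrame({'f0': [1.0, 2.0], 'f1': [3.0, 4.0], 'f2': [5.0, 6.0]})
X_test = pd.DataFrame({'f1': [7.0], 'extra': [0.0], 'f0': [8.0]})

# reindex puts the test columns in the training order, drops columns the model
# never saw, and adds training-only columns filled with NaN.
X_test_aligned = X_test.reindex(columns=X_train.columns)
```

After this step, a DMatrix built from `X_test_aligned` has the same feature names, in the same order, as one built from `X_train`.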

Converting between Python and pandas data types:
http://blog.csdn.net/flyfrommath/article/details/69388675
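Type conversion matters here because XGBoost rejects object (string) columns when building a DMatrix from a DataFrame; features must be numeric, boolean or categorical. A small sketch with a hypothetical raw frame (column names and values are invented for illustration) shows `pd.to_numeric`, `astype`, and the conversions back to plain Python and NumPy objects.

```python
import pandas as pd

# Hypothetical raw frame: numbers were read in as strings, with one unparsable entry.
raw = pd.DataFrame({'speed': ['1.5', '2.0', 'n/a'], 'count': ['3', '4', '5']})

# to_numeric(errors='coerce') turns unparsable strings into NaN;
# astype converts a column whose values are all parseable.
clean = pd.DataFrame({
    'speed': pd.to_numeric(raw['speed'], errors='coerce'),
    'count': raw['count'].astype('int64'),
})

# Back to plain Python / NumPy objects: a list of values, or a 2-D float array.
speed_list = clean['speed'].tolist()
features = clean.to_numpy(dtype=float)
```

Coercing bad values to NaN fits the XGBoost workflow above, since `missing=np.nan` makes the booster treat them as missing rather than failing.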

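How to rename a column (here, the single unnamed column of a predictions DataFrame):
from pandas import DataFrame
ypred1 = DataFrame(ypred1)
ypred1_column = list(ypred1.columns)
ypred1.rename(columns={ypred1_column[0]: 'label'}, inplace=True)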
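When the predictions are a 1-D NumPy array, as returned by `Booster.predict`, the column can also be named directly. A short sketch with a hypothetical prediction array shows two equivalent forms of the rename approach above.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions, e.g. the array returned by xgb_model.predict(dtest)
ypred1 = np.array([0.5, 1.25, 2.0])

# Name the column when the DataFrame is created...
ypred_df = pd.DataFrame(ypred1, columns=['label'])

# ...or rename the default integer column 0 afterwards
ypred_df2 = pd.DataFrame(ypred1).rename(columns={0: 'label'})
```

Passing `columns=` at construction avoids the intermediate step of reading back the default column name.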

Reference URLs:
http://blog.csdn.net/leichaoaizhaojie/article/details/52629549
http://blog.csdn.net/u011089523/article/details/72812019
http://blog.csdn.net/lujiandong1/article/details/52743396