How XGBoost Handles Missing Values

Date: 2022-12-20 21:59:15

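How does XGBoost handle missing values? The simplest and most intuitive way to describe it is the algorithm flow below, the sparsity-aware split finding procedure from [1]. At each candidate split, XGBoost tries sending the instances whose value for that feature is missing to the left child and then to the right child, and keeps whichever default direction gives the larger gain. In practice, you can encode missing values with a sentinel and declare it with missing=-999 or missing=-9999.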

[Figure: algorithm flow for how XGBoost handles missing values]
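
To make the default-direction idea concrete, here is a minimal pure-Python sketch for a single feature. It is an illustration under stated assumptions, not XGBoost's actual implementation: the function names `split_gain` and `best_split_with_default` are hypothetical, missing entries are represented as NaN, and both default directions are evaluated at each threshold in one pass, whereas the algorithm in [1] is written as two separate scans. The gain formula is the regularized split gain from [1], with `lam` and `gamma` standing in for the `lambda` and `gamma` parameters.

```python
import math


def split_gain(gl, hl, gr, hr, lam=1.0, gamma=0.0):
    """Regularized gain of a split, from left/right gradient and Hessian sums."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


def best_split_with_default(values, grads, hess, lam=1.0, gamma=0.0):
    """Find the best threshold and default direction for one feature.

    values: feature values, with float('nan') marking missing entries.
    Returns (gain, threshold, default_direction), or None if the
    non-missing values offer no place to split.
    """
    G, H = sum(grads), sum(hess)
    # Only non-missing entries are sorted and scanned.
    present = sorted(
        (v, g, h) for v, g, h in zip(values, grads, hess) if not math.isnan(v)
    )
    g_present = sum(g for _, g, _ in present)
    h_present = sum(h for _, _, h in present)
    # Statistics of the missing entries are whatever is left over.
    g_miss, h_miss = G - g_present, H - h_present

    best = None
    gl = hl = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        gl += g
        hl += h
        nxt = present[i + 1][0]
        if v == nxt:
            continue  # no threshold fits between equal values
        threshold = (v + nxt) / 2
        # Option 1: missing entries go to the left child.
        gain_left = split_gain(gl + g_miss, hl + h_miss,
                               g_present - gl, h_present - hl, lam, gamma)
        # Option 2: missing entries go to the right child.
        gain_right = split_gain(gl, hl, G - gl, H - hl, lam, gamma)
        for gain, direction in ((gain_left, "left"), (gain_right, "right")):
            if best is None or gain > best[0]:
                best = (gain, threshold, direction)
    return best


# Toy data: the two missing rows have the same gradient as the large values,
# so the best split should send missing entries to the right.
nan = float("nan")
values = [1.0, 2.0, nan, 3.0, nan, 4.0]
grads = [0.0, 0.0, -1.0, -1.0, -1.0, -1.0]
hess = [1.0] * 6
result = best_split_with_default(values, grads, hess)
print(result)
```

On this toy data the chosen split is at threshold 2.5 with default direction "right": the missing rows behave like the rows with large feature values, so grouping them together yields the highest gain.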


# An example

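import xgboost as xgb

# Data() is not defined in this post; it is assumed to return the training
# features, the training labels, and the test features.
train, target, test = Data()

# Declare -9999 as the missing-value sentinel for both matrices.
dtrain = xgb.DMatrix(train, label=target, missing=-9999)
dtest = xgb.DMatrix(test, missing=-9999)
watchlist = [(dtrain, 'train')]

param = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'gamma': 0.1,
    'max_depth': 6,
    'lambda': 150,
    'subsample': 0.8,
    'colsample_bytree': 0.7,
    'colsample_bylevel': 0.6,
    'eta': 0.14,
    'tree_method': 'exact',
    'seed': 0,
    'nthread': 10,
}

# Train for up to 2500 rounds; stop early if the AUC on the watchlist
# (here only the training set) has not improved for 40 rounds.
bst = xgb.train(param, dtrain, 2500, watchlist, early_stopping_rounds=40)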
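
The example above assumes that missing entries in the data already carry the sentinel -9999. As a rough sketch of that preparation step, the hypothetical helper `fill_missing` below (not part of XGBoost) replaces None and NaN with the sentinel in a list of rows, and refuses to continue if the sentinel already appears as a real value, since such a value would then be silently treated as missing.

```python
import math

MISSING = -9999.0  # same sentinel as passed to missing= in the example


def fill_missing(rows, sentinel=MISSING):
    """Return a copy of rows with None/NaN replaced by the sentinel.

    Raises ValueError if the sentinel collides with a real value.
    """
    filled = []
    for row in rows:
        new_row = []
        for x in row:
            if x is None or math.isnan(x):
                new_row.append(sentinel)
            elif x == sentinel:
                raise ValueError(f"sentinel {sentinel} already occurs in the data")
            else:
                new_row.append(float(x))
        filled.append(new_row)
    return filled


rows = [[1.0, None], [float("nan"), 3.5], [2.0, 4.0]]
filled = fill_missing(rows)
print(filled)
```

The filled rows can then be converted to an array and passed to xgb.DMatrix with missing=-9999. If the data already uses NaN for missing entries, this step can be skipped, because DMatrix treats NaN as missing by default.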

References:

[1] T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785-794, 2016.