1. Using XGBoost for feature combination
1) XGBModel.apply(self, X, ntree_limit=0)
Returns, for each sample, the index of the predicted leaf in every tree.
X: the feature matrix, shape [n_samples, n_features].
ntree_limit: the number of trees to use in the prediction; defaults to 0 (use all trees).
The implementation in xgboost's sklearn wrapper:
def apply(self, X, ntree_limit=0):
    """Return the predicted leaf every tree for each sample.

    Parameters
    ----------
    X : array_like, shape=[n_samples, n_features]
        Input features matrix.
    ntree_limit : int
        Limit number of trees in the prediction; defaults to 0 (use all trees).

    Returns
    -------
    X_leaves : array_like, shape=[n_samples, n_trees]
        For each datapoint x in X and for each tree, return the index of the
        leaf x ends up in. Leaves are numbered within
        ``[0; 2**(self.max_depth+1))``, possibly with gaps in the numbering.
    """
    test_dmatrix = DMatrix(X, missing=self.missing)
    return self.get_booster().predict(test_dmatrix,
                                      pred_leaf=True,
                                      ntree_limit=ntree_limit)
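A minimal usage sketch of apply() (the dataset and hyper-parameters below are invented for illustration):

# Train an XGBClassifier, then read back the leaf each tree assigns
# to each sample; dataset and settings here are placeholders.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)

leaves = model.apply(X)  # shape [n_samples, n_trees] = (1000, 50)
print(leaves[0])         # leaf index of sample 0 in each of the 50 trees

Each column of leaves is effectively a categorical feature: the id of the leaf that one tree routes the sample into.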
- The difference between GBDT and GBDT+LR
My understanding is as follows:
GBDT: each new tree fits the residual between the true label and the accumulated predictions of the earlier trees, e.g. the third tree fits (y - ŷ1) - ŷ2.
GBDT+LR: the leaf predicted by each GBDT tree is combined again linearly, so the weight of each tree's output is learned automatically (a sketch follows below).
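A self-contained sketch of this GBDT+LR pipeline (all names and parameters are my own choices for illustration): one-hot encode the leaf indices returned by apply() and train a logistic regression on top, so each leaf gets its own learned weight.

# Hypothetical GBDT+LR sketch: one-hot encode each tree's leaf index
# and let logistic regression learn a weight per leaf.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbdt = XGBClassifier(n_estimators=50, max_depth=3)
gbdt.fit(X_train, y_train)

# apply() gives one leaf index per tree; treat each column as a
# categorical feature and one-hot encode it.
enc = OneHotEncoder(handle_unknown='ignore')
train_leaves = enc.fit_transform(gbdt.apply(X_train))
test_leaves = enc.transform(gbdt.apply(X_test))

# The LR coefficients are the linearly learned per-leaf weights.
lr = LogisticRegression(max_iter=1000)
lr.fit(train_leaves, y_train)
print(lr.score(test_leaves, y_test))

Fitting the encoder only on the training leaves and passing handle_unknown='ignore' keeps the pipeline from failing if a test sample lands in a leaf never seen during encoding.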
These are a beginner's study notes, kept to help my own later review; I am revising them as I learn, so if anything here is incorrect, please point it out.