The tunable hyperparameters of a decision tree include:
- max_depth (maximum depth of the tree)
- max_leaf_nodes (maximum number of leaf nodes)
- max_features (maximum number of features considered at each split)
- min_samples_leaf (minimum number of samples required at a leaf node)
- min_samples_split (minimum number of samples required to split an internal node)
- min_weight_fraction_leaf (minimum fraction of the total sample weight required at a leaf node)
- min_impurity_split (minimum impurity threshold; deprecated in newer scikit-learn versions in favor of min_impurity_decrease)
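Before tuning, it helps to see what these hyperparameters do when set directly on the estimator. The following is a minimal sketch; the synthetic data from `make_classification` and the specific parameter values are illustrative assumptions, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy data (assumption; any binary classification data works)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = DecisionTreeClassifier(
    max_depth=4,               # cap the depth of the tree
    max_leaf_nodes=16,         # cap the number of leaves
    max_features=5,            # features considered at each split
    min_samples_leaf=3,        # each leaf must hold at least 3 samples
    min_samples_split=6,       # an internal node needs >= 6 samples to split
    min_weight_fraction_leaf=0.0,
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth())  # will not exceed max_depth
```

The fitted tree respects these caps, which is why shrinking them is the standard way to regularize a decision tree.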
1. Data preparation is the same as in "Using Decision Trees in sklearn" and is not repeated here.
2. Usage steps
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score

model_DD = DecisionTreeClassifier()
max_depth = range(1, 10, 1)
min_samples_leaf = range(1, 10, 2)
tuned_parameters = dict(max_depth=max_depth, min_samples_leaf=min_samples_leaf)
DD = GridSearchCV(model_DD, tuned_parameters, cv=10)
DD.fit(X_train, y_train)
print("Best: %f using %s" % (DD.best_score_, DD.best_params_))
y_prob = DD.predict_proba(X_test)[:, 1]  # positive-class prediction probabilities
y_pred = np.where(y_prob > 0.5, 1, 0)    # threshold the probabilities to get class predictions
print('The AUC of GridSearchCV Decision Tree is', roc_auc_score(y_test, y_pred))
# DD.grid_scores_  # removed in newer scikit-learn; use DD.cv_results_ instead
test_means = DD.cv_results_['mean_test_score']
# test_stds = DD.cv_results_['std_test_score']
# pd.DataFrame(DD.cv_results_).to_csv('DD_min_samples_leaf_maxdepth.csv')
# plot results
import matplotlib.pyplot as plt
test_scores = test_means.reshape(len(max_depth), len(min_samples_leaf))
for i, value in enumerate(max_depth):
    plt.plot(min_samples_leaf, test_scores[i], label='test_max_depth:' + str(value))
plt.legend()
plt.xlabel('min_samples_leaf')
plt.ylabel('accuracy')
plt.show()
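Beyond plotting mean test scores, GridSearchCV also refits the best parameter combination on the full training set, so the tuned model can be used directly. The sketch below is self-contained; the synthetic data from `make_classification`, the train/test split, and the smaller grid (cv=3 for speed) are assumptions for illustration, not the post's own data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data and split (assumption; the post uses its own X_train/y_train)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {'max_depth': range(1, 6), 'min_samples_leaf': range(1, 10, 2)},
    cv=3,
)
grid.fit(X_train, y_train)

# best_estimator_ is the best model, already refit on all of X_train
best_tree = grid.best_estimator_
print(grid.best_params_)
print(best_tree.score(X_test, y_test))
```

Using `best_estimator_` avoids retraining by hand with `best_params_`; calling `predict` or `predict_proba` on the `GridSearchCV` object itself delegates to this refit model.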