How do I find which attributes my tree splits on when using scikit-learn?

Time: 2021-01-31 07:01:07

I have been exploring scikit-learn, building decision trees with both the entropy and Gini splitting criteria and comparing the results.

My question is: how can I "open the hood" and find out exactly which attributes the trees split on at each level, along with their associated information values, so I can see where the two criteria make different choices?

So far, I have explored the 9 methods outlined in the documentation. They don't appear to give access to this information. But surely it is accessible? I'm envisioning a list or dict that has entries for node and gain.

Thanks for your help, and my apologies if I've missed something completely obvious.

2 answers

#1


26  

Directly from the documentation (http://scikit-learn.org/0.12/modules/tree.html):

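from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
from sklearn import tree

out = StringIO()
tree.export_graphviz(clf, out_file=out)  # clf is a fitted tree; the DOT source is written into out
print(out.getvalue())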
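Since the question is about comparing the two criteria, here is a minimal sketch of doing that with the same export function. It assumes a recent scikit-learn, where passing out_file=None makes export_graphviz return the DOT source as a string. The iris data, max_depth=2 and the name dot_by_criterion are only illustrative choices, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Assumption: the iris data stands in for your own data set.
iris = load_iris()

dot_by_criterion = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=2, random_state=0)
    clf.fit(iris.data, iris.target)
    # With out_file=None the DOT source is returned instead of written to a file.
    dot_by_criterion[criterion] = export_graphviz(
        clf, out_file=None, feature_names=iris.feature_names
    )

# Each node label in the DOT text shows the split test and the node impurity.
for criterion, dot in dot_by_criterion.items():
    print(f"--- {criterion} ---")
    print(dot)
```

Reading the two DOT listings side by side (or rendering them with Graphviz) shows where the split feature or threshold differs between the criteria.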

The decision tree object also has a tree_ attribute, which gives direct access to the whole structure.

You can simply read its arrays, which are indexed by node id:

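clf.tree_.children_left   # id of each node's left child (-1 for a leaf)
clf.tree_.children_right  # id of each node's right child (-1 for a leaf)
clf.tree_.feature         # index of the feature each node splits on (-2 for a leaf)
clf.tree_.threshold       # split threshold of each node (-2.0 for a leaf)
clf.tree_.value           # per-node class distribution (classifier) or predicted value (regressor)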

For more details, look at the source code of the export_graphviz function.
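To get something close to the "list or dict with entries for node and gain" from the question, the arrays can be walked from the root. The following is a sketch rather than an official API: describe_splits is a hypothetical helper name, and the iris data is an assumed stand-in. It also uses tree_.impurity and tree_.weighted_n_node_samples, which sit next to the arrays listed above, and computes the gain of a split as the parent impurity minus the sample-weighted average impurity of the two children.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def describe_splits(clf, feature_names):
    """Return one dict per internal node (hypothetical helper, not part of scikit-learn)."""
    t = clf.tree_
    rows = []
    stack = [(0, 0)]  # (node id, depth), starting at the root
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # a leaf has both children set to -1
            continue
        n = t.weighted_n_node_samples[node]
        n_left = t.weighted_n_node_samples[left]
        n_right = t.weighted_n_node_samples[right]
        # Impurity decrease: parent impurity minus weighted child impurity.
        gain = t.impurity[node] - (
            n_left * t.impurity[left] + n_right * t.impurity[right]
        ) / n
        rows.append({
            "node": int(node),
            "depth": depth,
            "feature": feature_names[t.feature[node]],
            "threshold": float(t.threshold[node]),
            "impurity": float(t.impurity[node]),
            "gain": float(gain),
        })
        stack.append((right, depth + 1))
        stack.append((left, depth + 1))  # pushed last so the left subtree is visited first
    return rows


# Assumption: the iris data stands in for your own data set.
iris = load_iris()
splits = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(iris.data, iris.target)
    splits[criterion] = describe_splits(clf, iris.feature_names)
    print(criterion)
    for row in splits[criterion]:
        print("  " * row["depth"], row)
```

With criterion="entropy" the gain column is the information gain of each split; with criterion="gini" it is the decrease in Gini impurity, so the two columns are not on the same scale.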

More generally, you can use the inspect module

from inspect import getmembers
print( getmembers( clf.tree_ ) )

to list all of the object's members.


#2


9  

If you just want a quick look at what is going on in the tree, try:

list(zip(X.columns[clf.tree_.feature], clf.tree_.threshold, clf.tree_.children_left, clf.tree_.children_right))  # list() is needed on Python 3, where zip is lazy

where X is the data frame of independent variables and clf is the decision tree object. Notice that clf.tree_.children_left and clf.tree_.children_right together describe how the nodes are connected, and so the order in which the splits were made (each entry corresponds to an arrow in the graphviz visualization). One caveat: leaf nodes store -2 in clf.tree_.feature, so the column name shown for a leaf row is just whatever column sits at position -2 and should be ignored.
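If the leaf rows get in the way, one option is to build the same table with leaves masked out. This is a sketch under assumptions: pandas and NumPy are available, the iris data stands in for X, and the column names of the nodes frame are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: the iris data stands in for your own X and y.
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, iris.target)

t = clf.tree_
is_split = t.children_left != t.children_right  # False for leaves (both children are -1)

# One row per node id; feature and threshold are blanked for leaves.
nodes = pd.DataFrame({
    "feature": [X.columns[f] if s else None for f, s in zip(t.feature, is_split)],
    "threshold": np.where(is_split, t.threshold, np.nan),
    "left": t.children_left,
    "right": t.children_right,
    "impurity": t.impurity,
})
print(nodes)
```

The row index is the node id, so the left and right columns can be used to look up the child rows directly.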
