Visualizing Apache Spark Decision Trees

Time: 2021-12-01 23:09:24

One of the issues I've run into with Apache Spark is visualizing decision trees.

I can produce a tree using DecisionTree.trainClassifier, and I can get some rudimentary output using the model's toDebugString().



But ideally, the current output:


If (feature 0 <= -35.0)
  If (feature 24 <= 176.0)
    Predict: 2.1
  If (feature 24 = 176.0)
    Predict: 4.2
  Else (feature 24 > 176.0)
    Predict: 6.3
Else (feature 0 > -35.0)
  If (feature 24 <= 11.0)
    Predict: 4.5
  Else (feature 24 > 11.0)
    Predict: 10.2

could instead be output as JSON, or some other parseable format, so that we could layer a D3 visualization library on top. Using the example above:

{"node": [
  {"rule": "feature 0 <= -35.0", "children": [
    {"rule": "feature 24 <= 176.0", "children": [
      {"rule": "feature 20 < 116.0", "predict": 2.1},
      {"rule": "feature 20 = 116.0", "predict": 4.2},
      {"rule": "feature 20 > 116.0", "predict": 6.3}
    ]}
  ]},
  {"rule": "feature 0 > -35.0", "children": [
    {"rule": "feature 3 <= 11.0", "predict": 4.5},
    {"rule": "feature 3 > 11.0", "predict": 10.2}
  ]}
]}



1 Answer



I came across the project Decision-Tree-Visualization-Spark, which is for visualizing decision tree models.

It has two steps:


  • Parse the Spark decision tree output into JSON.

  • Use the JSON file as input to a D3.js visualization.
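For the second step, D3's d3.hierarchy reads nested objects through a children property by default, so the JSON needs roughly that shape. Here is a minimal sketch of one way to get there, not the project's own code. The function name to_d3_hierarchy, the input keys "rule" and "predict", and the output key "name" are all assumptions made for illustration.

```python
import json


def to_d3_hierarchy(node):
    """Convert a {"rule", "children", "predict"} node into {"name", "children"}."""
    label = node.get("rule", "root")
    if "predict" in node:
        # Leaf: fold the prediction into the label so it shows up in the chart.
        return {"name": f"{label} -> predict {node['predict']}"}
    return {
        "name": label,
        "children": [to_d3_hierarchy(child) for child in node.get("children", [])],
    }


# Hypothetical parsed tree, written by hand for illustration only.
parsed = {
    "rule": "root",
    "children": [
        {"rule": "feature 0 <= -35.0", "predict": 2.1},
        {"rule": "feature 0 > -35.0", "predict": 10.2},
    ],
}

d3_tree = to_d3_hierarchy(parsed)
d3_json = json.dumps(d3_tree, indent=2)
print(d3_json)
```

The resulting string can be saved to a file and loaded on the D3 side. Folding the prediction into the leaf label is just one choice; keeping it as a separate field works as well if the chart code reads it.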

For the parser, check Dt.py.


The input to the function tree_json(tree) is your model's toDebugString() output.
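To give an idea of what such a parser involves, here is a rough sketch of my own. It is not the code from Dt.py, and the name debug_string_to_tree is hypothetical. It assumes the debug string consists of "If (...)", "Else (...)" and "Predict: ..." lines whose nesting is given by indentation, and it skips any other line, such as a header.

```python
import json
import re

# Matches "If (feature 0 <= -35.0)" or "Else (feature 0 > -35.0)".
RULE_RE = re.compile(r"^(If|Else)\s*\((.*)\)\s*$")
# Matches "Predict: 2.1".
PREDICT_RE = re.compile(r"^Predict:\s*(\S+)\s*$")


def debug_string_to_tree(text):
    """Turn If/Else/Predict lines into nested dicts, using indentation for depth."""
    root = {"rule": "root"}
    # Stack of (indent, node); the -1 sentinel keeps the root on the stack.
    stack = [(-1, root)]
    for raw in text.splitlines():
        line = raw.strip()
        rule = RULE_RE.match(line)
        pred = PREDICT_RE.match(line)
        if not rule and not pred:
            continue  # blank line or a header line
        indent = len(raw) - len(raw.lstrip())
        # Pop nodes that are not ancestors of the current line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if pred:
            parent["predict"] = float(pred.group(1))
        else:
            node = {"rule": rule.group(2)}
            parent.setdefault("children", []).append(node)
            stack.append((indent, node))
    return root


# Sample input in the same style as the output shown in the question.
SAMPLE = """\
If (feature 0 <= -35.0)
  If (feature 24 <= 176.0)
    Predict: 2.1
  Else (feature 24 > 176.0)
    Predict: 6.3
Else (feature 0 > -35.0)
  If (feature 24 <= 11.0)
    Predict: 4.5
  Else (feature 24 > 11.0)
    Predict: 10.2
"""

tree = debug_string_to_tree(SAMPLE)
print(json.dumps(tree, indent=2))
```

With a real model you would pass the string returned by toDebugString() instead of SAMPLE. The sketch tracks relative indentation rather than a fixed width, so it should not matter how many spaces each level uses.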


