Implementing the C4.5 Decision Tree Algorithm in Python (an Improvement on ID3)

Date: 2021-12-19 03:19:24

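1. Overview
C4.5 is an improvement on ID3. When ID3 picks the attribute for a tree node, it chooses the attribute with the largest information gain. C4.5 introduces a new concept, the information gain ratio, and chooses the attribute with the largest information gain ratio as the tree node.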
2. Information Gain
[Formula image not preserved: information gain]

The formula above computes the information gain (a concept from ID3).
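Since the formula image is missing, here is a minimal sketch of the quantity it presumably showed, assuming the standard definition that `calcShannonEnt` and `chooseBestFeatureToSplit` in section 4 also follow: Gain(S, A) = H(S) - sum over each value v of A of (|S_v| / |S|) * H(S_v), where H is the Shannon entropy of the class labels. The function names and the toy rows are illustrative assumptions, not taken from the article.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, column):
    """Entropy of the class column minus the weighted entropy after splitting on `column`."""
    base = entropy([row[-1] for row in rows])
    groups = {}
    for row in rows:
        # Collect the class labels of the rows that share a value in `column`
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy data (not the article's data set): [outlook, windy, class]
rows = [
    ["sunny", "false", "N"],
    ["sunny", "true", "N"],
    ["rainy", "false", "Y"],
    ["rainy", "true", "Y"],
]
print(information_gain(rows, 0))  # 1.0: outlook separates the classes perfectly
print(information_gain(rows, 1))  # 0.0: windy says nothing about the class
```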
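3. Information Gain Ratio
[Formula image not preserved: information gain ratio]
The information gain ratio is obtained by first computing the information gain and then dividing it by the split information (`splitInfo` in the code in section 4).
For example, the following formula computes the split information of the attribute "outlook":
[Formula image not preserved: split information of "outlook"]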
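The two formula images are missing, so the sketch below assumes the standard C4.5 definitions, which match the `splitInfo` computation in section 4: SplitInfo(S, A) = -sum over each value v of A of (|S_v| / |S|) * log2(|S_v| / |S|), and GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A). The `outlook` values and the gain of 0.75 are made-up illustrative numbers, not the article's data.

```python
from collections import Counter
from math import log2

def split_info(values):
    """Entropy of the attribute's own value distribution: -sum(p * log2(p))."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(info_gain, values):
    """Information gain divided by split information (0.0 if the attribute has one value)."""
    si = split_info(values)
    return info_gain / si if si > 0 else 0.0

# Hypothetical 'outlook' column: 2 sunny, 2 overcast, 4 rainy
outlook = ["sunny", "sunny", "overcast", "overcast", "rainy", "rainy", "rainy", "rainy"]
print(split_info(outlook))        # 1.5
print(gain_ratio(0.75, outlook))  # 0.5 (0.75 is an assumed information gain)
```

Guarding the single-value case avoids a division by zero, because the split information of an attribute that takes only one value is 0.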
4. Complete C4.5 Code

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">from math import log
import operator

# Compute the Shannon entropy of the given data set
def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}  # class label -> number of rows with that label
    for featVec in dataSet:
        currentLabel = featVec[-1]  # the class label is the last column
        if currentLabel not in labelCounts:  # first time this label is seen
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:  # accumulate -p * log2(p) over all classes
        prob = float(labelCounts[key]) / numEntries  # fraction of rows in this class
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt  # the entropy, in bits

# Split the data set on a given feature
def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:  # keep the rows whose column `axis` equals `value`
        if featVec[axis] == value:  # build a new row with column `axis` removed
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet  # the matching rows, without column `axis`

# Choose the best feature to split on (C4.5: the largest information gain ratio)
def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1  # number of attributes (the last column is the class)
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):  # evaluate every attribute
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)  # distinct values of attribute i
        newEntropy = 0.0
        splitInfo = 0.0
        for value in uniqueVals:  # weighted entropy over each value of attribute i
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))  # share of rows with this value
            newEntropy += prob * calcShannonEnt(subDataSet)  # sum of the weighted subset entropies
            splitInfo -= prob * log(prob, 2)  # split information of attribute i
        if splitInfo == 0:  # attribute has a single value here; skip it to avoid dividing by zero
            continue
        infoGain = (baseEntropy - newEntropy) / splitInfo  # information gain ratio of attribute i
        print(infoGain)
        if infoGain > bestInfoGain:  # remember the largest gain ratio and its column index i
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature

# Return the class label that occurs most often
def majorityCnt(classList):
    classCount = {}
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Build the tree
def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]  # class labels of the training rows (e.g. [N, N, Y, Y, Y, N, Y] at the top level)
    if classList.count(classList[0]) == len(classList):  # all rows belong to one class: return that class
        return classList[0]
    if len(dataSet[0]) == 1:  # no attributes left, only the class column: return the majority class
        return majorityCnt(classList)

    bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the attribute with the largest gain ratio
    if bestFeat == -1:  # no attribute improves the split: fall back to the majority class
        return majorityCnt(classList)
    bestFeatLabel = labels[bestFeat]  # look up the attribute name by index; it becomes the root of this subtree
    myTree = {bestFeatLabel: {}}  # empty tree rooted at bestFeatLabel
    del(labels[bestFeat])  # remove the chosen attribute from the attribute list (this modifies the caller's list)
    featValues = [example[bestFeat] for example in dataSet]  # values of the chosen attribute in all training rows
    uniqueVals = set(featValues)  # the distinct values of that attribute
    for value in uniqueVals:  # one branch per attribute value
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)  # build each branch recursively
    return myTree  # the finished tree

# Classify a sample with the decision tree
def classify(inputTree, featLabels, testVec):
    firstStr = list(inputTree.keys())[0]  # attribute tested at this node
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)
    classLabel = None  # stays None if the sample's value has no branch in the tree
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                classLabel = secondDict[key]
    return classLabel

# Read the training data from the data file (builds a 2-D list)
def createTrainData():
    with open('../data/ID3/Dataset.txt') as f:
        lines_set = f.readlines()
    labelLine = lines_set[2]  # the third line of the file holds the attribute names
    labels = labelLine.strip().split()
    lines_set = lines_set[4:11]  # the training rows
    dataSet = []
    for line in lines_set:
        data = line.split()
        dataSet.append(data)
    return dataSet, labels


# Read the test data from the data file (builds a 2-D list)
def createTestData():
    with open('../data/ID3/Dataset.txt') as f:
        lines_set = f.readlines()
    lines_set = lines_set[15:22]  # the test rows
    dataSet = []
    for line in lines_set:
        data = line.strip().split()
        dataSet.append(data)
    return dataSet

myDat, labels = createTrainData()
myTree = createTree(myDat, labels)
print(myTree)
# createTree removes entries from `labels`, so classify gets a fresh attribute list
bootList = ['outlook', 'temperature', 'humidity', 'windy']
testList = createTestData()
for testData in testList:
    dic = classify(myTree, bootList, testData)
    print(dic)</code>

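5. Code differences between C4.5 and ID3
[Figure: screenshot of the code comparison; image not preserved]
As the figure above shows, the C4.5 code differs from the ID3 code mainly at lines 52 and 53: ID3 computes the information gain of each attribute, whereas C4.5 computes the information gain ratio.
6. Sample training and test datasets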
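The difference between the two criteria can be sketched in a few lines. The following is a minimal illustration, not the article's code: the function names `entropy` and `info_gain_and_ratio` and the toy data are assumptions introduced here. It computes both quantities for one attribute, so that an ID3-style selection would compare the first value and a C4.5-style selection the second.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain_and_ratio(rows, feature_index):
    """Return (information gain, gain ratio) for splitting on one feature.

    Each row is a list whose last element is the class label.
    """
    total = len(rows)
    base = entropy([row[-1] for row in rows])

    # Group the class labels by the value of the chosen feature.
    groups = {}
    for row in rows:
        groups.setdefault(row[feature_index], []).append(row[-1])

    # Weighted entropy after the split (needed by both ID3 and C4.5).
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = base - remainder  # the ID3 criterion

    # Split information: entropy of the feature's own value distribution.
    split_info = entropy([row[feature_index] for row in rows])

    # Guard against a feature with a single value (split_info == 0).
    ratio = gain / split_info if split_info > 0 else 0.0  # the C4.5 criterion
    return gain, ratio


# Hypothetical toy data: [outlook, windy, class]
toy = [
    ["sunny", "false", "N"],
    ["sunny", "true", "N"],
    ["overcast", "false", "Y"],
    ["rain", "false", "Y"],
]
for i, name in enumerate(["outlook", "windy"]):
    gain, ratio = info_gain_and_ratio(toy, i)
    print(f"{name}: gain={gain:.3f}, gain ratio={ratio:.3f}")
```

Dividing by the split information penalizes attributes that have many distinct values, which plain information gain tends to favor.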

<code class="hljs livecodeserver has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">Training set:

outlook  temperature humidity windy
---------------------------------------------------------
sunny    hot         high     false N
sunny    hot         high     true  N
overcast hot         high     false Y
rain     mild        high     false Y
rain     cool        normal   false Y
rain     cool        normal   true  N
overcast cool        normal   true  Y

Test set:

outlook  temperature humidity windy
---------------------------------------------------------
sunny    mild        high     false
sunny    cool        normal   false
rain     mild        normal   false
sunny    mild        normal   true
overcast mild        high     true
overcast hot         normal   false
rain     mild        high     true</code>
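As a rough sketch of how samples in this layout could be loaded, the snippet below parses the training set into a list of feature names and a list of rows. The helper `parse_samples` and the constant `TRAINING_TEXT` are hypothetical names introduced for illustration; the article's own loading code may work differently. It assumes whitespace-separated columns, with the class label (`N` or `Y`) as the last field of each training row.

```python
def parse_samples(text):
    """Split a whitespace-separated table into (feature names, rows)."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    # Drop blank lines and separator lines made only of dashes.
    lines = [f for f in lines if f and not set("".join(f)) <= {"-"}]
    return lines[0], lines[1:]


# Training samples copied from the listing above.
TRAINING_TEXT = """
outlook temperature humidity windy
---------------------------------------------------------
sunny hot high false N
sunny hot high true N
overcast hot high false Y
rain mild high false Y
rain cool normal false Y
rain cool normal true N
overcast cool normal true Y
"""

features, rows = parse_samples(TRAINING_TEXT)
print(features)
print(len(rows), "training samples")
```

The same helper also works for the test set, whose rows simply lack the trailing class label.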