Python Machine Learning: Decision Trees 1

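Yesterday I studied KNN; today I'm looking at decision trees. This is a commonly used machine learning algorithm that works for both regression and classification. Looking at the example in the book, it feels just like ordinary control flow, such as a switch or a chain of if/else conditions.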

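The idea is really quite simple. We have a set in which every sample is described by several attributes. A decision tree uses a greedy strategy to pick the best attribute at each step: for a discrete attribute, the different attribute values become the child nodes, and for a continuous attribute, a chosen split point on that attribute defines the node. Each sample is routed into one of the subtrees, and the samples in each subtree are then split recursively in the same way until some stopping condition is met. Decision trees fit data very closely and often overfit, so the tree is usually pruned to reduce its complexity and improve generalization. Common algorithms include ID3, C4.5, and CART.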
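Here is a minimal sketch of that recursive, greedy procedure for discrete attributes only. The names `build_tree` and `split_errors`, the toy apple data, and the scoring rule (counting the samples a per-branch majority vote would get wrong) are all assumptions made for illustration. ID3, C4.5, and CART each use their own criterion, and pruning is left out.

```python
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def split_errors(rows, labels, attr):
    """Samples misclassified if every branch of `attr` predicts its majority label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())

def build_tree(rows, labels, attrs):
    # Stopping conditions: the node is pure, or no attributes are left.
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)
    # Greedy step: take the attribute that looks best at this node only.
    best = min(attrs, key=lambda a: split_errors(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Recurse on the samples that fall into this branch.
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     remaining)
    return {"attr": best, "children": children}

# Hypothetical toy data: the label depends on smell only.
rows = [
    {"color": "red", "smell": "sweet"},
    {"color": "red", "smell": "sour"},
    {"color": "green", "smell": "sweet"},
    {"color": "green", "smell": "sour"},
]
labels = ["good", "bad", "good", "bad"]

tree = build_tree(rows, labels, ["color", "smell"])
print(tree)
```

On this toy data the sketch splits on `smell` first, because that split leaves no misclassified samples, and both branches become leaves straight away.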

After seeing the book's diagram, I wanted to try a small experiment. Not the algorithm itself, of course, just plain control flow:

#include <iostream>
using namespace std;

// One sample: its age plus graded color and smell.
struct node
{
    int age;
    int color;
    int smell;
};

// Leaf decision: print the verdict for apple i from its smell grade.
// 1 is the best, 2 is average, 3 is bad.
void report(int i, int smell)
{
    if (smell == 1)
    {
        cout << "apple " << i << " is great" << endl;
    }
    else if (smell == 2)
    {
        cout << "apple " << i << " not really great" << endl;
    }
    else
    {
        cout << "apple " << i << " is bad" << endl;
    }
}

int main()
{
    node appletest[3];
    cout << "input tester:" << endl;
    for (int i = 0; i < 3; i++)
    {
        cout << "input tester " << i << "'s age:";
        cin >> appletest[i].age;
        cout << "input tester " << i << "'s color:";
        cin >> appletest[i].color;
        cout << "input tester " << i << "'s smell:";
        cin >> appletest[i].smell;
    }
    cout << "start " << endl;
    for (int i = 0; i < 3; i++)
    {
        // First level: age. Second level: color. Third level: smell (in report).
        if (appletest[i].age < 10)
        {
            if (appletest[i].color == 1)
            {
                report(i, appletest[i].smell);
            }
            else if (appletest[i].color == 2)
            {
                report(i, appletest[i].smell);
            }
            else
            {
                report(i, appletest[i].smell);
            }
        }
        else if (appletest[i].age >= 10 && appletest[i].age < 25)
        {
            if (appletest[i].color == 1)
            {
                report(i, appletest[i].smell);
            }
            else if (appletest[i].color == 2)
            {
                report(i, appletest[i].smell);
            }
            else
            {
                report(i, appletest[i].smell);
            }
        }
        else
        {
            if (appletest[i].color == 1)
            {
                report(i, appletest[i].smell);
            }
            else if (appletest[i].color == 2)
            {
                report(i, appletest[i].smell);
            }
            else
            {
                report(i, appletest[i].smell);
            }
        }
    }
    return 0;
}

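It's long and tedious, with every branch written out by hand, so a real algorithm is clearly the way to go. Thankfully, people before us have built up plenty of algorithms that we can simply use. Nice!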
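As a rough illustration of the difference, the same decisions can be stored as data and walked by one small function. In the C++ program the age and color tests never change the verdict, so a single test on smell is enough here. The dictionary layout and the `predict` function below are my own assumptions for this sketch, not a library API.

```python
# Leaves are plain strings; an internal node names the attribute it tests.
tree = {
    "attr": "smell",
    "children": {1: "is great", 2: "not really great"},
    "default": "is bad",  # any other smell grade
}

def predict(node, sample):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while isinstance(node, dict):
        value = sample[node["attr"]]
        node = node["children"].get(value, node["default"])
    return node

# Hypothetical inputs, in the same shape as the C++ struct.
apples = [
    {"age": 5, "color": 1, "smell": 1},
    {"age": 12, "color": 2, "smell": 2},
    {"age": 30, "color": 3, "smell": 3},
]

verdicts = [predict(tree, apple) for apple in apples]
for i, verdict in enumerate(verdicts):
    print(f"apple {i} {verdict}")
```

Changing the rules now means editing the data rather than rewriting nested branches, and a learning algorithm's job is to produce that data from samples.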

1. Feature attributes:

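To build a decision tree, the key question at every split is which attribute to split on. In information theory, entropy describes the uncertainty of a random variable's distribution. For a discrete random variable $X$ with $n$ possible values $x_1, x_2, \dots, x_n$, we estimate probabilities by frequencies, and the probability distribution of the random variable is:

$p_i = P(X = x_i), \quad i = 1, 2, \dots, n$

The entropy of $X$, which is the entropy of $p$, is then defined as:

$H(X) = H(p) = -\sum_{i=1}^{n} p_i \log p_i$, with the convention $0 \log 0 = 0$.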
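A small sketch of computing entropy from observed values, using frequencies as probabilities. The function name `entropy`, the sample lists, and the choice of base-2 logarithms (so the result is in bits) are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical entropy in bits, with each probability estimated by a frequency."""
    n = len(values)
    # p * log2(1 / p) is the same as -p * log2(p). Values that never occur
    # are not in the Counter, so they contribute nothing to the sum.
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())

balanced = entropy(["good", "good", "bad", "bad"])
pure = entropy(["good", "good", "good", "good"])
skewed = entropy(["good", "good", "good", "bad"])

print(balanced)  # 1.0
print(pure)      # 0.0
print(skewed)
```

A 50/50 split is as uncertain as two classes can be (1 bit), a pure set has no uncertainty at all, and the 3:1 case falls in between.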

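Now take a pair of discrete random variables $(X, Y)$, where $X$ and $Y$ have $n$ and $m$ possible values respectively. Their joint distribution is:

$p_{ij} = P(X = x_i, Y = y_j), \quad i = 1, \dots, n, \; j = 1, \dots, m$

The marginal distribution of $X$ is:

$p_{i\cdot} = P(X = x_i) = \sum_{j=1}^{m} p_{ij}, \quad i = 1, \dots, n$

The conditional entropy of $Y$ given $X$ is:

$H(Y \mid X) = \sum_{i=1}^{n} p_{i\cdot} \, H(Y \mid X = x_i) = -\sum_{i=1}^{n} p_{i\cdot} \sum_{j=1}^{m} \frac{p_{ij}}{p_{i\cdot}} \log \frac{p_{ij}}{p_{i\cdot}}$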
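The conditional entropy can be sketched the same way: group the observed values of Y by the value of X, take the entropy inside each group, and weight it by how often that value of X occurs. The function names, the sample data, and the base-2 logarithm are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Empirical entropy in bits, with frequencies used as probabilities."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    """Entropy of Y inside each group of equal X, weighted by the group's frequency."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    n = len(xs)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# Hypothetical paired observations: X is smell, Y is the verdict.
xs = ["sweet", "sweet", "sour", "sour"]
ys = ["good", "good", "bad", "good"]

h_y = entropy(ys)
h_y_given_x = conditional_entropy(xs, ys)
print(h_y)
print(h_y_given_x)  # 0.5
```

Here the "sweet" group is pure and the "sour" group is split evenly, so knowing X leaves half a bit of uncertainty about Y, which is less than the entropy of Y on its own.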

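Building on these definitions, we introduce the concept of information gain. Information gain was first used as the feature-selection criterion in decision tree models, and it is the core of the ID3 algorithm. Given a sample set $D$ with labels $y \in \{1, \dots, K\}$, let $A$ be any attribute of the data set, where $\{a_1, \dots, a_V\}$ denotes the possible values of that attribute. The information gain obtained by splitting the data set on attribute $A$ is defined as:

$G(D, A) = H(D) - \sum_{v=1}^{V} \frac{|D_v|}{|D|} H(D_v), \quad H(D_v) = -\sum_{k=1}^{K} \frac{|D_{vk}|}{|D_v|} \log \frac{|D_{vk}|}{|D_v|}$

Here $H(D)$ is the entropy of the labels over all of $D$, $D_v$ is the subset of samples for which attribute $A$ takes the value $a_v$, $|D_v|$ is the number of samples in that subset, and $|D_{vk}|$ is the number of samples in $D_v$ whose label is $k$.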
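To make the definition concrete, here is a small sketch that computes the information gain of each attribute on a toy data set and picks the largest, which is the attribute-selection step of ID3. The function names, the apple data, and the base-2 logarithm are assumptions for illustration; a full ID3 implementation would repeat this step recursively on each subset.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Empirical entropy of a label list in bits."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of all labels minus the size-weighted entropy of each subset."""
    subsets = defaultdict(list)
    for row, label in zip(rows, labels):
        subsets[row[attr]].append(label)  # labels of samples sharing one value
    n = len(labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy data: the label depends on smell only.
rows = [
    {"color": "red", "smell": "sweet"},
    {"color": "red", "smell": "sour"},
    {"color": "green", "smell": "sweet"},
    {"color": "green", "smell": "sour"},
]
labels = ["good", "bad", "good", "bad"]

gains = {attr: information_gain(rows, labels, attr) for attr in ("color", "smell")}
best = max(gains, key=gains.get)
print(gains)
print(best)  # smell
```

Splitting on color leaves each subset as mixed as the whole set, so its gain is 0, while splitting on smell makes both subsets pure and recovers the full 1 bit.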

OK, that's all for today. I got up a bit late this morning and my computer is out of battery, so I'll stop writing here.
