clear all; clc;

% Load the training set: eight numeric attributes followed by a class label.
[S1,S2,S3,S4,S5,S6,S7,S8,classity] = textread('Pima-training-set.txt','%f %f %f %f %f %f %f %f %s');
D = [S1 S2 S3 S4 S5 S6 S7 S8];
AttributName = {'preg','plas','pres','skin','insu','mass','pedi','age'};

% Build the classification tree, prune it and display it.
t = classregtree(D,classity,'names',AttributName);
t = prune(t,'level',1);   % prunes 1 level here; the original note says "the last 5 levels"
view(t);

% Accuracy on the training set.
yfit1 = eval(t,D,1);
count1 = sum(strcmp(classity,yfit1));   % number of correctly classified rows
fprintf('Accuracy of training is:%f \n',count1/length(yfit1));   % %f, since the ratio is not an integer

% Average the cross-validated cost over 10 runs. test returns one cost value
% per pruning level, so the length of the vector depends on the tree.
costsum = 0;
for k = 1:10
    cost = test(t,'cross',D,classity);
    costsum = costsum + cost;
end
costsum = costsum/10;
i = 1:length(costsum);
plot(i,costsum,'-o'); xlabel('Pruning level index'); ylabel('Error rate');
title('Training: decision tree k-fold cross-validation error curve');

% Accuracy on the prediction set.
[T1,T2,T3,T4,T5,T6,T7,T8,kind] = textread('Pima-prediction-set.txt','%f %f %f %f %f %f %f %f %s');
D2 = [T1 T2 T3 T4 T5 T6 T7 T8];
yfit = eval(t,D2,1);
count = sum(strcmp(kind,yfit));
fprintf('Accuracy of prediction is:%f \n',count/length(yfit));

The dataset used in this experiment is a diabetes dataset.
Based on the size of the decision tree and the error-rate curve shown in the figure, prune the decision tree appropriately (prune);
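One way to turn the error curve into a concrete pruning choice is sketched below. It assumes a MATLAB release whose Statistics Toolbox still ships the legacy classregtree class, and it uses the bundled fisheriris data as a stand-in for the Pima files so that the snippet is self-contained; the attribute names and the variable tpruned are illustrative only.

```matlab
% Sketch: choose a pruning level from the cross-validated error (legacy classregtree API).
% fisheriris is a stand-in dataset (an assumption), not the Pima data of the experiment.
load fisheriris                          % provides meas (150x4) and species (150x1 cell)
t = classregtree(meas, species, 'names', {'SL','SW','PL','PW'});

% cost(i) is the cross-validated error of the i-th subtree in the pruning sequence,
% ntnodes(i) its number of terminal nodes, bestlevel the suggested pruning level.
[cost, secost, ntnodes, bestlevel] = test(t, 'crossvalidate', meas, species);

plot(ntnodes, cost, '-o');
xlabel('Number of terminal nodes'); ylabel('Cross-validated error');

tpruned = prune(t, 'level', bestlevel);  % keep the subtree at the suggested level
```

Plotting the cost against the number of terminal nodes shows tree size and error rate together, which is the trade-off the pruning decision rests on.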
2) Compute the class of test data with a known decision tree:
yfit = treeval(t,X)
[yfit,node,cname] = treeval(...) % cname returns the predicted class names of the test data
3) Prune the decision tree:
t2 = treeprune(t1,'level',level) % prune the last 'level' levels of tree t1
t2 = treeprune(t1,'nodes',nodes)
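
The calls in steps 2) and 3) can be tried together as in the following minimal sketch. It assumes an older Statistics Toolbox release in which the legacy functions treefit, treeval and treeprune are still available, and it again uses fisheriris as a stand-in dataset; the node number 3 and the variable names are illustrative assumptions.

```matlab
% Sketch using the legacy treefit/treeval/treeprune functions.
% fisheriris is a stand-in dataset (an assumption), not the Pima data.
load fisheriris                           % meas: 150x4 features, species: 150x1 labels
t1 = treefit(meas, species);              % cell array of labels gives a classification tree

% Step 2: classify data with a known tree.
% yfit: class numbers, node: node reached by each row, cname: predicted class names.
[yfit, node, cname] = treeval(t1, meas);
trainAcc = mean(strcmp(cname, species));  % fraction of correctly classified rows

% Step 3: prune the tree.
t2 = treeprune(t1, 'level', 1);           % prune the last level of t1
t3 = treeprune(t1, 'nodes', 3);           % turn branch node 3 into a leaf (illustrative node number)

% Re-evaluate with the pruned tree to see how accuracy changes.
[yfit2, node2, cname2] = treeval(t2, meas);
prunedAcc = mean(strcmp(cname2, species));
fprintf('Accuracy before/after pruning: %f / %f\n', trainAcc, prunedAcc);
```

Because the accuracy here is measured on the same data used to grow the tree, it tends to fall slightly after pruning; the benefit of pruning shows up on held-out data such as the prediction set.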