This article is a brief digest of Chapter 6 of Pattern Classification, 2nd edition. First, a figure illustrating the basic concepts of a three-layer neural network (if the image is hard to read, right-click it and choose "Open in new tab").
The theoretical background of multilayer neural networks is covered in Chapter 6 of Pattern Classification and is not discussed here. Below is a brief walkthrough of a MATLAB implementation of stochastic backpropagation.
function [test_targets, Wh, Wo, J] = Backpropagation_Stochastic(train_patterns, train_targets, test_patterns, params)
% Classify using a backpropagation network with stochastic learning algorithm
% Inputs:
% train_patterns - Train patterns
% train_targets - Train targets
% test_patterns - Test patterns
% params - Number of hidden units, Convergence criterion, Convergence rate
%
% Outputs
% test_targets - Predicted targets
% Wh - Hidden unit weights
% Wo - Output unit weights
% J - Error throughout the training
[Nh, Theta, eta] = process_params(params);
iter = 1;
[Ni, M] = size(train_patterns);
No = 1;
Uc = length(unique(train_targets));
%If there are only two classes, remap to {-1,1}
if (Uc == 2)
    train_targets = (train_targets>0)*2-1;
end
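%(with a = 1.716 in the activation function below, the output unit spans
%(-1.716, 1.716), so targets of +/-1 lie inside its range and do not
%force the unit into saturation)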
%Initialize the net: In this implementation there is only one output unit, so there
%will be a weight vector from the hidden units to the output units, and a weight matrix
%from the input units to the hidden units.
%The matrices are defined with one more weight so that there will be a bias
w0 = max(abs(std(train_patterns')'));
Wh = rand(Nh, Ni+1).*w0*2-w0; %Hidden weights
Wo = rand(No, Nh+1).*w0*2-w0; %Output weights
Wo = Wo/mean(std(Wo'))*(Nh+1)^(-0.5);
Wh = Wh/mean(std(Wh'))*(Ni+1)^(-0.5);
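%(normalizing each row's spread and scaling by (fan-in)^(-1/2) keeps the
%initial net activations of order one, in the spirit of the +/-1/sqrt(d)
%initialization range suggested in the book)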
rate = 10*Theta;
J(1) = 1e3;
while (rate > Theta),
    %Randomly choose an example
    i = randperm(M);
    m = i(1);
    Xm = train_patterns(:,m);
    tk = train_targets(m);
    %Forward propagate the input:
    %First to the hidden units
    gh = Wh*[Xm; 1];
    [y, dfh] = activation(gh);
    %Now to the output unit
    go = Wo*[y; 1];
    [zk, dfo] = activation(go);
    %Now, evaluate delta_k at the output: delta_k = (tk-zk)*f'(net)
    delta_k = (tk - zk).*dfo;
    %...and delta_j: delta_j = f'(net)*w_j*delta_k
    delta_j = dfh'.*Wo(1:end-1).*delta_k;
    %w_kj <- w_kj + eta*delta_k*y_j
    Wo = Wo + eta*delta_k*[y;1]';
    %w_ji <- w_ji + eta*delta_j*[Xm;1]
    Wh = Wh + eta*delta_j'*[Xm;1]';
    iter = iter + 1;
    %Calculate the total error (mean squared error over the training set)
    J(iter) = 0;
    for i = 1:M,
        J(iter) = J(iter) + (train_targets(i) - activation(Wo*[activation(Wh*[train_patterns(:,i); 1]); 1])).^2;
    end
    J(iter) = J(iter)/M;
    %Stop when the relative change in J (in percent) falls below Theta
    rate = abs(J(iter) - J(iter-1))/J(iter-1)*100;
    if (mod(iter, 100) == 0),
        disp(['Iteration ' num2str(iter) ': Total error is ' num2str(J(iter))])
    end
end
disp(['Backpropagation converged after ' num2str(iter) ' iterations.'])
%Classify the test patterns
test_targets = zeros(1, size(test_patterns,2));
for i = 1:size(test_patterns,2),
    test_targets(i) = activation(Wo*[activation(Wh*[test_patterns(:,i); 1]); 1]);
end
if (Uc == 2)
    test_targets = test_targets > 0;
end
function [f, df] = activation(x)
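% f(net) = a*tanh(b*net); a = 1.716 and b = 2/3 are the sigmoid
% parameters recommended in Chapter 6 of the book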
a = 1.716;
b = 2/3;
f = a*tanh(b*x);
df = a*b*sech(b*x).^2;
The algorithm itself is an extension of gradient descent: the weights w are updated iteratively, according to a fixed rule, until the algorithm reaches a local optimum. The update rule is

w(m+1) = w(m) + Δw(m)

Because this is a three-layer network, Wkj and Wji must be updated separately, which corresponds to these two lines of the code:

Wo = Wo + eta*delta_k*[y;1]';
Wh = Wh + eta*delta_j'*[Xm;1]';
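Written out in full, these are the standard backpropagation learning rules from Chapter 6 of the book (restated here in LaTeX; t_k is the target, z_k the output, y_j the hidden-unit outputs, x_i the inputs):

\[
\Delta w_{kj} = \eta\,\delta_k\,y_j, \qquad \delta_k = (t_k - z_k)\,f'(\mathrm{net}_k)
\]
\[
\Delta w_{ji} = \eta\,\delta_j\,x_i, \qquad \delta_j = f'(\mathrm{net}_j)\sum_k w_{kj}\,\delta_k
\]

Since this implementation has a single output unit, the sum over k reduces to one term, which is exactly what the delta_j line of the code computes.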
[f, df] = activation(x)

implements the activation function shown in the figure above: f is the value at the unit's output, and df is the derivative of f(net), i.e. f'(net).
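As a quick check, since the derivative of tanh(x) is sech^2(x), the pair returned by activation is

\[
f(\mathrm{net}) = a\tanh(b\,\mathrm{net}), \qquad f'(\mathrm{net}) = a\,b\,\operatorname{sech}^2(b\,\mathrm{net}),
\]

which matches the expressions for f and df in the code.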
We made no particular choice for the three parameters

Nh, Theta, eta

and used the defaults of 5, 0.1, and 0.1, meaning 5 hidden units, a stopping criterion of the relative change in J falling below 0.1%, and a learning rate η of 0.1. A call with these defaults is sketched below. The resulting classification is shown in the figure that follows; as it indicates, the performance is not very good.
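To make the setup concrete, here is a minimal usage sketch. The toy data are my own illustration, and I assume process_params (from the book's accompanying MATLAB toolbox) simply unpacks the parameter vector into [Nh, Theta, eta]:

% Toy two-class problem: two Gaussian blobs in 2-D
M = 100;
train_patterns = [randn(2, M/2) - 1, randn(2, M/2) + 1];
train_targets  = [zeros(1, M/2), ones(1, M/2)];   % remapped to {-1,1} internally
test_patterns  = [randn(2, 20) - 1, randn(2, 20) + 1];
params = [5, 0.1, 0.1];                           % [Nh, Theta, eta]
[test_targets, Wh, Wo, J] = Backpropagation_Stochastic(train_patterns, train_targets, test_patterns, params);
plot(J(2:end))                                    % training error curve (J(1) is a dummy value)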
For comparison, the SVM result on the same data is shown below; the SVM classifies it very well.
The above is only the simplest way of training a neural network; substantial refinements are needed to obtain good results.
SVMs appeared a few years after the backpropagation-era boom in neural networks, largely as a competing approach. In 2006, the neural-network camp, aiming to overtake SVMs, proposed deep learning; this algorithm has recently become extremely popular, and anyone with ambitions in machine learning should study it seriously.
References:
[1] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley-Interscience, 2001.