Multilayer Neural Networks

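This post is a brief summary of Chapter 6 of Pattern Classification (2nd edition). It starts with a figure that illustrates the basic concepts of a three-layer neural network.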

[Figure: basic concepts of a three-layer neural network]

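For the theory behind multilayer neural networks, see Chapter 6 of Pattern Classification; it is not discussed here. What follows is a brief walk-through of a MATLAB implementation of stochastic backpropagation.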

function [test_targets, Wh, Wo, J] = Backpropagation_Stochastic(train_patterns, train_targets, test_patterns, params)

% Classify using a backpropagation network with stochastic learning algorithm
% Inputs:
% 	train_patterns      - Train patterns
%	train_targets       - Train targets
%   test_patterns       - Test  patterns
%	params              - Number of hidden units (Nh), convergence criterion (Theta), convergence rate (eta)
%
% Outputs
%	test_targets        - Predicted targets
%   Wh                  - Hidden unit weights
%   Wo                  - Output unit weights
%   J                   - Error throughout the training

[Nh, Theta, eta] = process_params(params);
iter	         = 1;

[Ni, M]          = size(train_patterns);
No		         = 1;

Uc               = length(unique(train_targets));
%If there are only two classes, remap to {-1,1}
if (Uc == 2)
    train_targets    = (train_targets>0)*2-1;
end

%Initialize the net: In this implementation there is only one output unit, so there
%will be a weight vector from the hidden units to the output units, and a weight matrix
%from the input units to the hidden units.
%The matrices are defined with one more weight so that there will be a bias
w0		= max(abs(std(train_patterns')'));
Wh		= rand(Nh, Ni+1).*w0*2-w0; %Hidden weights
Wo		= rand(No, Nh+1).*w0*2-w0; %Output weights

Wo    = Wo/mean(std(Wo'))*(Nh+1)^(-0.5);
Wh    = Wh/mean(std(Wh'))*(Ni+1)^(-0.5);

rate	= 10*Theta;
J(1)    = 1e3;

while (rate > Theta),
    %Randomly choose an example
    i	= randperm(M);
    m	= i(1);
    Xm = train_patterns(:,m);
    tk = train_targets(m);

    %Forward propagate the input:
    %First to the hidden units
    gh				= Wh*[Xm; 1];
    [y, dfh]		= activation(gh);
    %Now to the output unit
    go				= Wo*[y; 1];
    [zk, dfo]	= activation(go);

    %Now, evaluate delta_k at the output: delta_k = (tk-zk)*f'(net)
    delta_k		= (tk - zk).*dfo;

    %...and delta_j: delta_j = f'(net)*w_j*delta_k
    delta_j		= dfh'.*Wo(1:end-1).*delta_k;

    %w_kj <- w_kj + eta*delta_k*y_j
    Wo				= Wo + eta*delta_k*[y;1]';

    %w_ji <- w_ji + eta*delta_j*[Xm;1]
    Wh				= Wh + eta*delta_j'*[Xm;1]';

    iter 			= iter + 1;

    %Calculate total error
    J(iter)    = 0;
    for i = 1:M,
        J(iter) = J(iter) + (train_targets(i) - activation(Wo*[activation(Wh*[train_patterns(:,i); 1]); 1])).^2;
    end
    J(iter) = J(iter)/M; 
    rate  = abs(J(iter) - J(iter-1))/J(iter-1)*100;

    if (iter/100 == floor(iter/100)),
        disp(['Iteration ' num2str(iter) ': Total error is ' num2str(J(iter))])
    end

end

disp(['Backpropagation converged after ' num2str(iter) ' iterations.'])

%Classify the test patterns
test_targets = zeros(1, size(test_patterns,2));
for i = 1:size(test_patterns,2),
    test_targets(i) = activation(Wo*[activation(Wh*[test_patterns(:,i); 1]); 1]);
end

if (Uc == 2)
    test_targets  = test_targets >0;
end



function [f, df] = activation(x)

a = 1.716;
b = 2/3;
f	= a*tanh(b*x);
df	= a*b*sech(b*x).^2;

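The algorithm is an extension of gradient descent: it updates the weights w iteratively, following a fixed rule, until it reaches a local optimum. The update rule is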

w(m+1) = w(m) + Δw(m)
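To make the rule concrete, here is a minimal MATLAB sketch of plain gradient descent on a one-dimensional toy criterion. The function J(w) = (w - 3)^2, the starting point and the number of steps are assumptions chosen for illustration only; they are not taken from the listing above.

```matlab
% Toy criterion J(w) = (w - 3)^2 and its gradient (illustrative assumption)
dJ  = @(w) 2*(w - 3);

eta = 0.1;     % learning rate, same value as the default mentioned in this post
w   = 0;       % arbitrary starting weight

for m = 1:50
    delta_w = -eta * dJ(w);   % step against the gradient
    w       = w + delta_w;    % w(m+1) = w(m) + delta_w(m)
end

disp(['w after 50 updates: ' num2str(w)])
```

Each update moves w a fraction of the way toward the minimum at w = 3, which is the same mechanism the network uses on its weight matrices.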

Because this is a three-layer network, Wkj and Wji have to be updated separately, which is what these two lines do:

    Wo				= Wo + eta*delta_k*[y;1]';
    Wh				= Wh + eta*delta_j'*[Xm;1]';
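Written out as equations, the two updates follow the comments in the listing. The notation below is taken from those comments (t_k target, z_k network output, y_j hidden-unit output, x_i input, net the weighted sum entering a unit); treating the appended 1 as an extra input for the bias is an assumption made to keep the formulas short.

```latex
\begin{aligned}
\delta_k &= (t_k - z_k)\, f'(net_k), & \Delta w_{kj} &= \eta\, \delta_k\, y_j, \\
\delta_j &= f'(net_j) \sum_k w_{kj}\, \delta_k, & \Delta w_{ji} &= \eta\, \delta_j\, x_i .
\end{aligned}
```

With a single output unit, as in this code, the sum over k has only one term, which is why delta_j is computed as an element-wise product of dfh', Wo(1:end-1) and delta_k.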

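In the code,

[f, df] = activation(x)

implements the activation function mentioned in the figure above: f is the value at the unit's output, and df is the derivative of f(net), i.e. f'(net).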
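One way to convince yourself that df really is the derivative of f is to compare it with a finite difference. The sketch below re-declares the two expressions as anonymous functions with the same constants as the listing; the test points and the step size h are arbitrary choices made for illustration.

```matlab
% Same constants as the activation function in the listing
a  = 1.716;
b  = 2/3;
f  = @(x) a*tanh(b*x);          % unit output f(net)
df = @(x) a*b*sech(b*x).^2;     % analytic derivative f'(net)

% Compare the analytic derivative with a central finite difference
x          = linspace(-3, 3, 13);   % arbitrary test points
h          = 1e-5;                  % step size, chosen for illustration
df_numeric = (f(x + h) - f(x - h)) / (2*h);
max_err    = max(abs(df(x) - df_numeric));

disp(['Largest difference between analytic and numeric derivative: ' num2str(max_err)])
```

The difference should be tiny compared with the derivative values themselves, which confirms that the backward pass uses a consistent f'(net).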

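We did not tune

Nh, Theta, eta

in any particular way. Their defaults are 5, 0.1 and 0.1 respectively: 5 hidden units, a loop that ends once the relative change in J between consecutive iterations is no larger than 0.1 (measured in percent in the code), and a learning rate η of 0.1. The classification result with these settings is shown in the figure below; as it shows, the result is not very good.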
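The stopping rule can be tried in isolation. The sketch below applies the same relative-change formula as the listing to a short sequence of error values; the numbers in J are made up for illustration (only the initial 1e3 matches the listing) and are not results from an actual run.

```matlab
Theta = 0.1;                      % default convergence criterion from the text
J     = [1e3, 0.9, 0.85, 0.8495]; % made-up error values, for illustration only

% Relative change between consecutive errors, in percent (same formula as the listing)
rate = abs(diff(J)) ./ J(1:end-1) * 100;

% The training loop runs while rate > Theta, so it stops at the first rate <= Theta
stop_at = find(rate <= Theta, 1);

disp(rate)
disp(['Loop would stop after update ' num2str(stop_at)])
```

Note that the criterion looks only at how much J changed in one update, not at how small J is, so a single update that barely changes the error can end training early.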

[Figure: classification result of the backpropagation network]

For comparison, the SVM result is shown below; the SVM classifies the data very well.

[Figure: classification result of the SVM]

The above is only one of the simplest ways to train a neural network; getting good results takes many further improvements.

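SVMs became popular in the 1990s, some years after backpropagation-trained networks took off in the mid-1980s, and they were widely seen as a direct competitor to neural networks. In 2006 the neural network community answered with deep learning. Deep learning has been very popular recently, and anyone serious about machine learning should study it carefully.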

References:

[1] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition.
