Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

时间:2022-03-16 11:01:45

理论知识Optimization: Stochastic Gradient DescentConvolutional Neural Network

CNN卷积神经网络推导和实现Deep learning:五十一(CNN的反向求导及练习)

Deep Learning 学习随记(八)CNN(Convolutional neural network)理解

ufldl学习笔记与编程作业:Convolutional Neural Network(卷积神经网络)

【UFLDL】Exercise: Convolutional Neural Network

卷积神经网络全面解析

CNN卷积神经网络学习笔记3:权值更新公式推导

深度学习(卷积神经网络)一些问题总结

交叉熵代价函数

基础知识

下面是Convolutional Neural Network的翻译

概述

CNN是由一个或多个卷积层(其后常跟一个下采样层)和一个或多个全连接层组成的多层神经网络。CNN的输入是2维图像(或者其他2维输入,如语音信号)。它通过局部连接和权值共享,再通过池化可得到平移不变特征。CNN的另一个优点就是易于训练,相比同样隐含层单元的全连接网络,它需要训练的参数个数要少得多。本文将介绍CNN的结构和后向传播算法,该算法用于计算对模型参数的梯度。卷积和池化可看前面相应的教程。

结构

CNN由一些卷积层和下采样层交替组成,也可视需要在最后加全连接层。一个卷积层的输入是m*m*r的图像,其中m是图像的高度和宽度,r是通道数,如RGB图像的r=3。卷积层有k个滤波器(或核函数),大小为n*n*q,其中n小于图像的维数,q小于等于r且每个滤波器的q可能不一样。滤波器的大小产生局部连接结构,该结构是由每个滤波器与输入图像卷积得到k个特征图,每个特征图大小为m-n+1。然后,每个特征图通过p*p连续区域的平均或最大池化的方式来子采样,其中p一般取2(当输入为小图像时,如MNIST)和5(当输入是大图像时)之间。在子采样层的前后均需对每个特征图加一个附加偏置项和sigmoid非线性变化。下图显示了一个由卷积层和子采样层组成的CNN。其中,相同颜色的单元共享权值。

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

图1.卷积神经网络的带池化的第一层。相同颜色的神经元共享权值,不同颜色神经元表示不同的特征图。

在卷积层的最后可能会有一些全连接层。该层是与一个标准多层神经网络中的层是一样的。

后向传播

δ(l+1)中是l+1层的残差,代价函数为J(W,b;x,y),其中(W,b)是参数,(x,y)分别是训练数据和标签。则l层的残差和梯度分别为:

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

如果l层是一个卷积层和子采样层,则其残差为:

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

其中,k是滤波器个数,Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)是激活函数的偏层数。通过计算传入池化层每个神经元的残差,子采样必须通过池化层传播残差。

最后,为了计算特征图的梯度,利用边缘处理卷积运算得到残差矩阵,再翻转残差矩阵。在卷积层翻转滤波器和最后翻转残差矩阵效果是一样的。

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

其中,

a(L)是L层的输入,a(1)是输入图像。Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)是一个合理的卷积运算,该卷积是第l层的第i个输入与对第k个滤波器的残差相卷。

 练习

练习内容:UFLDL:Exercise: Convolutional Neural Network。利用卷积神经网络实现数字分类。该神经网络有2层,第一层是卷积和子采样层,第二层是全连接层。即:本节的网络结构为:一个卷积层+一个pooling层+一个softmax层。本节练习中,输入图像为28*28,卷积核大小为9*9,卷积层特征个数(即:卷积核个数)为20个,池化连续区域为2*2,输出为类别为10类。

参考:【UFLDL】Exercise: Convolutional Neural Network讲解非常详细

注意:本练习中的卷积核,并不是由自编码器学习的特征,而是随机随机始化所得

一些matlab函数

1.addpath

语法:

添加路径:addpath('当前路径中的文件夹名1','当前路径下的文件夹名2','当前路径中的文件夹名n');【即可一次性添加多个路径】

addpath('./上级目录中的文件夹1','./上级目录中的文件夹2','./上级目录中的文件夹n');

addpath('../更上一级目录中的文件夹1','../更上一级目录中的文件夹2','../更上一级目录中的文件夹n');

2.conv2的计算过程

3.sub2ind函数

ind2sub函数可以用来把矩阵元素的index转换成对应的下标(determines the equivalent subscript values corresponding to a single index into an array)

例如: 一个4*5的矩阵A,第2行第2个元素的index的6(matlab中matrix是按列顺序排列),可以用ind2sub函数来计算这个元素的下标 [I,J] = ind2sub(size(A),6)

matlab中sub2ind函数

4.sparse和full函数

Deep Learning 6_深度学习UFLDL教程:Softmax Regression_Exercise(斯坦福大学深度学习教程)

下面这句话经常可见:

groundTruth = full(sparse(labels, 1:numImages, 1));

它得到的结果是这样一个矩阵:在第i行第j列元素值为1,其他元素为0,其中,i是向量labels内的第k个元素值,j是向量1:numImages内的第k个元素值。

故,在cnnCost.m中计算cost的代码为:

logProbs = log(probs);
labelIndex=sub2ind(size(logProbs), labels', 1:size(logProbs,2));
%找出矩阵logProbs的线性索引,行由labels指定,列由1:size(logProbs,)指定,生成线性索引返回给labelIndex
values = logProbs(labelIndex);
cost = -sum(values);
weightDecayCost = (weightDecay/) * (sum(Wd(:) .^ ) + sum(Wc(:) .^ ));
cost = cost / numImages+weightDecayCost;

可把它替换为:

groundTruth = full(sparse(labels, :numImages, ));
cost = -./numImages*groundTruth(:)'*log(probs(:))+(weightDecay/2.)*(sum(Wd(:).^2)+sum(Wc(:).^2)); %加入一个惩罚项

变得效率更快,代码更简洁。

练习步骤

STEP 1:实现CNN代价函数和梯度计算

STEP 1a: Forward Propagation

STEP 1b: Calculate Cost

代价函数:

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

其中,J(W,b)为:

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

STEP 1c: Backpropagation

softmax 层误差:softmaxError,见Deep learning:五十一(CNN的反向求导及练习)

pool 层误差:poolError,这一层首先根据公式δ= Wδl+1 * f'(zl)(pool层没有f'(zl)这一项)计算该层的error。即poolError为:δ= Wδl+1 

展开poolError为unpoolError,

convolution层误差:convError,还是根据公式δ= Wδl+1 * f'(zl)来计算

STEP 1d: Gradient Calculation

Wd和bd的梯度计算公式:

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

Step 2: Gradient Check

非常重要的一步

Step 3: Learn Parameters

在minFuncSGD中加上冲量的影响即可。

Step 4: Test

结果为:

Deep Learning 19_深度学习UFLDL教程:Convolutional Neural Network_Exercise(斯坦福大学深度学习教程)

代码

cnnTrain.m

%% Convolution Neural Network Exercise

%  Instructions
% ------------
%
% This file contains code that helps you get started in building a single.
% layer convolutional nerual network. In this exercise, you will only
% need to modify cnnCost.m and cnnminFuncSGD.m. You will not need to
% modify this file. %%======================================================================
%% STEP : Initialize Parameters and Load Data
% Here we initialize some parameters used for the exercise. % Configuration
imageDim = ;
numClasses = ; % Number of classes (MNIST images fall into classes)
filterDim = ; % Filter size for conv layer,*
numFilters = ; % Number of filters for conv layer
poolDim = ; % Pooling dimension, (should divide imageDim-filterDim+) % Load MNIST Train
addpath ../common/;
images = loadMNISTImages('../common/train-images-idx3-ubyte');
images = reshape(images,imageDim,imageDim,[]);
labels = loadMNISTLabels('../common/train-labels-idx1-ubyte');
labels(labels==) = ; % Remap to % Initialize Parameters,theta=(,),=**++**+
theta = cnnInitParams(imageDim,filterDim,numFilters,poolDim,numClasses); %%======================================================================
%% STEP : Implement convNet Objective
% Implement the function cnnCost.m. %%======================================================================
%% STEP : Gradient Check
% Use the file computeNumericalGradient.m to check the gradient
% calculation for your cnnCost.m function. You may need to add the
% appropriate path or copy the file to this directory. % DEBUG=false; % set this to true to check gradient
DEBUG = true;
if DEBUG
% To speed up gradient checking, we will use a reduced network and
% a debugging data set
db_numFilters = ;
db_filterDim = ;
db_poolDim = ;
db_images = images(:,:,:);
db_labels = labels(:);
db_theta = cnnInitParams(imageDim,db_filterDim,db_numFilters,...
db_poolDim,numClasses); [cost grad] = cnnCost(db_theta,db_images,db_labels,numClasses,...
db_filterDim,db_numFilters,db_poolDim); % Check gradients
numGrad = computeNumericalGradient( @(x) cnnCost(x,db_images,...
db_labels,numClasses,db_filterDim,...
db_numFilters,db_poolDim), db_theta); % Use this to visually compare the gradients side by side
disp([numGrad grad]); diff = norm(numGrad-grad)/norm(numGrad+grad);
% Should be small. In our implementation, these values are usually
% less than 1e-.
disp(diff); assert(diff < 1e-,...
'Difference too large. Check your gradient computation again'); end; %%======================================================================
%% STEP : Learn Parameters
% Implement minFuncSGD.m, then train the model. % 因为是采用的mini-batch梯度下降法,所以总共对样本的循环次数次数比标准梯度下降法要少
% 很多,因为每次循环中权值已经迭代多次了
options.epochs = ;
options.minibatch = ;
options.alpha = 1e-;
options.momentum = .; opttheta = minFuncSGD(@(x,y,z) cnnCost(x,y,z,numClasses,filterDim,...
numFilters,poolDim),theta,images,labels,options);
save('theta.mat','opttheta'); %%======================================================================
%% STEP : Test
% Test the performance of the trained model using the MNIST test set. Your
% accuracy should be above % after epochs of training testImages = loadMNISTImages('../common/t10k-images-idx3-ubyte');
testImages = reshape(testImages,imageDim,imageDim,[]);
testLabels = loadMNISTLabels('../common/t10k-labels-idx1-ubyte');
testLabels(testLabels==) = ; % Remap to [~,cost,preds]=cnnCost(opttheta,testImages,testLabels,numClasses,...
filterDim,numFilters,poolDim,true); acc = sum(preds==testLabels)/length(preds); % Accuracy should be around 97.4% after epochs
fprintf('Accuracy is %f\n',acc);

cnnCost.m

function [cost, grad, preds] = cnnCost(theta,images,labels,numClasses,...
filterDim,numFilters,poolDim,pred)
% Calcualte cost and gradient for a single layer convolutional
% neural network followed by a softmax layer with cross entropy
% objective.
%
% Parameters:
% theta - unrolled parameter vector
% images - stores images in imageDim x imageDim x numImges
% array
% numClasses - number of classes to predict
% filterDim - dimension of convolutional filter
% numFilters - number of convolutional filters
% poolDim - dimension of pooling area
% pred - boolean only forward propagate and return
% predictions
%
%
% Returns:
% cost - cross entropy cost
% grad - gradient with respect to theta (if pred==False)
% preds - list of predictions for each example (if pred==True) if ~exist('pred','var')
pred = false;
end; weightDecay = 0.0001; imageDim = size(images,); % height/width of image
numImages = size(images,); % number of images %% Reshape parameters and setup gradient matrices % Wc is filterDim x filterDim x numFilters parameter matrix %convolution参数
% bc is the corresponding bias % Wd is numClasses x hiddenSize parameter matrix where hiddenSize
% is the number of output units from the convolutional layer %这个convolutional layer应该是包含了卷积层和pool层
% bd is corresponding bias
[Wc, Wd, bc, bd] = cnnParamsToStack(theta,imageDim,filterDim,numFilters,...
poolDim,numClasses); % Same sizes as Wc,Wd,bc,bd. Used to hold gradient w.r.t above params.
Wc_grad = zeros(size(Wc));
Wd_grad = zeros(size(Wd));
bc_grad = zeros(size(bc));
bd_grad = zeros(size(bd)); %%======================================================================
%% STEP 1a: Forward Propagation
% In this step you will forward propagate the input through the
% convolutional and subsampling (mean pooling) layers. You will then use
% the responses from the convolution and pooling layer as the input to a
% standard softmax layer. %% Convolutional Layer
% For each image and each filter, convolve the image with the filter, add
% the bias and apply the sigmoid nonlinearity. Then subsample the
% convolved activations with mean pooling. Store the results of the
% convolution in activations and the results of the pooling in
% activationsPooled. You will need to save the convolved activations for
% backpropagation.
convDim = imageDim-filterDim+; % dimension of convolved output
outputDim = (convDim)/poolDim; % dimension of subsampled output % convDim x convDim x numFilters x numImages tensor for storing activations
activations = zeros(convDim,convDim,numFilters,numImages); % outputDim x outputDim x numFilters x numImages tensor for storing
% subsampled activations
activationsPooled = zeros(outputDim,outputDim,numFilters,numImages); %%% YOUR CODE HERE %%% %调用之前写的两个函数
activations = cnnConvolve(filterDim, numFilters, images, Wc, bc);
activationsPooled = cnnPool(poolDim, activations); % Reshape activations into -d matrix, hiddenSize x numImages,
% for Softmax layer
activationsPooled = reshape(activationsPooled,[],numImages);%就变成了传统的softmax模式 %% Softmax Layer
% Forward propagate the pooled activations calculated above into a
% standard softmax layer. For your convenience we have reshaped
% activationPooled into a hiddenSize x numImages matrix. Store the
% results in probs. % numClasses x numImages for storing probability that each image belongs to
% each class.
probs = zeros(numClasses,numImages); %%% YOUR CODE HERE %%%
z = Wd*activationsPooled;
z = bsxfun(@plus,z,bd);
%z = Wd * activationsPooled+repmat(bd,[,numImages]);
z = bsxfun(@minus,z,max(z,[],));%减去最大值,减少一个维度,防止溢出
z = exp(z);
probs = bsxfun(@rdivide,z,sum(z,));
preds = probs;
%%======================================================================
%% STEP 1b: Calculate Cost
% In this step you will use the labels given as input and the probs
% calculate above to evaluate the cross entropy objective. Store your
% results in cost. cost = ; % save objective into cost %%% YOUR CODE HERE %%%
logProbs = log(probs);
labelIndex=sub2ind(size(logProbs), labels', 1:size(logProbs,2));
%找出矩阵logProbs的线性索引,行由labels指定,列由1:size(logProbs,)指定,生成线性索引返回给labelIndex
values = logProbs(labelIndex);
cost = -sum(values);
weightDecayCost = (weightDecay/) * (sum(Wd(:) .^ ) + sum(Wc(:) .^ ));
cost = cost / numImages+weightDecayCost;
%Make sure to scale your gradients by the inverse size of the training set
%if you included this scale in the cost calculation otherwise your code will not pass the numerical gradient check. % Makes predictions given probs and returns without backproagating errors.
if pred
[~,preds] = max(probs,[],);
preds = preds';
grad = ;
return;
end; %%======================================================================
%% STEP 1c: Backpropagation
% Backpropagate errors through the softmax and convolutional/subsampling
% layers. Store the errors for the next step to calculate the gradient.
% Backpropagating the error w.r.t the softmax layer is as usual. To
% backpropagate through the pooling layer, you will need to upsample the
% error with respect to the pooling layer for each filter and each image.
% Use the kron function and a matrix of ones to do this upsampling
% quickly. %%% YOUR CODE HERE %%%
%softmax残差
targetMatrix = zeros(size(probs));
targetMatrix(labelIndex) = ;
softmaxError = probs-targetMatrix; %pool层残差
poolError = Wd'*softmaxError;
poolError = reshape(poolError, outputDim, outputDim, numFilters, numImages); unpoolError = zeros(convDim, convDim, numFilters, numImages);
unpoolingFilter = ones(poolDim);
poolArea = poolDim*poolDim;
%展开poolError为unpoolError
for imageNum = :numImages
for filterNum = :numFilters
e = poolError(:, :, filterNum, imageNum);
unpoolError(:, :, filterNum, imageNum) = kron(e, unpoolingFilter)./poolArea;
end
end convError = unpoolError .* activations .* ( - activations); %%======================================================================
%% STEP 1d: Gradient Calculation
% After backpropagating the errors above, we can use them to calculate the
% gradient with respect to all the parameters. The gradient w.r.t the
% softmax layer is calculated as usual. To calculate the gradient w.r.t.
% a filter in the convolutional layer, convolve the backpropagated error
% for that filter with each image and aggregate over images. %%% YOUR CODE HERE %%%
%softmax梯度
Wd_grad = (/numImages).*softmaxError * activationsPooled'+weightDecay * Wd; % l+1层残差 * l层激活值
bd_grad = (/numImages).*sum(softmaxError, ); % Gradient of the convolutional layer
bc_grad = zeros(size(bc));
Wc_grad = zeros(size(Wc)); %计算bc_grad
for filterNum = : numFilters
e = convError(:, :, filterNum, :);
bc_grad(filterNum) = (/numImages).*sum(e(:));
end %翻转convError
for filterNum = : numFilters
for imageNum = : numImages
e = convError(:, :, filterNum, imageNum);
convError(:, :, filterNum, imageNum) = rot90(e, );
end
end for filterNum = : numFilters
Wc_gradFilter = zeros(size(Wc_grad, ), size(Wc_grad, ));
for imageNum = : numImages
Wc_gradFilter = Wc_gradFilter + conv2(images(:, :, imageNum), convError(:, :, filterNum, imageNum), 'valid');
end
Wc_grad(:, :, filterNum) = (/numImages).*Wc_gradFilter;
end
Wc_grad = Wc_grad + weightDecay * Wc; %% Unroll gradient into grad vector for minFunc
grad = [Wc_grad(:) ; Wd_grad(:) ; bc_grad(:) ; bd_grad(:)]; end

cnnConvolve.m

function convolvedFeatures = cnnConvolve(filterDim, numFilters, images, W, b)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
% filterDim - filter (feature) dimension
% numFilters - number of feature maps
% images - large images to convolve with, matrix in the form
% images(r, c, image number)
% W, b - W, b for features from the sparse autoencoder
% W is of shape (filterDim,filterDim,numFilters)
% b is of shape (numFilters,)
%
% Returns:
% convolvedFeatures - matrix of convolved features in the form
% convolvedFeatures(imageRow, imageCol, featureNum, imageNum) numImages = size(images, );
imageDim = size(images, );
convDim = imageDim - filterDim + ; convolvedFeatures = zeros(convDim, convDim, numFilters, numImages); % Instructions:
% Convolve every filter with every image here to produce the
% (imageDim - filterDim + ) x (imageDim - filterDim + ) x numFeatures x numImages
% matrix convolvedFeatures, such that
% convolvedFeatures(imageRow, imageCol, featureNum, imageNum) is the
% value of the convolved featureNum feature for the imageNum image over
% the region (imageRow, imageCol) to (imageRow + filterDim - , imageCol + filterDim - )
%
% Expected running times:
% Convolving with images should take less than seconds
% Convolving with images should take around minutes
% (So to save time when testing, you should convolve with less images, as
% described earlier) for imageNum = :numImages
for filterNum = :numFilters % convolution of image with feature matrix
convolvedImage = zeros(convDim, convDim); % Obtain the feature (filterDim x filterDim) needed during the convolution %%% YOUR CODE HERE %%%
filter = squeeze(W(:,:,filterNum));
% Flip the feature matrix because of the definition of convolution, as explained later
filter = rot90(squeeze(filter),); % Obtain the image
im = squeeze(images(:, :, imageNum)); % Convolve "filter" with "im", adding the result to convolvedImage
% be sure to do a 'valid' convolution %%% YOUR CODE HERE %%%
convolvedImage = conv2(im,filter,'valid');
% Add the bias unit
% Then, apply the sigmoid function to get the hidden activation %%% YOUR CODE HERE %%%
convolvedImage = bsxfun(@plus,convolvedImage,b(filterNum));
convolvedImage = ./ (+exp(-convolvedImage)); convolvedFeatures(:, :, filterNum, imageNum) = convolvedImage;
end
end end

cnnPool.m

function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
% poolDim - dimension of pooling region
% convolvedFeatures - convolved features to pool (as given by cnnConvolve)
% convolvedFeatures(imageRow, imageCol, featureNum, imageNum)
%
% Returns:
% pooledFeatures - matrix of pooled features in the form
% pooledFeatures(poolRow, poolCol, featureNum, imageNum)
% numImages = size(convolvedFeatures, );
numFilters = size(convolvedFeatures, );
convolvedDim = size(convolvedFeatures, ); pooledFeatures = zeros(convolvedDim / poolDim, ...
convolvedDim / poolDim, numFilters, numImages); % Instructions:
% Now pool the convolved features in regions of poolDim x poolDim,
% to obtain the
% (convolvedDim/poolDim) x (convolvedDim/poolDim) x numFeatures x numImages
% matrix pooledFeatures, such that
% pooledFeatures(poolRow, poolCol, featureNum, imageNum) is the
% value of the featureNum feature for the imageNum image pooled over the
% corresponding (poolRow, poolCol) pooling region.
%
% Use mean pooling here. %%% YOUR CODE HERE %%%
for imageNum = :numImages
for featureNum = :numFilters
featuremap = squeeze(convolvedFeatures(:,:,featureNum,imageNum));
pooledFeaturemap = conv2(featuremap,ones(poolDim)/(poolDim^),'valid');
pooledFeatures(:,:,featureNum,imageNum) = pooledFeaturemap(:poolDim:end,:poolDim:end);
end
end
end

computeNumericalGradient.m

function numgrad = computeNumericalGradient(J, theta)
% numgrad = computeNumericalGradient(J, theta)
% theta: a vector of parameters
% J: a function that outputs a real-number. Calling y = J(theta) will return the
% function value at theta. % Initialize numgrad with zeros
numgrad = zeros(size(theta)); %% ---------- YOUR CODE HERE --------------------------------------
% Instructions:
% Implement numerical gradient checking, and return the result in numgrad.
% (See Section 2.3 of the lecture notes.)
% You should write code so that numgrad(i) is (the numerical approximation to) the
% partial derivative of J with respect to the i-th input argument, evaluated at theta.
% I.e., numgrad(i) should be the (approximately) the partial derivative of J with
% respect to theta(i).
%
% Hint: You will probably want to compute the elements of numgrad one at a time. epsilon = 1e-; for i =:length(numgrad)
oldT = theta(i);
theta(i)=oldT+epsilon;
pos = J(theta);
theta(i)=oldT-epsilon;
neg = J(theta);
numgrad(i) = (pos-neg)/(*epsilon);
theta(i)=oldT;
if mod(i,)==
fprintf('Done with %d\n',i);
end;
end; %% ---------------------------------------------------------------
end

minFuncSGD.m

function [opttheta] = minFuncSGD(funObj,theta,data,labels,...
options)
% Runs stochastic gradient descent with momentum to optimize the
% parameters for the given objective.
%
% Parameters:
% funObj - function handle which accepts as input theta,
% data, labels and returns cost and gradient w.r.t
% to theta.
% theta - unrolled parameter vector
% data - stores data in m x n x numExamples tensor
% labels - corresponding labels in numExamples x vector
% options - struct to store specific options for optimization
%
% Returns:
% opttheta - optimized parameter vector
%
% Options (* required)
% epochs* - number of epochs through data
% alpha* - initial learning rate
% minibatch* - size of minibatch
% momentum - momentum constant, defualts to 0.9 %%======================================================================
%% Setup
assert(all(isfield(options,{'epochs','alpha','minibatch'})),...
'Some options not defined');
if ~isfield(options,'momentum')
options.momentum = 0.9;
end;
epochs = options.epochs;
alpha = options.alpha;
minibatch = options.minibatch;
m = length(labels); % training set size
% Setup for momentum
mom = 0.5;
momIncrease = ;
velocity = zeros(size(theta)); %%======================================================================
%% SGD loop
it = ;
for e = :epochs % randomly permute indices of data for quick minibatch sampling
rp = randperm(m); for s=:minibatch:(m-minibatch+)
it = it + ; % increase momentum after momIncrease iterations
if it == momIncrease
mom = options.momentum;
end; % get next randomly selected minibatch
mb_data = data(:,:,rp(s:s+minibatch-));
mb_labels = labels(rp(s:s+minibatch-)); % evaluate the objective function on the next minibatch
[cost grad] = funObj(theta,mb_data,mb_labels); % Instructions: Add in the weighted velocity vector to the
% gradient evaluated above scaled by the learning rate.
% Then update the current weights theta according to the
% sgd update rule %%% YOUR CODE HERE %%%
velocity = mom*velocity+alpha*grad; % 见ufldl教程Optimization: Stochastic Gradient Descent
theta = theta-velocity; fprintf('Epoch %d: Cost on iteration %d is %f\n',e,it,cost);
end; % aneal learning rate by factor of two after each epoch
alpha = alpha/2.0; end; opttheta = theta; end

cnnInitParams.m

function theta = cnnInitParams(imageDim,filterDim,numFilters,...
poolDim,numClasses)
% Initialize parameters for a single layer convolutional neural
% network followed by a softmax layer.
%
% Parameters:
% imageDim - height/width of image
% filterDim - dimension of convolutional filter
% numFilters - number of convolutional filters
% poolDim - dimension of pooling area
% numClasses - number of classes to predict
%
%
% Returns:
% theta - unrolled parameter vector with initialized weights %% Initialize parameters randomly based on layer sizes.
assert(filterDim < imageDim,'filterDim must be less that imageDim'); Wc = 1e-*randn(filterDim,filterDim,numFilters); outDim = imageDim - filterDim + ; % dimension of convolved image % assume outDim is multiple of poolDim
assert(mod(outDim,poolDim)==,...
'poolDim must divide imageDim - filterDim + 1'); outDim = outDim/poolDim;
hiddenSize = outDim^*numFilters; % we'll choose weights uniformly from the interval [-r, r]
r = sqrt() / sqrt(numClasses+hiddenSize+);
Wd = rand(numClasses, hiddenSize) * * r - r; bc = zeros(numFilters, );
bd = zeros(numClasses, ); % Convert weights and bias gradients to the vector form.
% This step will "unroll" (flatten and concatenate together) all
% your parameters into a vector, which can then be used with minFunc.
theta = [Wc(:) ; Wd(:) ; bc(:) ; bd(:)]; end

cnnParamsToStack.m

function [Wc, Wd, bc, bd] = cnnParamsToStack(theta,imageDim,filterDim,...
numFilters,poolDim,numClasses)
% Converts unrolled parameters for a single layer convolutional neural
% network followed by a softmax layer into structured weight
% tensors/matrices and corresponding biases
%
% Parameters:
% theta - unrolled parameter vectore
% imageDim - height/width of image
% filterDim - dimension of convolutional filter
% numFilters - number of convolutional filters
% poolDim - dimension of pooling area
% numClasses - number of classes to predict
%
%
% Returns:
% Wc - filterDim x filterDim x numFilters parameter matrix
% Wd - numClasses x hiddenSize parameter matrix, hiddenSize is
% calculated as numFilters*((imageDim-filterDim+)/poolDim)^
% bc - bias for convolution layer of size numFilters x
% bd - bias for dense layer of size hiddenSize x outDim = (imageDim - filterDim + )/poolDim;
hiddenSize = outDim^*numFilters; %% Reshape theta
indS = ;
indE = filterDim^*numFilters;
Wc = reshape(theta(indS:indE),filterDim,filterDim,numFilters);
indS = indE+;
indE = indE+hiddenSize*numClasses;
Wd = reshape(theta(indS:indE),numClasses,hiddenSize);
indS = indE+;
indE = indE+numFilters;
bc = theta(indS:indE);
bd = theta(indE+:end); end

cnnExercise.m

%% Convolution and Pooling Exercise

%  Instructions
% ------------
%
% This file contains code that helps you get started on the
% convolution and pooling exercise. In this exercise, you will only
% need to modify cnnConvolve.m and cnnPool.m. You will not need to modify
% this file. %%======================================================================
%% STEP : Initialization and Load Data
% Here we initialize some parameters used for the exercise. imageDim = ; % image dimension filterDim = ; % filter dimension
numFilters = ; % number of feature maps numImages = ; % number of images poolDim = ; % dimension of pooling region % Here we load MNIST training images
addpath ../common/;
images = loadMNISTImages('../common/train-images-idx3-ubyte');
images = reshape(images,imageDim,imageDim,numImages); W = randn(filterDim,filterDim,numFilters);
b = rand(numFilters); %%======================================================================
%% STEP : Implement and test convolution
% In this step, you will implement the convolution and test it on
% on a small part of the data set to ensure that you have implemented
% this step correctly. %% STEP 1a: Implement convolution
% Implement convolution in the function cnnConvolve in cnnConvolve.m %% Use only the first images for testing
convImages = images(:, :, :); % NOTE: Implement cnnConvolve in cnnConvolve.m first!
convolvedFeatures = cnnConvolve(filterDim, numFilters, convImages, W, b); %% STEP 1b: Checking your convolution
% To ensure that you have convolved the features correctly, we have
% provided some code to compare the results of your convolution with
% activations from the sparse autoencoder % For random points
for i = :
filterNum = randi([, numFilters]);
imageNum = randi([, ]);
imageRow = randi([, imageDim - filterDim + ]);
imageCol = randi([, imageDim - filterDim + ]); patch = convImages(imageRow:imageRow + filterDim - , imageCol:imageCol + filterDim - , imageNum); feature = sum(sum(patch.*W(:,:,filterNum)))+b(filterNum);
feature = ./(+exp(-feature)); if abs(feature - convolvedFeatures(imageRow, imageCol,filterNum, imageNum)) > 1e-
fprintf('Convolved feature does not match test feature\n');
fprintf('Filter Number : %d\n', filterNum);
fprintf('Image Number : %d\n', imageNum);
fprintf('Image Row : %d\n', imageRow);
fprintf('Image Column : %d\n', imageCol);
fprintf('Convolved feature : %0.5f\n', convolvedFeatures(imageRow, imageCol, filterNum, imageNum));
fprintf('Test feature : %0.5f\n', feature);
error('Convolved feature does not match test feature');
end
end disp('Congratulations! Your convolution code passed the test.'); %%======================================================================
%% STEP : Implement and test pooling
% Implement pooling in the function cnnPool in cnnPool.m %% STEP 2a: Implement pooling
% NOTE: Implement cnnPool in cnnPool.m first!
pooledFeatures = cnnPool(poolDim, convolvedFeatures); %% STEP 2b: Checking your pooling
% To ensure that you have implemented pooling, we will use your pooling
% function to pool over a test matrix and check the results. testMatrix = reshape(:, , );
expectedMatrix = [mean(mean(testMatrix(:, :))) mean(mean(testMatrix(:, :))); ...
mean(mean(testMatrix(:, :))) mean(mean(testMatrix(:, :))); ]; testMatrix = reshape(testMatrix, , , , ); pooledFeatures = squeeze(cnnPool(, testMatrix)); if ~isequal(pooledFeatures, expectedMatrix)
disp('Pooling incorrect');
disp('Expected');
disp(expectedMatrix);
disp('Got');
disp(pooledFeatures);
else
disp('Congratulations! Your pooling code passed the test.');
end

参考文献:

论文Notes on Convolutional Neural Networks,Jake Bouvrie

——