caffe-windows: the MNIST handwritten digit recognition example

Date: 2024-05-13 11:03:07


I. Training and Testing the Network Model

1. Preparing the Data

Caffe does not process raw data directly; a preprocessing tool first converts the raw data into LMDB format, which keeps I/O efficiency high and speeds up data loading during training. Models are usually described in ProtoBuffer text format, and training results are saved as ProtoBuffer binary files or HDF5 files.

  1. Download the dataset into the data folder D:\Ammy\caffe\caffe-master\data\mnist

  2. Write the data-conversion scripts that turn the raw data into lmdb format: create_minist_trainlmdb.bat for the training data and create_minist_testlmdb.bat for the test data, both saved in the data folder. Their contents are given below; the commands can also be run directly in a command-prompt (cmd) window.

     <----- script format: conversion executable + image data + label data + output path ----->

     <----- create_minist_trainlmdb.bat ----->
    D:\Ammy\caffe\caffe-master\Build\x64\Release\convert_mnist_data.exe ./train-images.idx3-ubyte ./train-labels.idx1-ubyte ..\..\examples\mnist\mnist_train_lmdb
    pause

     <----- create_minist_testlmdb.bat ----->
    D:\Ammy\caffe\caffe-master\Build\x64\Release\convert_mnist_data.exe ./t10k-images.idx3-ubyte ./t10k-labels.idx1-ubyte ..\..\examples\mnist\mnist_test_lmdb
    pause
  3. Double-click the scripts to generate the lmdb data. Two folders, mnist_test_lmdb and mnist_train_lmdb, will appear under examples\mnist\, each containing the two files data.mdb and lock.mdb.

2. Training the Model

Open a command-prompt window, change to the caffe root directory ..\caffe-master, and run: Build\x64\Release\caffe.exe train -solver examples\mnist\lenet_solver.prototxt

Once started, the program iterates repeatedly, stops after 10000 iterations, and then runs a final test, reporting an accuracy of 99.04%. Four new files appear under examples\mnist: the files with the .caffemodel suffix store the trained model weights, and the files with the .solverstate suffix are training snapshots. With a snapshot, the next run can resume from the saved training point, which is invaluable when training is interrupted unexpectedly.

A partial interpretation of the console output follows; for details see the book《深度学习21天》:

D:\Ammy\caffe\caffe-master>Build\x64\Release\caffe.exe train -solver examples\mnist\lenet_solver.prototxt

//output format: date  time  process id  source file:line number] message
I0225 11:39:19.423166 8404 caffe.cpp:211] Use CPU.
//print the parsed solver hyperparameter file examples\mnist\lenet_solver.prototxt
I0225 11:39:19.427173 8404 solver.cpp:48]
Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.01
display: 100
...
//create the training network
I0225 11:39:19.430171 8404 solver.cpp:91] Creating training net from net file: examples/mnist/lenet_train_test.prototxt
I0225 11:39:19.434173 8404 net.cpp:332] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
I0225 11:39:19.435174 8404 net.cpp:332] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
//print the network definition file examples/mnist/lenet_train_test.prototxt
I0225 11:39:19.437175 8404 net.cpp:58] Initializing net from parameters:
name: "LeNet"
state {
phase: TRAIN
level: 0
stage: ""
... //build the training network
//the mnist layer
I0225 11:39:19.460191 8404 net.cpp:100] Creating Layer mnist
//produces two outputs: data (the image data) and label (the label data)
I0225 11:39:19.463192 8404 net.cpp:418] mnist -> data
I0225 11:39:19.464193 8404 net.cpp:418] mnist -> label
//open the training-set lmdb
I0225 11:39:19.464193 4988 db_lmdb.cpp:40] Opened lmdb examples/mnist/mnist_train_lmdb
//data is a four-dimensional array of shape (64,1,28,28)
I0225 11:39:19.466195 8404 data_layer.cpp:41] output data size: 64,1,28,28
I0225 11:39:19.469195 8404 net.cpp:150] Setting up mnist
I0225 11:39:19.470197 8404 net.cpp:157] Top shape: 64 1 28 28 (50176)
I0225 11:39:19.471197 8404 net.cpp:157] Top shape: 64 (64)
//memory usage statistics, accumulated layer by layer
I0225 11:39:19.473199 8404 net.cpp:165] Memory required for data: 200960
//the remaining layers are built in the same way and are omitted here
... //build the loss layer
I0225 11:39:19.567260 8404 net.cpp:100] Creating Layer loss
//takes two inputs, ip2 and label, and produces one output, loss
I0225 11:39:19.569265 8404 net.cpp:444] loss <- ip2
I0225 11:39:19.570264 8404 net.cpp:444] loss <- label
I0225 11:39:19.572265 8404 net.cpp:418] loss -> loss
I0225 11:39:19.573266 8404 layer_factory.hpp:77] Creating layer loss
I0225 11:39:19.575266 8404 net.cpp:150] Setting up loss
//the loss layer outputs a single value; its loss weight is 1
I0225 11:39:19.577268 8404 net.cpp:157] Top shape: (1)
I0225 11:39:19.579272 8404 net.cpp:160] with loss weight 1
//memory usage statistics
I0225 11:39:19.580272 8404 net.cpp:165] Memory required for data: 5169924
//determine, from back to front, which layers need backward computation
I0225 11:39:19.581271 8404 net.cpp:226] loss needs backward computation.
I0225 11:39:19.583272 8404 net.cpp:226] ip2 needs backward computation.
I0225 11:39:19.584272 8404 net.cpp:226] relu1 needs backward computation.
I0225 11:39:19.586272 8404 net.cpp:226] ip1 needs backward computation.
I0225 11:39:19.587275 8404 net.cpp:226] pool2 needs backward computation.
I0225 11:39:19.589275 8404 net.cpp:226] conv2 needs backward computation.
I0225 11:39:19.590276 8404 net.cpp:226] pool1 needs backward computation.
I0225 11:39:19.592278 8404 net.cpp:226] conv1 needs backward computation.
I0225 11:39:19.594278 8404 net.cpp:228] mnist does not need backward computation.
I0225 11:39:19.596282 8404 net.cpp:270] This network produces output loss
//network construction finished
I0225 11:39:19.597280 8404 net.cpp:283] Network initialization done.
//the test network is built in the same way
... //start training the model
I0225 11:39:19.798415 8404 solver.cpp:60] Solver scaffolding done.
I0225 11:39:19.799417 8404 caffe.cpp:252] Starting Optimization
I0225 11:39:19.800420 8404 solver.cpp:279] Solving LeNet
I0225 11:39:19.802418 8404 solver.cpp:280] Learning Rate Policy: inv
//run one test pass to get the initial accuracy and loss: accuracy = 0.117, loss = 2.32442
I0225 11:39:19.805423 8404 solver.cpp:337] Iteration 0, Testing net (#0)
I0225 11:39:28.822401 8404 solver.cpp:404] Test net output #0: accuracy = 0.117
I0225 11:39:28.823400 8404 solver.cpp:404] Test net output #1: loss = 2.32442 (* 1 = 2.32442 loss)
//start iterative training; during training only loss is reported (no accuracy), and the loss keeps decreasing
I0225 11:39:28.972501 8404 solver.cpp:228] Iteration 0, loss = 2.34114
I0225 11:39:28.973502 8404 solver.cpp:244] Train net output #0: loss = 2.34114 (* 1 = 2.34114 loss)
I0225 11:39:28.974503 8404 sgd_solver.cpp:106] Iteration 0, lr = 0.01
I0225 11:39:43.212946 8404 solver.cpp:228] Iteration 100, loss = 0.192031
I0225 11:39:43.213948 8404 solver.cpp:244] Train net output #0: loss = 0.192031 (* 1 = 0.192031 loss)
I0225 11:39:43.214948 8404 sgd_solver.cpp:106] Iteration 100, lr = 0.00992565
...
//solver.prototxt sets test_interval: 500, so a test pass runs after every 500 training iterations; by this point the accuracy has reached 97.15%
I0225 11:40:26.279086 8404 solver.cpp:228] Iteration 400, loss = 0.103655
I0225 11:40:26.279086 8404 solver.cpp:244] Train net output #0: loss = 0.103655 (* 1 = 0.103655 loss)
I0225 11:40:26.280086 8404 sgd_solver.cpp:106] Iteration 400, lr = 0.00971013
I0225 11:40:40.106259 8404 solver.cpp:337] Iteration 500, Testing net (#0)
I0225 11:40:49.391418 8404 solver.cpp:404] Test net output #0: accuracy = 0.9715
I0225 11:40:49.392419 8404 solver.cpp:404] Test net output #1: loss = 0.0906553 (* 1 = 0.0906553 loss)
I0225 11:40:49.535516 8404 solver.cpp:228] Iteration 500, loss = 0.140814
I0225 11:40:49.535516 8404 solver.cpp:244] Train net output #0: loss = 0.140814 (* 1 = 0.140814 loss)
...
//solver.prototxt sets snapshot: 5000, so every 5000 iterations a training snapshot (solverstate) and the model weights (caffemodel) are saved
I0225 11:53:34.589983 8404 sgd_solver.cpp:106] Iteration 4900, lr = 0.00741498
I0225 11:53:49.304751 8404 solver.cpp:454] Snapshotting to binary proto file examples/mnist/lenet_iter_5000.caffemodel
I0225 11:53:49.351778 8404 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_5000.solverstate
I0225 11:53:49.393805 8404 solver.cpp:337] Iteration 5000, Testing net (#0)
I0225 11:53:59.841735 8404 solver.cpp:404] Test net output #0: accuracy = 0.9893
I0225 11:53:59.842738 8404 solver.cpp:404] Test net output #1: loss = 0.0323484 (* 1 = 0.0323484 loss)
I0225 11:53:59.978826 8404 solver.cpp:228] Iteration 5000, loss = 0.0304211
I0225 11:53:59.979827 8404 solver.cpp:244] Train net output #0: loss = 0.030421 (* 1 = 0.030421 loss)
I0225 11:53:59.980828 8404 sgd_solver.cpp:106] Iteration 5000, lr = 0.00737788
...
//solver.prototxt sets max_iter: 10000, so training stops after 10000 iterations; the final accuracy is 98.97%
I0225 12:07:36.859302 8404 solver.cpp:454] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I0225 12:07:36.899327 8404 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate
I0225 12:07:36.978380 8404 solver.cpp:317] Iteration 10000, loss = 0.004084
I0225 12:07:36.978380 8404 solver.cpp:337] Iteration 10000, Testing net (#0)
I0225 12:07:46.090425 8404 solver.cpp:404] Test net output #0: accuracy = 0.9897
I0225 12:07:46.091425 8404 solver.cpp:404] Test net output #1: loss = 0.0291721 (* 1 = 0.0291721 loss)
I0225 12:07:46.092427 8404 solver.cpp:322] Optimization Done.
I0225 12:07:46.094427 8404 caffe.cpp:255] Optimization Done.
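The lr values in the log follow the "inv" policy reported at startup: lr = base_lr × (1 + gamma × iter)^(−power). A quick check in Python, assuming the gamma: 0.0001 and power: 0.75 values from the stock examples\mnist\lenet_solver.prototxt, reproduces the logged values exactly:

```python
def inv_lr(base_lr, gamma, power, it):
    """Caffe's 'inv' learning-rate policy."""
    return base_lr * (1.0 + gamma * it) ** (-power)

# matches the log: lr = 0.00992565 at iteration 100,
# 0.00971013 at iteration 400, 0.00737788 at iteration 5000
print(inv_lr(0.01, 0.0001, 0.75, 100))
print(inv_lr(0.01, 0.0001, 0.75, 5000))
```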

3. Testing the Model

Once the network has been trained, it can be run in batch mode over the test set with:

Build\x64\Release\caffe.exe test -model examples\mnist\lenet_train_test.prototxt -weights examples\mnist\lenet_iter_10000.caffemodel -iterations 100

The output of model testing looks much like the training log. Note that the test data is split into batches according to the batch_size setting; each batch is tested separately, producing its own accuracy and loss, and the number of batches is controlled by the -iterations value in the command. A final overall accuracy and loss are then computed, as shown below.

...
I0225 21:32:55.633750 832 caffe.cpp:309] Batch 48, accuracy = 0.95
I0225 21:32:55.634752 832 caffe.cpp:309] Batch 48, loss = 0.123592
I0225 21:32:55.725814 832 caffe.cpp:309] Batch 49, accuracy = 0.99
I0225 21:32:55.725814 832 caffe.cpp:309] Batch 49, loss = 0.0154715
I0225 21:32:55.726811 832 caffe.cpp:314] Loss: 0.0434295
I0225 21:32:55.728813 832 caffe.cpp:326] accuracy = 0.9846
I0225 21:32:55.730814 832 caffe.cpp:326] loss = 0.0434295 (* 1 = 0.0434295 loss)
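The final accuracy and loss printed at the end are simply the per-batch values averaged over the -iterations batches; roughly (a sketch, with the aggregate helper name my own):

```python
def aggregate(batch_metrics):
    """Average per-batch (accuracy, loss) pairs over all test batches,
    as `caffe test` does for its final summary lines."""
    n = len(batch_metrics)
    acc = sum(a for a, _ in batch_metrics) / n
    loss = sum(l for _, l in batch_metrics) / n
    return acc, loss
```

With 100 batches of 100 images each (batch_size 100, -iterations 100), the reported accuracy of 0.9846 corresponds to 9846 of the 10000 test images classified correctly.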

II. Classifying Your Own Handwritten Images with the Trained Model

The steps above produced a network model that recognizes handwritten digits. How, then, can it recognize digit images we write ourselves?

Caffe provides an interface for classifying images with a trained model; the source is in ..\caffe-master\examples\cpp_classification, and the compiled classification.exe is placed in ..\caffe-master\Build\x64\Release. Its command format is shown below; a source-code analysis is given in the post "caffe-windows中classification.cpp的源码阅读".

usage: classification deploy.prototxt network.caffemodel mean.binaryproto labels.txt img.jpg

deploy.prototxt ----- model definition file
network.caffemodel ----- model weights file
mean.binaryproto ----- image mean file
labels.txt ----- class label file
img.jpg ----- input image to classify

So before the command above can recognize a handwritten image, we must first obtain the model definition file, the model weights file, the image mean file, the class label file, and the input image.

  • [Model definition file] lenet.prototxt from ..\caffe-master\examples\mnist\. How exactly it differs from lenet_train_test.prototxt I do not yet understand.

  • [Model weights file] The lenet_iter_10000.caffemodel produced by training, which stores the learned network parameters.

  • [Image mean file]

    Caffe provides a tool for computing the mean image of a dataset; the source is ..\caffe-master\tools\compute_image_mean.cpp, and the compiled compute_image_mean.exe is placed in ..\caffe-master\Build\x64\Release. Its command format is:

      usage: compute_image_mean [FLAGS] INPUT_DB [OUTPUT_DB]

      INPUT_DB ----- an image dataset in lmdb or leveldb format, e.g. the mnist_train_lmdb folder
      [OUTPUT_DB] ----- name of the output mean file, *.binaryproto
      [FLAGS] options:
      -backend ----- input format, either lmdb or leveldb; defaults to lmdb

    The command used here (fill in the full paths) is: compute_image_mean.exe mnist_train_lmdb mean.binaryproto. The resulting mean.binaryproto file is stored in the project folder.
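Conceptually the mean file is just the per-pixel average over every image in the input database; compute_image_mean streams the LMDB and writes the result as a BlobProto. In plain Python the idea looks like this (a sketch over nested-list images, not the tool's actual code):

```python
def mean_image(images):
    """Per-pixel average over a list of equal-sized grayscale images,
    each given as a list of pixel rows."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)]
            for y in range(h)]
```

At classification time this mean image is subtracted from the input before it is fed to the network, which is why training and deployment must use the same mean file.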

  • [Class label file]

    Create a new txt file and enter the labels one per line; the order and number of labels must match those used in training. Here the file is named label.txt, with the following content:

      //label.txt
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
  • [Input image to classify]

    Handwritten digit images can be drawn in MS Paint, but since MNIST training used grayscale images, the image must first be converted to grayscale with Photoshop, MATLAB, or similar tools. The blog post at http://blog.****.net/zb1165048017/article/details/52217772 provides several handwritten grayscale samples.

Final experimental results

Testing showed that 0.bmp and 6.bmp are misclassified. Even after I rewrote two fairly standard handwritten images, 6.jpg and 6.bmp, they were still misclassified. What is going on?

  1. It turns out to be partly related to the network. The errors occur with the lenet.prototxt definition from ..\caffe-master\examples\mnist\; with the network from http://blog.****.net/chengzhongxuyou/article/details/50717543, 0.bmp and 66.jpg are recognized correctly, but 6.bmp and 6.jpg are still wrong. Why?

III. Training the Same Network on My Own 24-Class Letter Dataset for Letter Recognition

1. Preparing the Samples

The experiments use Cai's letter dataset, which has 24 classes of letters (A-Z) with roughly 90 samples per class. 70 samples from each class, 1680 in total, form the training set; the remaining 466 samples form the test set. All class folders are stored under a single examples\my_project\char folder.

2. Generating the Label Files

Since the samples are already sorted into one folder per class, the training label file char-trainData.txt and the test label file char-testData.txt are easy to generate; the C# code is below. Both label files end up in examples\my_project\char.

FileStream fs_train = new FileStream(@"C:\Users\Administrator\Desktop\char\char-trainData.txt", FileMode.Append);
StreamWriter sw_train = new StreamWriter(fs_train);
FileStream fs_test = new FileStream(@"C:\Users\Administrator\Desktop\char\char-testData.txt", FileMode.Append);
StreamWriter sw_test = new StreamWriter(fs_test);
DirectoryInfo flod = new DirectoryInfo(@"C:\Users\Administrator\Desktop\char"); // the char folder
int label = 0;
foreach (DirectoryInfo fd in flod.GetDirectories())
{
    int idx = 1;
    foreach (FileInfo fi in fd.GetFiles()) // files of one letter class
    {
        string flodname = fi.DirectoryName;
        string line = flodname.Substring(flodname.Length - 1, 1) + "/" + fi.Name + " " + label;
        if (idx < 71)   // first 70 samples of each class -> training list
            sw_train.WriteLine(line);
        else            // the rest -> test list
            sw_test.WriteLine(line);
        idx++;
    }
    label++;
}
sw_train.Flush();
sw_train.Close();
sw_test.Flush();
sw_test.Close();

3. Converting the Dataset to lmdb Format

Caffe provides a tool that converts an "image + label" dataset to lmdb format; the source is ..\caffe-master\tools\convert_imageset.cpp, and the compiled convert_imageset.exe is placed in ..\caffe-master\Build\x64\Release. Its command format is:

usage: convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME

For this example, the commands run from the caffe-master directory are:

1. convert_imageset.exe --gray=true --resize_height=28 --resize_width=28 examples\my_project\char\ examples\my_project\char\char-trainData.txt examples\my_project\char\char_trainData_db

2. convert_imageset.exe --gray=true --resize_height=28 --resize_width=28 examples\my_project\char\ examples\my_project\char\char-testData.txt examples\my_project\char\char_testData_db

4. Preparing the Network Definition and Solver Files

[Changes to lenet_solver.prototxt]

net: "examples/my_project/char/lenet_train_test.prototxt"
test_iter: 4 //only 446 test samples, 110 per batch, so only 4 test iterations fit
snapshot_prefix: "examples/my_project/char/char" //the final "char" is not a file name but the prefix for the snapshot files

[Changes to lenet_train_test.prototxt]

//change the data_param of the training data layer
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/my_project/char/char_trainData_db" //updated path
    batch_size: 70 //70 training samples per batch; 1680 / 70 = 24 batches
    backend: LMDB
  }
}
//change the data_param of the test data layer
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/my_project/char/char_testData_db" //updated path
    batch_size: 110 //110 test samples per batch, 4 test batches
    backend: LMDB
  }
}
//change the output layer
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 24 //24 output classes
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
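One value worth noting in both data layers above: transform_param { scale: 0.00390625 } is exactly 1/256, so the 8-bit pixel values in [0, 255] are rescaled into [0, 1) before entering the network:

```python
# 0.00390625 is exactly 1/256; each pixel is multiplied by it on load
assert 0.00390625 == 1.0 / 256.0
print(0 * 0.00390625)    # smallest pixel value -> 0.0
print(255 * 0.00390625)  # largest pixel value -> 0.99609375
```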

Run the training command:

caffe.exe train -solver examples\my_project\char\lenet_solver.prototxt

The final results are poor.

Why does training on my own data work so poorly? Is it a mistake somewhere in the process, too few samples, too many classes, or is this network simply unsuited to English letters?
