Paddle learning competition: steel defect detection (yolov5, ppyoloe+, Faster-RCNN)

Date: 2022-10-10 15:54:30

1. Competition overview

  • Competition page: https://aistudio.baidu.com/aistudio/competition/detail/114/0/introduction
  • Task: an object-detection competition. Entrants must locate defects on the steel surface, give the coordinates of each bounding box, and classify each defect.
  • Data: from the NEU surface-defect database, covering six typical hot-rolled strip defects: rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In), and scratches (Sc). Each image is 200 * 200 pixels; the figure below shows examples of the six defect types.
  • 1400 training images, 400 test images.
    (figure: examples of the six typical surface defects)

Submission content and format:

  • File name: submission.csv (otherwise the submission fails)
  • File format: .csv (otherwise the submission fails)
  • File content: submission.csv must contain one record per line, each with 4 fields, for example:
image_id    bbox            category_id    confidence
1400        [0, 0, 0, 0]    0              1

Field meanings:

  • image_id (int): image id
  • bbox (list[float]): box coordinates (XMin, YMin, XMax, YMax)
  • category_id (int): defect class, using the mapping {'crazing': 0, 'inclusion': 1, 'pitted_surface': 2, 'scratches': 3, 'patches': 4, 'rolled-in_scale': 5}
  • confidence (float): confidence score
    Note: each row records one detection box and its category_id; multiple boxes detected in the same image go on separate rows.
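For illustration, a minimal pandas sketch of writing records in this format (the two rows are sample values taken from the results in section 2.7):

import pandas as pd

# One detection per row; several boxes from the same image go on separate rows.
records = [
    # image_id, bbox [XMin, YMin, XMax, YMax], category_id, confidence
    [1400, [5.4677, 0.3653, 199.2925, 61.0883], 0, 0.54],
    [1400, [2.2173, 71.8166, 195.2088, 131.9529], 0, 0.47],
]
submit = pd.DataFrame(records, columns=['image_id', 'bbox', 'category_id', 'confidence'])
submit.to_csv('submission.csv', index=False)  # the file must be named submission.csv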

2. PP-YOLOE+ l model, 41.32 points (PaddleDetection, VOC format)

2.1 Installing PaddleDetection

Reference: "PaddleDetection-MaskRcnn structures and optimizers"

  • Clone PaddleDetection
  • Install requirements.txt from the PaddleDetection folder (Python dependencies)
  • Compile and install paddledet
  • Confirm the tests pass after installation
%cd ~/work

#!git clone https://github.com/PaddlePaddle/PaddleDetection.git
# If GitHub is slow, try the gitee mirror:
#git clone https://gitee.com/paddlepaddle/PaddleDetection

# Install the remaining dependencies
%cd PaddleDetection
!pip install -r requirements.txt
# Compile and install paddledet
!python setup.py install

# Confirm the tests pass after installation:
!python ppdet/modeling/tests/test_architectures.py
W1001 15:08:57.768669  1185 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1001 15:08:57.773610  1185 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
.......
----------------------------------------------------------------------
Ran 7 tests in 2.142s

OK

2.2 Data preprocessing

2.2.1 Unzipping the dataset

# Unzip into the dataset folder under work

!mkdir dataset
!unzip ../data/test.zip -d dataset
!unzip ../data/train.zip -d dataset

# Rename to images and annotations
!mv dataset/train/IMAGES dataset/train/images
!mv dataset/train/ANNOTATIONS dataset/train/annotations

2.2.2 Custom dataset (looked cumbersome; abandoned for now)

  • All of PaddleDetection's data-handling code lives in ppdet/data/. It loads the data and converts it into the format needed for training, evaluating, and running detection models.
  • The dataset classes are defined under the source directory; dataset.py defines the base class DetDataset, from which every dataset inherits. The base class provides methods such as the following:
    (figure: methods of the DetDataset base class)
  • A dataset class inheriting from DetDataset only needs to implement the parse_dataset method.

parse_dataset uses the dataset settings:

  • dataset root path dataset_dir
  • image folder image_dir
  • annotation file path anno_path

  to collect every sample into a list named roidbs. Each element is one sample xxx_rec (e.g. coco_rec or voc_rec), a dict holding fields such as image, gt_bbox, and gt_class. For the COCO and Pascal-VOC datasets, xxx_rec is laid out as follows:

xxx_rec = {
    'im_file': im_fname,         # full path of the image
    'im_id': np.array([img_id]), # image id
    'h': im_h,                   # image height
    'w': im_w,                   # image width
    'is_crowd': is_crowd,        # crowd flag, default 0 (absent in VOC)
    'gt_class': gt_class,        # class ids of the annotated boxes
    'gt_bbox': gt_bbox,          # box coordinates (xmin, ymin, xmax, ymax)
    'gt_poly': gt_poly,          # segmentation mask, coco_rec only, default None
    'difficult': difficult       # difficult-sample flag, voc_rec only, default 0
}
  • The fields kept in xxx_rec can be controlled through DetDataset's data_fields parameter, i.e. unneeded fields can be filtered out; in most cases the defaults under configs/dataset are fine.

  • parse_dataset also builds cname2cid, a dict mapping class names to ids. For COCO data the class names are loaded from the annotation file through the COCO API; for VOC data, with use_default_label=False the class list is read from label_list.txt, otherwise the default VOC class list is used.
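To make the mechanism concrete, here is a minimal sketch of such a subclass. This is my illustration, not code from this project: the class name MyDataSet and the one-box-per-line annotation format are invented, and the base-class name and attributes follow the ppdet 2.x layout described above (verify against your PaddleDetection version).

import os
import numpy as np
from ppdet.core.workspace import register, serializable
from ppdet.data.source.dataset import DetDataset

@register
@serializable
class MyDataSet(DetDataset):  # hypothetical custom dataset
    def parse_dataset(self):
        # Assumed annotation format: one "img_file class_id x1 y1 x2 y2" per line.
        anno = os.path.join(self.dataset_dir, self.anno_path)
        records = []
        with open(anno) as f:
            for img_id, line in enumerate(f):
                im_file, cls, x1, y1, x2, y2 = line.split()
                records.append({
                    'im_file': os.path.join(self.dataset_dir, self.image_dir, im_file),
                    'im_id': np.array([img_id]),
                    'gt_class': np.array([[int(cls)]]),
                    'gt_bbox': np.array([[float(x1), float(y1),
                                          float(x2), float(y2)]], dtype=np.float32),
                })
        self.roidbs = records  # the sample list described above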

2.2.3 Preparing a VOC dataset (the approach used here)

Reference: "How to prepare training data"

  1. Trying the COCO format

    • COCO keeps the annotations of all training images in a single json file, stored as nested dicts.
    • After conversion to COCO, a user dataset is laid out as follows:
    dataset/xxx/
    ├── annotations
    │   ├── train.json  # COCO annotation file
    │   ├── valid.json  # COCO annotation file
    ├── images
    │   ├── xxx1.jpg
    │   ├── xxx2.jpg
    │   ├── xxx3.jpg
    │   |   ...
    ...
    
    • The json annotation file has to be generated yourself; the only tool I know of is x2coco.py under PaddleDetection's tools/, which converts VOC, labelme-annotated, or cityscapes datasets to COCO (producing the standard json annotation file). That is too much trouble; training directly in VOC format is easier.
  2. Trying a custom dataset (see "Data processing modules"; rewriting parse_dataset also looked cumbersome)

  3. Preparing a VOC dataset (simplest; the only chore is generating the txt files)

    • Mimic the VOC layout: create a VOCdevkit folder, inside it VOC2007, inside that the Annotations, JPEGImages, and ImageSets folders, and finally a Main folder inside ImageSets. That completes the VOC directory structure.

    • Copy all xml annotation files from the dataset's train/annotations/xmls (and val/annotations/xmls if there is a validation set) into VOCdevkit/VOC2007/Annotations.

    • Copy all images from train/images/ (and val/images/) into VOCdevkit/VOC2007/JPEGImages.

    • Finally write trainval.txt and test.txt in the dataset root (easily done with pandas; more on this below):

  4. Generating the VOC directory tree.
    To avoid moving files around later, you can create the VOC directories first, unzip the dataset into VOC2007, and rename its image and annotation folders to JPEGImages and Annotations.

%cd work
!mkdir VOCdevkit
%cd VOCdevkit
!mkdir VOC2007
%cd VOC2007
!mkdir Annotations JPEGImages ImageSets
%cd ImageSets
!mkdir Main
%cd ../../
  5. Generate trainval.txt and val.txt.
    The dataset ships without the annotated-image list files trainval.txt and val.txt, so they have to be generated; pandas keeps this readable.
# Walk the image and annotation folders, collecting the files with the right extension
import os
import pandas as pd
ls_xml,ls_image=[],[]
for xml in os.listdir('dataset/train/annotations'):
    if xml.split('.')[1]=='xml':
        ls_xml.append(xml)

for image in os.listdir('dataset/train/images'):
    if image.split('.')[1]=='jpg':
        ls_image.append(image)

After reading the xml and image file-name lists, both must be sorted first.

  • df.sort_values(['image','xml'], inplace=True) sorts by image first and then by xml: it sorts whole rows, not each column independently, so the result is wrong.
  • Sorting the two lists separately and then merging is also wrong: adding the xml column aligns on the index, and since both folders are listed in arbitrary order, the indexes no longer match after sorting. Each series has to be sorted, have its index reset (dropping the old one), and only then be merged; a tiny demo of the pitfall follows.
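The demo below is self-contained and purely illustrative (the file names are made up):

import pandas as pd

images = pd.Series(['b.jpg', 'a.jpg'])   # os.listdir order of folder 1
xmls   = pd.Series(['a.xml', 'b.xml'])   # os.listdir order of folder 2

bad = pd.DataFrame({'image': images.sort_values()})
bad['xml'] = xmls.sort_values()          # aligns on the ORIGINAL indexes
print(bad)                               # pairs a.jpg with b.xml -- wrong

good = pd.DataFrame({'image': images.sort_values().reset_index(drop=True)})
good['xml'] = xmls.sort_values().reset_index(drop=True)
print(good)                              # pairs a.jpg with a.xml -- correct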
df=pd.DataFrame(ls_image,columns=['image'])
df.sort_values('image',inplace=True)
df=df.reset_index(drop=True)
s=pd.Series(ls_xml).sort_values().reset_index(drop=True)
df['xml']=s
df.head(3)
	image	xml
0	0.jpg	0.xml
1	1.jpg	1.xml
2	10.jpg	10.xml

During training all paths are relative, so prefix them with VOC2007/JPEGImages/ and VOC2007/Annotations/:

%cd VOCdevkit
voc=df.sample(frac=1)
voc.image=voc.image.apply(lambda x : 'VOC2007/JPEGImages/'+str(x))
voc.xml=voc.xml.apply(lambda x : 'VOC2007/Annotations/'+str(x))
voc.to_csv('trainval.txt',sep=' ',index=0,header=0)

# Split into training and validation sets, saved as space-separated txt files
train_df=voc[:1200]
val_df=voc[1200:]
train_df.to_csv('train.txt',sep=' ',index=0,header=0)
val_df.to_csv('val.txt',sep=' ',index=0,header=0)

!cp -r train/annotations/* ../VOCdevkit/VOC2007/Annotations
!cp -r train/images/* ../VOCdevkit/VOC2007/JPEGImages

Inspect one image:

from PIL import Image
image = Image.open('dataset/train/images/0.jpg')
print('width: ', image.width)
print('height: ', image.height)
print('size: ', image.size)
print('mode: ', image.mode)
print('format: ', image.format)
# print('category: ', image.category)  # Image.category was removed in recent Pillow versions
print('readonly: ', image.readonly)
print('info: ', image.info)
image.show()

2.3 Editing the configs and preparing to train

  • For model quality, see the ppyolo docs.
  • For the config parameters, see the yolo config notes and "How to better understand and customize the reader config".
  • The PaddleYOLO repo bundles the yolo family (yolov5/6/7/X); worth trying later.
  • Config changes:
    • Manually make a copy of configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml (likewise for the others), so a bad edit can always be reverted.
    • configs/datasets/voc.yml need not be copied, to save rewriting it next time. After editing it reads as below. (Best not to give TestDataset a dataset_dir field, otherwise infer.py with save_results=True later fails with label_list label_list.txt not a file.)
      metric: VOC
      map_type: 11point
      num_classes: 6
      
      TrainDataset:
        !VOCDataSet
          dataset_dir: ../VOCdevkit
          anno_path: train.txt
          label_list: label_list.txt
          data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
      
      EvalDataset:
        !VOCDataSet
          dataset_dir: ../VOCdevkit
          anno_path: val.txt
          label_list: label_list.txt
          data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
      
      TestDataset:
        !ImageFolder
          anno_path: ../VOCdevkit/label_list.txt
      
    • Learning rate changed to 0.00025, epoch=60, max_epochs=72, and the epoch under !LinearWarmup changed to 4.
    • In the reader config, the train/eval batch_size was raised from 8 to 16, and the inference batch size was also set to 16, which speeds inference up a lot (the default of 1 is very slow).
    • PP-YOLOE+ supports mixed-precision training; add --amp.
    • "How to change a YOLOv3 model's input image size through the yml file":
      • Multi-Scale Training: the YOLOv3 authors argue that a fixed input size limits the model's robustness, so during training the input size is re-drawn every 10 batches from [320, 352, 416 … 608] (Darknet-19 downsamples by 32, so sizes are multiples of 32).
      • If deployment needs a specific size, first make sure target_size under BatchRandomResize in TrainReader of configs/_base_/yolov3_reader.yml contains that size before training; after training, set target_size of Resize in EvalReader and TestReader to that size for evaluation or prediction, and for model export (export_model) set image_shape in TestReader to the input size.
      • Changing only the training input size while leaving the eval/test sizes unchanged raises an error.
    • Every optimizer supported by Paddle is also supported in PaddleDetection; it just takes a manual edit of the config file.

ppyoloe_plus_reader.yml was changed as follows (the images are small, so the default input size was lowered):

worker_num: 4
eval_height: &eval_height 224
eval_width: &eval_width 224
eval_size: &eval_size [*eval_height, *eval_width]

TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [96, 128, 160, 192, 224, 256, 288,320,352], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
    - PadGT: {}
  batch_size: 16
  shuffle: true
  drop_last: true
  use_shared_memory: true
  collate_batch: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 16

TestReader:
  inputs_def:
    image_shape: [3, *eval_height, *eval_width]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1 # best kept at 1; explained below

Training flags (list them with --help):

FLAG | Scripts | Purpose | Default | Notes
-c | ALL | specify the config file | None | required, e.g. -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
-o | ALL | set or override config values | None | higher priority than the -c file, e.g. -o use_gpu=False
--eval | train | evaluate while training | False | just pass --eval
-r/--resume_checkpoint | train | weights to resume training from | None | e.g. -r output/faster_rcnn_r50_1x_coco/10000
--slim_config | ALL | model-compression config file | None | e.g. --slim_config configs/slim/prune/yolov3_prune_l1_norm.yml
--use_vdl | train/infer | log data with VisualDL for its dashboard | False | VisualDL needs Python>=3.5
--vdl_log_dir | train/infer | where VisualDL stores its logs | train: vdl_log_dir/scalar, infer: vdl_log_dir/image | VisualDL needs Python>=3.5
--output_eval | eval | where evaluation saves its json | None | e.g. --output_eval=eval_output, defaults to the current dir
--json_eval | eval | evaluate from an existing bbox.json or mask.json | False | just pass --json_eval; the json path is set via --output_eval
--classwise | eval | per-class AP and per-class PR curves | False | just pass --classwise
--output_dir | infer/export_model | where predictions or the exported model are saved | ./output | e.g. --output_dir=output
--draw_threshold | infer | score threshold for visualization | 0.5 | e.g. --draw_threshold=0.7
--infer_dir | infer | folder of images to predict | None | at least one of --infer_img and --infer_dir must be set
--infer_img | infer | single image to predict | None | at least one of the two; --infer_img takes priority
--save_results | infer | save the images' prediction results to a file | False | optional

2.4 PP-YOLOE+ s, mAP = 77.6%

# lr=0.0002,epoch=40,time=2572s
%cd ~/work/PaddleDetection/
!python -u tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco-Copy1.yml \
            --use_vdl=true \
            --vdl_log_dir=vdl_dir/scalar \
            --eval \
            --amp
[10/02 02:51:09] ppdet.engine INFO: Epoch: [45] [ 0/75] learning_rate: 0.000085 loss: 1.671754 loss_cls: 0.781011 loss_iou: 0.174941 loss_dfl: 0.880325 loss_l1: 0.378740 eta: 0:10:17 batch_cost: 0.5481 data_cost: 0.0049 ips: 29.1918 images/s
[10/02 02:51:58] ppdet.engine INFO: Epoch: [46] [ 0/75] learning_rate: 0.000080 loss: 1.672976 loss_cls: 0.782366 loss_iou: 0.173936 loss_dfl: 0.885129 loss_l1: 0.378303 eta: 0:09:36 batch_cost: 0.5459 data_cost: 0.0049 ips: 29.3068 images/s
[10/02 02:52:48] ppdet.engine INFO: Epoch: [47] [ 0/75] learning_rate: 0.000075 loss: 1.679924 loss_cls: 0.791866 loss_iou: 0.173251 loss_dfl: 0.892923 loss_l1: 0.371434 eta: 0:08:55 batch_cost: 0.5490 data_cost: 0.0049 ips: 29.1445 images/s
[10/02 02:53:37] ppdet.engine INFO: Epoch: [48] [ 0/75] learning_rate: 0.000069 loss: 1.669277 loss_cls: 0.785255 loss_iou: 0.173793 loss_dfl: 0.879943 loss_l1: 0.384001 eta: 0:08:14 batch_cost: 0.5546 data_cost: 0.0072 ips: 28.8511 images/s
[10/02 02:54:25] ppdet.engine INFO: Epoch: [49] [ 0/75] learning_rate: 0.000064 loss: 1.653534 loss_cls: 0.783021 loss_iou: 0.173808 loss_dfl: 0.865887 loss_l1: 0.377161 eta: 0:07:32 batch_cost: 0.5445 data_cost: 0.0080 ips: 29.3847 images/s
[10/02 02:55:18] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 02:55:19] ppdet.engine INFO: Eval iter: 0
[10/02 02:55:24] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/02 02:55:24] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 77.23%
[10/02 02:55:24] ppdet.engine INFO: Total sample number: 200, averge FPS: 36.900204541994455
[10/02 02:55:24] ppdet.engine INFO: Best test bbox ap is 0.772.
[10/02 02:55:30] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 02:55:31] ppdet.engine INFO: Epoch: [50] [ 0/75] learning_rate: 0.000059 loss: 1.639129 loss_cls: 0.777811 loss_iou: 0.172295 loss_dfl: 0.865005 loss_l1: 0.371313 eta: 0:06:51 batch_cost: 0.5446 data_cost: 0.0056 ips: 29.3792 images/s
[10/02 02:56:18] ppdet.engine INFO: Epoch: [51] [ 0/75] learning_rate: 0.000054 loss: 1.643418 loss_cls: 0.775152 loss_iou: 0.170642 loss_dfl: 0.866506 loss_l1: 0.374402 eta: 0:06:10 batch_cost: 0.5376 data_cost: 0.0058 ips: 29.7619 images/s
[10/02 02:57:07] ppdet.engine INFO: Epoch: [52] [ 0/75] learning_rate: 0.000050 loss: 1.652525 loss_cls: 0.774686 loss_iou: 0.170963 loss_dfl: 0.863157 loss_l1: 0.375742 eta: 0:05:28 batch_cost: 0.5396 data_cost: 0.0068 ips: 29.6510 images/s
[10/02 02:57:56] ppdet.engine INFO: Epoch: [53] [ 0/75] learning_rate: 0.000045 loss: 1.627508 loss_cls: 0.768282 loss_iou: 0.168563 loss_dfl: 0.865570 loss_l1: 0.368651 eta: 0:04:47 batch_cost: 0.5505 data_cost: 0.0093 ips: 29.0646 images/s
[10/02 02:58:45] ppdet.engine INFO: Epoch: [54] [ 0/75] learning_rate: 0.000041 loss: 1.630234 loss_cls: 0.768148 loss_iou: 0.168092 loss_dfl: 0.868954 loss_l1: 0.361416 eta: 0:04:06 batch_cost: 0.5521 data_cost: 0.0096 ips: 28.9806 images/s
[10/02 02:59:39] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 02:59:40] ppdet.engine INFO: Eval iter: 0
[10/02 02:59:45] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/02 02:59:45] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 76.88%
[10/02 02:59:45] ppdet.engine INFO: Total sample number: 200, averge FPS: 34.83328737953738
[10/02 02:59:45] ppdet.engine INFO: Best test bbox ap is 0.772.
[10/02 02:59:47] ppdet.engine INFO: Epoch: [55] [ 0/75] learning_rate: 0.000037 loss: 1.630969 loss_cls: 0.772440 loss_iou: 0.170143 loss_dfl: 0.868614 loss_l1: 0.365805 eta: 0:03:25 batch_cost: 0.5482 data_cost: 0.0057 ips: 29.1862 images/s
[10/02 03:00:35] ppdet.engine INFO: Epoch: [56] [ 0/75] learning_rate: 0.000033 loss: 1.637988 loss_cls: 0.769367 loss_iou: 0.170256 loss_dfl: 0.869540 loss_l1: 0.361416 eta: 0:02:44 batch_cost: 0.5446 data_cost: 0.0055 ips: 29.3816 images/s
[10/02 03:01:24] ppdet.engine INFO: Epoch: [57] [ 0/75] learning_rate: 0.000029 loss: 1.627233 loss_cls: 0.764908 loss_iou: 0.166364 loss_dfl: 0.872990 loss_l1: 0.351342 eta: 0:02:03 batch_cost: 0.5433 data_cost: 0.0054 ips: 29.4497 images/s
[10/02 03:02:12] ppdet.engine INFO: Epoch: [58] [ 0/75] learning_rate: 0.000025 loss: 1.621432 loss_cls: 0.766320 loss_iou: 0.165519 loss_dfl: 0.872478 loss_l1: 0.342992 eta: 0:01:22 batch_cost: 0.5474 data_cost: 0.0084 ips: 29.2273 images/s
[10/02 03:03:01] ppdet.engine INFO: Epoch: [59] [ 0/75] learning_rate: 0.000022 loss: 1.618331 loss_cls: 0.764125 loss_iou: 0.167583 loss_dfl: 0.870914 loss_l1: 0.356742 eta: 0:00:41 batch_cost: 0.5461 data_cost: 0.0093 ips: 29.2967 images/s
[10/02 03:03:50] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 03:03:50] ppdet.engine INFO: Eval iter: 0
[10/02 03:03:55] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/02 03:03:55] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 77.04%
[10/02 03:03:55] ppdet.engine INFO: Total sample number: 200, averge FPS: 36.8503525942714
[10/02 03:03:55] ppdet.engine INFO: Best test bbox ap is 0.772.

2.5 PP-YOLOE+ l, mAP = 84.71%, 3571 s

60 epochs took 3571 s in total, roughly one minute per epoch.

# bs=16,lr=0.00025,epoch=60,time=3571s
%cd ~/work/PaddleDetection/
!python -u tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco-Copy1.yml \
            --use_vdl=true \
            --vdl_log_dir=vdl_dir/scalar \
            --eval --amp \
            -o output_dir=output/ppyoloe_l_plus_10e\
            snapshot_epoch=2
[10/01 16:52:44] ppdet.engine INFO: Epoch: [50] [ 0/87] learning_rate: 0.000061 loss: 1.627664 loss_cls: 0.770285 loss_iou: 0.169481 loss_dfl: 0.865384 loss_l1: 0.369002 eta: 0:07:53 batch_cost: 0.5706 data_cost: 0.0075 ips: 28.0426 images/s
[10/01 16:53:44] ppdet.engine INFO: Epoch: [51] [ 0/87] learning_rate: 0.000056 loss: 1.632395 loss_cls: 0.771937 loss_iou: 0.170442 loss_dfl: 0.873482 loss_l1: 0.374619 eta: 0:07:07 batch_cost: 0.5764 data_cost: 0.0108 ips: 27.7600 images/s
[10/01 16:54:41] ppdet.engine INFO: Epoch: [52] [ 0/87] learning_rate: 0.000051 loss: 1.632395 loss_cls: 0.770529 loss_iou: 0.171579 loss_dfl: 0.871368 loss_l1: 0.374107 eta: 0:06:19 batch_cost: 0.5642 data_cost: 0.0099 ips: 28.3589 images/s
[10/01 16:55:38] ppdet.engine INFO: Epoch: [53] [ 0/87] learning_rate: 0.000046 loss: 1.612642 loss_cls: 0.753961 loss_iou: 0.171108 loss_dfl: 0.863146 loss_l1: 0.360852 eta: 0:05:32 batch_cost: 0.5599 data_cost: 0.0120 ips: 28.5753 images/s
[10/01 16:56:34] ppdet.engine INFO: Epoch: [54] [ 0/87] learning_rate: 0.000042 loss: 1.606246 loss_cls: 0.752236 loss_iou: 0.168287 loss_dfl: 0.849859 loss_l1: 0.353835 eta: 0:04:44 batch_cost: 0.5536 data_cost: 0.0086 ips: 28.9019 images/s
[10/01 16:57:30] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/01 16:57:31] ppdet.engine INFO: Eval iter: 0
[10/01 16:57:36] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/01 16:57:36] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 84.07%
[10/01 16:57:36] ppdet.engine INFO: Total sample number: 200, averge FPS: 35.00615985304822
[10/01 16:57:36] ppdet.engine INFO: Best test bbox ap is 0.841.
[10/01 16:57:42] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/01 16:57:43] ppdet.engine INFO: Epoch: [55] [ 0/87] learning_rate: 0.000038 loss: 1.600404 loss_cls: 0.750651 loss_iou: 0.167692 loss_dfl: 0.850050 loss_l1: 0.356826 eta: 0:03:57 batch_cost: 0.5343 data_cost: 0.0044 ips: 29.9444 images/s
[10/01 16:58:42] ppdet.engine INFO: Epoch: [56] [ 0/87] learning_rate: 0.000034 loss: 1.598746 loss_cls: 0.750651 loss_iou: 0.167707 loss_dfl: 0.855518 loss_l1: 0.357439 eta: 0:03:09 batch_cost: 0.5491 data_cost: 0.0042 ips: 29.1369 images/s
[10/01 16:59:38] ppdet.engine INFO: Epoch: [57] [ 0/87] learning_rate: 0.000030 loss: 1.617750 loss_cls: 0.758843 loss_iou: 0.169149 loss_dfl: 0.867757 loss_l1: 0.359384 eta: 0:02:22 batch_cost: 0.5589 data_cost: 0.0055 ips: 28.6267 images/s
[10/01 17:00:33] ppdet.engine INFO: Epoch: [58] [ 0/87] learning_rate: 0.000026 loss: 1.615083 loss_cls: 0.764672 loss_iou: 0.169149 loss_dfl: 0.866775 loss_l1: 0.358770 eta: 0:01:34 batch_cost: 0.5437 data_cost: 0.0114 ips: 29.4296 images/s
[10/01 17:01:29] ppdet.engine INFO: Epoch: [59] [ 0/87] learning_rate: 0.000023 loss: 1.600423 loss_cls: 0.762148 loss_iou: 0.168355 loss_dfl: 0.865254 loss_l1: 0.353518 eta: 0:00:47 batch_cost: 0.5460 data_cost: 0.0144 ips: 29.3045 images/s
[10/01 17:02:26] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/01 17:02:27] ppdet.engine INFO: Eval iter: 0
[10/01 17:02:32] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/01 17:02:32] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 84.71%
[10/01 17:02:32] ppdet.engine INFO: Total sample number: 200, averge FPS: 36.10603852107679
[10/01 17:02:32] ppdet.engine INFO: Best test bbox ap is 0.847.
[10/01 17:02:38] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1

Reference: "How to use VisualDL"

  • !visualdl --logdir PaddleDetection/vdl_dir/scalar cannot be opened this way, because the notebook runs on a remote server
  • use the visualization panel on the left side of AI Studio instead
  • Many parameters cannot be changed with --xx: --help only lists the arguments defined by train.py itself, and only those can be changed with --. Anything outside train.py must be overridden with -o, e.g. -o weights='path' output_dir='path', or the log_iter / snapshot_epoch / lr fields. That way parameters can be changed on the command line without editing the yaml every time.

2.6 Inference on the test set

  • --draw_threshold: score threshold for visualization; by default boxes scoring above 0.5 are drawn
  • keep_top_k caps the number of output objects, default 100; set it to suit your case
  • By default inference only outputs images. Older versions wrote the boxes to txt with --save_txt=True; in the new version --save_txt is gone, replaced by --save_results=True, which stores the boxes in a json file.
  • Too many checkpoints were saved; the best model is in the ppyoloes_plus_80e folder and the rest were deleted
!python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco-Copy1.yml \
                    --infer_dir=../VOCdevkit/test/images \
                    --output_dir=infer_output/ \
                    -o weights=output/ppyoloe_l_plus_80e/best_model.pdparams \
                    --draw_threshold=0.3 \
                    --save_results=True 
from PIL import Image

image_test='infer_output/1406.jpg'
image = Image.open(image_test)
image.show()

(figure: rendered prediction for 1406.jpg)

  • Note: bs=8 on the test set speeds up inference noticeably, but then bbox/labels/score in the generated json are grouped per batch rather than per image, which is harder to post-process; see the sketch below.
  • With bs=1 the json held 400 items per key for the 400 images, each item with 300 values, apparently 300 predicted objects per image.
  • With bs=8 there are 50 items of varying length, presumably because some batches are not a full 8 images (a guess).
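If bs>1 inference is used anyway, the batch level has to be collapsed before the per-image processing of section 2.7. A rough sketch, under the assumption (inferred from the item counts above, unverified) that each top-level entry simply holds one batch's per-image lists:

import json
from itertools import chain

with open('infer_output/bbox.json') as f:
    raw = json.load(f)

# Flatten the batch level so that flat['bbox'][i] again belongs to image i.
flat = {key: list(chain.from_iterable(raw[key])) for key in ('bbox', 'label', 'score')}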

2.7 Formatting the results for submission

import glob
import os
import json
import pandas as pd
class Result(object):
    """Gather the per-image detections from bbox.json into the submission csv."""
    def __init__(self):
        self.imagesPath = '/home/aistudio/work/VOCdevkit/test/images'
        self.bboxPath = '/home/aistudio/work/PaddleDetection/infer_output3/bbox.json'
        self.submissionPath = '/home/aistudio/work/submission.csv'

    def run(self):
        images = self.get_image_ids()
        bbox = self.get_bbox()
        results = []
        for i in range(400):
            image_id = images[i]
            for j in range(len(bbox['bbox'][i])):
                bbox_  = [round(i,4) for i in bbox['bbox'][i][j]]
                item = [
                    image_id,
                    bbox_,
                    int(bbox['label'][i][j]),
                    round(bbox['score'][i][j],2)
                    ]
                results.append(item)
        
        submit = pd.DataFrame(results, columns=['image_id', 'bbox','category_id','confidence'])
        submit[['image_id', 'bbox','category_id','confidence']].to_csv(self.submissionPath, index=False)

    def get_image_ids(self):
        idx=[]
        for image in os.listdir(self.imagesPath):
            if image.split('.')[1]=='jpg':
                idx.append(image.split('.')[0])
        idx.sort()
        return idx

    def get_bbox(self):
        with open(self.bboxPath, 'r', encoding='utf-8') as bbox:
            bbox = json.load(bbox)
        return bbox

resultObj = Result()
resultObj.run()

  The generated csv holds 300 detections per image; those with score > 0.3 were kept as the final result (1302 boxes in total). Final score: 41.32.

import pandas as pd
df=pd.read_csv('../submission.csv')
df_demo=df.loc[df.confidence>0.3]
df_demo.to_csv('submission.csv',index=None) # written under the PaddleDetection folder
df_demo
	  image_id						bbox			category_id	confidence
0		1400	[5.4677, 0.3653, 199.2925, 61.0883]			0		0.54
1		1400	[2.2173, 71.8166, 195.2088, 131.9529]		0		0.47
2		1400	[0.6983, 26.4009, 200.0532, 131.7431]		0		0.44
3		1400	[21.8238, 151.2348, 187.4655, 199.7138]		0		0.32
343		1401	[128.7988, 43.0498, 181.4566, 196.0749]		1		0.89
...	...	...	...	...
119029	1797	[10.5545, 124.7763, 121.6406, 187.9]		0		0.33
119030	1797	[136.0446, 89.9455, 199.4311, 198.8453]		0		0.32
119031	1797	[12.9682, 91.6822, 199.3519, 193.211]		0		0.31
119393	1798	[0.2173, 0.4157, 199.9586, 160.8067]		2		0.83
119626	1799	[5.0449, 107.328, 198.9616, 185.1402]		0		0.39

3. yolov5 training (37.746 points)

3.1 Imports

import numpy as np
from tqdm.notebook import tqdm
tqdm.pandas()
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import glob

import shutil
import sys
sys.path.append('../input/paddleirondetection')

from joblib import Parallel, delayed
from IPython.display import display

3.2 Data preprocessing

3.2.1 Moving the data into a dataset folder

  • The working directory is the output root, /kaggle/working
  • Create a dataset folder, move the original training images into it, and rename them to images
  • Move the original xml annotations into dataset and rename them to Annotations
  • Move the test set into dataset
!mkdir dataset
!cp -r ../input/paddleirondetection/test/test dataset
!cp -r ../input/paddleirondetection/train/train/IMAGES dataset
!cp -r ../input/paddleirondetection/train/train/ANNOTATIONS dataset
!mv ./dataset/ANNOTATIONS ./dataset/Annotations
!mv ./dataset/IMAGES ./dataset/images
!ls dataset/images

3.2.2 Pairing image and xml file names with pandas

# Walk the image and annotation folders, collecting the files with the right extension
import os
import pandas as pd
ls_xml,ls_image=[],[]
for xml in os.listdir('../input/paddleirondetection/train/train/ANNOTATIONS'):
    if xml.split('.')[1]=='xml':
        ls_xml.append(xml)

for image in os.listdir('../input/paddleirondetection/train/train/IMAGES'):
    if image.split('.')[1]=='jpg':
        ls_image.append(image)
        

df=pd.DataFrame(ls_image,columns=['image'])
df.sort_values('image',inplace=True)
df=df.reset_index(drop=True)
s=pd.Series(ls_xml).sort_values().reset_index(drop=True)
df['xml']=s
df.head(3)
	image	 xml
0	0.jpg	0.xml
1	1.jpg	1.xml
2	10.jpg	10.xml

Write label_list.txt; echo -e makes escape sequences such as '\n' act as the corresponding control characters. (This file was for the earlier VOC pipeline and can be ignored here.)

!echo -e "crazing\ninclusion\npitted_surface\nscratches\npatches\nrolled-in_scale"  >  dataset/label_list.txt
!cat dataset/label_list.txt
crazing
inclusion
pitted_surface
scratches
patches
rolled-in_scale

3.2.3 Generating yolov5-format labels

  • rootpath is the directory one level above Annotations
  • the list in the main loop can come from df or from the directory:
    • list=df.xml.values
    • list=os.listdir(xmlpath)
  • VOC boxes are stored as [xmin, ymin, xmax, ymax]
  • The txt labels generated under dataset/labels have the format cls, [x_center, y_center, w, h], normalized (x_center and box width w are divided by the image width, y_center and box height h by the image height, so all of xywh lie in [0,1]):
5 0.6075 0.14250000000000002 0.775 0.165
5 0.505 0.6825 0.79 0.525

The conversion code below comes from the objectDetectionDatasets project on GitHub:

#!pip install mmcv
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

classes = ['crazing','inclusion','pitted_surface','scratches','patches','rolled-in_scale']


def convert(size, box):
    # box is (xmin, xmax, ymin, ymax); returns normalized (x_center, y_center, w, h)
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    if w>=1:
        w=0.99
    if h>=1:
        h=0.99
    return (x,y,w,h)

def convert_annotation(rootpath,xmlname):
    xmlpath = rootpath + '/Annotations' 
    xmlfile = os.path.join(xmlpath,xmlname)
    with open(xmlfile, "r", encoding='UTF-8') as in_file:
        txtname = xmlname[:-4]+'.txt' # name of the corresponding txt file
        print(txtname)
        txtpath = rootpath + '/labels' # the generated .txt files are saved under rootpath/labels
        if not os.path.exists(txtpath):
            os.makedirs(txtpath)
        txtfile = os.path.join(txtpath,txtname)
        with open(txtfile, "w+" ,encoding='UTF-8') as out_file:
            tree=ET.parse(in_file)
            root = tree.getroot()
            size = root.find('size')
            w = int(size.find('width').text)
            h = int(size.find('height').text)
            out_file.truncate()
            for obj in root.iter('object'):
                difficult = obj.find('difficult').text
                cls = obj.find('name').text
                if cls not in classes or int(difficult)==1:
                    continue
                cls_id = classes.index(cls)
                xmlbox = obj.find('bndbox')
                b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
                bb = convert((w,h), b)
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

rootpath='dataset'
xmlpath=rootpath+'/Annotations'
list=df.xml.values
for i in range(0,len(list)) :
    path = os.path.join(xmlpath,list[i]) # check the file under Annotations is .xml/.XML
    if ('.xml' in path)or('.XML' in path):
        convert_annotation(rootpath,list[i])
        print('done', i)
    else:
        print('not xml file',i)
!cat dataset/labels/0.txt
5 0.6075 0.14250000000000002 0.775 0.165
5 0.505 0.6825 0.79 0.525
!ls ../dataset
Annotations  images  label_list.txt  labels  test
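As a quick check that convert() behaves as described: the corner coordinates below were back-solved from the first line of 0.txt above, so on a 200 * 200 image convert() should reproduce that line (illustrative only):

# Sanity check for convert(): box corners (xmin, xmax, ymin, ymax) back-solved
# from the first line of dataset/labels/0.txt shown above.
print(convert((200, 200), (45.0, 200.0, 13.0, 46.0)))
# expected: (0.6075, 0.1425, 0.775, 0.165)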

3.3 Training with yolov5

3.3.1 Installing yolov5

After installation the code lives under working/yolov5.

!git clone https://github.com/ultralytics/yolov5 # clone
%cd yolov5
%pip install -qr requirements.txt  # install
from yolov5 import utils
display = utils.notebook_init()  # check
YOLOv5 🚀 v6.2-181-g8a19437 Python-3.7.12 torch-1.11.0 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)


Setup complete ✅ (2 CPUs, 15.6 GB RAM, 3884.4/4030.6 GB disk)

3.3.2 yolov5 training setup

  1. Generate train.txt and val.txt
  • gbr.yaml contains:
    yaml:
    names:
    - crazing
    - inclusion
    - pitted_surface
    - scratches
    - patches
    - rolled-in_scale
    nc: 6
    path: /kaggle/working/  # the directory one level above dataset, absolute path
    train: /kaggle/working/train.txt # absolute path of train.txt (a relative path reportedly also works)
    val: /kaggle/working/val.txt
    
  • train.txt lists the training images' paths relative to path
with open('dataset/label_list.txt','r') as file:
    labels=[x.split('\n')[0] for x in file.readlines()]
labels
  ['crazing',
    'inclusion',
    'pitted_surface',
    'scratches',
    'patches',
    'rolled-in_scale']
  2. Generate gbr.yaml, which tells training where to read the data
import yaml

shuffle_df=df.sample(frac=1)
train_df=shuffle_df[:1200]
val_df=shuffle_df[1200:]

cwd='/kaggle/working/' # the directory one level above the dataset folder

with open(os.path.join( cwd ,'train.txt'), 'w') as f:
    for path in train_df.image.tolist():
        f.write('./dataset/images/'+path+'\n') # the txt stores image paths relative to cwd
            
with open(os.path.join(cwd , 'val.txt'), 'w') as f:
    for path in val_df.image.tolist():
        f.write('./dataset/images/'+path+'\n')

with open(os.path.join( cwd ,'trainval.txt'), 'w') as f:
    for path in df.image.tolist():
        f.write('./dataset/images/'+path+'\n') # the txt stores image paths relative to cwd
        
data = dict(
    path  = '/kaggle/working/',
    train =  os.path.join( cwd , 'train.txt') ,
    val   =  os.path.join( cwd , 'val.txt' ),
    nc    = 6,
    names = labels,
    )

with open(os.path.join( cwd , 'gbr.yaml'), 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)

f = open(os.path.join( cwd , 'gbr.yaml'), 'r')
print('\nyaml:')
print(f.read())
!head -n 3 ../train.txt

Output:

  yaml:
   names:
   - crazing
   - inclusion
   - pitted_surface
   - scratches
   - patches
   - rolled-in_scale
   nc: 6
   path: /kaggle/working/
   train: /kaggle/working/train.txt
   val: /kaggle/working/val.txt
   
   ./dataset/images/354.jpg
   ./dataset/images/13.jpg
   ./dataset/images/1395.jpg

3.3.3 Fixing random seeds and setting hyperparameters

import torch
def set_seeds(seed):
    torch.manual_seed(seed)  # fix the CPU seed
    if torch.cuda.is_available():  # fix the GPU seeds
        torch.cuda.manual_seed(seed)  # for the current GPU
        torch.cuda.manual_seed_all(seed)  # for all GPUs
    np.random.seed(seed)  # make later numpy random calls deterministic
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels
set_seeds(106)
# Written this way so the wandb output folder below doesn't repeat PROJECT and NAME; strictly optional.
DIM       = 256 # img_size
MODEL     = 'yolov5s6'
EPOCHS    = 20
PROJECT   = 'paddle-iron-detection' # w&b in yolov5
NAME      = f'{MODEL}-dim{DIM}-epoch{EPOCHS}' # w&b for yolov5
NAME
'yolov5s6-dim224-epoch20'

3.3.4 Tracking training with wandb

  • You can register for wandb with a GitHub account; click your avatar in the top-right corner, open Settings, and find your API keys near the bottom of the page
  • !wandb.login(key=api_key) starts wandb directly,
  • or add the API key to the Kaggle notebook so it needn't be pasted each time:
    • in the notebook's Add-ons menu, add a Secret (label WANDB, value your API key)
      (figure: adding the WANDB secret under Add-ons)
    • then start wandb with the code below (the API key now lives in the environment, hence the warning not to share the code):
import wandb

try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    api_key = user_secrets.get_secret("WANDB")
    wandb.login(key=api_key)
    anonymous = None
except:
    wandb.login(anonymous='must')
    print('To use your W&B account,\nGo to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. \nGet your W&B access token from here: https://wandb.ai/authorize')
wandb: WARNING If you're specifying your api key in code, ensure this code is not shared publicly.
wandb: WARNING Consider setting the WANDB_API_KEY environment variable, or running `wandb login` from the command line.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

3.4 Training yolov5s

3.4.1 20 epochs with hyp.VOC.yaml

(Experiments showed img=256 works better than the default 640.)

!python train.py --img 256 --batch 16 --epochs 20 --optimizer Adam \
          --data ../gbr.yaml --hyp data/hyps/hyp.VOC.yaml\
          --weights yolov5s.pt --project {PROJECT} --name {NAME}
Model summary: 157 layers, 7026307 parameters, 0 gradients, 15.8 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 7/7 [00:02<00:00,  3.08it/s]
                   all        200        420      0.644      0.672      0.689      0.321
               crazing        200         83      0.515      0.325      0.361      0.112
             inclusion        200         90      0.604      0.711      0.755      0.349
        pitted_surface        200         48      0.829      0.792        0.8      0.415
             scratches        200         59      0.828      0.831        0.9      0.398
               patches        200         64       0.65      0.953       0.91      0.483
       rolled-in_scale        200         76      0.436      0.421      0.408       0.17
Results saved to paddle-iron-detection/yolov5s6-dim224-epoch20

3.4.2 Visualizing the training results

For what each training output means, see "Interpreting yolov5 training results".

  1. Inspect the results
import pandas as pd
result=pd.read_csv('paddle-iron-detection/yolov5s6-dim224-epoch20/results.csv')
result
  2. Open the wandb site to inspect the tracked run:
    on the wandb home page, pick your own project under Projects
    (figure: wandb project list)
    and click through to charts like these (loss results):
    (figure: wandb loss charts)
    If you instead click into a single run from the home page
    (figure: a single wandb run)
    the charts above are not shown, only that run's own results, e.g. its metrics:
    (figure: metrics of a single run)
  3. Set the wandb output folder, used for the visualizations below
OUTPUT_DIR = '{}/{}'.format(PROJECT, NAME)
!ls {OUTPUT_DIR}
F1_curve.png					   results.png
PR_curve.png					   train_batch0.jpg
P_curve.png					   train_batch1.jpg
R_curve.png					   train_batch2.jpg
confusion_matrix.png				   val_batch0_labels.jpg
events.out.tfevents.1664736500.2cd00906b272.888.0  val_batch0_pred.jpg
hyp.yaml					   val_batch1_labels.jpg
labels.jpg					   val_batch1_pred.jpg
labels_correlogram.jpg				   val_batch2_labels.jpg
opt.yaml					   val_batch2_pred.jpg
results.csv					   weights
  • Class distribution:
# A chart from a different competition, shown only as an example; I deleted this cell's output and didn't feel like re-running it

plt.figure(figsize = (10,10))
plt.axis('off')
plt.imshow(plt.imread(f'{OUTPUT_DIR}/labels_correlogram.jpg'));

(figure: labels_correlogram.jpg)

  • Three batches of training images:
# the Mosaics panel in the wandb UI (yolov5s6-dim3000-fold1)

import matplotlib.pyplot as plt
plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch0.jpg'))

plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch1.jpg'))

plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch2.jpg'))

(figures: three training-batch mosaics)

  • Ground-truth boxes vs. predicted boxes:
fig, ax = plt.subplots(3, 2, figsize = (2*9,3*5), constrained_layout = True)
for row in range(3):
    ax[row][0].imshow(plt.imread(f'{OUTPUT_DIR}/val_batch{row}_labels.jpg'))
    ax[row][0].set_xticks([])
    ax[row][0].set_yticks([])
    ax[row][0].set_title(f'{OUTPUT_DIR}/val_batch{row}_labels.jpg', fontsize = 12)
    
    ax[row][1].imshow(plt.imread(f'{OUTPUT_DIR}/val_batch{row}_pred.jpg'))
    ax[row][1].set_xticks([])
    ax[row][1].set_yticks([])
    ax[row][1].set_title(f'{OUTPUT_DIR}/val_batch{row}_pred.jpg', fontsize = 12)
plt.show()

(figure: validation labels vs. predictions)
Clearly quite a few objects are missed, and some predicted boxes are off.

3.4.3 20 epochs with hyp.Objects365.yaml: results improved

Model summary: 157 layers, 7026307 parameters, 0 gradients, 15.8 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 7/7 [00:02<00:00,  2.80it/s]
                   all        200        420      0.694      0.728      0.745      0.381
               crazing        200         83      0.498      0.359      0.391      0.125
             inclusion        200         90      0.638      0.706      0.761      0.371
        pitted_surface        200         48      0.881      0.792      0.829      0.468
             scratches        200         59      0.854      0.894       0.95      0.511
               patches        200         64      0.775      0.984      0.947      0.563
       rolled-in_scale        200         76      0.518      0.632      0.592      0.247

3.4.4 Another attempt: balancing brightness

The images vary a lot in brightness, so histogram equalization was applied to balance them.

# Equalize the test images
test_path = '../dataset/test/IMAGES'
test_path1 = test_path+'_equ'
os.makedirs(test_path1,exist_ok=1)
for i in os.listdir(test_path):
  underexpose = cv2.imread(os.path.join(test_path,i))

  equalizeUnder = np.zeros(underexpose.shape, underexpose.dtype)
  equalizeUnder[:, :, 0] = cv2.equalizeHist(underexpose[:, :, 0])
  equalizeUnder[:, :, 1] = cv2.equalizeHist(underexpose[:, :, 1])
  equalizeUnder[:, :, 2] = cv2.equalizeHist(underexpose[:, :, 2])
  cv2.imwrite(os.path.join(test_path1,i),equalizeUnder)
# Equalize the training images
train_path = '../dataset/images'
train_path1 = train_path+'_equ'
os.makedirs(train_path1,exist_ok=1)
for i in os.listdir(train_path):
  underexpose = cv2.imread(os.path.join(train_path,i))

  equalizeUnder = np.zeros(underexpose.shape, underexpose.dtype)
  equalizeUnder[:, :, 0] = cv2.equalizeHist(underexpose[:, :, 0])
  equalizeUnder[:, :, 1] = cv2.equalizeHist(underexpose[:, :, 1])
  equalizeUnder[:, :, 2] = cv2.equalizeHist(underexpose[:, :, 2])
  cv2.imwrite(os.path.join(train_path1,i),equalizeUnder)
# Move the equalized train/test images, the annotation folder, and the labels into a new dataset_equ folder
!mkdir ../dataset_equ
# move the equalized training images
!mv ../dataset/images_equ/ ../dataset_equ
# move the equalized test images
!mv ../dataset/test/IMAGES_equ/ ../dataset_equ
# rename the training-image folder
!mv ../dataset_equ/images_equ  ../dataset_equ/images

# move the annotation folder too (VOC-format labels, no longer used)
!mv ../dataset/Annotations ../dataset_equ

# copy the labels folder
!cp -r ../dataset/labels ../dataset_equ
!ls ../dataset_equ

After the move, gbr.yaml has to be rewritten:

import yaml

cwd='/kaggle/working/' # the directory one level above the dataset folder

with open(os.path.join(cwd,'train_equ.txt'), 'w') as f:
    for path in train_df.image.tolist():
        f.write('./dataset_equ/images/'+path+'\n') # paths relative to cwd
            
with open(os.path.join(cwd ,'val_equ.txt'), 'w') as f:
    for path in val_df.image.tolist():
        f.write('./dataset_equ/images/'+path+'\n')

data = dict(
    path  = '/kaggle/working/',
    train =  os.path.join(cwd,'train_equ.txt') ,
    val   =  os.path.join(cwd,'val_equ.txt' ),
    nc    = 6,
    names = labels,
    )

with open(os.path.join( cwd , 'gbr_equ.yaml'), 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)

f = open(os.path.join( cwd , 'gbr_equ.yaml'), 'r')
print('\nyaml:')
print(f.read())
!head -n 3 ../train_equ.txt
!python train.py --img 256 --batch 16 --epochs 20 --optimizer Adam \
          --data ../gbr_equ.yaml --hyp data/hyps/hyp.Objects365.yaml\
          --weights yolov5s.pt --project {PROJECT} --name yolov5s-obj-adam20-equ

The results were not better:

	 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 7/7 [00:02<00:00,  2.66it/s]
       all        200        420       0.57      0.648      0.651      0.328
   crazing        200         83      0.472      0.265      0.329      0.105
 inclusion        200         90      0.543      0.711      0.705      0.328
pitted_surface    200         48      0.685      0.816      0.832      0.527
 scratches        200         59      0.761      0.701      0.787      0.377
   patches        200         64      0.644      0.922      0.898      0.522
rolled-in_scale   200         76      0.314      0.474      0.354      0.109

3.5 Training yolov5x

3.5.1 100 epochs

PROJECT   = 'paddle-iron-detection' # w&b in yolov5

!python train.py --img 256 --data ../gbr.yaml --hyp data/hyps/hyp.Objects365.yaml\
          --weights yolov5x.pt --project {PROJECT} --name yolov5x-default \
          --patience 20 --epoch 100 --cache

--patience 20 stops training once the model has not improved for 20 epochs; --cache loads the images into memory first, which speeds up training.

Training took about an hour; epoch 98 was best, a slight improvement.

		Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 7/7 [00:03<00:00,  2.24it/s]
           all         200        420      0.765      0.753      0.794      0.445
        crazing        200         83      0.509      0.449      0.455      0.163
      inclusion        200         90      0.709        0.8       0.85      0.456
 pitted_surface        200         48      0.923      0.833      0.883      0.541
      scratches        200         59      0.945      0.873      0.975      0.536
        patches        200         64      0.845      0.969      0.941      0.672
rolled-in_scale        200         76      0.659      0.592      0.662      0.301
  • Submitting its predictions directly scored 36 (without augment)
  • Then the validation set was folded in, lr set to 0.1*lr, and training continued for 20 epochs, saving a model every epoch (gbr_all.yaml is gbr.yaml with train.txt replaced by trainval.txt)
!python train.py --img 256 --data ../gbr_all.yaml --hyp data/hyps/hyp.Objects365.yaml\
          --weights paddle-iron6/yolov5x-default/weights/best.pt --project {PROJECT} --name yolov5x-120 \
          --epoch 20 --save-period 1

Epoch 19 was best; its predictions scored 37.74 after submission.

!python detect.py --weights paddle-iron6/yolov5x-1203/weights/best.pt --augment\
          		  --img 256 --conf 0.3 --source ../dataset/test/IMAGES --save-txt --save-conf

3.5.2 Other attempts

  1. Fixing wrong labels
    Cropping out the annotated regions and inspecting them shows that some labels look mis-annotated (figure after the code). Training after correcting them in the corresponding txt files made results worse. (No idea why it got worse.)
import os
import cv2
import xml.etree.ElementTree as ET
from pathlib import Path
import random

# paths of the original images, the xml labels, and the cropped patches
img_path = 'dataset/IMAGES'
xml_path = 'train/ANNOTATIONS'
obj_img_path = 'train/clip'

if os.path.exists(obj_img_path) :
    print(f'{obj_img_path} is exist')
else:
    os.mkdir(obj_img_path) # create the crop directory first, or creating its subfolders below fails
    
# empty dict recording each cropped class and its count
clip= {}

# crop each labelled box, saving it under a per-class folder with sequential numbering
for img_file in os.listdir(img_path):
    if img_file[-4:] in ['.png', '.jpg']:  # is this file an image?
        img_filename = os.path.join(img_path, img_file)  # full image path, e.g. 'dataset/IMAGES/0.jpg'
        img_cv = cv2.imread(img_filename)  # read the image

        img_name = (os.path.splitext(img_file)[0])  # image stem: '000' for '000.png'
        xml_name = os.path.join(xml_path, '%s.xml' % img_name)  # matching label path, e.g. 'train/ANNOTATIONS/0.xml'

        if os.path.exists(xml_name):  # not every image necessarily has a label
            root = ET.parse(xml_name).getroot()  # parse the xml with ET
            for obj in root.iter('object'):  # iterate over all target boxes
                name = obj.find('name').text  # box name, i.e. the label
                xmlbox = obj.find('bndbox')  # the box node
                x0 = xmlbox.find('xmin').text  # pull out the four corner coordinates
                y0 = xmlbox.find('ymin').text
                x1 = xmlbox.find('xmax').text
                y1 = xmlbox.find('ymax').text

                obj_img = img_cv[int(y0):int(y1), int(x0):int(x1)]  # crop the box region with cv2

                clip.setdefault(name, 0)  # create the class entry on first sight
                clip[name] += 1  # bump this class's counter
                my_file = Path(obj_img_path + '/' + name)  # does this class's folder exist?
                if not my_file.is_dir():  # create it if not
                    os.mkdir(str(obj_img_path + '/' + str(name)))
                
                # save the crop, 4-digit zero-padded numbering
                #cv2.imwrite(obj_img_path + '/' + name + '/' + '%04d' % (clip[name]) + '.jpg',obj_img) # purely sequential names

                # crop name = original image name + sequence number
                cv2.imwrite(obj_img_path + '/' + name + '/' + img_name+'_'+ '%04d' % (clip[name])+'.jpg',obj_img)

(figure: cropped defect patches grouped by class)
Some labels are clearly wrong.

  2. Multi-scale training (--multi-scale): results got worse.
    Multi-scale training defines several input sizes and randomly picks one every so many iterations, which usually makes detectors more robust to object size.
  3. Weighted image strategy (--image-weights): also worse.
    It mainly targets class imbalance: images the previous round handled poorly get extra weight in the next round.
  4. Starting from the 100-epoch model, training only classes 0 and 5.
    These two classes have the highest error rates, so the label txt files were re-read, only images containing class 0 or 5 were selected, and all other boxes were stripped from their labels (see the sketch below). Training didn't go well; something probably went wrong in the code.
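A sketch of that filtering step (my reconstruction of what is described, not the author's script; the dst folder name is made up and the paths assume the yolo labels from 3.2.3):

import os

src, dst = 'dataset/labels', 'dataset/labels_05'
os.makedirs(dst, exist_ok=True)
kept = []
for name in os.listdir(src):
    with open(os.path.join(src, name)) as f:
        lines = [l for l in f if l.split()[0] in ('0', '5')]  # keep only classes 0 and 5
    if lines:  # this image contains at least one box of class 0 or 5
        with open(os.path.join(dst, name), 'w') as f:
            f.writelines(lines)
        kept.append(name.replace('.txt', '.jpg'))  # images to list in the train txt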

3.6 Training from scratch (never worked well; possibly a tuning problem)

3.6.1 yolov5l from scratch, hyp.scratch-med

!python train.py --img 256 --batch 16 --epochs 50 --weights=None\
                 --data /kaggle/working/gbr.yaml --hyp data/hyps/hyp.scratch-med.yaml\
                 --project kaggle-iron --name yolov5l-scratch --cfg models/yolov5l.yaml 
YOLOv5l summary: 267 layers, 46135203 parameters, 0 gradients, 107.7 GFLOPs
                 Class     Images  Instances          P          R      mAP50   
                   all        200        473      0.573      0.672       0.67      0.323
               crazing        200         66      0.433      0.227      0.324     0.0922
             inclusion        200        127      0.647      0.748      0.741      0.325
        pitted_surface        200         33      0.674      0.727      0.759      0.475
             scratches        200         68      0.519      0.809      0.745      0.303
               patches        200        120      0.738      0.925      0.924      0.535
       rolled-in_scale        200         59      0.424      0.598      0.525       0.21

3.6.2 yolov5l from scratch, hyp.scratch-low

YOLOv5l summary: 267 layers, 46135203 parameters, 0 gradients, 107.7 GFLOPs
                 Class     Images  Instances          P          R      mAP50   
                   all        200        473      0.703      0.679      0.732      0.358
               crazing        200         66      0.793      0.303      0.511      0.162
             inclusion        200        127      0.671      0.756      0.755      0.349
        pitted_surface        200         33      0.737      0.697      0.741      0.446
             scratches        200         68      0.626      0.824       0.84      0.395
               patches        200        120      0.853      0.917      0.936      0.563
       rolled-in_scale        200         59      0.537      0.576      0.606      0.229

3.7 Test-set inference

"How to make yolov5 --save-txt overwrite instead of appending to the txt"

3.7.1 Inference on the test set

  • --save-txt --save-conf: save the predictions as txt, including the confidence scores
!python detect.py --weights paddle-iron-detection/yolov5m-dim224-epoch50/weights/best.pt\
                  --img 224 --conf 0.25 --source ../dataset/test/IMAGES --save-txt --save-conf
  1. Inspect the rendered predictions:

  Results land under yolov5's runs/detect, one new exp folder per run. This was the third run, so the images are in runs/detect/exp3/ and the txt files in runs/detect/exp3/labels.

display.Image(filename='runs/detect/exp3/1401.jpg', width=300)

(figure: rendered prediction for 1401.jpg)
  2. Inspect the prediction txt files

!cat runs/detect/exp3/labels/1401.txt
1 0.7825 0.5775 0.205 0.775 0.478262
  3. Notes on yolo txt files:
    YOLO writes each image's predicted labels straight into one txt file. The format:
    • each txt holds one image's predictions; each line is one box
    • by default a line holds only the class (as an index) and the box in yolo format, [x_center, y_center, w, h], normalized (x_center and w divided by the image width, y_center and h by the image height, so all values lie in [0,1])
    • with --save-conf the confidence is appended after the box

3.7.2 Producing the submission file

The predictions have to be reshaped into the format the competition requires.

  1. Walk the txt files and read them into pandas

I ignored the data formats at first, and however the csv was saved the commas inside bbox kept disappearing; that cost a whole day.

import pandas as pd
import numpy as np

result_list = []
for name in os.listdir('dataset/test/IMAGES'): # walk the test images
    idx=name.split('.')[0] # image index
    txt_name = 'runs/detect/exp3/labels/'+idx+'.txt'
    try:                           # if this image got any predictions, record them
        with open(txt_name, 'r') as f:
            predicts = f.readlines() # txt lines are strings; convert to the proper types
            for predict in predicts:
                pred=predict.split(' ')
                cls=pred[0]
                bbox=[float(x) for x in pred[1:5]]
                score=pred[5].rstrip() # strip the trailing newline
                result_list.append([idx,bbox,cls,score])
    except:                            # no boxes predicted: record only idx
        result_list.append([idx])
                    
df= pd.DataFrame(result_list,columns=['image_id','bbox','category_id','confidence'])
df.head()
image_id			bbox					category_id	confidence
0	1400			None							None	None
1	1401	[0.7825, 0.5775, 0.205, 0.775]			1	0.478262
2	1402	[0.785, 0.5, 0.42, 1]					2	0.419653
3	1402	[0.445, 0.4875, 0.84, 0.975]			2	0.437668
4	1403	[0.3675, 0.5, 0.165, 1]					3	0.765889
  2. Convert the column dtypes
# pd.to_numeric converts the convertible values in a series to numbers; errors= decides whether the rest keep their format, become NaN, or raise
df.image_id=pd.to_numeric(df.image_id,errors='ignore')  
df.category_id=df.category_id.astype('int')
df.confidence=df.confidence.astype('float')
df.info()

 0   image_id     982 non-null    int64  
 1   bbox         982 non-null    object 
 2   category_id  982 non-null    int32  
 3   confidence   982 non-null    float64
dtypes: float64(1), int32(1), int64(1), object(1)
  3. Define yolo2voc, which turns YOLO-format boxes into the VOC format the competition wants

The code comes from the bbox package on GitHub; for usage see section 3.4 of the "kaggle starfish detection" write-up.

# printing shows every test image is 200x200
from PIL import Image

for i,name in enumerate(os.listdir('dataset/test/IMAGES')):
    img_name='dataset/test/IMAGES/'+name
    image = Image.open(img_name)
    print(image.size)
def yolo2voc(bboxes, height=200, width=200):
    """
    yolo => [xmid, ymid, w, h] (normalized)
    voc  => [x1, y1, x2, y2]
    
    """ 
#     bboxes = bboxes.copy().astype(float) # otherwise all value will be 0 as voc_pascal dtype is np.int
    
    bboxes[..., 0::2] *= width
    bboxes[..., 1::2] *= height
    
    bboxes[..., 0:2] -= bboxes[..., 2:4]/2
    bboxes[..., 2:4] += bboxes[..., 0:2]
    
    return bboxes
# convert the predicted boxes from yolo format to voc format
df.bbox=df.bbox.apply(lambda x: yolo2voc(np.array(x).astype(np.float32)))

"""
转完格式后,bbox是array格式,直接保存csv文件,bbox这一列没有逗号,我也不知道为啥会这样,坑死我了
必须转为list格式,bbox在保存csv时,列表中才有逗号,不然就是[0.0 3.0 200.0 67.0]的格式
"""
df.bbox=df.bbox.apply(lambda x:list(x)) 

# the submission csv needs no index, but must have column names, otherwise an error is raised
df.to_csv('submission.csv',index=None,header=None)
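A quick check of yolo2voc against the row for image 1401 shown earlier (illustrative only):

# The yolo box for image 1401 above should map to these VOC corners.
print(yolo2voc(np.array([0.7825, 0.5775, 0.205, 0.775], dtype=np.float32)))
# expected: [136.  38. 177. 193.]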

4. FasterRCNN, 40.935 points (PaddleX)

4.1 Building the dataset

Split the data 8:2 into training and validation sets. This part reuses the reference code directly.

#unzip the training set
!unzip -oq /home/aistudio/data/data105746/train.zip -d /home/aistudio/work/
#unzip the test set
!unzip -oq /home/aistudio/data/data105747/test.zip -d /home/aistudio/work/
#remove the generated __MACOSX
!rm -rf /home/aistudio/work/__MACOSX

#walk the training data and split it 8:2 into train and validation sets; skip this step if already done

import os
name = [name for name in os.listdir('work/train/IMAGES') if name.endswith('.jpg')]

train_name_list=[]
for i in name:
    tmp = os.path.splitext(i)
    train_name_list.append(tmp[0])

# build ori_train.txt linking each image to its xml
with open("./work/train/ori_train.txt","w") as f:
    for i in range(len(train_name_list)):
        if i!=0: f.write('\n')
        line='IMAGES/'+train_name_list[i]+'.jpg'+" "+"ANNOTATIONS/"+train_name_list[i]+'.xml' 
        f.write(line)

# build labels.txt
labels=['crazing','inclusion','pitted_surface','scratches','patches','rolled-in_scale']
with open("./work/train/labels.txt","w") as f:
    for i in range(len(labels)):
        line=labels[i]+'\n'
        f.write(line)

# randomly split ori_train.txt into validation and training lists by eval_percent
# eval_percent: fraction held out for validation
import random
eval_percent=0.2

data=[]
with open("work/train/ori_train.txt", "r") as f:
    for line in f.readlines():
        line = line.strip('\n')
        data.append(line)

index=list(range(len(data)))
random.shuffle(index)

# write the validation list
cut_point=int(eval_percent*len(data))
with open("./work/train/val_list.txt","w") as f:
    for i in range(cut_point):
        if i!=0: f.write('\n')
        line=data[index[i]]
        f.write(line)

# write the training list
with open("./work/train/train_list.txt","w") as f:
    for i in range(cut_point,len(data)):
        if i!=cut_point: f.write('\n')
        line=data[index[i]]
        f.write(line)

4.2 Installing the required PaddleX version

# install paddlex
# paddlex is picky about versions, so pin the relevant packages first
!pip install "numpy<=1.19.5" -i https://mirror.baidu.com/pypi/simple
!pip install paddlex==2.0.0
#import the libraries we need
import matplotlib
matplotlib.use('Agg') 
import os
#os.environ['GPU_VISIBLE_DEVICES'] = '0' # apparently unnecessary
import paddlex as pdx
import numpy as np

4.3 Defining the data pipelines

from paddlex import transforms
train_transforms = transforms.Compose([
    #transforms.MixupImage(mixup_epoch=250),
    #transforms.RandomDistort(),
    #transforms.RandomExpand(),
    #transforms.RandomCrop(),
    transforms.RandomResizeByShort(short_sizes=[640, 672, 704, 736, 768, 800],
                          max_size=1333,
                          interp='RANDOM'), 
    transforms.RandomHorizontalFlip(), 
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                         std=[0.229, 0.224, 0.225])
])# Most augmentations hurt accuracy here, so only horizontal flip is kept; late in
# training all augmentation will be switched off for stability. Resizing and
# normalization are applied to ease training.

eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=800, 
                    max_size=1333,
                    interp='CUBIC'), 
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                std=[0.229, 0.224, 0.225])
])

4.4 Defining the Datasets

Early in training only the training split is used; at the very end all images are used for training.

train_dataset = pdx.datasets.VOCDetection(
    data_dir='work/train',
    file_list='work/train/train_list.txt',
    label_list='work/train/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir='work/train',
    file_list='work/train/val_list.txt',
    label_list='work/train/labels.txt',
    transforms=eval_transforms)

trainval_dataset = pdx.datasets.VOCDetection(
    data_dir='work/train',
    file_list='work/train/ori_train.txt',
    label_list='work/train/labels.txt',
    transforms=eval_transforms)

4.5 Defining the model network

Since the ranking is based purely on accuracy, the more accurate two-stage Faster-RCNN was chosen, with a ResNet101_vd backbone.

#num_classes = len(train_dataset.labels)
model = pdx.det.FasterRCNN(num_classes=6,
                           backbone='ResNet101_vd')

4.6 Defining the learning-rate schedule and optimizer

  Because a pretrained model is used, training starts with a warm-up learning rate, switching to cosine-annealing decay once the model has stabilized. (The cosine-annealed runs did not help, so those results were discarded.)
  SGD with momentum is the optimizer, with an L2 regularization coefficient on all parameters.

import paddle
train_batch_size = 8
num_steps_each_epoch = 1120 // train_batch_size
num_epochs = 80

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
    learning_rate=0.06, 
    T_max=num_steps_each_epoch * 12 // 3)
    
warmup_epoch = 1
warmup_steps = warmup_epoch * num_steps_each_epoch
scheduler = paddle.optimizer.lr.LinearWarmup(
    learning_rate=scheduler,
    warmup_steps=warmup_steps,
    start_lr=0.006,
    end_lr=0.06)
    
custom_optimizer = paddle.optimizer.Momentum(
            scheduler,
            momentum=.9,
            weight_decay=paddle.regularizer.L2Decay(coeff=1e-04),
            parameters=model.net.parameters())
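To see the warm-up behave as described, a quick probe (purely illustrative, not part of the training code; a plain float stands in for the cosine schedule):

# Step a throw-away copy of the warmup schedule and watch the lr ramp
# from start_lr=0.006 toward 0.06 over warmup_steps steps.
probe = paddle.optimizer.lr.LinearWarmup(
    learning_rate=0.06, warmup_steps=warmup_steps, start_lr=0.006, end_lr=0.06)
for step in range(warmup_steps + 1):
    if step % (warmup_steps // 4) == 0:
        print(step, round(probe.get_lr(), 4))
    probe.step()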

4.7 Training the model (bs=8, 200 s/epoch)

model.train(num_epochs = num_epochs, 
            train_dataset = train_dataset, 
            train_batch_size=train_batch_size, 
            eval_dataset=eval_dataset, 
            optimizer=custom_optimizer, 
            save_interval_epochs=1, 
            log_interval_steps=280, 
            save_dir='output/T001', 
            pretrain_weights='COCO', 
            early_stop=True, 
            early_stop_patience=5, 
            use_vdl=True,
            metric='coco',
            #pretrain_weights = None,
            #resume_checkpoint = "output/T008_101_vdMCpie3*lr/epoch_38_78.376"
            )
    2022-10-09 02:35:08 [INFO]	There are 556/560 variables loaded into FasterRCNN.

    2022-10-09 03:12:04 [INFO]	[TRAIN] Epoch=14/80, Step=70/560, loss_rpn_cls=0.001508, loss_rpn_reg=0.012554, loss_bbox_cls=0.080150, loss_bbox_reg=0.139579, loss=0.233791, lr=0.000018, time_each_step=0.26s, eta=3:12:33
    2022-10-09 03:12:20 [INFO]	[TRAIN] Epoch=14/80, Step=140/560, loss_rpn_cls=0.001076, loss_rpn_reg=0.011392, loss_bbox_cls=0.074022, loss_bbox_reg=0.163931, loss=0.250421, lr=0.000071, time_each_step=0.24s, eta=3:1:1
    2022-10-09 03:12:38 [INFO]	[TRAIN] Epoch=14/80, Step=210/560, loss_rpn_cls=0.006574, loss_rpn_reg=0.023940, loss_bbox_cls=0.170613, loss_bbox_reg=0.312816, loss=0.513943, lr=0.000160, time_each_step=0.26s, eta=3:11:41
    2022-10-09 03:12:56 [INFO]	[TRAIN] Epoch=14/80, Step=280/560, loss_rpn_cls=0.006941, loss_rpn_reg=0.005008, loss_bbox_cls=0.090919, loss_bbox_reg=0.162063, loss=0.264931, lr=0.000283, time_each_step=0.25s, eta=3:10:20
    2022-10-09 03:13:13 [INFO]	[TRAIN] Epoch=14/80, Step=350/560, loss_rpn_cls=0.019845, loss_rpn_reg=0.019525, loss_bbox_cls=0.047384, loss_bbox_reg=0.059670, loss=0.146424, lr=0.000440, time_each_step=0.24s, eta=3:2:0
    2022-10-09 03:13:30 [INFO]	[TRAIN] Epoch=14/80, Step=420/560, loss_rpn_cls=0.009172, loss_rpn_reg=0.023620, loss_bbox_cls=0.186319, loss_bbox_reg=0.234781, loss=0.453892, lr=0.000629, time_each_step=0.25s, eta=3:4:28
    2022-10-09 03:13:48 [INFO]	[TRAIN] Epoch=14/80, Step=490/560, loss_rpn_cls=0.002352, loss_rpn_reg=0.074411, loss_bbox_cls=0.052982, loss_bbox_reg=0.085663, loss=0.215407, lr=0.000848, time_each_step=0.25s, eta=3:7:50
    2022-10-09 03:14:06 [INFO]	[TRAIN] Epoch=14/80, Step=560/560, loss_rpn_cls=0.001969, loss_rpn_reg=0.014362, loss_bbox_cls=0.112086, loss_bbox_reg=0.211446, loss=0.339863, lr=0.001095, time_each_step=0.26s, eta=3:11:11
    2022-10-09 03:14:06 [INFO]	[TRAIN] Epoch 14 finished, loss_rpn_cls=0.00589173, loss_rpn_reg=0.02162566, loss_bbox_cls=0.089257866, loss_bbox_reg=0.14901184, loss=0.26578707 .
    2022-10-09 03:14:06 [WARNING]	Detector only supports single card evaluation with batch_size=1 during evaluation, so batch_size is forcibly set to 1.
    2022-10-09 03:14:06 [INFO]	Start to evaluate(total_samples=280, total_steps=280)...
    2022-10-09 03:14:29 [INFO]	Start evaluate...
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *bbox*
    DONE (t=0.36s).
    Accumulating evaluation results...
    DONE (t=0.09s).
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.426
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.778
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.426
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.265
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.363
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.495
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.264
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.555
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.563
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.429
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.547
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
    2022-10-09 03:14:30 [INFO]	[EVAL] Finished, Epoch=14, bbox_mmap=0.425782 .
    2022-10-09 03:14:34 [INFO]	Model saved in output/T001/best_model.
    2022-10-09 03:14:34 [INFO]	Current evaluated best model on eval_dataset is epoch_14, bbox_mmap=0.42578244961501155
    2022-10-09 03:14:35 [INFO]	Model saved in output/T001/epoch_14.

  1. Epoch 14 gave the best result. Continuing training with the cosine schedule made things worse, so that run was discarded:
custom_optimizer = paddle.optimizer.Momentum(
            scheduler2,   # scheduler2: the cosine schedule for this (discarded) continuation run
            momentum=.9,
            weight_decay=paddle.regularizer.L2Decay(coeff=1e-04),
            parameters=model.net.parameters())
  1. Load the best model and train on the full dataset with a cosine learning-rate schedule (learning rate reduced to 1/6):
import paddle
train_batch_size = 8
num_steps_each_epoch = 1400 // train_batch_size
num_epochs = 10

scheduler3 = paddle.optimizer.lr.CosineAnnealingDecay(
    learning_rate=0.001, 
    T_max=num_steps_each_epoch * 12 // 3)

custom_optimizer = paddle.optimizer.Momentum(
            scheduler3,
            momentum=.9,
            weight_decay=paddle.regularizer.L2Decay(coeff=1e-04),
            parameters=model.net.parameters())

  During training, PaddleX saves a checkpoint every epoch by default; in a folder such as best_model, model.pdparams stores the model parameters and model.pdopt stores the optimizer state.

  • Setting pretrain_weights = None and resume_checkpoint = "output/best_model" resumes training from that checkpoint, provided the model and optimizer definitions are unchanged (a minimal sketch follows below).
  • If the optimizer has changed, only the model parameters can be loaded before continuing training.
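
A minimal sketch of resuming (paths are illustrative; model and custom_optimizer must match the interrupted run):

# Hedged sketch: resume an interrupted PaddleX training run.
# 'output/T001/best_model' must contain both model.pdparams and model.pdopt.
model.train(num_epochs=num_epochs,
            train_dataset=train_dataset,
            train_batch_size=train_batch_size,
            eval_dataset=eval_dataset,
            optimizer=custom_optimizer,
            save_dir='output/T001_resume',   # hypothetical save dir
            pretrain_weights=None,           # must be None when resuming
            resume_checkpoint='output/T001/best_model')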
# 10 epochs, 1991 s

model.train(num_epochs = num_epochs, 
            train_dataset = trainval_dataset, 
            train_batch_size=train_batch_size, 
            eval_dataset=eval_dataset, 
            optimizer=custom_optimizer, 
            save_interval_epochs=1, 
            log_interval_steps=90, 
            save_dir='output/T002', 
            early_stop=True, 
            early_stop_patience=5, 
            use_vdl=True,
            metric='coco',
            pretrain_weights = "output/T001/best_model/model.pdparams"
            )
    2022-10-09 04:31:14 [INFO]	[TRAIN] Epoch 5 finished, loss_rpn_cls=0.005495656, loss_rpn_reg=0.020394348, loss_bbox_cls=0.08805889, loss_bbox_reg=0.1477338, loss=0.2616827 .
    2022-10-09 04:31:14 [WARNING]	Detector only supports single card evaluation with batch_size=1 during evaluation, so batch_size is forcibly set to 1.
    2022-10-09 04:31:15 [INFO]	Start to evaluate(total_samples=280, total_steps=280)...
    2022-10-09 04:31:38 [INFO]	Start evaluate...
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *bbox*
    DONE (t=0.35s).
    Accumulating evaluation results...
    DONE (t=0.08s).
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.469
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.824
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.483
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.353
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.431
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.531
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.590
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.596
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.533
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.583
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626
    2022-10-09 04:31:39 [INFO]	[EVAL] Finished, Epoch=5, bbox_mmap=0.468618 .
    2022-10-09 04:31:43 [INFO]	Model saved in output/T002/best_model.
    2022-10-09 04:31:43 [INFO]	Current evaluated best model on eval_dataset is epoch_5, bbox_mmap=0.46861803517092365
    2022-10-09 04:31:44 [INFO]	Model saved in output/T002/epoch_5.
    
    2022-10-09 04:44:38 [INFO]	[TRAIN] Epoch 9 finished, loss_rpn_cls=0.004496215, loss_rpn_reg=0.01914196, loss_bbox_cls=0.079838865, loss_bbox_reg=0.13827054, loss=0.24174762 .
    2022-10-09 04:44:38 [WARNING]	Detector only supports single card evaluation with batch_size=1 during evaluation, so batch_size is forcibly set to 1.
    2022-10-09 04:44:38 [INFO]	Start to evaluate(total_samples=280, total_steps=280)...
    2022-10-09 04:45:00 [INFO]	Start evaluate...
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *bbox*
    DONE (t=0.29s).
    Accumulating evaluation results...
    DONE (t=0.08s).
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.517
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.869
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.545
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.399
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.483
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.573
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.308
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.626
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.628
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.558
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.624
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.657
    2022-10-09 04:45:01 [INFO]	[EVAL] Finished, Epoch=9, bbox_mmap=0.517191 .
    2022-10-09 04:45:05 [INFO]	Model saved in output/T002/best_model.
    2022-10-09 04:45:05 [INFO]	Current evaluated best model on eval_dataset is epoch_9, bbox_mmap=0.5171907953103836
    2022-10-09 04:45:06 [INFO]	Model saved in output/T002/epoch_9.

4.8 Running prediction

  The model above was trained on the full dataset, including the validation set, so the best mAP can no longer serve as the selection metric; we look at the training loss instead. VisualDL shows the training loss is lowest at epoch 5, so that checkpoint is loaded for inference.
The reference code below runs prediction directly and generates a CSV file ready for submission.

import paddlex as pdx
import os
import numpy as np
import pandas as pd

# Load the chosen checkpoint
model = pdx.load_model('output/T002/epoch_5')
# Collect the test image IDs (file names without extension)
name = [name for name in os.listdir('work/test/IMAGES') if name.endswith('.jpg')]

test_name_list = []
for i in name:
    tmp = os.path.splitext(i)
    test_name_list.append(tmp[0])
test_name_list.sort()
# Map category names to the IDs required by the competition
num2index = {'crazing':0,'inclusion':1,'pitted_surface':2,'scratches':3,'patches':4,'rolled-in_scale':5}

result_list = []

# Keep only sufficiently confident boxes and write them into result_list
for index in test_name_list:
    image_name = 'work/test/IMAGES/' + index + '.jpg'
    predicts = model.predict(image_name)
    for predict in predicts:
        if predict['score'] < 0.5:
            continue
        # Convert bbox from [x, y, w, h] to the required [xmin, ymin, xmax, ymax]
        tmp = predict['bbox']
        tmp[2] += tmp[0]
        tmp[3] += tmp[1]
        line = [index, tmp, num2index[predict['category']], predict['score']]
        result_list.append(line)

result_array = np.array(result_list)
df = pd.DataFrame(result_array,columns=['image_id','bbox','category_id','confidence'])

df.to_csv('output/T002/submission.csv',index=None)

  The best model from the first 14 epochs scored 40.005 on submission; after the additional 5 epochs of training, the score rose to 40.935.

4.9 Visualizing the predictions

The visualized predictions are saved under output/T002/visualize:

# 48s
for index in test_name_list:
    image_name = 'work/test/IMAGES/'+index+'.jpg'
    predicts = model.predict(image_name)
    pdx.det.visualize(image_name, predicts, threshold=0.5, save_dir='output/T002/visualize')

五、FasterRCNN+Swin (PaddleDetection COCO, 44.07 points)

The code below is essentially the first-place solution. It uses PaddleDetection 2.3, so a few details differ from newer releases.

5.1 Data preparation

# Unzip the archives and remove the extra __MACOSX directories
!unzip /home/aistudio/data/data105746/train.zip -d /home/aistudio/data/steel
!rm -r /home/aistudio/data/steel/__MACOSX
!unzip /home/aistudio/data/data105747/test.zip -d /home/aistudio/data/steel
!rm -r /home/aistudio/data/steel/__MACOSX

# Rename the folders to JPEGImages / Annotations
!mv /home/aistudio/data/steel/train/ANNOTATIONS  /home/aistudio/data/steel/train/Annotations
!mv /home/aistudio/data/steel/train/IMAGES  /home/aistudio/data/steel/train/JPEGImages

5.2 Splitting the dataset with PaddleX

The commands below install PaddleX and use its built-in tool to split the dataset.
# Install paddlex, used to split the dataset
# Upgrade pip first
!pip install --upgrade pip -i https://mirror.baidu.com/pypi/simple
!pip install "paddlex>2.0.0" -i https://mirror.baidu.com/pypi/simple
!paddlex --split_dataset --format VOC --dataset_dir /home/aistudio/data/steel/train \
         --val_value 0.001 --test_value 0.0
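
A quick sanity check (a sketch; the file names are the ones PaddleX's VOC splitter writes into dataset_dir) confirms how many samples landed in each list:

import os

# Count entries in each split file produced by `paddlex --split_dataset`
root = '/home/aistudio/data/steel/train'
for name in ['train_list.txt', 'val_list.txt', 'labels.txt']:
    with open(os.path.join(root, name)) as f:
        print(name, '->', len(f.readlines()), 'lines')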

5.3 Installing PaddleDetection 2.3 and generating COCO-format JSON annotations

# Download PaddleDetection
%cd /home/aistudio/work
!git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b release/2.3

# Enter PaddleDetection
%cd /home/aistudio/work/PaddleDetection
# Install the remaining dependencies
!pip install -r /home/aistudio/work/PaddleDetection/requirements.txt
# Extra packages for this environment
!pip install pycocotools -i https://mirror.baidu.com/pypi/simple
!pip install lap -i https://mirror.baidu.com/pypi/simple

%cd /home/aistudio/work/PaddleDetection/
# Convert the training split
!python tools/x2coco.py \
        --dataset_type voc \
        --voc_anno_dir /home/aistudio/data/steel/train/ \
        --voc_anno_list /home/aistudio/data/steel/train/train_list.txt \
        --voc_label_list /home/aistudio/data/steel/train/labels.txt \
        --voc_out_name /home/aistudio/data/steel/train/voc_train.json

# Convert the validation split
!python tools/x2coco.py \
        --dataset_type voc \
        --voc_anno_dir /home/aistudio/data/steel/train/ \
        --voc_anno_list /home/aistudio/data/steel/train/val_list.txt \
        --voc_label_list /home/aistudio/data/steel/train/labels.txt \
        --voc_out_name /home/aistudio/data/steel/train/voc_val.json

# Remove the original XML annotations and move the JSON files into Annotations/
!rm -r /home/aistudio/data/steel/train/Annotations/*
!mv /home/aistudio/data/steel/train/*.json /home/aistudio/data/steel/train/Annotations/
/home/aistudio/work/PaddleDetection
Start converting !
100%|████████████████████████████████████| 1399/1399 [00:00<00:00, 15936.76it/s]
Start converting !
100%|██████████████████████████████████████| 1/1 [00:00<00:00, 15987.00it/s]
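
To verify the conversion, a small sketch (standard COCO JSON keys; path as set up above) can count the images, boxes, and categories:

import json

# Inspect the generated COCO-format annotation file
with open('/home/aistudio/data/steel/train/Annotations/voc_train.json') as f:
    coco = json.load(f)

print('images:     ', len(coco['images']))
print('annotations:', len(coco['annotations']))
print('categories: ', [c['name'] for c in coco['categories']])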

5.4 Configuring the training files

After trying several models, faster_rcnn_swin_tiny_fpn_3x_coco turned out to work best. The following walks through the training workflow.

5.4.1 Main model config file

  • Copy faster_rcnn_swin_tiny_fpn_1x_coco.yml under work/PaddleDetection/configs/faster_rcnn.
    Generally, all that needs modifying is weights, i.e. the model save path, plus the number of epochs, learning rate, and so on.

  • Put any parameters you expect to change into this copied file, so the base files stay untouched and other models that share them are unaffected. Parameters in this file take precedence over those in the _BASE_ files.

# A copy of the base config, named faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml

_BASE_: [
  'faster_rcnn_swin_tiny_fpn_1x_coco.yml',
]
weights: output/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1/model_final

epoch: 42

LearningRate:
  base_lr: 0.0001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [24, 33]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  clip_grad_by_norm: 1.0
  optimizer:
    type: AdamW
    weight_decay: 0.05
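
For intuition: the LR warms up from 0.1 × base_lr over the first 1000 iterations, then PiecewiseDecay multiplies it by gamma = 0.1 when passing epochs 24 and 33. A minimal sketch of the post-warm-up value (epoch granularity):

base_lr, gamma, milestones = 0.0001, 0.1, [24, 33]

def lr_at(epoch):
    # Multiply by gamma once for every milestone already passed
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

print(lr_at(0), lr_at(24), lr_at(33))   # ≈ 0.0001, 1e-05, 1e-06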

5.4.2 faster_rcnn_swin_tiny_fpn_1x_coco

Then open the path referenced in _BASE_, i.e. the faster_rcnn_swin_tiny_fpn_1x_coco.yml file.
The entries that most need changing are the first one (the dataset config file) and the training-parameter config file.

_BASE_: [
  '../datasets/coco_detection-fastertrcnn-swin.yml',
  '../runtime.yml',
  '_base_/optimizer_swin_1x-Copy1.yml',
  '_base_/faster_rcnn_swin_tiny_fpn.yml',
  '_base_/faster_rcnn_swin_reader.yml',
]

5.4.3 coco_detection

Open coco_detection.yml under work/PaddleDetection/configs/datasets/:

metric: COCO
num_classes: 6

TrainDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: Annotations/voc_train.json
    dataset_dir: ../../data/steel/train
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: Annotations/voc_val.json
    dataset_dir: ../../data/steel/train

TestDataset:
  !ImageFolder
    anno_path: ../../data/steel/train/Annotations/voc_val.json

5.4.4 Other settings

The rest can basically stay as-is. Open faster_rcnn_swin_reader.yml under work/PaddleDetection/configs/faster_rcnn/_base_/; you may change batch_size to 2 there.

5.5 Model training

# Train for 36 epochs (7580 s); epoch 26 gave the best result
!python tools/train.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml\
                       --use_vdl=true --vdl_log_dir=vdl_dir/scalar --eval\
                       -o log_iter=154
[10/09 17:16:55] ppdet.engine INFO: Best test bbox ap is 0.435.
# Resume training from a checkpoint on a single GPU
# !python tools/train.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_3x_coco.yml \
#                        -r /home/aistudio/work/output/faster_rcnn_swin_tiny_fpn_3x_coco/best \
#                        --eval \
#                        --use_vdl=true \
#                        --vdl_log_dir=vdl_dir/scalar

5.6 Inference

# Use the original author's best model: run inference on the images and save txt files
!python tools/infer.py -c  configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml\
					   -o weights=/home/aistudio/own_model/34.pdparams \
					   --infer_dir=/home/aistudio/data/steel/test/IMAGES/ \
					   --output_dir=/home/aistudio/data/steel/infer_output\
					   --draw_threshold=0.3 --save_txt=True
# Inference with our own trained model
!python tools/infer.py -c  configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml\
					  -o weights=output/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1/best_model.pdparams \
					  --infer_dir=/home/aistudio/data/steel/test/IMAGES/ \
					  --output_dir=/home/aistudio/data/steel/infer_output2\
					  --draw_threshold=0.3 --save_txt=True

Post-process the prediction results:

import csv
import os

headers = ['image_id','bbox','category_id','confidence']
classList = ['crazing','inclusion','pitted_surface','scratches','patches','rolled-in_scale']
rows = []

rootdir = '/home/aistudio/data/steel/infer_output'
files = os.listdir(rootdir)  # list everything in the inference output folder
for i in range(len(files)):
    path = os.path.join(rootdir, files[i])
    if os.path.isfile(path) and path.endswith('txt'):
        txtFile = open(path)
        print(path)
        result = txtFile.readlines()
        # Each line is: class score xmin ymin w h
        for r in result:
            ls = r.split(' ')
            Cls = ls[0]
            sco = float(ls[1])
            xmin = float(ls[2])
            ymin = float(ls[3])
            w = float(ls[4])
            h = float(ls[5])
            xmax = xmin + w
            ymax = ymin + h
            clsID = classList.index(Cls)
            imgID = files[i][:-4]  # file name without the .txt extension
            row = [imgID, [xmin, ymin, xmax, ymax], clsID, sco]
            rows.append(row)

with open('submission.csv', 'w') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(headers)
    f_csv.writerows(rows)

import pandas as pd

datafile = pd.read_csv('/home/aistudio/work/PaddleDetection/submission.csv')
# Sort the rows by image_id before submitting
data = datafile.sort_values(by="image_id", ascending=True)
data.to_csv('submission_final.csv', mode='a+', index=False)
  • The final submission must be sorted, otherwise the score drops sharply.

  • myconfig is the config file; own_model is the model.

  • I also tried several other models from the PaddleDetection suite; none beat faster_rcnn_swin_tiny_fpn_3x_coco. It is worth experimenting further and adding tricks such as TTA and WBF for model fusion (see the sketch after this list).

  • Or try cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml or ppyolov2_r101vd_dcn_365e_coco.yml.
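
As a rough sketch of WBF (Weighted Boxes Fusion) using the third-party ensemble-boxes package — the boxes, weights, and thresholds here are purely illustrative, and the library expects coordinates normalized to [0, 1]:

# pip install ensemble-boxes
from ensemble_boxes import weighted_boxes_fusion

W = H = 200.0  # image size in this dataset
# Overlapping predictions for one image from two models (illustrative numbers)
boxes_list  = [[[10/W, 20/H, 110/W, 120/H]],
               [[12/W, 22/H, 108/W, 118/H]]]
scores_list = [[0.90], [0.80]]
labels_list = [[1], [1]]   # both models predict class 1 ('inclusion')

boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list,
    weights=[1, 1], iou_thr=0.55, skip_box_thr=0.1)

# De-normalize back to pixel coordinates for the submission format
print([[x1*W, y1*H, x2*W, y2*H] for x1, y1, x2, y2 in boxes], scores, labels)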

六、Summary

Looking purely at training:

  • The best value for money is ppyoloe+: score 41.3. bs=16, 87 steps/epoch, 60 s/epoch (training speed on the full training set).
  • faster-RCNN: my best so far is 40.93. bs=8, 175 steps/epoch, 200 s/epoch (full training set).
  • faster-RCNN+swin: the reference solution reached 44 points; I have not reproduced that yet. bs=4, 308 steps/epoch, 210 s/epoch (with the 168 validation samples removed).
  • yolov5s is the fastest, at around 37 points; then again, maybe I just did not tune it well.

Inference time has not been benchmarked yet.