遥感图像识别已经有很多成熟的模型和实现,这里我们选择yolov5_obb和dota数据集,以说明并实现一种思路:那就是先识别、再标注、再训练的过程。鉴于领域内数据往往比较封闭,对此类数据的标注实现难度较大,所以需要模型迁移。首先基于已经训练的成果,实现初步标绘;而后通过人在回路的修正,获得精确的结果,最后重新投入到数据训练过程中去。通过这种方式,获得专用数据模型,解决定制问题。
一、yolov5_obb简介
普遍认为,计算机视觉图像识别四大基本任务,其中:
(a)图像分类(目标检测):一张图像中是否含某种物体
(b)物体定位(目标检测与目标识别):确定目标位置和所属类别。
(c)语义分割(目标分割和目标分类):对图像进行像素级分类,预测每个像素属于的类别,不区分个体;(所有的CUBE一个颜色)
(d)实例分割(目标分割和目标识别):定位图中每个物体,并进行像素级标注,区分不同个体;(CUBE都是不同颜色)
遥感影像识别属于物体定位(object localization)范畴。一般区分2阶段和1阶段方法,其中yolo为典型实现。其中yolov5版本由于官方的持续支持和扩展,是比较成熟的版本。基于此实现的 yolov5_obb(YOLOv5_DOTA无人机/遥感旋转目标检测项目代码 hukaixuan19970627/yolov5_obb: yolov5 + csl_label)的推断,用于解决遥感数据集上倾斜目标的识别。作者实现了在 Dota数据集上yolov5m/yolov5s/yoov5n的150Epoch的推断结果,可供迁移模型使用。
关于DOTA 数据集可进一步了解: (386条消息) DOTA 数据集:2806 张遥感图像,近 19 万个标注实例_HyperAI超神经的博客-CSDN博客
值得关注的是CSL旋转标注方法,参考资料《旋转目标检测方法解读(CSL, ECCV2020) 旋转目标检测方法解读(CSL, ECCV2020) - 知乎 (zhihu.com)》
二、模型推断(c++实现)
基于已经初步训练的结果,进一步实现标注结果展现。模型推断的实现是相对困难的,主要原因是因为在实现的过程中缺乏有效的调试工具。基于已经能够成功运行的代码去进行修改是问题有效的解决方法,其中hpc203对推断代码进行了改造,并且提供了大量案例。需要注意的是,yolov5_obb对yolov5的代码进行了改造,所以前面的相关操作都需要基于yolov5_obb实现。相关内容整理如下:
1、yolov5_Dota yolov5_Dota | Kaggle 这个是训练的工具,需要使用特定的torch版本,直接基于kaggle就可以运行。
3、从hpc203做出修改上来看,模型输出的结果,由4路转换为1路,可以借助netron进行观测。
转换前:
转换后:
4、hpc203对onnx文件转换生成的修改,进入models/yolo.py,进入到Detect类的forward函数里,插入代码
if torch.onnx.is_in_onnx_export(): for i in range(self.nl): # 分别对三个输出层处理 x[i] = self.m[i](x[i]) # conv bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85) x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous() y = x[i].sigmoid() z.append(y.view(bs, -1, self.no)) return torch.cat(z, 1)
在export.py里,自定义了一个导出onnx文件的函数,代码片段如下
def my_export_onnx(model, im, file, opset, train, dynamic, simplify, prefix=colorstr('ONNX:')): print('anchors:', model.yaml['anchors']) # wtxt = open('class.names', 'w') #for name in model.names: # wtxt.write(name+'\n') # wtxt.close() # YOLOv5 ONNX export print(im.shape) if not dynamic: f = os.path.splitext(file)[0] + '.onnx' torch.onnx.export(model, im, f, verbose=False, opset_version=12, input_names=['images'], output_names=['output']) else: f = os.path.splitext(file)[0] + '_dynamic.onnx' torch.onnx.export(model, im, f, verbose=False, opset_version=12, input_names=['images'], output_names=['output'], dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'}, # shape(1,3,640,640) 'output': {0: 'batch', 1: 'anchors'} # shape(1,25200,85) }) try: import cv2 net = cv2.dnn.readNet(f) except: exit(f'export {f} failed') exit(f'export {f} sucess')
在官方定义的export_onnx函数里插入调用这个函数,代码截图如下:
python export.py --weights=yolov5s.pt --include=onnx --imgsz=640 python export.py --weights=yolov5s6.pt --include=onnx --imgsz=1280
就能成功生成.onnx文件,并且opencv的dnn模块能读取onnx文件做推理。这两处修改都是浅表的修改,对输入输出层这块进行了一些修改,但是确实是起到了相应的作用。
三、识别(标注)软件设计
由于需要较多交互操作,选择Csharp编写界面,基于GOCW,通过clr的方式调用opencv,这些是比较成熟的方法。GOCW的相关内容可以参考Github。在实现的过程中,对软件界面进行了进一步的细化实现,
这些按钮分别对应功能如下:
打开:打开数据图片
截屏:通过屏幕截图获得
运行推断:调用识别算法
获得标注:获得标注的数据。这一点在下一小节中具体说明
保存:保存结果
比对:标注前和标注后进行比对
返回:返回标注前状态。
截屏:通过屏幕截图获得
运行推断:调用识别算法
获得标注:获得标注的数据。这一点在下一小节中具体说明
保存:保存结果
比对:标注前和标注后进行比对
返回:返回标注前状态。
软件具体操作过程可以参考视频。
四、数据标注
在现有实现的数据预识别基础上,需对识别结果进行进一步标注和修正。我曾经思考是否需要自己开发相关功能,但是很快打消了这个思路。
实现格式转换,借助已有工具来实现,应该是更合理的方法。经过调研,发现roLabelImg 是专用的旋转标注工具:
它的数据格式是这样的:
<annotation verified="yes"> <folder>hsrc</folder> <filename>100000001</filename> <path>/Users/haoyou/Library/Mobile Documents/com~apple~CloudDocs/OneDrive/hsrc/100000001.bmp</path> <source> <database>Unknown</database> </source> <size> <width>1166</width> <height>753</height> <depth>3</depth> </size> <segmented>0</segmented> <object> <type>bndbox</type> <name>ship</name> <pose>Unspecified</pose> <truncated>0</truncated> <difficult>0</difficult> <bndbox> <xmin>178</xmin> <ymin>246</ymin> <xmax>974</xmax> <ymax>504</ymax> </bndbox> </object> <object> <type>robndbox</type> <name>ship</name> <pose>Unspecified</pose> <truncated>0</truncated> <difficult>0</difficult> <robndbox> <cx>580.7887</cx> <cy>343.2913</cy> <w>775.0449</w> <h>170.2159</h> <angle>2.889813</angle> </robndbox> </object> </annotation>
那么需要当前的Csharp识别(标注)工具能够生成该格式的数据,使用XmlWriter结合文本处理的方法进行进一步融合,最终结果虽然不高效,但是稳定可控的。
using System; using System.Collections.Generic; using System.IO; using System.Text; using System.Xml; using System.Xml.Serialization; namespace ConsoleApp1 { [XmlRoot("annotation")] public class AnnotationHead { public string folder; public string filename; public string path; public Source source; public Size size; public string segmented; } public class Size { public string width; public string height; public string depth; } public class Source { public string database; } [XmlRoot("object")] public class @object { public string type; public string name; public string pose; public string truncated; public string difficult; public robndClass robndbox; } public class robndClass { public float cx; public float cy; public float w; public float h; public float angle; } public class Test { public static void Main() { //输出规则 XmlWriterSettings settings = new XmlWriterSettings(); settings.Indent = true; settings.IndentChars = " "; settings.NewLineChars = "\r\n"; settings.Encoding = Encoding.UTF8; settings.OmitXmlDeclaration = true; // 不生成声明头 XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces(); namespaces.Add(string.Empty, string.Empty); //输出对象 FileStream stream = new FileStream("part1.xml", FileMode.Create); /////////////////////////Part1/////////////////////////////////// AnnotationHead an = new AnnotationHead(); an.folder = "sandbox"; an.filename = "2022-10-29 15-19-25"; an.path = "F:/sandbox/2022-10-29 15-19-25.png"; Source source = new Source(); source.database = "Unknown"; an.source = source; Size size = new Size(); size.width = "1873"; size.height = "935"; size.depth = "3"; an.size = size; an.segmented = "0"; //实施输出 XmlWriter xmlWriter = XmlWriter.Create(stream, settings); XmlSerializer serializer = new XmlSerializer(typeof(AnnotationHead)); serializer.Serialize(xmlWriter, an, namespaces); //目标销毁 xmlWriter.Close(); stream.Close(); /////////////////////////Part2/////////////////////////////////// stream = new FileStream("part2.xml", FileMode.Create); //定义方法 robndClass il = new robndClass(); il.cx = (float)1484.1; il.cy = (float)521.7274; il.w = (float)40.1731; il.h = (float)194.3416; il.angle = (float)0.18; @object o = new @object(); o.type = "robndbox"; o.name = "ship"; o.pose = "Unspecified"; o.truncated = "0"; o.difficult = "0"; o.robndbox = il; List<@object> objectList = new List<@object>(); objectList.Add(o); objectList.Add(o); objectList.Add(o); serializer = new XmlSerializer(typeof(List<@object>)); XmlWriter xmlWriter2 = XmlWriter.Create(stream, settings); serializer.Serialize(xmlWriter2, objectList, namespaces); xmlWriter2.Close(); stream.Close(); ///////////////////////////merge////////////////////////////////////////// StreamReader sr = new StreamReader("part1.xml"); string strPart1 = sr.ReadToEnd(); strPart1 = strPart1.Substring(0, strPart1.Length - 13); sr.Close(); sr = new StreamReader("part2.xml"); string strPart2 = sr.ReadToEnd(); strPart2 = strPart2.Substring(15, strPart2.Length - 31); sr.Close(); string strOut = strPart1 + strPart2 + "</annotation>"; ////////////////////////////输出/////////////////////////////////////// using (StreamWriter sw = new StreamWriter("result.xml")) { sw.Write(strOut); } } } }
进一步重构和融合:
public static void Main() { AnnotationHead an = new AnnotationHead(); an.folder = "sandbox"; an.filename = "2022-10-29 15-19-25"; an.path = "F:/sandbox/2022-10-29 15-19-25.png"; Source source = new Source(); source.database = "Unknown"; an.source = source; Size size = new Size(); size.width = "18443"; size.height = "935"; size.depth = "3"; an.size = size; an.segmented = "0"; robndClass il = new robndClass(); il.cx = (float)1484.1; il.cy = (float)521.7274; il.w = (float)40.1731; il.h = (float)194.3416; il.angle = (float)0.18; @object o = new @object(); o.type = "robndbox"; o.name = "ship"; o.pose = "Unspecified"; o.truncated = "0"; o.difficult = "0"; o.robndbox = il; List<@object> objectList = new List<@object>(); objectList.Add(o); objectList.Add(o); objectList.Add(o); objectList.Add(o); objectList.Add(o); AnnotationClass ac = new AnnotationClass(an,objectList,"r2.xml"); ac.action(); } } using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Xml; using System.IO; using System.Xml.Serialization; namespace ConsoleApp1 { [XmlRoot("annotation")] public class AnnotationHead { public string folder; public string filename; public string path; public Source source; public Size size; public string segmented; } public class Size { public string width; public string height; public string depth; } public class Source { public string database; } [XmlRoot("object")] public class @object { public string type; public string name; public string pose; public string truncated; public string difficult; public robndClass robndbox; } public class robndClass { public float cx; public float cy; public float w; public float h; public float angle; } class AnnotationClass { private string strOutName;//输出文件名称 private AnnotationHead an; private List<@object> objectList; //构造函数 public AnnotationClass(AnnotationHead annotatonHead, List<@object> olist, string strName = "result.xml") { an = annotatonHead; strOutName = strName; objectList = olist; } public void action() { //输出规则 XmlWriterSettings settings = new XmlWriterSettings(); settings.Indent = true; settings.IndentChars = " "; settings.NewLineChars = "\r\n"; settings.Encoding = Encoding.UTF8; settings.OmitXmlDeclaration = true; // 不生成声明头 XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces(); namespaces.Add(string.Empty, string.Empty); //输出对象 FileStream stream = new FileStream("part1.xml", FileMode.Create); /////////////////////////Part1/////////////////////////////////// //实施输出 XmlWriter xmlWriter = XmlWriter.Create(stream, settings); XmlSerializer serializer = new XmlSerializer(typeof(AnnotationHead)); serializer.Serialize(xmlWriter, an, namespaces); //目标销毁 xmlWriter.Close(); stream.Close(); /////////////////////////Part2/////////////////////////////////// stream = new FileStream("part2.xml", FileMode.Create); //定义方法 serializer = new XmlSerializer(typeof(List<@object>)); XmlWriter xmlWriter2 = XmlWriter.Create(stream, settings); serializer.Serialize(xmlWriter2, objectList, namespaces); xmlWriter2.Close(); stream.Close(); ///////////////////////////merge////////////////////////////////////////// StreamReader sr = new StreamReader("part1.xml"); string strPart1 = sr.ReadToEnd(); strPart1 = strPart1.Substring(0, strPart1.Length - 13); sr.Close(); sr = new StreamReader("part2.xml"); string strPart2 = sr.ReadToEnd(); strPart2 = strPart2.Substring(15, strPart2.Length - 31); sr.Close(); string strOut = strPart1 + strPart2 + "</annotation>"; ////////////////////////////输出/////////////////////////////////////// using (StreamWriter sw = new StreamWriter(strOutName)) { sw.Write(strOut); } } } }
这里值得注意的一点是,CSL和OpenCV中角度转换的对应关系应该是:
经过修改后,显示的结果也是准确的:
在全部图片上显示的效果为:
五、结果转换
标注的roImageLabel格式的数据,最后要转换为yolov5_obb的格式,并且导入到Dota数据集中。这块由于实际项目还未涉及,整编一些现有的资料,已经存在相关的工具:
BboxToolkit is a light codebase collecting some practical functions for the special-shape detection, such as oriented detection. The whole project is written by python, which can run in different platform without compliation. We use this project to support the oriented detection benchmark OBBDetection.
现在有xywhθ标注的数据集,要转换成dota标注的格式
使用如下代码,路径需要自己修改:
import math import shutil import os import numpy as np import xml.etree.ElementTree as et dataset_dir = r'D:\dataset\sar\RSDD\RSDD-SAR\JPEGImages' ana_dir = r'D:\dataset\sar\RSDD\RSDD-SAR\Annotations' save_dir = r'D:\dataset\sar\RSDD\dota' data_type = 'test' train_img_dir = r'D:\dataset\sar\RSDD\RSDD-SAR\ImageSets\test.txt' f1 = open(train_img_dir, 'r') train_img = f1.readlines() def rota(center_x1, center_y1, x, y, w, h, a): # 旋转中心点,旋转中心点,框的w,h,旋转角 # a = (math.pi * a) / 180 # 角度转弧度 x1, y1 = x - w / 2, y - h / 2 # 旋转前左上 x2, y2 = x + w / 2, y - h / 2 # 旋转前右上 x3, y3 = x + w / 2, y + h / 2 # 旋转前右下 x4, y4 = x - w / 2, y + h / 2 # 旋转前左下 px1 = (x1 - center_x1) * math.cos(a) - (y1 - center_y1) * math.sin(a) + center_x1 # 旋转后左上 py1 = (x1 - center_x1) * math.sin(a) + (y1 - center_y1) * math.cos(a) + center_y1 px2 = (x2 - center_x1) * math.cos(a) - (y2 - center_y1) * math.sin(a) + center_x1 # 旋转后右上 py2 = (x2 - center_x1) * math.sin(a) + (y2 - center_y1) * math.cos(a) + center_y1 px3 = (x3 - center_x1) * math.cos(a) - (y3 - center_y1) * math.sin(a) + center_x1 # 旋转后右下 py3 = (x3 - center_x1) * math.sin(a) + (y3 - center_y1) * math.cos(a) + center_y1 px4 = (x4 - center_x1) * math.cos(a) - (y4 - center_y1) * math.sin(a) + center_x1 # 旋转后左下 py4 = (x4 - center_x1) * math.sin(a) + (y4 - center_y1) * math.cos(a) + center_y1 return px1, py1, px2, py2, px3, py3, px4, py4 # 旋转后的四个点,左上,右上,右下,左下 for img in train_img: shutil.copy(os.path.join(dataset_dir, img[:-1] + '.jpg'), os.path.join(save_dir, data_type, 'images', img[:-1] + '.jpg')) xml_file = open(os.path.join(ana_dir, img[:-1] + '.xml'), encoding='utf-8') tree = et.parse(xml_file) root = tree.getroot() with open(os.path.join(save_dir, data_type, 'labelTxt', img[:-1] + '.txt'), 'w') as f: f.write('imagesource:GoogleEarth\ngsd:NaN\n') for obj in root.iter('object'): cls = obj.find('name').text box = obj.find('robndbox') x_c = box.find('cx').text y_c = box.find('cy').text h = box.find('h').text w = box.find('w').text theta = box.find('angle').text box = list(map(np.float16, [x_c, y_c, h, w, theta])) box = rota(box[0], box[1], *box) box = list(map(int, box)) box = list(map(str, box)) f.write(' '.join(box)) f.write(' ' + cls + ' 0\n')
六、矛盾困难
未来项目真实使用的场景,其图像体量肯定和dota数据集是有差距的,这样预识别肯定是有难度和错误的。但是首先通过小批量标注实现部分识别结果,而后进行迭代的方法肯定是需要实现的。此外,我们可以考虑调用融合比如easydl的方法功能。最为关键的,是找到实际需要的应用场景,才能将这个流程打通,只有这样才能够获得有用的数据集。
总结项目,关键点如下:
1、yolov5_obb的c++推断编写;
2、识别结果在roimagelabel和yolov5格式之间相关转换;
3、csharp编写界面,c++调用OpenCV实现图像处理的融合。