Basic idea: I need to segment ID cards for a business requirement; this post is a quick record of the initial experiments.
paper: https://arxiv.org/abs/2008.07043 code: https://github.com/yijingru/BBAVectors-Oriented-Object-Detection
Step 1: Download the source code
ubuntu@ubuntu:~$ git clone https://github.com/yijingru/BBAVectors-Oriented-Object-Detection
Cloning into 'BBAVectors-Oriented-Object-Detection'...
remote: Enumerating objects: 207, done.
remote: Counting objects: 100% (90/90), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 207 (delta 74), reused 65 (delta 65), pack-reused 117
Receiving objects: 100% (207/207), 175.57 KiB | 450.00 KiB/s, done.
Resolving deltas: 100% (114/114), done.
ubuntu@ubuntu:~$ cd BBAVectors-Oriented-Object-Detection/datasets/
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection/datasets$ git clone https://github.com/CAPTAIN-WHU/DOTA_devkit
Cloning into 'DOTA_devkit'...
remote: Enumerating objects: 139, done.
remote: Total 139 (delta 0), reused 0 (delta 0), pack-reused 139
Receiving objects: 100% (139/139), 59.28 MiB | 476.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
From here on, just follow the READMEs of these two repositories.
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection/datasets/DOTA_devkit$ sudo apt-get install swig
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
swig is already the newest version (4.0.1-5build1).
0 upgraded, 0 newly installed, 0 to remove and 130 not upgraded.
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection/datasets/DOTA_devkit$ swig -c++ -python polyiou.i
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection/datasets/DOTA_devkit$ python3 setup.py build_ext --inplace
running build_ext
/home/ubuntu/BBAVectors-Oriented-Object-Detection/datasets/DOTA_devkit/_polyiou.cpython-38-x86_64-linux-gnu.so
First test the detection on a sample image, then move on to training:
1) Modify /home/ubuntu/BBAVectors-Oriented-Object-Detection/datasets/DOTA_devkit/ResultMerge_multi_process.py: change sys.path.insert(0,"..") to sys.path.insert(0,os.path.abspath(os.path.join(__file__, ".."))) (see the sketch after this list)
2) Modify /home/ubuntu/BBAVectors-Oriented-Object-Detection/datasets/hrsc_evaluation_task1.py: change from . import polyiou to import datasets.DOTA_devkit.polyiou
3) Download the weight file from GitHub
4) Create the image folder and the txt file
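For reference, the two import fixes look like this in place (a sketch assuming the clone layout above; the alias in the second edit is my addition so the short name polyiou keeps working):

# datasets/DOTA_devkit/ResultMerge_multi_process.py
# before: sys.path.insert(0, "..")
# after: resolve the devkit directory relative to this file, not the CWD
import os, sys
sys.path.insert(0, os.path.abspath(os.path.join(__file__, "..")))

# datasets/hrsc_evaluation_task1.py
# before: from . import polyiou
# after: absolute import so the script also runs outside the package
import datasets.DOTA_devkit.polyiou as polyiou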
Step 2: Directory structure of the new test Datasets folder and the contents of test.txt
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection$ cd Datasets/
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection/Datasets$ tree
.
└── dota
├── images
│ └── P0706.png
└── test.txt
2 directories, 2 files
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection/Datasets$ cat dota/test.txt
/home/ubuntu/BBAVectors-Oriented-Object-Detection/Datasets/dota/images/P0706
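Note that test.txt lists image paths without the .png extension; the dataset loader appends it when reading. A small helper to generate the file (a sketch; the paths are the ones from my layout above):

import os

image_dir = "Datasets/dota/images"
with open("Datasets/dota/test.txt", "w") as f:
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".png":
            # write the absolute path minus the extension, as shown above
            f.write(os.path.join(os.path.abspath(image_dir), stem) + "\n")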
The test command and its result:
ubuntu@ubuntu:~/BBAVectors-Oriented-Object-Detection$ python3 main.py --dataset dota --data_dir Datasets/dota/ --phase test --input_h 320 --input_w 320 --resume model_50.pth
Next, build a pytorch2onnx.py conversion script:
import argparse
import torch
from datasets.dataset_dota import DOTA
from datasets.dataset_hrsc import HRSC
from models import ctrbox_net
import decoder
import os
import test
import time
import numpy as np
import func_utils
import cv2
from onnxruntime.datasets import get_example
import onnxruntime
def torch2onnx(args, model):
image_set_index_file = os.path.join(args.data_dir, args.phase + '.txt')
with open(image_set_index_file, 'r') as f:
lines = f.readlines()
image_lists = [line.strip() for line in lines]
image_path = os.path.join(args.data_dir, 'images')
imgFile = os.path.join(image_path, image_lists[0] + '.png')
image = cv2.imread(imgFile)
input_w=args.input_w
input_h = args.input_h
image = cv2.resize(image, (input_w, input_h))
out_image = image.astype(np.float32) / 255.
out_image = out_image - 0.5
out_image = out_image.transpose(2, 0, 1).reshape(1, 3, input_h, input_w)
dummy_input = torch.from_numpy(out_image).to(ctrbox_obj.device)
    input_names = ['input']    # name of the model input
    output_names = ['output']  # name of the model output
print("====", dummy_input.shape)
torch_out = torch.onnx._export(model, dummy_input, args.onnx_model_path,
verbose=False, input_names=input_names, output_names=output_names, opset_version=11)
# test onnx model
example_model = get_example(args.onnx_model_path)
session = onnxruntime.InferenceSession(example_model)
# get the name of the first input of the model
input_name = session.get_inputs()[0].name
# print('onnx Input Name:', input_name)
result = session.run([], {input_name: dummy_input.data.cpu().numpy()})
print("the result is {}".format(result))
print("result[0].shape",result[0].shape)
print("result[1].shape", result[1].shape)
print("result[2].shape", result[2].shape)
print("result[3].shape", result[3].shape)
print("onnx->>模型转换成功!")
def parse_args():
parser = argparse.ArgumentParser(description='BBAVectors Implementation')
parser.add_argument('--num_epoch', type=int, default=1, help='Number of epochs')
parser.add_argument('--batch_size', type=int, default=1, help='Number of batch size')
parser.add_argument('--num_workers', type=int, default=4, help='Number of workers')
parser.add_argument('--init_lr', type=float, default=1.25e-4, help='Initial learning rate')
parser.add_argument('--input_h', type=int, default=608, help='Resized image height')
parser.add_argument('--input_w', type=int, default=608, help='Resized image width')
parser.add_argument('--K', type=int, default=500, help='Maximum of objects')
parser.add_argument('--conf_thresh', type=float, default=0.18, help='Confidence threshold, 0.1 for general evaluation')
parser.add_argument('--ngpus', type=int, default=1, help='Number of gpus, ngpus>1 for multigpu')
parser.add_argument('--resume_train', type=str, default='', help='Weights resumed in training')
parser.add_argument('--resume', type=str, default='model_50.pth', help='Weights resumed in testing and evaluation')
parser.add_argument('--dataset', type=str, default='dota', help='Name of dataset')
parser.add_argument('--data_dir', type=str, default='Dotasets/dota', help='Data directory')
parser.add_argument('--phase', type=str, default='test', help='Phase choice= {train, test, eval}')
parser.add_argument('--wh_channels', type=int, default=8, help='Number of channels for the vectors (4x2)')
parser.add_argument('--onnx_model_path', type=str, default='/home/ubuntu/BBAVectors-Oriented-Object-Detection/weights_dota/model_50.onnx', help='onnx model path')
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
dataset = {'dota': DOTA, 'hrsc': HRSC}
num_classes = {'dota': 15, 'hrsc': 1}
heads = {'hm': num_classes[args.dataset],
'wh': 10,
'reg': 2,
'cls_theta': 1
}
down_ratio = 4
model = ctrbox_net.CTRBOX(heads=heads,
pretrained=True,
down_ratio=down_ratio,
final_kernel=1,
head_conv=256)
decoder = decoder.DecDecoder(K=args.K,
conf_thresh=args.conf_thresh,
num_classes=num_classes[args.dataset])
ctrbox_obj = test.TestModule(dataset=dataset, num_classes=num_classes, model=model, decoder=decoder)
save_path = 'weights_' + args.dataset
ctrbox_obj.model = ctrbox_obj.load_model(ctrbox_obj.model, os.path.join(save_path, args.resume))
ctrbox_obj.model = ctrbox_obj.model.to(ctrbox_obj.device)
ctrbox_obj.model.eval()
dataset_module = ctrbox_obj.dataset[args.dataset]
dsets = dataset_module(data_dir=args.data_dir,
phase='test',
input_h=args.input_h,
input_w=args.input_w,
down_ratio=down_ratio)
data_loader = torch.utils.data.DataLoader(dsets,
batch_size=1,
shuffle=False,
num_workers=1,
pin_memory=True)
total_time = []
for cnt, data_dict in enumerate(data_loader):
image = data_dict['image'][0].to(ctrbox_obj.device)
img_id = data_dict['img_id'][0]
print('processing {}/{} image ...'.format(cnt, len(data_loader)))
if cnt==0:
torch2onnx(args,ctrbox_obj.model)
begin_time = time.time()
with torch.no_grad():
pr_decs = ctrbox_obj.model(image)
print(pr_decs)
print("hm.shape=",pr_decs['hm'].shape)
print("wh.shape=", pr_decs['wh'].shape)
print("reg.shape=", pr_decs['reg'].shape)
print("cls_theta.shape=", pr_decs['cls_theta'].shape)
# self.imshow_heatmap(pr_decs[2], image)
torch.cuda.synchronize(ctrbox_obj.device)
decoded_pts = []
decoded_scores = []
predictions = ctrbox_obj.decoder.ctdet_decode(pr_decs)
pts0, scores0 = func_utils.decode_prediction(predictions, dsets, args, img_id, down_ratio)
decoded_pts.append(pts0)
decoded_scores.append(scores0)
# nms
results = {cat: [] for cat in dsets.category}
for cat in dsets.category:
if cat == 'background':
continue
pts_cat = []
scores_cat = []
for pts0, scores0 in zip(decoded_pts, decoded_scores):
pts_cat.extend(pts0[cat])
scores_cat.extend(scores0[cat])
pts_cat = np.asarray(pts_cat, np.float32)
scores_cat = np.asarray(scores_cat, np.float32)
if pts_cat.shape[0]:
nms_results = func_utils.non_maximum_suppression(pts_cat, scores_cat)
results[cat].extend(nms_results)
end_time = time.time()
total_time.append(end_time - begin_time)
# """
ori_image = dsets.load_image(cnt)
height, width, _ = ori_image.shape
# ori_image = cv2.resize(ori_image, (args.input_w, args.input_h))
# ori_image = cv2.resize(ori_image, (args.input_w//args.down_ratio, args.input_h//args.down_ratio))
# nms
for cat in dsets.category:
if cat == 'background':
continue
result = results[cat]
for pred in result:
score = pred[-1]
tl = np.asarray([pred[0], pred[1]], np.float32)
tr = np.asarray([pred[2], pred[3]], np.float32)
br = np.asarray([pred[4], pred[5]], np.float32)
bl = np.asarray([pred[6], pred[7]], np.float32)
tt = (np.asarray(tl, np.float32) + np.asarray(tr, np.float32)) / 2
rr = (np.asarray(tr, np.float32) + np.asarray(br, np.float32)) / 2
bb = (np.asarray(bl, np.float32) + np.asarray(br, np.float32)) / 2
ll = (np.asarray(tl, np.float32) + np.asarray(bl, np.float32)) / 2
box = np.asarray([tl, tr, br, bl], np.float32)
cen_pts = np.mean(box, axis=0)
cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(tt[0]), int(tt[1])), (0, 0, 255), 1, 1)
cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(rr[0]), int(rr[1])), (255, 0, 255), 1, 1)
cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(bb[0]), int(bb[1])), (0, 255, 0), 1, 1)
cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(ll[0]), int(ll[1])), (255, 0, 0), 1, 1)
# cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(tl[0]), int(tl[1])), (0,0,255),1,1)
# cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(tr[0]), int(tr[1])), (255,0,255),1,1)
# cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(br[0]), int(br[1])), (0,255,0),1,1)
# cv2.line(ori_image, (int(cen_pts[0]), int(cen_pts[1])), (int(bl[0]), int(bl[1])), (255,0,0),1,1)
ori_image = cv2.drawContours(ori_image, [np.int0(box)], -1, (255, 0, 255), 1, 1)
# box = cv2.boxPoints(cv2.minAreaRect(box))
# ori_image = cv2.drawContours(ori_image, [np.int0(box)], -1, (0,255,0),1,1)
cv2.putText(ori_image, '{:.2f} {}'.format(score, cat), (int(box[1][0]), int(box[1][1])),
cv2.FONT_HERSHEY_COMPLEX, 0.5, (0, 255, 255), 1, 1)
if args.dataset == 'hrsc':
gt_anno = dsets.load_annotation(cnt)
for pts_4 in gt_anno['pts']:
bl = pts_4[0, :]
tl = pts_4[1, :]
tr = pts_4[2, :]
br = pts_4[3, :]
cen_pts = np.mean(pts_4, axis=0)
box = np.asarray([bl, tl, tr, br], np.float32)
box = np.int0(box)
cv2.drawContours(ori_image, [box], 0, (255, 255, 255), 1)
Then I generated the data and compared the two sides; they agree.
loaded weights from weights_dota/model_50.pth, epoch 50
processing 0/1 image ...
==== torch.Size([1, 3, 608, 608])
the result is [array([[[[1.2619197e-03, 7.6299906e-04, 1.0170639e-03, ...,
2.6458502e-03, 1.8958151e-03, 1.7695725e-03],
[7.6609850e-04, 2.0185113e-03, 1.9146502e-03, ...,
[1.2292266e-03, 1.5050471e-03, 2.5414228e-03, ...,
1.1001229e-03, 5.1316619e-04, 7.1933866e-04]]]], dtype=float32), array([[[[-1.3159164 , -1.4116488 , -1.2485557 , ..., -1.580438 ,
-1.4573082 , -1.2563051 ],
...,
[ 5.4992776 , 5.3635373 , 5.9012012 , ..., 7.125336 ,
6.622304 , 5.8493977 ]]]], dtype=float32), array([[[[0.49028683, 0.5056656 , 0.69759 , ..., 0.6002424 ,
0.634705 , 0.4223461 ],
...,
[0.26842847, 0.33262217, 0.2754776 , ..., 0.29297403,
0.31129774, 0.23753083]]]], dtype=float32), array([[[[0.99999577, 0.9992129 , 0.99947554, ..., 0.99996024,
0.99999607, 0.9999999 ],
...,
[0.99999994, 0.9999903 , 0.99996865, ..., 0.99967766,
0.99994814, 0.9999981 ]]]], dtype=float32)]
------------------------------------
result[0].shape (1, 15, 152, 152)
result[1].shape (1, 10, 152, 152)
result[2].shape (1, 2, 152, 152)
result[3].shape (1, 1, 152, 152)
onnx->> model conversion succeeded; the above is the data output!
{'hm': tensor([[[[1.2619e-03, 7.6301e-04, 1.0171e-03, ..., 2.6459e-03,
1.8958e-03, 1.7696e-03],
[7.6605e-04, 2.0185e-03, 1.9146e-03, ...,
[1.2292e-03, 1.5051e-03, 2.5414e-03, ..., 1.1001e-03,
5.1320e-04, 7.1930e-04]]]], device='cuda:0'), 'wh': tensor([[[[-1.3159, -1.4116, -1.2486, ..., -1.5804, -1.4573, -1.2563],
...
[ 5.4993, 5.3635, 5.9012, ..., 7.1253, 6.6223, 5.8494]]]],
device='cuda:0'), 'reg': tensor([[[[0.4903, 0.5057, 0.6976, ..., 0.6002, 0.6347, 0.4223],
...,
[0.2684, 0.3326, 0.2755, ..., 0.2930, 0.3113, 0.2375]]]],
device='cuda:0'), 'cls_theta': tensor([[[[1.0000, 0.9992, 0.9995, ..., 1.0000, 1.0000, 1.0000],
[0.9997, 0.9050, 0.9330, ..., 0.5001, 0.8898, 1.0000],
[0.9996, 0.8914, 0.9539, ..., 0.1638, 0.6598, 0.9997],
...,
[1.0000, 1.0000, 1.0000, ..., 0.9997, 0.9999, 1.0000]]]],
device='cuda:0')}
--------------------------
hm.shape= torch.Size([1, 15, 152, 152])
wh.shape= torch.Size([1, 10, 152, 152])
reg.shape= torch.Size([1, 2, 152, 152])
cls_theta.shape= torch.Size([1, 1, 152, 152])
pytorch->> model data output!
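Eyeballing the two dumps works, but the check can be made programmatic. A sketch reusing the names from pytorch2onnx.py above (result is the onnxruntime output list, pr_decs the PyTorch dict; the key order follows the printed shapes):

import numpy as np

for i, key in enumerate(['hm', 'wh', 'reg', 'cls_theta']):
    np.testing.assert_allclose(result[i],
                               pr_decs[key].detach().cpu().numpy(),
                               rtol=1e-3, atol=1e-5)
    print(key, "matches within tolerance")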
Step 3: Simplify the model. The conversion to ncnn then goes smoothly, so we can start writing the pre- and post-processing.
ubuntu@ubuntu-Super-Server:~/BBAVectors-Oriented-Object-Detection$ cd weights_dota/
ubuntu@ubuntu-Super-Server:~/BBAVectors-Oriented-Object-Detection/weights_dota$ python3 -m onnxsim model_50.onnx model_50_sim.onnx
Simplifying...
Checking 0/3...
Checking 1/3...
Checking 2/3...
Ok!
ubuntu@ubuntu-Super-Server:~/sxj731533730/ncnn/build/install/bin$ ./onnx2ncnn ~/BBAVectors-Oriented-Object-Detection/weights_dota/model_50_sim.onnx ~/BBAVectors-Oriented-Object-Detection/weights_dota/model_50_sim.param ~/BBAVectors-Oriented-Object-Detection/weights_dota/model_50_sim.bin
First verify that the ncnn outputs stay consistent with the original:
#include <iostream>
#include <ostream>
#include <random>
#include <stdio.h>
#include <opencv2/opencv.hpp>
#include "net.h"
int main(int argc, char** argv) {
cv::Mat img = cv::imread("F:\\BBAVectors-Oriented-Object-Detection\\Dotasets\\dota\\images\\P0706.png");
ncnn::Net vector_net;
vector_net.load_param("F:\\untitled12\\model\\model_50_sim.param");
vector_net.load_model("F:\\untitled12\\model\\model_50_sim.bin");
ncnn::Mat in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR , img.cols, img.rows,608,608);
printf( "input shape: %d %d %d %d\n", in.dims, in.h, in.w, in.c);
const float mean_vals[3] = {0.5f*255.0f, 0.5f*255.0f, 0.5f*255.0f};
const float norm_vals[3] = {1/255.0f, 1/255.0f, 1/255.0f};
in.substract_mean_normalize(mean_vals, norm_vals);
ncnn::Extractor ex = vector_net.create_extractor();
ex.input("input", in);
ncnn::Mat hm, wh,reg,cls_theta;
ex.extract("1104", cls_theta);
ex.extract("1097", wh);
ex.extract("1100", reg);
ex.extract("output", hm);
printf("hm shape: %d %d %d %d\n", hm.dims, hm.c, hm.h, hm.w);
printf( "wh shape: %d %d %d %d\n", wh.dims, wh.c, wh.h, wh.w);
printf( "reg shape: %d %d %d %d\n", reg.dims, reg.c, reg.h, reg.w);
printf( "cls_theta shape: %d %d %d %d\n", cls_theta.dims, cls_theta.c, cls_theta.h, cls_theta.w);
std::cout<<"hm"<<std::endl;
for (int i=0; i<hm.w*hm.h*hm.c; i++)
{
printf("%f ", hm[i]);
}
std::cout<<"wh"<<std::endl;
for (int i=0; i<wh.w*wh.h*wh.c; i++)
{
printf("%f ", wh[i]);
}
std::cout<<"reg"<<std::endl;
for (int i=0; i<reg.w*reg.h*reg.c; i++)
{
printf("%f ", reg[i]);
}
std::cout<<"cls_theta"<<std::endl;
for (int i=0; i<cls_theta.w*cls_theta.h*cls_theta.c; i++)
{
printf("%f ", cls_theta[i]);
}
return 0;
}
Test output:
F:\untitled12\cmake-build-debug\detectVerifyLiveServer.exe
input shape: 3 608 608 3
hm shape: 3 15 152 152
wh shape: 3 10 152 152
reg shape: 3 2 152 152
cls_theta shape: 3 1 152 152
hm
0.001262 0.000763 0.001017 0.001229 0.001331 0.000663 0.000704 0.000438 0.000380 0.000672 0.000989 0.000789 0.001324 0.001646 0.001624 0.001194 0.001339 0.001174 0.001439 0.001605
wh
-1.315924 -1.411655 -1.248568 -1.147745 -1.241095 -1.676983 -2.240911 -2.194636 -1.941549 -1.949446 -1.818269 -1.781369 -1.796497 -1.452959 -0.964927 -0.837911 -1.371770 -2.187296 -2.358478 -1.898733
reg
0.490288 0.505667 0.697591 0.478884 0.299787 0.431246 0.551334 0.639653 0.541139 0.570362 0.791881 0.713983 0.486453 0.404731 0.569965 0.440954 0.338008 0.550521 0.721764 0.873441
cls_theta
0.999996 0.999213 0.999475 0.999866 0.999911 0.999940 0.999961 0.999991 0.999998 0.999988 0.999976 0.999994 0.999989 0.999941 0.999775 0.999891 0.999892 0.999951 0.999959 0.999967 0.999983 0.999995 0.999996 0.999999 0.999999 0.999999 0.9999
Step 4: One more change is needed here. The post-processing logic runs the hm heatmap through a pooling layer, so the ONNX model has to be modified by appending a MaxPool layer; this could also be done by hand-editing the ncnn files, but doing it directly in ONNX makes the later MNN conversion easier. The pooling step lives in the post-processing code of BBAVectors-Oriented-Object-Detection/decoder.py, so I add the pooling layer to the ONNX graph before converting; by extracting the tensor at two places (before and after the pool), the pooling step in post-processing can be skipped entirely.
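For context, the pooling being moved into the graph is the CenterNet-style pseudo-NMS; in decoder.py it amounts to something like this (a sketch, not the verbatim repo code):

import torch.nn.functional as F

def pseudo_nms(heat, kernel=3):
    # a heatmap peak survives only where the 3x3 max-pooled value
    # equals the original value
    hmax = F.max_pool2d(heat, (kernel, kernel), stride=1, padding=(kernel - 1) // 2)
    keep = (hmax == heat).float()
    return heat * keep

Exporting the pooled heatmap as a second output lets the C++ side read both hm and hmax and reproduce keep with a simple elementwise comparison, with no pooling of its own.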
Original model
Model after modification
Code to add the pooling layer:
import onnx
onnx_model = onnx.load(r'F:\BBAVectors-Oriented-Object-Detection\weights_dota\model_50_sim.onnx')
onnx.checker.check_model(onnx_model)
nodes = onnx_model.graph.node
output = onnx_model.graph.output
print(output)
for idx, n in enumerate(nodes):
if 'Sigmoid' == n.op_type and "output" in n.output[0]:
        ## append a MaxPool node after this Sigmoid
print(idx)
n.output[0] = str(idx + 1)
_input = str(n.output[0])
_output = "output"
n_sm = onnx.helper.make_node('MaxPool',inputs=[_input],outputs=[_output],kernel_shape=[3, 3],strides=[1, 1],pads=[1,1,1,1])
nodes.append(n_sm)
# check the modified model
onnx.checker.check_model(onnx_model)
# save the new model
onnx.save(onnx_model, r'F:\BBAVectors-Oriented-Object-Detection\weights_dota\model_50_sim_pooling.onnx')
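To confirm the node really was appended, a quick check (a sketch):

import onnx

m = onnx.load(r'F:\BBAVectors-Oriented-Object-Detection\weights_dota\model_50_sim_pooling.onnx')
onnx.checker.check_model(m)
last = m.graph.node[-1]
print(last.op_type, list(last.input), list(last.output))  # expect: MaxPool ... ['output']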
The test data again checks out and is consistent: the heatmap after the max-pooling post-processing matches ncnn's output. The data is pasted below.
First-row data of the original PyTorch output after max_pool2d, hmax[0][0][0]:
tensor([0.0020, 0.0020, 0.0020, 0.0019, 0.0019, 0.0013, 0.0008, 0.0008, 0.0008,
0.0010, 0.0015, 0.0024, 0.0024, 0.0024, 0.0018, 0.0017, 0.0017, 0.0017,
0.0017, 0.0019, 0.0019, 0.0019, 0.0010, 0.0010, 0.0008, 0.0013, 0.0013,
        0.0013, 0.0006, 0.0012, 0.0012, 0.0015, 0.0015, 0.0015, 0.0011, 0.0013, ...
First 10 values of the ncnn output after the pooling layer:
hmax
0.002019 0.002019 0.002019 0.001915 0.001860 0.001331 0.000760 0.000760 0.000760 0.000989....
Step 5: The complete ncnn test code. This post-processing is really convoluted; there is a lot of redundant code that the original authors could streamline. For now I port it literally and will optimize when I have time.
Test model:
Link: https://pan.baidu.com/s/1ienqJAmV_ABx59WBii18Yw
Extraction code: fa63
Test code. This is a first version; an optimized one is left to the reader. polyiou.cpp and polyiou.h are the sources SWIG uses to generate the .so in the Python repo; copy them over unchanged and compile them in.
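As an aside, if you would rather not compile polyiou, the same rotated-box IoU can be computed with shapely; this is a substitute library, not the repo's polyiou (a sketch assuming valid, non-self-intersecting quads):

from shapely.geometry import Polygon

def poly_iou(p, q):
    # p, q: flat corner lists [x1, y1, x2, y2, x3, y3, x4, y4]
    a = Polygon(list(zip(p[0::2], p[1::2])))
    b = Polygon(list(zip(q[0::2], q[1::2])))
    inter = a.intersection(b).area
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0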
#include <iostream>
#include <ostream>
#include <random>
#include <vector>
#include <opencv2/opencv.hpp>
#include "net.h"
#include "polyiou.h"
#define IMAGE_TARGET 608
struct Ploy {
float tr_0;
float tr_1;
float br_0;
float br_1;
float bl_0;
float bl_1;
float tl_0;
float tl_1;
Ploy() : tr_0(0), tr_1(0), br_0(0), br_1(0), bl_0(0), bl_1(0), tl_0(0), tl_1(0) {}
Ploy(float tr_0, float tr_1, float br_0, float br_1, float bl_0, float bl_1, float tl_0, float tl_1) : tr_0(tr_0),
tr_1(tr_1),
br_0(br_0),
br_1(br_1),
bl_0(bl_0),
bl_1(bl_1),
tl_0(tl_0),
tl_1(tl_1) {}
};
struct BBox {
int idx;
float conf;
Ploy ploy;
BBox() : idx(0), conf(0), ploy(0, 0, 0, 0, 0, 0, 0, 0) {}
BBox(int idx, float conf, Ploy ploy) : idx(idx), conf(conf), ploy(ploy) {}
};
void process_nms(std::vector<std::vector<float>> &vec_hm, std::vector<std::vector<float>> &vec_hmax) {
for (int i = 0; i < vec_hmax.size(); i++) {
for (int j = 0; j < vec_hmax[i].size(); j++) {
int a = round(vec_hm[i][j] * 10000);
int b = round(vec_hmax[i][j] * 10000);
vec_hm[i][j] = (a != b) ? 0.0f : vec_hm[i][j];
}
}
}
void process_top_k(std::vector<float> scores, int top_K, std::vector<float> &scores_K, std::vector<int> &index_K) {
std::vector<int> idx(scores.size());
std::iota(idx.begin(), idx.end(), 0);
std::sort(idx.begin(), idx.end(),
[&scores](int index_1, int index_2) { return scores[index_1] > scores[index_2]; });
    // keep at most top_K entries
int k_num = std::min<int>(scores.size(), top_K);
int idx_j = 0;
for (int j = 0; j < k_num; ++j) {
idx_j = idx[j];
index_K.push_back(idx_j);
scores_K.push_back(scores[idx_j]);
}
}
void process_gather_feat(std::vector<std::vector<float>> vec_reg, std::vector<int> ind,
std::vector<std::vector<float>> &item) {
for (int i = 0; i < ind.size(); i++) {
item.push_back(vec_reg[ind[i]]);
}
}
void
process_gather_feat(std::vector<std::vector<int>> topk_inds_item, std::vector<int> topk_ind, std::vector<int> &item) {
for (int i = 0; i < topk_ind.size(); i++) {
item.push_back(topk_inds_item[topk_ind[i] / topk_inds_item[0].size()][topk_ind[i] % topk_inds_item[0].size()]);
}
}
void
process_gather_feat(std::vector<std::vector<int>> topk_inds_item, std::vector<int> topk_ind, std::vector<float> &item) {
for (int i = 0; i < topk_ind.size(); i++) {
item.push_back(topk_inds_item[topk_ind[i] / topk_inds_item[0].size()][topk_ind[i] % topk_inds_item[0].size()]);
}
}
void process_topk(std::vector<std::vector<float>> hm,
int vec_hm_width,
int vec_hm_height,
int top_K,
std::vector<float> &topk_score,
std::vector<int> &topk_inds,
std::vector<float> &topk_clses,
std::vector<float> &topk_ys,
std::vector<float> &topk_xs) {
std::vector<std::vector<int>> vec_topk_inds;
std::vector<std::vector<float>> topk_scores;
for (int i = 0; i < hm.size(); i++) {
std::vector<int> index_K;
std::vector<float> scores_K;
process_top_k(hm[i], top_K, scores_K, index_K);
topk_scores.push_back(scores_K);
vec_topk_inds.push_back(index_K);
index_K.clear();
scores_K.clear();
std::vector<int>().swap(index_K);
std::vector<float>().swap(scores_K);
}
std::vector<std::vector<int>> vec_ys;
std::vector<std::vector<int>> vec_xs;
std::vector<int> topk_ys_item;
std::vector<int> topk_xs_item;
for (int i = 0; i < vec_topk_inds.size(); i++) {
for (int j = 0; j < vec_topk_inds[i].size(); j++) {
vec_topk_inds[i][j] = vec_topk_inds[i][j] % (vec_hm_width * vec_hm_height);
topk_ys_item.push_back(int(vec_topk_inds[i][j] / vec_hm_width));
topk_xs_item.push_back(int(vec_topk_inds[i][j] % vec_hm_width));
}
vec_ys.push_back(topk_ys_item);
vec_xs.push_back(topk_xs_item);
topk_ys_item.clear();
topk_xs_item.clear();
std::vector<int>().swap(topk_ys_item);
std::vector<int>().swap(topk_xs_item);
}
std::vector<int> topk_ind;
std::vector<float> topk_view_score;
    for (int i = 0; i < topk_scores.size(); i++) // flatten (class x K) scores into one vector; could be folded into the function above
for (int j = 0; j < topk_scores[i].size(); j++) {
topk_view_score.push_back(topk_scores[i][j]);
}
process_top_k(topk_view_score, top_K, topk_score, topk_ind);
topk_view_score.clear();
std::vector<float>().swap(topk_view_score);
for (int i = 0; i < topk_ind.size(); i++) {
topk_clses.push_back(int(topk_ind[i] / top_K));
}
process_gather_feat(vec_topk_inds, topk_ind, topk_inds);
process_gather_feat(vec_ys, topk_ind, topk_ys);
process_gather_feat(vec_xs, topk_ind, topk_xs);
topk_ind.clear();
std::vector<int>().swap(topk_ind);
}
void process_tranpose_and_gather_feat(std::vector<std::vector<float>> vec_reg, std::vector<int> vec_inds,
std::vector<std::vector<float>> &feat) {
process_gather_feat(vec_reg, vec_inds, feat);
}
void ctdet_decode(
int vec_hm_width,
int vec_hm_height,
int top_K,
float conf_thresh,
float cls_theta_thresh,
std::vector<std::vector<float>> vec_hm,
std::vector<std::vector<float>> vec_wh,
std::vector<std::vector<float>> vec_reg,
std::vector<std::vector<float>> vec_cls_theta,
std::vector<std::vector<float>> vec_hmax,
std::vector<std::vector<float>> &vec_detections) {
process_nms(vec_hm, vec_hmax);
std::cout << std::endl << std::endl;
std::vector<float> vec_scores;
std::vector<int> vec_inds;
std::vector<float> vec_clses;
std::vector<float> vec_ys, vec_xs;
process_topk(vec_hm, vec_hm_width, vec_hm_height, top_K, vec_scores, vec_inds, vec_clses, vec_ys, vec_xs);
std::vector<std::vector<float>> vec_regs;
process_tranpose_and_gather_feat(vec_reg, vec_inds, vec_regs); //
for (int i = 0; i < vec_xs.size(); i++) {
vec_xs[i] = vec_xs[i] + vec_regs[i][0];
vec_ys[i] = vec_ys[i] + vec_regs[i][1];
}
std::vector<std::vector<float>> vec_whs;
process_tranpose_and_gather_feat(vec_wh, vec_inds, vec_whs);
std::vector<std::vector<float>> vec_cls_thetas;
process_tranpose_and_gather_feat(vec_cls_theta, vec_inds, vec_cls_thetas);
std::vector<float> vec_tt_x, vec_tt_y, vec_rr_x, vec_rr_y, vec_bb_x, vec_bb_y, vec_ll_x, vec_ll_y;
for (int i = 0; i < vec_cls_thetas.size(); i++) {
int mask = (vec_cls_thetas[i][0] > cls_theta_thresh) ? 1 : 0;
        // (xs/ys + vector offset) * mask, falling back to an axis-aligned box
        // built from wh[8] (w) and wh[9] (h) when cls_theta is low, as in decoder.py
        vec_tt_x.push_back((vec_xs[i] + vec_whs[i][0]) * mask + vec_xs[i] * (1 - mask));
        vec_tt_y.push_back((vec_ys[i] + vec_whs[i][1]) * mask + (vec_ys[i] - vec_whs[i][9] / 2) * (1 - mask));
        vec_rr_x.push_back((vec_xs[i] + vec_whs[i][2]) * mask + (vec_xs[i] + vec_whs[i][8] / 2) * (1 - mask));
        vec_rr_y.push_back((vec_ys[i] + vec_whs[i][3]) * mask + vec_ys[i] * (1 - mask));
        vec_bb_x.push_back((vec_xs[i] + vec_whs[i][4]) * mask + vec_xs[i] * (1 - mask));
        vec_bb_y.push_back((vec_ys[i] + vec_whs[i][5]) * mask + (vec_ys[i] + vec_whs[i][9] / 2) * (1 - mask));
        vec_ll_x.push_back((vec_xs[i] + vec_whs[i][6]) * mask + (vec_xs[i] - vec_whs[i][8] / 2) * (1 - mask));
        vec_ll_y.push_back((vec_ys[i] + vec_whs[i][7]) * mask + vec_ys[i] * (1 - mask));
}
std::vector<float> vec_detect_item;
for (int i = 0; i < vec_scores.size(); i++) {
if (vec_scores[i] > conf_thresh) {
vec_detect_item.push_back(vec_xs[i]);
vec_detect_item.push_back(vec_ys[i]);
vec_detect_item.push_back(vec_tt_x[i]);
vec_detect_item.push_back(vec_tt_y[i]);
vec_detect_item.push_back(vec_rr_x[i]);
vec_detect_item.push_back(vec_rr_y[i]);
vec_detect_item.push_back(vec_bb_x[i]);
vec_detect_item.push_back(vec_bb_y[i]);
vec_detect_item.push_back(vec_ll_x[i]);
vec_detect_item.push_back(vec_ll_y[i]);
vec_detect_item.push_back(vec_scores[i]);
vec_detect_item.push_back(vec_clses[i]);
vec_detections.push_back(vec_detect_item);
vec_detect_item.clear();
std::vector<float>().swap(vec_detect_item);
}
}
}
void process_decode_prediction(std::vector<std::vector<float>> pred,
int img_width,
int img_height,
int down_ratio,
std::string category[],
std::map<std::string, std::vector<Ploy>> &map_pts0,
std::map<std::string, std::vector<float>> &map_scores0) {
for (int i = 0; i < pred.size(); i++) {
float cen_pt_0 = pred[i][0];
float cen_pt_1 = pred[i][1];
float tt_2 = pred[i][2];
float tt_3 = pred[i][3];
float rr_4 = pred[i][4];
float rr_5 = pred[i][5];
float bb_6 = pred[i][6];
float bb_7 = pred[i][7];
float ll_8 = pred[i][8];
float ll_9 = pred[i][9];
float tl_0 = tt_2 + ll_8 - cen_pt_0;
float tl_1 = tt_3 + ll_9 - cen_pt_1;
float bl_0 = bb_6 + ll_8 - cen_pt_0;
float bl_1 = bb_7 + ll_9 - cen_pt_1;
float tr_0 = tt_2 + rr_4 - cen_pt_0;
float tr_1 = tt_3 + rr_5 - cen_pt_1;
float br_0 = bb_6 + rr_4 - cen_pt_0;
float br_1 = bb_7 + rr_5 - cen_pt_1;
float score = pred[i][10];
int clse = pred[i][11];
float pts_tr_0 = tr_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_br_0 = br_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_bl_0 = bl_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_tl_0 = tl_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_tr_1 = tr_1 * down_ratio / IMAGE_TARGET * img_height;
float pts_br_1 = br_1 * down_ratio / IMAGE_TARGET * img_height;
float pts_bl_1 = bl_1 * down_ratio / IMAGE_TARGET * img_height;
float pts_tl_1 = tl_1 * down_ratio / IMAGE_TARGET * img_height;
map_pts0[category[clse]].push_back(
Ploy(pts_tr_0, pts_tr_1, pts_br_0, pts_br_1, pts_bl_0, pts_bl_1, pts_tl_0, pts_tl_1));
map_scores0[category[clse]].push_back(score);
}
}
// polygon IoU
float iou(Ploy &r1, Ploy &r2) {
std::vector<double> p;
p.push_back(r1.tr_0);
p.push_back(r1.tr_1);
p.push_back(r1.br_0);
p.push_back(r1.br_1);
p.push_back(r1.bl_0);
p.push_back(r1.bl_1);
p.push_back(r1.tl_0);
p.push_back(r1.tl_1);
std::vector<double> q;
q.push_back(r2.tr_0);
q.push_back(r2.tr_1);
q.push_back(r2.br_0);
q.push_back(r2.br_1);
q.push_back(r2.bl_0);
q.push_back(r2.bl_1);
q.push_back(r2.tl_0);
q.push_back(r2.tl_1);
double iou = iou_poly(p, q);
p.clear();
std::vector<double>().swap(p);
q.clear();
std::vector<double>().swap(q);
return iou;
}
// single-class NMS
void single_class_non_max_suppression(std::vector<Ploy> ploys, std::vector<float> confs, std::vector<Ploy> &ans,
std::vector<int> &keep_idx, float conf_thresh, float iou_thresh) {
if (ploys.size() == 0) {
return;
}
std::vector<BBox> bboxes;
BBox bbox;
for (int i = 0; i < (int) ploys.size(); ++i) {
bboxes.push_back(BBox(i, confs[i], ploys[i]));
}
    // sort bboxes by confidence in descending order
sort(bboxes.begin(), bboxes.end(), [&](const BBox &a, const BBox &b) {
return a.conf > b.conf;
});
while (!bboxes.empty()) {
bbox = bboxes[0];
if (bbox.conf < conf_thresh) {
break;
}
keep_idx.emplace_back(bbox.idx);
bboxes.erase(bboxes.begin());
        // compute IoU between the highest-confidence bbox and each remaining bbox
int size = bboxes.size();
for (int i = 0; i < size; ++i) {
float iou_ans = iou(bbox.ploy, bboxes[i].ploy);
if (iou_ans > iou_thresh) {
bboxes.erase(bboxes.begin() + i);
size = bboxes.size();
i = i - 1;
}
}
}
for (const int number : keep_idx) {
ans.push_back(ploys[number]);
}
}
void non_maximum_suppression(std::vector<Ploy> pts0_item_second, std::vector<float> scores0_item_second,
std::vector<Ploy> &ans, std::vector<int> &keep_idxs) {
float iou_thresh = 0.9;
float conf_thresh = 0;
single_class_non_max_suppression(pts0_item_second, scores0_item_second, ans, keep_idxs, conf_thresh, iou_thresh);
}
int main(int argc, char **argv) {
int top_K = 500;
float conf_thresh = 0.18f;
float cls_theta_thresh = 0.8f;
int down_ratio = 4;
std::string category[] = {"plane", "baseball-diamond", "bridge", "ground-track-field", "small-vehicle",
"large-vehicle", "ship", "tennis-court", "basketball-court", "storage-tank",
"soccer-ball-field", "roundabout", "harbor", "swimming-pool", "helicopter"
};
cv::Mat img = cv::imread("F:\\BBAVectors-Oriented-Object-Detection\\Dotasets\\dota\\images\\P0706.png");
int img_width = img.cols;
int img_height = img.rows;
ncnn::Net vector_net;
vector_net.load_param("F:\\BBAVectors-Oriented-Object-Detection\\weights_dota\\model_50_sim_pooling.param");
vector_net.load_model("F:\\BBAVectors-Oriented-Object-Detection\\weights_dota\\model_50_sim_pooling.bin");
ncnn::Mat in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR, img.cols, img.rows, IMAGE_TARGET,
IMAGE_TARGET);
printf("input shape: %d %d %d %d\n", in.dims, in.h, in.w, in.c);
const float mean_vals[3] = {0.5f * 255.0f, 0.5f * 255.0f, 0.5f * 255.0f};
const float norm_vals[3] = {1.0f / 255.0f, 1.0f / 255.0f, 1.f / 255.0f};
in.substract_mean_normalize(mean_vals, norm_vals);
//pretty_print(in);
ncnn::Extractor ex = vector_net.create_extractor();
ex.input("input", in);
ncnn::Mat hm, wh, reg, cls_theta, hmax;
ex.extract("1104", cls_theta);
ex.extract("1097", wh);
ex.extract("1100", reg);
ex.extract("260", hm);
ex.extract("output", hmax);
printf("hm shape: %d %d %d %d\n", hm.dims, hm.c, hm.h, hm.w);
printf("wh shape: %d %d %d %d\n", wh.dims, wh.c, wh.h, wh.w);
printf("reg shape: %d %d %d %d\n", reg.dims, reg.c, reg.h, reg.w);
printf("cls_theta shape: %d %d %d %d\n", cls_theta.dims, cls_theta.c, cls_theta.h, cls_theta.w);
printf("hm_pool shape: %d %d %d %d\n", hmax.dims, hmax.c, hmax.h, hmax.w);
std::cout << std::endl << "hm" << std::endl;
    // tip: keep the training input height and width equal, otherwise this code needs adjusting; an optimized version is left as an exercise
    // written to depend on ncnn as little as possible, since an MNN port follows
std::vector<std::vector<float>> vec_hm, vec_wh, vec_reg, vec_cls_theta, vec_hmax;
int vec_hm_width = hm.w;
int vec_hm_height = hm.h;
std::vector<float> vec_item;
for (int i = 0; i < hm.c; i++) {
for (int j = 0; j < hm.w * hm.h; j++) {
vec_item.push_back(hm[i * hm.w * hm.h + j]);
}
vec_hm.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
for (int i = 0; i < wh.w * wh.h; i++) {
for (int j = 0; j < wh.c; j++) {
vec_item.push_back(wh[j * wh.w * wh.h + i]);
}
vec_wh.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
for (int i = 0; i < reg.w * reg.h; i++) //permute(0, 2, 3, 1)
{
for (int j = 0; j < reg.c; j++) {
vec_item.push_back(reg[i + j * reg.w * reg.h]);
}
vec_reg.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
for (int i = 0; i < cls_theta.w * cls_theta.h; i++) {
for (int j = 0; j < cls_theta.c; j++) {
vec_item.push_back(cls_theta[j * cls_theta.w * cls_theta.h + i]);
}
vec_cls_theta.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
for (int i = 0; i < hmax.c; i++) {
for (int j = 0; j < hmax.w * hmax.h; j++) {
vec_item.push_back(hmax[i * hmax.w * hmax.h + j]);
}
vec_hmax.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
    // keep this free of ncnn types, since the MNN port reuses it
std::vector<std::vector<float>> vec_detections;
//xs,ys, tt_x,tt_y,rr_x,rr_y,bb_x,bb_y,ll_x,ll_y,scores,clses
ctdet_decode(vec_hm_width,
vec_hm_height,
top_K,
conf_thresh,
cls_theta_thresh,
vec_hm,
vec_wh,
vec_reg,
vec_cls_theta,
vec_hmax,
vec_detections);
std::map<std::string, std::vector<Ploy>> map_pts0;
std::map<std::string, std::vector<float>> map_scores0;
// pts_tr_0 pts_tr_1 pts_br_0 pts_br_1 pts_bl_0 pts_bl_1 pts_tl_0 pts_tl_1 pts_tr_0 pts_tr_1
// score
process_decode_prediction(vec_detections, img_width, img_height, down_ratio, category, map_pts0, map_scores0);
std::vector<Ploy> ans;
std::vector<int> keep_idxs;
std::vector<std::vector<cv::Point>> contours;
for (auto pts0_item = map_pts0.begin(); pts0_item != map_pts0.end(); pts0_item++) {
non_maximum_suppression(pts0_item->second, map_scores0[pts0_item->first], ans, keep_idxs);
for (int j = 0; j < ans.size(); j++) {
std::vector<cv::Point> contours_item;
float score = map_scores0[pts0_item->first][keep_idxs[j]];
            // ans is already the kept list, in keep_idxs order, so index it with j
            float tl_0 = ans[j].tr_0;
            float tl_1 = ans[j].tr_1;
            float tr_0 = ans[j].br_0;
            float tr_1 = ans[j].br_1;
            float br_0 = ans[j].bl_0;
            float br_1 = ans[j].bl_1;
            float bl_0 = ans[j].tl_0;
            float bl_1 = ans[j].tl_1;
contours_item.push_back(cv::Point(tl_0, tl_1));
contours_item.push_back(cv::Point(tr_0, tr_1));
contours_item.push_back(cv::Point(br_0, br_1));
contours_item.push_back(cv::Point(bl_0, bl_1));
contours.push_back(contours_item);
float tt_0 = (tl_0 + tr_0) / 2;
float tt_1 = (tl_1 + tr_1) / 2;
float rr_0 = (tr_0 + br_0) / 2;
float rr_1 = (tr_1 + br_1) / 2;
float bb_0 = (bl_0 + br_0) / 2;
float bb_1 = (bl_1 + br_1) / 2;
float ll_0 = (tl_0 + bl_0) / 2;
float ll_1 = (tl_1 + bl_1) / 2;
float cen_pts_0 = (tt_0 + rr_0 + bb_0 + ll_0) / 4;
float cen_pts_1 = (tt_1 + rr_1 + bb_1 + ll_1) / 4;
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(tt_0), int(tt_1)),
                     cv::Scalar(0, 0, 255), 1, 1);
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(rr_0), int(rr_1)),
                     cv::Scalar(255, 0, 255), 1, 1);
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(bb_0), int(bb_1)),
                     cv::Scalar(0, 255, 0), 1, 1);
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(ll_0), int(ll_1)),
                     cv::Scalar(255, 0, 0), 1, 1);
cv::drawContours(img, contours, -1, cv::Scalar(0, 255, 0), 1, 1);
char text[256];
sprintf(text, "%s %.1f%%", (pts0_item->first).c_str(), score * 100);
int baseLine = 0;
cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
cv::putText(img, text, cv::Point(int(cen_pts_0), int(cen_pts_1 + label_size.height)),
cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0));
contours_item.clear();
std::vector<cv::Point>().swap(contours_item);
}
ans.clear();
std::vector<Ploy>().swap(ans);
keep_idxs.clear();
std::vector<int>().swap(keep_idxs);
contours.clear();
std::vector<std::vector<cv::Point>>().swap(contours);
}
cv::imshow("image", img);
cv::imwrite("image.jpg", img);
cv::waitKey(0);
return 0;
}
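When porting process_topk I found it useful to cross-check the C++ against a few lines of numpy. A sketch of the same two-stage top-K (top-K per class, then top-K across classes; the names are mine):

import numpy as np

def topk(hm, K=500):
    # hm: (C, H, W) heatmap after the pseudo-NMS
    C, H, W = hm.shape
    flat = hm.reshape(C, -1)
    inds = np.argsort(-flat, axis=1)[:, :K]            # stage 1: per-class top-K
    scores = np.take_along_axis(flat, inds, axis=1)
    ys, xs = inds // W, inds % W
    order = np.argsort(-scores.reshape(-1))[:K]        # stage 2: over all classes
    clses = order // K
    return (scores.reshape(-1)[order], inds.reshape(-1)[order],
            clses, ys.reshape(-1)[order], xs.reshape(-1)[order])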
C++ test result
Step 6: Convert the model to MNN
D:\MNN\buildMinGW>MNNConvert -f ONNX --modelFile F:\BBAVectors-Oriented-Object-Detection\weights_dota\model_50_sim_pooling.onnx --MNNModel F:\BBAVectors-Oriented-Object-Detection\weights_dota\model_50_sim_pooling.mnn --bizCode MNN
Start to Convert Other Model Format To MNN Model...
[22:18:11] D:\MNN\tools\converter\source\onnx\onnxConverter.cpp:30: ONNX Model ir version: 7
Start to Optimize the MNN Net...
inputTensors : [ input, ]
outputTensors: [ 1097, 1100, 1104, output, ]
Converted Success!
MNN test code:
#include <iostream>
#include <ostream>
#include <random>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <MNN/Interpreter.hpp>
#include <MNN/ImageProcess.hpp>
#include "polyiou.h"
#define IMAGE_TARGET 608
struct Ploy {
float tr_0;
float tr_1;
float br_0;
float br_1;
float bl_0;
float bl_1;
float tl_0;
float tl_1;
Ploy() : tr_0(0), tr_1(0), br_0(0), br_1(0), bl_0(0), bl_1(0), tl_0(0), tl_1(0) {}
Ploy(float tr_0, float tr_1, float br_0, float br_1, float bl_0, float bl_1, float tl_0, float tl_1) : tr_0(tr_0),
tr_1(tr_1),
br_0(br_0),
br_1(br_1),
bl_0(bl_0),
bl_1(bl_1),
tl_0(tl_0),
tl_1(tl_1) {}
};
struct BBox {
int idx;
float conf;
Ploy ploy;
BBox() : idx(0), conf(0), ploy(0, 0, 0, 0, 0, 0, 0, 0) {}
BBox(int idx, float conf, Ploy ploy) : idx(idx), conf(conf), ploy(ploy) {}
};
void process_nms(std::vector<std::vector<float>> &vec_hm, std::vector<std::vector<float>> &vec_hmax) {
for (int i = 0; i < vec_hmax.size(); i++) {
for (int j = 0; j < vec_hmax[i].size(); j++) {
int a = round(vec_hm[i][j] * 10000);
int b = round(vec_hmax[i][j] * 10000);
vec_hm[i][j] = (a != b) ? 0.0f : vec_hm[i][j];
}
}
}
void process_top_k(std::vector<float> scores, int top_K, std::vector<float> &scores_K, std::vector<int> &index_K) {
std::vector<int> idx(scores.size());
std::iota(idx.begin(), idx.end(), 0);
std::sort(idx.begin(), idx.end(),
[&scores](int index_1, int index_2) { return scores[index_1] > scores[index_2]; });
    // keep at most top_K entries
int k_num = std::min<int>(scores.size(), top_K);
int idx_j = 0;
for (int j = 0; j < k_num; ++j) {
idx_j = idx[j];
index_K.push_back(idx_j);
scores_K.push_back(scores[idx_j]);
}
}
void process_gather_feat(std::vector<std::vector<float>> vec_reg, std::vector<int> ind,
std::vector<std::vector<float>> &item) {
for (int i = 0; i < ind.size(); i++) {
item.push_back(vec_reg[ind[i]]);
}
}
void
process_gather_feat(std::vector<std::vector<int>> topk_inds_item, std::vector<int> topk_ind, std::vector<int> &item) {
for (int i = 0; i < topk_ind.size(); i++) {
item.push_back(topk_inds_item[topk_ind[i] / topk_inds_item[0].size()][topk_ind[i] % topk_inds_item[0].size()]);
}
}
void
process_gather_feat(std::vector<std::vector<int>> topk_inds_item, std::vector<int> topk_ind, std::vector<float> &item) {
for (int i = 0; i < topk_ind.size(); i++) {
item.push_back(topk_inds_item[topk_ind[i] / topk_inds_item[0].size()][topk_ind[i] % topk_inds_item[0].size()]);
}
}
void process_topk(std::vector<std::vector<float>> hm,
int vec_hm_width,
int vec_hm_height,
int top_K,
std::vector<float> &topk_score,
std::vector<int> &topk_inds,
std::vector<float> &topk_clses,
std::vector<float> &topk_ys,
std::vector<float> &topk_xs) {
std::vector<std::vector<int>> vec_topk_inds;
std::vector<std::vector<float>> topk_scores;
for (int i = 0; i < hm.size(); i++) {
std::vector<int> index_K;
std::vector<float> scores_K;
process_top_k(hm[i], top_K, scores_K, index_K);
topk_scores.push_back(scores_K);
vec_topk_inds.push_back(index_K);
index_K.clear();
scores_K.clear();
std::vector<int>().swap(index_K);
std::vector<float>().swap(scores_K);
}
std::vector<std::vector<int>> vec_ys;
std::vector<std::vector<int>> vec_xs;
std::vector<int> topk_ys_item;
std::vector<int> topk_xs_item;
for (int i = 0; i < vec_topk_inds.size(); i++) {
for (int j = 0; j < vec_topk_inds[i].size(); j++) {
vec_topk_inds[i][j] = vec_topk_inds[i][j] % (vec_hm_width * vec_hm_height);
topk_ys_item.push_back(int(vec_topk_inds[i][j] / vec_hm_width));
topk_xs_item.push_back(int(vec_topk_inds[i][j] % vec_hm_width));
}
vec_ys.push_back(topk_ys_item);
vec_xs.push_back(topk_xs_item);
topk_ys_item.clear();
topk_xs_item.clear();
std::vector<int>().swap(topk_ys_item);
std::vector<int>().swap(topk_xs_item);
}
std::vector<int> topk_ind;
std::vector<float> topk_view_score;
    for (int i = 0; i < topk_scores.size(); i++) // flatten (class x K) scores into one vector; could be folded into the function above
for (int j = 0; j < topk_scores[i].size(); j++) {
topk_view_score.push_back(topk_scores[i][j]);
}
process_top_k(topk_view_score, top_K, topk_score, topk_ind);
topk_view_score.clear();
std::vector<float>().swap(topk_view_score);
for (int i = 0; i < topk_ind.size(); i++) {
topk_clses.push_back(int(topk_ind[i] / top_K));
}
process_gather_feat(vec_topk_inds, topk_ind, topk_inds);
process_gather_feat(vec_ys, topk_ind, topk_ys);
process_gather_feat(vec_xs, topk_ind, topk_xs);
topk_ind.clear();
std::vector<int>().swap(topk_ind);
}
void process_tranpose_and_gather_feat(std::vector<std::vector<float>> vec_reg, std::vector<int> vec_inds,
std::vector<std::vector<float>> &feat) {
process_gather_feat(vec_reg, vec_inds, feat);
}
void ctdet_decode(
int vec_hm_width,
int vec_hm_height,
int top_K,
float conf_thresh,
float cls_theta_thresh,
std::vector<std::vector<float>> vec_hm,
std::vector<std::vector<float>> vec_wh,
std::vector<std::vector<float>> vec_reg,
std::vector<std::vector<float>> vec_cls_theta,
std::vector<std::vector<float>> vec_hmax,
std::vector<std::vector<float>> &vec_detections) {
process_nms(vec_hm, vec_hmax);
std::cout << std::endl << std::endl;
std::vector<float> vec_scores;
std::vector<int> vec_inds;
std::vector<float> vec_clses;
std::vector<float> vec_ys, vec_xs;
process_topk(vec_hm, vec_hm_width, vec_hm_height, top_K, vec_scores, vec_inds, vec_clses, vec_ys, vec_xs);
std::vector<std::vector<float>> vec_regs;
process_tranpose_and_gather_feat(vec_reg, vec_inds, vec_regs); //
for (int i = 0; i < vec_xs.size(); i++) {
vec_xs[i] = vec_xs[i] + vec_regs[i][0];
vec_ys[i] = vec_ys[i] + vec_regs[i][1];
}
std::vector<std::vector<float>> vec_whs;
process_tranpose_and_gather_feat(vec_wh, vec_inds, vec_whs);
std::vector<std::vector<float>> vec_cls_thetas;
process_tranpose_and_gather_feat(vec_cls_theta, vec_inds, vec_cls_thetas);
std::vector<float> vec_tt_x, vec_tt_y, vec_rr_x, vec_rr_y, vec_bb_x, vec_bb_y, vec_ll_x, vec_ll_y;
for (int i = 0; i < vec_cls_thetas.size(); i++) {
int mask = (vec_cls_thetas[i][0] > cls_theta_thresh) ? 1 : 0;
        // (xs/ys + vector offset) * mask, falling back to an axis-aligned box
        // built from wh[8] (w) and wh[9] (h) when cls_theta is low, as in decoder.py
        vec_tt_x.push_back((vec_xs[i] + vec_whs[i][0]) * mask + vec_xs[i] * (1 - mask));
        vec_tt_y.push_back((vec_ys[i] + vec_whs[i][1]) * mask + (vec_ys[i] - vec_whs[i][9] / 2) * (1 - mask));
        vec_rr_x.push_back((vec_xs[i] + vec_whs[i][2]) * mask + (vec_xs[i] + vec_whs[i][8] / 2) * (1 - mask));
        vec_rr_y.push_back((vec_ys[i] + vec_whs[i][3]) * mask + vec_ys[i] * (1 - mask));
        vec_bb_x.push_back((vec_xs[i] + vec_whs[i][4]) * mask + vec_xs[i] * (1 - mask));
        vec_bb_y.push_back((vec_ys[i] + vec_whs[i][5]) * mask + (vec_ys[i] + vec_whs[i][9] / 2) * (1 - mask));
        vec_ll_x.push_back((vec_xs[i] + vec_whs[i][6]) * mask + (vec_xs[i] - vec_whs[i][8] / 2) * (1 - mask));
        vec_ll_y.push_back((vec_ys[i] + vec_whs[i][7]) * mask + vec_ys[i] * (1 - mask));
}
std::vector<float> vec_detect_item;
for (int i = 0; i < vec_scores.size(); i++) {
if (vec_scores[i] > conf_thresh) {
vec_detect_item.push_back(vec_xs[i]);
vec_detect_item.push_back(vec_ys[i]);
vec_detect_item.push_back(vec_tt_x[i]);
vec_detect_item.push_back(vec_tt_y[i]);
vec_detect_item.push_back(vec_rr_x[i]);
vec_detect_item.push_back(vec_rr_y[i]);
vec_detect_item.push_back(vec_bb_x[i]);
vec_detect_item.push_back(vec_bb_y[i]);
vec_detect_item.push_back(vec_ll_x[i]);
vec_detect_item.push_back(vec_ll_y[i]);
vec_detect_item.push_back(vec_scores[i]);
vec_detect_item.push_back(vec_clses[i]);
vec_detections.push_back(vec_detect_item);
vec_detect_item.clear();
std::vector<float>().swap(vec_detect_item);
}
}
}
void process_decode_prediction(std::vector<std::vector<float>> pred,
int img_width,
int img_height,
int down_ratio,
std::string category[],
std::map<std::string, std::vector<Ploy>> &map_pts0,
std::map<std::string, std::vector<float>> &map_scores0) {
for (int i = 0; i < pred.size(); i++) {
float cen_pt_0 = pred[i][0];
float cen_pt_1 = pred[i][1];
float tt_2 = pred[i][2];
float tt_3 = pred[i][3];
float rr_4 = pred[i][4];
float rr_5 = pred[i][5];
float bb_6 = pred[i][6];
float bb_7 = pred[i][7];
float ll_8 = pred[i][8];
float ll_9 = pred[i][9];
float tl_0 = tt_2 + ll_8 - cen_pt_0;
float tl_1 = tt_3 + ll_9 - cen_pt_1;
float bl_0 = bb_6 + ll_8 - cen_pt_0;
float bl_1 = bb_7 + ll_9 - cen_pt_1;
float tr_0 = tt_2 + rr_4 - cen_pt_0;
float tr_1 = tt_3 + rr_5 - cen_pt_1;
float br_0 = bb_6 + rr_4 - cen_pt_0;
float br_1 = bb_7 + rr_5 - cen_pt_1;
float score = pred[i][10];
int clse = pred[i][11];
float pts_tr_0 = tr_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_br_0 = br_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_bl_0 = bl_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_tl_0 = tl_0 * down_ratio / IMAGE_TARGET * img_width;
float pts_tr_1 = tr_1 * down_ratio / IMAGE_TARGET * img_height;
float pts_br_1 = br_1 * down_ratio / IMAGE_TARGET * img_height;
float pts_bl_1 = bl_1 * down_ratio / IMAGE_TARGET * img_height;
float pts_tl_1 = tl_1 * down_ratio / IMAGE_TARGET * img_height;
map_pts0[category[clse]].push_back(
Ploy(pts_tr_0, pts_tr_1, pts_br_0, pts_br_1, pts_bl_0, pts_bl_1, pts_tl_0, pts_tl_1));
map_scores0[category[clse]].push_back(score);
}
}
// polygon IoU
float iou(Ploy &r1, Ploy &r2) {
std::vector<double> p;
p.push_back(r1.tr_0);
p.push_back(r1.tr_1);
p.push_back(r1.br_0);
p.push_back(r1.br_1);
p.push_back(r1.bl_0);
p.push_back(r1.bl_1);
p.push_back(r1.tl_0);
p.push_back(r1.tl_1);
std::vector<double> q;
q.push_back(r2.tr_0);
q.push_back(r2.tr_1);
q.push_back(r2.br_0);
q.push_back(r2.br_1);
q.push_back(r2.bl_0);
q.push_back(r2.bl_1);
q.push_back(r2.tl_0);
q.push_back(r2.tl_1);
double iou = iou_poly(p, q);
p.clear();
std::vector<double>().swap(p);
q.clear();
std::vector<double>().swap(q);
return iou;
}
// single-class NMS
void single_class_non_max_suppression(std::vector<Ploy> ploys, std::vector<float> confs, std::vector<Ploy> &ans,
std::vector<int> &keep_idx, float conf_thresh, float iou_thresh) {
if (ploys.size() == 0) {
return;
}
std::vector<BBox> bboxes;
BBox bbox;
for (int i = 0; i < (int) ploys.size(); ++i) {
bboxes.push_back(BBox(i, confs[i], ploys[i]));
}
    // sort bboxes by confidence in descending order
sort(bboxes.begin(), bboxes.end(), [&](const BBox &a, const BBox &b) {
return a.conf > b.conf;
});
while (!bboxes.empty()) {
bbox = bboxes[0];
if (bbox.conf < conf_thresh) {
break;
}
keep_idx.emplace_back(bbox.idx);
bboxes.erase(bboxes.begin());
        // compute IoU between the highest-confidence bbox and each remaining bbox
int size = bboxes.size();
for (int i = 0; i < size; ++i) {
float iou_ans = iou(bbox.ploy, bboxes[i].ploy);
if (iou_ans > iou_thresh) {
bboxes.erase(bboxes.begin() + i);
size = bboxes.size();
i = i - 1;
}
}
}
for (const int number : keep_idx) {
ans.push_back(ploys[number]);
}
}
void non_maximum_suppression(std::vector<Ploy> pts0_item_second, std::vector<float> scores0_item_second,
std::vector<Ploy> &ans, std::vector<int> &keep_idxs) {
float iou_thresh = 0.9;
float conf_thresh = 0;
single_class_non_max_suppression(pts0_item_second, scores0_item_second, ans, keep_idxs, conf_thresh, iou_thresh);
}
int main(int argc, char **argv) {
int top_K = 500;
float conf_thresh = 0.18f;
float cls_theta_thresh = 0.8f;
int down_ratio = 4;
std::string category[] = {"plane", "baseball-diamond", "bridge", "ground-track-field", "small-vehicle",
"large-vehicle", "ship", "tennis-court", "basketball-court", "storage-tank",
"soccer-ball-field", "roundabout", "harbor", "swimming-pool", "helicopter"
};
std::vector<std::vector<float>> vec_hm, vec_wh, vec_reg, vec_cls_theta, vec_hmax;
cv::Mat img = cv::imread("F:\\BBAVectors-Oriented-Object-Detection\\Dotasets\\dota\\images\\P0706.png");
int img_width = img.cols;
int img_height = img.rows;
    std::vector<float> meanVals = {-0.5f, -0.5f, -0.5f};
std::vector<float> normVals= { 1.0f / 255.f,1.0f / 255.f,1.0f / 255.f};
cv::Mat img_resized;
cv::resize(img.clone(), img_resized, cv::Size(IMAGE_TARGET, IMAGE_TARGET));
    auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("F:\\BBAVectors-Oriented-Object-Detection\\weights_dota\\model_50_sim_pooling.mnn")); // create the interpreter
std::cout << "Interpreter created" << std::endl;
MNN::ScheduleConfig config;
    config.saveTensors.push_back("260"); // the blob name must be registered here so the MNN intermediate tensor can be read later
config.numThread = 8;
config.type = MNN_FORWARD_CPU;
    auto session = net->createSession(config); // create the session
std::cout << "session created" << std::endl;
auto inTensor = net->getSessionInput(session, NULL);
    auto outTensor = net->getSessionOutput(session, NULL);
auto _Tensor = MNN::Tensor::create<float>({1,3,IMAGE_TARGET,IMAGE_TARGET}, NULL, MNN::Tensor::CAFFE);
if(_Tensor->elementSize()!=3*IMAGE_TARGET*IMAGE_TARGET)
{
std::cout<<_Tensor->elementSize()<<" "<<img_resized.channels()*img_resized.cols*img_resized.rows<<std::endl;
std::cout<<"input shape not equal image shape"<<std::endl;
return -1;
}
std::vector<cv::Mat> rgbChannels(3);
cv::split(img_resized, rgbChannels);
for (auto i = 0; i < rgbChannels.size(); i++) {
rgbChannels[i].convertTo(rgbChannels[i], CV_32FC1, normVals[i], meanVals[i]);
for(int j=0;j<rgbChannels[i].rows;j++) {
for (int k = 0; k < rgbChannels[i].cols; k++) {
_Tensor->host<float>()[i*IMAGE_TARGET*IMAGE_TARGET+j*IMAGE_TARGET+k] =rgbChannels[i].at<float>(j, k);
//printf("%f ",rgbChannels[i].at<float>(j, k));
}
}
}
inTensor->copyFromHostTensor(_Tensor);
    // run inference
net->runSession(session);
std::vector<float> vec_item;
auto output_wh= net->getSessionOutput(session, "1097");
MNN::Tensor wh(output_wh, output_wh->getDimensionType());
output_wh->copyToHostTensor(&wh);
auto output_ptr = wh.host<float>();
std::cout<<"wh = "<<wh.height()<<" "<<wh.width()<<" "<<wh.channel()<<" "<<wh.elementSize()<<std::endl;
for (int i = 0; i < wh.width() * wh.height(); i++) {
for (int j = 0; j < wh.channel(); j++) {
vec_item.push_back(output_ptr[j * wh.width() * wh.height() + i]);
}
vec_wh.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
auto output_reg= net->getSessionOutput(session, "1100");
MNN::Tensor reg(output_reg, output_reg->getDimensionType());
    output_reg->copyToHostTensor(&reg);
output_ptr = reg.host<float>();
std::cout<<"reg = "<<reg.height()<<" "<<reg.width()<<" "<<reg.channel()<<" "<<reg.elementSize()<<std::endl;
for (int i = 0; i < reg.width() * reg.height(); i++) //permute(0, 2, 3, 1)
{
for (int j = 0; j < reg.channel(); j++) {
vec_item.push_back(output_ptr[i + j * reg.width() * reg.height()]);
}
vec_reg.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
auto output_cls_theta= net->getSessionOutput(session, "1104");
MNN::Tensor cls_theta(output_cls_theta, output_cls_theta->getDimensionType());
output_cls_theta->copyToHostTensor(&cls_theta);
output_ptr = cls_theta.host<float>();
std::cout<<"cls_theta = "<<cls_theta.height()<<" "<<cls_theta.width()<<" "<<cls_theta.channel()<<" "<<cls_theta.elementSize()<<std::endl;
for (int i = 0; i < cls_theta.width() * cls_theta.height(); i++) {
for (int j = 0; j < cls_theta.channel(); j++) {
vec_item.push_back(output_ptr[j * cls_theta.width() * cls_theta.height() + i]);
}
vec_cls_theta.push_back(vec_item);
vec_item.clear();
std::vector<float>().swap(vec_item);
}
auto output= net->getSessionOutput(session, "output");
MNN::Tensor hmax(output, output->getDimensionType());
output->copyToHostTensor(&hmax);
std::cout<<"hmax = "<<hmax.height()<<" "<<hmax.width()<<" "<<hmax.channel()<<" "<<hmax.elementSize()<<std::endl;
output_ptr = hmax.host<float>();
for (int i = 0; i < hmax.channel(); i++) {
for (int j = 0; j < hmax.width() * hmax.height(); j++) {
vec_item.push_back(output_ptr[i * hmax.width() * hmax.height() + j]);
}
        vec_hmax.push_back(vec_item); // "output" is the pooled heatmap, so it belongs in vec_hmax
vec_item.clear();
std::vector<float>().swap(vec_item);
}
auto output_hm= net->getSessionOutput(session, "260");
MNN::Tensor hm(output_hm, output_hm->getDimensionType());
output_hm->copyToHostTensor(&hm);
output_ptr = hm.host<float>();
std::cout<<"hm = "<<hm.height()<<" "<<hm.width()<<" "<<hm.channel()<<" "<<hm.elementSize()<<std::endl;
int vec_hm_width=hm.width();
int vec_hm_height=hm.height();
for (int i = 0; i < hm.channel(); i++) {
for (int j = 0; j < hm.width() * hm.height(); j++) {
vec_item.push_back(output_ptr[i * hm.width() * hm.height() + j]);
}
        vec_hm.push_back(vec_item); // "260" is the raw sigmoid heatmap
vec_item.clear();
std::vector<float>().swap(vec_item);
}
    // the decode path below is shared with the ncnn version and avoids any framework dependency
std::vector<std::vector<float>> vec_detections;
//xs,ys, tt_x,tt_y,rr_x,rr_y,bb_x,bb_y,ll_x,ll_y,scores,clses
ctdet_decode(vec_hm_width,
vec_hm_height,
top_K,
conf_thresh,
cls_theta_thresh,
vec_hm,
vec_wh,
vec_reg,
vec_cls_theta,
vec_hmax,
vec_detections);
std::map<std::string, std::vector<Ploy>> map_pts0;
std::map<std::string, std::vector<float>> map_scores0;
// pts_tr_0 pts_tr_1 pts_br_0 pts_br_1 pts_bl_0 pts_bl_1 pts_tl_0 pts_tl_1 pts_tr_0 pts_tr_1
// score
process_decode_prediction(vec_detections, img_width, img_height, down_ratio, category, map_pts0, map_scores0);
std::vector<Ploy> ans;
std::vector<int> keep_idxs;
std::vector<std::vector<cv::Point>> contours;
for (auto pts0_item = map_pts0.begin(); pts0_item != map_pts0.end(); pts0_item++) {
non_maximum_suppression(pts0_item->second, map_scores0[pts0_item->first], ans, keep_idxs);
for (int j = 0; j < ans.size(); j++) {
std::vector<cv::Point> contours_item;
float score = map_scores0[pts0_item->first][keep_idxs[j]];
            // ans is already the kept list, in keep_idxs order, so index it with j
            float tl_0 = ans[j].tr_0;
            float tl_1 = ans[j].tr_1;
            float tr_0 = ans[j].br_0;
            float tr_1 = ans[j].br_1;
            float br_0 = ans[j].bl_0;
            float br_1 = ans[j].bl_1;
            float bl_0 = ans[j].tl_0;
            float bl_1 = ans[j].tl_1;
contours_item.push_back(cv::Point(tl_0, tl_1));
contours_item.push_back(cv::Point(tr_0, tr_1));
contours_item.push_back(cv::Point(br_0, br_1));
contours_item.push_back(cv::Point(bl_0, bl_1));
contours.push_back(contours_item);
float tt_0 = (tl_0 + tr_0) / 2;
float tt_1 = (tl_1 + tr_1) / 2;
float rr_0 = (tr_0 + br_0) / 2;
float rr_1 = (tr_1 + br_1) / 2;
float bb_0 = (bl_0 + br_0) / 2;
float bb_1 = (bl_1 + br_1) / 2;
float ll_0 = (tl_0 + bl_0) / 2;
float ll_1 = (tl_1 + bl_1) / 2;
float cen_pts_0 = (tt_0 + rr_0 + bb_0 + ll_0) / 4;
float cen_pts_1 = (tt_1 + rr_1 + bb_1 + ll_1) / 4;
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(tt_0), int(tt_1)),
                     cv::Scalar(0, 0, 255), 1, 1);
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(rr_0), int(rr_1)),
                     cv::Scalar(255, 0, 255), 1, 1);
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(bb_0), int(bb_1)),
                     cv::Scalar(0, 255, 0), 1, 1);
            cv::line(img, cv::Point(int(cen_pts_0), int(cen_pts_1)), cv::Point(int(ll_0), int(ll_1)),
                     cv::Scalar(255, 0, 0), 1, 1);
cv::drawContours(img, contours, -1, cv::Scalar(0, 255, 0), 1, 1);
char text[256];
sprintf(text, "%s %.1f%%", (pts0_item->first).c_str(), score * 100);
int baseLine = 0;
cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
cv::putText(img, text, cv::Point(int(cen_pts_0), int(cen_pts_1 + label_size.height)),
cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0));
contours_item.clear();
std::vector<cv::Point>().swap(contours_item);
}
ans.clear();
std::vector<Ploy>().swap(ans);
keep_idxs.clear();
std::vector<int>().swap(keep_idxs);
contours.clear();
std::vector<std::vector<cv::Point>>().swap(contours);
}
cv::imshow("image", img);
cv::imwrite("image.jpg", img);
cv::waitKey(0);
return 0;
}
Test result
Step 7: Train on my own business data. After the scripts above generate the png images and labelTxt files, split them into training and test sets:
import os
from glob import glob
import shutil
import random
ann_txt = r"C:\Users\PHILIPS\Desktop\datasets\labelTxt"
train_txt = r"C:\Users\PHILIPS\Desktop\datasets\trainval.txt"
test_txt = r"C:\Users\PHILIPS\Desktop\datasets\test.txt"
val_txt = r"C:\Users\PHILIPS\Desktop\datasets\val.txt"
train_num = 0.8
test_num = 0.2
val_num = 0
if os.path.isfile(train_txt):
os.remove(train_txt)
if os.path.isfile(test_txt):
os.remove(test_txt)
if os.path.isfile(val_txt):
os.remove(val_txt)
txt_list = glob(os.path.join(ann_txt, ".".join(["*", "txt"])))
train_list = random.sample(txt_list, int(train_num * len(txt_list)))
test_list = random.sample(txt_list, int(test_num * len(txt_list)))
val_list = random.sample(txt_list, int(val_num * len(txt_list)))
for idx, item in enumerate(txt_list):
source_file_txt = os.path.join(ann_txt, item)
(filepath, tempfilename) = os.path.split(item)
(filename, extension) = os.path.splitext(tempfilename)
if item in train_list:
with open(train_txt, "a") as f:
f.write(filename+"\n")
elif item in val_list:
with open(val_txt, "a") as f:
f.write(filename + "\n")
else:
with open(test_txt, "a") as f:
f.write(filename + "\n")
print("copy txt into file")
print("complish")
The overall directory structure:
ubuntu@ubuntu-Super-Server:~/sxj731533730/BBAVectors-Oriented-Object-Detection/trainData$ tree -L 1
.
├── images
├── labelTxt
├── test.txt
└── trainval.txt
2 directories, 2 files
Modify the category list in BBAVectors-Oriented-Object-Detection\datasets\dataset_dota.py to:
# self.category = ['plane',
# 'baseball-diamond',
# 'bridge',
# 'ground-track-field',
# 'small-vehicle',
# 'large-vehicle',
# 'ship',
# 'tennis-court',
# 'basketball-court',
# 'storage-tank',
# 'soccer-ball-field',
# 'roundabout',
# 'harbor',
# 'swimming-pool',
# 'helicopter'
# ]
self.category = ['card']
In main.py, change
num_classes = {'dota': 15, 'hrsc': 1}
to
num_classes = {'dota': 1, 'hrsc': 1}
Start training:
ubuntu@ubuntu-Super-Server:~/sxj731533730/BBAVectors-Oriented-Object-Detection$ python3 main.py --data_dir trainData --num_epoch 80 --batch_size 15 --dataset dota --phase train --input_h 320 --input_w 320 --ngpus 4
(main.py:933542): Gdk-CRITICAL **: 02:32:20.377: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
Setting up data...
Starting training...
----------
Epoch: 1/80
train loss: 33.940561783023
/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py:154: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
----------
Epoch: 2/80
train loss: 7.3992760588483115
----------
Epoch: 3/80
train loss: 5.412965937358577
----------
Epoch: 4/80
...
Epoch: 76/80
train loss: 0.1119781458242373
----------
Epoch: 77/80
train loss: 0.1125242995436896
----------
Epoch: 78/80
train loss: 0.14691200022670356
----------
Epoch: 79/80
train loss: 0.1159047374332493
----------
Epoch: 80/80
train loss: 0.03619453172114762
Only 80 epochs were trained, on a dataset of just 40 images at 640*480. The main cause of the detection problems with the DOTA-style training is that I rotated the images, which left the image sizes inconsistent; I will update this after adding a mirroring step to clean up the dataset. To be continued.
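Until that update, a horizontal mirror is a size-preserving way to augment the data. A sketch, assuming the usual DOTA labelTxt line format x1 y1 x2 y2 x3 y3 x4 y4 category difficult (the function name and paths are mine):

import cv2

def hflip_sample(img_path, label_path, out_img, out_label):
    img = cv2.imread(img_path)
    h, w = img.shape[:2]
    cv2.imwrite(out_img, cv2.flip(img, 1))  # 1 = horizontal flip
    with open(label_path) as f, open(out_label, "w") as g:
        for line in f:
            parts = line.split()
            if len(parts) < 10:              # pass header/comment lines through
                g.write(line)
                continue
            pts = list(map(float, parts[:8]))
            # mirror x coordinates; y and the category/difficult tail stay
            pts[0::2] = [w - 1 - x for x in pts[0::2]]
            g.write(" ".join(f"{p:.1f}" for p in pts) + " " + " ".join(parts[8:]) + "\n")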
HRSC training process
Directory structure:
ubuntu@ubuntu-Super-Server:~/sxj731533730/BBAVectors-Oriented-Object-Detection/trainDataHSRC$ tree -L 1
.
├── AllImages
├── Annotations
├── test.txt
├── train.txt
└── val.txt
2 directories, 3 files
Split the dataset:
import os
from glob import glob
import shutil
import random
ann_txt = r"G:\1\black_total"
train_txt = r"G:\1\train.txt"
test_txt = r"G:\1\test.txt"
val_txt = r"G:\1\val.txt"
train_num = 0.6
test_num = 0.2
val_num = 0.2
if os.path.isfile(train_txt):
os.remove(train_txt)
if os.path.isfile(test_txt):
os.remove(test_txt)
if os.path.isfile(val_txt):
os.remove(val_txt)
txt_list = glob(os.path.join(ann_txt, ".".join(["*", "xml"])))
train_list = random.sample(txt_list, int(train_num * len(txt_list)))
test_list = random.sample(txt_list, int(test_num * len(txt_list)))
val_list = random.sample(txt_list, int(val_num * len(txt_list)))
for idx, item in enumerate(txt_list):
source_file_txt = os.path.join(ann_txt, item)
(filepath, tempfilename) = os.path.split(item)
(filename, extension) = os.path.splitext(tempfilename)
if item in train_list:
with open(train_txt, "a") as f:
f.write(filename+"\n")
elif item in val_list:
with open(val_txt, "a") as f:
f.write(filename + "\n")
else:
with open(test_txt, "a") as f:
f.write(filename + "\n")
print("copy txt into file")
print("complish")
Test results
References:
https://github.com/onnx/onnx/blob/main/docs/Operators.md#MaxPool
https://github.com/onnx/onnx/blob/main/docs/Operators.md
paper: https://arxiv.org/abs/2008.07043 code: https://github.com/yijingru/BBAVectors-Oriented-Object-Detection