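(2) Task 2: object detection in videos challenge. This task is similar to Task 1, except that objects must be detected in videos.
(3) Task 3: single-object tracking challenge.
(4) Task 4: multi-object tracking challenge.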
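- A large number of objects to detect
- Some targets are very small
- Differing data distributions
- Severe occlusion of targets
We are pleased to announce the VisDrone2021 Object Detection in Images Challenge (Task 1). The competition aims to advance the state of the art in object detection on drone platforms. Teams are required to predict bounding boxes for objects of ten predefined categories (pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, and tricycle), each with a real-valued confidence. Some rarely occurring special vehicles (e.g., machineshop truck, forklift truck, and tanker) are ignored in the evaluation.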
Number of images
Dataset                       Training        Validation      Test-Challenge
Object detection in images    6,471 images    548 images      1,580 images
The labels 0 to 11 are, in order: 'ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'.
Name                 Description
<bbox_left>          The x coordinate of the top-left corner of the predicted bounding box
<bbox_top>           The y coordinate of the top-left corner of the predicted object bounding box
<bbox_width>         The width in pixels of the predicted object bounding box
<bbox_height>        The height in pixels of the predicted object bounding box
<score>              The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing an object instance. The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in evaluation, while 0 indicates the bounding box will be ignored.
<object_category>    The object category indicates the type of annotated object (i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)).
<truncation>         The score in the DETECTION result file should be set to the constant -1. The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside a frame (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
<occlusion>          The score in the DETECTION file should be set to the constant -1. The score in the GROUNDTRUTH file indicates the fraction of the object being occluded (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).
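To make the eight-field layout above concrete, here is a minimal sketch of a parser for one ground-truth annotation line. The names `VisDroneBox`, `VISDRONE_CLASSES`, and `parse_visdrone_line` are hypothetical helpers for illustration and are not part of the VisDrone toolkit. The sketch assumes integer fields, as in the GROUNDTRUTH files; a DETECTION file would carry a real-valued score instead.

```python
from typing import NamedTuple

# Category ids 0..11 in the order given by the format table
VISDRONE_CLASSES = ["ignored regions", "pedestrian", "people", "bicycle", "car", "van",
                    "truck", "tricycle", "awning-tricycle", "bus", "motor", "others"]


class VisDroneBox(NamedTuple):
    left: int
    top: int
    width: int
    height: int
    score: int
    category: int
    truncation: int
    occlusion: int


def parse_visdrone_line(line: str) -> VisDroneBox:
    """Parse one comma-separated ground-truth line into its eight fields."""
    # Some lines end with a trailing comma, so empty fields are dropped.
    fields = [int(v) for v in line.strip().split(",") if v.strip() != ""]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    return VisDroneBox(*fields)


# Illustrative line (made-up values): a 30x40 box at (10, 20), category 4
box = parse_visdrone_line("10,20,30,40,1,4,0,2")
print(VISDRONE_CLASSES[box.category], box.occlusion)
```

Using a named tuple keeps the field order documented in one place, which helps when the same line is later converted to YOLO, VOC, or COCO.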
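We require each evaluated algorithm to output, in a predefined format, a list of detected bounding boxes with confidence scores for every test image. See the result format for details. Similar to the evaluation protocol of MS COCO [1], we use the AP, AP^{IoU=0.50}, AP^{IoU=0.75}, AR^{max=1}, AR^{max=10}, AR^{max=100}, and AR^{max=500} metrics to evaluate the results of detection algorithms. Unless otherwise specified, the AP and AR metrics are averaged over multiple intersection over union (IoU) values. Specifically, we use ten IoU thresholds, [0.50:0.05:0.95]. All metrics are computed allowing at most 500 top-scoring detections per image (across all categories). These criteria penalize both missed detections and duplicate detections (two detections for the same object instance). AP is the primary metric used to rank algorithms. The table below describes the metrics.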
Measure          Perfect    Description
AP               100%       The average precision over all 10 IoU thresholds (i.e., [0.5:0.05:0.95]) of all object categories
AP^{IoU=0.50}    100%       The average precision over all object categories when the IoU overlap with ground truth is larger than 0.50
AP^{IoU=0.75}    100%       The average precision over all object categories when the IoU overlap with ground truth is larger than 0.75
AR^{max=1}       100%       The maximum recall given 1 detection per image
AR^{max=10}      100%       The maximum recall given 10 detections per image
AR^{max=100}     100%       The maximum recall given 100 detections per image
AR^{max=500}     100%       The maximum recall given 500 detections per image
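The above metrics are computed over the ten object categories of interest. For a comprehensive evaluation, we will also report the performance on each object category. The evaluation code for object detection in images is available on the VisDrone GitHub.
The main function is used to evaluate your detector: please modify the dataset path and result path, and use "isImgDisplay" to display the ground truth and detections.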
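As a rough illustration of the quantity these metrics are built on, the sketch below computes the IoU of two boxes in the (left, top, width, height) layout and lists the ten thresholds [0.50:0.05:0.95]. It is not the official evaluation code; `iou_xywh` and `IOU_THRESHOLDS` are names chosen here for illustration.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (left, top, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap rectangle (zero when the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds 0.50, 0.55, ..., 0.95 over which AP is averaged
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Two 10x10 boxes shifted by half a width overlap in a 5x10 strip
print(iou_xywh((0, 0, 10, 10), (5, 0, 10, 10)))
print(IOU_THRESHOLDS)
```

A detection that overlaps its ground-truth box with an IoU of about 0.33, as in the example, would not count as a match at any of the ten thresholds.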
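4.1 Converting to YOLO (TXT) format
A YOLO dataset folder has two subfolders, images and labels, which hold the pictures and the label txt files respectively. The directory structures under images and labels must mirror each other, because YOLO first reads the image path and then simply replaces images with labels in that path to locate the label file.
In the txt file for each image, every line has the format: cls_id x y w h, where (x, y) is the box center and all four values are ratios relative to the image width and height, not absolute coordinates.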
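The path substitution described above can be sketched as follows. This is an illustrative helper (`image_to_label_path` is a name made up here), loosely following the "replace images with labels" rule stated in the text; an actual YOLO data loader may differ in details.

```python
import os


def image_to_label_path(image_path: str) -> str:
    """Map .../images/.../name.jpg to .../labels/.../name.txt."""
    sa = f"{os.sep}images{os.sep}"
    sb = f"{os.sep}labels{os.sep}"
    # Replace only the last "/images/" component, in case the word appears earlier in the path
    swapped = sb.join(image_path.rsplit(sa, 1))
    return os.path.splitext(swapped)[0] + ".txt"


example = os.sep.join(["dataset", "images", "train", "0000001.jpg"])
print(image_to_label_path(example))
```

This is why a label folder that is not named labels, or that does not mirror the images tree, will not be found by this lookup.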
import os
from pathlib import Path

from PIL import Image
from tqdm import tqdm


def visdrone2yolo(dir):
    def convert_box(size, box):
        # Convert a VisDrone box (left, top, w, h) to a normalized YOLO (cx, cy, w, h) box
        dw = 1. / size[0]
        dh = 1. / size[1]
        return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh

    # (dir / 'labels').mkdir(parents=True, exist_ok=True)  # make labels directory
    (dir / 'Annotations_YOLO').mkdir(parents=True, exist_ok=True)  # make labels directory
    pbar = tqdm((dir / 'annotations').glob('*.txt'), desc=f'Converting {dir}')
    for f in pbar:
        # Image size (width, height) is needed to normalize the coordinates
        img_size = Image.open((dir / 'images' / f.name).with_suffix('.jpg')).size
        lines = []
        with open(f, 'r') as file:  # read the VisDrone annotation file
            for row in [x.split(',') for x in file.read().strip().splitlines()]:
                if row[4] == '0':  # VisDrone 'ignored regions' class 0
                    continue
                cls = int(row[5]) - 1  # shift category ids so the first kept class is 0
                box = convert_box(img_size, tuple(map(int, row[:4])))
                lines.append(f"{cls} {' '.join(f'{x:.6f}' for x in box)}\n")
        with open(str(f).replace(os.sep + 'annotations' + os.sep, os.sep + 'Annotations_YOLO' + os.sep), 'w') as fl:
            fl.writelines(lines)  # write the YOLO label file


dir = Path(r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019')  # path of the VisDrone2019 folder inside the dataset folder
# Convert
for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':
    visdrone2yolo(dir / d)  # convert VisDrone annotations to YOLO labels
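After the code runs successfully, a new Annotations_YOLO folder is created inside each of the three folders 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', and 'VisDrone2019-DET-test-dev'. It holds the labels of the VisDrone dataset converted to YOLOv5 format.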
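To check a converted label by hand, the box arithmetic can be run in both directions. The sketch below uses two hypothetical helpers, `to_yolo` and `from_yolo`, and a made-up 1000x500 image; it only illustrates the normalization and is not part of the conversion script.

```python
def to_yolo(size, box):
    """(left, top, w, h) in pixels -> normalized (cx, cy, w, h); size is (img_w, img_h)."""
    img_w, img_h = size
    left, top, w, h = box
    return ((left + w / 2) / img_w, (top + h / 2) / img_h, w / img_w, h / img_h)


def from_yolo(size, box):
    """Normalized (cx, cy, w, h) -> (left, top, w, h) in pixels."""
    img_w, img_h = size
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h)


size = (1000, 500)            # image width and height (illustrative values)
pixel_box = (100, 50, 200, 100)
yolo_box = to_yolo(size, pixel_box)
print(yolo_box)               # all four values lie in [0, 1]
print(from_yolo(size, yolo_box))
```

If the round trip does not return the original pixel box (up to floating-point error), the width and height were most likely swapped when normalizing.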
from xml.dom.minidom import Document
import os
import cv2


def add_text_element(doc, parent, tag, text):
    # Create <tag>text</tag> and attach it to parent
    node = doc.createElement(tag)
    node.appendChild(doc.createTextNode(str(text)))
    parent.appendChild(node)


# def makexml(txtPath, xmlPath, picPath):  # txt folder, xml output folder, image folder
def makexml(picPath, txtPath, xmlPath):  # image folder, txt folder, xml output folder
    dic = {'0': "hat",  # dictionary that maps class ids to class names
           '1': "person",  # it must match the classes in your own files, in the same order
           }
    files = os.listdir(txtPath)
    for i, name in enumerate(files):
        xmlBuilder = Document()
        annotation = xmlBuilder.createElement("annotation")  # <annotation> root element
        xmlBuilder.appendChild(annotation)
        with open(txtPath + name) as txtFile:
            txtList = txtFile.readlines()
        img = cv2.imread(picPath + name[0:-4] + ".jpg")
        Pheight, Pwidth, Pdepth = img.shape

        add_text_element(xmlBuilder, annotation, "folder", "driving_annotation_dataset")
        add_text_element(xmlBuilder, annotation, "filename", name[0:-4] + ".jpg")

        size = xmlBuilder.createElement("size")  # <size> with width, height, depth children
        add_text_element(xmlBuilder, size, "width", Pwidth)
        add_text_element(xmlBuilder, size, "height", Pheight)
        add_text_element(xmlBuilder, size, "depth", Pdepth)
        annotation.appendChild(size)

        for j in txtList:
            oneline = j.strip().split(" ")
            obj = xmlBuilder.createElement("object")  # one <object> per label line
            add_text_element(xmlBuilder, obj, "name", dic[oneline[0]])
            add_text_element(xmlBuilder, obj, "pose", "Unspecified")
            add_text_element(xmlBuilder, obj, "truncated", "0")
            add_text_element(xmlBuilder, obj, "difficult", "0")

            # YOLO stores a normalized center and size; convert back to pixel corners
            cx = float(oneline[1]) * Pwidth
            cy = float(oneline[2]) * Pheight
            half_w = float(oneline[3]) * 0.5 * Pwidth
            half_h = float(oneline[4]) * 0.5 * Pheight
            bndbox = xmlBuilder.createElement("bndbox")
            add_text_element(xmlBuilder, bndbox, "xmin", int((cx + 1) - half_w))
            add_text_element(xmlBuilder, bndbox, "ymin", int((cy + 1) - half_h))
            add_text_element(xmlBuilder, bndbox, "xmax", int((cx + 1) + half_w))
            add_text_element(xmlBuilder, bndbox, "ymax", int((cy + 1) + half_h))
            obj.appendChild(bndbox)
            annotation.appendChild(obj)

        with open(xmlPath + name[0:-4] + ".xml", 'w', encoding='utf-8') as f:
            xmlBuilder.writexml(f, indent='\t', newl='\n', addindent='\t', encoding='utf-8')


if __name__ == "__main__":
    picPath = "VOCdevkit/VOC2007/JPEGImages/"  # image folder; the trailing / is required
    txtPath = "VOCdevkit/VOC2007/YOLO/"  # txt folder; the trailing / is required
    xmlPath = "VOCdevkit/VOC2007/Annotations/"  # xml output folder; the trailing / is required
    makexml(picPath, txtPath, xmlPath)
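4.2 Converting to VOC (XML) format
The Annotations directory holds the .xml files and JPEGImages holds the training images. The dataset is split into subsets with the split script given later in this section.
The VOC Annotations folder stores the label files in XML format; each XML file corresponds to one image in the JPEGImages folder. The XML is read as follows:
<annotation>
    <filename>2007_000392.jpg</filename>    // file name
    <source>    // image source (not important)
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
    </source>
    <size>    // image size (width, height, and number of channels)
        ...
    </size>
    <segmented>1</segmented>    // whether the image is used for segmentation (0 or 1 makes no difference for object detection)
    <object>    // a detected object
        <name>horse</name>    // object category
        <pose>Right</pose>    // shooting angle
        <truncated>0</truncated>    // whether the object is truncated (0 means complete)
        <difficult>0</difficult>    // whether the object is hard to recognize (0 means easy)
        <bndbox>    // bounding box (top-left corner xmin, ymin and bottom-right corner xmax, ymax)
            ...
        </bndbox>
    </object>
    <object>    // repeated when several objects are detected
        ...
    </object>
</annotation>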
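Reading such a file back is straightforward with the standard library. The following minimal sketch parses an inline sample; the sample values and the helper name `read_voc_objects` are illustrative assumptions, not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Small made-up annotation in the VOC layout described above
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_objects(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        corners = tuple(int(bndbox.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.find("name").text, corners))
    return objects


print(read_voc_objects(SAMPLE_XML))
```

For files on disk, `ET.parse(path).getroot()` yields the same root element that `ET.fromstring` returns here.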
Below is the code that converts the VisDrone2019 txt annotation files to VOC XML, visDrone2019_txt2xml_voc.py:
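import os
import datetime
from PIL import Image
from pathlib import Path

FILE = Path(__file__).resolve()
# print("FILE", FILE)
ROOT = FILE.parents[0]  # root directory


def check_dir(path):
    # Create the directory if it does not exist yet
    if os.path.isdir(path):
        print("{} already exists".format(path))
    else:
        os.makedirs(path)
        print("{} created".format(path))


root_dir = ROOT / 'VisDrone2019-DET-train'
annotations_dir = root_dir / "annotations/"
image_dir = root_dir / "images/"
xml_dir = root_dir / "Annotations_XML/"  # XML files are saved in an Annotations_XML folder under the working directory
check_dir(xml_dir)

# Replace the class names below with your own; the script then also works for other datasets
class_name = ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van',
              'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others']

for filename in os.listdir(annotations_dir):
    image_name = filename.split('.')[0]
    image_path = Path(image_dir).joinpath(image_name + ".jpg")  # change to ".png" if the images are PNG
    img = Image.open(image_path)
    xml_name = Path(xml_dir).joinpath(image_name + '.xml')
    with open(annotations_dir / filename, 'r') as fin, open(xml_name, 'w') as fout:
        # ASSUMPTION: the source listing breaks off inside this block; the XML-writing
        # statements below are a minimal reconstruction of the VOC layout.
        fout.write('<annotation>\n')
        fout.write('\t<filename>{}.jpg</filename>\n'.format(image_name))
        fout.write('\t<size>\n\t\t<width>{}</width>\n\t\t<height>{}</height>\n'
                   '\t\t<depth>{}</depth>\n\t</size>\n'.format(img.width, img.height, len(img.getbands())))
        for line in fin.readlines():
            line = line.strip().split(',')
            # pay attention to this point! (0-based)
            xmin, ymin = int(line[0]), int(line[1])
            xmax, ymax = xmin + int(line[2]), ymin + int(line[3])
            fout.write('\t<object>\n\t\t<name>{}</name>\n'.format(class_name[int(line[5])]))
            fout.write('\t\t<truncated>{}</truncated>\n'.format(line[6]))
            fout.write('\t\t<difficult>0</difficult>\n')
            fout.write('\t\t<bndbox>\n\t\t\t<xmin>{}</xmin>\n\t\t\t<ymin>{}</ymin>\n'
                       '\t\t\t<xmax>{}</xmax>\n\t\t\t<ymax>{}</ymax>\n\t\t</bndbox>\n\t</object>\n'
                       .format(xmin, ymin, xmax, ymax))
        fout.write('</annotation>\n')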
1. Visualizing the labels of the dataset converted to VOC format
import os
import numpy as np
import xml.etree.ElementTree as xmlET
from PIL import Image, ImageDraw

# '1': 'people', '2': 'people', '3': 'bicycle', '4': 'car', '5': 'car',
# '6': 'others', '7': 'others', '8': 'others', '9': 'others', '10': 'motor', '11': 'others'
classes = ('__background__',  # always index 0
           'ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle',
           'bus', 'motor', 'others')
file_path_img = r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-val\images'
file_path_xml = r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-val\Annotations_XML'
save_file_path = r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-val\Annotations_XML_show'
os.makedirs(save_file_path, exist_ok=True)  # make sure the output folder exists

pathDir = os.listdir(file_path_xml)
for idx in range(len(pathDir)):
    filename = pathDir[idx]  # xml file name
    tree = xmlET.parse(os.path.join(file_path_xml, filename))  # parse the xml
    objs = tree.findall('object')
    num_objs = len(objs)
    # Signed dtype so that coordinates slightly below zero do not wrap around
    boxes = np.zeros((num_objs, 5), dtype=np.int32)
    for ix, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        # Make pixel indexes 0-based
        x1 = float(bbox.find('xmin').text)
        y1 = float(bbox.find('ymin').text)
        x2 = float(bbox.find('xmax').text)
        y2 = float(bbox.find('ymax').text)
        cla = obj.find('name').text
        label = classes.index(cla)
        boxes[ix, 0:4] = [x1, y1, x2, y2]
        boxes[ix, 4] = label
    image_name = os.path.splitext(filename)[0]
    img = Image.open(os.path.join(file_path_img, image_name + '.jpg'))
    draw = ImageDraw.Draw(img)
    for ix in range(len(boxes)):
        xmin = int(boxes[ix, 0])
        ymin = int(boxes[ix, 1])
        xmax = int(boxes[ix, 2])
        ymax = int(boxes[ix, 3])
        draw.rectangle([xmin, ymin, xmax, ymax], outline=(255, 0, 0))
        draw.text([xmin, ymin], classes[boxes[ix, 4]], (255, 0, 0))
    img.save(os.path.join(save_file_path, image_name + '.png'))
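import os
import random

trainval_percent = 0.8  # share of all samples used for train + val
train_percent = 0.8  # share of train + val used for train
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)
# ASSUMPTION: the file names are missing in the source listing; the four names
# below follow the variable names and the usual VOC ImageSets/Main convention.
ftrainval = open(os.path.join(txtsavepath, 'trainval.txt'), 'w')
ftest = open(os.path.join(txtsavepath, 'test.txt'), 'w')
ftrain = open(os.path.join(txtsavepath, 'train.txt'), 'w')
fval = open(os.path.join(txtsavepath, 'val.txt'), 'w')
for i in indices:
    name = total_xml[i][:-4] + '\n'  # file name without the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()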
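With trainval_percent = 0.8 and train_percent = 0.8, the script above yields roughly 64% train, 16% val, and 20% test. The sketch below reproduces the same two-stage sampling in memory with a fixed seed so the split is repeatable; the function name `split_names` and the `seed` parameter are additions made for illustration and assume the sample names are unique.

```python
import random


def split_names(names, trainval_percent=0.8, train_percent=0.8, seed=0):
    """Two-stage random split: first train+val vs. test, then train vs. val."""
    rng = random.Random(seed)      # local generator, so the global random state is untouched
    names = sorted(names)          # fixed order makes the result independent of listing order
    n_trainval = int(len(names) * trainval_percent)
    n_train = int(n_trainval * train_percent)
    trainval = rng.sample(names, n_trainval)
    train = set(rng.sample(trainval, n_train))
    trainval_set = set(trainval)
    return {
        "train": [n for n in names if n in train],
        "val": [n for n in names if n in trainval_set and n not in train],
        "test": [n for n in names if n not in trainval_set],
    }


splits = split_names([f"{i:04d}" for i in range(100)])
print({k: len(v) for k, v in splits.items()})
```

Note that VisDrone already ships with separate train, val, and test-dev folders, so a random split like this is mainly useful for custom datasets placed in the VOC layout.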
4.3 Converting VOC (XML) to COCO (JSON) format
{
    "info" : info,
    "images" : [image],
    "annotations" : [annotation],
    "licenses" : [license],
}
info {
    "year" : int,               # year
    "version" : str,            # version
    "description" : str,        # dataset description
    "contributor" : str,        # contributor
    "url" : str,                # download URL
    "date_created" : datetime,
}
image {
    "id" : int,                 # image ID (unique for every image)
    "width" : int,              # width
    "height" : int,             # height
    "file_name" : str,          # image file name
    "license" : int,
    "flickr_url" : str,         # Flickr URL
    "coco_url" : str,           # URL of the image
    "date_captured" : datetime, # date the data was captured
}
license {
    "id" : int,
    "name" : str,
    "url" : str,
}
The annotations field is a list of annotation instances. Each annotation in turn contains a series of fields, such as the category id and the segmentation mask of the target. The segmentation format depends on whether the instance is a single object (iscrowd=0, in which case the polygon format is used) or a group of objects (iscrowd=1, in which case the RLE format is used):
annotation {
    "id" : int,                           # object ID; an image usually holds several objects, so each object is numbered (unique per object)
    "image_id" : int,                     # ID of the image it belongs to (matches an ID in images)
    "category_id" : int,                  # category ID (matches an ID in categories)
    "segmentation" : RLE or [polygon],    # object outline (boundary polygon when iscrowd=0)
    "area" : float,                       # area of the region
    "bbox" : [x,y,width,height],          # bounding box [x,y,w,h]
    "iscrowd" : 0 or 1,                   # 0 = single object (polygon), 1 = group of objects (RLE)
}
category {
    "id" : int,
    "name" : str,
    "supercategory" : str,
}
import sys, os, json, glob
import xml.etree.ElementTree as ET
from pathlib import Path

INITIAL_BBOXIds = 1  # id of the first annotation
# The class list does not have to be complete in advance; see the note in convert()
# PREDEF_CLASSE = {'ignored regions': 0, 'pedestrian': 1, 'people': 2, 'bicycle': 3, 'car': 4, 'van': 5, 'truck': 6, 'tricycle': 7, 'awning-tricycle': 8, 'bus': 9, 'motor': 10, 'others': 11}
# Only these ten classes are to be detected here; 0 and 11 are left out of the conversion.
PREDEF_CLASSE = {'pedestrian': 1, 'people': 2, 'bicycle': 3, 'car': 4, 'van': 5, 'truck': 6, 'tricycle': 7, 'awning-tricycle': 8, 'bus': 9, 'motor': 10}
# class_name = ['ignored regions','pedestrian','people','bicycle','car','van','truck','tricycle','awning-tricycle','bus','motor','others']


def check_dir(path):
    # Create the directory if it does not exist yet
    if os.path.isdir(path):
        print("{} already exists".format(path))
    else:
        os.makedirs(path)
        print("{} created".format(path))


def get(root, name):
    return root.findall(name)


def get_and_check(root, name, length):
    vars = root.findall(name)
    if len(vars) == 0:
        raise NotImplementedError('Can not find %s in %s.' % (name, root.tag))
    if length > 0 and len(vars) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.' % (name, length, len(vars)))
    if length == 1:
        vars = vars[0]
    return vars


def convert(xml_paths, out_json):
    json_dict = {'images': [], 'type': "instances", 'annotations': [], 'categories': []}
    categories = PREDEF_CLASSE
    bbox_id = INITIAL_BBOXIds
    for image_id, xml_f in enumerate(xml_paths):
        # progress output
        sys.stdout.write('\r>> Converting image %d/%d' % (image_id + 1, len(xml_paths)))
        sys.stdout.flush()
        tree = ET.parse(xml_f)
        root = tree.getroot()
        path = get(root, 'path')
        if len(path) == 1:
            filename = os.path.basename(path[0].text)
        elif len(path) == 0:
            filename = get_and_check(root, 'filename', 1).text
        else:
            raise NotImplementedError('%d paths found in %s' % (len(path), xml_f))
        size = get_and_check(root, 'size', 1)
        width = int(get_and_check(size, 'width', 1).text)
        height = int(get_and_check(size, 'height', 1).text)
        image = {
            'id': image_id + 1,
            'height': height,
            'width': width,
            'file_name': filename
        }
        json_dict['images'].append(image)
        for obj in get(root, 'object'):
            category = get_and_check(obj, 'name', 1).text
            if category not in categories:
                # Skip classes outside PREDEF_CLASSE ('ignored regions', 'others').
                # Assigning len(categories) as a new id would collide with 'motor': 10.
                continue
            category_id = categories[category]
            bbox = get_and_check(obj, 'bndbox', 1)
            xmin = int(get_and_check(bbox, 'xmin', 1).text) - 1
            ymin = int(get_and_check(bbox, 'ymin', 1).text) - 1
            xmax = int(get_and_check(bbox, 'xmax', 1).text)
            ymax = int(get_and_check(bbox, 'ymax', 1).text)
            if xmax <= xmin or ymax <= ymin:
                continue  # skip degenerate boxes
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {
                'id': bbox_id,
                'image_id': image_id + 1,  # must equal the 'id' of the image entry above
                'category_id': category_id,
                # 'segmentation': [xmin,ymin,xmin,ymax,xmax,ymax,xmax,ymin],
                'area': o_width * o_height,
                'bbox': [xmin, ymin, o_width, o_height],
                'iscrowd': 0
            }
            json_dict['annotations'].append(ann)
            bbox_id = bbox_id + 1
    # write the category id dictionary
    for cate, cid in categories.items():
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
    with open(out_json, 'w') as json_file:
        json.dump(json_dict, json_file, indent=4)  # indent=4 is easier to read but slower to write
    print("json file write done...")


if __name__ == '__main__':
    dir = Path(r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019')  # path of the VisDrone2019 folder inside the dataset folder
    # for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':
    for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val':
        xml_dir = dir / d / 'Annotations_XML'
        coco_dir = dir / d / 'Annotations_COCO'
        check_dir(coco_dir)
        xml_file = glob.glob(os.path.join(xml_dir, '*.xml'))
        json_file = coco_dir / 'instances_{}'.format(d.split('-')[2])
        # convert to Annotations_COCO
        convert(xml_file, json_file)  # location of the generated json; adjust as needed
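A COCO file is only usable if every annotation points at an existing image and category. The following sketch checks those references on an in-memory dictionary; `check_coco` is a hypothetical helper written for illustration, and the tiny `example` dictionary contains made-up values.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO detection dict."""
    problems = []
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    seen_ann_ids = set()
    for ann in coco["annotations"]:
        if ann["id"] in seen_ann_ids:
            problems.append(f"duplicate annotation id {ann['id']}")
        seen_ann_ids.add(ann["id"])
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        _, _, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size {w}x{h}")
    return problems


example = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "pedestrian", "supercategory": "none"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [10, 20, 30, 40], "area": 1200, "iscrowd": 0}],
}
print(check_coco(example))
```

To run it on a generated file, load the file with `json.load` and pass the resulting dictionary to `check_coco`.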
4.4 Converting directly from TXT to COCO (JSON) format
import os
from PIL import Image
from tqdm import tqdm
import json


def convert_to_cocodetection(dir, output_dir):
    train_dir = os.path.join(dir, "VisDrone2019-DET-train")
    val_dir = os.path.join(dir, "VisDrone2019-DET-val")
    train_annotations = os.path.join(train_dir, "annotations")
    val_annotations = os.path.join(val_dir, "annotations")
    train_images = os.path.join(train_dir, "images")
    val_images = os.path.join(val_dir, "images")
    id_num = 0  # running annotation id, unique across train and val
    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"}
    ]
    for mode in ["train", "val"]:
        images = []
        annotations = []
        print(f"start loading {mode} data...")
        if mode == "train":
            set = os.listdir(train_annotations)
            annotations_path = train_annotations
            images_path = train_images
        else:
            set = os.listdir(val_annotations)
            annotations_path = val_annotations
            images_path = val_images
        for i in tqdm(set):
            name = i.replace(".txt", "")
            image = {}
            image_file_path = images_path + os.sep + name + ".jpg"
            # PIL returns (width, height)
            width, height = Image.open(image_file_path).size
            # height, width = cv2.imread(image_file_path).shape[:2]  # OpenCV alternative (requires import cv2)
            file_name = name + ".jpg"
            image["id"] = name
            image["height"] = height
            image["width"] = width
            image["file_name"] = file_name
            images.append(image)
            with open(annotations_path + "/" + i, "r") as f:
                for line in f.readlines():
                    annotation = {}
                    line = line.replace("\n", "")
                    if line.endswith(","):  # filter data: drop a trailing comma
                        line = line.rstrip(",")
                    line_list = [int(i) for i in line.split(",")]
                    bbox_xywh = [line_list[0], line_list[1], line_list[2], line_list[3]]
                    annotation["id"] = id_num
                    annotation["image_id"] = name
                    annotation["category_id"] = int(line_list[5])
                    # annotation["segmentation"] = []
                    annotation["area"] = bbox_xywh[2] * bbox_xywh[3]
                    # annotation["score"] = line_list[4]
                    annotation["bbox"] = bbox_xywh
                    annotation["iscrowd"] = 0
                    id_num += 1
                    annotations.append(annotation)
        dataset_dict = {}
        dataset_dict["images"] = images
        dataset_dict["annotations"] = annotations
        dataset_dict["categories"] = categories
        json_str = json.dumps(dataset_dict)
        with open(f'{output_dir}/VisDrone2019-DET_{mode}_coco.json', 'w') as json_file:
            json_file.write(json_str)
    print("json file write done...")


def get_test_namelist(dir, out_dir):
    # ASSUMPTION: the output file name is missing in the source; "test.txt" is a placeholder
    full_path = out_dir + "/" + "test.txt"
    with open(full_path, 'w') as file:
        for name in tqdm(os.listdir(dir)):
            name = name.replace(".txt", "")
            file.write(name + "\n")
    return None


def centerxywh_to_xyxy(boxes):
    """
    args:
        boxes: list of center_x, center_y, width, height
    return:
        boxes: list of x, y, x, y, corresponding to top left and bottom right
    """
    x_top_left = boxes[0] - boxes[2] / 2
    y_top_left = boxes[1] - boxes[3] / 2
    x_bottom_right = boxes[0] + boxes[2] / 2
    y_bottom_right = boxes[1] + boxes[3] / 2
    return [x_top_left, y_top_left, x_bottom_right, y_bottom_right]


def centerxywh_to_topleftxywh(boxes):
    """
    args:
        boxes: list of center_x, center_y, width, height
    return:
        boxes: list of x, y, width, height, where (x, y) is the top left corner
    """
    x_top_left = boxes[0] - boxes[2] / 2
    y_top_left = boxes[1] - boxes[3] / 2
    width = boxes[2]
    height = boxes[3]
    return [x_top_left, y_top_left, width, height]


def clamp(coord, width, height):
    # Clip an [x1, y1, x2, y2] box to the image bounds
    if coord[0] < 0:
        coord[0] = 0
    if coord[1] < 0:
        coord[1] = 0
    if coord[2] > width:
        coord[2] = width
    if coord[3] > height:
        coord[3] = height
    return coord


if __name__ == '__main__':
    # First argument: path of the directory described above; second argument: output path
    # Only the fields needed for detection training are added; the remaining COCO fields are left empty
    # ASSUMPTION: the call is missing in the source; both paths below are placeholders
    convert_to_cocodetection(r"path/to/VisDrone2019", r"path/to/output")
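The box helpers in the listing above are defined but not called. A short sketch of how such helpers could be chained is given below: a center-format box that sticks out of the image is converted to corners, clipped, and turned into the COCO [x, y, width, height] layout. The three functions are compact stand-ins written for this example, not the listing's own functions, and the numbers are made up.

```python
def center_to_corners(box):
    """[cx, cy, w, h] -> [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def clamp_corners(corners, width, height):
    """Clip [x1, y1, x2, y2] to an image of the given width and height."""
    x1, y1, x2, y2 = corners
    return [max(0, x1), max(0, y1), min(width, x2), min(height, y2)]


def corners_to_topleft_xywh(corners):
    """[x1, y1, x2, y2] -> [x, y, w, h] with (x, y) the top-left corner."""
    x1, y1, x2, y2 = corners
    return [x1, y1, x2 - x1, y2 - y1]


# A 40x40 box centered at (10, 10) extends past the top-left image border
clipped = clamp_corners(center_to_corners([10, 10, 40, 40]), width=100, height=50)
bbox = corners_to_topleft_xywh(clipped)
print(clipped, bbox)
```

Clipping in corner format and only then converting to width and height keeps the box size consistent with the visible region.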
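6. Browsing the COCO-format VisDrone2019 object detection dataset with browse_dataset
tools/misc/browse_dataset.py helps users visually browse a detection dataset (including images and bounding-box annotations), or save the images to a specified directory.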
python tools/misc/browse_dataset.py ${CONFIG} [-h] [--skip-type ${SKIP_TYPE[SKIP_TYPE...]}] [--output-dir ${OUTPUT_DIR}] [--not-show] [--show-interval ${SHOW_INTERVAL}]
Visualizing dataset labels with browse_dataset.py
python tools/misc/browse_dataset.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py