The YOLOv4 network model is made up of three main parts: the backbone, the feature pyramid (neck), and the yolo_head.
1. Backbone feature extraction network: CSPDarknet53
Compared with the Darknet53 used in YOLOv3, the CSPDarknet53 network in YOLOv4 has the following characteristics.
1.1 Mish activation function
Mish = x * K.tanh(K.softplus(x)), where softplus(x) = ln(1 + e^x)
Unlike ReLU, Mish is not completely cut off for negative inputs: it lets a small negative gradient flow through, which preserves information flow. Mish is also smooth everywhere, so gradient descent tends to behave better than with ReLU.
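As a minimal sketch (assuming tensorflow.keras), Mish can be written as a small custom layer; the DarknetConv2D_BN_Mish helper used in the backbone code below is assumed to be a Conv2D + BatchNormalization + Mish composition.

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class Mish(Layer):
    # Mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)
    def call(self, inputs):
        return inputs * K.tanh(K.softplus(inputs))

# Typical use: x = Mish()(BatchNormalization()(Conv2D(...)(x)))

A Lambda layer, Lambda(lambda x: x * K.tanh(K.softplus(x))), works just as well if a named layer class is not needed.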
1.2 CSPNet structure
The resblock_body structure is modified to adopt the CSPNet design: a large residual branch is added so that the input of the block is stacked (concatenated) with the block's final output.
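The post does not reproduce resblock_body itself; the following is only a simplified sketch of the CSP idea (DarknetConv2D_BN_Mish is the project's Conv2D + BN + Mish helper, and the exact channel splitting may differ from the referenced implementation).

from tensorflow.keras.layers import ZeroPadding2D, Add, Concatenate

def resblock_body(x, num_filters, num_blocks, all_narrow=True):
    # Downsample first with a stride-2 convolution
    x = ZeroPadding2D(((1, 0), (1, 0)))(x)
    x = DarknetConv2D_BN_Mish(num_filters, (3, 3), strides=(2, 2))(x)
    branch_filters = num_filters // 2 if all_narrow else num_filters
    # CSP shortcut branch: taken directly from the downsampled block input
    shortconv = DarknetConv2D_BN_Mish(branch_filters, (1, 1))(x)
    # Main branch: 1x1 conv followed by num_blocks small residual blocks
    mainconv = DarknetConv2D_BN_Mish(branch_filters, (1, 1))(x)
    for _ in range(num_blocks):
        y = DarknetConv2D_BN_Mish(num_filters // 2, (1, 1))(mainconv)
        y = DarknetConv2D_BN_Mish(branch_filters, (3, 3))(y)
        mainconv = Add()([mainconv, y])
    mainconv = DarknetConv2D_BN_Mish(branch_filters, (1, 1))(mainconv)
    # CSP fusion: concatenate the two branches, then a 1x1 transition conv
    x = Concatenate()([mainconv, shortconv])
    return DarknetConv2D_BN_Mish(num_filters, (1, 1))(x)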
The last three feature layers of CSPDarknet53 (at 8x, 16x, and 32x downsampling) are taken as the extracted feature outputs.
def darknet_body(x):
    # Stem: 3x3 conv with BN + Mish (spatial sizes below assume a 416x416 input)
    x = DarknetConv2D_BN_Mish(32, (3, 3))(x)
    x = resblock_body(x, 64, 1, False)   # 416 -> 208
    x = resblock_body(x, 128, 2)         # 208 -> 104
    x = resblock_body(x, 256, 8)         # 104 -> 52
    feat1 = x                            # 52x52, 8x downsampling
    x = resblock_body(x, 512, 8)         # 52 -> 26
    feat2 = x                            # 26x26, 16x downsampling
    x = resblock_body(x, 1024, 4)        # 26 -> 13
    feat3 = x                            # 13x13, 32x downsampling
    return feat1, feat2, feat3
2. Feature pyramid
2.1 The SPP structure
The larger the max-pooling kernel, the more the pooled result reflects global information. In SPP the input is max-pooled with several different kernel sizes (all with stride 1 and 'same' padding, so the spatial size is unchanged), and the results are concatenated, which fuses global and local information well.
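The full yolo_body below inlines exactly these three poolings; as an isolated illustration, the operation could be wrapped in a small helper (spp_block is a hypothetical name).

from tensorflow.keras.layers import MaxPooling2D, Concatenate

def spp_block(x):
    # Stride-1 max pooling with three kernel sizes keeps the spatial size
    # while enlarging the receptive field; concatenating them with the
    # input mixes global and local context.
    p1 = MaxPooling2D(pool_size=(13, 13), strides=(1, 1), padding='same')(x)
    p2 = MaxPooling2D(pool_size=(9, 9), strides=(1, 1), padding='same')(x)
    p3 = MaxPooling2D(pool_size=(5, 5), strides=(1, 1), padding='same')(x)
    return Concatenate()([p1, p2, p3, x])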
2.2 The PANet structure
The deep (low-resolution) features are upsampled and fused with the shallower features; the upsampling uses UpSampling2D, i.e., the feature map is resized by interpolation to the target size, and the fusion is a concatenate.
The shallow features are in turn downsampled and fused with the deeper features; the downsampling uses an ordinary convolution with strides=2, and the fusion is again a concatenate.
Three groups of features are finally output, used to detect large, medium, and small objects respectively; their shapes are (13,13,3*(5+num_classes)), (26,26,3*(5+num_classes)), and (52,52,3*(5+num_classes)).
def yolo_body(inputs, num_anchors, num_classes):
    # Requires MaxPooling2D, Concatenate, UpSampling2D, ZeroPadding2D from
    # tensorflow.keras.layers, Model from tensorflow.keras.models, plus the
    # project's DarknetConv2D helpers, compose and make_five_convs.
    # Build the CSPDarknet53 backbone and take its last three feature layers
    feat1, feat2, feat3 = darknet_body(inputs)

    # P5 branch: 13x13 for a 416x416 input
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(feat3)
    P5 = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)
    # SPP: max pooling at different scales, then concatenation
    maxpool1 = MaxPooling2D(pool_size=(13, 13), strides=(1, 1), padding='same')(P5)
    maxpool2 = MaxPooling2D(pool_size=(9, 9), strides=(1, 1), padding='same')(P5)
    maxpool3 = MaxPooling2D(pool_size=(5, 5), strides=(1, 1), padding='same')(P5)
    P5 = Concatenate()([maxpool1, maxpool2, maxpool3, P5])
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)
    P5 = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)

    # PANet top-down path: upsample P5 and fuse with feat2
    P5_upsample = compose(DarknetConv2D_BN_Leaky(256, (1, 1)), UpSampling2D(2))(P5)
    P4 = DarknetConv2D_BN_Leaky(256, (1, 1))(feat2)
    P4 = Concatenate()([P4, P5_upsample])
    P4 = make_five_convs(P4, 256)

    # Upsample P4 and fuse with feat1
    P4_upsample = compose(DarknetConv2D_BN_Leaky(128, (1, 1)), UpSampling2D(2))(P4)
    P3 = DarknetConv2D_BN_Leaky(128, (1, 1))(feat1)
    P3 = Concatenate()([P3, P4_upsample])
    P3 = make_five_convs(P3, 128)

    # 52x52 output head (small objects): (52, 52, 3*(5+num_classes))
    P3_output = DarknetConv2D_BN_Leaky(256, (3, 3))(P3)
    P3_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P3_output)

    # PANet bottom-up path: downsample P3 with a stride-2 conv and fuse with P4
    P3_downsample = ZeroPadding2D(((1, 0), (1, 0)))(P3)
    P3_downsample = DarknetConv2D_BN_Leaky(256, (3, 3), strides=(2, 2))(P3_downsample)
    P4 = Concatenate()([P3_downsample, P4])
    P4 = make_five_convs(P4, 256)

    # 26x26 output head (medium objects)
    P4_output = DarknetConv2D_BN_Leaky(512, (3, 3))(P4)
    P4_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P4_output)

    # Downsample P4 and fuse with P5
    P4_downsample = ZeroPadding2D(((1, 0), (1, 0)))(P4)
    P4_downsample = DarknetConv2D_BN_Leaky(512, (3, 3), strides=(2, 2))(P4_downsample)
    P5 = Concatenate()([P4_downsample, P5])
    P5 = make_five_convs(P5, 512)

    # 13x13 output head (large objects)
    P5_output = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P5_output)

    return Model(inputs, [P5_output, P4_output, P3_output])
3. yolo_head
yolo_head is used to turn the extracted features into predictions.
The prediction of each feature layer encodes three boxes per grid point, so we first reshape it. Taking the VOC dataset as an example, the results are (N,13,13,3,25), (N,26,26,3,25), and (N,52,52,3,25).
feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes+5])
The 25 in the last dimension is 4 + 1 + 20, i.e. x_offset and y_offset, h and w, the objectness confidence, and the 20 class scores.
YOLOv4's decoding adds each grid point's coordinates to its predicted x_offset and y_offset; the sum is the center of the predicted box. The width and height of the box are then obtained by combining the predicted h and w with the prior (anchor) box.
Grid points:
grid_shape = K.shape(feats)[1:3] #(height, width)
grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]), [1, grid_shape[1], 1, 1])
grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]), [grid_shape[0], 1, 1, 1])
grid = K.concatenate([grid_x, grid_y])
grid = K.cast(grid, K.dtype(feats))
x_offset and y_offset:
K.sigmoid(feats[..., :2])
Center of the predicted box:
(K.sigmoid(feats[..., :2]) + grid)
h and w:
K.exp(feats[..., 2:4]) * anchors_tensor
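To make the decoding concrete, here is a small hypothetical numeric check in plain NumPy (the grid cell, raw predictions, and anchor are made-up values; the 13x13 layer and 416x416 input follow the example above):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up raw outputs for one anchor in grid cell (x=4, y=6)
tx, ty, tw, th = 0.2, -0.3, 0.1, 0.4
grid_x, grid_y = 4, 6
anchor_w, anchor_h = 142, 110          # anchor size in input-image pixels
grid_w = grid_h = 13                   # 13x13 feature map
input_w = input_h = 416

# Center: sigmoid offset added to the grid coordinates, normalized by grid size
cx = (sigmoid(tx) + grid_x) / grid_w   # (0.55 + 4) / 13 ~= 0.350
cy = (sigmoid(ty) + grid_y) / grid_h   # (0.43 + 6) / 13 ~= 0.494

# Size: exp of the prediction scales the anchor, normalized by the input size
bw = np.exp(tw) * anchor_w / input_w   # 1.105 * 142 / 416 ~= 0.377
bh = np.exp(th) * anchor_h / input_h   # 1.492 * 110 / 416 ~= 0.394

print(cx, cy, bw, bh)                  # all values are fractions of the input size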
def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    '''
    K is tensorflow.keras.backend, tf is tensorflow.
    :param feats: one raw output of yolo_body, e.g. (b, 13, 13, 3*(5+num_classes))
    :param anchors: the 3 anchors for this layer, e.g. [[142, 110], [192, 243], [459, 401]]
    :param num_classes: 20 for VOC
    :param input_shape: network input size (height, width) as a tensor, e.g. (416, 416)
    :param calc_loss: if True, also return the tensors needed for the loss
    :return: box_xy, box_wh, box_confidence, box_class_probs
    '''
    num_anchors = len(anchors)
    feats = tf.convert_to_tensor(feats)
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])

    # Build the grid of cell coordinates, shape (height, width, 1, 2)
    grid_shape = K.shape(feats)[1:3]  # (height, width)
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
                    [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
                    [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))

    # (b, h, w, 3*(5+num_classes)) -> (b, h, w, 3, 5+num_classes)
    feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

    # Decode: centers are sigmoid offsets added to the grid coordinates,
    # sizes scale the anchors; both are normalized to 0~1
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[..., ::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[..., ::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])

    if calc_loss:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs
The complete flow of the network model:
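As a usage sketch of how the pieces fit together (assuming the functions above and their helpers are defined; the anchor values follow the yolo_head docstring, and the random tensor stands in for one real network output):

import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input

num_classes = 20   # VOC
num_anchors = 3    # anchors per feature layer

# Backbone + SPP + PANet: three raw heads of shape 13x13 / 26x26 / 52x52
inputs = Input(shape=(416, 416, 3))
model = yolo_body(inputs, num_anchors, num_classes)
model.summary()

# Decode a (dummy) 13x13 head with its three largest anchors
anchors_13 = [[142, 110], [192, 243], [459, 401]]
dummy_feats = np.random.rand(1, 13, 13, num_anchors * (num_classes + 5)).astype('float32')
box_xy, box_wh, box_conf, box_cls = yolo_head(
    dummy_feats, anchors_13, num_classes, K.constant([416, 416]))
print(box_xy.shape, box_wh.shape)   # (1, 13, 13, 3, 2) each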
Reference blog: 睿智的目标检测32——TF2搭建YoloV4目标检测平台(tensorflow2), Bubbliiiing's CSDN blog.