VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

CVPR18

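Abstract

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing work has focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need for manual feature engineering on 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms the group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms state-of-the-art LiDAR-based 3D detection methods by a large margin. Furthermore, our network effectively learns representations of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists based on LiDAR alone.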

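Problem

The RPN is a highly optimized algorithm for object detection. However, it requires the input data to be dense and organized in a tensor structure (e.g., images, videos), which is not the case for LiDAR point clouds. The goal is to make the RPN applicable to point clouds.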

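Contributions

(1) Proposes VoxelNet, a generic 3D object detection framework.

(2) Designs a novel voxel feature encoding (VFE) layer. Specifically, VoxelNet partitions the point cloud into equally spaced 3D voxels, encodes each voxel through stacked VFE layers, and then applies 3D convolutions to further aggregate local voxel features, transforming the point cloud into a high-dimensional volumetric representation. Finally, an RPN generates the detection results.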
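The step from sparse voxels to a dense volumetric tensor can be illustrated with a minimal pure-Python sketch. It assumes per-voxel features (for example, the output of the stacked VFE layers) are stored sparsely in a dict keyed by integer (d, h, w) voxel indices, and that empty voxels are filled with zeros; `scatter_to_dense` is a hypothetical helper, not code from the paper. The result is a dense C × D × H × W nested list, the kind of tensor-structured input that convolutions and an RPN expect.

```python
from typing import Dict, List, Tuple

VoxelIndex = Tuple[int, int, int]  # (d, h, w) integer voxel coordinates


def scatter_to_dense(
    voxel_features: Dict[VoxelIndex, List[float]],
    channels: int,
    grid: Tuple[int, int, int],
) -> List[List[List[List[float]]]]:
    """Scatter sparse per-voxel features into a dense C x D x H x W tensor.

    Voxels that contain no points stay at zero.
    """
    depth, height, width = grid
    dense = [
        [[[0.0] * width for _ in range(height)] for _ in range(depth)]
        for _ in range(channels)
    ]
    for (d, h, w), feature in voxel_features.items():
        if len(feature) != channels:
            raise ValueError("feature length must equal `channels`")
        for c, value in enumerate(feature):
            dense[c][d][h][w] = value
    return dense


# Two non-empty voxels in a 2 x 3 x 4 grid, each with a 2-channel feature.
sparse = {(0, 1, 2): [0.5, 1.0], (1, 2, 3): [2.0, -1.0]}
tensor = scatter_to_dense(sparse, channels=2, grid=(2, 3, 4))
print(tensor[0][0][1][2], tensor[1][1][2][3])  # 0.5 -1.0
```

Only the non-empty voxels carry computed features; the scatter step is what lets the rest of the network treat the result as an ordinary dense tensor.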

Network Architecture

[Figure: VoxelNet architecture]

(1) Feature learning network: voxel partition + grouping + random sampling
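The three steps named above can be sketched in pure Python. Assumptions: points are (x, y, z, r) tuples with r the reflectance, the crop origin, voxel size, and grid size are caller-supplied placeholders rather than the paper's settings, and `max_points` plays the role of a per-voxel limit T (voxels with more than T points are randomly subsampled to T). The function name `partition_and_sample` is hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float, float]  # (x, y, z, reflectance)


def partition_and_sample(
    points: Sequence[Point],
    range_min: Tuple[float, float, float],
    voxel_size: Tuple[float, float, float],
    grid: Tuple[int, int, int],
    max_points: int,
    rng: random.Random,
) -> Dict[Tuple[int, int, int], List[Point]]:
    """Voxel partition + grouping + random sampling.

    Returns a dict from (d, h, w) voxel index to at most `max_points` points.
    Points that fall outside the grid are discarded.
    """
    depth, height, width = grid
    groups: Dict[Tuple[int, int, int], List[Point]] = defaultdict(list)
    for p in points:
        # Voxel partition: z -> d, y -> h, x -> w.
        d = math.floor((p[2] - range_min[2]) / voxel_size[2])
        h = math.floor((p[1] - range_min[1]) / voxel_size[1])
        w = math.floor((p[0] - range_min[0]) / voxel_size[0])
        if 0 <= d < depth and 0 <= h < height and 0 <= w < width:
            groups[(d, h, w)].append(p)  # Grouping by voxel index.
    # Random sampling: keep at most max_points points per voxel.
    return {
        key: (rng.sample(pts, max_points) if len(pts) > max_points else pts)
        for key, pts in groups.items()
    }


rng = random.Random(0)
cloud = [
    (0.1, 0.1, 0.1, 0.5),
    (0.2, 0.3, 0.4, 0.1),
    (0.3, 0.2, 0.1, 0.9),
    (1.5, 0.1, 0.1, 0.2),
    (5.0, 0.0, 0.0, 0.3),  # outside the 2 x 2 x 2 grid, dropped
]
voxels = partition_and_sample(
    cloud, range_min=(0.0, 0.0, 0.0), voxel_size=(1.0, 1.0, 1.0),
    grid=(2, 2, 2), max_points=2, rng=rng,
)
```

Here three points share voxel (0, 0, 0) and are subsampled to two; sampling a fixed cap per voxel keeps the per-voxel cost bounded for dense regions.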

Stacked Voxel Feature Encoding (VFE)

[Figure: Voxel feature encoding (VFE) layer]
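A simplified pure-Python sketch of stacked VFE layers follows, modeled on the VFE design described in the VoxelNet paper: each point (x, y, z, r) is augmented with its offset from the voxel centroid, a fully connected layer shared across points produces point-wise features, element-wise max pooling yields a locally aggregated feature, and that aggregate is concatenated back onto every point. Assumptions: batch normalization is omitted, weights are random, the 7 → 32 → 128 widths are illustrative, and the final voxel feature is a plain max pool (no extra FC layer). `VFELayer` and `augment_with_centroid` are hypothetical names.

```python
import random
from typing import List, Sequence

Vector = List[float]


def augment_with_centroid(points: Sequence[Sequence[float]]) -> List[Vector]:
    """(x, y, z, r) -> (x, y, z, r, x - vx, y - vy, z - vz), with (vx, vy, vz) the centroid."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return [list(p[:4]) + [p[k] - centroid[k] for k in range(3)] for p in points]


class VFELayer:
    """One VFE layer: shared FC + ReLU, element-wise max pool, concatenation.

    Maps c_in-dim point features to c_out-dim point features (c_out must be even).
    """

    def __init__(self, c_in: int, c_out: int, rng: random.Random) -> None:
        if c_out % 2:
            raise ValueError("c_out must be even")
        self.units = c_out // 2
        self.weight = [[rng.uniform(-0.5, 0.5) for _ in range(c_in)] for _ in range(self.units)]
        self.bias = [0.0] * self.units

    def __call__(self, feats: List[Vector]) -> List[Vector]:
        # Point-wise FC + ReLU; the same weights are applied to every point.
        pointwise = [
            [max(0.0, sum(w * x for w, x in zip(row, f)) + b) for row, b in zip(self.weight, self.bias)]
            for f in feats
        ]
        # Element-wise max pooling across points gives the locally aggregated feature.
        aggregated = [max(col) for col in zip(*pointwise)]
        # Concatenate the aggregated feature onto each point-wise feature.
        return [f + aggregated for f in pointwise]


rng = random.Random(42)
voxel_points = [(0.1, 0.2, 0.3, 0.5), (0.4, 0.1, 0.2, 0.7), (0.3, 0.3, 0.1, 0.2)]
x = augment_with_centroid(voxel_points)        # 3 points x 7 features
x = VFELayer(7, 32, rng)(x)                     # 3 x 32
x = VFELayer(32, 128, rng)(x)                   # 3 x 128
voxel_feature = [max(col) for col in zip(*x)]   # max pool over points -> one 128-dim vector
print(len(voxel_feature))  # 128
```

Because the aggregated half is shared by all points, each point's output encodes both its own features and the voxel's local context, which is what lets stacked layers model interactions between points.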

(2) Convolutional middle layers

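ConvMD(c_in, c_out, k, s, p) denotes an M-dimensional convolution operator, where c_in and c_out are the numbers of input and output channels, and k, s, and p are the M-dimensional vectors corresponding to the kernel size, stride size, and padding size, respectively. The convolutional middle layers progressively enlarge the receptive field.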
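How k, s, and p shape the output can be checked with the standard convolution size formula, floor((n + 2p − k) / s) + 1 per dimension. The three Conv3D settings and the 128 × 10 × 400 × 352 input below are an illustrative, assumed configuration (modeled on the paper's car setup; verify against the paper before relying on them), and `conv_output_shape` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def conv_output_shape(
    in_shape: Sequence[int], k: Sequence[int], s: Sequence[int], p: Sequence[int]
) -> Tuple[int, ...]:
    """Spatial output size of ConvMD per dimension: floor((n + 2p - k) / s) + 1."""
    return tuple((n + 2 * pi - ki) // si + 1 for n, ki, si, pi in zip(in_shape, k, s, p))


# (c_in, c_out, k, s, p): illustrative Conv3D settings (assumed)
layers = [
    (128, 64, (3, 3, 3), (2, 1, 1), (1, 1, 1)),
    (64, 64, (3, 3, 3), (1, 1, 1), (0, 1, 1)),
    (64, 64, (3, 3, 3), (2, 1, 1), (1, 1, 1)),
]
channels, spatial = 128, (10, 400, 352)  # C x (D, H, W)
for c_in, c_out, k, s, p in layers:
    assert c_in == channels, "channel mismatch between consecutive layers"
    spatial = conv_output_shape(spatial, k, s, p)
    channels = c_out
print(channels, spatial)  # 64 (2, 400, 352)

# Folding the remaining depth axis into channels yields a 2D feature map.
bev = (channels * spatial[0], *spatial[1:])
print(bev)  # (128, 400, 352)
```

With stride 2 only along depth, these layers shrink D (10 → 5 → 3 → 2) while keeping the H × W resolution, so vertical context is aggregated without losing horizontal detail.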

(3) Region proposal network [1]

[Figure: Region proposal network architecture]

Loss Function


A 3D ground-truth box is parameterized as (x^g_c, y^g_c, z^g_c, l^g, w^g, h^g, θ^g), where x^g_c, y^g_c, z^g_c represent the center location, l^g, w^g, h^g are the length, width, and height of the box, and θ^g is the yaw rotation around the Z-axis.

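A matching positive anchor is parameterized as (x^a_c, y^a_c, z^a_c, l^a, w^a, h^a, θ^a). The regression target is the residual vector u^* = (Δx, Δy, Δz, Δl, Δw, Δh, Δθ), defined as

$$\Delta x = \frac{x^g_c - x^a_c}{d^a},\quad \Delta y = \frac{y^g_c - y^a_c}{d^a},\quad \Delta z = \frac{z^g_c - z^a_c}{h^a},$$

$$\Delta l = \log\frac{l^g}{l^a},\quad \Delta w = \log\frac{w^g}{w^a},\quad \Delta h = \log\frac{h^g}{h^a},\quad \Delta\theta = \theta^g - \theta^a,$$

where $d^{a}=\sqrt{(l^{a})^2+(w^{a})^2}$ is the diagonal of the base of the anchor box.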
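In VoxelNet the regression target is a 7-component residual: the x and y center offsets divided by the anchor base diagonal d^a, the z offset divided by the anchor height h^a, log ratios of length, width, and height, and the yaw difference. A minimal sketch of that encoding follows; `encode_residual` and `decode_residual` are hypothetical names, and the decoder (the algebraic inverse, as one would use at inference) is added here for illustration. The anchor and box values are made up so that d^a = 5.

```python
import math
from typing import Sequence, Tuple

Box = Sequence[float]  # (x_c, y_c, z_c, l, w, h, theta)


def encode_residual(gt: Box, anchor: Box) -> Tuple[float, ...]:
    """Regression target u* for a ground-truth box and its matching positive anchor."""
    xg, yg, zg, lg, wg, hg, tg = gt
    xa, ya, za, la, wa, ha, ta = anchor
    da = math.sqrt(la ** 2 + wa ** 2)  # diagonal of the anchor's base
    return (
        (xg - xa) / da,
        (yg - ya) / da,
        (zg - za) / ha,
        math.log(lg / la),
        math.log(wg / wa),
        math.log(hg / ha),
        tg - ta,
    )


def decode_residual(u: Sequence[float], anchor: Box) -> Tuple[float, ...]:
    """Inverse of encode_residual: recover a box from a (predicted) residual."""
    xa, ya, za, la, wa, ha, ta = anchor
    da = math.sqrt(la ** 2 + wa ** 2)
    return (
        u[0] * da + xa,
        u[1] * da + ya,
        u[2] * ha + za,
        la * math.exp(u[3]),
        wa * math.exp(u[4]),
        ha * math.exp(u[5]),
        u[6] + ta,
    )


anchor = (0.0, 0.0, -1.0, 3.0, 4.0, 1.5, 0.0)  # made-up anchor, d^a = 5
gt = (5.0, -10.0, 0.5, 6.0, 4.0, 1.5, 0.5)
u = encode_residual(gt, anchor)  # approx (1.0, -2.0, 1.0, 0.693, 0.0, 0.0, 0.5)
```

Normalizing the center offsets by d^a and using log size ratios keeps the targets on a similar scale regardless of object size.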

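The loss function is then defined as

$$L = \alpha \frac{1}{N_{pos}} \sum_i L_{cls}(p^{pos}_i, 1) + \beta \frac{1}{N_{neg}} \sum_j L_{cls}(p^{neg}_j, 0) + \frac{1}{N_{pos}} \sum_i L_{reg}(u_i, u^*_i)$$

where N_pos and N_neg are the numbers of positive and negative anchors, α and β are positive constants balancing the relative importance of the two classification terms, p^pos_i and p^neg_j represent the softmax output for positive anchor a^pos_i and negative anchor a^neg_j, respectively, and u_i and u^*_i are the regression output and ground truth for positive anchor a^pos_i.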

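The first two terms are the normalized classification losses for positive and negative anchors, where L_cls is the cross-entropy loss; the last term is the regression loss, where L_reg is the SmoothL1 function [1].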
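The normalized loss described above (two classification terms plus a SmoothL1 regression term averaged over positive anchors) can be sketched in pure Python. Assumptions: each anchor's classifier output is given as the probability of the positive class, so two-class softmax cross-entropy reduces to the binary form below; SmoothL1 uses the common transition point |x| = 1; α and β default to 1.0 as placeholders, not the paper's values; the `max(..., 1)` guards against empty anchor sets are an added safety choice. All function names are hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(p: float, target: int, eps: float = 1e-7) -> float:
    """Cross-entropy of a positive-class probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if target == 1 else -math.log(1.0 - p)


def smooth_l1(x: float) -> float:
    """0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5


def voxelnet_loss(
    p_pos: Sequence[float],
    p_neg: Sequence[float],
    u_pred: Sequence[Sequence[float]],
    u_target: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> float:
    n_pos, n_neg = max(len(p_pos), 1), max(len(p_neg), 1)
    cls_pos = sum(binary_cross_entropy(p, 1) for p in p_pos) / n_pos
    cls_neg = sum(binary_cross_entropy(p, 0) for p in p_neg) / n_neg
    # L_reg sums SmoothL1 over the 7 residual components of each positive anchor.
    reg = sum(
        sum(smooth_l1(a - b) for a, b in zip(u, t)) for u, t in zip(u_pred, u_target)
    ) / n_pos
    return alpha * cls_pos + beta * cls_neg + reg


# One positive and one negative anchor, both predicted at 0.5, perfect regression:
loss = voxelnet_loss([0.5], [0.5], [[0.0] * 7], [[0.0] * 7])  # 2 * ln 2
```

Normalizing the positive and negative terms separately keeps the many easy negatives from swamping the few positives; α and β then tune the remaining balance.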

Dataset

KITTI

Experimental Results

[Figure: experimental results on KITTI]

Notes

Note that, after concatenation operations in VFE, we reset the features corresponding to empty points to zero such that they do not affect the computed voxel features.
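One way to honor this note, assuming each voxel is padded to a fixed number of T point slots and a boolean mask marks the real points, is sketched below. After concatenation, padded slots are reset to zero; since ReLU features are non-negative, an all-zero row can never raise a later element-wise maximum. This sketch also excludes padded slots from the max pooling itself, an extra assumption that keeps non-zero padded rows (e.g. from an FC bias) out of the aggregate. `vfe_concat_with_mask` is a hypothetical name.

```python
from typing import List, Sequence

Vector = List[float]


def vfe_concat_with_mask(pointwise: Sequence[Vector], mask: Sequence[bool]) -> List[Vector]:
    """Concatenate the max-pooled feature onto each point, then zero padded slots.

    `pointwise` holds T feature vectors (real points plus padding);
    mask[i] is True for real points.
    """
    real = [f for f, m in zip(pointwise, mask) if m]
    # Pool only over real points.
    aggregated = [max(col) for col in zip(*real)] if real else [0.0] * len(pointwise[0])
    out = []
    for f, m in zip(pointwise, mask):
        concat = list(f) + aggregated
        # Reset features of empty (padded) points to zero after concatenation.
        out.append(concat if m else [0.0] * len(concat))
    return out


features = [[1.0, 0.0], [0.5, 2.0], [0.3, 0.3]]  # third slot is padding
result = vfe_concat_with_mask(features, [True, True, False])
# result == [[1.0, 0.0, 1.0, 2.0], [0.5, 2.0, 1.0, 2.0], [0.0, 0.0, 0.0, 0.0]]
```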

References

[1] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28, pages 91–99, 2015.

Links

Project page

秒客网
