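While reading papers recently, I came across one on surface normal prediction that used the NYU Depth dataset for training. There is very little material online about NYU Depth, and almost nothing on how to turn the dataset into model input. After a lot of searching I pieced together some scattered steps, and I am summarizing them here for anyone who needs them later.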

I. What is the NYU Depth dataset

URL: http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
This is the page for the v2 dataset; the v1 dataset can also be reached from the home page.

1. Notice

If you use this dataset in a paper, be sure to cite "Indoor Segmentation and Support Inference from RGBD Images".

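2. Overview

The NYU-Depth V2 dataset consists of frames from video sequences of indoor scenes, recorded simultaneously by the RGB camera and the depth camera of a Kinect. For example, a room is filmed with the Kinect (both cameras recording in sync), and every frame of the resulting video is extracted to form the dataset.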

3. Contents

(1) Features

1449 densely labeled pairs of RGB and depth images (captured in sync by the RGB and depth cameras)
464 scenes taken from 3 cities
407,024 unlabeled frames

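(2) Components

Labeled data (Labeled): a subset of the video data with dense multi-class labels. This data has also been preprocessed to fill in missing depth values.
Raw data (Raw): the raw RGB, depth and accelerometer data as provided by the Kinect. (An accelerometer, like a gyroscope, is a sensor used to determine orientation. It measures acceleration along the three axes of the sensor's coordinate frame; when the device is at rest this is the direction of gravity, from which the tilt of the device can be derived. The Kinect uses it to detect its orientation and to make accurate 3-D projections possible (https://msdn.microsoft.com/en-us/library/jj663790.aspx). For the underlying principles, see these two blog posts: http://blog.csdn.net/lovewubo/article/details/9084291 and http://blog.csdn.net/lovewubo/article/details/37937417)
Toolbox (Toolbox): a packaged API for processing the data.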

(3) Detailed description

a. Labeled dataset


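A subset of the raw dataset. Each sample is an RGB image paired with its depth image, and every image is densely labeled. The depth maps have been preprocessed with the colorization scheme of Levin et al., which is used here to fill in missing depth values. The data is stored as a .mat file; opened in MATLAB, it contains the file names, label class names, depth data, scene information and so on, in the following variables: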


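accelData - Nx4 matrix of accelerometer values recorded when each frame was taken. The columns hold the roll, yaw, pitch and tilt angle of the device (for an explanation of these angles, see http://blog.csdn.net/yuzhongchun/article/details/22749521).
depths - HxWxN matrix of depth maps with missing values filled in. H is the height, W the width and N the number of images. Values are in meters.
images - HxWx3xN matrix of RGB images. H is the height, W the width, 3 the number of color channels and N the number of images.
instances - HxWxN matrix of instance maps. Run get_instance_masks.m from the toolbox to obtain a mask for each object instance in an image.
labels - HxWxN matrix of per-pixel object label masks. H is the height, W the width and N the number of images. Labels range from 1 to C, where C is the total number of classes. A pixel labeled 0 is unlabeled.
names - Cx1 cell array of the English names of each class.
namesToIds - map of C key-value pairs from English class names to class IDs.
rawDepths - HxWxN matrix of raw depth maps, before missing values are filled in. H is the height, W the width and N the number of images. Values are in meters.
rawDepthFilenames - Nx1 cell array of the file names of the depth images.
rawRgbFilenames - Nx1 cell array of the file names of the RGB images.
scenes - Nx1 cell array of the name of the scene each image was taken from.
sceneTypes - Nx1 cell array of the scene type each image was taken from.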
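
To make the relationship between labels and names concrete, here is a minimal Python sketch. It assumes the label map of one image has already been loaded into a nested list; the function name class_pixel_counts and the toy class names are my own placeholders, not part of the dataset or the toolbox. The only facts taken from the description above are that class IDs run from 1 to C and that 0 means unlabeled.

```python
from collections import Counter


def class_pixel_counts(label_map, names):
    """Count pixels per class name in one H x W label map.

    label_map: nested list of ints, 0 = unlabeled, 1..C = class IDs.
    names: list of C class names, so class ID k maps to names[k - 1]
           (the IDs are 1-based, Python lists are 0-based).
    """
    counts = Counter()
    for row in label_map:
        for class_id in row:
            if class_id == 0:
                counts["unlabeled"] += 1
            else:
                counts[names[class_id - 1]] += 1
    return dict(counts)


# Toy 2 x 3 label map with hypothetical class names (not the real NYU list).
toy_names = ["wall", "floor", "bed"]
toy_labels = [
    [1, 1, 0],
    [2, 3, 3],
]
print(class_pixel_counts(toy_labels, toy_names))
```

The off-by-one between the 1-based class IDs and 0-based Python indexing is the easiest thing to get wrong when moving this data out of MATLAB.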

b. Raw dataset


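The raw dataset contains the raw image files and the Kinect accelerometer dump files. Every file name includes the timestamp of its frame, and get_synched_frames.m from the toolbox uses these timestamps to match RGB and depth frames into a synchronized sequence that can be assembled into a video.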
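
The idea of timestamp-based synchronization can be sketched in Python as nearest-timestamp matching. This is only an illustration of the principle, not the logic of get_synched_frames.m itself; the helper names timestamp_of and match_nearest are my own, and I am assuming the timestamp is the second "-"-separated field of the file name.

```python
import os
from bisect import bisect_left


def timestamp_of(filename):
    """Return the timestamp embedded in a raw file name such as
    'd-1294886362.665769-3143255701.pgm' (assumed: second '-' field)."""
    return float(os.path.basename(filename).split("-")[1])


def match_nearest(depth_files, rgb_files):
    """Pair every depth file with the RGB file closest to it in time."""
    if not rgb_files:
        return []
    rgb_sorted = sorted(rgb_files, key=timestamp_of)
    rgb_times = [timestamp_of(f) for f in rgb_sorted]
    pairs = []
    for depth in sorted(depth_files, key=timestamp_of):
        t = timestamp_of(depth)
        i = bisect_left(rgb_times, t)
        # The closest RGB frame is one of the two neighbours of position i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(rgb_times)]
        best = min(candidates, key=lambda j: abs(rgb_times[j] - t))
        pairs.append((depth, rgb_sorted[best]))
    return pairs


depth_files = [
    "d-1294886362.665769-3143255701.pgm",
    "d-1294886362.793814-3151264321.pgm",
]
rgb_files = [
    "r-1294886362.238178-3118787619.ppm",
    "r-1294886362.814111-3152792506.ppm",
]
for depth, rgb in match_nearest(depth_files, rgb_files):
    print(depth, "<->", rgb)
```

In this sketch the same RGB frame may be matched to more than one depth frame; whether such duplicates should be dropped depends on what the training data needs.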

The dataset is split into folders, each holding one video sequence from a single scene, for example 'living_room_0012' or 'office_0014', laid out as follows:

/
../bedroom_0001/
../bedroom_0001/a-1294886363.011060-3164794231.dump
../bedroom_0001/a-1294886363.016801-3164794231.dump
…
../bedroom_0001/d-1294886362.665769-3143255701.pgm
../bedroom_0001/d-1294886362.793814-3151264321.pgm
…
../bedroom_0001/r-1294886362.238178-3118787619.ppm
../bedroom_0001/r-1294886362.814111-3152792506.ppm

In the file names, the prefix a- denotes accelerometer dumps, r- the RGB camera and d- the depth camera.
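
As a small illustration of this naming convention, the following Python sketch sorts the files of one scene folder into groups by prefix. The function name group_by_sensor and the group names are my own choices; only the meaning of the a-, r- and d- prefixes comes from the dataset description.

```python
import os
from collections import defaultdict

# Prefix -> sensor, as described above.
KINDS = {"a": "accelerometer", "r": "rgb", "d": "depth"}


def group_by_sensor(filenames):
    """Group raw file names by sensor, based on the prefix before the first '-'.

    Files with an unknown prefix are ignored.
    """
    groups = defaultdict(list)
    for name in filenames:
        prefix = os.path.basename(name).split("-", 1)[0]
        if prefix in KINDS:
            groups[KINDS[prefix]].append(name)
    return {kind: sorted(files) for kind, files in groups.items()}


scene_files = [
    "../bedroom_0001/a-1294886363.011060-3164794231.dump",
    "../bedroom_0001/a-1294886363.016801-3164794231.dump",
    "../bedroom_0001/d-1294886362.665769-3143255701.pgm",
    "../bedroom_0001/d-1294886362.793814-3151264321.pgm",
    "../bedroom_0001/r-1294886362.238178-3118787619.ppm",
    "../bedroom_0001/r-1294886362.814111-3152792506.ppm",
]
for kind, files in group_by_sensor(scene_files).items():
    print(kind, len(files))
```

For a real folder, the list could come from os.listdir on the scene directory.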

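c. Toolbox

A MATLAB API for processing the data. The description of each function makes its purpose clear enough; what matters is combining the functions to convert the data into the form we need.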

camera_params.m - Contains the camera parameters for the Kinect used to capture the data.
crop_image.m – Crops an image to use only the area where the depth signal is projected.
fill_depth_colorization.m – Fills in the depth using Levin et al’s Colorization method.
fill_depth_cross_bf.m - Fills in the depth using a cross-bilateral filter at multiple scales.
get_accel_data.m - Returns the accelerometer parameters at a specific moment in time.
get_instance_masks.m – Returns a set of binary masks, one for each object instance in an image.
get_rgb_depth_overlay.m – Returns a visualization of the RGB and Depth alignment.
get_synched_frames.m - Returns a set of synchronized RGB and Depth frames that can be used to produce RGBD videos of each scene.
get_timestamp_from_filename.m – Returns the timestamp from the raw dataset filenames. This is useful for sampling the RAW video dumps at even intervals in time.
project_depth_map.m – Projects the Depth map from the Kinect on the RGB image plane.
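
The description of get_timestamp_from_filename.m mentions sampling the raw dumps at even intervals in time. Here is a minimal Python sketch of that use, under my own assumptions: the timestamp is the second "-"-separated field of the file name, and "even" is approximated greedily by keeping a frame once at least interval_s seconds have passed since the last kept one. The names timestamp_of and sample_every and the toy file names are hypothetical, and this is not the toolbox's implementation.

```python
import os


def timestamp_of(filename):
    """Return the timestamp embedded in a raw file name (assumed: second '-' field)."""
    return float(os.path.basename(filename).split("-")[1])


def sample_every(filenames, interval_s):
    """Keep the first frame, then every frame that is at least
    interval_s seconds later than the previously kept frame."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    kept = []
    last_time = None
    for name in sorted(filenames, key=timestamp_of):
        t = timestamp_of(name)
        if last_time is None or t - last_time >= interval_s:
            kept.append(name)
            last_time = t
    return kept


# Hypothetical file names that follow the same pattern as the raw dataset.
toy_frames = [
    "r-100.0-1.ppm",
    "r-100.2-2.ppm",
    "r-100.5-3.ppm",
    "r-101.1-4.ppm",
    "r-101.4-5.ppm",
]
print(sample_every(toy_frames, 0.5))
```

Because the Kinect frame rate varies, the kept frames are only approximately evenly spaced; a stricter scheme would pick the frame nearest to each target time instead.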