Video Tracking: The TLD Algorithm

Date: 2022-09-27 10:13:48

Parts of this article are adapted from:

http://blog.sina.com.cn/s/blog_4a03c0100101dbcr.html

http://blog.csdn.net/carson2005/article/details/7647500



TLD (Tracking-Learning-Detection) consists of three components: tracking, learning (modeling), and detection. The tracker is based on the Lucas-Kanade optical flow method. The learning component refines its model through positive and negative feedback, yielding good learning results; the author details this learning procedure, called P-N learning, in a separate paper. The detection component is a cascade of several classifiers. For features, the author proposes a 2bitBP feature based on LBP features.

TLD is a novel algorithm for long-term tracking of a single target. What distinguishes it from traditional trackers is that it combines a conventional tracking algorithm with a conventional detection algorithm to handle the deformation, partial occlusion, and similar problems a target undergoes while being tracked. At the same time, an improved online learning mechanism continually updates the tracker's "salient feature points" and the detector's target model and related parameters, making the tracking more stable, robust, and reliable.

A key requirement of long-term tracking is that when the target reappears in the camera's field of view, the system should re-detect it and resume tracking. During long-term tracking, however, the target will inevitably undergo shape changes, illumination changes, scale changes, occlusion, and so on. In a traditional pipeline, a detection module runs up front in cooperation with the tracker: once the target is detected, the tracker takes over, and from then on the detector no longer takes part in tracking. This approach has a fatal flaw: when the target deforms or is occluded, tracking easily fails. For this reason, many long-term or deformation-tolerant systems replace tracking with detection altogether. While detection can improve results in some cases, it requires an offline learning stage: before detection, a large number of samples of the target must be collected for learning and training. The training set must therefore cover every deformation and every change of scale, pose, and illumination the target may undergo. In other words, when detection is used to achieve long-term tracking, the choice of training samples is critical; otherwise the robustness of tracking cannot be guaranteed.
Since neither pure tracking nor pure detection achieves satisfactory results over long-term tracking on its own, TLD combines the two and adds an improved online learning mechanism, making the overall tracking more stable and effective.
In short, the TLD algorithm consists of three modules: a tracker, a detector, and a learning module.


The algorithm proceeds as follows:
I. Read parameters
Parameters are read from myParam.yml, and the initial tracking region (the bounding box) is obtained either from a file or from the user's mouse selection.
II. Initialize with the first frame last_gray and the bounding box by calling tld.init(last_gray,box,bb_file), where bb_file stores the subsequent tracking results. init() consists of the following parts:
1.      buildGrid(frame,box)
This function takes the current frame frame and the initial tracking region box, and builds all scan windows for the current frame. The windows span 21 scales with a scaling factor of 1.2: starting from scale 1 in the middle, ten successive 1.2x reductions and ten 1.2x enlargements. At each scale, the scan step is 10% of the window's width/height (SHIFT=0.1). All scan windows are stored in the container grid; each element carries six attributes: the x and y coordinates of the window's top-left corner, its width w, its height h, its overlap with the initial tracking region box, and its scale index sidx. Overlap is defined as the intersection of the two windows divided by their union. In addition, if a window's width and height are both at least the minimum-window threshold min_win (fixed at 15, read from myParam.yml), that size is pushed into the scales container; because of the min_win constraint, window sizes at some especially small scales are excluded, so this container holds at most 21 entries.
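The 21 scales described above form a geometric series around 1.0 with ratio 1.2. A minimal sketch (the helper name buildScales is illustrative, not part of the original code) that reproduces the SCALES table appearing later in buildGrid():

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Generate the 21 scan-window scales used by buildGrid(): 1.2^i for i = -10..10.
// The middle entry (i = 0) is 1.0; the endpoints match the first and last
// values of the SCALES table in TLD.cpp (0.16151 and 6.19174).
std::vector<double> buildScales() {
    std::vector<double> scales;
    for (int i = -10; i <= 10; ++i)
        scales.push_back(std::pow(1.2, i));
    return scales;
}
```

Windows whose side would fall below min_win at a given scale are simply skipped, which is why scales may end up shorter than 21.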
2.      Allocate storage for various variables and containers, including the integral image iisum, the squared integral image iisqsum, the containers holding good_boxes and bad_boxes, and so on.
3.      getOverlappingBoxes(box,num_closest_init)
This function takes the initial tracking window box and the number of scan windows to keep in good_boxes, num_closest_init (initial value 10, read from myParam.yml). Its main tasks are:
A.      Find the scan window best_box with the highest overlap with the initial tracking window box.
B.      Collect the scan windows whose overlap with box exceeds 0.6; they are treated as good windows, later used to generate positive samples, and placed in good_boxes.
C.      Collect the scan windows whose overlap with box is below 0.2; they are treated as poor windows, later used to generate negative samples, and placed in bad_boxes.
D.      Call getBBHull() to obtain the bounding rectangle of the union of all windows in good_boxes.
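The overlap measure used in steps A to C is intersection-over-union. A small self-contained sketch (the Box struct and iou name are illustrative; the actual implementation is bbOverlap() in TLD.cpp below):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Overlap of two axis-aligned boxes: intersection area / union area.
// Windows with overlap > 0.6 go to good_boxes, < 0.2 to bad_boxes.
struct Box { int x, y, w, h; };

float iou(const Box& a, const Box& b) {
    int x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    int x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
    if (x2 <= x1 || y2 <= y1) return 0.0f;           // no intersection
    float inter = float(x2 - x1) * float(y2 - y1);
    float uni = float(a.w) * a.h + float(b.w) * b.h - inter;
    return inter / uni;
}
```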
4.      classifier.prepare(scales)
This function takes the scales container produced by buildGrid and prepares the classifiers. The TLD classifier consists of three stages: a variance classifier (which, together with the squared integral image, rejects low-variance negatives among the bad_boxes), a Fern classifier, and a nearest-neighbor (NN) classifier. The three are cascaded: a scan window must pass all of them to be considered as containing the foreground target, and only then can it be a candidate tracking result for the current frame. The preparation here mainly concerns the Fern classifier, so describing that classifier first helps explain what this function sets up.

5.      generatePositiveData(frame,num_warps_init)

This function takes the current frame frame and the number of affine warps num_warps_init (initial value 20, read from myParam.yml). It applies num_warps_init affine warps to the 10 scan windows in good_boxes, producing 10*20 windows in total as positive sample data. The steps are:
5.1     Call getPattern(frame(best_box),pEx,mean,stdev): the best_box region of frame is resized to patch_size*patch_size (15*15), its mean and standard deviation are computed, and the mean is subtracted from every pixel so that the resulting pattern is zero-mean.
5.2     Call GaussianBlur(frame,img,Size(9,9),1.5) to smooth the whole frame into img with a 9*9 Gaussian kernel and a sigma of 1.5 in the X direction, then take the bbhull region (obtained in step 3) of img. The affine warps are generated with a cv::PatchGenerator, invoked through its overloaded operator(). Note that the warped result does not actually seem to be used in the subsequent processing, so this may be a bug in the code. The call is generator(frame,pt,warped,bbhull.size(),rng), where frame is the current frame, pt is the center of the bbhull window, warped receives the warped result, bbhull.size() is bbhull's width and height, and rng is a random number generator. The function builds an affine matrix from bbhull.size() and rng and warps frame into warped; that is, the input has the size of the original frame while the output has the size of bbhull. See the operator() overload in planardetect.cpp for details.

6.      meanStdDev(frame1(best_box),mean,stdev)
Compute the mean and standard deviation of best_box, and take var, half the squared standard deviation, as the threshold of the variance classifier.
7.      integral(frame1,iisum,iisqsum)
Compute the integral image and the squared integral image, then call getVar(best_box, iisum, iisqsum) to obtain the variance of best_box quickly from them; half of that variance is taken as the check value vr.
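The integral-image trick in getVar() rests on the identity Var(X) = E[X^2] - (E[X])^2: the plain integral image yields the region sum (hence the mean) and the squared integral image yields the sum of squares, each in four corner lookups. A quick sanity sketch of the identity (varianceDirect is an illustrative name):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Variance via E[x^2] - (E[x])^2, the same formula getVar() evaluates in O(1)
// per box using four corner lookups in each integral image.
double varianceDirect(const std::vector<double>& v) {
    double s = 0.0, sq = 0.0;
    for (size_t i = 0; i < v.size(); ++i) { s += v[i]; sq += v[i] * v[i]; }
    double n = (double)v.size();
    double mean = s / n;
    return sq / n - mean * mean;
}
```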
8.      generateNegativeData(frame1)
Tracking must cope with the target approaching or receding from the camera, rotating, and translating, which is why affine warps are applied to the positives; a negative sample, however, is a negative as it stands, so warping it adds nothing and is skipped. Because step 3 put every window with overlap below 0.2 into bad_boxes, that set is large; its elements are therefore shuffled first, windows whose variance exceeds var*0.5 are kept as negatives (the low-variance ones are discarded), and, as in step 5, getFeatures() extracts their fern features into nX. In addition, the first bad_patches (fixed at 100, read from myParam.yml) entries of the shuffled bad_boxes are run through getPattern() and the resulting patterns are stored in nEx, to serve as the negative training set for the nearest-neighbor classifier.
generatePositiveData and generateNegativeData differ in two respects: the former applies affine warps while the latter does not, and the former need not shuffle good_boxes while the latter shuffles bad_boxes so that its first bad_patches entries form a random negative training set for the NN data (the corresponding positive NN training set holds a single datum, the pattern of best_box).
9.      Split the negative sample set nX in half into a training set nX and a test set nXT, and likewise split the negative NN data set nEx into a training half nEx and a test half nExT; the positive set pX and the positive NN set pEx are not split.
10.     The resulting data sets are then arranged as follows:
10.1    The positive set pX (the 200 warped results of the good_boxes regions) and the negative set nX (half of the high-variance bad_boxes regions) are shuffled together into ferns_data[], used to train the Fern classifier. Note that what is stored per box is the feature each of the 10 trees computes for it: ten 13-bit binary codes.
10.2    The positive NN set pEx (in fact a single datum, the pattern obtained from best_box) and the negative NN set nEx (the patterns of the first bad_patches/2 = 50 shuffled bad_boxes) go into nn_data[], used to train the nearest-neighbor classifier.
10.3    The other halves, nXT and nExT, are combined into the test set used to evaluate and refine the classifier thresholds.

11.     classifier.trainF(ferns_data,2)
This function takes ferns_data[], which stores the positive and negative sets pX and nX; the second argument means bootstrap=2 (though no effect of it is visible inside the function). For each sample ferns_data[i]: if its label is positive, measure_forest() first sums, over all trees, the posterior probabilities corresponding to the box's feature values. If that sum is below the positive threshold (0.6*nstructs, where the empirical value 0.6 is the Fern classifier threshold, read from myParam.yml at initialization and later refined on the test set), then a positive sample has been classified as negative; the sample is therefore added to the positive store (pNum=pNum+1) and update() refreshes the posteriors. Symmetrically, a misclassified negative is added to the negative store (nNum=nNum+1). update() takes three arguments: the box's ten trees fern[]; a flag selecting which store to update (1 for the positive counts, 0 for the negative); and the update amount, which is 1 at every call site in the program, i.e., each call increments one store's count by 1 (why is the other side not decremented correspondingly?). This matches the scheme of adding each misclassified sample to the appropriate store: only one sample is judged at a time, so the update amount can only be 1. Along with the counts, the posterior is refreshed as post = pNum/(pNum+nNum).
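The per-leaf bookkeeping described above can be sketched as follows (a minimal illustration of the counters and the posterior pNum/(pNum+nNum); the real implementation lives in FerNNClassifier, and all names here are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Posterior table of a single fern: per 13-bit leaf, count positive and
// negative training hits; posterior = pNum / (pNum + nNum), 0 if untrained.
struct FernLeaf { int pNum; int nNum; FernLeaf() : pNum(0), nNum(0) {} };

struct Fern {
    std::vector<FernLeaf> leaves;
    Fern() : leaves(1 << 13) {}
    double posterior(int code) const {
        const FernLeaf& l = leaves[code];
        int total = l.pNum + l.nNum;
        return total ? (double)l.pNum / total : 0.0;
    }
    void update(int code, bool positive) {   // called only on misclassified samples
        if (positive) ++leaves[code].pNum; else ++leaves[code].nNum;
    }
};
```

Updating only on mistakes keeps the tables sparse and is why a single call always changes a count by exactly 1.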

12.     classifier.trainNN(nn_data)
This function takes nn_data[], which holds the positive NN data set pEx and the negative NN data set nEx. For each sample in nn_data: if its label is positive, NNConf(nn_examples[i], isin, conf, dummy) computes the relative similarity conf between the input image patch and the online model; if conf falls below 0.65, the patch is judged not to contain the foreground target, i.e., it has been misclassified, and pEx.push_back(nn_examples[i]) adds it to the positive store pEx. Likewise, a misclassified negative is added to the negative store.
12.1    Inside NNConf(), for the input patch the function first traverses the boxes of the positive NN set pEx in the online model (at first just best_box; it is updated online later), calling matchTemplate() to compute a match score ncc, deriving a similarity nccP from it, and keeping the maximum maxP; the negative NN set nEx of the online model is traversed the same way to obtain maxN.
12.2    It then computes 1-maxP and 1-maxN, which can be read as the minimum dissimilarity to the positives and to the negatives, respectively. The relative similarity is defined as rsconf = (1-maxN)/[(1-maxP)+(1-maxN)], i.e., the share of the total minimum dissimilarity contributed by the negatives: the larger that share, the more likely the patch is dissimilar to the negatives, hence the higher the relative similarity (admittedly a bit convoluted; one way to read it is as a dual formulation).
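The similarity formulas above are easy to check numerically; a small sketch (function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Relative similarity: of the total minimum dissimilarity, the share
// contributed by the negative set. maxP/maxN are NCC-derived similarities in [0,1].
double relativeSimilarity(double maxP, double maxN) {
    double dP = 1.0 - maxP, dN = 1.0 - maxN;
    return dN / (dP + dN);
}

// Conservative variant: csmaxP (best match within the first half of pEx)
// never exceeds maxP, so the result never exceeds relativeSimilarity().
double conservativeSimilarity(double csmaxP, double maxN) {
    return relativeSimilarity(csmaxP, maxN);
}
```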
12.3    The conservative similarity csconf is obtained as follows: whenever maxP is updated within the first half of pEx, the value is also recorded in csmaxP, so csmaxP holds the maximum similarity between the input patch and the first half of the positive NN data. Then csconf = (1-maxN)/[(1-maxN)+(1-csmaxP)]. Since csmaxP never exceeds maxP, csconf never exceeds rsconf; in judging whether the patch resembles the positive NN data, csconf is the more "conservative" measure for the same patch. On the very first training pass conservatism has little meaning, so the first trainNN() call inside init() discards the conservative value (it is returned through the dummy argument).
12.4    isin holds three int values, all initialized to -1. The first is set to 1 when NNConf(), while comparing the input patch against the online model's pEx, finds an entry whose similarity exceeds the threshold ncc_thesame (fixed at 0.95, read from myParam.yml); in that case the patch is also added to the model's pEx, so a value of 1 means the current patch has been placed into pEx. The second depends on the first: if the first is -1, so is the second; if the first is 1, the second is the index of the first box in the online model whose similarity with the patch exceeds ncc_thesame. The third has the same meaning as the first, except that it refers to the negative NN set nEx rather than the positive set pEx.
13.     classifier.evaluateTh(nXT,nExT)
This function takes the second half of the negative data set, nXT (fern features), and the second half of the negative NN data set, nExT (patterns), and uses them as test sets to calibrate the thresholds thr_fern, thr_nn, and thr_nn_valid.
13.1    Updating thr_fern: for each sample in nXT, measure_forest() obtains the ten trees' binary codes for the box, looks up the corresponding posteriors, and averages the ten values. If the average exceeds thr_fern (initial value 0.6, read from myParam.yml), thr_fern is raised to it. In effect, every box in nXT receives a mean posterior from the ten trees' votes, and thr_fern ends up as the largest mean posterior over all those boxes.
13.2    Updating thr_nn: for each sample of the negative NN test set nExT, NNConf() computes its relative similarity conf against the online model's pEx and nEx (as in the first NN training pass, the conservative similarity obtained here is discarded); if conf exceeds thr_nn (initial value 0.65, read from myParam.yml), thr_nn is raised to conf.
13.3    Updating thr_nn_valid: if the updated thr_nn exceeds thr_nn_valid (initial value 0.7, read from myParam.yml), thr_nn_valid is raised to thr_nn.
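All three updates in step 13 follow the same pattern: raise a threshold to the highest score the classifier assigns to any sample of the negative test set, so that (on that set) no negative would pass. A sketch (the name calibrateThreshold is illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Raise thr to the maximum score found on the negative test set; if every
// negative already scores below thr, the threshold is left unchanged.
float calibrateThreshold(float thr, const std::vector<float>& negScores) {
    for (size_t i = 0; i < negScores.size(); ++i)
        thr = std::max(thr, negScores[i]);
    return thr;
}
```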


Related code:

TLD.h

#include <opencv2/opencv.hpp>
#include "tld_utils.h"
#include "LKTracker.h"
#include "FerNNClassifier.h"
#include "PatchGenerator.h"
#include <fstream>


//Bounding Boxes
struct BoundingBox : public cv::Rect {
BoundingBox(){}
BoundingBox(cv::Rect r): cv::Rect(r){}
public:
float overlap; //Overlap with current Bounding Box
int sidx; //scale index
};

//Detection structure
struct DetStruct {
std::vector<int> bb;
std::vector<std::vector<int> > patt;
std::vector<float> conf1;
std::vector<float> conf2;
std::vector<std::vector<int> > isin;
std::vector<cv::Mat> patch;
};
//Temporal structure
struct TempStruct {
std::vector<std::vector<int> > patt;
std::vector<float> conf;
};

struct OComparator{
OComparator(const std::vector<BoundingBox>& _grid):grid(_grid){}
std::vector<BoundingBox> grid;
bool operator()(int idx1,int idx2){
return grid[idx1].overlap > grid[idx2].overlap;
}
};
struct CComparator{
CComparator(const std::vector<float>& _conf):conf(_conf){}
std::vector<float> conf;
bool operator()(int idx1,int idx2){
return conf[idx1]> conf[idx2];
}
};


class TLD{
private:
/*cv::*/PatchGenerator generator;
FerNNClassifier classifier;
LKTracker tracker;
///Parameters
int bbox_step;
int min_win;
int patch_size;
//initial parameters for positive examples
int num_closest_init;
int num_warps_init;
int noise_init;
float angle_init;
float shift_init;
float scale_init;
//update parameters for positive examples
int num_closest_update;
int num_warps_update;
int noise_update;
float angle_update;
float shift_update;
float scale_update;
//parameters for negative examples
float bad_overlap;
int bad_patches;
///Variables
//Integral Images
cv::Mat iisum;
cv::Mat iisqsum;
float var;
//Training data
std::vector<std::pair<std::vector<int>,int> > pX; //positive ferns <features,labels=1>
std::vector<std::pair<std::vector<int>,int> > nX; // negative ferns <features,labels=0>
cv::Mat pEx; //positive NN example
std::vector<cv::Mat> nEx; //negative NN examples
//Test data
std::vector<std::pair<std::vector<int>,int> > nXT; //negative data to Test
std::vector<cv::Mat> nExT; //negative NN examples to Test
//Last frame data
BoundingBox lastbox;
bool lastvalid;
float lastconf;
//Current frame data
//Tracker data
bool tracked;
BoundingBox tbb;
bool tvalid;
float tconf;
//Detector data
TempStruct tmp;
DetStruct dt;
std::vector<BoundingBox> dbb;
std::vector<bool> dvalid;
std::vector<float> dconf;
bool detected;


//Bounding Boxes
std::vector<BoundingBox> grid;
std::vector<cv::Size> scales;
std::vector<int> good_boxes; //indexes of bboxes with overlap > 0.6
std::vector<int> bad_boxes; //indexes of bboxes with overlap < 0.2
BoundingBox bbhull; // hull of good_boxes
BoundingBox best_box; // maximum overlapping bbox

public:
//Constructors
TLD();
TLD(const cv::FileNode& file);
void read(const cv::FileNode& file);
//Methods
void init(const cv::Mat& frame1,const cv::Rect &box, FILE* bb_file);
void generatePositiveData(const cv::Mat& frame, int num_warps);
void generateNegativeData(const cv::Mat& frame);
void processFrame(const cv::Mat& img1,const cv::Mat& img2,std::vector<cv::Point2f>& points1,std::vector<cv::Point2f>& points2,
BoundingBox& bbnext,bool& lastboxfound, bool tl,FILE* bb_file);
void track(const cv::Mat& img1, const cv::Mat& img2,std::vector<cv::Point2f>& points1,std::vector<cv::Point2f>& points2);
void detect(const cv::Mat& frame);
void clusterConf(const std::vector<BoundingBox>& dbb,const std::vector<float>& dconf,std::vector<BoundingBox>& cbb,std::vector<float>& cconf);
void evaluate();
void learn(const cv::Mat& img);
//Tools
void buildGrid(const cv::Mat& img, const cv::Rect& box);
float bbOverlap(const BoundingBox& box1,const BoundingBox& box2);
void getOverlappingBoxes(const cv::Rect& box1,int num_closest);
void getBBHull();
void getPattern(const cv::Mat& img, cv::Mat& pattern,cv::Scalar& mean,cv::Scalar& stdev);
void bbPoints(std::vector<cv::Point2f>& points, const BoundingBox& bb);
void bbPredict(const std::vector<cv::Point2f>& points1,const std::vector<cv::Point2f>& points2,
const BoundingBox& bb1,BoundingBox& bb2);
double getVar(const BoundingBox& box,const cv::Mat& sum,const cv::Mat& sqsum);
bool bbComp(const BoundingBox& bb1,const BoundingBox& bb2);
int clusterBB(const std::vector<BoundingBox>& dbb,std::vector<int>& indexes);
};


TLD.cpp

/*
* TLD.cpp
*
* Created on: Jun 9, 2011
* Author: alantrrs
*/

#include "TLD.h"
#include <stdio.h>
using namespace cv;
using namespace std;


TLD::TLD()
{
}
TLD::TLD(const FileNode& file){
read(file);
}

void TLD::read(const FileNode& file){
///Bounding Box Parameters
min_win = (int)file["min_win"];
///Generator Parameters
//initial parameters for positive examples
patch_size = (int)file["patch_size"];
num_closest_init = (int)file["num_closest_init"];
num_warps_init = (int)file["num_warps_init"];
noise_init = (int)file["noise_init"];
angle_init = (float)file["angle_init"];
shift_init = (float)file["shift_init"];
scale_init = (float)file["scale_init"];
//update parameters for positive examples
num_closest_update = (int)file["num_closest_update"];
num_warps_update = (int)file["num_warps_update"];
noise_update = (int)file["noise_update"];
angle_update = (float)file["angle_update"];
shift_update = (float)file["shift_update"];
scale_update = (float)file["scale_update"];
//parameters for negative examples
bad_overlap = (float)file["overlap"];
bad_patches = (int)file["num_patches"];
classifier.read(file);
}

void TLD::init(const Mat& frame1,const Rect& box,FILE* bb_file){
//bb_file = fopen("bounding_boxes.txt","w");
//Get Bounding Boxes
buildGrid(frame1,box);
printf("Created %d bounding boxes\n",(int)grid.size());
///Preparation
//allocation
iisum.create(frame1.rows+1,frame1.cols+1,CV_32F);
iisqsum.create(frame1.rows+1,frame1.cols+1,CV_64F);
dconf.reserve(100);
dbb.reserve(100);
bbox_step =7;
//tmp.conf.reserve(grid.size());
tmp.conf = vector<float>(grid.size());
tmp.patt = vector<vector<int> >(grid.size(),vector<int>(10,0));
//tmp.patt.reserve(grid.size());
dt.bb.reserve(grid.size());
good_boxes.reserve(grid.size());
bad_boxes.reserve(grid.size());
pEx.create(patch_size,patch_size,CV_64F);
//Init Generator
generator = PatchGenerator (0,0,noise_init,true,1-scale_init,1+scale_init,-angle_init*CV_PI/180,angle_init*CV_PI/180,-angle_init*CV_PI/180,angle_init*CV_PI/180);
getOverlappingBoxes(box,num_closest_init);
printf("Found %d good boxes, %d bad boxes\n",(int)good_boxes.size(),(int)bad_boxes.size());
printf("Best Box: %d %d %d %d\n",best_box.x,best_box.y,best_box.width,best_box.height);
printf("Bounding box hull: %d %d %d %d\n",bbhull.x,bbhull.y,bbhull.width,bbhull.height);
//Correct Bounding Box
lastbox=best_box;
lastconf=1;
lastvalid=true;
//Print
fprintf(bb_file,"%d,%d,%d,%d,%f\n",lastbox.x,lastbox.y,lastbox.br().x,lastbox.br().y,lastconf);
//Prepare Classifier
classifier.prepare(scales);
///Generate Data
// Generate positive data
generatePositiveData(frame1,num_warps_init);
// Set variance threshold
Scalar stdev, mean;
meanStdDev(frame1(best_box),mean,stdev);
integral(frame1,iisum,iisqsum);
var = (float)pow(stdev.val[0],2)*0.5f; //getVar(best_box,iisum,iisqsum);
cout << "variance: " << var << endl;
//check variance
double vr = getVar(best_box,iisum,iisqsum)*0.5;
cout << "check variance: " << vr << endl;
// Generate negative data
generateNegativeData(frame1);
//Split Negative Ferns into Training and Testing sets (they are already shuffled)
int half = (int)(nX.size()*0.5f);
nXT.assign(nX.begin()+half,nX.end());
nX.resize(half);
///Split Negative NN Examples into Training and Testing sets
half = (int)(nEx.size()*0.5f);
nExT.assign(nEx.begin()+half,nEx.end());
nEx.resize(half);
//Merge Negative Data with Positive Data and shuffle it
vector<pair<vector<int>,int> > ferns_data(nX.size()+pX.size());
vector<int> idx = index_shuffle(0,(int)(ferns_data.size()));
int a=0;
for (int i=0;i<pX.size();i++){
ferns_data[idx[a]] = pX[i];
a++;
}
for (int i=0;i<nX.size();i++){
ferns_data[idx[a]] = nX[i];
a++;
}
//Data already have been shuffled, just putting it in the same vector
vector<cv::Mat> nn_data(nEx.size()+1);
nn_data[0] = pEx;
for (int i=0;i<nEx.size();i++){
nn_data[i+1]= nEx[i];
}
///Training
classifier.trainF(ferns_data,2); //bootstrap = 2
classifier.trainNN(nn_data);
///Threshold Evaluation on testing sets
classifier.evaluateTh(nXT,nExT);
}

/* Generate Positive data
* Inputs:
* - good_boxes (bbP)
* - best_box (bbP0)
* - frame (im0)
* Outputs:
* - Positive fern features (pX)
* - Positive NN examples (pEx)
*/
void TLD::generatePositiveData(const Mat& frame, int num_warps){
Scalar mean;
Scalar stdev;
getPattern(frame(best_box),pEx,mean,stdev);
//Get Fern features on warped patches
Mat img;
Mat warped;
GaussianBlur(frame,img,Size(9,9),1.5);
warped = img(bbhull);
RNG& rng = theRNG();
Point2f pt(bbhull.x+(bbhull.width-1)*0.5f,bbhull.y+(bbhull.height-1)*0.5f);
vector<int> fern(classifier.getNumStructs());
pX.clear();
Mat patch;
if (pX.capacity()<num_warps*good_boxes.size())
pX.reserve(num_warps*good_boxes.size());
int idx;
for (int i=0;i<num_warps;i++){
if (i>0)
generator(frame,pt,warped,bbhull.size(),rng);
for (int b=0;b<good_boxes.size();b++){
idx=good_boxes[b];
patch = img(grid[idx]);
classifier.getFeatures(patch,grid[idx].sidx,fern);
pX.push_back(make_pair(fern,1));
}
}
printf("Positive examples generated: ferns:%d NN:1\n",(int)pX.size());
}

void TLD::getPattern(const Mat& img, Mat& pattern,Scalar& mean,Scalar& stdev){
//Output: resized Zero-Mean patch
resize(img,pattern,Size(patch_size,patch_size));
meanStdDev(pattern,mean,stdev);
pattern.convertTo(pattern,CV_32F);
pattern = pattern-mean.val[0];
}

void TLD::generateNegativeData(const Mat& frame){
/* Inputs:
* - Image
* - bad_boxes (Boxes far from the bounding box)
* - variance (pEx variance)
* Outputs
* - Negative fern features (nX)
* - Negative NN examples (nEx)
*/
random_shuffle(bad_boxes.begin(),bad_boxes.end());//Random shuffle bad_boxes indexes
int idx;
//Get Fern Features of the boxes with big variance (calculated using integral images)
int a=0;
//int num = std::min((int)bad_boxes.size(),(int)bad_patches*100); //limits the size of bad_boxes to try
printf("negative data generation started.\n");
vector<int> fern(classifier.getNumStructs());
nX.reserve(bad_boxes.size());
Mat patch;
for (int j=0;j<bad_boxes.size();j++){
idx = bad_boxes[j];
if (getVar(grid[idx],iisum,iisqsum)<var*0.5f)
continue;
patch = frame(grid[idx]);
classifier.getFeatures(patch,grid[idx].sidx,fern);
nX.push_back(make_pair(fern,0));
a++;
}
printf("Negative examples generated: ferns: %d ",a);
//random_shuffle(bad_boxes.begin(),bad_boxes.begin()+bad_patches);//Randomly selects 'bad_patches' and get the patterns for NN;
Scalar dum1, dum2;
nEx=vector<Mat>(bad_patches);
for (int i=0;i<bad_patches;i++){
idx=bad_boxes[i];
patch = frame(grid[idx]);
getPattern(patch,nEx[i],dum1,dum2);
}
printf("NN: %d\n",(int)nEx.size());
}

double TLD::getVar(const BoundingBox& box,const Mat& sum,const Mat& sqsum){
double brs = sum.at<int>(box.y+box.height,box.x+box.width);
double bls = sum.at<int>(box.y+box.height,box.x);
double trs = sum.at<int>(box.y,box.x+box.width);
double tls = sum.at<int>(box.y,box.x);
double brsq = sqsum.at<double>(box.y+box.height,box.x+box.width);
double blsq = sqsum.at<double>(box.y+box.height,box.x);
double trsq = sqsum.at<double>(box.y,box.x+box.width);
double tlsq = sqsum.at<double>(box.y,box.x);
double mean = (brs+tls-trs-bls)/((double)box.area());
double sqmean = (brsq+tlsq-trsq-blsq)/((double)box.area());
return sqmean-mean*mean;
}

void TLD::processFrame(const cv::Mat& img1,const cv::Mat& img2,vector<Point2f>& points1,vector<Point2f>& points2,BoundingBox& bbnext,bool& lastboxfound, bool tl, FILE* bb_file){
vector<BoundingBox> cbb;
vector<float> cconf;
int confident_detections=0;
int didx; //detection index
///Track
if(lastboxfound && tl){
track(img1,img2,points1,points2);
}
else{
tracked = false;
}
///Detect
detect(img2);
///Integration
if (tracked){
bbnext=tbb;
lastconf=tconf;
lastvalid=tvalid;
printf("Tracked\n");
if(detected){ // if Detected
clusterConf(dbb,dconf,cbb,cconf); // cluster detections
printf("Found %d clusters\n",(int)cbb.size());
for (int i=0;i<cbb.size();i++){
if (bbOverlap(tbb,cbb[i])<0.5 && cconf[i]>tconf){ // Get index of a clusters that is far from tracker and are more confident than the tracker
confident_detections++;
didx=i; //detection index
}
}
if (confident_detections==1){ //if there is ONE such a cluster, re-initialize the tracker
printf("Found a better match..reinitializing tracking\n");
bbnext=cbb[didx];
lastconf=cconf[didx];
lastvalid=false;
}
else {
printf("%d confident cluster was found\n",confident_detections);
int cx=0,cy=0,cw=0,ch=0;
int close_detections=0;
for (int i=0;i<dbb.size();i++){
if(bbOverlap(tbb,dbb[i])>0.7){ // Get mean of close detections
cx += dbb[i].x;
cy +=dbb[i].y;
cw += dbb[i].width;
ch += dbb[i].height;
close_detections++;
printf("weighted detection: %d %d %d %d\n",dbb[i].x,dbb[i].y,dbb[i].width,dbb[i].height);
}
}
if (close_detections>0){
bbnext.x = cvRound((float)(10*tbb.x+cx)/(float)(10+close_detections)); // weighted average trackers trajectory with the close detections
bbnext.y = cvRound((float)(10*tbb.y+cy)/(float)(10+close_detections));
bbnext.width = cvRound((float)(10*tbb.width+cw)/(float)(10+close_detections));
bbnext.height = cvRound((float)(10*tbb.height+ch)/(float)(10+close_detections));
printf("Tracker bb: %d %d %d %d\n",tbb.x,tbb.y,tbb.width,tbb.height);
printf("Average bb: %d %d %d %d\n",bbnext.x,bbnext.y,bbnext.width,bbnext.height);
printf("Weighting %d close detection(s) with tracker..\n",close_detections);
}
else{
printf("%d close detections were found\n",close_detections);

}
}
}
}
else{ // If NOT tracking
printf("Not tracking..\n");
lastboxfound = false;
lastvalid = false;
if(detected){ // and detector is defined
clusterConf(dbb,dconf,cbb,cconf); // cluster detections
printf("Found %d clusters\n",(int)cbb.size());
if (cconf.size()==1){
bbnext=cbb[0];
lastconf=cconf[0];
printf("Confident detection..reinitializing tracker\n");
lastboxfound = true;
}
}
}
lastbox=bbnext;
if (lastboxfound)
fprintf(bb_file,"%d,%d,%d,%d,%f\n",lastbox.x,lastbox.y,lastbox.br().x,lastbox.br().y,lastconf);
else
fprintf(bb_file,"NaN,NaN,NaN,NaN,NaN\n");
if (lastvalid && tl)
learn(img2);
}


void TLD::track(const Mat& img1, const Mat& img2,vector<Point2f>& points1,vector<Point2f>& points2){
/*Inputs:
* -current frame(img2), last frame(img1), last Bbox(bbox_f[0]).
*Outputs:
*- Confidence(tconf), Predicted bounding box(tbb),Validity(tvalid), points2 (for display purposes only)
*/
//Generate points
bbPoints(points1,lastbox);
if (points1.size()<1){
printf("BB= %d %d %d %d, Points not generated\n",lastbox.x,lastbox.y,lastbox.width,lastbox.height);
tvalid=false;
tracked=false;
return;
}
vector<Point2f> points = points1;
//Frame-to-frame tracking with forward-backward error checking
tracked = tracker.trackf2f(img1,img2,points,points2);
if (tracked){
//Bounding box prediction
bbPredict(points,points2,lastbox,tbb);
if (tracker.getFB()>10 || tbb.x>img2.cols || tbb.y>img2.rows || tbb.br().x < 1 || tbb.br().y <1){
tvalid =false; //too unstable prediction or bounding box out of image
tracked = false;
printf("Too unstable predictions FB error=%f\n",tracker.getFB());
return;
}
//Estimate Confidence and Validity
Mat pattern;
Scalar mean, stdev;
BoundingBox bb;
bb.x = max(tbb.x,0);
bb.y = max(tbb.y,0);
bb.width = min(min(img2.cols-tbb.x,tbb.width),min(tbb.width,tbb.br().x));
bb.height = min(min(img2.rows-tbb.y,tbb.height),min(tbb.height,tbb.br().y));
getPattern(img2(bb),pattern,mean,stdev);
vector<int> isin;
float dummy;
classifier.NNConf(pattern,isin,dummy,tconf); //Conservative Similarity
tvalid = lastvalid;
if (tconf>classifier.thr_nn_valid){
tvalid =true;
}
}
else
printf("No points tracked\n");

}

void TLD::bbPoints(vector<cv::Point2f>& points,const BoundingBox& bb){
int max_pts=10;
int margin_h=0;
int margin_v=0;
int stepx = (int)ceil((double)(bb.width-2*margin_h)/max_pts);
int stepy = (int)ceil((double)(bb.height - 2 * margin_v) / max_pts);
for (int y=bb.y+margin_v;y<bb.y+bb.height-margin_v;y+=stepy){
for (int x=bb.x+margin_h;x<bb.x+bb.width-margin_h;x+=stepx){
points.push_back(Point2f((float)x,(float)y));
}
}
}

void TLD::bbPredict(const vector<cv::Point2f>& points1,const vector<cv::Point2f>& points2,
const BoundingBox& bb1,BoundingBox& bb2) {
int npoints = (int)points1.size();
vector<float> xoff(npoints);
vector<float> yoff(npoints);
printf("tracked points : %d\n",npoints);
for (int i=0;i<npoints;i++){
xoff[i]=points2[i].x-points1[i].x;
yoff[i]=points2[i].y-points1[i].y;
}
float dx = median(xoff);
float dy = median(yoff);
float s;
if (npoints>1){
vector<float> d;
d.reserve(npoints*(npoints-1)/2);
for (int i=0;i<npoints;i++){
for (int j=i+1;j<npoints;j++){
d.push_back((float)(norm(points2[i]-points2[j])/norm(points1[i]-points1[j])));
}
}
s = median(d);
}
else {
s = 1.0;
}
float s1 = 0.5f*(s-1)*bb1.width;
float s2 = 0.5f*(s-1)*bb1.height;
printf("s= %f s1= %f s2= %f \n",s,s1,s2);
bb2.x = (int)round( bb1.x + dx -s1);
bb2.y = (int)round( bb1.y + dy -s2);
bb2.width = (int)round(bb1.width*s);
bb2.height = (int)round(bb1.height*s);
printf("predicted bb: %d %d %d %d\n",bb2.x,bb2.y,bb2.br().x,bb2.br().y);
}

void TLD::detect(const cv::Mat& frame){
//cleaning
dbb.clear();
dconf.clear();
dt.bb.clear();
double t = (double)getTickCount();
Mat img(frame.rows,frame.cols,CV_8U);
integral(frame,iisum,iisqsum);
GaussianBlur(frame,img,Size(9,9),1.5);
int numtrees = classifier.getNumStructs();
float fern_th = classifier.getFernTh();
vector <int> ferns(10);
float conf;
int a=0;
Mat patch;
for (int i=0;i<grid.size();i++){//FIXME: BottleNeck
if (getVar(grid[i],iisum,iisqsum)>=var){
a++;
patch = img(grid[i]);
classifier.getFeatures(patch,grid[i].sidx,ferns);
conf = classifier.measure_forest(ferns);
tmp.conf[i]=conf;
tmp.patt[i]=ferns;
if (conf>numtrees*fern_th){
dt.bb.push_back(i);
}
}
else
tmp.conf[i]=0.0;
}
int detections = (int)(dt.bb.size());
printf("%d Bounding boxes passed the variance filter\n",a);
printf("%d Initial detection from Fern Classifier\n",detections);
if (detections>100){
nth_element(dt.bb.begin(),dt.bb.begin()+100,dt.bb.end(),CComparator(tmp.conf));
dt.bb.resize(100);
detections=100;
}
// for (int i=0;i<detections;i++){
// drawBox(img,grid[dt.bb[i]]);
// }
// imshow("detections",img);
if (detections==0){
detected=false;
return;
}
printf("Fern detector made %d detections ",detections);
t=(double)getTickCount()-t;
printf("in %gms\n", t*1000/getTickFrequency());
// Initialize detection structure
dt.patt = vector<vector<int> >(detections,vector<int>(10,0)); // Corresponding codes of the Ensemble Classifier
dt.conf1 = vector<float>(detections); // Relative Similarity (for final nearest neighbour classifier)
dt.conf2 =vector<float>(detections); // Conservative Similarity (for integration with tracker)
dt.isin = vector<vector<int> >(detections,vector<int>(3,-1)); // Detected (isin=1) or rejected (isin=0) by nearest neighbour classifier
dt.patch = vector<Mat>(detections,Mat(patch_size,patch_size,CV_32F));// Corresponding patches
int idx;
Scalar mean, stdev;
float nn_th = classifier.getNNTh();
for (int i=0;i<detections;i++){ // for every remaining detection
idx=dt.bb[i]; // Get the detected bounding box index
patch = frame(grid[idx]);
getPattern(patch,dt.patch[i],mean,stdev); // Get pattern within bounding box
classifier.NNConf(dt.patch[i],dt.isin[i],dt.conf1[i],dt.conf2[i]); // Evaluate nearest neighbour classifier
dt.patt[i]=tmp.patt[idx];
//printf("Testing feature %d, conf:%f isin:(%d|%d|%d)\n",i,dt.conf1[i],dt.isin[i][0],dt.isin[i][1],dt.isin[i][2]);
if (dt.conf1[i]>nn_th){ // idx = dt.conf1 > tld.model.thr_nn; % get all indexes that made it through the nearest neighbour
dbb.push_back(grid[idx]); // BB = dt.bb(:,idx); % bounding boxes
dconf.push_back(dt.conf2[i]); // Conf = dt.conf2(:,idx); % conservative confidences
}
} // end
if (dbb.size()>0){
printf("Found %d NN matches\n",(int)dbb.size());
detected=true;
}
else{
printf("No NN matches found.\n");
detected=false;
}
}

void TLD::evaluate(){
}

void TLD::learn(const Mat& img){
printf("[Learning] ");
///Check consistency
BoundingBox bb;
bb.x = max(lastbox.x,0);
bb.y = max(lastbox.y,0);
bb.width = min(min(img.cols-lastbox.x,lastbox.width),min(lastbox.width,lastbox.br().x));
bb.height = min(min(img.rows-lastbox.y,lastbox.height),min(lastbox.height,lastbox.br().y));
Scalar mean, stdev;
Mat pattern;
getPattern(img(bb),pattern,mean,stdev);
vector<int> isin;
float dummy, conf;
classifier.NNConf(pattern,isin,conf,dummy);
if (conf<0.5) {
printf("Fast change..not training\n");
lastvalid =false;
return;
}
if (pow(stdev.val[0],2)<var){
printf("Low variance..not training\n");
lastvalid=false;
return;
}
if(isin[2]==1){
printf("Patch in negative data..not training\n");
lastvalid=false;
return;
}
/// Data generation
for (int i=0;i<grid.size();i++){
grid[i].overlap = bbOverlap(lastbox,grid[i]);
}
vector<pair<vector<int>,int> > fern_examples;
good_boxes.clear();
bad_boxes.clear();
getOverlappingBoxes(lastbox,num_closest_update);
if (good_boxes.size()>0)
generatePositiveData(img,num_warps_update);
else{
lastvalid = false;
printf("No good boxes..Not training");
return;
}
fern_examples.reserve(pX.size()+bad_boxes.size());
fern_examples.assign(pX.begin(),pX.end());
int idx;
for (int i=0;i<bad_boxes.size();i++){
idx=bad_boxes[i];
if (tmp.conf[idx]>=1){
fern_examples.push_back(make_pair(tmp.patt[idx],0));
}
}
vector<Mat> nn_examples;
nn_examples.reserve(dt.bb.size()+1);
nn_examples.push_back(pEx);
for (int i=0;i<dt.bb.size();i++){
idx = dt.bb[i];
if (bbOverlap(lastbox,grid[idx]) < bad_overlap)
nn_examples.push_back(dt.patch[i]);
}
/// Classifiers update
classifier.trainF(fern_examples,2);
classifier.trainNN(nn_examples);
classifier.show();
}

void TLD::buildGrid(const cv::Mat& img, const cv::Rect& box){
const float SHIFT = 0.1f;
const float SCALES[] = {0.16151f,0.19381f,0.23257f,0.27908f,0.33490f,0.40188f,0.48225f,
0.57870f,0.69444f,0.83333f,1.0f,1.20000f,1.44000f,1.72800f,
2.07360f,2.48832f,2.98598f,3.58318f,4.29982f,5.15978f,6.19174f};
int width, height, min_bb_side;
//Rect bbox;
BoundingBox bbox;
Size scale;
int sc=0;
for (int s=0;s<21;s++){
width = (int)round(box.width*SCALES[s]);
height = (int)round(box.height*SCALES[s]);
min_bb_side = min(height,width);
if (min_bb_side < min_win || width > img.cols || height > img.rows)
continue;
scale.width = width;
scale.height = height;
scales.push_back(scale);
for (int y=1;y<img.rows-height;y+=(int)round(SHIFT*min_bb_side)){
for (int x=1;x<img.cols-width;x+=(int)round(SHIFT*min_bb_side)){
bbox.x = x;
bbox.y = y;
bbox.width = width;
bbox.height = height;
bbox.overlap = bbOverlap(bbox,BoundingBox(box));
bbox.sidx = sc;
grid.push_back(bbox);
}
}
sc++;
}
}

float TLD::bbOverlap(const BoundingBox& box1,const BoundingBox& box2){
if (box1.x > box2.x+box2.width) { return 0.0; }
if (box1.y > box2.y+box2.height) { return 0.0; }
if (box1.x+box1.width < box2.x) { return 0.0; }
if (box1.y+box1.height < box2.y) { return 0.0; }

float colInt = (float)min(box1.x+box1.width,box2.x+box2.width) - max(box1.x, box2.x);
float rowInt = (float)min(box1.y+box1.height,box2.y+box2.height) - max(box1.y,box2.y);

float intersection = colInt * rowInt;
float area1 = (float)(box1.width*box1.height);
float area2 = (float)(box2.width*box2.height);
return intersection / (area1 + area2 - intersection);
}

void TLD::getOverlappingBoxes(const cv::Rect& box1,int num_closest){
float max_overlap = 0;
for (int i=0;i<grid.size();i++){
if (grid[i].overlap > max_overlap) {
max_overlap = grid[i].overlap;
best_box = grid[i];
}
if (grid[i].overlap > 0.6){
good_boxes.push_back(i);
}
else if (grid[i].overlap < bad_overlap){
bad_boxes.push_back(i);
}
}
//Get the best num_closest (10) boxes and puts them in good_boxes
if (good_boxes.size()>num_closest){
std::nth_element(good_boxes.begin(),good_boxes.begin()+num_closest,good_boxes.end(),OComparator(grid));
good_boxes.resize(num_closest);
}
getBBHull();
}

void TLD::getBBHull(){
int x1=INT_MAX, x2=0;
int y1=INT_MAX, y2=0;
int idx;
for (int i=0;i<good_boxes.size();i++){
idx= good_boxes[i];
x1=min(grid[idx].x,x1);
y1=min(grid[idx].y,y1);
x2=max(grid[idx].x+grid[idx].width,x2);
y2=max(grid[idx].y+grid[idx].height,y2);
}
bbhull.x = x1;
bbhull.y = y1;
bbhull.width = x2-x1;
bbhull.height = y2 -y1;
}

bool bbcomp(const BoundingBox& b1,const BoundingBox& b2){
TLD t;
if (t.bbOverlap(b1,b2)<0.5)
return false;
else
return true;
}
int TLD::clusterBB(const vector<BoundingBox>& dbb,vector<int>& indexes){
//FIXME: Conditional jump or move depends on uninitialised value(s)
const int c = (int)(dbb.size());
//1. Build proximity matrix
Mat D(c,c,CV_32F);
float d;
for (int i=0;i<c;i++){
for (int j=i+1;j<c;j++){
d = 1-bbOverlap(dbb[i],dbb[j]);
D.at<float>(i,j) = d;
D.at<float>(j,i) = d;
}
}
//2. Initialize disjoint clustering
float *L=new float[c-1]; //Level
int **nodes=new int*[c-1];
for (int i=0;i<c-1;i++)
nodes[i] = new int[2]; //each of the c-1 possible merges records two cluster ids
int *belongs=new int[c];
int m=c;
for (int i=0;i<c;i++){
belongs[i]=i;
}
for (int it=0;it<c-1;it++){
//3. Find nearest neighbor
float min_d = 1;
int node_a, node_b;
for (int i=0;i<D.rows;i++){
for (int j=i+1;j<D.cols;j++){
if (D.at<float>(i,j)<min_d && belongs[i]!=belongs[j]){
min_d = D.at<float>(i,j);
node_a = i;
node_b = j;
}
}
}
if (min_d>0.5){
int max_idx =0;
bool visited;
for (int j=0;j<c;j++){
visited = false;
for(int i=0;i<2*c-1;i++){
if (belongs[j]==i){
indexes[j]=max_idx;
visited = true;
}
}
if (visited)
max_idx++;
}
delete[]L;
delete[]nodes[0];
delete[]nodes[1];
delete[]nodes;
delete[]belongs;
return max_idx;
}

//4. Merge clusters and assign level
L[it]=min_d; //L has c-1 entries; indexing by m (which starts at c) would overflow
nodes[it][0] = belongs[node_a];
nodes[it][1] = belongs[node_b];
for (int k=0;k<c;k++){
if (belongs[k]==belongs[node_a] || belongs[k]==belongs[node_b])
belongs[k]=m;
}
m++;
}
delete[]L;
delete[]nodes[0];
delete[]nodes[1];
delete[]nodes;
delete[]belongs;
return 1;

}

void TLD::clusterConf(const vector<BoundingBox>& dbb,const vector<float>& dconf,vector<BoundingBox>& cbb,vector<float>& cconf){
int numbb =(int)(dbb.size());
vector<int> T;
float space_thr = 0.5;
int c=1;
switch (numbb){
case 1:
cbb=vector<BoundingBox>(1,dbb[0]);
cconf=vector<float>(1,dconf[0]);
return;
break;
case 2:
T =vector<int>(2,0);
if (1-bbOverlap(dbb[0],dbb[1])>space_thr){
T[1]=1;
c=2;
}
break;
default:
T = vector<int>(numbb,0);
c = partition(dbb,T,(*bbcomp));
//c = clusterBB(dbb,T);
break;
}
cconf=vector<float>(c);
cbb=vector<BoundingBox>(c);
printf("Cluster indexes: ");
BoundingBox bx;
for (int i=0;i<c;i++){
float cnf=0;
int N=0,mx=0,my=0,mw=0,mh=0;
for (int j=0;j<T.size();j++){
if (T[j]==i){
printf("%d ",i);
cnf=cnf+dconf[j];
mx=mx+dbb[j].x;
my=my+dbb[j].y;
mw=mw+dbb[j].width;
mh=mh+dbb[j].height;
N++;
}
}
if (N>0){
cconf[i]=cnf/N;
bx.x=cvRound(mx/N);
bx.y=cvRound(my/N);
bx.width=cvRound(mw/N);
bx.height=cvRound(mh/N);
cbb[i]=bx;
}
}
printf("\n");
}



A few thoughts of my own. TLD is not so much a breakthrough in optimization or implementation as a distillation of existing theory into a practical application.

In terms of raw accuracy it will not necessarily beat a standalone tracking or detection module, but the framework should prove very useful in practice, because it adapts well to change.