ROI Pooling的修改:将Conv4_3和Conv5_3(即Conv4的第三层卷积和Conv5的第三层卷积)的feature map单独进行ROI pooling,再把这两层Pooling后的feature map用一个1*1的卷积进行融合,这里1*1的卷积除了融合多通道(两层)信息,还有一个作用,就是降维,为下一步的FC做准备。
M. Busta, L. Neumann, and J. Matas. Fastext: Efficient unconstrained scene text detector. In Proc. ICCV, 2015.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proc. CVPR, 2015.
A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie. Coco-text: Dataset and benchmark for text detection and recognition in natural images. arxiv preprint arXiv:1601.07140, 2016.
X. Yin, X. Yin, K. Huang, and H. Hao. Robust text detection in natural scene images. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 36(5):970– 983, 2014.
S. Zhang, M. Lin, T. Chen, L. Jin, and L. Lin. Character proposal network for robust text extraction. In Proc. ICASSP, 2016.
参考文献
R. Girshick. Fast r-cnn. In Proc. ICCV, 2015.
S. Gidaris and N. Komodakis. Object detection via a multiregion & semantic segmentation-aware cnn model. In Proc. ICCV, 2015.