一、Adaptive Neural Networks for Efficient Inference 2017
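The goal is to shorten the execution path. Two approaches reduce run time: the first is an early-exit strategy that decides in the early convolutional layers whether to exit, bypassing some of the later layers; the second uses a similar method to select which network structure to run.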
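The early-exit idea can be sketched as follows. This is a minimal illustration, not the paper's method: the exit rule here is a plain confidence threshold (an assumption swapped in for whatever exit policy the paper trains), and `stages`, `exit_heads`, and `threshold` are hypothetical names.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def early_exit_predict(x, stages, exit_heads, threshold=0.9):
    """Run stages in order and stop at the first confident exit head.

    stages     : list of callables, features -> features
    exit_heads : list of callables, features -> class logits
    threshold  : hypothetical confidence required to exit early
    Returns (predicted_class, index_of_last_stage_run).
    """
    h = x
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)
        probs = softmax(head(h))
        is_last = i == len(stages) - 1
        if probs.max() >= threshold or is_last:
            return int(np.argmax(probs)), i

# Toy three-stage network with random weights (illustration only).
rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 4)) for _ in range(3)]
C = [rng.standard_normal((4, 3)) for _ in range(3)]
stages = [lambda h, w=w: np.tanh(h @ w) for w in W]
heads = [lambda h, c=c: h @ c for c in C]

label, used = early_exit_predict(np.ones(4), stages, heads, threshold=0.9)
```

Easy inputs leave at a small `used` index and skip the remaining stages, which is where the run-time saving comes from.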
二、NISP: Pruning Networks using Neuron Importance Score Propagation 2017
Features are ranked with the Inf-FS method [G. Roffo, S. Melzi, and M. Cristani. Infinite feature selection. In ICCV, 2015], which has open-source code on GitHub.
三、meProp: Sparsified BackPropagation for Accelerated Deep Learning with Reduced Overfitting
To do this, one has to find the subset of parameters that needs updating, namely the top-k most important ones. The proposed meProp uses approximate gradients by keeping only top-k elements based on the magnitude values.
During back propagation only the top k entries are kept and the remaining gradients are set to 0. For meProp, back propagation computes an approximate gradient by keeping top-k values of the backward flowed gradient and masking the remaining values to 0.
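The top-k masking described above can be sketched for a single linear layer. This is a minimal NumPy illustration under the assumption of a layer y = W @ x; the function names are hypothetical and not taken from the meProp code.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a 1-D gradient, zero the rest."""
    if k <= 0:
        return np.zeros_like(grad)
    if k >= grad.size:
        return grad.copy()
    keep = np.argpartition(np.abs(grad), -k)[-k:]   # indices of the top-k magnitudes
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep]
    return sparse

def linear_backward_topk(x, W, grad_out, k):
    """Backward pass of y = W @ x using only the top-k entries of grad_out."""
    g = topk_sparsify(grad_out, k)
    grad_W = np.outer(g, x)    # only k rows of grad_W are non-zero
    grad_x = W.T @ g           # only k columns of W.T contribute
    return grad_W, grad_x

grad_out = np.array([0.1, -2.0, 0.3, 1.5, -0.05])
sparse = topk_sparsify(grad_out, k=2)               # keeps -2.0 and 1.5
grad_W, grad_x = linear_backward_topk(np.ones(3), np.ones((5, 3)), grad_out, k=2)
```

Because only k rows of `grad_W` are non-zero, only the matching rows of `W` need to be updated in that step.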
The paper also lists related work. One interesting item: the sampled-output-loss methods (Jean et al., 2015, On using very large target vocabulary for neural machine translation) are limited to the softmax layer (output layer) and are only based on random sampling. In other words, they only address the softmax layer.
四、Data-Driven Sparse Structure Selection for Deep Neural Networks 2017
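1. A parameter is attached to certain structures (blocks, neurons, and so on) to control that structure's output.
2. Regularization is then applied to these structures.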
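The two steps above can be sketched as follows. The notes do not say which regularizer or optimizer is used, so this sketch assumes an L1 penalty on the control parameters, applied as a soft-thresholding step; `gated_output`, `soft_threshold`, and `step` are hypothetical names.

```python
import numpy as np

def gated_output(block_outputs, scales):
    """Step 1: multiply each structure's output by its control parameter."""
    return [s * out for s, out in zip(scales, block_outputs)]

def soft_threshold(scales, step):
    """Step 2 (assumed L1 penalty): shrink every parameter towards zero.

    Parameters whose magnitude is below `step` become exactly zero,
    so the structure they control contributes nothing and can be removed.
    """
    return np.sign(scales) * np.maximum(np.abs(scales) - step, 0.0)

scales = np.array([0.8, 0.03, -0.5, 0.01])
scales = soft_threshold(scales, step=0.05)
kept = np.flatnonzero(scales)                 # indices of structures that survive

outputs = gated_output([np.ones(2)] * 4, scales)
```

In this toy run the second and fourth structures end up with a zero parameter, so they could be dropped from the network without changing its output.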
五、Learning Efficient Convolutional Networks through Network Slimming 2017
六、SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size 2017
The Fire module is the building block from which SqueezeNet is assembled.
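A Fire module consists of a squeeze convolution layer (1x1 filters) followed by an expand layer (a mix of 1x1 and 3x3 filters). The filter counts of the three parts are tunable hyperparameters, not learned values: s1x1, e1x1, and e3x3.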
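A small parameter count shows why the squeeze layer helps. This is an illustrative calculation only; the channel numbers (96 input channels, s1x1 = 16, e1x1 = 64, e3x3 = 64) are example values chosen here, and the comparison layer is a hypothetical plain 3x3 convolution with the same number of output channels.

```python
def fire_param_count(in_channels, s1x1, e1x1, e3x3):
    """Weights plus biases of one Fire module."""
    squeeze = in_channels * s1x1 * 1 * 1 + s1x1   # 1x1 squeeze layer
    expand_1x1 = s1x1 * e1x1 * 1 * 1 + e1x1       # 1x1 part of the expand layer
    expand_3x3 = s1x1 * e3x3 * 3 * 3 + e3x3       # 3x3 part of the expand layer
    return squeeze + expand_1x1 + expand_3x3

def plain_3x3_param_count(in_channels, out_channels):
    """Weights plus biases of an ordinary 3x3 convolution."""
    return in_channels * out_channels * 3 * 3 + out_channels

fire = fire_param_count(96, 16, 64, 64)     # 11920
plain = plain_3x3_param_count(96, 128)      # 110720
```

The 3x3 filters in the expand layer only see the s1x1 squeezed channels instead of all input channels, which accounts for most of the reduction.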
Note: this network also has no fully connected layer. A global pooling layer replaces it, which the authors say follows NIN.
Finally, on top of this the authors also: added more complex skip connections (also called bypass connections); and applied Deep Compression (S. Han, H. Mao, and W. Dally. Deep compression: Compressing DNNs with pruning, trained quantization and huffman coding. arxiv:1510.00149v3, 2015a.) for further compression.
The authors' assessment: these results demonstrate that Deep Compression (Han et al., 2015a) not only works well on CNN architectures with many parameters (e.g. AlexNet and VGG), but it is also able to compress the already compact, fully convolutional SqueezeNet architecture.
七、SBNet: Sparse Blocks Network for Fast Inference
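On matrix sparsification, the authors' view is that such methods often show gains in terms of theoretical FLOPs without realizing a practical speed-up: the benefit exists on paper but is hard to obtain in practice.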
The usual way to express the sparsity: in these examples, spatial sparsity can be represented as binary computation masks. The mask is usually produced by a cheap small network or by part of the backbone network.
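Concretely, the positions of ** are first extracted as a mask, which becomes the reduce mask; the authors then copy out the corresponding parts, operate on them, and paste the results back.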
In fact, this is the standard way of handling sparse matrices, as in MXNet's RowSparseNDArray: http://mxnet.incubator.apache.org/tutorials/sparse/row_sparse.html.
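The copy-out, operate, paste-back procedure described above can be sketched in NumPy. This is a simplified illustration, not the SBNet kernels: tiles do not overlap, `fn` is an elementwise stand-in for the real convolution, and `sparse_block_apply` is a hypothetical name.

```python
import numpy as np

def sparse_block_apply(x, mask, block, fn):
    """Apply fn only to the block x block tiles that contain a non-zero mask entry.

    x    : (H, W) feature map, H and W assumed divisible by block
    mask : (H, W) binary computation mask
    fn   : elementwise function standing in for the real convolution
    Returns (output, number_of_active_tiles).
    """
    H, W = x.shape
    # Reduce the pixel mask to one flag per tile.
    tiles = mask.reshape(H // block, block, W // block, block)
    active = np.argwhere(tiles.any(axis=(1, 3)))
    out = np.zeros_like(x)
    for ti, tj in active:
        rows = slice(ti * block, (ti + 1) * block)
        cols = slice(tj * block, (tj + 1) * block)
        patch = x[rows, cols].copy()    # copy the active tile out
        out[rows, cols] = fn(patch)     # operate on it and paste it back
    return out, len(active)

x = np.arange(16.0).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[0, 1] = True                       # only the top-left 2x2 tile is active
out, n_active = sparse_block_apply(x, mask, block=2, fn=lambda p: p * 2)
```

Only one of the four tiles is computed here; the practical speed-up depends on how cheap the copy steps are relative to the skipped computation.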
In addition, it is worth noting that [J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger. Sparsity invariant CNNs. CoRR, abs/1708.06500, 2017.] showed that batch-normalizing across non-sparse elements contributes to better model accuracy since it ignores non-valid data that may introduce noise to the statistics. In other words, letting BN ignore invalid values, zeros included, can improve accuracy. This can be useful for engineering analysis.
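A minimal sketch of batch normalization that ignores invalid entries, assuming a boolean validity mask is available alongside the activations. The function name and the (N, C) layout are assumptions made for this illustration, and the learned scale and shift of a real BN layer are omitted.

```python
import numpy as np

def masked_batch_norm(x, valid, eps=1e-5):
    """Normalize each channel using the statistics of valid entries only.

    x     : (N, C) activations
    valid : (N, C) boolean, False marks invalid (for example masked-out zero) entries
    Returns (normalized activations, per-channel mean over valid entries).
    """
    count = np.maximum(valid.sum(axis=0), 1)                 # avoid division by zero
    mean = np.where(valid, x, 0.0).sum(axis=0) / count
    var = np.where(valid, (x - mean) ** 2, 0.0).sum(axis=0) / count
    out = (x - mean) / np.sqrt(var + eps)
    return np.where(valid, out, 0.0), mean                   # invalid entries stay zero

x = np.array([[2.0], [4.0], [0.0], [0.0]])
valid = np.array([[True], [True], [False], [False]])
out, mean = masked_batch_norm(x, valid)
naive_mean = x.mean(axis=0)     # 1.5, pulled down by the invalid zeros
```

The masked mean is 3.0 while the naive mean is 1.5, which shows how the invalid zeros distort the statistics when they are included.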
八、Attention-Based Guided Structured Sparsity of Deep Neural Networks
The current mainstream sparsification methods are: group lasso; constraining the structure scale; and regularizing multiple DNN structures.
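This paper uses group sparsity to reduce the number of effective parameters, and adds an extra loss function that forces some of the parameters not to be sparsified.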
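For reference, a group-sparsity penalty can be sketched as the sum of per-group L2 norms. The grouping used here, one group per output filter of a convolution, is an assumption for illustration; the extra attention-based loss mentioned above is not covered.

```python
import numpy as np

def group_lasso_penalty(weights):
    """Sum of L2 norms with one group per output filter.

    weights : (out_channels, in_channels, kH, kW)
    """
    flat = weights.reshape(weights.shape[0], -1)
    return np.sqrt((flat ** 2).sum(axis=1)).sum()

w = np.zeros((3, 2, 1, 1))
w[0] = [[[3.0]], [[4.0]]]           # group norm 5
w[2] = [[[0.0]], [[2.0]]]           # group norm 2
penalty = group_lasso_penalty(w)    # 5 + 0 + 2 = 7.0
```

Because the norm is taken per group rather than per weight, minimizing it tends to push whole filters to zero together, which is what makes the resulting sparsity structured.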
九、ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
https://arxiv.org/abs/1707.06342 ThiNet (stands for “Thin Net”)
Filter-level pruning; we need to prune filters based on statistics information computed from its next layer, not the current layer.
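One way to read "statistics computed from the next layer" is as a reconstruction problem: remove the channels whose combined contribution to sampled next-layer outputs is smallest. The greedy search below is a sketch under that assumption; the `contrib` matrix layout and the function name are hypothetical, and the details may differ from the paper's procedure.

```python
import numpy as np

def greedy_channels_to_remove(contrib, n_remove):
    """Greedily pick channels whose summed contribution is closest to zero.

    contrib  : (n_samples, n_channels); entry [i, c] is the contribution of
               input channel c to the i-th sampled next-layer output value
    n_remove : number of channels to remove
    """
    removed = []
    running = np.zeros(contrib.shape[0])    # summed contribution of removed channels
    for _ in range(n_remove):
        best, best_err = None, np.inf
        for c in range(contrib.shape[1]):
            if c in removed:
                continue
            err = ((running + contrib[:, c]) ** 2).sum()
            if err < best_err:
                best, best_err = c, err
        removed.append(best)
        running += contrib[:, best]
    return removed

contrib = np.array([[1.0, 0.1, -0.2, 2.0],
                    [1.0, -0.1, 0.2, 2.0]])
removed = greedy_channels_to_remove(contrib, n_remove=2)
```

In this toy example channels 1 and 2 are selected, because dropping them changes the next layer's output the least.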
The current mainstream sparsification approaches:
1) The method proposed by Song Han: weights below a threshold are set to zero. The resulting unstructured sparse matrices are hard for existing libraries to handle and need dedicated support.
2) Group-wise sparsification using group-sparsity regularization [V. Lebedev and V. Lempitsky. Fast convnets using group-wise brain damage. In CVPR, pages 2554–2564, 2016].
3) Sparsification at the level of modules and similar units (filter, channel, filter shape and depth structures): the Structured Sparsity Learning (SSL) method.
4) Measures of neuron importance:
a. the norm of the weights [H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient ConvNets. In ICLR, pages 1–13, 2017.];
b. the sparsity of the outputs after ReLU [H. Hu, R. Peng, Y. W. Tai, and C. K. Tang. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. In arXiv preprint arXiv:1607.03250, pages 1–9, 2016];
c. an approximation of the effect on the loss, for example via Taylor expansion [P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz. Pruning convolutional neural networks for resource efficient transfer learning. In ICLR, pages 1–17, 2017.].
Handling each filter as a whole therefore guarantees structured sparsity. The pipeline: evaluate the importance of each neuron, remove those unimportant ones, and fine-tune the whole network.
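The evaluate-and-remove steps can be sketched with the simplest criterion from the list above, the weight norm of each filter. This is an illustration, not ThiNet's own criterion; `prune_filters_by_l1` and `keep_ratio` are hypothetical names, and the fine-tuning step is omitted.

```python
import numpy as np

def prune_filters_by_l1(w_cur, w_next, keep_ratio):
    """Drop the lowest-L1-norm filters and the matching next-layer input channels.

    w_cur  : (out_c, in_c, kH, kW) weights of the layer being pruned
    w_next : (next_out_c, out_c, kH, kW) weights of the following layer
    Returns (pruned w_cur, pruned w_next, indices of kept filters).
    """
    scores = np.abs(w_cur).reshape(w_cur.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * w_cur.shape[0])))
    keep = np.sort(np.argsort(scores)[-n_keep:])      # highest scores, original order
    return w_cur[keep], w_next[:, keep], keep

w_cur = np.stack([np.full((2, 3, 3), v) for v in (0.5, 0.01, 1.0, 0.02)])
w_next = np.ones((6, 4, 3, 3))
p_cur, p_next, keep = prune_filters_by_l1(w_cur, w_next, keep_ratio=0.5)
```

Both weight tensors stay dense after pruning, only smaller, so ordinary convolution libraries can run the pruned network without special sparse support.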
十、Differentiable plasticity: training plastic neural networks with back propagation
This is a plastic network architecture, a small research area aimed at lifelong learning. It is implemented through synaptic plasticity: the strengthening and weakening of connections between neurons as a result of neural activity. The biological basis is Hebb’s rule: if a neuron repeatedly takes part in making another neuron fire, the connection between them is strengthened.
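As an aside, the authors say this approach memorizes better than LSTMs: plastic networks outperform advanced non-plastic recurrent networks (LSTMs) by orders of magnitude in complex pattern memorization. It is in fact a specific network architecture, not a strategy or structure for compressing networks. The main reason for reviewing it here is to understand one dimension of how back propagation is currently applied.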
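Hebb's rule as described above can be sketched for one recurrent layer. The specific form below (a fixed weight plus a scaled Hebbian trace, with the trace updated as a running average of pre- and post-activity products) is an assumption made for illustration; `alpha`, `eta`, and `plastic_step` are hypothetical names, and no training by back propagation is shown.

```python
import numpy as np

def plastic_step(x_prev, w, alpha, hebb, eta):
    """One step of a plastic recurrent layer.

    The effective weight is the fixed part w plus alpha times the Hebbian trace.
    The trace moves towards the outer product of pre- and post-activity, so
    connections between co-active neurons are strengthened.
    """
    x_new = np.tanh(x_prev @ (w + alpha * hebb))
    hebb_new = (1.0 - eta) * hebb + eta * np.outer(x_prev, x_new)
    return x_new, hebb_new

n = 3
w = np.eye(n)                     # fixed weights
alpha = np.ones((n, n))           # how plastic each connection is
hebb = np.zeros((n, n))           # Hebbian trace, starts empty
x = np.array([1.0, 0.0, 0.0])

x_new, hebb_new = plastic_step(x, w, alpha, hebb, eta=0.5)
```

After one step only the trace entry linking the active input neuron to the neuron it activated is non-zero, which matches the "fire together, strengthen the connection" reading of the rule.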