distillation:
papers:
- NIPS2014_Distilling the Knowledge in a Neural Network_Hinton
A large network teaches a small network. Take a 100-class classification task as an example: previously the label only said which class the input sample belongs to, whereas now the label is the output of the large network, so for one sample the label is a 100-dimensional vector whose entries give the probability of the sample belonging to each class (soft targets). For example, an already trained 3-class model (BMW, garbage truck, carrot) classifying a picture of a BMW might output 0.79, 0.19, 0.02. Note the last two classes: although both are wrong, 0.19 is far larger than 0.02. In short, an already trained model carries more information and more discriminative ability than a hard label; see the loss sketch after this list.
- FitNets: Hints for Thin Deep Nets
Uses the soft targets from [1]: a teacher network is used to train a deeper and thinner student network.
- Dropout Distillation
- Distilling Knowledge to Specialist Networks for Clustered Classification
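
A minimal sketch of the soft-target loss described in the note on the Hinton paper above, written in PyTorch purely for illustration; the function name, the temperature T, and the weight alpha are my own choices, not taken from any released implementation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation loss in the style of Hinton et al.

    student_logits, teacher_logits: (batch, num_classes) raw scores.
    labels: (batch,) hard class indices.
    T: temperature; a higher T spreads probability mass over the wrong classes.
    alpha: weight of the soft-target term vs. the hard-label term.
    """
    # Temperature-softened distributions; the T*T factor keeps gradient
    # magnitudes comparable across temperatures, as suggested in the paper.
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Ordinary cross-entropy on the ground-truth hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

The same loss is what FitNets and the dark-knowledge-transfer paper below reuse; FitNets additionally regresses intermediate "hint" features of the teacher, which is not shown here.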
websites:
- Paper notes: 《Distilling the Knowledge in a Neural Network》
- Paper notes: 《FitNets: Hints for Thin Deep Nets》
- http://sei.pku.edu.cn/~luyy11/slides/slides_141231_ft_distill-nips14.pdf
compression:
websites:
- Compressing and regularizing deep neural networks
papers:
- ICLR2016_Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (ICLR 2016 Best Paper)
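
As a rough illustration of the first stage of Deep Compression (magnitude pruning; the weight-sharing quantization and Huffman-coding stages are omitted), a toy PyTorch sketch; the helper name and the sparsity level are hypothetical:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9):
    """Zero out the smallest-magnitude weights of one layer.

    sparsity: fraction of weights to remove, e.g. 0.9 keeps the top 10%.
    Returns the pruned weights and the binary mask; during the retraining
    phase the mask must also be applied to the gradients so pruned
    connections stay at zero.
    """
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight, torch.ones_like(weight)
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask
```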
LSTM:
- NIPS2016_Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences
(Roughly, it adds a new time gate that controls how often each unit updates its state, which is presumably what makes training faster; see the time-gate sketch after this list.)
- RECURRENT NEURAL NETWORK TRAINING WITH DARK KNOWLEDGE TRANSFER
Uses the distillation idea: a CNN is used to train an LSTM.
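
A sketch of the Phased LSTM time gate as I understand it from the paper, for illustration only; parameter names (tau, s, r_on, alpha) follow the paper, but the function itself is hypothetical:

```python
import torch

def phased_lstm_time_gate(t, tau, s, r_on=0.05, alpha=0.001):
    """Time gate k_t of the Phased LSTM (Neil et al., NIPS 2016).

    t:     (batch,) timestamps of the current inputs.
    tau:   (hidden,) learned oscillation periods.
    s:     (hidden,) learned phase shifts.
    r_on:  fraction of the period during which the gate is open.
    alpha: leak rate outside the open phase (set to 0 at test time).

    Returns k with shape (batch, hidden). The cell state is then updated as
    c_t = k * c_proposed + (1 - k) * c_prev (and similarly for h_t), so
    units whose gate is closed simply keep their previous state, which is
    what makes updates sparse in time and speeds up training.
    """
    # Phase of each unit within its cycle, in [0, 1).
    phi = torch.remainder(t.unsqueeze(1) - s, tau) / tau
    rising = 2.0 * phi / r_on          # first half of the open phase: 0 -> 1
    falling = 2.0 - 2.0 * phi / r_on   # second half of the open phase: 1 -> 0
    leak = alpha * phi                 # closed phase: small leak
    k = torch.where(phi < 0.5 * r_on, rising,
                    torch.where(phi < r_on, falling, leak))
    return k
```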