WHAT I READ FOR DEEP-LEARNING

Today, I spent some time on two new papers proposing a new way of training very deep neural networks (Highway-Networks) and a new activation function for Auto-Encoders (ZERO-BIAS AUTOENCODERS AND THE BENEFITS OF
CO-ADAPTING FEATURES) which evades the use of any regularization methods such as Contraction or Denoising.

Lets start with the first one. Highway-Networks proposes a new activation type similar to LTSM networks and they claim that this peculiar activation is robust to any choice of initialization scheme and learning problems occurred at very deep NNs. It is also incentive to see that they trained models with >100 number of layers. The basic intuition here is to learn a gating function attached to a real activation function that decides to pass the activation or the input itself. Here is the formulation

T(x,Wt) is the gating function and H(x,WH) is the real activation. They use Sigmoid activation for gating and Rectifier for the normal activation in the paper. I also implemented it with Lasagne and tried to replicate the results (I aim to release the code later). It is really impressive to see its ability to learn for 50 layers (this is the most I can for my PC).

The other paper ZERO-BIAS AUTOENCODERS AND THE BENEFITS OF CO-ADAPTING FEATURES suggests the use of non-biased rectifier units for the inference of AEs. You can train your model with a biased Rectifier Unit but at the inference time (test time), you should extract features by ignoring bias term. They show that doing so gives better recognition at CIFAR dataset. They also device a new activation function which has the similar intuition to Highway Networks. Again, there is a gating unit which thresholds the normal activation function.

WHAT I READ FOR DEEP-LEARNING

The first equation is the threshold function with a predefined threshold (they use 1 for their experiments). The second equation shows the reconstruction of the proposed model. Pay attention that, in this equation they use square of a linear activation for thresholding and they call this model TLin but they also use normal linear function which is called TRec. What this activation does here is to diminish the small activations so that the model is implicitly regularized without any additional regularizer. This is actually good for learning over-complete representation for the given data.

For more than this silly into, please refer to papers WHAT I READ FOR DEEP-LEARNING and warn me for any mistake.

These two papers shows a new coming trend to Deep Learning community which is using complex activation functions . We can call it controlling each unit behavior in a smart way instead of letting them fire naively. My notion also agrees with this idea. I believe even more complication we need for smart units in our deep models like Spike and Slap networks.

Deep learning：五十一(CNN的反向求导及练习)
前言: CNN作为DL中最成功的模型之一,有必要对其更进一步研究它.虽然在前面的博文Stacked CNN简单介绍中有大概介绍过CNN的使用,不过那是有个前提的:CNN中的参数必须已提前学习好.而本文 ...
【深度学习Deep Learning】资料大全
最近在学深度学习相关的东西,在网上搜集到了一些不错的资料,现在汇总一下: Free Online Books by Yoshua Bengio, Ian Goodfellow and Aaron C ...
《Neural Network and Deep Learning》&lowbar;chapter4
<Neural Network and Deep Learning>_chapter4: A visual proof that neural nets can compute any f ...
Deep Learning模型之：CNN卷积神经网络（一）深度解析CNN
http://m.blog.csdn.net/blog/wu010555688/24487301 本文整理了网上几位大牛的博客,详细地讲解了CNN的基础结构与核心思想,欢迎交流. [1]Deep le ...
paper 124：【转载】无监督特征学习——Unsupervised feature learning and deep learning
来源:http://blog.csdn.net/abcjennifer/article/details/7804962 无监督学习近年来很热,先后应用于computer vision, audio c ...
Deep Learning 26：读论文&OpenCurlyDoubleQuote;Maxout Networks”——ICML 2013
论文Maxout Networks实际上非常简单,只是发现一种新的激活函数(叫maxout)而已,跟relu有点类似,relu使用的max(x,0)是对每个通道的特征图的每一个单元执行的与0比较最大化 ...
Deep Learning 23：dropout理解&lowbar;之读论文&OpenCurlyDoubleQuote;Improving neural networks by preventing co-adaptation of feature detectors”
理论知识:Deep learning:四十一(Dropout简单理解).深度学习(二十二)Dropout浅层理解与实现.“Improving neural networks by preventing ...
Deep Learning 19&lowbar;深度学习UFLDL教程：Convolutional Neural Network&lowbar;Exercise（斯坦福大学深度学习教程）
理论知识:Optimization: Stochastic Gradient Descent和Convolutional Neural Network CNN卷积神经网络推导和实现.Deep lear ...
0&period;读书笔记之The major advancements in Deep Learning in 2016
The major advancements in Deep Learning in 2016 地址:https://tryolabs.com/blog/2016/12/06/major-advanc ...
&num;Deep Learning回顾&num;之LeNet、AlexNet、GoogLeNet、VGG、ResNet
CNN的发展史上一篇回顾讲的是2006年Hinton他们的Science Paper,当时提到,2006年虽然Deep Learning的概念被提出来了,但是学术界的大家还是表示不服.当时有流传的段 ...

随机推荐

C&num; SqlBulkCopy
public void Insert_Table(System.Data.DataTable dataTable) { try { ...
传输层（2）-TCP连接的建立和终止、TIME&lowbar;WAIT状态
1.TCP连接的建立和终止 1)三路握手客户端发送一个SYN(同步)分解,告诉服务器客户将在连接中发送的数据的初始序列号. 服务器发送确认客户的SYN(ACK),同时自己也得发送一个SYN分节,它含 ...
移动端HTML5资源整理
目录 meta基础知识 H5页面窗口自动调整到设备宽度,并禁止用户缩放页面忽略将页面中的数字识别为电话号码忽略Android平台中对邮箱地址的识别当网站添加到主屏幕快速启动方式,可隐藏地址栏,仅 ...
ubuntu创建、删除文件及文件夹方法
mkdir 目录名 => 创建一个目录 rmdir 空目录名 => 删除一个空目录 rm 文件名文件名 => 删除一个文件或多个文件 rm –rf 非 ...
virtualbox中centos系统配置nat+host only上网(zhuan)
http://www.cnblogs.com/leezhxing/p/4482659.html **************************************************** ...
2014图灵技术图书最受欢迎TOP15
来自:图灵社区昨晚给我发的邮件,感觉不错,和大家分享,mark下. [小编语] 回首2014,感谢小伙伴们一路相随.让我们2015一起更快乐地玩耍.今天小编为大家盘点一下过去2014年表现最给力的技术 ...
高性能JSON工具-FastJson处理超大JSON文本
使用阿里开源类库FastJson,当需要处理超大JSON文本时,需要Stream API,在fastjson-1.1.32版本中开始提供Stream API.文档参考GitHub:https://gi ...
Unicode字段也有collation
原文:Unicode字段也有collation 转自:http://blogs.msdn.com/b/apgcdsd/archive/2011/01/11/unicode-collation.aspx ...
js json处理双引号
在数据传输流程中,json是以文本,即字符串的形式传递的,而JS操作的是JSON对象 JSON字符串: var str1 = '{ "name": "cxh", ...
Unity3D 屏幕空间雪场景Shader渲染
笔者介绍:姜雪伟,IT公司技术合伙人,IT高级讲师,CSDN社区专家,特邀编辑,畅销书作者,已出版书籍:<手把手教你架构3D游戏引擎>电子工业出版社和<Unity3D实战核心技术详解 ...

秒客网

WHAT I READ FOR DEEP-LEARNING

WHAT I READ FOR DEEP-LEARNING

WHAT I READ FOR DEEP-LEARNING的更多相关文章

随机推荐

相关文章