Learning Recurrent Neural Networks with Hessian-Free Optimization
James Martens (JMARTENS@CS.TORONTO.EDU) and Ilya Sutskever (ILYA@CS.UTORONTO.CA), University of Toronto, Canada

Abstract: In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems: first, a collection of pathological synthetic datasets which are known to be impossible for standard optimization approaches (due to their extremely long-term dependencies), and second, three natural and highly complex real-world sequence datasets, where we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-Term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002), which is used within the HF approach of Martens.
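For readers unfamiliar with the curvature matrix the abstract refers to, the generalized Gauss-Newton (GGN) matrix of Schraudolph (2002) can be sketched as follows; the notation ($\theta$, $f$, $L$, $z$) is ours for illustration, not taken from the paper itself.

```latex
% For a model with parameters \theta, outputs z = f(x, \theta),
% and a loss L(z, y) that is convex in z, the GGN matrix is the
% Hessian with the second-derivative terms of f dropped:
G(\theta) = J_f^{\top} H_L \, J_f,
\qquad
J_f = \frac{\partial f}{\partial \theta},
\qquad
H_L = \frac{\partial^2 L}{\partial z^2}.
```

Because $H_L$ is positive semidefinite whenever $L$ is convex in $z$, the product $G$ is also positive semidefinite, which makes it a well-behaved substitute for the (generally indefinite) Hessian inside the quadratic models that Hessian-free optimization minimizes with conjugate gradients.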