[File details]
File name: Training Neural Networks without Gradients
File size: 344 KB
File format: PDF
Updated: 2021-08-12 18:56:55
Tags: deep learning, gradient
With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that do not scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch methods, suffers from common problems such as saturation effects, poor conditioning, and saddle points. This paper explores an unconventional training method that uses alternating direction methods and Bregman iteration to train networks without gradient descent steps. The method reduces the network training problem to a sequence of minimization substeps, each of which can be solved globally in closed form. This approach is advantageous because it avoids many of the caveats that make gradient methods slow on highly non-convex problems, and it exhibits strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.
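
To make the idea concrete, below is a minimal, illustrative Python/NumPy sketch of this style of gradient-free training for a one-hidden-layer ReLU network with squared loss. The training problem is split by introducing auxiliary pre-activation and activation variables, coupled through quadratic penalties and a Lagrange-multiplier (Bregman) term on the output constraint, so that each substep is a least-squares or entrywise-separable problem with a closed-form solution. The variable names, penalty weights, update order, and synthetic data are our own assumptions for illustration; the paper's exact updates and schedule differ in detail.

# Sketch of ADMM-style, gradient-free training in the spirit of the paper:
# one hidden ReLU layer, squared loss, all substeps solved in closed form.
import numpy as np

rng = np.random.default_rng(0)
n, d, h, m = 200, 10, 32, 1               # samples, input dim, hidden dim, output dim
X = rng.standard_normal((d, n))           # columns are training examples (synthetic)
Y = np.sin(X.sum(axis=0, keepdims=True))  # synthetic regression targets

beta, gamma = 1.0, 10.0                   # illustrative penalty weights
relu = lambda t: np.maximum(t, 0.0)

# Auxiliary variables: z1 = W1 X (pre-activation), a1 = relu(z1), z2 = W2 a1.
W1 = rng.standard_normal((h, d)) * 0.1
z1 = W1 @ X
a1 = relu(z1)
W2 = rng.standard_normal((m, h)) * 0.1
z2 = W2 @ a1
lam = np.zeros_like(z2)                   # multiplier (Bregman) on the output constraint

for it in range(50):
    # Weight updates: plain least-squares problems, solved exactly.
    W1 = np.linalg.lstsq(X.T, z1.T, rcond=None)[0].T    # argmin_W ||W X - z1||^2
    W2 = np.linalg.lstsq(a1.T, z2.T, rcond=None)[0].T   # argmin_W ||W a1 - z2||^2

    # Activation update: ridge-type linear system, also closed form.
    # argmin_a  beta*||a - relu(z1)||^2 + gamma*||z2 - W2 a||^2
    A = beta * np.eye(h) + gamma * W2.T @ W2
    a1 = np.linalg.solve(A, beta * relu(z1) + gamma * W2.T @ z2)

    # z1 update: separable per entry; compare the two ReLU branches.
    w = W1 @ X
    z_pos = np.maximum(0.0, (gamma * w + beta * a1) / (gamma + beta))  # z >= 0 branch
    z_neg = np.minimum(0.0, w)                                         # z <  0 branch
    f = lambda z: gamma * (z - w) ** 2 + beta * (a1 - relu(z)) ** 2
    z1 = np.where(f(z_pos) <= f(z_neg), z_pos, z_neg)

    # Output update: squared loss + multiplier + coupling penalty.
    # argmin_z ||z - Y||^2 + <lam, z> + gamma*||z - W2 a1||^2
    z2 = (2.0 * Y - lam + 2.0 * gamma * (W2 @ a1)) / (2.0 + 2.0 * gamma)

    # Bregman / dual-ascent step on the output constraint.
    lam = lam + gamma * (z2 - W2 @ a1)

pred = W2 @ relu(W1 @ X)
print("train MSE:", float(np.mean((pred - Y) ** 2)))

Note that no gradients or learning rates appear anywhere: each block of variables is minimized exactly while the others are held fixed, which is also what makes the substeps easy to distribute across cores.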