On the momentum term in gradient descent learning algorithms
@article{Qian1999OnTM,
  title={On the momentum term in gradient descent learning algorithms},
  author={Ning Qian},
  journal={Neural Networks},
  year={1999},
  volume={12},
  number={1},
  pages={145--151},
  url={https://api.semanticscholar.org/CorpusID:2783597}
}
2,183 Citations
On the influence of momentum acceleration on online learning
- 2016
Computer Science, Mathematics
The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value, and suggest a method to enhance performance in the stochastic setting by tuning the momentum parameter over time.
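A minimal sketch of the equivalence claimed above, not the cited paper's analysis: heavy-ball momentum on a toy 1-D quadratic behaves much like plain gradient descent run at the larger "effective" step size eta / (1 - beta). All names and the toy objective here are illustrative assumptions.

```python
import numpy as np

# Toy illustration: heavy-ball momentum on f(x) = 0.5 * a * x**2 versus
# plain gradient descent using the rescaled step size eta / (1 - beta).
a, eta, beta, steps = 1.0, 0.01, 0.9, 500

def grad(x):
    return a * x  # gradient of 0.5 * a * x**2

x_mom, v = 5.0, 0.0
x_gd = 5.0
eta_eff = eta / (1.0 - beta)  # effective (rescaled) step size

for _ in range(steps):
    v = beta * v - eta * grad(x_mom)   # heavy-ball velocity update
    x_mom += v
    x_gd -= eta_eff * grad(x_gd)       # plain GD with the larger step size

print(f"momentum:    x = {x_mom:.6f}")
print(f"rescaled GD: x = {x_gd:.6f}")
```

Both iterates shrink toward the minimizer at a comparable pace, which is the intuition behind reading fixed momentum as a step-size rescaling.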
Continuous Time Analysis of Momentum Methods
- 2021
Computer Science, Mathematics
This work focuses on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm, and proves three continuous time approximations of the discrete algorithms.
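For orientation, the continuous-time object usually associated with fixed heavy-ball momentum is a damped second-order dynamics; the notation below is illustrative and not taken from the cited paper.

```latex
% Heavy-ball iteration: x_{k+1} = x_k + \beta (x_k - x_{k-1}) - \eta \nabla f(x_k)
% One standard continuous-time limit is the damped Newtonian dynamics
\[
  m\,\ddot{x}(t) + \gamma\,\dot{x}(t) = -\nabla f\bigl(x(t)\bigr),
\]
% whose overdamped limit recovers the gradient flow \dot{x}(t) = -\nabla f(x(t)).
```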
Exponential convergence rates for momentum stochastic gradient descent in the overparametrized setting
- 2023
Computer Science, Mathematics
We prove explicit bounds on the exponential rate of convergence for the momentum stochastic gradient descent scheme (MSGD) for arbitrary, fixed hyperparameters (learning rate, friction parameter) and…
A Global Minimization Algorithm Based on a Geodesic of a Lagrangian Formulation of Newtonian Dynamics
- 2007
Computer Science, Physics
A novel adaptive steepest descent is obtained, and applying its first-order update rule to Rosenbrock- and Griewank-type potentials determines the global minimum in most cases from various initial points.
Steepest descent with momentum for quadratic functions is a version of the conjugate gradient method
- 2004
Computer Science, Mathematics
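A minimal sketch of the equivalence stated in the title above, under the assumption of a strictly convex quadratic objective: each conjugate gradient step can be rearranged into a steepest-descent-with-momentum form with adaptively chosen coefficients. The matrix, vector, and iteration count below are illustrative.

```python
import numpy as np

# Conjugate gradient on f(x) = 0.5 x^T A x - b^T x. Each step can be rewritten as
#   x_{k+1} = x_k - alpha_k * grad_k + beta_k * (x_k - x_{k-1}),
# i.e. gradient descent with a step-dependent momentum term.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)    # symmetric positive definite
b = rng.standard_normal(5)

x = np.zeros(5)
r = b - A @ x                  # residual = negative gradient of f
p = r.copy()
for _ in range(5):             # CG reaches the exact minimizer in <= n steps
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p       # new direction mixes gradient and old direction
    r = r_new

print("residual norm after n steps:", np.linalg.norm(b - A @ x))
```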
Analysis Of Momentum Methods
- 2019
Computer Science, Mathematics
This work shows that, contrary to popular belief, standard implementations of fixed momentum methods do no more than rescale the learning rate, and, using the method of modified equations from numerical analysis, that the momentum method converges to a gradient flow with a momentum-dependent time rescaling.
Convergence of batch gradient learning with smoothing regularization and adaptive momentum for neural networks
- 2016
Computer Science, Mathematics
Compared with existing algorithms, the novel algorithm yields a sparser network structure: it forces weights to become smaller during training so that they can eventually be removed afterwards, which simplifies the network structure and reduces operation time.
Convergence of Momentum-Based Stochastic Gradient Descent
- 2020
Computer Science, Mathematics
It is proved that the mSGD algorithm is almost surely convergent along each trajectory, and the convergence rate of mSGD is analyzed.
Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problem
- 2021
Computer Science, Mathematics
This work derives a closed set of equations that describe the behaviours of several algorithms including heavy-ball momentum and Nesterov acceleration in a prototypical non-convex model: the (spiked) matrix-tensor model.
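The two update rules analyzed in the entry above differ only in where the gradient is evaluated. The sketch below shows the generic forms; the gradient oracle, step size, and momentum value are placeholder assumptions, not the cited paper's matrix-tensor setting.

```python
import numpy as np

def grad_f(x):
    return x  # placeholder gradient of 0.5 * ||x||^2

eta, beta = 0.1, 0.9
x_hb, v_hb = np.ones(3), np.zeros(3)
x_nag, v_nag = np.ones(3), np.zeros(3)

for _ in range(100):
    # Heavy-ball: gradient evaluated at the current iterate
    v_hb = beta * v_hb - eta * grad_f(x_hb)
    x_hb = x_hb + v_hb
    # Nesterov: gradient evaluated at the look-ahead point x + beta * v
    v_nag = beta * v_nag - eta * grad_f(x_nag + beta * v_nag)
    x_nag = x_nag + v_nag

print(np.linalg.norm(x_hb), np.linalg.norm(x_nag))
```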
Momentum accelerates evolutionary dynamics
- 2024
Computer Science, Mathematics
Using information divergences as Lyapunov functions, it is shown that momentum accelerates the convergence of evolutionary dynamics, including the continuous and discrete replicator equations and Euclidean gradient descent on populations.
10 References
Increased rates of convergence through learning rate adaptation
- 1988
Computer Science
Learning internal representations
- 1995
Computer Science
It is proved that the number of examples required to ensure good generalisation from a representation learner obeys an explicit bound, that gradient descent can be used to train neural network representations, and experimental results are reported providing strong qualitative support for the theoretical results.
Learning to Solve Random-Dot Stereograms of Dense and Transparent Surfaces with Recurrent Backpropagation
- 1989
Computer Science
The recurrent backpropagation learning algorithm of Pineda (1987) is used to construct network models with lateral and feedback connections that can solve the correspondence problem for random-dot stereograms.
Optimal Brain Damage
- 1989
Computer Science
A class of practical and nearly optimal schemes for adapting the size of a neural network by using second-derivative information to make a tradeoff between network complexity and training set error is derived.
The computational brain
- 1992
Computer Science
The Computational Brain addresses the foundational ideas of the emerging field of computational neuroscience, examines a diverse range of neural network models, and considers future directions of the field.
Predicting the secondary structure of globular proteins using neural network models.
- 1988
Computer Science, Biology
Learning internal representations by error propagation
- 1986
Computer Science, Mathematics