On the momentum term in gradient descent learning algorithms
@article{Qian1999OnTM,
  title={On the momentum term in gradient descent learning algorithms},
  author={Ning Qian},
  journal={Neural Networks},
  year={1999},
  volume={12},
  number={1},
  pages={145--151},
  url={https://api.semanticscholar.org/CorpusID:2783597}
}
2,183 Citations
On the influence of momentum acceleration on online learning
- 2016
Computer Science, Mathematics
The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value, and suggest a method to enhance performance in the stochastic setting by tuning the momentum parameter over time.
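A minimal sketch of the equivalence claimed above, not the cited paper's analysis: heavy-ball momentum on a toy 1-D quadratic behaves much like plain gradient descent run at the larger "effective" step size eta / (1 - beta). All names and the toy objective here are illustrative assumptions.

```python
import numpy as np

# Toy illustration: heavy-ball momentum on f(x) = 0.5 * a * x**2 versus
# plain gradient descent using the rescaled step size eta / (1 - beta).
a, eta, beta, steps = 1.0, 0.01, 0.9, 500

def grad(x):
    return a * x  # gradient of 0.5 * a * x**2

x_mom, v = 5.0, 0.0
x_gd = 5.0
eta_eff = eta / (1.0 - beta)  # effective (rescaled) step size

for _ in range(steps):
    v = beta * v - eta * grad(x_mom)   # heavy-ball velocity update
    x_mom += v
    x_gd -= eta_eff * grad(x_gd)       # plain GD with the larger step size

print(f"momentum:    x = {x_mom:.6f}")
print(f"rescaled GD: x = {x_gd:.6f}")
```

Both iterates shrink toward the minimizer at a comparable pace, which is the intuition behind reading fixed momentum as a step-size rescaling.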
Continuous Time Analysis of Momentum Methods
- 2021
Computer Science, Mathematics
This work focuses on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm, and proves three continuous time approximations of the discrete algorithms.
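For orientation, the continuous-time object usually associated with fixed heavy-ball momentum is a damped second-order dynamics; the notation below is illustrative and not taken from the cited paper.

```latex
% Heavy-ball iteration: x_{k+1} = x_k + \beta (x_k - x_{k-1}) - \eta \nabla f(x_k)
% One standard continuous-time limit is the damped Newtonian dynamics
\[
  m\,\ddot{x}(t) + \gamma\,\dot{x}(t) = -\nabla f\bigl(x(t)\bigr),
\]
% whose overdamped limit recovers the gradient flow \dot{x}(t) = -\nabla f(x(t)).
```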
Exponential convergence rates for momentum stochastic gradient descent in the overparametrized setting
- 2023
Computer Science, Mathematics
We prove explicit bounds on the exponential rate of convergence for the momentum stochastic gradient descent scheme (MSGD) for arbitrary, fixed hyperparameters (learning rate, friction parameter) and…
A Global Minimization Algorithm Based on a Geodesic of a Lagrangian Formulation of Newtonian Dynamics
- 2007
Computer Science, Physics
A novel adaptive steepest descent is obtained, and applying its first-order update rule to Rosenbrock- and Griewank-type potentials determines the global minimum in most cases from various initial points.
Steepest descent with momentum for quadratic functions is a version of the conjugate gradient method
- 2004
Computer Science, Mathematics
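A minimal sketch of the equivalence stated in the title above, under the assumption of a strictly convex quadratic objective: each conjugate gradient step can be rearranged into a steepest-descent-with-momentum form with adaptively chosen coefficients. The matrix, vector, and iteration count below are illustrative.

```python
import numpy as np

# Conjugate gradient on f(x) = 0.5 x^T A x - b^T x. Each step can be rewritten as
#   x_{k+1} = x_k - alpha_k * grad_k + beta_k * (x_k - x_{k-1}),
# i.e. gradient descent with a step-dependent momentum term.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)    # symmetric positive definite
b = rng.standard_normal(5)

x = np.zeros(5)
r = b - A @ x                  # residual = negative gradient of f
p = r.copy()
for _ in range(5):             # CG reaches the exact minimizer in <= n steps
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p       # new direction mixes gradient and old direction
    r = r_new

print("residual norm after n steps:", np.linalg.norm(b - A @ x))
```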
Analysis Of Momentum Methods
- 2019
Computer Science, Mathematics
This work shows that, contrary to popular belief, standard implementations of fixed momentum methods do no more than rescale the learning rate, and, using the method of modified equations from numerical analysis, that the momentum method converges to a gradient flow with a momentum-dependent time rescaling.
Convergence of batch gradient learning with smoothing regularization and adaptive momentum for neural networks
- 2016
Computer Science, Mathematics
Compared with existing algorithms, the novel algorithm yields a sparser network structure: it forces weights to become smaller during training so that they can eventually be removed afterwards, which simplifies the network structure and reduces operation time.
Convergence of Momentum-Based Stochastic Gradient Descent
- 2020
Computer Science, Mathematics
It is proved that the mSGD algorithm is almost surely convergent along each trajectory, and the convergence rate of mSGD is analyzed.
Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problem
- 2021
Computer Science, Mathematics
This work derives a closed set of equations that describe the behaviours of several algorithms including heavy-ball momentum and Nesterov acceleration in a prototypical non-convex model: the (spiked) matrix-tensor model.
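The two update rules analyzed in the entry above differ only in where the gradient is evaluated. The sketch below shows the generic forms; the gradient oracle, step size, and momentum value are placeholder assumptions, not the cited paper's matrix-tensor setting.

```python
import numpy as np

def grad_f(x):
    return x  # placeholder gradient of 0.5 * ||x||^2

eta, beta = 0.1, 0.9
x_hb, v_hb = np.ones(3), np.zeros(3)
x_nag, v_nag = np.ones(3), np.zeros(3)

for _ in range(100):
    # Heavy-ball: gradient evaluated at the current iterate
    v_hb = beta * v_hb - eta * grad_f(x_hb)
    x_hb = x_hb + v_hb
    # Nesterov: gradient evaluated at the look-ahead point x + beta * v
    v_nag = beta * v_nag - eta * grad_f(x_nag + beta * v_nag)
    x_nag = x_nag + v_nag

print(np.linalg.norm(x_hb), np.linalg.norm(x_nag))
```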
Momentum accelerates evolutionary dynamics
- 2024
Computer Science, Mathematics
Using information divergences as Lyapunov functions, it is shown that momentum accelerates the convergence of evolutionary dynamics, including the continuous and discrete replicator equations and Euclidean gradient descent on populations.
10 References
Increased rates of convergence through learning rate adaptation
- 1988
Computer Science
Learning internal representations
- 1995
Computer Science
It is proved that the number of examples required to ensure good generalisation from a representation learner obeys an explicit bound, that gradient descent can be used to train neural network representations, and experimental results are reported providing strong qualitative support for the theoretical results.
Learning to Solve Random-Dot Stereograms of Dense and Transparent Surfaces with Recurrent Backpropagation
- 1989
Computer Science
The recurrent backpropagation learning algorithm of Pineda (1987) is used to construct network models with lateral and feedback connections that can solve the correspondence problem for random-dot stereograms.
Optimal Brain Damage
- 1989
Computer Science
A class of practical and nearly optimal schemes for adapting the size of a neural network by using second-derivative information to make a tradeoff between network complexity and training set error is derived.
The computational brain
- 1992
Computer Science
The Computational Brain addresses the foundational ideas of the emerging field of computational neuroscience, examines a diverse range of neural network models, and considers future directions of the field.
Predicting the secondary structure of globular proteins using neural network models.
- 1988
Computer Science, Biology
Learning internal representations by error propagation
- 1986
Computer Science, Mathematics