A sophisticated [gradient descent](/pages/Gradient%20descent.txt) algorithm that rescales the gradients of each [parameter](/pages/Parameter%20(machine%20learning).txt), effectively giving each parameter an independent [learning rate](/pages/Learning%20rate.txt). For a full explanation, see [the AdaGrad paper](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf).[^1]

## Footnotes

[^1]: https://developers.google.com/machine-learning/glossary#adagrad
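The per-parameter rescaling can be sketched as follows: each parameter accumulates the sum of its own squared gradients, and its effective step size shrinks as that sum grows. This is a minimal illustrative sketch, not the paper's exact formulation; the function name, the learning rate of 0.5, and the toy objective are assumptions for the example.

```python
import numpy as np

def adagrad_update(params, grads, accum, lr=0.01, eps=1e-8):
    """One AdaGrad step (illustrative): each parameter's step size
    shrinks with the running sum of its own squared gradients,
    giving each parameter an independent effective learning rate."""
    accum += grads ** 2                              # per-parameter gradient history
    params -= lr * grads / (np.sqrt(accum) + eps)    # eps avoids division by zero
    return params, accum

# Toy usage: minimize f(w) = w0^2 + 10 * w1^2 (assumed example objective)
w = np.array([1.0, 1.0])
acc = np.zeros_like(w)
for _ in range(500):
    g = np.array([2.0 * w[0], 20.0 * w[1]])          # gradient of f at w
    w, acc = adagrad_update(w, g, acc, lr=0.5)
```

Note how the coordinate with the larger gradients (`w1`) accumulates history faster and so is rescaled more aggressively, which is the behavior the definition above describes.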