Contents
Gradient Descent with Adaptive Learning Rate
Adaptive Gradient
Root Mean Squared Propagation
AdaDelta
Adam
MaxProp and AdaMax