>* [Compare Stochastic learning strategies for MLPClassifier](https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_training_curves.html#sphx-glr-auto-examples-neural-networks-plot-mlp-training-curves-py)
>* [Visualization of MLP weights on MNIST](https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mnist_filters.html#sphx-glr-auto-examples-neural-networks-plot-mnist-filters-py)
>* [“Learning representations by back-propagating errors.”](http://www.iro.umontreal.ca/~pift6266/A06/refs/backprop_old.pdf) Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams.
>* [“Stochastic Gradient Descent”](http://leon.bottou.org/projects/sgd) L. Bottou - Website, 2010.
>* [“Backpropagation”](http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm) Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, Caroline Suen - Website, 2011.
>* [“Efficient BackProp”](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) Y. LeCun, L. Bottou, G. Orr, K. Müller - In Neural Networks: Tricks of the Trade 1998.
>* [“Adam: A method for stochastic optimization.”](http://arxiv.org/pdf/1412.6980v8.pdf) Kingma, Diederik, and Jimmy Ba. arXiv preprint arXiv:1412.6980 (2014).