Commit d5a50245 authored by HongyingG, committed by Hao Wang

optimizer_en (#602)

* optimizer_en

* Review

* file path
Parent 33d8ca10
.. _api_guide_optimizer_en:
###########
Optimizer
###########
Training a neural network is in essence an `optimization problem <https://en.wikipedia.org/wiki/Optimization_problem>`_ .
Through `forward computation and back propagation <https://zh.wikipedia.org/zh-hans/backpropagation_algorithm>`_ ,
an :code:`Optimizer` uses the back-propagated gradients to optimize the parameters of a neural network.
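
The sketch below illustrates the common workflow that the rest of this guide builds on: define a network and a loss, then let an optimizer append the backward pass and the parameter updates through :code:`minimize` . It is a minimal illustration assuming the Fluid static-graph layers and executor API; the network and all hyper-parameter values are placeholders, not recommendations.

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    # A trivial linear-regression network whose parameters the optimizer
    # will update from the back-propagated gradients.
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    y_pred = fluid.layers.fc(input=x, size=1)
    loss = fluid.layers.mean(fluid.layers.square_error_cost(input=y_pred, label=y))

    # Any optimizer in this guide can be plugged in here; minimize()
    # appends the backward pass and the parameter-update ops.
    optimizer = fluid.optimizer.SGD(learning_rate=0.01)
    optimizer.minimize(loss)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())
    feed = {'x': np.random.rand(8, 13).astype('float32'),
            'y': np.random.rand(8, 1).astype('float32')}
    loss_val = exe.run(feed=feed, fetch_list=[loss])[0]
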
1. SGD/SGDOptimizer
----------------------

:code:`SGD` is a subclass of :code:`Optimizer` that implements `Stochastic Gradient Descent <https://arxiv.org/pdf/1609.04747.pdf>`_ , a variant of `Gradient Descent <https://zh.wikipedia.org/zh-hans/gradient_descent_algorithm>`_ .
When training on a large number of samples, :code:`SGD` is usually the optimizer of choice because it makes the loss function converge more quickly.

API Reference: :ref:`api_fluid_optimizer_SGDOptimizer`
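
For illustration, a minimal construction sketch (the learning rate is a placeholder value):

.. code-block:: python

    import paddle.fluid as fluid

    # Plain stochastic gradient descent; learning_rate is the only
    # required hyper-parameter.
    sgd = fluid.optimizer.SGD(learning_rate=0.01)
    # sgd.minimize(loss) attaches it to a network as in the introductory sketch.
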
2. Momentum/MomentumOptimizer
--------------------------------

The :code:`Momentum` optimizer adds momentum on top of :code:`SGD` , reducing the noise encountered during stochastic gradient descent.
You can set :code:`use_nesterov` to False or True, corresponding respectively to the classical `Momentum (Section 4.1 of the paper) <https://arxiv.org/pdf/1609.04747.pdf>`_ algorithm and the `Nesterov accelerated gradient (Section 4.2 of the paper) <https://arxiv.org/pdf/1609.04747.pdf>`_ algorithm.

API Reference: :ref:`api_fluid_optimizer_MomentumOptimizer`
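
For illustration, a construction sketch with placeholder hyper-parameters:

.. code-block:: python

    import paddle.fluid as fluid

    # Classical heavy-ball momentum; set use_nesterov=True for the
    # Nesterov accelerated gradient variant instead.
    momentum = fluid.optimizer.Momentum(learning_rate=0.01,
                                        momentum=0.9,
                                        use_nesterov=False)
    # momentum.minimize(loss) as in the introductory sketch.
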
3. Adagrad/AdagradOptimizer
---------------------------
The `Adagrad <http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf>`_ optimizer adaptively assigns a different learning rate to each parameter, addressing the problem that different parameters receive updates from uneven numbers of samples.

API Reference: :ref:`api_fluid_optimizer_AdagradOptimizer`
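
A minimal construction sketch (the learning rate is a placeholder):

.. code-block:: python

    import paddle.fluid as fluid

    # Adagrad keeps a per-parameter accumulation of squared gradients,
    # so frequently updated parameters take smaller effective steps.
    adagrad = fluid.optimizer.Adagrad(learning_rate=0.01)
    # adagrad.minimize(loss) as in the introductory sketch.
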
4. RMSPropOptimizer
----------------------

The `RMSProp optimizer <http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf>`_ adaptively adjusts the learning rate.
It mainly addresses the sharp decay of the learning rate in the middle and late stages of training that occurs when Adagrad is used.

API Reference: :ref:`api_fluid_optimizer_RMSPropOptimizer`
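
A construction sketch for illustration; the :code:`rho` and :code:`epsilon` values below are placeholders:

.. code-block:: python

    import paddle.fluid as fluid

    # RMSProp replaces Adagrad's ever-growing accumulator with an
    # exponential moving average (decay rate rho) of squared gradients.
    rmsprop = fluid.optimizer.RMSPropOptimizer(learning_rate=0.01,
                                               rho=0.95,
                                               epsilon=1e-6)
    # rmsprop.minimize(loss) as in the introductory sketch.
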
5. Adam/AdamOptimizer
------------------------

The `Adam <https://arxiv.org/abs/1412.6980>`_ optimizer adaptively adjusts the learning rate.
It is suitable for most non- `convex optimization <https://zh.wikipedia.org/zh/convex_optimization>`_ problems, large datasets, and high-dimensional scenarios, and :code:`Adam` is the most commonly used optimization algorithm.

API Reference: :ref:`api_fluid_optimizer_AdamOptimizer`
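
A construction sketch with Adam's moment-decay hyper-parameters (placeholder values):

.. code-block:: python

    import paddle.fluid as fluid

    # Adam combines per-parameter adaptive learning rates with estimates
    # of the first and second gradient moments (beta1, beta2).
    adam = fluid.optimizer.Adam(learning_rate=0.001,
                                beta1=0.9,
                                beta2=0.999)
    # adam.minimize(loss) as in the introductory sketch.
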
6. Adamax/AdamaxOptimizer
---------------------------

`Adamax <https://arxiv.org/abs/1412.6980>`_ is a variant of the :code:`Adam` algorithm that imposes a simpler bound on the learning rate, in particular on its upper limit.

API Reference: :ref:`api_fluid_optimizer_AdamaxOptimizer`
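
A construction sketch analogous to Adam (placeholder values):

.. code-block:: python

    import paddle.fluid as fluid

    # Adamax replaces Adam's second-moment estimate with an
    # infinity-norm accumulator, giving a simpler step-size bound.
    adamax = fluid.optimizer.Adamax(learning_rate=0.001,
                                    beta1=0.9,
                                    beta2=0.999)
    # adamax.minimize(loss) as in the introductory sketch.
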
7. DecayedAdagrad/DecayedAdagradOptimizer
--------------------------------------------

The `DecayedAdagrad <http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf>`_ optimizer can be regarded as the :code:`Adagrad` algorithm with a decay rate incorporated, which mitigates the sharp decay of the learning rate in the middle and late stages of training.

API Reference: :ref:`api_fluid_optimizer_DecayedAdagrad`
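
A construction sketch; :code:`decay` is the added decay-rate hyper-parameter and the values below are placeholders:

.. code-block:: python

    import paddle.fluid as fluid

    # Like Adagrad, but the squared-gradient accumulator is decayed at
    # each step so the effective learning rate does not collapse.
    decayed_adagrad = fluid.optimizer.DecayedAdagrad(learning_rate=0.01,
                                                     decay=0.95)
    # decayed_adagrad.minimize(loss) as in the introductory sketch.
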
8. Ftrl/FtrlOptimizer
----------------------
The `FtrlOptimizer <https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf>`_ combines the high accuracy of the `FOBOS algorithm <https://stanford.edu/~jduchi/projects/DuchiSi09b.pdf>`_ with the sparsity of the `RDA algorithm <http://www1.se.cuhk.edu.hk/~sqma/SEEM5121_Spring2015/dual-averaging.pdf>`_ , making it an `Online Learning <https://en.wikipedia.org/wiki/Online_machine_learning>`_ algorithm that performs remarkably well in practice.

API Reference: :ref:`api_fluid_optimizer_FtrlOptimizer`
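
For illustration, a construction sketch; the regularization strengths and learning-rate power below are placeholder values:

.. code-block:: python

    import paddle.fluid as fluid

    # FTRL-Proximal: the l1 term encourages sparse weights, l2 adds
    # standard shrinkage, and lr_power shapes the per-coordinate
    # learning-rate schedule.
    ftrl = fluid.optimizer.Ftrl(learning_rate=0.01,
                                l1=0.001,
                                l2=0.001,
                                lr_power=-0.5)
    # ftrl.minimize(loss) as in the introductory sketch.
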
9. ModelAverage
------------------

During training, the :code:`ModelAverage` optimizer accumulates a sliding-window average of the historical parameters.
At inference time, the averaged parameters are used to improve the overall accuracy of prediction.

API Reference: :ref:`api_fluid_optimizer_ModelAverage`
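
A usage sketch of how :code:`ModelAverage` is typically combined with a regular optimizer; :code:`exe` , :code:`inference_program` and the window settings below are illustrative placeholders:

.. code-block:: python

    import paddle.fluid as fluid

    # Created after the main optimizer's minimize() call; it accumulates
    # a sliding-window average of the parameters during training.
    model_average = fluid.optimizer.ModelAverage(average_window_rate=0.15,
                                                 min_average_window=10000,
                                                 max_average_window=20000)

    # At inference time the averaged parameters are swapped in
    # temporarily via the apply() context manager:
    #
    #     with model_average.apply(exe):
    #         exe.run(inference_program, feed=feed, fetch_list=fetch_list)
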