Unverified · Commit a09d4a6b · Authored by: Chen Weihang · Committed by: GitHub

Polish the Chinese API documentation of DecayedAdagradOptimizer (#1256)

* polish decayed adagrad opt zh api doc

* change the note position

* delete return and return type

* polish details

* use whole optimizer name
Parent 41dc1af2
@@ -5,177 +5,43 @@ DecayedAdagradOptimizer
.. py:class:: paddle.fluid.optimizer.DecayedAdagradOptimizer(learning_rate, decay=0.95, epsilon=1e-06, regularization=None, name=None)
The Decayed Adagrad optimizer can be seen as the `Adagrad <http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf>`_ algorithm with a decay rate added. It addresses the sharp drop of the learning rate in the middle and late stages of training that occurs when the :ref:`cn_api_fluid_optimizer_AdagradOptimizer` optimizer is used.

The original paper (`Adaptive Subgradient Methods for Online Learning and Stochastic Optimization <http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf>`_) does not include the ``epsilon`` parameter; it is added here for numerical stability, to avoid division by zero.

The parameter update formulas are:
.. math::

    moment\_out = decay * moment + (1 - decay) * grad * grad

.. math::

    param\_out = param - \frac{learning\_rate * grad}{\sqrt{moment\_out} + \epsilon}
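For illustration only, here is a minimal NumPy sketch of this update rule; the function and array names are hypothetical and are not part of the Paddle API:

.. code-block:: python

    import numpy as np

    def decayed_adagrad_step(param, grad, moment,
                             learning_rate=0.2, decay=0.95, epsilon=1e-06):
        """One Decayed Adagrad update, following the two formulas above."""
        moment_out = decay * moment + (1 - decay) * grad * grad
        param_out = param - learning_rate * grad / (np.sqrt(moment_out) + epsilon)
        return param_out, moment_out

    # toy usage: a single update step on a 3-element parameter vector
    param, moment = np.ones(3), np.zeros(3)
    grad = np.array([0.1, -0.2, 0.3])
    param, moment = decayed_adagrad_step(param, grad, moment)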
Parameters:
    - **learning_rate** (float|Variable) - The learning rate used to update the parameters. It can be a float value or a Variable holding a single float value.
    - **decay** (float, optional) - The decay rate. Default: 0.95.
    - **regularization** (WeightDecayRegularizer, optional) - A regularization strategy used to reduce generalization error, for example :ref:`cn_api_fluid_regularizer_L2DecayRegularizer`. Default: None.
    - **epsilon** (float, optional) - A small float value added for numerical stability. Default: 1e-06.
    - **name** (str, optional) - Used by developers when printing debugging information; for details see :ref:`api_guide_Name`. Default: None.
**Code example**

.. code-block:: python

    import paddle.fluid as fluid
    import paddle.fluid.layers as layers

    # build a small network whose mean output serves as the loss
    x = layers.data(name='x', shape=[-1, 10], dtype='float32')
    trans = layers.fc(x, 100)
    cost = layers.reduce_mean(trans)

    optimizer = fluid.optimizer.DecayedAdagradOptimizer(learning_rate=0.2)
    optimizer.minimize(cost)
.. note::

    Currently, ``DecayedAdagradOptimizer`` does not support sparse parameter optimization.
.. py:method:: apply_gradients(params_grads)
Appends optimization operators for the given list of (param, grad) pairs. This is the second step of the ``minimize`` process.

Parameters:
    - **params_grads** (list) - A list of (param, grad) pairs to optimize.

Returns: the list of operators appended to the current Program.

Return type: list
**Code example**

.. code-block:: python

    import paddle.fluid as fluid

    loss = network()  # network() is a user-defined function that builds the model and returns the loss
    optimizer = fluid.optimizer.SGD(learning_rate=0.1)
    params_grads = optimizer.backward(loss)
    # you may append operations for params_grads here
    # ...
    optimizer.apply_gradients(params_grads)
.. py:method:: apply_optimize(loss, startup_program, params_grads)
Appends optimization operators for the given (param, grad) pairs. This is the second step of the ``minimize`` process.

Parameters:
    - **loss** (Variable) - The loss Variable used in the optimization process.
    - **startup_program** (Program) - The startup_program used to initialize the parameters in ``parameter_list``.
    - **params_grads** (list) - A list of (param, grad) pairs to optimize.

Returns: the list of operators appended to the current Program.

Return type: list
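As an illustration only, here is a minimal sketch of how ``backward`` and ``apply_optimize`` might be combined by hand, assuming a user-defined ``network()`` that builds the model and returns the loss Variable; in normal use, ``minimize`` performs both steps for you:

.. code-block:: python

    import paddle.fluid as fluid

    loss = network()  # hypothetical user-defined function returning the loss Variable
    optimizer = fluid.optimizer.DecayedAdagradOptimizer(learning_rate=0.2)
    # first step: compute the (param, grad) pairs
    params_grads = optimizer.backward(loss)
    # second step: append the optimization operators for these pairs
    optimizer.apply_optimize(loss, startup_program=None, params_grads=params_grads)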
.. py:method:: backward(loss, startup_program=None, parameter_list=None, no_grad_set=None, callbacks=None)
Automatically performs differentiation and appends the backward operators to the current program. This is the first step of the ``minimize`` process.

Parameters:
    - **loss** (Variable) - The loss Variable used in the optimization process.
    - **startup_program** (Program) - The startup_program used to initialize the parameters in ``parameter_list``.
    - **parameter_list** (list) - A list of Variables to update.
    - **no_grad_set** (set|None) - A set of Variables that should be ignored.
    - **callbacks** (list|None) - A list of callables to run when the backward operators for a parameter are appended.

Returns: the list of operators appended to the current Program.

Return type: list

**Code example**

See the example of ``apply_gradients``.
.. py:method:: load(stat_dict)
Loads the optimizer, together with its learning rate decay, in dygraph mode.
Parameters:
    - **stat_dict** - The dict loaded by the ``load_persistable`` method.
**Code example**

.. code-block:: python

    from __future__ import print_function

    import numpy as np
    import paddle
    import paddle.fluid as fluid
    from paddle.fluid.optimizer import SGDOptimizer
    from paddle.fluid.dygraph.nn import FC
    from paddle.fluid.dygraph.base import to_variable

    class MLP(fluid.Layer):
        def __init__(self, name_scope):
            super(MLP, self).__init__(name_scope)
            self._fc1 = FC(self.full_name(), 10)
            self._fc2 = FC(self.full_name(), 10)

        def forward(self, inputs):
            y = self._fc1(inputs)
            y = self._fc2(y)
            return y

    with fluid.dygraph.guard():
        mlp = MLP('mlp')
        # optimizer performs the parameter updates; optimizer2 carries a
        # decayed learning rate and is saved alongside it
        optimizer = SGDOptimizer(learning_rate=0.01)
        optimizer2 = SGDOptimizer(
            learning_rate=fluid.layers.natural_exp_decay(
                learning_rate=0.1,
                decay_steps=10000,
                decay_rate=0.5,
                staircase=True))

        train_reader = paddle.batch(
            paddle.dataset.mnist.train(), batch_size=128, drop_last=True)

        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array(
                [x[0].reshape(1, 28, 28) for x in data]).astype('float32')
            y_data = np.array([x[1] for x in data]).astype('int64').reshape(
                128, 1)

            img = to_variable(dy_x_data)
            label = to_variable(y_data)
            label._stop_gradient = True

            cost = mlp(img)
            avg_loss = fluid.layers.reduce_mean(cost)
            avg_loss.backward()
            optimizer.minimize(avg_loss)
            mlp.clear_gradients()
            fluid.dygraph.save_persistables(
                mlp.state_dict(), [optimizer, optimizer2], "save_dir_2")
            if batch_id == 2:
                break

    with fluid.dygraph.guard():
        mlp_load = MLP('mlp')
        optimizer_load2 = SGDOptimizer(
            learning_rate=fluid.layers.natural_exp_decay(
                learning_rate=0.1,
                decay_steps=10000,
                decay_rate=0.5,
                staircase=True))
        parameters, optimizers = fluid.dygraph.load_persistables(
            "save_dir_2")
        mlp_load.load_dict(parameters)
        optimizer_load2.load(optimizers)

        # the loaded learning rate decay state should match the saved one
        assert optimizer2._learning_rate.__dict__ == optimizer_load2._learning_rate.__dict__
.. py:method:: minimize(loss, startup_program=None, parameter_list=None, no_grad_set=None, grad_clip=None)
......