diff --git a/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst b/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst index 33439ba6b9a8d6b2e5c9d090e756fd18775fe542..fcc1bf6fc03b41cb7bbd1095fb754d9b4c115944 100644 --- a/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst +++ b/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst @@ -72,6 +72,29 @@ Layer的全名。组成方式为: ``name_scope`` + “/” + MyLayer.__class__ 返回类型:list +.. py:method:: clear_gradients() + +清除该层所有参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + adam = fluid.optimizer.Adam(learning_rate=0.01, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + adam.minimize(out) + linear.clear_gradients() + + .. py:method:: named_parameters(prefix='', include_sublayers=True) 返回层中所有参数的迭代器,生成名称和参数的元组。 diff --git a/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst index ed926e3392d97229d328de4648a03972fb44b02f..dcd4fa67e18c2ac5ddc56af1db1d5a752ba850fd 100644 --- a/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst @@ -3,7 +3,7 @@ AdadeltaOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.AdadeltaOptimizer(learning_rate, epsilon=1.0e-6, rho=0.95, regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.AdadeltaOptimizer(learning_rate, epsilon=1.0e-6, rho=0.95, parameter_list=None, regularization=None, name=None) **注意:此接口不支持稀疏参数更新。** @@ -22,6 +22,7 @@ Adadelta优化器,具体细节可参考论文 `ADADELTA: AN ADAPTIVE LEARNING - **learning_rate** (float|Variable) - 全局学习率。 - **epsilon** (float) - 维持数值稳定性的浮点型值,默认值为1.0e-6。 - **rho** (float) - 算法中的衰减率,默认值为0.95。 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **regularization** (WeightDecayRegularizer,可选) - 正则化方法,例如fluid.regularizer.L2DecayRegularizer等。默认值为None,表示无正则化。 - **name** (str,可选) – 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为None。 @@ -68,3 +69,79 @@ Adadelta优化器,具体细节可参考论文 `ADADELTA: AN ADAPTIVE LEARNING optimizer_ops, params_grads = optimizer.minimize(cost) +.. py:method:: clear_gradients() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.AdadeltaOptimizer(learning_rate=0.0003, epsilon=1.0e-6, rho=0.95, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True + diff --git a/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst index 2ba03fc0f9eb0964978cc6ad26c25600ae56ccc5..53b5b9774cc7d8177493b4f1af31c4950e5b37a0 100644 --- a/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst @@ -3,7 +3,7 @@ AdagradOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.AdagradOptimizer(learning_rate, epsilon=1e-06, regularization=None, name=None, initial_accumulator_value=0.0) +.. py:class:: paddle.fluid.optimizer.AdagradOptimizer(learning_rate, epsilon=1e-06, parameter_list=None, regularization=None, name=None, initial_accumulator_value=0.0) Adaptive Gradient 优化器(自适应梯度优化器,简称Adagrad)可以针对不同参数样本数不平均的问题,自适应地为各个参数分配不同的学习率。 @@ -24,6 +24,7 @@ Adaptive Gradient 优化器(自适应梯度优化器,简称Adagrad)可以针 参数: - **learning_rate** (float|Variable) - 学习率,用于参数更新的计算。可以是一个浮点型值或者一个值为浮点型的Variable - **epsilon** (float, 可选) - 维持数值稳定性的浮点型值,默认值为1e-06 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **regularization** (WeightDecayRegularizer, 可选) - 正则化函数,用于减少泛化误差。例如可以是 :ref:`cn_api_fluid_regularizer_L2DecayRegularizer` ,默认值为None - **name** (str, 可选) - 该参数供开发人员打印调试信息时使用,具体用法请参见 :ref:`api_guide_Name` ,默认值为None - **initial_accumulator_value** (float, 可选) - moment累加器的初始值,默认值为0.0 @@ -86,7 +87,80 @@ Adaptive Gradient 优化器(自适应梯度优化器,简称Adagrad)可以针 fetch_list=[out.name]) +.. py:method:: clear_gradients() +**注意:** + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.AdagradOptimizer(learning_rate=0.2, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True + diff --git a/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst index 98412378807cf856e120d6707a7bc22e0c525cdc..4b7683ffc2bdb8d50750c6aafc5d6012b2a2efb9 100644 --- a/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst @@ -3,7 +3,7 @@ AdamOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, regularization=None, name=None, lazy_mode=False) +.. py:class:: paddle.fluid.optimizer.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, parameter_list=None, regularization=None, name=None, lazy_mode=False) Adam优化器出自 `Adam论文 `_ 的第二节,能够利用梯度的一阶矩估计和二阶矩估计动态调整每个参数的学习率。 @@ -24,6 +24,7 @@ Adam优化器出自 `Adam论文 `_ 的第二节 参数: - **learning_rate** (float|Variable,可选) - 学习率,用于参数更新的计算。可以是一个浮点型值或者一个值为浮点型的Variable,默认值为0.001 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **beta1** (float|Variable, 可选) - 一阶矩估计的指数衰减率,是一个float类型或者一个shape为[1],数据类型为float32的Variable类型。默认值为0.9 - **beta2** (float|Variable, 可选) - 二阶矩估计的指数衰减率,是一个float类型或者一个shape为[1],数据类型为float32的Variable类型。默认值为0.999 - **epsilon** (float, 可选) - 保持数值稳定性的短浮点类型值,默认值为1e-08 @@ -32,7 +33,7 @@ Adam优化器出自 `Adam论文 `_ 的第二节 - **lazy_mode** (bool, 可选) - 设为True时,仅更新当前具有梯度的元素。官方Adam算法有两个移动平均累加器(moving-average accumulators)。累加器在每一步都会更新。在密集模式和稀疏模式下,两条移动平均线的每个元素都会更新。如果参数非常大,那么更新可能很慢。 lazy mode仅更新当前具有梯度的元素,所以它会更快。但是这种模式与原始的算法有不同的描述,可能会导致不同的结果,默认为False -**代码示例**: +**代码示例** .. code-block:: python @@ -134,7 +135,7 @@ Adam优化器出自 `Adam论文 `_ 的第二节 返回类型: tuple -**代码示例**: +**代码示例** .. code-block:: python @@ -159,3 +160,81 @@ Adam优化器出自 `Adam论文 `_ 的第二节 feed={'X': x, 'Y': y}, fetch_list=[loss.name]) + +.. py:method:: clear_gradients() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + + +清除需要优化的参数的梯度。 + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.Adam(learning_rate=0.02, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True + diff --git a/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst index dd2d0bd6d623e9582860545dfbad9dc8f0bcf664..4dcfd2c789150771d427f73fded8a5706f07bbfb 100644 --- a/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst @@ -3,7 +3,7 @@ AdamaxOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.AdamaxOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.AdamaxOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, parameter_list=None, regularization=None, name=None) Adamax优化器是参考 `Adam论文 `_ 第7节Adamax优化相关内容所实现的。Adamax算法是基于无穷大范数的 `Adam `_ 算法的一个变种,使学习率更新的算法更加稳定和简单。 @@ -29,6 +29,7 @@ Adamax优化器是参考 `Adam论文 `_ 第7节 - **beta1** (float, 可选) - 一阶矩估计的指数衰减率,默认值为0.9 - **beta2** (float, 可选) - 二阶矩估计的指数衰减率,默认值为0.999 - **epsilon** (float, 可选) - 保持数值稳定性的短浮点类型值,默认值为1e-08 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **regularization** (WeightDecayRegularizer, 可选) - 正则化函数,用于减少泛化误差。例如可以是 :ref:`cn_api_fluid_regularizer_L2DecayRegularizer` ,默认值为None - **name** (str, 可选)- 该参数供开发人员打印调试信息时使用,具体用法请参见 :ref:`api_guide_Name` ,默认值为None @@ -100,8 +101,80 @@ Adamax优化器是参考 `Adam论文 `_ 第7节 +.. py:method:: clear_gradients() +**注意:** + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** +清除需要优化的参数的梯度。 + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.AdamaxOptimizer(learning_rate=0.2, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True diff --git a/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst index 60a6b7f440255b52c3016ca4535a65b62316795e..6341be07028835b0d1f941cc55f37d9fcd349384 100644 --- a/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst @@ -3,7 +3,7 @@ DecayedAdagradOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.DecayedAdagradOptimizer(learning_rate, decay=0.95, epsilon=1e-06, regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.DecayedAdagradOptimizer(learning_rate, decay=0.95, epsilon=1e-06, parameter_list=None, regularization=None, name=None) Decayed Adagrad优化器,可以看做是引入了衰减率的 `Adagrad `_ 算法,用于解决使用 :ref:`cn_api_fluid_optimizer_AdagradOptimizer` 优化器时,在模型训练中后期学习率急剧下降的问题。 @@ -21,6 +21,7 @@ Decayed Adagrad优化器,可以看做是引入了衰减率的 `Adagrad `_ **模式下生效** + + +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.DecayedAdagradOptimizer(learning_rate=0.02, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True + diff --git a/doc/fluid/api_cn/optimizer_cn/FtrlOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/FtrlOptimizer_cn.rst index cc1eb9aa7183278f4243adad7aa9873eac847be7..cbdd0923095afa5ec09811734e680828495372b8 100644 --- a/doc/fluid/api_cn/optimizer_cn/FtrlOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/FtrlOptimizer_cn.rst @@ -3,7 +3,7 @@ FtrlOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.FtrlOptimizer(learning_rate, l1=0.0, l2=0.0, lr_power=-0.5,regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.FtrlOptimizer(learning_rate, l1=0.0, l2=0.0, lr_power=-0.5, parameter_list=None, regularization=None, name=None) 该接口实现FTRL (Follow The Regularized Leader) Optimizer. @@ -30,6 +30,7 @@ FTRL 原始论文: ( `https://www.eecs.tufts.edu/~dsculley/papers/ad-click-predi 参数: - **learning_rate** (float|Variable)- 全局学习率。 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **l1** (float,可选) - L1 regularization strength,默认值0.0。 - **l2** (float,可选) - L2 regularization strength,默认值0.0。 - **lr_power** (float,可选) - 学习率降低指数,默认值-0.5。 @@ -91,3 +92,80 @@ FTRL 原始论文: ( `https://www.eecs.tufts.edu/~dsculley/papers/ad-click-predi 返回类型: tuple +.. py:method:: clear_gradients() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + + +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.FtrlOptimizer(learning_rate=0.02, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True + diff --git a/doc/fluid/api_cn/optimizer_cn/LambOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/LambOptimizer_cn.rst index 404315b997aabfe49d21df601279df5d75ca4b74..60dd8f4db8a428fbc61de426267ef680dff0c811 100644 --- a/doc/fluid/api_cn/optimizer_cn/LambOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/LambOptimizer_cn.rst @@ -3,7 +3,7 @@ LambOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.LambOptimizer(learning_rate=0.001, lamb_weight_decay=0.01, beta1=0.9, beta2=0.999, epsilon=1e-06, regularization=None, exclude_from_weight_decay_fn=None, name=None) +.. py:class:: paddle.fluid.optimizer.LambOptimizer(learning_rate=0.001, lamb_weight_decay=0.01, beta1=0.9, beta2=0.999, epsilon=1e-06, parameter_list=None, regularization=None, exclude_from_weight_decay_fn=None, name=None) LAMB(Layer-wise Adaptive Moments optimizer for Batching training)优化器 LAMB的优化器旨在不降低精度的前提下增大训练的批量大小,其支持自适应的逐元素更新和精确的分层校正。 更多信息请参考 `Large Batch Optimization for @@ -29,6 +29,7 @@ Deep Learning: Training BERT in 76 minutes `_ **模式下生效** +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + def exclude_fn(param): + return param.name.endswith('.b_0') + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.LambOptimizer(learning_rate=0.02, + exclude_from_weight_decay_fn=exclude_fn, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True diff --git a/doc/fluid/api_cn/optimizer_cn/LarsMomentumOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/LarsMomentumOptimizer_cn.rst index 3a487da090f1cf52844951b844227874281a9dc0..bec1c01565a9180fa5c91c236df0ffd3f24604b9 100644 --- a/doc/fluid/api_cn/optimizer_cn/LarsMomentumOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/LarsMomentumOptimizer_cn.rst @@ -3,7 +3,7 @@ LarsMomentumOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.LarsMomentumOptimizer(learning_rate, momentum, lars_coeff=0.001, lars_weight_decay=0.0005, regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.LarsMomentumOptimizer(learning_rate, momentum, lars_coeff=0.001, lars_weight_decay=0.0005, parameter_list=None, regularization=None, name=None) 该接口实现LARS支持的Momentum优化器 @@ -19,6 +19,7 @@ LarsMomentumOptimizer 参数: - **learning_rate** (float|Variable) - 学习率,用于参数更新。作为数据参数,可以是浮点型值或含有一个浮点型值的变量。 - **momentum** (float) - 动量因子。 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **lars_coeff** (float,可选) - 定义LARS本地学习率的权重,默认值0.001。 - **lars_weight_decay** (float,可选) - 使用LARS进行衰减的权重衰减系数,默认值0.0005。 - **regularization** - 正则化函数,例如 :code:`fluid.regularizer.L2DecayRegularizer`。 @@ -66,6 +67,80 @@ LarsMomentumOptimizer 返回类型: tuple +.. py:method:: clear_gradients() +**注意:** + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + + +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.LarsMomentumOptimizer(learning_rate=0.001, momentum=0.9, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True diff --git a/doc/fluid/api_cn/optimizer_cn/MomentumOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/MomentumOptimizer_cn.rst index d3d0d909335f14476525dc8e2da8ea46380eae19..2feb44a3a6a2eb9dfefa624647d95cb333a9b57c 100644 --- a/doc/fluid/api_cn/optimizer_cn/MomentumOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/MomentumOptimizer_cn.rst @@ -3,7 +3,7 @@ MomentumOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.MomentumOptimizer(learning_rate, momentum, use_nesterov=False, regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.MomentumOptimizer(learning_rate, momentum, parameter_list=None, use_nesterov=False, regularization=None, name=None) 该接口实现含有速度状态的Simple Momentum 优化器 @@ -18,6 +18,7 @@ MomentumOptimizer 参数: - **learning_rate** (float|Variable) - 学习率,用于参数更新。作为数据参数,可以是浮点型值或含有一个浮点型值的变量。 - **momentum** (float) - 动量因子。 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **use_nesterov** (bool,可选) - 赋能牛顿动量,默认值False。 - **regularization** - 正则化函数,,例如 :code:`fluid.regularizer.L2DecayRegularizer`,默认值None。 - **name** (str, 可选) - 可选的名称前缀,一般无需设置,默认值为None。 @@ -99,4 +100,80 @@ MomentumOptimizer +.. py:method:: clear_gradients() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + + +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.001, momentum=0.9, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True diff --git a/doc/fluid/api_cn/optimizer_cn/RMSPropOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/RMSPropOptimizer_cn.rst index cfd70076247304af5411a48a8dafe0a22f03a895..459ca943d644aa9c74bbbc154cc565f94ad9aa71 100644 --- a/doc/fluid/api_cn/optimizer_cn/RMSPropOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/RMSPropOptimizer_cn.rst @@ -3,7 +3,7 @@ RMSPropOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.RMSPropOptimizer(learning_rate, rho=0.95, epsilon=1e-06, momentum=0.0, centered=False, regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.RMSPropOptimizer(learning_rate, rho=0.95, epsilon=1e-06, momentum=0.0, centered=False, parameter_list=None, regularization=None, name=None) 该接口实现均方根传播(RMSProp)法,是一种未发表的,自适应学习率的方法。原演示幻灯片中提出了RMSProp:[http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf]中的第29张。等式如下所示: @@ -30,6 +30,7 @@ RMSPropOptimizer 参数: - **learning_rate** (float) - 全局学习率。 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **rho** (float,可选) - rho是等式中的 :math:`rho` ,默认值0.95。 - **epsilon** (float,可选) - 等式中的epsilon是平滑项,避免被零除,默认值1e-6。 - **momentum** (float,可选) - 方程中的β是动量项,默认值0.0。 @@ -117,5 +118,80 @@ RMSPropOptimizer +.. py:method:: clear_gradients() +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + + +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.RMSPropOptimizer(learning_rate=0.01, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
code-block:: python + + import paddle.fluid as fluid + import numpy as np + + # example1: LearningRateDecay is not used, return value is all the same + with fluid.dygraph.guard(): + emb = fluid.dygraph.Embedding([10, 10]) + adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters()) + lr = adam.current_step_lr() + print(lr) # 0.001 + + # example2: PiecewiseDecay is used, return the step learning rate + with fluid.dygraph.guard(): + inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32") + linear = fluid.dygraph.nn.Linear(10, 10) + inp = fluid.dygraph.to_variable(inp) + out = linear(inp) + loss = fluid.layers.reduce_mean(out) + + bd = [2, 4, 6, 8] + value = [0.2, 0.4, 0.6, 0.8, 1.0] + adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0), + parameter_list=linear.parameters()) + + # first step: learning rate is 0.2 + np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True + + # learning rate for different steps + ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0] + for i in range(12): + adam.minimize(loss) + lr = adam.current_step_lr() + np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True diff --git a/doc/fluid/api_cn/optimizer_cn/SGDOptimizer_cn.rst b/doc/fluid/api_cn/optimizer_cn/SGDOptimizer_cn.rst index f2c4df35ed80544759e773be5169dc835c349e8a..1c95ff00db1983ac1ba28d78fc5c96f59ed44a72 100644 --- a/doc/fluid/api_cn/optimizer_cn/SGDOptimizer_cn.rst +++ b/doc/fluid/api_cn/optimizer_cn/SGDOptimizer_cn.rst @@ -3,7 +3,7 @@ SGDOptimizer ------------------------------- -.. py:class:: paddle.fluid.optimizer.SGDOptimizer(learning_rate, regularization=None, name=None) +.. py:class:: paddle.fluid.optimizer.SGDOptimizer(learning_rate, parameter_list=None, regularization=None, name=None) 该接口实现随机梯度下降算法的优化器 @@ -13,6 +13,7 @@ SGDOptimizer 参数: - **learning_rate** (float|Variable) - 用于更新参数的学习率。可以是浮点值,也可以是具有一个浮点值作为数据元素的变量。 + - **parameter_list** (list, 可选) - 指定优化器需要优化的参数。在动态图模式下必须提供该参数;在静态图模式下默认值为None,这时所有的参数都将被优化。 - **regularization** - 一个正则化器,例如 ``fluid.regularizer.L2DecayRegularizer`` 。 - **name** (str, 可选) - 可选的名称前缀,一般无需设置,默认值为None。 @@ -94,4 +95,80 @@ SGDOptimizer +.. py:method:: clear_gradients() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + + +清除需要优化的参数的梯度。 + +**代码示例** + +.. code-block:: python + + import paddle.fluid as fluid + import numpy as np + + with fluid.dygraph.guard(): + value = np.arange(26).reshape(2, 13).astype("float32") + a = fluid.dygraph.to_variable(value) + linear = fluid.Linear(13, 5, dtype="float32") + optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.01, + parameter_list=linear.parameters()) + out = linear(a) + out.backward() + optimizer.minimize(out) + optimizer.clear_gradients() + + +.. py:method:: current_step_lr() + +**注意:** + + **1. 该API只在** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **模式下生效** + +获取当前步骤的学习率。当不使用LearningRateDecay时,每次调用的返回值都相同,否则返回当前步骤的学习率。 + +返回:当前步骤的学习率。 + +返回类型:float + +**代码示例** + +.. 
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    import numpy as np
+
+    # example1: LearningRateDecay is not used, return value is all the same
+    with fluid.dygraph.guard():
+        emb = fluid.dygraph.Embedding([10, 10])
+        adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters())
+        lr = adam.current_step_lr()
+        print(lr) # 0.001
+
+    # example2: PiecewiseDecay is used, return the step learning rate
+    with fluid.dygraph.guard():
+        inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+        linear = fluid.dygraph.nn.Linear(10, 10)
+        inp = fluid.dygraph.to_variable(inp)
+        out = linear(inp)
+        loss = fluid.layers.reduce_mean(out)
+
+        bd = [2, 4, 6, 8]
+        value = [0.2, 0.4, 0.6, 0.8, 1.0]
+        adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0),
+                                    parameter_list=linear.parameters())
+
+        # first step: learning rate is 0.2
+        np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True
+
+        # learning rate for different steps
+        ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]
+        for i in range(12):
+            adam.minimize(loss)
+            lr = adam.current_step_lr()
+            np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True
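+
+作为补充,下面给出一段示意代码(仅为示意,其中的网络结构、学习率与迭代次数均为假设值,并非推荐配置),展示在动态图模式下如何将上文示例中的 ``backward`` 、 ``minimize`` 、 ``clear_gradients`` 与 ``current_step_lr`` 组合为一个简单的迭代训练循环:
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    import numpy as np
+
+    with fluid.dygraph.guard():
+        value = np.arange(26).reshape(2, 13).astype("float32")
+        a = fluid.dygraph.to_variable(value)
+        linear = fluid.Linear(13, 5, dtype="float32")
+        sgd = fluid.optimizer.SGDOptimizer(learning_rate=0.01,
+                                           parameter_list=linear.parameters())
+
+        for step in range(5):                         # 迭代次数仅为示意
+            out = linear(a)                           # 前向计算
+            loss = fluid.layers.reduce_mean(out)      # 计算标量损失
+            loss.backward()                           # 反向传播,计算梯度
+            sgd.minimize(loss)                        # 根据梯度更新参数
+            sgd.clear_gradients()                     # 清除梯度,准备下一次迭代
+            lr = sgd.current_step_lr()                # 未使用LearningRateDecay时恒为0.01
+            print(lr)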