Commit 57252dee authored by mindspore-ci-bot, committed by Gitee

!3191 Fix doc error of optim API

Merge pull request !3191 from Simson/doc-fix
@@ -41,7 +41,7 @@ def _update_run_op(beta1, beta2, eps, lr, weight_decay_tensor, param, m, v, grad
 beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
 eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
 lr (Tensor): Learning rate.
-weight_decay_tensor (Tensor): Weight decay. Should be equal to or greater than 0.
+weight_decay_tensor (Tensor): Weight decay. Should be in range [0.0, 1.0].
 param (Tensor): Parameters.
 m (Tensor): m value of parameters.
 v (Tensor): v value of parameters.
@@ -252,8 +252,8 @@ class Adam(Optimizer):
 use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
 If True, updates the gradients using NAG.
 If False, updates the gradients without using NAG. Default: False.
-weight_decay (float): Weight decay (L2 penalty). It should be equal to or greater than 0. Default: 0.0.
-loss_scale (float): A floating point value for the loss scale. Should be greater than 0. Default: 1.0.
+weight_decay (float): Weight decay (L2 penalty). It should be in range [0.0, 1.0]. Default: 0.0.
+loss_scale (float): A floating point value for the loss scale. Should be not less than 1.0. Default: 1.0.
 Inputs:
 - **gradients** (tuple[Tensor]) - The gradients of `params`, the shape is the same as `params`.
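To make the tightened ranges concrete, here is a minimal usage sketch (assuming a `net` of type `nn.Cell` is already defined; the argument values are illustrative only, not recommendations):

```python
from mindspore import nn

# `net` is assumed to be an already-built nn.Cell.
# weight_decay must lie in [0.0, 1.0]; loss_scale must be not less than 1.0.
optim = nn.Adam(net.trainable_params(),
                learning_rate=1e-3,
                weight_decay=0.01,  # inside [0.0, 1.0]
                loss_scale=1.0)     # not less than 1.0
```

With the documented ranges, a call such as `nn.Adam(..., weight_decay=2.0)` or `nn.Adam(..., loss_scale=0.5)` would be expected to fail the argument checks.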
@@ -392,7 +392,7 @@ class AdamWeightDecay(Optimizer):
 Should be in range (0.0, 1.0).
 eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
 Should be greater than 0.
-weight_decay (float): Weight decay (L2 penalty). It should be equal to or greater than 0. Default: 0.0.
+weight_decay (float): Weight decay (L2 penalty). It should be in range [0.0, 1.0]. Default: 0.0.
 decay_filter (Function): A function to determine whether to apply weight decay on parameters. Default:
 lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name.
@@ -457,7 +457,7 @@ class AdamWeightDecayDynamicLR(Optimizer):
 Should be in range (0.0, 1.0).
 eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
 Should be greater than 0.
-weight_decay (float): Weight decay (L2 penalty). It should be equal to or greater than 0. Default: 0.0.
+weight_decay (float): Weight decay (L2 penalty). It should be in range [0.0, 1.0]. Default: 0.0.
 decay_filter (Function): A function to determine whether to apply weight decay on parameters. Default:
 lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name.
...
@@ -128,7 +128,7 @@ class FTRL(Optimizer):
 l2 (float): l2 regularization strength, must be greater than or equal to zero. Default: 0.0.
 use_locking (bool): If True use locks for update operation. Default: False.
 loss_scale (float): Value for the loss scale. It should be equal to or greater than 1.0. Default: 1.0.
-wegith_decay (float): Weight decay value to multiply weight, must be zero or positive value. Default: 0.0.
+wegith_decay (float): Weight decay value to multiply weight, should be in range [0.0, 1.0]. Default: 0.0.
 Inputs:
 - **grads** (tuple[Tensor]) - The gradients of `params` in optimizer, the shape is as same as the `params`
...
@@ -44,7 +44,7 @@ def _update_run_op(beta1, beta2, eps, lr, weight_decay_tensor, global_step, para
 beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
 eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
 lr (Tensor): Learning rate.
-weight_decay_tensor (Tensor): Weight decay. Should be equal to or greater than 0.
+weight_decay_tensor (Tensor): Weight decay. Should be in range [0.0, 1.0].
 global_step (Tensor): Global step.
 param (Tensor): Parameters.
 m (Tensor): m value of parameters.
@@ -128,7 +128,7 @@ def _update_run_op_graph_kernel(beta1, beta2, eps, lr, weight_decay_tensor,
 beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
 eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
 lr (Tensor): Learning rate.
-weight_decay_tensor (Tensor): Weight decay. Should be equal to or greater than 0.
+weight_decay_tensor (Tensor): Weight decay. Should be in range [0.0, 1.0].
 global_step (Tensor): Global step.
 param (Tensor): Parameters.
 m (Tensor): m value of parameters.
@@ -229,7 +229,7 @@ class Lamb(Optimizer):
 Should be in range (0.0, 1.0).
 eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
 Should be greater than 0.
-weight_decay (float): Weight decay (L2 penalty). Default: 0.0. Should be equal to or greater than 0.
+weight_decay (float): Weight decay (L2 penalty). Default: 0.0. Should be in range [0.0, 1.0].
 decay_filter (Function): A function to determine whether to apply weight decay on parameters. Default:
 lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name.
...
@@ -133,7 +133,7 @@ class LazyAdam(Optimizer):
 If True, updates the gradients using NAG.
 If False, updates the gradients without using NAG. Default: False.
 weight_decay (float): Weight decay (L2 penalty). Default: 0.0.
-loss_scale (float): A floating point value for the loss scale. Should be equal to or greater than 1. Default:
+loss_scale (float): A floating point value for the loss scale. It should be not less than 1.0. Default:
 1.0.
 Inputs:
...
@@ -83,8 +83,8 @@ class Momentum(Optimizer):
 or greater than 0.0.
 momentum (float): Hyperparameter of type float, means momentum for the moving average.
 It should be at least 0.0.
-weight_decay (int, float): Weight decay (L2 penalty). It should be equal to or greater than 0.0. Default: 0.0.
-loss_scale (int, float): A floating point value for the loss scale. It should be greater than 0.0. Default: 1.0.
+weight_decay (int, float): Weight decay (L2 penalty). It should be in range [0.0, 1.0]. Default: 0.0.
+loss_scale (int, float): A floating point value for the loss scale. Should be not less than 1.0. Default: 1.0.
 use_nesterov (bool): Enable Nesterov momentum. Default: False.
 Inputs:
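The same ranges apply to Momentum; a minimal sketch (again assuming an existing `net` Cell, with illustrative values):

```python
from mindspore import nn

# `net` is assumed to be an already-built nn.Cell.
optim = nn.Momentum(net.trainable_params(),
                    learning_rate=0.1,
                    momentum=0.9,
                    weight_decay=1e-4,   # in [0.0, 1.0]
                    loss_scale=1024.0)   # not less than 1.0
```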
...
@@ -79,10 +79,9 @@ class Optimizer(Cell):
 the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
 in the value of 'order_params' should be in one of group parameters.
-weight_decay (float): A floating point value for the weight decay. It should be not less than 0 and not
-greater than 1.
+weight_decay (float): A floating point value for the weight decay. It should be in range [0.0, 1.0].
 If the type of `weight_decay` input is int, it will be converted to float. Default: 0.0.
-loss_scale (float): A floating point value for the loss scale. It should be not less than 1. If the
+loss_scale (float): A floating point value for the loss scale. It should be not less than 1.0. If the
 type of `loss_scale` input is int, it will be converted to float. Default: 1.0.
 Raises:
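The group-parameter interface described above can be sketched roughly as follows (a hedged example: `net`, the name filters, and the numeric values are assumptions for illustration):

```python
from mindspore import nn

# Split the parameters of an assumed `net` into two groups.
conv_params = [p for p in net.trainable_params() if 'conv' in p.name]
other_params = [p for p in net.trainable_params() if 'conv' not in p.name]

group_params = [{'params': conv_params, 'weight_decay': 0.01},  # per-group value, still in [0.0, 1.0]
                {'params': other_params},                        # falls back to the default below
                {'order_params': net.trainable_params()}]        # fixes the parameter order

optim = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9,
                    weight_decay=0.0, loss_scale=1.0)
```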
@@ -333,8 +332,8 @@ class Optimizer(Cell):
 if 'weight_decay' in group_param.keys():
     validator.check_float_legal_value('weight_decay', group_param['weight_decay'], None)
-    validator.check_number_range('weight_decay', group_param['weight_decay'], 0.0, float("inf"),
-                                 Rel.INC_LEFT, self.cls_name)
+    validator.check_number_range('weight_decay', group_param['weight_decay'], 0.0, 1.0,
+                                 Rel.INC_BOTH, self.cls_name)
     weight_decay_ = group_param['weight_decay'] * self.loss_scale
 else:
     weight_decay_ = weight_decay * self.loss_scale
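In plain Python, the tightened check behaves roughly like the following (a hypothetical stand-in for `validator.check_number_range`, not the actual `mindspore._checkparam` implementation):

```python
def check_weight_decay_range(value, cls_name):
    # Rel.INC_BOTH: both bounds are inclusive, i.e. 0.0 <= value <= 1.0.
    # The previous check (0.0, inf) with Rel.INC_LEFT only required value >= 0.0.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"For '{cls_name}', 'weight_decay' should be in range "
                         f"[0.0, 1.0], but got {value}.")
    return value
```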
...
@@ -71,8 +71,8 @@ class ProximalAdagrad(Optimizer):
 l1 (float): l1 regularization strength, must be greater than or equal to zero. Default: 0.0.
 l2 (float): l2 regularization strength, must be greater than or equal to zero. Default: 0.0.
 use_locking (bool): If True use locks for update operation. Default: False.
-loss_scale (float): Value for the loss scale. It should be greater than 0.0. Default: 1.0.
-wegith_decay (float): Weight decay value to multiply weight, must be zero or positive value. Default: 0.0.
+loss_scale (float): Value for the loss scale. It should be not less than 1.0. Default: 1.0.
+wegith_decay (float): Weight decay value to multiply weight, should be in range [0.0, 1.0]. Default: 0.0.
 Inputs:
 - **grads** (tuple[Tensor]) - The gradients of `params` in optimizer, the shape is as same as the `params`
...
@@ -123,8 +123,8 @@ class RMSProp(Optimizer):
 0. Default: 1e-10.
 use_locking (bool): Enable a lock to protect the update of variable and accumlation tensors. Default: False.
 centered (bool): If True, gradients are normalized by the estimated variance of the gradient. Default: False.
-loss_scale (float): A floating point value for the loss scale. Should be greater than 0. Default: 1.0.
-weight_decay (float): Weight decay (L2 penalty). Should be equal to or greater than 0. Default: 0.0.
+loss_scale (float): A floating point value for the loss scale. Should be not less than 1.0. Default: 1.0.
+weight_decay (float): Weight decay (L2 penalty). Should be in range [0.0, 1.0]. Default: 0.0.
 Inputs:
 - **gradients** (tuple[Tensor]) - The gradients of `params`, the shape is the same as `params`.
...
@@ -76,10 +76,9 @@ class SGD(Optimizer):
 greater than 0. Default: 0.1.
 momentum (float): A floating point value the momentum. should be at least 0.0. Default: 0.0.
 dampening (float): A floating point value of dampening for momentum. should be at least 0.0. Default: 0.0.
-weight_decay (float): Weight decay (L2 penalty). It should be equal to or greater than 0. Default: 0.0.
+weight_decay (float): Weight decay (L2 penalty). It should be in range [0.0, 1.0]. Default: 0.0.
 nesterov (bool): Enables the Nesterov momentum. Default: False.
-loss_scale (float): A floating point value for the loss scale, which should be larger
-than 0.0. Default: 1.0.
+loss_scale (float): A floating point value for the loss scale. Should be not less than 1.0. Default: 1.0.
 Inputs:
 - **gradients** (tuple[Tensor]) - The gradients of `params`, the shape is the same as `params`.
...