raiseTypeError("dtype only support float16 or bfloat16.")
if optimizers is not None:
    # check optimizers
...
@@ -741,6 +789,7 @@ def decorate(
master_weight=None,
save_dtype=None,
master_grad=False,
excluded_layers=None,
):
"""
"""
Decorate models and optimizers for auto-mixed-precision. When level is O1(amp), decorate will do nothing.
...
@@ -757,8 +806,10 @@ def decorate(
master_weight(bool, optional): For level='O2', whether to use multi-precision during weight updating. If master_weight is None, the optimizer will use multi-precision in O2 level. Default is None.
save_dtype(float, optional): The dtype used to save model parameters when using `paddle.save` or `paddle.jit.save`; it should be float16, bfloat16, float32, float64 or None.
The save_dtype will not change the dtype of model parameters; it only changes the dtype of the state_dict. When save_dtype is None, the save dtype is the same as the model dtype. Default is None.
master_grad(bool, optional): For level='O2', whether to use float32 weight gradients for calculations such as gradient clipping, weight decay, and weight updates. If master_grad is enabled, the weight
gradients will be float32 dtype after the backpropagation. Default is False, meaning there are only float16 weight gradients.
excluded_layers(Layer|list of Layer, optional): Specify the layers not to be decorated. The weights of these layers will always stay float32 when level is O2. `excluded_layers` can be specified as
a Layer instance/type or a list of Layer instances/types. Default is None, meaning the weights of the whole model will be cast to float16 or bfloat16.
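Below is a minimal usage sketch of the decorated API described above. The parameter names follow the signature shown in this diff; the toy model, optimizer, and layer choices are illustrative assumptions, not part of the change, and O2 with float16 assumes a device that supports it.

import paddle

# Toy model: one Conv2D layer plus one Linear layer (illustrative only).
model = paddle.nn.Sequential(paddle.nn.Conv2D(3, 8, 3), paddle.nn.Linear(8, 2))
optimizer = paddle.optimizer.AdamW(parameters=model.parameters())

# O2 decoration: cast weights to float16, but keep the Linear weights in
# float32 via excluded_layers, and keep float32 weight gradients after
# backward via master_grad.
model, optimizer = paddle.amp.decorate(
    models=model,
    optimizers=optimizer,
    level='O2',
    dtype='float16',
    master_grad=True,
    excluded_layers=[paddle.nn.Linear],
)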