Collect the configuration for checking NaN and Inf values in module or operator output Tensors.
Args:
* enable: Whether to enable the Tensor value detection function. The default value is False, which means these tools are never used.
* debug_mode: The debug mode. There are 6 kinds of debug modes:
CHECK_NAN_INF_AND_ABORT (default): Print or save key information of Tensors containing NaN/Inf and interrupt the program
CHECK_NAN_INF: Print or save key information of Tensors containing NaN/Inf, but continue to run
CHECK_ALL_AND_ABORT: Print or save key information of all operators' output Tensors, and interrupt the program if NaN/Inf occurs
CHECK_ALL_FOR_OVERFLOW: Check the output of FP32 operators, and print or save key information of Tensors that exceed the FP16 representation range (overflow or underflow)
CHECK_ALL: Print or save key information of all operators' output Tensors
DUMP_ALL: Save all Tensor data; this mode does not print to the terminal
* dump_dir: The storage path for the collected data. If it is None, the data is printed directly to the terminal
* checked_op_list: A list of operators you want to check
* skipped_op_list: A list of operators to skip checking
* debug_step: The iteration range in which debugging is performed
* stack_height_limit: The maximum depth of the call stack; supports printing the call stack at the error location (the specific scheme is still under investigation)
* enable_traceback_filtering: Whether to filter the traceback. The main purpose is to filter out the framework's internal call stack and display only the user code call stack
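A minimal construction sketch follows. The keyword names simply mirror the argument list above, and paddle.amp.debugging.TensorCheckerConfig / DebugMode are assumed to be the config class and mode enum exposed by this module; verify the exact signature against the installed Paddle version.

import paddle

# Sketch only: abort on the first NaN/Inf found during steps 0-2 and dump
# details to ./tensor_check_dump instead of printing to the terminal.
checker_config = paddle.amp.debugging.TensorCheckerConfig(
    enable=True,
    debug_mode=paddle.amp.debugging.DebugMode.CHECK_NAN_INF_AND_ABORT,
    dump_dir="./tensor_check_dump",
    debug_step=[0, 3],            # assumed to be an iteration range, per the description above
    skipped_op_list=["softmax"],  # do not check the softmax operator
)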
enable_tensor_checker(checker_config) enables model-level accuracy checking. It is used together with disable_tensor_checker() to achieve model-level precision checking through the combination of these two APIs, checking the output Tensors of all operators within the specified range.
Attention:
* If disable_tensor_checker() is called before loss.backward(), the gradient operators are not checked;
* If disable_tensor_checker() is called before optimizer.step(), the optimizer and other weight-update related operators are not checked (see the training-loop sketch after the example below)
import paddle

checker_config = paddle.amp.debugging.TensorCheckerConfig(enable=True, debug_mode=paddle.amp.debugging.DebugMode.CHECK_NAN_INF)
paddle.amp.debugging.enable_tensor_checker(checker_config)
x = paddle.to_tensor([1, 0, 3], place=paddle.CPUPlace(), dtype='float32', stop_gradient=False)
y = paddle.to_tensor([0.2, 0, 0.5], place=paddle.CPUPlace(), dtype='float32')
res = paddle.pow(x, y)
# The gradient of pow at x=0 contains NaN, which the checker reports during backward().
paddle.autograd.backward(res, retain_graph=True)
paddle.amp.debugging.disable_tensor_checker()
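The placement constraints listed under Attention can be illustrated with a hedged training-loop sketch; the model, optimizer, and data below are placeholders, and only the position of enable_tensor_checker()/disable_tensor_checker() relative to loss.backward() and optimizer.step() matters:

import paddle

# Hypothetical model, optimizer, and data; only the call ordering is the point here.
model = paddle.nn.Linear(4, 2)
optimizer = paddle.optimizer.SGD(learning_rate=0.1, parameters=model.parameters())
checker_config = paddle.amp.debugging.TensorCheckerConfig(enable=True)

paddle.amp.debugging.enable_tensor_checker(checker_config)
for step in range(3):
    x = paddle.rand([8, 4])
    loss = model(x).mean()
    loss.backward()      # checked: disable_tensor_checker() has not been called yet
    optimizer.step()     # checked for the same reason
    optimizer.clear_grad()
# Calling disable here, after backward() and step(), keeps gradient and
# weight-update operators inside the checked range.
paddle.amp.debugging.disable_tensor_checker()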
"""
# Start checking when the config's check() gate passes; otherwise stop it.
if checker_config.check():
    checker_config.run()
else:
    checker_config.end()
def disable_tensor_checker():
"""
disable_tensor_checker() disables the accuracy checking. It is used together with enable_tensor_checker(checker_config) to achieve model-level precision checking through the combination of these two APIs, checking the output Tensors of all operators within the specified range.
Attention:
* If disable_tensor_checker() is called before loss.backward(), the gradient operators are not checked;
* If disable_tensor_checker() is called before optimizer.step(), the optimizer and other weight-update related operators are not checked
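As a minimal sketch of the Attention notes above (the tensors and values are placeholders), calling disable_tensor_checker() before backward() leaves the gradient operators unchecked:

import paddle

checker_config = paddle.amp.debugging.TensorCheckerConfig(enable=True)
paddle.amp.debugging.enable_tensor_checker(checker_config)

x = paddle.to_tensor([1.0, 0.0, 3.0], stop_gradient=False)
res = paddle.pow(x, paddle.to_tensor([0.2, 0.0, 0.5]))  # forward outputs are checked here
paddle.amp.debugging.disable_tensor_checker()           # called before backward() ...
res.sum().backward()                                    # ... so gradient operators are NOT checked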