The three environment variables above respectively skip the checks of specified ops, of specified op roles, and of specified variables within an op.
##### 2.1 Skip op check
In the following settings, the former only skips the nan inf check of the mul op, while the latter skips the checks of both the mul and softmax_with_cross_entropy ops.
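A minimal sketch of the two settings, assuming the skip list is passed through the PADDLE_INF_NAN_SKIP_OP environment variable with op names separated by commas (confirm the variable name against your Paddle version):
```
# Assumed variable name; op names are comma-separated.
export PADDLE_INF_NAN_SKIP_OP="mul"                             # skip only the mul op
export PADDLE_INF_NAN_SKIP_OP="mul,softmax_with_cross_entropy"  # skip mul and softmax_with_cross_entropy
```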
`Note`: Op skipping only accepts exact matches. To skip the softmax_with_cross_entropy check, you cannot set the environment variable to softmax_with or with_cross for fuzzy matching; you must set the full name softmax_with_cross_entropy.
##### 2.2 Skip op role check
The currently accepted op role types are: forward, backward, optimize, rpc, dist, lrsched, loss, default.
In fp32 training, there is no need to skip the nan inf check for any op role.
However, in `fp16` training, inf values will be corrected during backpropagation, so it is generally necessary to skip the backward check; this is why the feature was added.
In the following settings, the former only skips the backward check, while the latter skips both the backward and optimize checks.
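A minimal sketch of the two settings, assuming the roles are passed through the PADDLE_INF_NAN_SKIP_ROLE environment variable as a comma-separated list:
```
# Assumed variable name; role names come from the accepted types listed above.
export PADDLE_INF_NAN_SKIP_ROLE="backward"           # skip only the backward check
export PADDLE_INF_NAN_SKIP_ROLE="backward,optimize"  # skip the backward and optimize checks
```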
As with op skipping, op role skipping only supports exact matching.
##### 2.3 Skip the check of variables in the specified op
In the following settings, the former skips the fc_0.tmp_0 variable in the mul op, while the latter skips the fc_0.tmp_0 and fc_0.tmp_1 variables in the mul op and the new_relative variable in the dropout op.
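A minimal sketch of the two settings, assuming the PADDLE_INF_NAN_SKIP_VAR environment variable, where an op name is separated from its variable list by a colon, variables by commas, and different ops by semicolons (confirm the exact syntax against your Paddle version):
```
# Assumed variable name and separator syntax.
export PADDLE_INF_NAN_SKIP_VAR="mul:fc_0.tmp_0"
export PADDLE_INF_NAN_SKIP_VAR="mul:fc_0.tmp_0,fc_0.tmp_1;dropout:new_relative"
```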
`Note`: When checking variables in a specified op, only exact matching is accepted for the op name, while fuzzy matching is used for variable names.
For example, the fc_0.tmp_0 and fc_0.tmp_1 variables in the mul op mentioned above can both be matched by c_0.tmp.
## <span id="test">Test</span>
You can use the [check_nan_inf_base.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/check_nan_inf_base.py) file for a trial run.
The script sets FLAGS_check_nan_inf=1 to enable the nan inf check, so executing `python check_nan_inf_base.py` is enough.
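For your own scripts, the flag can also be enabled from the shell, since Paddle picks up FLAGS_* settings from the environment:
```
# Enable the nan inf check for a single run without modifying the script.
FLAGS_check_nan_inf=1 python check_nan_inf_base.py
```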
#### 1. GPU log information
The check information on the GPU is printed from within the GPU kernel, so the nan inf information appears before the error stack.
The tool prints the name of the op and tensor in which inf or nan was found. For each CUDA block, it prints three values (a nan, an inf, and a normal number),
as well as the number of nan, inf, and normal values found in that block.
## <span id="speed">Speed</span>
Test environment: V100 32G single card, ResNet50 model, ImageNet dataset.
`The speed may differ across environments, models, and datasets. The following numbers are for reference only.`
> Without nan inf check: 307.7 images/s per card.
> With nan inf check: 250.2 images/s per card.
## <span id="printciple">Principle</span>
#### 1. Tool principle
For floating-point operations, a normal number (num), infinity (inf), and not-a-number (nan) obey the following relations.
More details can be found in [INF, NAN, and NULL](https://wiki.analytica.com/index.php?title=INF,_NAN,_and_NULL_-_Exception_values).
```
nan - nan = nan, inf - inf = nan, num - num = 0,
nan + nan = nan, inf + inf = inf, nan + 0 = nan,
inf + 0 = inf, nan + inf = nan, 0 + 0 = 0
```
Based on this, it is enough to apply the following operation to all values and then only check whether the sum is nan or inf.
```
sum = 0
for value in values:  # if any value is inf or nan, value - value is nan, so sum becomes nan
    sum += value - value
```
***`Note`: The Advanced use, Speed, and Principle sections of this document currently take effect only in the develop version of Paddle, and will be released with Paddle 1.7.
It is not recommended to use the earlier check nan inf tool on the GPU: it runs at about 0.25 images/s, which slows training down by roughly a thousand times.***