Created by: chenwhql
Required (multiple choices, two at most)
- PR type is ( B ):
  A. New features            D. Performance optimization
  B. Bug fixes               E. Breaking changes
  C. Function optimization   F. Others
- PR changes is ( D ):
  A. OPs (operators)   C. Docs
  B. APIs              D. Others
- Use one sentence to describe what this PR does.
This PR fixes the random-number usage problem in the dy2static unit test `test_bert`, and fixes the problem in https://github.com/PaddlePaddle/Paddle/pull/24692.
Optional (if none, please delete it)
- Describe what this PR does in detail. If this PR fixes an issue, please give the issue id.
The `test_bert` data reader generates different results with the same seed. This is very strange, so we suspected that the DataLoader was changing the input data. After testing, however, I found that the problem lies with `numpy.random` and Python's `random`, not with DataLoader.
- `numpy.random.seed` is not thread-safe, but `fluid.DataLoader` uses a child thread to speed up data reading, so the seed may differ between threads even if you set it with `np.random.seed`. Related docs:
- Numpy: random seed and multithreading causes differing results
- Differences between numpy.random and random.random in Python
- Python's `random` has a bug in Python 2, so we should avoid using it in unit tests. Related doc:
- For this case, we can use `numpy.random.RandomState`.
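
A minimal sketch of the `numpy.random.RandomState` approach, assuming a simple generator-style reader. The names `make_reader` and `collect_in_thread` are hypothetical and are not taken from `test_bert` or from Paddle. The sketch only illustrates that a private `RandomState` yields the same data for the same seed, even when the reader runs in a child thread and the global `numpy.random` state is disturbed in the meantime.

```python
import threading

import numpy as np


def make_reader(seed, num_samples=4, sample_len=3):
    """Hypothetical reader factory; each reader owns a private RandomState."""

    def reader():
        # Private generator: not affected by np.random.seed or by other
        # code drawing from the global numpy.random state.
        rng = np.random.RandomState(seed)
        for _ in range(num_samples):
            yield rng.randint(0, 100, size=sample_len)

    return reader


def collect_in_thread(reader):
    """Consume the reader in a child thread, as a threaded loader might."""
    samples = []

    def work():
        # Disturb the global state on purpose to imitate unrelated code
        # that uses numpy.random while the reader is running.
        np.random.seed(None)
        np.random.rand(10)
        samples.extend(sample.tolist() for sample in reader())

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    return samples


first = collect_in_thread(make_reader(seed=1))
second = collect_in_thread(make_reader(seed=1))
assert first == second  # same seed, same data, regardless of global state
```

Keeping the generator local to the reader means reproducibility no longer depends on which thread last touched the global seed.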
- If you modified docs, please make sure that both Chinese and English docs were modified and provide a preview screenshot. (Required for docs)
- Please write down other information you want to tell reviewers.