Image classification DPN model fails when running on 8 GPUs
Created by: xiegegege
On the Paddle 1.5 branch, the image classification DPN model fails when run on 8 GPUs. The error message is as follows:
Traceback (most recent call last):
  File "train.py", line 588, in <module>
    main()
  File "train.py", line 584, in main
    train(args)
  File "train.py", line 469, in train
    loss, acc1, acc5, lr = train_exe.run(fetch_list=train_fetch_list)
  File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/parallel_executor.py", line 280, in run
    return_numpy=return_numpy)
  File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/executor.py", line 665, in run
    return_numpy=return_numpy)
  File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/executor.py", line 527, in _run_parallel
    exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Enforce failed. Expected infer_next_address == next_address, but received infer_next_address:0x10126df9040 != next_address:0x1012bc4de40.
The address is not consistent. at [/ssd1/xiege/paddle_ce/Paddle/paddle/fluid/framework/details/fused_all_reduce_op_handle.cc:135]