Error when running the demo
Created by: Cristhine
Environment: Paddle 1.7.2, CUDA 9.0, cuDNN 7.5.

Running this command:

/home/vis/duyuting/app/anaconda3/bin/python -m paddle.distributed.launch --selected_gpus="0" tools/train.py -c ./configs/quick_start/ResNet50_vd.yaml

fails with:

Error: Failed to find dynamic library: libnccl.so ( /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/vis/duyuting/app/nccl_2.5.6-1+cuda10.0_x86_64/lib/libnccl.so) )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS. For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled. at (/paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:177)
[operator < gen_nccl_id > error]

This looks like an NCCL problem. I then downloaded the CUDA 9 build of NCCL from the official website, and the same command now fails with a different error:

Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.
- New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
- Recommended issue content: all error stack information
[unhandled system error] at (/paddle/paddle/fluid/operators/distributed_ops/gen_nccl_id_op.cc:162)
[operator < gen_nccl_id > error]

If I skip the distributed launcher and run:

/home/vis/duyuting/app/anaconda3/bin/python tools/train.py -c ./configs/quick_start/ResNet50_vd.yaml

it fails with:

Traceback (most recent call last):
  File "tools/train.py", line 133, in <module>
    main(args)
  File "tools/train.py", line 59, in main
    fleet.init(role)
  File "/home/vis/duyuting/app/anaconda3/lib/python3.7/site-packages/paddle/fluid/incubate/fleet/base/fleet_base.py", line 202, in init
    self._role_maker.generate_role()
  File "/home/vis/duyuting/app/anaconda3/lib/python3.7/site-packages/paddle/fluid/incubate/fleet/base/role_maker.py", line 500, in generate_role
    assert self._worker_endpoints is not None, "can't find PADDLE_TRAINER_ENDPOINTS"

Can this library really not run on a single GPU?
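
For anyone trying to reproduce the first error outside of Paddle, here is a minimal diagnostic sketch. It assumes a Linux machine and only uses the Python standard library; the helper names (`parse_version`, `glibc_version`, `try_load`) are made up for illustration and are not part of Paddle or NCCL. It compares the system glibc against the 2.14 named in the error message and then tries to load `libnccl.so` the same way the dynamic loader would, so the loader's own message is visible without running training.

```python
import ctypes
import platform


def parse_version(text):
    """Turn a dotted version such as '2.14' into a comparable tuple (2, 14)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_version():
    """Return the glibc version seen by this interpreter, or None if unknown."""
    name, version = platform.libc_ver()
    return parse_version(version) if name == "glibc" and version else None


def try_load(library):
    """Attempt to load a shared library; return (ok, message)."""
    try:
        ctypes.CDLL(library)
        return True, "loaded"
    except OSError as exc:
        # The loader's message includes the missing GLIBC symbol version, if any.
        return False, str(exc)


if __name__ == "__main__":
    required = parse_version("2.14")  # version named in the error message above
    found = glibc_version()
    print("glibc found:", found, "| required by that libnccl.so:", required)
    if found is not None and found < required:
        print("System glibc is older than what this NCCL build needs.")
    # A bare name is resolved through LD_LIBRARY_PATH and the system library paths.
    ok, message = try_load("libnccl.so")
    print("libnccl.so:", "OK" if ok else "FAILED", "-", message)
```

If the glibc check fails, changing `LD_LIBRARY_PATH` alone is unlikely to help, since the NCCL binary itself was built against a newer glibc than the system provides.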