Cluster training problem: paddle.py and conf.py cause "No module named trainer.config_parser"
Created by: shijieheping
Steps to reproduce:
In /root/pp/git/paddle/benchmark/paddle/image, the following command runs without problems:
    paddle train --job=time --config=alexnet.py --use_gpu=True --trainer_count=2 --log_period=10 --test_period=100 --config_args=batch_size=64
Copy the cluster training configuration files into that directory:
    cp /data/PaddlePaddle/git/paddle/paddle/scripts/cluster_train/paddle.py ./
    cp /data/PaddlePaddle/git/paddle/paddle/scripts/cluster_train/conf.py ./
Then run the same command again:
    paddle train --job=time --config=alexnet.py --use_gpu=True --trainer_count=2 --log_period=10 --test_period=100 --config_args=batch_size=64
This time it fails:
    [root@g1 image]# paddle train --job=time --config=alexnet.py --use_gpu=True --trainer_count=2 --log_period=10 --test_period=100 --config_args=batch_size=64
    Paddle release a new version 0.9.0, you can get the install package in http://www.paddlepaddle.org
    I0215 11:06:19.916949 38513 Util.cpp:160] commandline: /usr/local/bin/../opt/paddle/bin/paddle_trainer --job=time --config=alexnet.py --use_gpu=True --trainer_count=2 --log_period=10 --test_period=100 --config_args=batch_size=64
    F0215 11:06:22.954466 38513 PythonUtil.cpp:186] Check failed: (module) != nullptr
    Current PYTHONPATH: ['/usr/local/opt/paddle/bin', '/root/pp/git/paddle/benchmark/paddle/image', '/data/PaddlePaddle/git/paddle/benchmark/paddle/image', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/root/.local/lib/python2.7/site-packages', '/usr/lib64/python2.7/site-packages', '/usr/lib/python2.7/site-packages']
    Python Error: <type 'exceptions.ImportError'> : No module named trainer.config_parser
    Python Callstack:
    Import paddle.trainer.config_parserError
    *** Check failure stack trace: ***
    @ 0x93ab68 google::LogMessage::Fail()
    @ 0x93aac4 google::LogMessage::SendToLog()
    @ 0x93a448 google::LogMessage::Flush()
    @ 0x93d4ef google::LogMessageFatal::~LogMessageFatal()
    @ 0x87c4bb paddle::py::import()
    @ 0x87c52e paddle::callPythonFuncRetPyObj()
    @ 0x87c8fc paddle::callPythonFunc()
    @ 0x75fc0b paddle::TrainerConfigHelper::TrainerConfigHelper()
    @ 0x760244 paddle::TrainerConfigHelper::createFromFlags()
    @ 0x5b5b42 main
    @ 0x7f6eef23db35 __libc_start_main
    @ 0x5cc7f2 (unknown)
    @ (nil) (unknown)
    /usr/local/bin/paddle: line 109: 38513 Aborted ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
I checked the Python environment, and the package is present:
    /root/miniconda2/lib/python2.7/site-packages/paddle/trainer/config_parser.py
    /root/miniconda2/lib/python2.7/site-packages/paddle/trainer/config_parser.pyc
    /usr/lib/python2.7/site-packages/paddle/trainer/config_parser.py
    /usr/lib/python2.7/site-packages/paddle/trainer/config_parser.pyc
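What is the mechanism behind this? Does the paddle.py file in the working directory override the configuration? And why does the error say "No module named trainer.config_parser"?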
Operating system: CentOS 7.3. Paddle version: 0.9.0a.
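
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the PYTHONPATH printed in the log lists the working directory ahead of every site-packages directory, so a file named paddle.py there may be resolved as the top-level module `paddle` before the installed `paddle` package is ever reached. A plain .py module has no submodules, which would make `paddle.trainer.config_parser` unimportable even though the package exists on disk. The minimal sketch below reproduces that behaviour with a hypothetical package name, `demo_pkg`, in temporary directories; it does not touch a real Paddle installation, and it targets Python 3, where the error text differs slightly from the Python 2.7 message in the log.

```python
import importlib
import os
import sys
import tempfile


def try_import(search_path):
    """Import demo_pkg.trainer.config_parser with search_path placed first on sys.path."""
    # Forget earlier imports so every call resolves the name from scratch.
    for name in list(sys.modules):
        if name == "demo_pkg" or name.startswith("demo_pkg."):
            del sys.modules[name]
    importlib.invalidate_caches()
    saved = list(sys.path)
    sys.path[:0] = search_path
    try:
        importlib.import_module("demo_pkg.trainer.config_parser")
        return "ok"
    except ImportError as exc:
        return "ImportError: %s" % exc
    finally:
        sys.path[:] = saved


root = tempfile.mkdtemp()
site_dir = os.path.join(root, "site-packages")  # stands in for the install location
work_dir = os.path.join(root, "workdir")        # stands in for the benchmark directory

# A real package: demo_pkg/trainer/config_parser.py (hypothetical stand-in for paddle/).
os.makedirs(os.path.join(site_dir, "demo_pkg", "trainer"))
for rel in ("demo_pkg/__init__.py",
            "demo_pkg/trainer/__init__.py",
            "demo_pkg/trainer/config_parser.py"):
    open(os.path.join(site_dir, *rel.split("/")), "w").close()

# A plain script with the same top-level name (stand-in for the copied paddle.py).
os.makedirs(work_dir)
open(os.path.join(work_dir, "demo_pkg.py"), "w").close()

without_shadow = try_import([site_dir])
with_shadow = try_import([work_dir, site_dir])
print("package only:        ", without_shadow)
print("script earlier in path:", with_shadow)
```

If this is indeed the cause, running the training command from a directory that does not contain paddle.py, or keeping the cluster_train scripts in their own directory, should avoid the name collision.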