Cluster training error: Import paddle.trainer.config_parser ERROR
Created by: Lionyan
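Steps:
- Used the "one-click install" to install paddle successfully on each of three CentOS 6.3 machines, then ran the demo/recommendation example in single-machine mode: it ran successfully.
2. Tried cluster training with one machine as the master and the other two machines as nodes:
2.1 On the master, copied all files under demo/recommendation to /home/paddle/root
2.2 On the master, copied the three scripts under paddle/scripts/cluster_train/ to /home/paddle/root
2.3 Modified /home/paddle/root/conf.py:
HOSTS = [ "root@", "root@", ]
ROOT_DIR = "/home/paddle/root"
Nothing else in conf.py was changed apart from these two settings.
2.4 Ran pip install fabric on all three machines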
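The HOSTS entries in step 2.3 are shown with the host part elided ("root@"). As a sanity check before launching, a minimal sketch that flags entries not of the form user@host could look like the following. `invalid_hosts` is a hypothetical helper, not part of the Paddle cluster scripts, and the addresses are placeholders from the documentation range, not the machines in this post.

```python
def invalid_hosts(hosts):
    """Return the entries that are not of the form user@host."""
    bad = []
    for entry in hosts:
        user, sep, host = entry.partition("@")
        # An entry needs a user, the "@" separator, and a non-empty host.
        if not (user and sep and host.strip()):
            bad.append(entry)
    return bad


# Placeholder values for illustration only (assumption).
HOSTS = ["root@192.0.2.10", "root@192.0.2.11"]
print(invalid_hosts(HOSTS))
```

Any entry the function returns would be worth fixing in conf.py before running the cluster scripts.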
- On the master, ran sh run from the /home/paddle/root directory
- Checked the logs on the nodes:
train.log:
/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/site-packages/Crypto/Util/number.py:57: PowmInsecureWarning: Not using mpz_powm_sec. You should rebuild using libgmp >= 5 to avoid timing attack vulnerability.
_warn("Not using mpz_powm_sec. You should rebuild using libgmp >= 5 to avoid timing attack vulnerability.", PowmInsecureWarning)
F1110 16:10:30.047618 29776 PythonUtil.cpp:191] Check failed: (module) != nullptr Current PYTHONPATH: ['/home/paddle/paddle_internal_release_tools/idl/paddle/output/opt/paddle/bin', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/site-packages/setuptools-18.2-py2.7.egg', '/home/paddle/root/JOB20161110161019', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python27.zip', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/plat-linux2', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/lib-tk', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/lib-old', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/lib-dynload', '/home/paddle/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/site-packages']
Python Error: * Check failure stack trace: *
@ 0x13a2218 google::LogMessage::Fail()
@ 0x13a2170 google::LogMessage::SendToLog()
@ 0x13a1c05 google::LogMessage::Flush()
@ 0x13a49c6 google::LogMessageFatal::~LogMessageFatal()
@ 0x808742 paddle::py::import()
@ 0x8087fc paddle::callPythonFuncRetPyObj()
@ 0x808bf1 paddle::callPythonFunc()
@ 0x6fde2e paddle::TrainerConfigHelper::TrainerConfigHelper()
@ 0x6fe48d paddle::TrainerConfigHelper::createFromFlags()
@ 0x58c77c main
@ 0x7f12b6c33bd5 __libc_start_main
@ 0x597d85 (unknown)
/home/paddle/paddle_internal_release_tools/idl/paddle/output/bin/paddle: line 81: 29776 Aborted ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
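The fatal check in train.log is raised from paddle::py::import(), which matches the title: the embedded interpreter could not import paddle.trainer.config_parser. The log does not show the underlying Python exception, though. A minimal sketch for surfacing it, assuming it is run on a node with the same Python interpreter that Paddle uses (the python27-gcc482 one in the paths above), is below. `try_import` is a hypothetical helper name, not part of Paddle.

```python
import importlib
import traceback


def try_import(module_name):
    """Try to import a module; return (ok, traceback_text)."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except Exception:
        # Keep the full traceback so the real cause is visible.
        return False, traceback.format_exc()


ok, detail = try_import("paddle.trainer.config_parser")
print("import ok" if ok else detail)
```

If the import fails here too, the printed traceback should say which module or dependency is missing on that node, which the C++ stack trace alone does not reveal.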
server.log:
F1110 16:10:24.771934 28292 LightNetwork.cpp:68] Check failed: ioctl(sock, 0x8915, &ifr) >= 0 (-1 vs. 0)
* Check failure stack trace: *
@ 0x1361288 google::LogMessage::Fail()
@ 0x13611e0 google::LogMessage::SendToLog()
@ 0x1360c75 google::LogMessage::Flush()
@ 0x1363a36 google::LogMessageFatal::~LogMessageFatal()
@ 0x6dd2aa paddle::getIpAddr()
@ 0x57bcaf main
@ 0x7f684d97bbd5 __libc_start_main
@ 0x586579 (unknown)
/home/paddle/paddle_internal_release_tools/idl/paddle/output/bin/paddle: line 81: 28292 Aborted ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_pserver_main ${@:2}

Please help!
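For the server.log failure: on Linux, ioctl request 0x8915 is SIOCGIFADDR, which asks for the IPv4 address of a named network interface. The check in paddle::getIpAddr() fails when that request returns an error, for example when the queried interface does not exist or has no IPv4 address. Which interface name Paddle queries is not shown in the log, so the sketch below takes it as a parameter. `get_ipv4` is a hypothetical helper that issues the same ioctl from Python (Linux only).

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl request: get an interface's IPv4 address


def get_ipv4(ifname):
    """Return the IPv4 address of `ifname`, or None if the ioctl fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # struct ifreq starts with the interface name (at most 15 chars + NUL).
        ifreq = struct.pack("256s", ifname.encode("ascii")[:15])
        result = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, ifreq)
    except (IOError, OSError):
        return None
    finally:
        sock.close()
    # The sockaddr_in follows the 16-byte name; its address is bytes 20..24.
    return socket.inet_ntoa(result[20:24])


for name in ("eth0", "lo"):
    print(name, get_ipv4(name))
```

Running this on each node with the interface names reported by the system shows which ones actually answer the request; an interface that prints None would trigger the same failed check.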