Help needed: cluster_train cluster prediction job fails with an error
Created by: HugoLian
After I submit cluster_run.sh, the system reports the following error:
connect to receiver: yqxx-idl-gpu-offlinexxxx.yqxxx.baidu.com:92xx (some details redacted)
compressing thirdparty files
finished to pack request
starting to submit to server
Traceback (most recent call last):
File "/home/iknow/lianjie/dnn/paddle_cluster/output/submit.py", line 288, in <module>
no_prefix_train_args_dict, ).run()
File "/home/iknow/lianjie/dnn/paddle_cluster/output/submit.py", line 73, in run
self._do_poster_request()
File "/home/iknow/lianjie/dnn/paddle_cluster/output/submit.py", line 162, in _do_poster_request
print urllib2.urlopen(request).read()
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "build/bdist.linux-x86_64/egg/poster/streaminghttp.py", line 142, in http_open
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/urllib2.py", line 1200, in do_open
r = h.getresponse(buffering=True)
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/httplib.py", line 1132, in getresponse
response.begin()
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/home/iknow/lianjie/dnn/paddle_internal_release_tools/idl/paddle/output/python27-gcc482/lib/python2.7/httplib.py", line 417, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
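For context on the last line of the traceback: httplib raises BadStatusLine with an empty string when the peer closes the connection before sending any HTTP status line. The snippet below is only a minimal sketch of that situation, not a diagnosis of this job. It is written for Python 3, where httplib is named http.client and the empty-response case surfaces as RemoteDisconnected, a subclass of BadStatusLine. The local test server and the function names are hypothetical and are not part of submit.py.

```python
import http.client
import socket
import threading


def serve_and_hang_up(server_sock):
    """Accept one connection, read the request headers, then close silently."""
    try:
        conn, _ = server_sock.accept()
    except OSError:
        return  # nobody connected before the timeout
    conn.settimeout(5)
    data = b""
    try:
        # Read until the end of the HTTP headers so nothing is left unread.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    except OSError:
        pass
    conn.close()  # hang up without ever sending a status line


def reproduce_bad_status_line():
    """Return the exception raised when the server replies with nothing."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.settimeout(5)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_and_hang_up, args=(server_sock,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join()
        server_sock.close()
    return None


if __name__ == "__main__":
    print(repr(reproduce_bad_status_line()))
```

If the same pattern applies here, the request reached some process on the receiver port, but that process closed the connection without answering. That could mean the port is not an HTTP receiver, or that the receiver rejected or crashed on the request; the traceback alone does not say which.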
cluster_run.sh is configured as follows:
model_path=hdfs://nj01-nanxxxxxx.xxxxx.baidu.com:5xxxx/app/ns/iknow/spam/lianjie/paddle/model_lstm/pass-00003
HDFS_output_path=hdfs://nj01-nanxxxxxx.xxxxx.baidu.com:5xxxx/app/ns/iknow/spam/lianjie/paddle/output
HDFS_input_path=hdfs://nj01-nanxxxxxx.xxxxx.baidu.com:5xxxx/app/ns/iknow/spam/lianjie/paddle/input
paddle cluster_train \
--config cluster_config_lstm.py \
--use_gpu false \
--time_limit 00:30:00 \
--submitter hugolian \
--num_nodes 4 \
--job_priority high \
--trainer_count 4 \
--init_model_path ${model_path} \
--test_data_path ${HDFS_input_path} \
--log_period 100 \
--dot_period 10 \
--saving_period 1 \
--config_args is_predict=1 \
--where nmg01-idl-dl-cpu-10G_cluster \
--job_name paddle_platform_HighRisk_lstm \
--thirdparty thirdparty \
--output_path ${HDFS_output_path}
A screenshot of the main part of cluster_config_lstm.py:
The receiver configuration (local_config.py) is: receivers = ["yqxx-idl-gpu-offlinexxxxx.baidu.com:9290", "yqxx-idl-gpu-offlinexxxxx.baidu.com:9295"] (I picked these two lines from the demo more or less at random)
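Since the receivers were copied from the demo, one quick check is whether each host:port entry accepts a TCP connection at all from the submitting machine. The following is a rough sketch under that assumption; split_receiver and probe_receivers are hypothetical helper names, and the hosts in the usage line are placeholders because the real ones are redacted above. A successful connect only shows that the port is reachable, not that a working receiver is listening on it.

```python
import socket


def split_receiver(receiver):
    """Split a "host:port" string into a (host, port) tuple."""
    host, _, port = receiver.rpartition(":")
    return host, int(port)


def probe_receivers(receivers, timeout=3.0):
    """Map each receiver string to True if a TCP connect succeeds, else False."""
    results = {}
    for receiver in receivers:
        host, port = split_receiver(receiver)
        try:
            # create_connection resolves the host name and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                results[receiver] = True
        except OSError:
            # Covers refused connections, timeouts and DNS failures.
            results[receiver] = False
    return results


if __name__ == "__main__":
    # Placeholder entries; substitute the list from local_config.py.
    print(probe_receivers(["receiver-a.example:9290", "receiver-b.example:9295"]))
```

An entry that maps to False points to a network or host-name problem. An entry that maps to True while the submit still fails with BadStatusLine suggests the problem lies in what is listening on that port or in the request itself.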
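What is causing this error? Are the arguments being passed incorrectly, is a file path wrong, or is it a communication error? I would really appreciate it if someone could take a look!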