mpirun noticed that process rank 140 with PID 363 on node 10.87.102.19 exited on signal 9 (Killed).
Created by: Emma-Ding
The error output is below. What is causing this? Thanks. id: app-user-20180906172439-7725
Thu Sep 6 17:32:25 2018[1,115]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,115]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,81]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,81]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,95]<stderr>:+ export LD_LIBRARY_PATH=/usr/local/lib:./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,95]<stderr>:+ LD_LIBRARY_PATH=/usr/local/lib:./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,16]<stderr>:+ export LD_LIBRARY_PATH=/usr/local/lib:./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,55]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,55]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,91]<stderr>:+ LD_LIBRARY_PATH=/usr/local/lib:./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,61]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,61]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,147]<stderr>:+ source ./server.env
Thu Sep 6 17:32:25 2018[1,125]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,125]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,95]<stderr>:+ GLOG_logtostderr=0
Thu Sep 6 17:32:25 2018[1,95]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,16]<stderr>:+ LD_LIBRARY_PATH=/usr/local/lib:./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,91]<stderr>:+ GLOG_logtostderr=0
Thu Sep 6 17:32:25 2018[1,91]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,147]<stderr>:++ export PYTHONPATH=:thirdparty/thirdparty
Thu Sep 6 17:32:25 2018[1,147]<stderr>:++ PYTHONPATH=:thirdparty/thirdparty
Thu Sep 6 17:32:25 2018[1,95]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,16]<stderr>:+ GLOG_logtostderr=0
Thu Sep 6 17:32:25 2018[1,91]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,147]<stderr>:++ export LD_LIBRARY_PATH=./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,147]<stderr>:++ LD_LIBRARY_PATH=./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,16]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,16]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
Thu Sep 6 17:32:25 2018[1,147]<stderr>:+ '[' -z tcp ']'
Thu Sep 6 17:32:25 2018[1,147]<stderr>:+ export LD_LIBRARY_PATH=/usr/local/lib:./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,147]<stderr>:+ LD_LIBRARY_PATH=/usr/local/lib:./python27-gcc482/lib::/usr/local/openmpi/lib:/home/normandy/nma/tools/hadoop-client/hadoop/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhce/lib:/home/normandy/nma/tools/hadoop-client/hadoop/libhdfs:/home/normandy/nma/tools/hadoop-client/hadoop/../java6/jre/lib/amd64/server
Thu Sep 6 17:32:25 2018[1,147]<stderr>:+ GLOG_logtostderr=0
Thu Sep 6 17:32:25 2018[1,147]<stderr>:+ GLOG_log_dir=./log
Thu Sep 6 17:32:25 2018[1,147]<stderr>:+ ./paddle_pserver2 --num_gradient_servers=150 --nics=xgbe0 --port=03538 --ports_num=1 --ports_num_for_sparse=1 --rdma_tcp=tcp --comment=paddle_cluster_job
--------------------------------------------------------------------------
mpirun noticed that process rank 140 with PID 363 on node 10.87.102.19 exited on signal 9 (Killed).
--------------------------------------------------------------------------
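
For anyone triaging this, signal 9 is SIGKILL, which a process cannot catch or handle, so the killed paddle_pserver2 process would not have had a chance to log a reason itself, and the log above does not show who sent the signal. A minimal sketch for pulling the rank, PID, and node out of the mpirun message, so the right machine can be inspected, might look like the following. The function name `parse_mpirun_exit` and the regular expression are illustrative assumptions written against the message format shown above; they are not part of Open MPI or PaddlePaddle.

```python
import re
import signal

# Pattern written against the mpirun message in this post (an assumption,
# not an official Open MPI format specification).
MPIRUN_EXIT = re.compile(
    r"process rank (?P<rank>\d+) with PID (?P<pid>\d+) "
    r"on node (?P<node>\S+) exited on signal (?P<signo>\d+)"
)


def parse_mpirun_exit(line):
    """Return rank, PID, node, and signal name from an mpirun exit message,
    or None if the line does not match."""
    match = MPIRUN_EXIT.search(line)
    if match is None:
        return None
    signo = int(match.group("signo"))
    try:
        # Map the number to a symbolic name such as SIGKILL.
        sig_name = signal.Signals(signo).name
    except ValueError:
        # Signal number not known on this platform; keep the raw number.
        sig_name = str(signo)
    return {
        "rank": int(match.group("rank")),
        "pid": int(match.group("pid")),
        "node": match.group("node"),
        "signal": sig_name,
    }


line = (
    "mpirun noticed that process rank 140 with PID 363 "
    "on node 10.87.102.19 exited on signal 9 (Killed)."
)
info = parse_mpirun_exit(line)
print(info)
```

The node address it returns (10.87.102.19 here) is where system-level logs would need to be checked; the job log alone only records that the process was killed.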