"Operation not permitted" when running a model on the IDL cluster
Created by: zhouyuan0828
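I am running a job on the IDL cluster and it fails with the error below. Does this mean some permission is missing? The model output is empty. The job is: http://nmg01-hpc-controller.nmg01.baidu.com:8090/job/i-205638/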
[zhouyuan@nj02-cm-as-cache04.nj02.baidu.com hadoop-client-pfs]$ ./hadoop-mulan/bin/hadoop fs -ls /app/idl/idl-dl/paddle/zhouyuan/seqToseq/output_20161202161141/output
Found 2 items
drwxr-xr-x 3 paddle_demo paddle_demo 0 2016-12-02 16:13 /app/idl/idl-dl/paddle/zhouyuan/seqToseq/output_20161202161141/output/rank-00000
drwxr-xr-x 3 paddle_demo paddle_demo 0 2016-12-02 16:13 /app/idl/idl-dl/paddle/zhouyuan/seqToseq/output_20161202161141/output/rank-0000
The error log is as follows:
[12-02 16:13:31] [0] + '[' wangyanfei01@baidu.com '!=' '' ']'
[12-02 16:13:31] [0] + check_return 'run check slow node daemon failed'
[12-02 16:13:31] [0] + '[' 0 -ne 0 ']'
[12-02 16:13:31] [0] + python ./checkslownode.py job.205739.instances.nmg01-hpc-w0162.nmg01.baidu.com wangyanfei01@baidu.com
[12-02 16:13:31] [0] supervise.rtlc: no process killed
[12-02 16:13:31] [0] rtlc: no process killed
[12-02 16:13:31] [0] ./load.sh: line 5: ulimit: core file size: cannot modify limit: Operation not permitted
[12-02 16:13:32] [0] + check_return 'mpirun train.sh failed'
[12-02 16:13:32] [0] + '[' 1 -ne 0 ']'
[12-02 16:13:32] [0] + echo '[job.sh : 122] [main]'
[12-02 16:13:32] [0] [job.sh : 122] [main]
[12-02 16:13:32] [0] + echo '[FATAL]: mpirun train.sh failed'
[12-02 16:13:32] [0] [FATAL]: mpirun train.sh failed
[12-02 16:13:32] [0] + get_stack
[12-02 16:13:32] [0] + set +x
[12-02 16:13:32] [0]
[12-02 16:13:32] [0] *Shell Script Stack Trace
[12-02 16:13:32] [0] @: [./log.sh: 55] check_return
[12-02 16:13:32] [0] @: [job.sh: 122] main
[12-02 16:13:32] [0]
[12-02 16:13:32] [0] + exit 1
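The `ulimit: core file size: cannot modify limit: Operation not permitted` line is what bash prints when an unprivileged process tries to raise the hard limit for core-file size above its current value. Without `-S` or `-H`, `ulimit -c` sets both the soft and the hard limit. The contents of `load.sh` are not shown, so this assumes line 5 is something like `ulimit -c unlimited`. Below is a minimal Python sketch for checking, on the same compute node, whether such a call could succeed. The helper name `core_limit_can_be_raised_to` is hypothetical. The sketch only inspects the limits; it cannot tell whether this warning is what caused `mpirun train.sh failed`.

```python
import resource


def core_limit_can_be_raised_to(target):
    """Hypothetical helper: return True if an unprivileged process could set
    both the soft and hard RLIMIT_CORE limits to `target` (a raw rlimit value
    in bytes, or resource.RLIM_INFINITY)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if hard == resource.RLIM_INFINITY:
        # No hard cap: any value, including "unlimited", is accepted.
        return True
    if target == resource.RLIM_INFINITY:
        # Raising a finite hard limit to unlimited needs privilege (EPERM).
        return False
    # Keeping or lowering the hard limit never needs privilege.
    return target <= hard


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("RLIMIT_CORE soft=%s hard=%s" % (soft, hard))
    print("unlimited core size allowed:",
          core_limit_can_be_raised_to(resource.RLIM_INFINITY))
```

If the last line prints `False` on the node, the sketch suggests that an `ulimit -c unlimited` in `load.sh` would fail with exactly this message. The script would then need to use a value no larger than the reported hard limit, or skip the call.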