Paddle在集群上训练,网络结构构建好后,似乎在初始化过程中,一直报这种错误。
Created by: CruiseSun
Mon Oct 8 11:12:55 2018[1,20]:./train.sh: line 239: 62525 Segmentation fault (core dumped) python27-gcc482/bin/python conf/trainer_config.conf Mon Oct 8 11:12:55 2018[1,20]:+ '[' 139 -ne 0 ']' Mon Oct 8 11:12:55 2018[1,20]:+ kill_pserver2_exit Mon Oct 8 11:12:55 2018[1,20]:+ ps aux Mon Oct 8 11:12:55 2018[1,20]:+ grep paddle_pserver2 Mon Oct 8 11:12:55 2018[1,20]:+ grep paddle_cluster_job Mon Oct 8 11:12:55 2018[1,20]:+ grep -v grep Mon Oct 8 11:12:55 2018[1,20]:+ cut -c10-14 Mon Oct 8 11:12:55 2018[1,20]:+ xargs kill -9 Mon Oct 8 11:12:55 2018[1,20]:usage: kill [ -s signal | -p ] [ -a ] pid ... Mon Oct 8 11:12:55 2018[1,20]: kill -l [ signal ] Mon Oct 8 11:12:55 2018[1,20]:+ log_fatal 'paddle_trainer failed kill paddle_pserver2 and exit' Mon Oct 8 11:12:55 2018[1,20]:+ echo '[./common.sh : 399] [kill_pserver2_exit]' Mon Oct 8 11:12:55 2018[1,20]:[./common.sh : 399] [kill_pserver2_exit] Mon Oct 8 11:12:55 2018[1,20]:+ echo '[FATAL]: paddle_trainer failed kill paddle_pserver2 and exit' Mon Oct 8 11:12:55 2018[1,20]:[FATAL]: paddle_trainer failed kill paddle_pserver2 and exit Mon Oct 8 11:12:55 2018[1,20]:+ get_stack Mon Oct 8 11:12:55 2018[1,20]:+ set +x Mon Oct 8 11:12:55 2018[1,20]: Mon Oct 8 11:12:55 2018[1,20]:*Shell Script Stack Trace Mon Oct 8 11:12:55 2018[1,20]: @: [./log.sh: 41] log_fatal Mon Oct 8 11:12:55 2018[1,20]: @: [./common.sh: 399] kill_pserver2_exit Mon Oct 8 11:12:55 2018[1,20]: @: [./train.sh: 242] main Mon Oct 8 11:12:55 2018[1,20]: Mon Oct 8 11:12:55 2018[1,20]:+ exit 1