Paddle cluster training job fails with a slightly larger dataset
Created by: zhangyuzhen1990
Error message:
Mon Sep 11 21:04:53 2017[1,54]:Pass 0, Batch 0, Cost 0.911084, {'auc_evaluator_0': 0.3861544728279114, 'classification_error_evaluator': 0.5600000023841858}
Mon Sep 11 21:05:29 2017[1,36]:*** Error in `python27-gcc482/bin/python': corrupted double-linked list: 0x0000000011f4c9c0 ***
Tue Sep 12 11:12:34 2017[1,0]:*** Aborted at 1505185954 (unix time) try "date -d @1505185954" if you are using GNU date ***
Error description: Training on 540,000+ samples (108 part files) fails with the error shown above. Training on a single part (a small amount of data) runs correctly both on the cluster and on a single machine.