paddlecloud mpi fleet distributed training: optimizer raises an error
Created by: maosengshulei
- Version and environment info: 1) PaddlePaddle version: paddlepaddle 1.5.2
Code:
total_loss, ctr_loss, dura_loss, auc_var, batch_auc_var, accuracy, ctr_predict, dura_predict, data_list = DNN(args, feat_list.features)
strategy = DistributeTranspilerConfig()
strategy.sync_mode = False
optimizer = fluid.optimizer.Adam(learning_rate=1e-4)
optimizer = fleet.distributed_optimizer(optimizer, strategy)
optimizer.minimize(total_loss)
Error:
Traceback (most recent call last):
Thu Oct 31 15:25:02 2019[1,4]<stdout>:   File "cluster_dnn_train.py", line 182, in <module>
Thu Oct 31 15:25:02 2019[1,4]<stdout>:     train()
Thu Oct 31 15:25:02 2019[1,4]<stdout>:   File "cluster_dnn_train.py", line 131, in train
Thu Oct 31 15:25:02 2019[1,4]<stdout>:     optimizer.minimize(total_loss)
Thu Oct 31 15:25:02 2019[1,4]<stdout>:   File "/home/disk1/task_data/history/20191031/12.app-user-20191031151954-36721--shulei_msd_mmoe_dnn_v1_20191031_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/incubate/fleet/parameter_server/distribute_transpiler/__init__.py", line 345, in minimize
Thu Oct 31 15:25:02 2019[1,4]<stdout>:     fleet._transpile(config=self._strategy)
Thu Oct 31 15:25:02 2019[1,4]<stdout>:   File "/home/disk1/task_data/history/20191031/12.app-user-20191031151954-36721--shulei_msd_mmoe_dnn_v1_20191031_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/incubate/fleet/parameter_server/distribute_transpiler/__init__.py", line 237, in _transpile
Thu Oct 31 15:25:02 2019[1,4]<stdout>:     current_endpoint=self.server_endpoints()[self.server_index()])
Thu Oct 31 15:25:02 2019[1,4]<stdout>:   File "/home/disk1/task_data/history/20191031/12.app-user-20191031151954-36721--shulei_msd_mmoe_dnn_v1_20191031_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 620, in transpile
Thu Oct 31 15:25:02 2019[1,4]<stdout>:     program, param_varname, height_sections, eps, table_names)
Thu Oct 31 15:25:02 2019[1,4]<stdout>:   File "/home/disk1/task_data/history/20191031/12.app-user-20191031151954-36721--shulei_msd_mmoe_dnn_v1_20191031_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 325, in _update_remote_sparse_update_op
Thu Oct 31 15:25:02 2019[1,4]<stdout>:     if param_varname in op.input_arg_names and op_type == "":
Thu Oct 31 15:25:02 2019[1,4]<stdout>:   File "/home/disk1/task_data/history/20191031/12.app-user-20191031151954-36721--shulei_msd_mmoe_dnn_v1_20191031_paddlecloud/logs/workspace/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1245, in input_arg_names
Thu Oct 31 15:25:02 2019[1,4]<stdout>:     return self.desc.input_arg_names()
Thu Oct 31 15:25:02 2019[1,4]<stdout>:ValueError: vector::_M_range_insert