F1115 grpc_serde.cc:107: AppendZeroCopy varname:embedding_w@GRAD, vlen:
Created by: webary
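- Version and environment: 1) PaddlePaddle version: v1.4.1
- Training info: 1) Multi-machine
- Reproduction: with the Adagrad optimizer, the error occurs as soon as the embedding uses sparse updates (is_sparse=True).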
Fri Nov 15 18:09:25 2019[1,0]:F1115 18:09:25.348170 2010 grpc_serde.cc:107] AppendZeroCopy varname:embedding_w@GRAD, vlen:5213934592
Fri Nov 15 18:09:25 2019[1,0]:*** Check failure stack trace: ***
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e76ba4c0d google::LogMessage::Fail()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e76ba86bc google::LogMessage::SendToLog()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e76ba4733 google::LogMessage::Flush()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e76ba9bce google::LogMessageFatal::~LogMessageFatal()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e777bd44a paddle::operators::distributed::SerializeToByteBuffer()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e777aa3cc _ZNSt17_Function_handlerIFSt10unique_ptrIN6paddle8platform13EnforceNotMetESt14default_deleteIS3_EEvESt17reference_wrapperISt12_Bind_simpleIFS8_IZNS1_9framework10ThreadPool18RunAndGetExceptionIZNS1_9operators11distributed10GRPCClient12AsyncSendVarERKSsRKNS2_13DeviceContextERKNSA_5ScopeESH_lEUlvE_EESt6futureIS6_ET_EUlvE_EvEEEE9_M_invokeERKSt9_Any_data
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e7738ce0a std::_Function_handler<>::_M_invoke()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e76b44ac7 std::__future_base::_State_base::_M_do_set()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0eebb70973 __GI___pthread_once
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e777ada74 _ZNSt13__future_base11_Task_stateIZN6paddle9framework10ThreadPool18RunAndGetExceptionIZNS1_9operators11distributed10GRPCClient12AsyncSendVarERKSsRKNS1_8platform13DeviceContextERKNS2_5ScopeES9_lEUlvE_EESt6futureISt10unique_ptrINSA_13EnforceNotMetESt14default_deleteISK_EEET_EUlvE_SaIiEFSN_vEE6_M_runEv
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0e77c62db9 paddle::framework::ThreadPool::TaskLoop()
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0ee38058a0 execute_native_thread_routine
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0eebb6b1c3 start_thread
Fri Nov 15 18:09:25 2019[1,0]: @ 0x7f0eeb19312d __clone
Fri Nov 15 18:09:25 2019[1,0]: @ (nil) (unknown)
Fri Nov 15 18:11:50 2019[1,0]:.//paddle/start_trainer.sh: line 176: 1906 Aborted (core dumped) python train.py --emb_dict_size=800000 --num_classes=7 --name=age_v1.0.2 --num_passes=200 --lr=0.01 --patience=7 --batch_size=128 --hidden_layer=128 --optimizer=Adagrad --use_cuda=0 --data_fmt=tab3 --is_binary=1
Fri Nov 15 18:11:50 2019[1,0]:[INFO]: exit 134
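
For scale, the `vlen` in the fatal log line is the byte size of the `embedding_w@GRAD` payload being serialized for sending. The sketch below only does the arithmetic on that number. It assumes the serializer rejects payloads larger than a signed 32-bit integer can hold; that limit is a guess, not something this report confirms, and `INT32_MAX` and `exceeds_limit` are hypothetical names used for illustration.

```python
# Payload size in bytes, copied from the fatal log line (vlen:5213934592).
vlen = 5213934592

# ASSUMPTION: the size limit is the largest signed 32-bit integer.
# This is not confirmed by the log; it is only a plausible threshold.
INT32_MAX = 2**31 - 1


def exceeds_limit(nbytes, limit=INT32_MAX):
    """Return True if a payload of nbytes is larger than the assumed limit."""
    return nbytes > limit


def to_gib(nbytes):
    """Convert a byte count to GiB, rounded to two decimals."""
    return round(nbytes / 2**30, 2)


if __name__ == "__main__":
    print("payload size (GiB):", to_gib(vlen))
    print("exceeds assumed limit:", exceeds_limit(vlen))
```

If that assumption holds, a single gradient variable of this size would be rejected regardless of the optimizer, so the question becomes why the gradient sent with is_sparse=True is this large.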