The clip op fails in multi-machine (distributed) training
Created by: ccmeteorljh
paddle version: 1.3.0
Model to reproduce: a multi-machine (distributed) version of deep_attention_matching: https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/deep_attention_matching_net/train_and_evaluate.py

The error is as follows:

```
get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
I0321 13:13:49.156570 62757 grpc_server.cc:430] Server listening on 127.0.0.1:9122 selected port: 9122
F0321 13:14:45.583297 63121 listen_and_serv_op.cc:74] run sub program:60 error Invoke operator clip error.
Python Callstacks:
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1317, in append_op
attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 1928, in _append_pserver_non_opt_ops
attrs=opt_op.all_attrs())
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 775, in __append_optimize_op__
self._append_pserver_non_opt_ops(block, op)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/transpiler/distribute_transpiler.py", line 845, in get_pserver_program
lr_ops)
File "/home/work/ljh_test/baidu/paddle/test/cts_test/dist_base.py", line 72, in run_pserver
pserver_prog = t.get_pserver_program(current_endpoint)
File "/home/work/ljh_test/baidu/paddle/test/cts_test/dist_base.py", line 390, in runtime_main
model.run_pserver(endpoints, trainers, current_endpoint, trainer_id, run_params)
File "dist_deep_attention_matching.py", line 175, in <module>
runtime_main(TestDistDeepAttentionMatching)
C++ Callstacks:
holder_ should not be null
Tensor holds no memory. Call Tensor::mutable_data first. at [/paddle/paddle/fluid/framework/tensor.cc:23]
PaddlePaddle Call Stacks:
0 0x7faf69e5ce2dp void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 365
1 0x7faf69e5d177p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2 0x7faf6b8ca9a6p paddle::framework::Tensor::check_memory_size() const + 182
3 0x7faf6a040b8ap paddle::operators::ClipKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 858
4 0x7faf6a041283p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::ClipKernel<paddle::platform::CPUDeviceContext, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 35
5 0x7faf6b86dbb3p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 659
6 0x7faf6b86b425p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 341
7 0x7faf69f7c33ap paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 218
8 0x7faf6aa2c692p
9 0x7faf6aa3394ap std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::EnforceNotMet, std::default_delete<paddle::platform::EnforceNotMet> > > >::_M_invoke(std::_Any_data const&) + 42
10 0x7faf6a9f65a7p std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) + 39
11 0x7fafd3abaa99p
12 0x7faf6aa2b7d2p
13 0x7faf6b8c32c9p paddle::framework::ThreadPool::TaskLoop() + 1689
14 0x7faf850f57e0p
15 0x7fafd3ab36bap
16 0x7fafd37e941dp clone + 109
*** Check failure stack trace: ***
@ 0x7faf69f3731d google::LogMessage::Fail()
@ 0x7faf69f3adcc google::LogMessage::SendToLog()
@ 0x7faf69f36e43 google::LogMessage::Flush()
@ 0x7faf69f3c2de google::LogMessageFatal::~LogMessageFatal()
@ 0x7faf6aa2c730 _ZNSt17_Function_handlerIFSt10unique_ptrIN6paddle8platform13EnforceNotMetESt14default_deleteIS3_EEvESt17reference_wrapperISt12_Bind_simpleIFS8_IZNS1_9framework10ThreadPool18RunAndGetExceptionIZNS1_9operatorsL21ParallelExecuteBlocksERKSt6vectorImSaImEEPNSA_8ExecutorERKSE_ISt10shared_ptrINSA_22ExecutorPrepareContextEESaISN_EEPNSA_11ProgramDescEPNSA_5ScopeEEUlvE_EESt6futureIS6_ET_EUlvE_EvEEEE9_M_invokeERKSt9_Any_data
@ 0x7faf6aa3394a std::_Function_handler<>::_M_invoke()
@ 0x7faf6a9f65a7 std::__future_base::_State_base::_M_do_set()
@ 0x7fafd3abaa99 __pthread_once_slow
@ 0x7faf6aa2b7d2 _ZNSt13__future_base11_Task_stateIZN6paddle9framework10ThreadPool18RunAndGetExceptionIZNS1_9operatorsL21ParallelExecuteBlocksERKSt6vectorImSaImEEPNS2_8ExecutorERKS6_ISt10shared_ptrINS2_22ExecutorPrepareContextEESaISF_EEPNS2_11ProgramDescEPNS2_5ScopeEEUlvE_EESt6futureISt10unique_ptrINS1_8platform13EnforceNotMetESt14default_deleteISS_EEET_EUlvE_SaIiEFSV_vEE6_M_runEv
@ 0x7faf6b8c32c9 paddle::framework::ThreadPool::TaskLoop()
@ 0x7faf850f57e0 execute_native_thread_routine
@ 0x7fafd3ab36ba start_thread
@ 0x7fafd37e941d clone
@ (nil) (unknown)
```
**After removing the following clip-related code, the program runs correctly:**
```python
fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByValue(
    max=1.0, min=-1.0))
```