Core dump at pass 30 with learning rate 0.01, and at pass 15 with learning rate 0.02
Created by: znss2012
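On an MPI cluster, with the learning rate set to 0.01 the model core dumps when training reaches pass 30; with the learning rate set to 0.02 it core dumps at pass 15. Could you please take a look when you have time?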
Sun Nov 12 21:02:14 2017[1,9]:*** SIGSEGV (@0x8) received by PID 44133 (TID 0x7f9f0e1fc700) from PID 8; stack trace: ***
Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa66a3e1160 (unknown)
Sun Nov 12 21:02:14 2017[1,71]: @ 0x7fd5f16201c3 start_thread
Sun Nov 12 21:02:14 2017[1,71]: @ 0x7fd5f0c4812d __clone
Sun Nov 12 21:02:14 2017[1,71]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,89]: @ 0x7f0c88dc0012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,89]: @ 0x7f0c89a4b17a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,89]: @ 0x7f0c88ec70ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,78]: @ 0x7fae92cca012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,78]: @ 0x7fae9395517a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,74]: @ 0x7f17d773b012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,78]: @ 0x7fae92dd10ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,74]: @ 0x7f17d83c617a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa6677ab012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,74]: @ 0x7f17d78420ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,89]: @ 0x7f0c87f1d8a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,78]: @ 0x7fae91e278a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa66843617a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa6678b20ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,78]: @ 0x7fae958f81c3 start_thread
Sun Nov 12 21:02:14 2017[1,89]: @ 0x7f0c8b9ee1c3 start_thread
Sun Nov 12 21:02:14 2017[1,74]: @ 0x7f17d68988a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,74]: @ 0x7f17da3691c3 start_thread
Sun Nov 12 21:02:14 2017[1,78]: @ 0x7fae94f2012d __clone
Sun Nov 12 21:02:14 2017[1,78]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,74]: @ 0x7f17d999112d __clone
Sun Nov 12 21:02:14 2017[1,74]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,7]:*** SIGSEGV (@0x8) received by PID 31649 (TID 0x7f70217fb700) from PID 8; stack trace: ***
Sun Nov 12 21:02:14 2017[1,7]: @ 0x7f77833f0160 (unknown)
Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa6669088a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa66a3d91c3 start_thread
Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa669a0112d __clone
Sun Nov 12 21:02:14 2017[1,9]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,89]: @ 0x7f0c8b01612d __clone
Sun Nov 12 21:02:14 2017[1,89]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,7]: @ 0x7f77807ba012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,7]: @ 0x7f778144517a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,7]: @ 0x7f77808c10ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,7]: @ 0x7f777f9178a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,7]: @ 0x7f77833e81c3 start_thread
Sun Nov 12 21:02:14 2017[1,7]: @ 0x7f7782a1012d __clone
Sun Nov 12 21:02:14 2017[1,7]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,69]:*** Aborted at 1510491734 (unix time) try "date -d @1510491734" if you are using GNU date ***
Sun Nov 12 21:02:14 2017[1,69]:PC: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,85]:*** Aborted at 1510491734 (unix time) try "date -d @1510491734" if you are using GNU date ***
Sun Nov 12 21:02:14 2017[1,85]:PC: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,96]:*** Aborted at 1510491734 (unix time) try "date -d @1510491734" if you are using GNU date ***
Sun Nov 12 21:02:14 2017[1,85]:*** SIGSEGV (@0x8) received by PID 40980 (TID 0x7f8eb17fb700) from PID 8; stack trace: ***
Sun Nov 12 21:02:14 2017[1,85]: @ 0x7f9679673160 (unknown)
Sun Nov 12 21:02:14 2017[1,96]:PC: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,69]:*** SIGSEGV (@0x8) received by PID 47912 (TID 0x7f10c18a0700) from PID 8; stack trace: ***
Sun Nov 12 21:02:14 2017[1,69]: @ 0x7f1700d21160 (unknown)
Sun Nov 12 21:02:14 2017[1,85]: @ 0x7f9676a3d012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,85]: @ 0x7f96776c817a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,85]: @ 0x7f9676b440ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,96]:*** SIGSEGV (@0x8) received by PID 36670 (TID 0x7fa5097fb700) from PID 8; stack trace: ***
Sun Nov 12 21:02:14 2017[1,85]: @ 0x7f9675b9a8a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,96]: @ 0x7fac9e810160 (unknown)
Sun Nov 12 21:02:14 2017[1,85]: @ 0x7f967966b1c3 start_thread
Sun Nov 12 21:02:14 2017[1,85]: @ 0x7f9678c9312d __clone
Sun Nov 12 21:02:14 2017[1,85]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,69]: @ 0x7f16fe0eb012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,69]: @ 0x7f16fed7617a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,69]: @ 0x7f16fe1f20ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,69]: @ 0x7f16fd2488a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,69]: @ 0x7f1700d191c3 start_thread
Sun Nov 12 21:02:14 2017[1,96]: @ 0x7fac9bbda012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,96]: @ 0x7fac9c86517a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,96]: @ 0x7fac9bce10ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,96]: @ 0x7fac9ad378a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,96]: @ 0x7fac9e8081c3 start_thread
Sun Nov 12 21:02:14 2017[1,96]: @ 0x7fac9de3012d __clone
Sun Nov 12 21:02:14 2017[1,69]: @ 0x7f170034112d __clone
Sun Nov 12 21:02:14 2017[1,69]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,96]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,67]:*** Aborted at 1510491734 (unix time) try "date -d @1510491734" if you are using GNU date ***
Sun Nov 12 21:02:14 2017[1,67]:PC: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,67]:*** SIGSEGV (@0x8) received by PID 35085 (TID 0x7f283cdfa700) from PID 8; stack trace: ***
Sun Nov 12 21:02:14 2017[1,67]: @ 0x7f2e823d4160 (unknown)
Sun Nov 12 21:02:14 2017[1,67]: @ 0x7f2e7f79e012 paddle::ProtoClient::recv()
Sun Nov 12 21:02:14 2017[1,67]: @ 0x7f2e8042917a paddle::ParameterClient2::sendParallel()
Sun Nov 12 21:02:14 2017[1,67]: @ 0x7f2e7f8a50ec _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
Sun Nov 12 21:02:14 2017[1,67]: @ 0x7f2e7e8fb8a0 execute_native_thread_routine
Sun Nov 12 21:02:14 2017[1,67]: @ 0x7f2e823cc1c3 start_thread
Sun Nov 12 21:02:14 2017[1,67]: @ 0x7f2e819f412d __clone
Sun Nov 12 21:02:14 2017[1,67]: @ 0x0 (unknown)
Sun Nov 12 21:02:14 2017[1,99]:*** Aborted at 1510491734 (unix time) try "date -d @1510491734" if you are using GNU date ***
Sun Nov 12 21:02:14 2017[1,90]:*** Aborted at 1510491734 (unix time) try "date -d @1510491734" if you are using GNU date ***
Sun Nov 12 21:02:14 2017[1,99]:PC: @ 0x0 (unknown)
Sun Nov 12 21:02:15 2017[1,90]:PC: @ 0x0 (unknown)
Sun Nov 12 21:02:15 2017[1,99]:*** SIGSEGV (@0x8) received by PID 32102 (TID 0x7f68a57fb700) from PID 8; stack trace: ***
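
The trace above interleaves output from many processes, which makes any single stack hard to follow. Below is a minimal sketch of one way to regroup the entries per process. It assumes each entry starts with a timestamp followed by a `[1,N]:` tag, and that `N` identifies the MPI rank (an assumption; the log itself does not say so). The names `ENTRY` and `group_by_rank` are hypothetical helpers, not part of Paddle.

```python
import re
from collections import defaultdict

# An entry starts with a timestamp such as "Sun Nov 12 21:02:14 2017",
# immediately followed by a tag such as "[1,9]:".
ENTRY = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
    r"\[(?P<job>\d+),(?P<rank>\d+)\]:"
)


def group_by_rank(log_text):
    """Return {rank: [message, ...]}, keeping each rank's messages in log order."""
    # Collapse all whitespace so entries split across physical lines are rejoined.
    text = " ".join(log_text.split())
    matches = list(ENTRY.finditer(text))
    groups = defaultdict(list)
    for i, cur in enumerate(matches):
        # A message runs from the end of its tag to the start of the next entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        groups[int(cur.group("rank"))].append(text[cur.end():end].strip())
    return dict(groups)


# A few entries copied from the trace above; the third is deliberately
# broken across a newline inside its timestamp, as in the pasted log.
sample = (
    "Sun Nov 12 21:02:14 2017[1,9]: @ 0x7fa66a3e1160 (unknown) "
    "Sun Nov 12 21:02:14 2017[1,71]: @ 0x7fd5f16201c3 start_thread "
    "Sun Nov \n12 21:02:14 2017[1,9]: @ 0x7fa6677ab012 paddle::ProtoClient::recv()"
)

for rank, messages in sorted(group_by_rank(sample).items()):
    print(f"rank {rank}:")
    for message in messages:
        print("   ", message)
```

Read this way, every complete stack in the trace appears to show the same frames (`paddle::ProtoClient::recv()` called from `paddle::ParameterClient2::sendParallel()`), so grouping mainly helps confirm that all trainers fail at the same point.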