Machine translation demo on an MPI cluster fails midway with "Forwarding __bidirectional_gru_0___fw"
Created by: Bodhi-Tree
Sat Aug 11 15:35:27 2018[1,1]:
Sat Aug 11 15:35:27 2018[1,1]:Pass 0, Batch 368, Cost 50.880733, {'classification_error_evaluator': 0.7685352563858032}
Sat Aug 11 15:35:27 2018[1,2]:
Sat Aug 11 15:35:27 2018[1,2]:Pass 0, Batch 368, Cost 53.013538, {'classification_error_evaluator': 0.7777777910232544}
Sat Aug 11 15:35:42 2018[1,19]:Thread [139910583289600] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,19]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,13]:Thread [140696326326016] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,13]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,19]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,13]:
Sat Aug 11 15:35:42 2018[1,13]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,20]:Thread [140598679189248] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,20]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,24]:Thread [139964018677504] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,24]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,11]:Thread [140135036057344] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,11]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,10]:Thread [139842902214400] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,10]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,20]:
Sat Aug 11 15:35:42 2018[1,20]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,24]:
Sat Aug 11 15:35:42 2018[1,24]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,11]:
Sat Aug 11 15:35:42 2018[1,11]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,10]:
Sat Aug 11 15:35:42 2018[1,10]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,18]:Thread [139962781460224] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,18]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,24]:
Sat Aug 11 15:35:42 2018[1,24]:*** SIGFPE (@0x7f4d8904b460) received by PID 5470 (TID 0x7f4be99d5700) from PID 18446744071713371232; stack trace: ***
Sat Aug 11 15:35:42 2018[1,20]:
Sat Aug 11 15:35:42 2018[1,20]:*** SIGFPE (@0x7fdfff589460) received by PID 20103 (TID 0x7fdfae543700) from PID 18446744073698579552; stack trace: ***
Sat Aug 11 15:35:42 2018[1,24]:
Sat Aug 11 15:35:42 2018[1,24]: @ 0x7f4d8eb9f160 (unknown)
Sat Aug 11 15:35:42 2018[1,20]:
Sat Aug 11 15:35:42 2018[1,20]: @ 0x7fe0050dd160 (unknown)
Sat Aug 11 15:35:42 2018[1,13]:
Sat Aug 11 15:35:42 2018[1,13]:*** SIGFPE (@0x7ff6b4518460) received by PID 48799 (TID 0x7ff66a8d4700) from PID 18446744072439825504; stack trace: ***
Sat Aug 11 15:35:42 2018[1,11]:
Sat Aug 11 15:35:42 2018[1,11]:*** SIGFPE (@0x7f73f16fc460) received by PID 27779 (TID 0x7f73bb0b7700) from PID 18446744073465218144; stack trace: ***
Sat Aug 11 15:35:42 2018[1,13]:
Sat Aug 11 15:35:42 2018[1,13]: @ 0x7ff6ba06c160 (unknown)
Sat Aug 11 15:35:42 2018[1,3]:Thread [140542075557632] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,3]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,19]:*** SIGFPE (@0x7f3fb9a26460) received by PID 51743 (TID 0x7f3f789e0700) from PID 18446744072529011808; stack trace: ***
Sat Aug 11 15:35:42 2018[1,3]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,19]: @ 0x7f3fbf57a160 (unknown)
Sat Aug 11 15:35:42 2018[1,21]:Thread [140391670904576] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,21]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,17]:Thread [140468229388032] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,17]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,11]:
Sat Aug 11 15:35:42 2018[1,11]: @ 0x7f73f7250160 (unknown)
Sat Aug 11 15:35:42 2018[1,10]:
Sat Aug 11 15:35:42 2018[1,10]:*** SIGFPE (@0x7f3156ea8460) received by PID 35278 (TID 0x7f2fb682c700) from PID 1458209888; stack trace: ***
Sat Aug 11 15:35:42 2018[1,10]:
Sat Aug 11 15:35:42 2018[1,10]: @ 0x7f315c9fc160 (unknown)
Sat Aug 11 15:35:42 2018[1,18]:
Sat Aug 11 15:35:42 2018[1,18]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,20]:
Sat Aug 11 15:35:42 2018[1,20]: @ 0x7fdfff589460 (unknown)
Sat Aug 11 15:35:42 2018[1,20]:
Sat Aug 11 15:35:42 2018[1,20]: @ 0x7fdfff2fc20b hl_cpu_gru_forward<>()
Sat Aug 11 15:35:42 2018[1,12]:Thread [140489601419008] Forwarding __bidirectional_gru_0___fw,
Sat Aug 11 15:35:42 2018[1,12]:*** Aborted at 1533972942 (unix time) try "date -d @1533972942" if you are using GNU date ***
Sat Aug 11 15:35:42 2018[1,20]:
Sat Aug 11 15:35:42 2018[1,20]: @ 0x7fdfff2fbce7 paddle::GruCompute::forward<>()
Sat Aug 11 15:35:42 2018[1,21]:
Sat Aug 11 15:35:42 2018[1,21]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,24]:
Sat Aug 11 15:35:42 2018[1,24]: @ 0x7f4d8904b460 (unknown)
Sat Aug 11 15:35:42 2018[1,19]:
Sat Aug 11 15:35:42 2018[1,19]: @ 0x7f3fb9a26460 (unknown)
Sat Aug 11 15:35:42 2018[1,12]:
Sat Aug 11 15:35:42 2018[1,12]:PC: @ 0x0 (unknown)
Sat Aug 11 15:35:42 2018[1,20]:
Sat Aug 11 15:35:42 2018[1,20]: @ 0x7fdfff2ff9ae paddle::GatedRecurrentLayer::forwardBatch()
Sat Aug 11 15:35:42 2018[1,24]:
Sat Aug 11 15:35:42 2018[1,24]: @ 0x7f4d88dbe20b hl_cpu_gru_forward<>()
Sat Aug 11 15:35:42 2018[1,19]:
Sat Aug 11 15:35:42 2018[1,19]: @ 0x7f3fb979920b hl_cpu_gru_forward<>()
......
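As the title says, I am running the machine translation demo that ships with Paddle on a cluster. Training ran fine at first (the log shows normal output through Pass 0, Batch 368), but then several processes aborted with the SIGFPE error above inside hl_cpu_gru_forward. I don't know what is causing it. Could someone please take a look?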
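Because the output of many processes is interleaved in the log, it may be easier to read one process at a time. Below is a minimal sketch that groups log lines by the bracketed tag. It assumes each line has the shape `<timestamp>[a,b]:<message>` as seen in the log, and that the second number in the brackets identifies the MPI process (rank); that interpretation is an assumption, not something the log states. The function name `group_by_rank` is hypothetical, and the sample lines are copied from the log.

```python
import re
from collections import defaultdict

# Assumed line shape (from the log): "<timestamp ending in a 4-digit year>[a,b]:<message>".
# The second bracketed number is treated as the MPI rank; this is an assumption.
TAGGED = re.compile(r"^.*?\d{4}\[\d+,(\d+)\]:(.*)$")


def group_by_rank(lines):
    """Return {rank: [messages]} with each rank's messages in their original order."""
    by_rank = defaultdict(list)
    for line in lines:
        match = TAGGED.match(line.strip())
        if match is None:
            continue  # line does not carry a "[a,b]:" tag
        message = match.group(2).strip()
        if message:  # skip tagged lines that carry no message
            by_rank[int(match.group(1))].append(message)
    return dict(by_rank)


sample = [
    "Sat Aug 11 15:35:42 2018[1,20]:*** SIGFPE (@0x7fdfff589460) received by PID 20103 "
    "(TID 0x7fdfae543700) from PID 18446744073698579552; stack trace: ***",
    "Sat Aug 11 15:35:42 2018[1,24]: @ 0x7f4d8eb9f160 (unknown)",
    "Sat Aug 11 15:35:42 2018[1,20]: @ 0x7fe0050dd160 (unknown)",
    "Sat Aug 11 15:35:42 2018[1,20]:",
    "Sat Aug 11 15:35:42 2018[1,20]: @ 0x7fdfff2fc20b hl_cpu_gru_forward<>()",
]

grouped = group_by_rank(sample)
for rank in sorted(grouped):
    print(f"--- rank {rank} ---")
    for message in grouped[rank]:
        print(message)
```

Run over the full log file, this should make it easier to compare the stack trace of each failing process against the last "Pass/Batch" line it printed.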