PaddlePaddle / Paddle · Issue #46
Closed

Opened Sep 08, 2016 by saxon_zh (Guest)

Floating point exception (overflow)

Created by: F0REacH

I used the same config as in Issue #44 (closed) with CPU (changed only batch_size=45). It looks like a floating point overflow, but I can't figure out what is causing it. Maybe an incorrect layer connection?

I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:144] commandline: /opt/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=false --trainer_count=4 --num_passes=100000 --log_period=10 --dot_period=1 --show_parameter_stats_period=1000 --test_all_data_in_one_period=1 --saving_period=100 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:113] Calling runInitFunctions
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-08 02:48:21,778 networks.py:1122] The input order is [input, label]
[INFO 2016-09-08 02:48:21,778 networks.py:1129] The output order is [__cost_0__]
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Trainer.cpp:169] trainer mode: Normal
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:134] Initing parameters..
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:141] Init parameters done.
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=0 Batch=4 samples=178 AvgCost=15884.9 Eval: classification_error_evaluator=0.993309 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=172604 Eval: classification_error_evaluator=0.995207 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:112] Saving parameters to ./model_output/pass-00000
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:219] copy trainer_config.py to ./model_output/pass-00000
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=1 Batch=4 samples=178 AvgCost=14954.9 Eval: classification_error_evaluator=0.980111 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=159166 Eval: classification_error_evaluator=0.975115 
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=2 Batch=4 samples=178 AvgCost=14009.3 Eval: classification_error_evaluator=0.935489 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=135530 Eval: classification_error_evaluator=0.871007 
... some steps ....

I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=97979 Eval: classification_error_evaluator=0.676567 
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=46 Batch=4 samples=178 AvgCost=8838.92 Eval: classification_error_evaluator=0.705236 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=102650 Eval: classification_error_evaluator=0.730931 
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=47 Batch=4 samples=178 AvgCost=8806.03 Eval: classification_error_evaluator=0.701997 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=91264.2 Eval: classification_error_evaluator=0.659892 
/opt/paddle/bin/paddle: line 46:  4547 Floating point exception(core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

The error repeats after ~40 passes each time I run training. Backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/opt/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --s'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x00007f53d32d8a15 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:214
214     ../sysdeps/ieee754/dbl-64/e_exp.c: No such file or directory.
[Current thread is 1 (Thread 0x7f53cec29700 (LWP 4548))]
(gdb) bt
#0  0x00007f53d32d8a15 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:214
#1  0x00007f53d329847f in __GI___exp (x=711.2794189453125) at ../sysdeps/ieee754/dbl-64/w_exp.c:26
#2  0x0000000000e2c4dd in hppl::tanh (a=-355.639709) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/src/hl_cpu_functions.cc:33
#3  0x0000000000a3dd22 in hppl::forward::lstm::operator() (this=0x7f53cec28050, valueIn=@0x7f53cec28014: -0.940858305, valueIg=@0x7f53cec28010: 0.999997735, valueFg=@0x7f53cec2800c: 0.999997735, valueOg=@0x7f53cec28008: 0.999997735, prevState=@0x7f53cec27ff4: -354.699646, 
    state=@0x7f53cec27ff8: -355.639709, stateAtv=@0x7f53cec27ff0: 0.368853271, output=@0x7f53cec27fec: 0.165656254, checkI=@0x7f53cec28004: -0.0588896535, checkF=@0x7f53cec28000: -0.0764867961, checkO=@0x7f53cec27ffc: -0.0473404899, actInput=0xe2c4ba <hppl::tanh(float)>, 
    actGate=0xe2c431 <hppl::sigmoid(float)>, actState=0xe2c4ba <hppl::tanh(float)>) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_lstm_ops.cuh:65
#4  0x0000000000a3ec6c in hl_naive_lstm_forward_one_sequence<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH)
    at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:60
#5  0x0000000000a3e662 in hl_cpu_lstm_forward<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:348
#6  0x0000000000a3d94f in paddle::LstmCompute::forwardOneSequence<false> (this=0x2e423a8, value=..., frameSize=6) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmCompute.cpp:32
#7  0x0000000000a3da0f in paddle::LstmCompute::forwardBatch<false> (this=0x2e423a8, value=..., frameSize=6, batchSize=10) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmCompute.cpp:47
#8  0x0000000000a3b75d in paddle::LstmLayer::forwardBatch (this=0x2e42010, batchSize=37105, numSequences=11, starts=0x7f53b805bb40, inputValue=std::shared_ptr (count 2, weak 0) 0x7f53b80d1a10) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmLayer.cpp:501
#9  0x0000000000a38c8c in paddle::LstmLayer::forward (this=0x2e42010, passType=paddle::enumeration_wrapper::PASS_TRAIN) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmLayer.cpp:172
#10 0x0000000000ac2334 in paddle::NeuralNetwork::forward (this=0x2e1a3e0, inArgs=std::vector of length 2, capacity 2 = {...}, outArgs=0x2e10d08, passType=paddle::enumeration_wrapper::PASS_TRAIN)
    at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/NeuralNetwork.cpp:242
#11 0x0000000000ad620c in paddle::TrainerThread::forward (this=0x2e10be0) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/MultiGradientMachine.cpp:581
#12 0x0000000000ad5ef2 in paddle::TrainerThread::computeThread (this=0x2e10be0) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/MultiGradientMachine.cpp:519
#13 0x0000000000ad5abd in paddle::TrainerThread::<lambda()>::operator()(void) const (__closure=0x2ef45f8) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/MultiGradientMachine.cpp:465
#14 0x0000000000adb9b2 in std::_Bind_simple<paddle::TrainerThread::start()::<lambda()>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x2ef45f8) at /opt/gcc/include/c++/4.9.4/functional:1700
#15 0x0000000000adb6ed in std::_Bind_simple<paddle::TrainerThread::start()::<lambda()>()>::operator()(void) (this=0x2ef45f8) at /opt/gcc/include/c++/4.9.4/functional:1688
#16 0x0000000000adb4d2 in std::thread::_Impl<std::_Bind_simple<paddle::TrainerThread::start()::<lambda()>()> >::_M_run(void) (this=0x2ef45e0) at /opt/gcc/include/c++/4.9.4/thread:115
#17 0x00007f53d363d380 in std::execute_native_thread_routine_compat (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:110
#18 0x00007f53d7578454 in start_thread (arg=0x7f53cec29700) at pthread_create.c:333
#19 0x00007f53d2da715d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) frame 0
#0  0x00007f53d32d8a15 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:214
214     in ../sysdeps/ieee754/dbl-64/e_exp.c
(gdb) info locals
ctx = {env = {__control_word = <optimized out>, __glibc_reserved1 = <optimized out>, __status_word = <optimized out>, __glibc_reserved2 = <optimized out>, __tags = <optimized out>, __glibc_reserved3 = <optimized out>, __eip = <optimized out>, __cs_selector = <optimized out>, 
    __opcode = <optimized out>, __glibc_reserved4 = <optimized out>, __data_offset = <optimized out>, __data_selector = <optimized out>, __glibc_reserved5 = <optimized out>, __mxcsr = 39281}, updated_status = <optimized out>}
bexp = <optimized out>
t = 0.11041169086502123
eps = <optimized out>
del = <optimized out>
base = 0.11041259765625
y = 25769803776.110413
al = 1.1167387406605691
bet = -1.4572163044673799e-09
res = 1.1167377264919247
rem = -1.0141686444258574e-06
cor = -2.8229067033144067e-17
junk1 = <optimized out>
m = 1082538556
n = 1082538556
ex = <optimized out>
retval = <optimized out>
(gdb) frame 1
#1  0x00007f53d329847f in __GI___exp (x=711.2794189453125) at ../sysdeps/ieee754/dbl-64/w_exp.c:26
26      ../sysdeps/ieee754/dbl-64/w_exp.c: No such file or directory.
(gdb) info locals
z = <optimized out>
(gdb) frame 2
#2  0x0000000000e2c4dd in hppl::tanh (a=-355.639709) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/src/hl_cpu_functions.cc:33
33          return (2.0 / (1.0 + exp(-2.0*a))) - 1.0;
(gdb) info locals
No locals.
(gdb) frame 3
#3  0x0000000000a3dd22 in hppl::forward::lstm::operator() (this=0x7f53cec28050, valueIn=@0x7f53cec28014: -0.940858305, valueIg=@0x7f53cec28010: 0.999997735, valueFg=@0x7f53cec2800c: 0.999997735, valueOg=@0x7f53cec28008: 0.999997735, prevState=@0x7f53cec27ff4: -354.699646, 
    state=@0x7f53cec27ff8: -355.639709, stateAtv=@0x7f53cec27ff0: 0.368853271, output=@0x7f53cec27fec: 0.165656254, checkI=@0x7f53cec28004: -0.0588896535, checkF=@0x7f53cec28000: -0.0764867961, checkO=@0x7f53cec27ffc: -0.0473404899, actInput=0xe2c4ba <hppl::tanh(float)>, 
    actGate=0xe2c431 <hppl::sigmoid(float)>, actState=0xe2c4ba <hppl::tanh(float)>) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_lstm_ops.cuh:65
65          stateAtv = actState(state);
(gdb) info locals
No locals.
(gdb) frame 4
#4  0x0000000000a3ec6c in hl_naive_lstm_forward_one_sequence<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH)
    at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:60
60          op(rValueIn,
(gdb) info locals
i = 3
rValueIn = -0.940858305
rValueFg = 0.999997735
rCheckO = -0.0473404899
rPrevState = -354.699646
rOut = 0.165656254
valueOg = 0x7f5324c72908
rValueIg = 0.999997735
valueIn = 0x7f5324c728c0
valueFg = 0x7f5324c728f0
rCheckI = -0.0588896535
valueIg = 0x7f5324c728d8
rValueOg = 0.999997735
rCheckF = -0.0764867961
rState = -355.639709
rStateAtv = 0.368853271
(gdb) frame 5
#5  0x0000000000a3e662 in hl_cpu_lstm_forward<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:348
348         hl_naive_lstm_forward_one_sequence(op, value, frameSize,
(gdb) info locals
No locals.

Paddle build options:

cmake -DWITH_GPU=ON -DWITH_DOC=OFF -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/opt/paddle ..

The BLAS backend is Intel MKL 11.3.3.210; the CPU is an Intel i5 4690K.

Reference: paddlepaddle/Paddle#46