Paddle SparseRowCpuMatrix::addTo SIGSEGV when width of matrix cannot be divided by 32.
Created by: reyoung
The log message is shown below:
[WARNING 2017-02-17 17:31:41,513 layers.py:1252] NOTE: the gru memory layer's size is set by previous input layer, and should be input size / 3. Set size explicitly will be ignored.
[INFO 2017-02-17 17:31:41,516 networks.py:1466] The input order is [bidword_seq, label]
[INFO 2017-02-17 17:31:41,516 networks.py:1472] The output order is [__cost_0__]
I0217 17:31:41.518110 26649 Trainer.cpp:125] ignore sparse_remote_update=true due to --local=true
I0217 17:31:41.518137 26649 Trainer.cpp:173] trainer mode: SgdSparseCpuTraining
I0217 17:31:55.190280 26649 PyDataProvider2.cpp:243] loading dataprovider dataprovider::process
[INFO 2017-02-17 17:31:56,881 dataprovider.py:20] dict len : 1972305
I0217 17:31:56.881968 26649 PyDataProvider2.cpp:243] loading dataprovider dataprovider::process
[INFO 2017-02-17 17:31:58,100 dataprovider.py:20] dict len : 1972305
I0217 17:31:58.100997 26649 GradientMachine.cpp:135] Initing parameters..
I0217 17:32:31.008913 26649 GradientMachine.cpp:142] Init parameters done.
I0217 17:32:32.164254 3860 ThreadLocal.cpp:40] thread use undeterministic rand seed:3861
*** Aborted at 1487323962 (unix time) try "date -d @1487323962" if you are using GNU date ***
PC: @ 0x798110 paddle::simd::internal::addToImpl()
*** SIGSEGV (@0x0) received by PID 26649 (TID 0x7f41619ea700) from PID 0; stack trace: ***
@ 0x7f46ad8be160 (unknown)
@ 0x798110 paddle::simd::internal::addToImpl()
@ 0x78c9ed paddle::SparseRowCpuMatrix::addTo()
@ 0x70b117 paddle::TrainerThread::mergeGradSparse()
@ 0x70b58b paddle::TrainerThread::mergeCpuGradients()
@ 0x70bda7 paddle::TrainerThread::backward()
@ 0x70c02d paddle::TrainerThread::computeThread()
@ 0x7f46acbd28a0 execute_native_thread_routine
@ 0x7f46ad8b61c3 start_thread
@ 0x7f46ac34312d __clone
/home/work/yangyaming/programs/paddle_internal_release_tools/idl/paddle/output/bin/paddle_local: line 109: 26649 Segmentation fault ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
The buggy code is here. The bug occurs because the `simd::addTo` method uses the plain `_mm256_load_ps` instruction, which requires its input buffers to be 32-byte aligned. When the matrix width cannot be divided by 32 (in bytes), the row buffers are no longer 32-byte aligned, and the aligned load causes a SIGSEGV.
A possible fix is to handle the unaligned head of each row with scalar additions, then call `simd::addTo` on the aligned remainder (note the length passed on must be the remaining count, not the full width):

```cpp
real* dest = this->rowBuf(id);
real* local = getLocalRow(i);
size_t len = this->width_;
// Add element-by-element until dest reaches a 32-byte boundary.
while (reinterpret_cast<uintptr_t>(dest) % 32 != 0 && len > 0) {
    *dest++ += *local++;
    --len;
}
simd::addTo(dest, local, len);
```