Demo trainer built on the C++ training library reports a segmentation fault under MKLDNN
Created by: mapingshuo
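### Error description
The demo trainer built on the C++ training library reports a segmentation fault under MKLDNN: http://ci.paddlepaddle.org/viewLog.html?tab=buildLog&logTab=tree&filter=debug&expand=all&buildId=171521&_focus=6744
When compiling locally, the crash occurs with WITH_MKL=ON, WITH_MKLDNN=ON, while WITH_MKL=ON, WITH_MKLDNN=OFF runs normally, so the problem is probably in MKLDNN.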
### Troubleshooting
The error occurs when this line runs: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/train/demo/demo_trainer.cc#L83
Further investigation shows that the fault is raised in tensor.cc, at the call to holder_.reset():
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/tensor.cc#L55
### How to reproduce this error
cmake .. -DWITH_GPU=OFF -DWITH_DISTRIBUTE=OFF -DON_INFER=ON
make -j40 fluid_lib_dist
make -j40 inference_lib_dist
Then, in the Paddle/paddle/fluid/train/demo directory:
set -x
PADDLE_ROOT=your_Paddle/
TURN_ON_MKL=ON # use MKL or Openblas
# download models
function download() {
wget -q http://paddle-tar.bj.bcebos.com/train_demo/LR/main_program
wget -q http://paddle-tar.bj.bcebos.com/train_demo/LR/startup_program
}
download
# build demo trainer
fluid_install_dir=${PADDLE_ROOT}/build/fluid_install_dir
mkdir -p build
cd build
rm -rf *
cmake .. -DPADDLE_LIB=$fluid_install_dir \
-DWITH_MKLDNN=ON \
-DWITH_MKL=$TURN_ON_MKL
make
cd ..
# run demo trainer
build/demo_trainer
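
The ON/OFF comparison described above can be scripted so that both configurations are built and run the same way. The following is only a sketch: `classify_exit` and `build_and_run` are hypothetical helper names, the per-configuration build directories are an assumption, and it assumes that toggling the demo's own `-DWITH_MKLDNN` option is enough (the Paddle library itself may also need to be rebuilt with the matching setting). It relies on bash reporting a process killed by SIGSEGV as exit status 139 (128 + 11).

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an exit status into a short label.
# 139 = 128 + 11 (SIGSEGV), which is how bash reports a segmentation fault.
classify_exit() {
  case "$1" in
    0)   echo "ok" ;;
    139) echo "segfault" ;;
    *)   echo "failed($1)" ;;
  esac
}

# Hypothetical helper: build the demo in its own directory with the given
# WITH_MKLDNN value, run it, and print the outcome.
# Arguments: $1 = fluid_install_dir, $2 = ON or OFF.
build_and_run() {
  local lib_dir="$1" mkldnn="$2" build_dir="build_mkldnn_$2" rc
  mkdir -p "$build_dir"
  # Configure and compile in a subshell so the caller's directory is unchanged.
  (cd "$build_dir" && cmake .. -DPADDLE_LIB="$lib_dir" \
      -DWITH_MKLDNN="$mkldnn" -DWITH_MKL=ON && make) || return 1
  # Run from the demo directory, where main_program and startup_program live.
  "$build_dir/demo_trainer"
  rc=$?
  echo "WITH_MKLDNN=$mkldnn: $(classify_exit "$rc")"
}
```

A possible invocation, run from the demo directory after the models have been downloaded, is `build_and_run "${PADDLE_ROOT}/build/fluid_install_dir" ON` followed by the same call with `OFF`.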