mkldnn batch_norm may differ from mkl batch_norm
Created by: luotao1
The test machine is an E5-2620 v2, which only supports the AVX instruction set.
I am running inference on the resnet50 model: https://github.com/PaddlePaddle/Paddle/compare/develop...luotao1:resnet50?expand=1
The commit id is: https://github.com/PaddlePaddle/Paddle/commit/643b6faa0ced3304d5531dfa6a87a7fb39a54e0f
I run ctest -R test_analyzer_resnet50 -V, and the following test fails:
// Compare result of NativeConfig and AnalysisConfig
TEST(Analyzer_resnet50, compare)
The error is that all of the result data differ by more than 0.001:
95: The difference between pdata_ref[j] and pdata[j] is 0.00164794921875, which exceeds 1e-3, where
95: pdata_ref[j] evaluates to 451.11993408203125,
95: pdata[j] evaluates to 451.1182861328125, and
95: 1e-3 evaluates to 0.001.
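To put these numbers in context, the absolute difference can be restated relative to the magnitude of the value and in float32 ULPs. The following is only a minimal sketch: the helper names RelativeDiff, UlpDistance and IsClose are hypothetical and are not part of the Paddle test helpers, and it assumes both values are finite, positive float32 numbers.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: difference normalised by the reference magnitude.
double RelativeDiff(float ref, float out) {
  const double r = static_cast<double>(ref);
  const double o = static_cast<double>(out);
  return std::fabs(r - o) / std::fabs(r);
}

// Hypothetical helper: distance in float32 ULPs.
// Assumes both inputs are finite and have the same sign, so that the
// integer ordering of the bit patterns matches the ordering of the floats.
int64_t UlpDistance(float a, float b) {
  int32_t ia = 0;
  int32_t ib = 0;
  std::memcpy(&ia, &a, sizeof(float));
  std::memcpy(&ib, &b, sizeof(float));
  const int64_t d = static_cast<int64_t>(ia) - static_cast<int64_t>(ib);
  return d < 0 ? -d : d;
}

// Hypothetical helper: combined absolute/relative tolerance check,
// |ref - out| <= atol + rtol * |ref|.
bool IsClose(float ref, float out, double atol, double rtol) {
  const double r = static_cast<double>(ref);
  const double o = static_cast<double>(out);
  return std::fabs(r - o) <= atol + rtol * std::fabs(r);
}
```

For the pair in the log above, RelativeDiff(451.11993408203125f, 451.1182861328125f) is roughly 3.65e-6 and UlpDistance gives 54, so the values exceed the absolute threshold of 1e-3 while still agreeing to about five significant digits.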
When FLAGS_with_mkldnn=true:
- NativeConfig: mkl
- AnalysisConfig: mkldnn. To rule out fc_fuse_pass, I exclude fc_fuse_pass as well.
Thus, the only difference is between the mkl ops and the mkldnn ops.
However, test_analyzer_ocr runs successfully. Comparing the ops in the ocr model and the resnet50 model, the difference is the batch_norm op, which doesn't exist in the ocr model.
Besides, I also merged #13486 to see whether the conv bn fuse pass runs successfully.
But whether I set cfg->_use_mkldnn = true or cfg->_use_mkldnn = false, there is still a diff.
Could you help take a look?