- Added an fc + gru fuse pass and enabled the oneDNN GRU fp32 kernel, speeding up GRU fp32 model inference by 20% (1.2x) with 4 CPU threads on an Intel Xeon 6248 machine
- Added oneDNN in-place execution support for many operators (2% inference speedup for a face feature fp32 model)
- Optimized the oneDNN LRN operator (1% inference speedup for the GoogleNet fp32 model)
- Improved the transformation and optimization of quantized models @intel
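The oneDNN optimizations above take effect only when oneDNN (MKL-DNN) execution is enabled on the inference config. A minimal sketch of how a user might turn it on with the Paddle Inference Python API, assuming a saved inference model under a hypothetical `./gru_model` directory (the path and file names are placeholders, not from the original notes):

```python
# Sketch: enabling oneDNN (MKL-DNN) CPU inference in Paddle Inference.
# Assumes an exported inference model at ./gru_model (hypothetical path).
import paddle.inference as paddle_infer

config = paddle_infer.Config("./gru_model/model.pdmodel",
                             "./gru_model/model.pdiparams")
config.enable_mkldnn()                      # route supported ops to oneDNN kernels
config.set_cpu_math_library_num_threads(4)  # match the 4-thread benchmark setup
predictor = paddle_infer.create_predictor(config)
```

With oneDNN enabled, passes such as the fc + gru fusion and the in-place optimizations are applied automatically during graph optimization; no per-operator configuration is needed.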