Created by: jczaja
This is a follow-up PR to #14872. The Transpose MKL-DNN op is extended to reuse MKL-DNN primitives.
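The idea behind primitive reuse is to create the MKL-DNN primitive once per unique input configuration and fetch it from a cache on subsequent iterations, instead of rebuilding it each time the op runs. Below is a minimal, self-contained sketch of that caching pattern; `FakeTransposePrimitive` and `PrimitiveCache` are hypothetical stand-ins (the real code uses mkldnn primitives and Paddle's handler/device-context blob map), shown only to illustrate the key-based lookup.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an expensive-to-create MKL-DNN transpose
// primitive; the real one is built from mkldnn memory descriptors.
struct FakeTransposePrimitive {
  explicit FakeTransposePrimitive(const std::vector<int>& d) : dims(d) {
    ++constructed;
  }
  std::vector<int> dims;
  static int constructed;  // counts how many primitives were actually built
};
int FakeTransposePrimitive::constructed = 0;

// Cache keyed by a string encoding of input dims and transpose axis,
// mirroring the key-per-configuration approach used for primitive reuse.
class PrimitiveCache {
 public:
  std::shared_ptr<FakeTransposePrimitive> Acquire(
      const std::vector<int>& dims, const std::vector<int>& axis) {
    const std::string key = MakeKey(dims, axis);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;  // reuse: no re-creation cost on later iterations
    }
    auto prim = std::make_shared<FakeTransposePrimitive>(dims);
    cache_.emplace(key, prim);
    return prim;
  }

 private:
  static std::string MakeKey(const std::vector<int>& dims,
                             const std::vector<int>& axis) {
    std::string key;
    for (int d : dims) key += std::to_string(d) + "x";
    key += "a";
    for (int a : axis) key += std::to_string(a) + "-";
    return key;
  }
  std::unordered_map<std::string, std::shared_ptr<FakeTransposePrimitive>>
      cache_;
};
```

With batch size 1 and fixed shapes, every iteration after the first hits the cache, which is where the reported speedup comes from.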
Performance:
- Model: Transformer (https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/neural_machine_translation/transformer)
- Platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
- Threads: 1
- Result: ~2x speedup of the Transpose op (batch size 1)
Notes:
- Accuracy was verified on the MobileNet-SSD and Transformer (Python) models
@luotao1 Could you please check on your model of interest whether everything works fine and whether there is a performance improvement?