Created by: jczaja
The changes proposed here introduce the Transpose/Transpose2 op executed with the MKL-DNN reorder primitive. This is the first of three PRs related to the transpose op.
Performance:
- Model: Transformer (https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/neural_machine_translation/transformer)
- Platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
- Threads: 1
- Results [overall model speedup]: ~2x (BS: 1), ~6x (BS: 50)
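For context on why a reorder primitive can implement transpose: a transpose only permutes the logical dimensions and strides of the tensor, and the reorder then materializes the data into the destination layout. A minimal NumPy sketch of the same idea (NumPy is used here purely for illustration; it is not part of this PR):

```python
import numpy as np

# Source tensor in a contiguous layout.
x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# Transpose with axis permutation (0, 2, 1): in NumPy this only
# permutes dims and strides -- no data is moved yet (it is a view).
y = x.transpose(0, 2, 1)

# The "reorder" step: copy the data into a contiguous buffer with the
# new layout, which is conceptually what the MKL-DNN reorder primitive
# does for the op implemented in this PR.
z = np.ascontiguousarray(y)

assert z.shape == (2, 4, 3)
assert z[1, 3, 2] == x[1, 2, 3]
```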
Notes:
- Only the forward op was implemented (works when is_test == True)
- No layout support yet (will come in the third installment)
- No reuse of reorder primitives (will come in the second installment)
- Other models may benefit from this optimization, but for full performance on convolutional models (e.g. mostly MKL-DNN based ones), format support has to be added (will happen in the third installment of this series)
@luotao1: 1. This PR does not deliver the corresponding grad op, so an is_test attribute was added, along with an assertion that fires if this op is used during training. Is this the proper way of handling a situation where we implement only the forward op? 2. Could you also check whether you see an improvement on your target Transformer use case?