Created by: jczaja
This PR introduces a C-API MKL-DNN pass that takes advantage of MKL-DNN in-place operations for C-API inference. Currently only in-place softmax is supported.
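For reference, a minimal sketch of how the pass would be exercised from the C++ inference API (assumptions: the in-place pass is part of the MKL-DNN pass set and is picked up automatically once MKL-DNN is enabled; the model path is illustrative):

```cpp
#include <paddle_inference_api.h>

int main() {
  paddle::AnalysisConfig config;
  config.SetModel("./bert_fp32_model");  // hypothetical model directory
  config.SwitchIrOptim(true);            // run IR optimization passes
  config.EnableMKLDNN();                 // enables the MKL-DNN pass set,
                                         // which is assumed to include the in-place pass

  auto predictor = paddle::CreatePaddlePredictor(config);
  // ... feed inputs and call predictor->Run(...) as usual;
  // in-place softmax is applied transparently during graph optimization.
  return 0;
}
```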
A small performance improvement should be visible on BERT fp32 and ERNIE int8 (https://github.com/PaddlePaddle/Paddle/issues/22904#issuecomment-600227776).
Apart from the performance improvement, memory consumption should also be lower, but because PaddlePaddle preallocates memory blocks, I did not observe reduced memory usage in the tested model (BERT).
After this PR, in-place support for elementwise_add and activation ops will follow.