Created by: jczaja
This PR introduces MKL-based execution of the softmax operator.
Profiling of the Capi DAM test shows that softmax op execution is about 2 times faster with this optimization. Test setup: 1 thread; batch sizes 1, 8, 32, 128; platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz.
Notes:
- The optimization is enabled when Paddle is configured with the ON_INFER=ON flag.
- To run the unit test for it, build with ON_INFER=ON and run test_softmax_op.py.
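
For reviewers who want a reference for what the operator computes, here is a minimal NumPy sketch of a row-wise, numerically stable softmax. It is an illustration only, not the code in this PR: the function name `softmax_rows`, the class count of 10, and the split into shift / exponentiate / normalize steps are assumptions, and the PR text does not say which MKL routines are used.

```python
import numpy as np


def softmax_rows(x):
    """Row-wise softmax of a 2-D array shaped (batch, classes).

    Hypothetical reference helper, not part of Paddle.
    """
    # Subtract each row's maximum so exp() cannot overflow.
    shifted = x - x.max(axis=1, keepdims=True)
    # Element-wise exponential over the whole batch.
    exps = np.exp(shifted)
    # Scale each row so that it sums to 1.
    return exps / exps.sum(axis=1, keepdims=True)


# Same batch sizes as in the profiling run above; 10 classes is arbitrary.
rng = np.random.RandomState(0)
for batch in (1, 8, 32, 128):
    x = rng.uniform(-1.0, 1.0, (batch, 10)).astype(np.float32)
    y = softmax_rows(x)
    assert y.shape == x.shape
    assert np.allclose(y.sum(axis=1), 1.0, atol=1e-5)
```

The exponential and the per-row scaling are the kind of vectorizable steps a math library can speed up, which is presumably where the gain comes from.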