Created by: Sand3r-
Introduces a JIT implementation of Elementwise Mul, which gives a 13% speedup on the whole ResNeXt-50 topology with BS=1 and a single thread when MKL-DNN is used.
With this change, in a multi-threaded environment (here, a 20-core machine), the MKL-DNN run measured against the reference implementation shows a 2.8x speedup on the whole topology at BS=1 and a 9.1x speedup at BS=64.
The AVX2 implementation will follow in a separate PR.
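For readers unfamiliar with the operator, here is a minimal scalar sketch of what Elementwise Mul computes for two same-shaped inputs. It is not the code from this PR: the name `ElementwiseMulRef` and the use of `std::vector<float>` are assumptions made purely for illustration, and broadcasting is not handled. A JIT kernel would be expected to produce the same results as this loop while emitting vectorized instructions at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference kernel (illustration only, not the PR's code):
// computes z[i] = x[i] * y[i] for two inputs of identical length.
std::vector<float> ElementwiseMulRef(const std::vector<float>& x,
                                     const std::vector<float>& y) {
  // This sketch assumes identical shapes; no broadcasting is performed.
  assert(x.size() == y.size());
  std::vector<float> z(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    z[i] = x[i] * y[i];
  }
  return z;
}
```

A scalar loop like this is useful as a correctness baseline when checking the output of an optimized kernel.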