[DNNL] Add inplace support to DNNL ops
Created by: jczaja
DNNL primitives run faster when computation is done in-place, i.e. input and output share the same memory, so the input is overwritten by the result of the computation. We checked (using an external program) that DNNL's layer norm takes 30% less time to execute when run in-place. We expect other operators to gain as well.
The following DNNL operators support in-place execution:
- Layer Norm
- Batch Norm
- Softmax
- Binary
The goal of this issue is to add an IR pass that scans the graph and makes In = Out tensors, provided the input is not consumed by more than one operator (we cannot overwrite the input of layer norm if that input is also used by another operator).
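The eligibility check the pass needs can be sketched as follows. This is a minimal, hypothetical illustration (the op-dict representation, `INPLACE_OPS`, and `mark_inplace` are assumptions for this sketch, not the actual Paddle IR pass API): count how many operators consume each tensor, then mark an op as in-place only when every one of its inputs has exactly one consumer.

```python
from collections import Counter

# Ops that support in-place execution, per this issue (hypothetical name)
INPLACE_OPS = {"layer_norm", "batch_norm", "softmax", "binary"}

def mark_inplace(ops):
    """Given a list of ops, each a dict with 'type', 'inputs', 'outputs',
    return the set of op indices that may safely overwrite their input.

    An input may be overwritten only if no other op consumes it. A real
    pass would also have to exclude tensors that are graph outputs
    (fetch targets), which this sketch does not model.
    """
    consumers = Counter()
    for op in ops:
        for t in op["inputs"]:
            consumers[t] += 1

    inplace = set()
    for i, op in enumerate(ops):
        if op["type"] not in INPLACE_OPS:
            continue
        # Safe only when every input of this op has a single consumer
        if all(consumers[t] == 1 for t in op["inputs"]):
            inplace.add(i)
    return inplace
```

For example, if tensor `x` feeds both a layer norm and a relu, the layer norm must not run in-place; a softmax whose input is consumed only by itself may.

```python
ops = [
    {"type": "layer_norm", "inputs": ["x"], "outputs": ["y"]},
    {"type": "relu",       "inputs": ["x"], "outputs": ["z"]},  # 2nd consumer of x
    {"type": "softmax",    "inputs": ["y"], "outputs": ["w"]},
]
mark_inplace(ops)  # only the softmax (index 2) qualifies
```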