[DNNL] Add inplace support to DNNL ops
Created by: jczaja
DNNL primitives run faster when computation is done in-place, i.e. input and output share the same memory, so the input is overwritten by the result of the computation. We checked (using an external program) that DNNL's layer norm takes 30% less time to execute when run in-place. We expect other operators to gain as well.
The following DNNL operators support in-place execution:
- Layer Norm
- Batch Norm
- Softmax
- Binary
The goal of this issue is to add an IR pass that scans the graph and makes In = Out tensors, provided the input is not consumed by more than one operator (we cannot overwrite the input of layer norm if that input is also used by another operator).
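The eligibility check the pass needs can be sketched as follows. This is a minimal, hypothetical illustration (the op-dict representation, `INPLACE_OPS`, and `mark_inplace` are assumptions for this sketch, not the actual Paddle IR pass API): count how many operators consume each tensor, then mark an op as in-place only when every one of its inputs has exactly one consumer.

```python
from collections import Counter

# Ops that support in-place execution, per this issue (hypothetical name)
INPLACE_OPS = {"layer_norm", "batch_norm", "softmax", "binary"}

def mark_inplace(ops):
    """Given a list of ops, each a dict with 'type', 'inputs', 'outputs',
    return the set of op indices that may safely overwrite their input.

    An input may be overwritten only if no other op consumes it. A real
    pass would also have to exclude tensors that are graph outputs
    (fetch targets), which this sketch does not model.
    """
    consumers = Counter()
    for op in ops:
        for t in op["inputs"]:
            consumers[t] += 1

    inplace = set()
    for i, op in enumerate(ops):
        if op["type"] not in INPLACE_OPS:
            continue
        # Safe only when every input of this op has a single consumer
        if all(consumers[t] == 1 for t in op["inputs"]):
            inplace.add(i)
    return inplace
```

For example, if tensor `x` feeds both a layer norm and a relu, the layer norm must not run in-place; a softmax whose input is consumed only by itself may.

```python
ops = [
    {"type": "layer_norm", "inputs": ["x"], "outputs": ["y"]},
    {"type": "relu",       "inputs": ["x"], "outputs": ["z"]},  # 2nd consumer of x
    {"type": "softmax",    "inputs": ["y"], "outputs": ["w"]},
]
mark_inplace(ops)  # only the softmax (index 2) qualifies
```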