Created by: jczaja
This PR extends the reuse concept to the elementwise_add_mkldnn op (#18047 (closed)). It is needed to upgrade mkl-dnn to 0.2, and it improves the performance of models that use the elementwise_add_mkldnn op.
Performance evaluation details:
platform: SKX 8180
model: Bert
develop (before):
I0617 04:57:11.074146 306287 helper.h:326] ====== batch latency: 3.76448ms, number of samples: 4962, sample latency: 3.76448ms, fps: 265.641, data type: float ======
W0617 04:57:50.213590 306287 profiler.cc:89] CUDA CUPTI is not enabled
-------------------------> Profiling Report <-------------------------
Event Calls Total Min. Max. Ave. Ratio.
...
thread0::elementwise_add 1885560 45216.6 0.013487 9.32789 0.0239804 0.245703
...
this PR (after):
I0617 05:04:12.229713 316097 helper.h:326] ====== batch latency: 3.38282ms, number of samples: 4962, sample latency: 3.38282ms, fps: 295.611, data type: float ======
W0617 05:04:50.954174 316097 profiler.cc:89] CUDA CUPTI is not enabled
-------------------------> Profiling Report <-------------------------
Event Calls Total Min. Max. Ave. Ratio.
...
thread0::elementwise_add 1885560 29439.8 0.009848 9.43231 0.0156133 0.178296
...
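For reviewers who want the before/after difference as percentages, here is a minimal sketch that derives them from the figures in the two logs above. The helper `percent_change` and the variable names are illustrative only and are not part of this PR or of Paddle.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = decrease)."""
    return (after - before) / before * 100.0


# Figures copied from the develop (before) and this PR (after) logs.
fps_before, fps_after = 265.641, 295.611
latency_before, latency_after = 3.76448, 3.38282        # batch latency, ms
add_total_before, add_total_after = 45216.6, 29439.8    # elementwise_add, "Total" column

fps_change = percent_change(fps_before, fps_after)
latency_change = percent_change(latency_before, latency_after)
add_total_change = percent_change(add_total_before, add_total_after)

print(f"fps:                   {fps_change:+.1f}%")
print(f"batch latency:         {latency_change:+.1f}%")
print(f"elementwise_add total: {add_total_change:+.1f}%")
```

On these numbers, fps rises by roughly 11%, batch latency drops by roughly 10%, and the total time attributed to thread0::elementwise_add drops by roughly 35% over the same 1885560 calls.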
@LeoZhao-Intel could you please review as well?