Created by: jczaja
These changes implement the suggestions from #14437.
In terms of performance, replacing the single-threaded summation with cblas_sasum reduces the execution time of the softmax op in the DAM model by ~3-4% in the single-threaded scenario.
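For illustration only, here is a minimal sketch of why an absolute-value sum can stand in for the plain sum in softmax: every exp() output is positive, so the two sums agree. This is not the code from this PR. `reference_sasum` and `softmax_row` are hypothetical names, and `reference_sasum` is a plain-loop stand-in with the same semantics as `cblas_sasum(n, x, 1)` so the sketch builds without a BLAS library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in with BLAS sasum semantics: returns sum of |x[i]|.
// A real build would call cblas_sasum(n, x.data(), 1) here instead.
float reference_sasum(const std::vector<float>& x) {
  float acc = 0.0f;
  for (float v : x) acc += std::fabs(v);
  return acc;
}

// Numerically stable softmax over a single row (illustrative helper).
std::vector<float> softmax_row(const std::vector<float>& logits) {
  assert(!logits.empty());

  // Subtract the row maximum so exp() cannot overflow.
  const float max_val = *std::max_element(logits.begin(), logits.end());

  std::vector<float> out(logits.size());
  for (std::size_t i = 0; i < logits.size(); ++i) {
    out[i] = std::exp(logits[i] - max_val);
  }

  // exp() is strictly positive, so the absolute-value sum equals the plain
  // sum and an asum routine can compute the normalizer.
  const float denom = reference_sasum(out);
  for (float& v : out) v /= denom;
  return out;
}
```

Calling `softmax_row` on a row of logits returns probabilities that sum to 1. Swapping the stand-in for the real BLAS call would only change the body of `reference_sasum`.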