Created by: dzhwinter
Fix https://github.com/PaddlePaddle/Paddle/issues/9460

Some conclusions:
- The max time cost reported by the profiler is not accurate. `PushEvent` and `PopEvent` take a mutex lock internally, which sometimes produces a peak value.
- Eigen operations degrade when processing large input data. For example, `scale_op` is implemented on top of the Eigen `scalar * Tensor` operation. When one instance in the input data is notably longer than the others, the max time of the `scale` event goes up to 2.95648:
```
Event             Calls    Total      Min.        Max.        Ave.
thread0::scale    64100    1743.01    0.016128    2.95648     0.0271921
```

If we crop the inputs to the same length, it becomes:

```
Event             Calls    Total      Min.        Max.        Ave.
thread0::scale    5200     123.396    0.019424    0.071616    0.02373
```
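
To illustrate why the `Max.` column reacts so strongly while `Ave.` barely moves, here is a minimal sketch of how per-call durations could be reduced to the columns above. This is not the Paddle profiler code: `EventSummary` and `summarize` are hypothetical names, and the durations are synthetic values chosen for illustration, not measurements.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EventSummary:
    """Hypothetical container mirroring the profiler table columns."""
    calls: int
    total: float
    min: float
    max: float
    ave: float


def summarize(durations: Sequence[float]) -> EventSummary:
    """Reduce a list of per-call durations to Calls/Total/Min./Max./Ave."""
    if not durations:
        raise ValueError("need at least one duration")
    total = sum(durations)
    return EventSummary(
        calls=len(durations),
        total=total,
        min=min(durations),
        max=max(durations),
        ave=total / len(durations),
    )


# Synthetic data: 1000 calls of equal cost, then the same run with
# a single slow call (e.g. one long instance or one lock stall).
uniform = [0.02] * 1000
with_outlier = uniform[:-1] + [3.0]

base = summarize(uniform)
spiky = summarize(with_outlier)

print(f"uniform:      max={base.max:.5f} ave={base.ave:.5f}")
print(f"with outlier: max={spiky.max:.5f} ave={spiky.ave:.5f}")
```

In this synthetic run a single slow call sets `Max.` on its own, while `Ave.` shifts only slightly. That matches the pattern in the two tables above, where `Max.` drops sharply after cropping and `Ave.` stays close.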