Created by: wangchaochaohu
PR types
Performance optimization
PR changes
APIs
Describe
Same reason as https://github.com/PaddlePaddle/Paddle/pull/25810
Problem discovery and testing: https://github.com/PaddlePaddle/benchmark/pull/586
develop
-------------------------> Profiling Report <-------------------------
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
------------------------- Overhead Summary -------------------------
Total time: 5.216
Computation time Total: 5.0294 Ratio: 96.4224%
Framework overhead Total: 0.186607 Ratio: 3.57758%
------------------------- GpuMemCpy Summary -------------------------
GpuMemcpy Calls: 3 Total: 0.969177 Ratio: 18.5808%
GpuMemcpyAsync Calls: 3 Total: 0.969177 Ratio: 18.5808%
------------------------- Event Summary -------------------------
Event                    Calls  Total     CPU Time (Ratio)     GPU Time (Ratio)     Min.      Max.     Ave.      Ratio.
thread0::fill_constant   3      3.86991   3.865012 (0.998734)  0.004898 (0.001266)  0.099551  3.66175  1.28997   0.74193
thread0::linspace        1      1.34609   1.336493 (0.992868)  0.009601 (0.007132)  1.34609   1.34609  1.34609   0.25807
GpuMemcpyAsync:GPU->CPU  3      0.969177  0.964345 (0.995014)  0.004832 (0.004986)  0.028547  0.90385  0.323059  0.185808
this PR
-------------------------> Profiling Report <-------------------------
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
------------------------- Overhead Summary -------------------------
Total time: 3.13474
Computation time Total: 3.02807 Ratio: 96.5971%
Framework overhead Total: 0.106671 Ratio: 3.40287%
------------------------- GpuMemCpy Summary -------------------------
GpuMemcpy Calls: 0 Total: 0 Ratio: 0%
------------------------- Event Summary -------------------------
Event                   Calls  Total     CPU Time (Ratio)     GPU Time (Ratio)     Min.      Max.      Ave.       Ratio.
thread0::linspace       1      2.96124   2.959388 (0.999373)  0.001856 (0.000627)  2.96124   2.96124   2.96124    0.944655
thread0::fill_constant  3      0.173492  0.173492 (1.000000)  0.000000 (0.000000)  0.027756  0.117965  0.0578307  0.055345
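
To make the comparison between the two reports easier to read, here is a small sketch that computes the speedup from the figures above. The dictionary keys and helper names are made up for illustration and are not part of Paddle; the numbers are copied from the develop and this-PR reports (time unit: ms).

```python
# Figures copied from the two profiling reports (time unit: ms).
# The key names here are illustrative only, not Paddle identifiers.
develop = {
    "total": 5.216,
    "fill_constant": 3.86991,
    "linspace": 1.34609,
    "gpu_memcpy": 0.969177,
}
this_pr = {
    "total": 3.13474,
    "fill_constant": 0.173492,
    "linspace": 2.96124,
    "gpu_memcpy": 0.0,
}


def speedup(before, after):
    """Return how many times faster `after` is than `before`."""
    return before / after


def reduction_pct(before, after):
    """Return the percentage of time saved going from `before` to `after`."""
    return (before - after) / before * 100.0


total_speedup = speedup(develop["total"], this_pr["total"])
total_reduction = reduction_pct(develop["total"], this_pr["total"])
fill_constant_speedup = speedup(develop["fill_constant"], this_pr["fill_constant"])

print(f"total speedup:         {total_speedup:.2f}x")
print(f"total time reduction:  {total_reduction:.1f}%")
print(f"fill_constant speedup: {fill_constant_speedup:.1f}x")
```

On these numbers, total time drops by roughly 40% (about 1.66x). The fill_constant events shrink from about 3.87 ms to 0.17 ms and the three GPU->CPU copies no longer appear, while the linspace event itself grows from about 1.35 ms to 2.96 ms. These are single-run figures, so they should be read as indicative rather than precise.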