Created by: zhangting2020
Fix the following bugs:
- When `GpuMemcpy` is nested inside a `compute` event, its time is counted twice in the compute ratio:
recurrent_grad/sum/compute 2640 83.2866 62.121459 (0.745876) 21.165112 (0.254124) 0.015171 0.095683 0.0315479 0.068406
recurrent_grad/sum/compute/GpuMemcpyAsync:CPU->GPU 570 8.39523 7.405052 (0.882055) 0.990173 (0.117945) 0.010849 0.046229 0.0147285 0.00689527
This makes the compute time ratio higher than its actual value.
- `GpuMemcpy` may also appear among the main (top-level) events; in that case it should be counted as well.
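The counting rule behind both fixes can be illustrated with a minimal Python sketch. This is not Paddle's profiler code: it assumes each event is keyed by its `/`-separated path (as in the report rows above) and maps to its total time in ms, and the helpers `is_descendant`, `compute_time` and `memcpy_time` are hypothetical names used only for illustration. A nested event's time is already part of its parent's total, so it is skipped when the parent is counted; `GpuMemcpy` events are counted whether they are nested or top-level.

```python
from typing import Dict


def is_descendant(name: str, ancestor: str) -> bool:
    """True if `name` is nested under `ancestor` in a '/'-separated event path."""
    return name.startswith(ancestor + "/")


def compute_time(events: Dict[str, float]) -> float:
    """Sum the time of 'compute' events, counting nested events only once."""
    # Every event with a 'compute' path segment, including children such as
    # '.../compute/GpuMemcpyAsync:CPU->GPU'.
    candidates = [n for n in events if "compute" in n.split("/")]
    total = 0.0
    for name in candidates:
        # A child's time is already included in its parent's total; skip it
        # so the same interval is not counted twice.
        if any(is_descendant(name, other) for other in candidates):
            continue
        total += events[name]
    return total


def memcpy_time(events: Dict[str, float]) -> float:
    """Sum the time of GpuMemcpy events, whether nested or top-level (main)."""
    return sum(t for n, t in events.items()
               if n.split("/")[-1].startswith("GpuMemcpy"))


events = {
    "recurrent_grad/sum/compute": 83.2866,
    "recurrent_grad/sum/compute/GpuMemcpyAsync:CPU->GPU": 8.39523,
    "GpuMemcpySync:GPU->CPU": 1.25,  # hypothetical top-level (main) event
}
print(compute_time(events))  # 83.2866, not 83.2866 + 8.39523
print(memcpy_time(events))   # includes the top-level GpuMemcpySync event
```

With the two rows from the report, a naive sum over both `compute` rows would give about 91.68 ms, while the nested-aware sum keeps only the parent's 83.2866 ms.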