Does dynamic graph (dygraph) mode support performance analysis with the profiler tool?
Created by: baiyfbupt
I used the profiler in dynamic graph mode like this:
with profiler.profiler('All', 'total', '/tmp/profile', 'AllOpDetail') as prof:
The result is as follows:
-------------------------> Profiling Report <-------------------------
Note! This Report merge all thread info into one.
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
Total time: 85.3118
Computation time Total: 0 Ratio: 0%
Framework overhead Total: 85.3118 Ratio: 100%
------------------------- GpuMemCpy Summary -------------------------
GpuMemcpy Calls: 1592 Total: 67.659 Ratio: 79.3078%
GpuMemcpyAsync Calls: 1581 Total: 67.3125 Ratio: 78.9017%
GpuMemcpySync Calls: 11 Total: 0.346447 Ratio: 0.406095%
------------------------- Event Summary -------------------------
Event Calls Total CPU Time (Ratio) GPU Time (Ratio) Min. Max. Ave. Ratio.
GpuMemcpyAsync:CPU->GPU 1507 60.6811 59.008674 (0.972439) 1.672417 (0.027561) 0.016008 10.3689 0.0402662 0.711286
BufferedReader:MemoryCopy 26 21.8826 19.813381 (0.905441) 2.069192 (0.094559) 0.563868 1.1884 0.841637 0.256501
GpuMemcpyAsync:CUDAPinned->GPU 52 4.22972 2.160526 (0.510797) 2.069192 (0.489203) 0.01446 0.225421 0.0813407 0.0495795
GpuMemcpyAsync(same_gpu):GPU->GPU 22 2.40172 2.373141 (0.988102) 0.028576 (0.011898) 0.039795 1.46456 0.109169 0.0281522
GpuMemcpySync(same_gpu):GPU->GPU 11 0.346447 0.330703 (0.954556) 0.015744 (0.045444) 0.028282 0.043144 0.0314952 0.00406095
-------------------------> Profiling Report <-------------------------
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
------------------------- Event Summary -------------------------
Event Calls Total CPU Time (Ratio) GPU Time (Ratio) Min. Max. Ave. Ratio.
thread14::BufferedReader:MemoryCopy 13 10.5248 9.455296 (0.898379) 1.069541 (0.101621) 0.713474 0.976148 0.809603 1
GpuMemcpyAsync:CUDAPinned->GPU 26 2.0956 1.026063 (0.489626) 1.069541 (0.510374) 0.01446 0.225421 0.0806002 0.19911
thread13::BufferedReader:MemoryCopy 13 11.3577 10.358085 (0.911985) 0.999651 (0.088015) 0.563868 1.1884 0.873672 1
GpuMemcpyAsync:CUDAPinned->GPU 26 2.13411 1.134463 (0.531585) 0.999651 (0.468415) 0.016397 0.202192 0.0820813 0.1879
thread0::GpuMemcpyAsync:CPU->GPU 1507 60.6811 59.008674 (0.972439) 1.672417 (0.027561) 0.016008 10.3689 0.0402662 0.956674
thread0::GpuMemcpyAsync(same_gpu):GPU->GPU 22 2.40172 2.373141 (0.988102) 0.028576 (0.011898) 0.039795 1.46456 0.109169 0.0378645
thread0::GpuMemcpySync(same_gpu):GPU->GPU 11 0.346447 0.330703 (0.954556) 0.015744 (0.045444) 0.028282 0.043144 0.0314952 0.00546194
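The report only shows memory copy/sync events and nothing for the individual OPs. Does dynamic graph mode not support the profiler yet? If it does not, are there other performance analysis tools I can use?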