Created by: wangchaochaohu
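Add profiling support for Fusion Group. This PR does two things:
(1) Adds support for the CUDA driver callback when the profiler is enabled, so that GPU time can be collected for kernels launched through cuLaunch (see the changes in device_tracer.cc).
(2) Fixes a display bug found while profiling the Fusion Group unit test: when two executors run, the report is wrong (on develop, the run cannot distinguish fusion_group1 from fusion_group2).
Profiler output for the Fusion Group unit test on develop: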
-------------------------> Profiling Report <-------------------------
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
Event Calls Total CPU Time (Ratio) GPU Time (Ratio) Min. Max. Ave. Ratio.
thread0::fetch 8 0.412749 0.393165 (0.952552) 0.019584 (0.047448) 0.032694 0.081049 0.0515936 0.343904
thread0::fetch/fetch0 8 0.404343 0.384759 (0.951566) 0.019584 (0.048434) 0.031925 0.07981 0.0505429 0.3369
thread0::fetch/fetch0/GpuMemcpySync:GPU->CPU 8 0.315917 0.296333 (0.938009) 0.019584 (0.061991) 0.027216 0.054059 0.0394896 0.263223
thread0::mul 2 0.391124 0.363732 (0.929966) 0.027392 (0.070034) 0.137578 0.253546 0.195562 0.325886
thread0::mul/mul0 2 0.388089 0.360697 (0.929418) 0.027392 (0.070582) 0.136053 0.252036 0.194045 0.323357
thread0::mul/mul0/prepare_data 2 0.002788 0.002788 (1.000000) 0.000000 (0.000000) 0.001296 0.001492 0.001394 0.00232297
thread0::mul/mul0/infer_shape 2 0.012194 0.012194 (1.000000) 0.000000 (0.000000) 0.005748 0.006446 0.006097 0.0101601
thread0::mul/mul0/compute 2 0.353352 0.325960 (0.922480) 0.027392 (0.077520) 0.119663 0.233689 0.176676 0.294414
thread0::fusion_group 2 0.120147 0.120147 (1.000000) 0.000000 (0.000000) 0.030131 0.090016 0.0600735 0.100107
thread0::fusion_group/fusion_group0 2 0.117328 0.117328 (1.000000) 0.000000 (0.000000) 0.028387 0.088941 0.058664 0.0977581
thread0::fusion_group/fusion_group0/prepare_data 2 0.004051 0.004051 (1.000000) 0.000000 (0.000000) 0.00148 0.002571 0.0020255 0.00337531
thread0::fusion_group/fusion_group0/infer_shape 2 0.013023 0.013023 (1.000000) 0.000000 (0.000000) 0.003499 0.009524 0.0065115 0.0108508
thread0::fusion_group/fusion_group0/compute 2 0.076616 0.076616 (1.000000) 0.000000 (0.000000) 0.018491 0.058125 0.038308 0.0638367
thread0::elementwise_sub 1 0.089661 0.087933 (0.980727) 0.001728 (0.019273) 0.089661 0.089661 0.089661 0.0747059
thread0::elementwise_sub/elementwise_sub0 1 0.088111 0.086383 (0.980388) 0.001728 (0.019612) 0.088111 0.088111 0.088111 0.0734144
thread0::elementwise_sub/elementwise_sub0/prepare_data 1 0.002712 0.002712 (1.000000) 0.000000 (0.000000) 0.002712 0.002712 0.002712 0.00225965
thread0::elementwise_sub/elementwise_sub0/infer_shape 1 0.007135 0.007135 (1.000000) 0.000000 (0.000000) 0.007135 0.007135 0.007135 0.00594491
thread0::elementwise_sub/elementwise_sub0/compute 1 0.059192 0.057464 (0.970807) 0.001728 (0.029193) 0.059192 0.059192 0.059192 0.049319
thread0::relu 2 0.068834 0.065762 (0.955371) 0.003072 (0.044629) 0.025021 0.043813 0.034417 0.0573527
thread0::relu/relu0 1 0.042253 0.040589 (0.960618) 0.001664 (0.039382) 0.042253 0.042253 0.042253 0.0352053
thread0::relu/relu0/prepare_data 1 0.00133 0.001330 (1.000000) 0.000000 (0.000000) 0.00133 0.00133 0.00133 0.00110816
thread0::relu/relu0/infer_shape 1 0.001779 0.001779 (1.000000) 0.000000 (0.000000) 0.001779 0.001779 0.001779 0.00148227
thread0::relu/relu0/compute 1 0.028051 0.026387 (0.940679) 0.001664 (0.059321) 0.028051 0.028051 0.028051 0.0233722
thread0::relu/relu1 1 0.023772 0.022364 (0.940771) 0.001408 (0.059229) 0.023772 0.023772 0.023772 0.0198069
thread0::relu/relu1/prepare_data 1 0.001369 0.001369 (1.000000) 0.000000 (0.000000) 0.001369 0.001369 0.001369 0.00114066
thread0::relu/relu1/infer_shape 1 0.001572 0.001572 (1.000000) 0.000000 (0.000000) 0.001572 0.001572 0.001572 0.0013098
thread0::relu/relu1/compute 1 0.01624 0.014832 (0.913300) 0.001408 (0.086700) 0.01624 0.01624 0.01624 0.0135312
thread0::elementwise_mul 1 0.04303 0.041430 (0.962817) 0.001600 (0.037183) 0.04303 0.04303 0.04303 0.0358527
thread0::elementwise_mul/elementwise_mul0 1 0.041211 0.039611 (0.961175) 0.001600 (0.038825) 0.041211 0.041211 0.041211 0.0343371
thread0::elementwise_mul/elementwise_mul0/prepare_data 1 0.001668 0.001668 (1.000000) 0.000000 (0.000000) 0.001668 0.001668 0.001668 0.00138978
thread0::elementwise_mul/elementwise_mul0/infer_shape 1 0.00278 0.002780 (1.000000) 0.000000 (0.000000) 0.00278 0.00278 0.00278 0.00231631
thread0::elementwise_mul/elementwise_mul0/compute 1 0.028359 0.026759 (0.943581) 0.001600 (0.056419) 0.028359 0.028359 0.028359 0.0236288
thread0::feed 8 0.040351 0.040351 (1.000000) 0.000000 (0.000000) 0.002181 0.013594 0.00504388 0.0336206
thread0::feed/feed0 1 0.00913 0.009130 (1.000000) 0.000000 (0.000000) 0.00913 0.00913 0.00913 0.00760715
thread0::feed/feed1 1 0.001911 0.001911 (1.000000) 0.000000 (0.000000) 0.001911 0.001911 0.001911 0.00159225
thread0::feed/feed2 1 0.001652 0.001652 (1.000000) 0.000000 (0.000000) 0.001652 0.001652 0.001652 0.00137645
thread0::feed/feed3 5 0.016676 0.016676 (1.000000) 0.000000 (0.000000) 0.001411 0.010447 0.0033352 0.0138945
thread0::sigmoid 1 0.034291 0.032403 (0.944942) 0.001888 (0.055058) 0.034291 0.034291 0.034291 0.0285714
thread0::sigmoid/sigmoid0 1 0.032857 0.030969 (0.942539) 0.001888 (0.057461) 0.032857 0.032857 0.032857 0.0273766
thread0::sigmoid/sigmoid0/prepare_data 1 0.001532 0.001532 (1.000000) 0.000000 (0.000000) 0.001532 0.001532 0.001532 0.00127647
thread0::sigmoid/sigmoid0/infer_shape 1 0.00137 0.001370 (1.000000) 0.000000 (0.000000) 0.00137 0.00137 0.00137 0.00114149
thread0::sigmoid/sigmoid0/compute 1 0.024439 0.022551 (0.922746) 0.001888 (0.077254) 0.024439 0.024439 0.024439 0.0203627
.
----------------------------------------------------------------------
Result with this PR:
-------------------------> Profiling Report <-------------------------
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
Event Calls Total CPU Time (Ratio) GPU Time (Ratio) Min. Max. Ave. Ratio.
thread0::fetch 8 0.462244 0.442628 (0.957564) 0.019616 (0.042436) 0.035045 0.09964 0.0577805 0.385939
thread0::fetch/fetch0 8 0.452494 0.432878 (0.956649) 0.019616 (0.043351) 0.034142 0.098141 0.0565617 0.377798
thread0::fetch/fetch0/GpuMemcpySync:GPU->CPU 8 0.356812 0.337196 (0.945024) 0.019616 (0.054976) 0.028388 0.071285 0.0446015 0.297911
thread0::mul 2 0.330204 0.303740 (0.919856) 0.026464 (0.080144) 0.111085 0.219119 0.165102 0.275695
thread0::mul/mul0 2 0.326698 0.300234 (0.918996) 0.026464 (0.081004) 0.109296 0.217402 0.163349 0.272768
thread0::mul/mul0/prepare_data 2 0.002831 0.002831 (1.000000) 0.000000 (0.000000) 0.001277 0.001554 0.0014155 0.00236367
thread0::mul/mul0/infer_shape 2 0.012438 0.012438 (1.000000) 0.000000 (0.000000) 0.005872 0.006566 0.006219 0.0103848
thread0::mul/mul0/compute 2 0.295194 0.268730 (0.910350) 0.026464 (0.089650) 0.092362 0.202832 0.147597 0.246465
thread0::fusion_group 2 0.13688 0.133072 (0.972180) 0.003808 (0.027820) 0.037209 0.099671 0.06844 0.114284
thread0::fusion_group/fusion_group0 1 0.098165 0.096149 (0.979463) 0.002016 (0.020537) 0.098165 0.098165 0.098165 0.0819603
thread0::fusion_group/fusion_group0/prepare_data 1 0.002704 0.002704 (1.000000) 0.000000 (0.000000) 0.002704 0.002704 0.002704 0.00225763
thread0::fusion_group/fusion_group0/infer_shape 1 0.008688 0.008688 (1.000000) 0.000000 (0.000000) 0.008688 0.008688 0.008688 0.00725382
thread0::fusion_group/fusion_group0/compute 1 0.069108 0.067092 (0.970828) 0.002016 (0.029172) 0.069108 0.069108 0.069108 0.0576999
thread0::fusion_group/fusion_group1 1 0.035183 0.033391 (0.949066) 0.001792 (0.050934) 0.035183 0.035183 0.035183 0.0293751
thread0::fusion_group/fusion_group1/prepare_data 1 0.00144 0.001440 (1.000000) 0.000000 (0.000000) 0.00144 0.00144 0.00144 0.00120229
thread0::fusion_group/fusion_group1/infer_shape 1 0.005061 0.005061 (1.000000) 0.000000 (0.000000) 0.005061 0.005061 0.005061 0.00422555
thread0::fusion_group/fusion_group1/compute 1 0.022916 0.021124 (0.921801) 0.001792 (0.078199) 0.022916 0.022916 0.022916 0.0191331
thread0::elementwise_sub 1 0.086273 0.084545 (0.979971) 0.001728 (0.020029) 0.086273 0.086273 0.086273 0.0720314
thread0::elementwise_sub/elementwise_sub0 1 0.084296 0.082568 (0.979501) 0.001728 (0.020499) 0.084296 0.084296 0.084296 0.0703807
thread0::elementwise_sub/elementwise_sub0/prepare_data 1 0.003029 0.003029 (1.000000) 0.000000 (0.000000) 0.003029 0.003029 0.003029 0.00252898
thread0::elementwise_sub/elementwise_sub0/infer_shape 1 0.006674 0.006674 (1.000000) 0.000000 (0.000000) 0.006674 0.006674 0.006674 0.00557228
thread0::elementwise_sub/elementwise_sub0/compute 1 0.055882 0.054154 (0.969078) 0.001728 (0.030922) 0.055882 0.055882 0.055882 0.0466572
thread0::relu 2 0.063476 0.060373 (0.951115) 0.003103 (0.048885) 0.023056 0.04042 0.031738 0.0529976
thread0::relu/relu0 1 0.038875 0.037179 (0.956373) 0.001696 (0.043627) 0.038875 0.038875 0.038875 0.0324577
thread0::relu/relu0/prepare_data 1 0.001683 0.001683 (1.000000) 0.000000 (0.000000) 0.001683 0.001683 0.001683 0.00140518
thread0::relu/relu0/infer_shape 1 0.001632 0.001632 (1.000000) 0.000000 (0.000000) 0.001632 0.001632 0.001632 0.0013626
thread0::relu/relu0/compute 1 0.027732 0.026036 (0.938843) 0.001696 (0.061157) 0.027732 0.027732 0.027732 0.0231541
thread0::relu/relu1 1 0.021781 0.020374 (0.935402) 0.001407 (0.064598) 0.021781 0.021781 0.021781 0.0181855
thread0::relu/relu1/prepare_data 1 0.001091 0.001091 (1.000000) 0.000000 (0.000000) 0.001091 0.001091 0.001091 0.000910902
thread0::relu/relu1/infer_shape 1 0.001243 0.001243 (1.000000) 0.000000 (0.000000) 0.001243 0.001243 0.001243 0.00103781
thread0::relu/relu1/compute 1 0.014549 0.013142 (0.903292) 0.001407 (0.096708) 0.014549 0.014549 0.014549 0.0121473
thread0::elementwise_mul 1 0.042547 0.040947 (0.962395) 0.001600 (0.037605) 0.042547 0.042547 0.042547 0.0355235
thread0::elementwise_mul/elementwise_mul0 1 0.04066 0.039060 (0.960649) 0.001600 (0.039351) 0.04066 0.04066 0.04066 0.033948
thread0::elementwise_mul/elementwise_mul0/prepare_data 1 0.001665 0.001665 (1.000000) 0.000000 (0.000000) 0.001665 0.001665 0.001665 0.00139015
thread0::elementwise_mul/elementwise_mul0/infer_shape 1 0.0027 0.002700 (1.000000) 0.000000 (0.000000) 0.0027 0.0027 0.0027 0.00225429
thread0::elementwise_mul/elementwise_mul0/compute 1 0.028188 0.026588 (0.943238) 0.001600 (0.056762) 0.028188 0.028188 0.028188 0.0235348
thread0::feed 8 0.042524 0.042524 (1.000000) 0.000000 (0.000000) 0.002229 0.01391 0.0053155 0.0355043
thread0::feed/feed0 1 0.010202 0.010202 (1.000000) 0.000000 (0.000000) 0.010202 0.010202 0.010202 0.00851789
thread0::feed/feed1 1 0.00195 0.001950 (1.000000) 0.000000 (0.000000) 0.00195 0.00195 0.00195 0.0016281
thread0::feed/feed2 1 0.001652 0.001652 (1.000000) 0.000000 (0.000000) 0.001652 0.001652 0.001652 0.00137929
thread0::feed/feed3 5 0.017344 0.017344 (1.000000) 0.000000 (0.000000) 0.001479 0.010837 0.0034688 0.0144809
thread0::sigmoid 1 0.033566 0.031678 (0.943753) 0.001888 (0.056247) 0.033566 0.033566 0.033566 0.0280251
thread0::sigmoid/sigmoid0 1 0.031971 0.030083 (0.940946) 0.001888 (0.059054) 0.031971 0.031971 0.031971 0.0266934
thread0::sigmoid/sigmoid0/prepare_data 1 0.001512 0.001512 (1.000000) 0.000000 (0.000000) 0.001512 0.001512 0.001512 0.0012624
thread0::sigmoid/sigmoid0/infer_shape 1 0.001464 0.001464 (1.000000) 0.000000 (0.000000) 0.001464 0.001464 0.001464 0.00122233
thread0::sigmoid/sigmoid0/compute 1 0.022998 0.021110 (0.917906) 0.001888 (0.082094) 0.022998 0.022998 0.022998 0.0192016
.
----------------------------------------------------------------------
Ran 3 tests in 6.275s
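
The two reports can be compared programmatically. Below is a minimal sketch, assuming each report row ends with the ten whitespace-separated fields shown in the header (Calls, Total, CPU Time, its ratio, GPU Time, its ratio, Min., Max., Ave., Ratio.). The `Row` type and `parse_row` helper are hypothetical names introduced for illustration and are not part of the profiler. The row strings are copied from the two reports above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """Subset of one profiling report row (hypothetical helper type)."""
    event: str
    calls: int
    total_ms: float
    cpu_ms: float
    gpu_ms: float


def parse_row(line: str) -> Row:
    # Split from the right so the event name stays in one piece.
    (event, calls, total, cpu, _cpu_ratio, gpu, _gpu_ratio,
     _min, _max, _ave, _ratio) = line.rsplit(None, 10)
    return Row(event, int(calls), float(total), float(cpu), float(gpu))


# Rows copied verbatim from the develop report and the PR report.
develop_row = parse_row(
    "thread0::fusion_group 2 0.120147 0.120147 (1.000000) "
    "0.000000 (0.000000) 0.030131 0.090016 0.0600735 0.100107")
pr_row = parse_row(
    "thread0::fusion_group 2 0.13688 0.133072 (0.972180) "
    "0.003808 (0.027820) 0.037209 0.099671 0.06844 0.114284")
pr_children = [
    parse_row("thread0::fusion_group/fusion_group0 1 0.098165 0.096149 "
              "(0.979463) 0.002016 (0.020537) 0.098165 0.098165 0.098165 0.0819603"),
    parse_row("thread0::fusion_group/fusion_group1 1 0.035183 0.033391 "
              "(0.949066) 0.001792 (0.050934) 0.035183 0.035183 0.035183 0.0293751"),
]

# On develop no GPU time is attributed to fusion_group; with the PR it is.
print("develop GPU ms:", develop_row.gpu_ms)
print("PR GPU ms:", pr_row.gpu_ms)

# With the PR, the two executors show up as separate child events whose
# GPU times add up to the parent's GPU time.
child_gpu_sum = sum(child.gpu_ms for child in pr_children)
print("sum of child GPU ms:", round(child_gpu_sum, 6))
```

In these rows, CPU time plus GPU time equals the total, and the GPU times of fusion_group0 and fusion_group1 add up to the parent fusion_group entry, which is consistent with the two fixes described at the top of this PR.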