Skip to content

  • 体验新版
    • 正在加载...
  • 登录
  • PaddlePaddle
  • Paddle
  • 合并请求
  • !24611

P
Paddle
  • 项目概览

PaddlePaddle / Paddle
大约 2 年 前同步成功

通知 2325
Star 20933
Fork 5424
  • 代码
    • 文件
    • 提交
    • 分支
    • Tags
    • 贡献者
    • 分支图
    • Diff
  • Issue 1423
    • 列表
    • 看板
    • 标记
    • 里程碑
  • 合并请求 543
  • Wiki 0
    • Wiki
  • 分析
    • 仓库
    • DevOps
  • 项目成员
  • Pages
P
Paddle
  • 项目概览
    • 项目概览
    • 详情
    • 发布
  • 仓库
    • 仓库
    • 文件
    • 提交
    • 分支
    • 标签
    • 贡献者
    • 分支图
    • 比较
  • Issue 1,423
    • Issue 1,423
    • 列表
    • 看板
    • 标记
    • 里程碑
  • 合并请求 543
    • 合并请求 543
  • Pages
  • 分析
    • 分析
    • 仓库分析
    • DevOps
  • Wiki 0
    • Wiki
  • 成员
    • 成员
  • 收起侧边栏
  • 动态
  • 分支图
  • 创建新Issue
  • 提交
  • Issue看板

Add pe profiler !24611

  • Report abuse
!24611 已合并 5月 18, 2020 由 saxon_zh@saxon_zh 创建
#<User:0x00007f7e0c6fbf80>
  • 概览 1
  • 提交 6
  • 变更 8

Created by: wangchaochaohu

Profiler using example code This PR:


------------------------->     Profiling Report     <-------------------------

Note! This Report merge all thread info into one.
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread

Total time: 33.4768
  Computation time       Total: 9.54767     Ratio: 28.5203%
  Framework overhead     Total: 23.9291     Ratio: 71.4797%

-------------------------     GpuMemCpy Summary     -------------------------

GpuMemcpy                Calls: 12          Total: 1.06891     Ratio: 3.193%
  GpuMemcpyAsync         Calls: 12          Total: 1.06891     Ratio: 3.193%

-------------------------       Event Summary       -------------------------

Event                                                       Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
ParallelExecutor::Run                                       1           33.3599     33.267285 (0.997224)    0.092608 (0.002776)     33.3599     33.3599     33.3599     0.996509
  FastThreadedSSAGraphExecutorPrepare                       1           26.0902     25.997574 (0.996450)    0.092608 (0.003550)     26.0902     26.0902     26.0902     0.779352
  fused_all_reduce                                          1           8.76217     2.480211 (0.283059)     6.281961 (0.716941)     8.76217     8.76217     8.76217     0.261739
  mul                                                       4           3.47151     3.442485 (0.991639)     0.029024 (0.008361)     0.256854    1.33485     0.867877    0.103699
  mean                                                      4           2.82855     2.817609 (0.996131)     0.010944 (0.003869)     0.289439    1.0633      0.707138    0.084493
  elementwise_add_grad                                      4           2.07373     2.061025 (0.993874)     0.012704 (0.006126)     0.050706    1.15218     0.518432    0.0619453
  mean_grad                                                 4           1.83154     1.823063 (0.995370)     0.008480 (0.004630)     0.201255    0.818809    0.457886    0.0547109
  Scale LossGrad                                            4           1.7562      1.748869 (0.995827)     0.007328 (0.004173)     0.281598    0.888646    0.439049    0.0524602
    GpuMemcpyAsync:CPU->GPU                                 4           0.709247    0.701919 (0.989668)     0.007328 (0.010332)     0.042646    0.26316     0.177312    0.0211862
  sgd                                                       8           1.55178     1.538021 (0.991133)     0.013760 (0.008867)     0.052253    0.349815    0.193973    0.046354
  mul_grad                                                  4           0.889237    0.861365 (0.968656)     0.027872 (0.031344)     0.174724    0.343964    0.222309    0.0265628
  InitLocalVars                                             1           0.57186     0.571860 (1.000000)     0.000000 (0.000000)     0.57186     0.57186     0.57186     0.0170823
    coalesce_tensor                                         4           0.473217    0.473217 (1.000000)     0.000000 (0.000000)     0.084212    0.199022    0.118304    0.0141357
  Fetch                                                     1           0.381305    0.367929 (0.964920)     0.013376 (0.035080)     0.381305    0.381305    0.381305    0.0113901
    GpuMemcpyAsync:GPU->CPU                                 4           0.24279     0.229414 (0.944907)     0.013376 (0.055093)     0.035801    0.131014    0.0606975   0.00725249
  elementwise_add                                           4           0.363059    0.352307 (0.970385)     0.010752 (0.029615)     0.05774     0.113593    0.0907648   0.0108451
  eager_deletion                                            12          0.027635    0.027635 (1.000000)     0.000000 (0.000000)     0.001541    0.005127    0.00230292  0.000825498
  ScopeBufferedMonitor::post_local_exec_scopes_process      1           0.012179    0.012179 (1.000000)     0.000000 (0.000000)     0.012179    0.012179    0.012179    0.000363805
  ScopeBufferedMonitor::pre_local_exec_scopes_process       1           0.004394    0.004394 (1.000000)     0.000000 (0.000000)     0.004394    0.004394    0.004394    0.000131255
GpuMemcpyAsync:CPU->GPU                                     4           0.116875    0.102316 (0.875431)     0.014559 (0.124569)     0.020035    0.053887    0.0292188   0.00349123
W0518 07:13:48.128401 80971 profiler_helper.h:384] Cannot find the push marker of event 'ParallelExecutor::Run', which will be ignored in profiling report.

------------------------->     Profiling Report     <-------------------------

Place: All
Time unit: ms
Sorted by total time in descending order in the same thread


-------------------------       Event Summary       -------------------------

Event                                                                Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
thread16::sgd                                                        1           0.163545    0.161881 (0.989825)     0.001664 (0.010175)     0.163545    0.163545    0.163545    1
thread15::sgd                                                        1           0.268643    0.266531 (0.992138)     0.002112 (0.007862)     0.268643    0.268643    0.268643    1
thread14::sgd                                                        1           0.316286    0.314942 (0.995751)     0.001344 (0.004249)     0.316286    0.316286    0.316286    1
thread13::sgd                                                        1           0.197217    0.195873 (0.993185)     0.001344 (0.006815)     0.197217    0.197217    0.197217    1
thread12::sgd                                                        1           0.349815    0.348439 (0.996066)     0.001376 (0.003934)     0.349815    0.349815    0.349815    1
thread11::mean                                                       1           0.570155    0.567435 (0.995229)     0.002720 (0.004771)     0.570155    0.570155    0.570155    0.598273
thread11::Fetch                                                      1           0.381305    0.367929 (0.964920)     0.013376 (0.035080)     0.381305    0.381305    0.381305    0.40011
  GpuMemcpyAsync:GPU->CPU                                            4           0.24279     0.229414 (0.944907)     0.013376 (0.055093)     0.035801    0.131014    0.0606975   0.254764
thread11::eager_deletion                                             1           0.001541    0.001541 (1.000000)     0.000000 (0.000000)     0.001541    0.001541    0.001541    0.001617
thread10::mean                                                       1           0.905661    0.902813 (0.996855)     0.002848 (0.003145)     0.905661    0.905661    0.905661    0.997617
thread10::eager_deletion                                             1           0.002163    0.002163 (1.000000)     0.000000 (0.000000)     0.002163    0.002163    0.002163    0.00238262
thread9::mean                                                        1           1.0633      1.060802 (0.997653)     0.002496 (0.002347)     1.0633      1.0633      1.0633      0.997736
thread9::eager_deletion                                              1           0.002413    0.002413 (1.000000)     0.000000 (0.000000)     0.002413    0.002413    0.002413    0.00226422
thread8::Scale LossGrad                                              1           0.281598    0.279774 (0.993523)     0.001824 (0.006477)     0.281598    0.281598    0.281598    1
  GpuMemcpyAsync:CPU->GPU                                            1           0.26316     0.261336 (0.993069)     0.001824 (0.006931)     0.26316     0.26316     0.26316     0.934524
thread7::Scale LossGrad                                              1           0.291794    0.289906 (0.993530)     0.001888 (0.006470)     0.291794    0.291794    0.291794    0.669787
  GpuMemcpyAsync:CPU->GPU                                            1           0.190436    0.188548 (0.990086)     0.001888 (0.009914)     0.190436    0.190436    0.190436    0.437129
thread7::sgd                                                         1           0.143858    0.142162 (0.988211)     0.001696 (0.011789)     0.143858    0.143858    0.143858    0.330213
thread6::elementwise_add_grad                                        1           1.15218     1.148913 (0.997167)     0.003264 (0.002833)     1.15218     1.15218     1.15218     0.52329
thread6::mean_grad                                                   1           0.574346    0.572202 (0.996267)     0.002144 (0.003733)     0.574346    0.574346    0.574346    0.260853
thread6::Scale LossGrad                                              1           0.294159    0.292335 (0.993799)     0.001824 (0.006201)     0.294159    0.294159    0.294159    0.1336
  GpuMemcpyAsync:CPU->GPU                                            1           0.213005    0.211181 (0.991437)     0.001824 (0.008563)     0.213005    0.213005    0.213005    0.0967415
thread6::mul_grad                                                    1           0.176503    0.169527 (0.960477)     0.006976 (0.039523)     0.176503    0.176503    0.176503    0.0801632
thread6::eager_deletion                                              2           0.004611    0.004611 (1.000000)     0.000000 (0.000000)     0.001926    0.002685    0.0023055   0.0020942
thread5::mean_grad                                                   1           0.818809    0.816825 (0.997577)     0.001984 (0.002423)     0.818809    0.818809    0.818809    0.341401
thread5::mul                                                         1           0.658406    0.651334 (0.989259)     0.007072 (0.010741)     0.658406    0.658406    0.658406    0.274521
thread5::elementwise_add_grad                                        1           0.629194    0.626154 (0.995168)     0.003040 (0.004832)     0.629194    0.629194    0.629194    0.262341
thread5::mul_grad                                                    1           0.174724    0.168196 (0.962638)     0.006528 (0.037362)     0.174724    0.174724    0.174724    0.0728508
thread5::elementwise_add                                             1           0.113593    0.111129 (0.978309)     0.002464 (0.021691)     0.113593    0.113593    0.113593    0.0473624
thread5::eager_deletion                                              2           0.003654    0.003654 (1.000000)     0.000000 (0.000000)     0.001822    0.001832    0.001827    0.00152353
thread4::fused_all_reduce                                            1           8.76217     2.480211 (0.283059)     6.281961 (0.716941)     8.76217     8.76217     8.76217     0.808448
thread4::mul                                                         1           1.33485     1.327523 (0.994510)     0.007328 (0.005490)     1.33485     1.33485     1.33485     0.123161
thread4::mul_grad                                                    1           0.343964    0.336732 (0.978975)     0.007232 (0.021025)     0.343964    0.343964    0.343964    0.0317361
thread4::mean_grad                                                   1           0.201255    0.199079 (0.989188)     0.002176 (0.010812)     0.201255    0.201255    0.201255    0.0185689
thread4::elementwise_add                                             1           0.089355    0.086731 (0.970634)     0.002624 (0.029366)     0.089355    0.089355    0.089355    0.0082444
thread4::sgd                                                         1           0.052253    0.050109 (0.958969)     0.002144 (0.041031)     0.052253    0.052253    0.052253    0.00482116
thread4::elementwise_add_grad                                        1           0.050706    0.047506 (0.936891)     0.003200 (0.063109)     0.050706    0.050706    0.050706    0.00467843
thread4::eager_deletion                                              2           0.003706    0.003706 (1.000000)     0.000000 (0.000000)     0.001745    0.001961    0.001853    0.000341937
thread3::mul                                                         1           1.2214      1.213974 (0.993922)     0.007424 (0.006078)     1.2214      1.2214      1.2214      0.624312
thread3::elementwise_add_grad                                        1           0.241652    0.238452 (0.986758)     0.003200 (0.013242)     0.241652    0.241652    0.241652    0.123519
thread3::mean_grad                                                   1           0.237133    0.234957 (0.990824)     0.002176 (0.009176)     0.237133    0.237133    0.237133    0.12121
thread3::mul_grad                                                    1           0.194046    0.186910 (0.963225)     0.007136 (0.036775)     0.194046    0.194046    0.194046    0.0991858
thread3::elementwise_add                                             1           0.05774     0.054828 (0.949567)     0.002912 (0.050433)     0.05774     0.05774     0.05774     0.0295136
thread3::eager_deletion                                              2           0.00442     0.004420 (1.000000)     0.000000 (0.000000)     0.002201    0.002219    0.00221     0.00225926
thread2::Scale LossGrad                                              1           0.888646    0.886854 (0.997983)     0.001792 (0.002017)     0.888646    0.888646    0.888646    1
  GpuMemcpyAsync:CPU->GPU                                            1           0.042646    0.040854 (0.957980)     0.001792 (0.042020)     0.042646    0.042646    0.042646    0.0479899
thread1::mean                                                        1           0.289439    0.286559 (0.990050)     0.002880 (0.009950)     0.289439    0.289439    0.289439    0.405402
thread1::mul                                                         1           0.256854    0.249654 (0.971969)     0.007200 (0.028031)     0.256854    0.256854    0.256854    0.359762
thread1::elementwise_add                                             1           0.102371    0.099619 (0.973117)     0.002752 (0.026883)     0.102371    0.102371    0.102371    0.143386
thread1::sgd                                                         1           0.060164    0.058084 (0.965428)     0.002080 (0.034572)     0.060164    0.060164    0.060164    0.0842686
thread1::eager_deletion                                              1           0.005127    0.005127 (1.000000)     0.000000 (0.000000)     0.005127    0.005127    0.005127    0.00718112
thread0::FastThreadedSSAGraphExecutorPrepare                         1           26.0902     25.997574 (0.996450)    0.092608 (0.003550)     26.0902     26.0902     26.0902     0.973678
thread0::InitLocalVars                                               1           0.57186     0.571860 (1.000000)     0.000000 (0.000000)     0.57186     0.57186     0.57186     0.0213417
  coalesce_tensor                                                    4           0.473217    0.473217 (1.000000)     0.000000 (0.000000)     0.084212    0.199022    0.118304    0.0176603
thread0::GpuMemcpyAsync:CPU->GPU                                     4           0.116875    0.102316 (0.875431)     0.014559 (0.124569)     0.020035    0.053887    0.0292188   0.00436174
thread0::ScopeBufferedMonitor::post_local_exec_scopes_process        1           0.012179    0.012179 (1.000000)     0.000000 (0.000000)     0.012179    0.012179    0.012179    0.000454517
thread0::ScopeBufferedMonitor::pre_local_exec_scopes_process         1           0.004394    0.004394 (1.000000)     0.000000 (0.000000)     0.004394    0.004394    0.004394    0.000163983

develop

------------------------->     Profiling Report     <-------------------------

Note! This Report merge all thread info into one.
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread

Total time: 57.2235
  Computation time       Total: 15.9601     Ratio: 27.8908%
  Framework overhead     Total: 41.2634     Ratio: 72.1092%

-------------------------     GpuMemCpy Summary     -------------------------

GpuMemcpy                Calls: 12          Total: 1.21895     Ratio: 2.13016%
  GpuMemcpyAsync         Calls: 12          Total: 1.21895     Ratio: 2.13016%

-------------------------       Event Summary       -------------------------

Event                                                       Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
FastThreadedSSAGraphExecutorPrepare                         1           27.1881     27.091693 (0.996455)    0.096383 (0.003545)     27.1881     27.1881     27.1881     0.475121
mean                                                        4           10.4217     10.399238 (0.997845)    0.022463 (0.002155)     1.09743     3.72905     2.60543     0.182123
fused_all_reduce                                            1           8.69537     2.343391 (0.269499)     6.351975 (0.730501)     8.69537     8.69537     8.69537     0.151955
mul                                                         4           4.26694     4.237721 (0.993153)     0.029216 (0.006847)     0.622723    1.72138     1.06673     0.0745662
Scale LossGrad                                              4           3.11722     3.107936 (0.997023)     0.009280 (0.002977)     0.505603    0.953103    0.779304    0.0544744
  GpuMemcpyAsync:CPU->GPU                                   4           0.866906    0.857626 (0.989295)     0.009280 (0.010705)     0.029891    0.529066    0.216727    0.0151495
mul_grad                                                    4           1.27391     1.245369 (0.977593)     0.028544 (0.022407)     0.063961    1.06414     0.318478    0.0222621
InitLocalVars                                               1           0.580126    0.580126 (1.000000)     0.000000 (0.000000)     0.580126    0.580126    0.580126    0.0101379
  coalesce_tensor                                           4           0.476745    0.476745 (1.000000)     0.000000 (0.000000)     0.073257    0.221552    0.119186    0.00833129
sgd                                                         8           0.427709    0.412989 (0.965584)     0.014720 (0.034416)     0.0257      0.131092    0.0534636   0.00747436
Fetch                                                       1           0.366278    0.345606 (0.943562)     0.020672 (0.056438)     0.366278    0.366278    0.366278    0.00640084
  GpuMemcpyAsync:GPU->CPU                                   4           0.215761    0.195089 (0.904190)     0.020672 (0.095810)     0.034806    0.096327    0.0539403   0.0037705
elementwise_add                                             4           0.321055    0.310335 (0.966610)     0.010720 (0.033390)     0.064491    0.097086    0.0802637   0.00561055
mean_grad                                                   4           0.199697    0.191122 (0.957060)     0.008575 (0.042940)     0.039939    0.061745    0.0499243   0.00348977
elementwise_add_grad                                        4           0.17598     0.163564 (0.929447)     0.012416 (0.070553)     0.037632    0.053681    0.043995    0.00307531
GpuMemcpyAsync:CPU->GPU                                     4           0.136287    0.122559 (0.899271)     0.013728 (0.100729)     0.023804    0.060851    0.0340717   0.00238166
eager_deletion                                              12          0.039494    0.039494 (1.000000)     0.000000 (0.000000)     0.00112     0.007255    0.00329117  0.000690171
ScopeBufferedMonitor::post_local_exec_scopes_process        1           0.010934    0.010934 (1.000000)     0.000000 (0.000000)     0.010934    0.010934    0.010934    0.000191075
ScopeBufferedMonitor::pre_local_exec_scopes_process         1           0.002692    0.002692 (1.000000)     0.000000 (0.000000)     0.002692    0.002692    0.002692    4.70436e-05

------------------------->     Profiling Report     <-------------------------

Place: All
Time unit: ms
Sorted by total time in descending order in the same thread


-------------------------       Event Summary       -------------------------

Event                                                                Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
thread12::sgd                                                        1           0.131092    0.129524 (0.988039)     0.001568 (0.011961)     0.131092    0.131092    0.131092    1
thread11::sgd                                                        2           0.099652    0.096196 (0.965319)     0.003456 (0.034681)     0.033393    0.066259    0.049826    1
thread10::mean                                                       1           2.6048      2.598558 (0.997604)     0.006240 (0.002396)     2.6048      2.6048      2.6048      0.875974
thread10::Fetch                                                      1           0.366278    0.345606 (0.943562)     0.020672 (0.056438)     0.366278    0.366278    0.366278    0.123177
  GpuMemcpyAsync:GPU->CPU                                            4           0.215761    0.195089 (0.904190)     0.020672 (0.095810)     0.034806    0.096327    0.0539403   0.0725588
thread10::eager_deletion                                             1           0.002525    0.002525 (1.000000)     0.000000 (0.000000)     0.002525    0.002525    0.002525    0.000849139
thread9::mean                                                        1           2.99042     2.985007 (0.998192)     0.005408 (0.001808)     2.99042     2.99042     2.99042     0.999377
thread9::eager_deletion                                              1           0.001864    0.001864 (1.000000)     0.000000 (0.000000)     0.001864    0.001864    0.001864    0.000622937
thread8::mul                                                         1           1.25035     1.243022 (0.994139)     0.007328 (0.005861)     1.25035     1.25035     1.25035     0.850321
thread8::mul_grad                                                    1           0.071167    0.063903 (0.897930)     0.007264 (0.102070)     0.071167    0.071167    0.071167    0.0483983
thread8::elementwise_add                                             1           0.064491    0.061771 (0.957824)     0.002720 (0.042176)     0.064491    0.064491    0.064491    0.0438582
thread8::mean_grad                                                   1           0.039939    0.037731 (0.944716)     0.002208 (0.055284)     0.039939    0.039939    0.039939    0.0271612
thread8::elementwise_add_grad                                        1           0.037632    0.034528 (0.917517)     0.003104 (0.082483)     0.037632    0.037632    0.037632    0.0255923
thread8::eager_deletion                                              2           0.006866    0.006866 (1.000000)     0.000000 (0.000000)     0.00112     0.005746    0.003433    0.00466933
thread7::mean                                                        1           3.72905     3.721408 (0.997949)     0.007647 (0.002051)     3.72905     3.72905     3.72905     0.83005
thread7::mul                                                         1           0.672486    0.665222 (0.989198)     0.007264 (0.010802)     0.672486    0.672486    0.672486    0.149689
thread7::elementwise_add                                             1           0.084597    0.081717 (0.965956)     0.002880 (0.034044)     0.084597    0.084597    0.084597    0.0188304
thread7::eager_deletion                                              1           0.006427    0.006427 (1.000000)     0.000000 (0.000000)     0.006427    0.006427    0.006427    0.00143059
thread6::fused_all_reduce                                            1           8.69537     2.343391 (0.269499)     6.351975 (0.730501)     8.69537     8.69537     8.69537     0.733848
thread6::mul                                                         1           1.72138     1.713794 (0.995594)     0.007584 (0.004406)     1.72138     1.72138     1.72138     0.145276
thread6::mul_grad                                                    1           1.06414     1.056752 (0.993054)     0.007392 (0.006946)     1.06414     1.06414     1.06414     0.0898088
thread6::sgd                                                         5           0.196965    0.187269 (0.950773)     0.009696 (0.049227)     0.0257      0.064178    0.039393    0.0166229
thread6::elementwise_add                                             1           0.074881    0.072321 (0.965812)     0.002560 (0.034188)     0.074881    0.074881    0.074881    0.00631961
thread6::mean_grad                                                   1           0.044631    0.042423 (0.950528)     0.002208 (0.049472)     0.044631    0.044631    0.044631    0.00376665
thread6::elementwise_add_grad                                        1           0.041424    0.038000 (0.917343)     0.003424 (0.082657)     0.041424    0.041424    0.041424    0.00349599
thread6::eager_deletion                                              2           0.010209    0.010209 (1.000000)     0.000000 (0.000000)     0.002954    0.007255    0.0051045   0.000861592
thread5::Scale LossGrad                                              1           0.953103    0.951247 (0.998053)     0.001856 (0.001947)     0.953103    0.953103    0.953103    0.852902
  GpuMemcpyAsync:CPU->GPU                                            1           0.029891    0.028035 (0.937908)     0.001856 (0.062092)     0.029891    0.029891    0.029891    0.0267485
thread5::mul_grad                                                    1           0.063961    0.057369 (0.896937)     0.006592 (0.103063)     0.063961    0.063961    0.063961    0.0572367
thread5::mean_grad                                                   1           0.053382    0.051398 (0.962834)     0.001984 (0.037166)     0.053382    0.053382    0.053382    0.0477699
thread5::elementwise_add_grad                                        1           0.043243    0.040427 (0.934880)     0.002816 (0.065120)     0.043243    0.043243    0.043243    0.0386968
thread5::eager_deletion                                              2           0.003794    0.003794 (1.000000)     0.000000 (0.000000)     0.001456    0.002338    0.001897    0.00339513
thread4::Scale LossGrad                                              1           0.505603    0.503811 (0.996456)     0.001792 (0.003544)     0.505603    0.505603    0.505603    1
  GpuMemcpyAsync:CPU->GPU                                            1           0.272054    0.270262 (0.993413)     0.001792 (0.006587)     0.272054    0.272054    0.272054    0.538078
thread3::mean                                                        1           1.09743     1.094265 (0.997113)     0.003168 (0.002887)     1.09743     1.09743     1.09743     0.603191
thread3::mul                                                         1           0.622723    0.615683 (0.988695)     0.007040 (0.011305)     0.622723    0.622723    0.622723    0.342272
thread3::elementwise_add                                             1           0.097086    0.094526 (0.973632)     0.002560 (0.026368)     0.097086    0.097086    0.097086    0.0533622
thread3::eager_deletion                                              1           0.002137    0.002137 (1.000000)     0.000000 (0.000000)     0.002137    0.002137    0.002137    0.00117458
thread2::Scale LossGrad                                              1           0.749638    0.747526 (0.997183)     0.002112 (0.002817)     0.749638    0.749638    0.749638    1
  GpuMemcpyAsync:CPU->GPU                                            1           0.529066    0.526954 (0.996008)     0.002112 (0.003992)     0.529066    0.529066    0.529066    0.705762
thread1::Scale LossGrad                                              1           0.908872    0.905352 (0.996127)     0.003520 (0.003873)     0.908872    0.908872    0.908872    0.822798
  GpuMemcpyAsync:CPU->GPU                                            1           0.035895    0.032375 (0.901936)     0.003520 (0.098064)     0.035895    0.035895    0.035895    0.0324956
thread1::mul_grad                                                    1           0.074641    0.067345 (0.902252)     0.007296 (0.097748)     0.074641    0.074641    0.074641    0.0675722
thread1::mean_grad                                                   1           0.061745    0.059570 (0.964774)     0.002175 (0.035226)     0.061745    0.061745    0.061745    0.0558975
thread1::elementwise_add_grad                                        1           0.053681    0.050609 (0.942773)     0.003072 (0.057227)     0.053681    0.053681    0.053681    0.0485972
thread1::eager_deletion                                              2           0.005672    0.005672 (1.000000)     0.000000 (0.000000)     0.001551    0.004121    0.002836    0.00513484
thread0::FastThreadedSSAGraphExecutorPrepare                         1           27.1881     27.091693 (0.996455)    0.096383 (0.003545)     27.1881     27.1881     27.1881     0.973851
thread0::InitLocalVars                                               1           0.580126    0.580126 (1.000000)     0.000000 (0.000000)     0.580126    0.580126    0.580126    0.0207796
  coalesce_tensor                                                    4           0.476745    0.476745 (1.000000)     0.000000 (0.000000)     0.073257    0.221552    0.119186    0.0170765
thread0::GpuMemcpyAsync:CPU->GPU                                     4           0.136287    0.122559 (0.899271)     0.013728 (0.100729)     0.023804    0.060851    0.0340717   0.00488167
thread0::ScopeBufferedMonitor::post_local_exec_scopes_process        1           0.010934    0.010934 (1.000000)     0.000000 (0.000000)     0.010934    0.010934    0.010934    0.000391645
thread0::ScopeBufferedMonitor::pre_local_exec_scopes_process         1           0.002692    0.002692 (1.000000)     0.000000 (0.000000)     0.002692    0.002692    0.002692    9.64248e-05
share_vars_from is set, scope is ignored.

指派人
分配到
审核者
Request review from
无
里程碑
无
分配里程碑
工时统计
标识: paddlepaddle/Paddle!24611
Source branch: github/fork/wangchaochaohu/add_PE_profiler
渝ICP备2023009037号

京公网安备11010502055752号

网络110报警服务 Powered by GitLab CE v13.7
开源知识
Git 入门 Pro Git 电子书 在线学 Git
Markdown 基础入门 IT 技术知识开源图谱
帮助
使用手册 反馈建议 博客
《GitCode 隐私声明》 《GitCode 服务条款》 关于GitCode
Powered by GitLab CE v13.7