Skip to content

  • 体验新版
    • 正在加载...
  • 登录
  • PaddlePaddle
  • Paddle
  • 合并请求
  • !22953

P
Paddle
  • 项目概览

PaddlePaddle / Paddle
大约 2 年 前同步成功

通知 2325
Star 20933
Fork 5424
  • 代码
    • 文件
    • 提交
    • 分支
    • Tags
    • 贡献者
    • 分支图
    • Diff
  • Issue 1423
    • 列表
    • 看板
    • 标记
    • 里程碑
  • 合并请求 543
  • Wiki 0
    • Wiki
  • 分析
    • 仓库
    • DevOps
  • 项目成员
  • Pages
P
Paddle
  • 项目概览
    • 项目概览
    • 详情
    • 发布
  • 仓库
    • 仓库
    • 文件
    • 提交
    • 分支
    • 标签
    • 贡献者
    • 分支图
    • 比较
  • Issue 1,423
    • Issue 1,423
    • 列表
    • 看板
    • 标记
    • 里程碑
  • 合并请求 543
    • 合并请求 543
  • Pages
  • 分析
    • 分析
    • 仓库分析
    • DevOps
  • Wiki 0
    • Wiki
  • 成员
    • 成员
  • 收起侧边栏
  • 动态
  • 分支图
  • 创建新Issue
  • 提交
  • Issue看板

Add RecordEvent for dygraph to enable profiling. !22953

  • Report abuse
!22953 开放中 3月 11, 2020 由 saxon_zh@saxon_zh 创建
#<User:0x00007ff7c8019718>
  • 概览 0
  • 提交 6
  • 变更 3

Created by: Xreki

给动态图添加一些RecordEvent,以使用profiler计时。主要添加了TraceOp定义、OpBase run时间。

Profiler结果

  • tracer_option设置为"Default"时
-------------------------       Event Summary       -------------------------

Event                                             Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
thread0::trace::elementwise_mul                   295021      20699.9     20163.544405 (0.974088) 536.377183 (0.025912)   0.049803    19.7763     0.0701642   0.124442
  elementwise_mul                                 295021      15792       15255.596223 (0.966035) 536.377183 (0.033965)   0.037162    19.7454     0.0535283   0.094937
thread0::trace::sigmoid                           278760      17430.3     16952.911439 (0.972612) 477.388863 (0.027388)   0.044703    54.5046     0.062528    0.104786
  sigmoid                                         278760      13471.3     12993.881955 (0.964562) 477.388863 (0.035438)   0.033885    19.3493     0.0483257   0.0809856
thread0::trace::elementwise_add                   188163      15421       15008.685350 (0.973260) 412.360942 (0.026740)   0.051291    20.3084     0.0819558   0.0927071
  elementwise_add                                 188163      11770.5     11358.157335 (0.964967) 412.360942 (0.035033)   0.036847    20.2535     0.0625549   0.0707612
thread0::trace::tanh                              185840      11365.5     11057.260118 (0.972881) 308.225771 (0.027119)   0.045966    45.9233     0.0611574   0.0683262
  tanh                                            185840      8775.66     8467.431791 (0.964877)  308.225771 (0.035123)   0.034759    45.8629     0.0472216   0.0527569
thread0::matmul_grad                              95243       11283.3     7796.735016 (0.691000)  3486.524817 (0.309000)  0.081368    105.813     0.118468    0.0678319
thread0::trace::matmul                            95243       10924       9377.359161 (0.858419)  1546.629708 (0.141581)  0.088672    19.7287     0.114696    0.065672
  matmul                                          95243       9146.79     7600.156775 (0.830910)  1546.629708 (0.169090)  0.073666    19.7017     0.0960363   0.054988
thread0::elementwise_mul_grad                     278760      10797.3     10208.462237 (0.945468) 588.792533 (0.054532)   0.028732    21.4931     0.0387332   0.0649102
thread0::sigmoid_grad                             278760      9968.17     9533.539719 (0.956398)  434.632331 (0.043602)   0.027511    24.2642     0.035759    0.0599259
thread0::trace::concat                            102212      8197.7      7971.810544 (0.972445)  225.887403 (0.027555)   0.048438    23.9326     0.0802029   0.0492823
  concat                                          102212      6336.53     6110.640915 (0.964352)  225.887403 (0.035648)   0.036144    22.4009     0.061994    0.0380935
    GpuMemcpyAsync:CPU->GPU                       2323        99.5479     95.156308 (0.955885)    4.391546 (0.044115)     0.026608    10.0145     0.0428531   0.000598455
    GpuMemcpyAsync(same_gpu):GPU->GPU             16261       362.299     340.398368 (0.939550)   21.901080 (0.060450)    0.014118    14.8761     0.0222803   0.00217805
thread0::trace::split                             92920       7483.12     7285.324363 (0.973568)  197.797091 (0.026432)   0.056189    21.1272     0.0805329   0.0449865
  split                                           92920       5841.18     5643.387099 (0.966138)  197.797091 (0.033862)   0.0423      21.0755     0.0628625   0.0351156
thread0::elementwise_add_grad                     188163      7328.05     6867.405289 (0.937140)  460.642014 (0.062860)   0.027718    35.002      0.0389452   0.0440542
thread0::tanh_grad                                185840      6553.21     6268.124259 (0.956497)  285.087087 (0.043503)   0.027526    18.5929     0.0352627   0.0393961
thread0::trace::uniform_random                    7           4619.67     4619.022944 (0.999861)  0.642942 (0.000139)     0.109459    4617.29     659.952     0.0277722
  uniform_random                                  7           4619.57     4618.923239 (0.999861)  0.642942 (0.000139)     0.100064    4617.25     659.938     0.0277716
thread0::trace::slice                             55752       4332.32     4200.342893 (0.969536)  131.982003 (0.030464)   0.042287    18.5202     0.0777071   0.0260448
  slice                                           55752       3344.56     3212.578064 (0.960538)  131.982003 (0.039462)   0.037432    18.4722     0.05999     0.0201066
thread0::concat_grad                              95243       4141.89     3928.251315 (0.948420)  213.638211 (0.051580)   0.030779    16.5058     0.0434876   0.0248999
  GpuMemcpyAsync:CPU->GPU                         2323        56.4214     51.910886 (0.920057)    4.510519 (0.079943)     0.015668    9.19188     0.0242882   0.00033919
thread0::concat                                   92920       4021.44     3811.756727 (0.947860)  209.679474 (0.052140)   0.030169    18.4992     0.0432785   0.0241758
thread0::slice_grad                               46460       2005.84     1879.338492 (0.936931)  126.506328 (0.063069)   0.033499    18.46       0.0431736   0.0120586
thread0::trace::reduce_sum                        20907       1887.86     1767.050701 (0.936008)  120.807216 (0.063992)   0.048801    16.5262     0.0902979   0.0113493
  reduce_sum                                      20907       1723.07     1602.262647 (0.929888)  120.807216 (0.070112)   0.043879    16.52       0.0824159   0.0103586
thread0::trace::reshape2                          16261       1459.65     1432.694261 (0.981531)  26.958985 (0.018469)    0.053817    18.4844     0.0897641   0.00877504
  reshape2                                        16261       1221.98     1195.020813 (0.977938)  26.958985 (0.022062)    0.046352    18.4596     0.0751479   0.00734621
    GpuMemcpyAsync(same_gpu):GPU->GPU             16261       670.593     643.633849 (0.959798)   26.958985 (0.040202)    0.02053     18.4275     0.0412393   0.00403142
  • tracer_option设置为"OpDetail"时
-------------------------       Event Summary       -------------------------

Event                                                       Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
thread0::trace::elementwise_mul                             295021      22224       21689.400541 (0.975943) 534.647137 (0.024057)   0.051376    47.4034     0.0753304   0.12384
  elementwise_mul                                           295021      16866.1     16331.465778 (0.968301) 534.647137 (0.031699)   0.038085    47.3347     0.0571692   0.093984
    elementwise_mul/mutable_var                             295021      815.767     815.766571 (1.000000)   0.000000 (0.000000)     0.001909    15.3072     0.00276511  0.00454574
    elementwise_mul/infer_shape                             295021      1545.64     1545.637178 (1.000000)  0.000000 (0.000000)     0.003805    13.0704     0.00523908  0.00861284
    elementwise_mul/compute                                 295021      9231.31     8696.660879 (0.942083)  534.647137 (0.057917)   0.018881    47.3015     0.0312903   0.0514401
thread0::trace::sigmoid                                     278760      18867.9     18390.807995 (0.974717) 477.043869 (0.025283)   0.045942    45.1106     0.0676849   0.105138
  sigmoid                                                   278760      14497.2     14020.118776 (0.967094) 477.043869 (0.032906)   0.034785    45.0921     0.0520059   0.0807833
    sigmoid/mutable_var                                     278760      728.007     728.007330 (1.000000)   0.000000 (0.000000)     0.00185     12.3596     0.00261159  0.00405672
    sigmoid/infer_shape                                     278760      818.173     818.173317 (1.000000)   0.000000 (0.000000)     0.002195    4.44803     0.00293505  0.00455915
    sigmoid/compute                                         278760      8384.28     7907.235240 (0.943103)  477.043869 (0.056897)   0.018679    23.8468     0.0300771   0.0467202
thread0::trace::elementwise_add                             188163      16498.3     16085.303700 (0.974968) 412.989688 (0.025032)   0.054022    25.675      0.0876809   0.0919343
  elementwise_add                                           188163      12450.2     12037.222237 (0.966829) 412.989688 (0.033171)   0.038458    25.6021     0.0661672   0.069377
    elementwise_add/mutable_var                             188163      537.545     537.545205 (1.000000)   0.000000 (0.000000)     0.001911    15.0784     0.00285681  0.00299539
    elementwise_add/infer_shape                             188163      1129.68     1129.684713 (1.000000)  0.000000 (0.000000)     0.003661    17.2693     0.00600376  0.006295
    elementwise_add/compute                                 188163      7323.43     6910.438996 (0.943607)  412.989688 (0.056393)   0.019223    24.7346     0.0389207   0.0408087
thread0::trace::tanh                                        185840      12511.6     12203.161753 (0.975351) 308.401579 (0.024649)   0.047614    48.9886     0.0673244   0.0697189
  tanh                                                      185840      9548.18     9239.782378 (0.967700)  308.401579 (0.032300)   0.036118    48.9272     0.0513785   0.0532059
    tanh/mutable_var                                        185840      449.617     449.617061 (1.000000)   0.000000 (0.000000)     0.001855    4.49429     0.00241938  0.00250543
    tanh/infer_shape                                        185840      564.351     564.351379 (1.000000)   0.000000 (0.000000)     0.002182    11.0229     0.00303676  0.00314477
    tanh/compute                                            185840      5705.06     5396.660562 (0.945942)  308.401579 (0.054058)   0.019368    48.8987     0.0306988   0.0317906
thread0::matmul_grad                                        95243       12020.6     8535.450835 (0.710066)  3485.196014 (0.289934)  0.082694    91.1439     0.12621     0.0669833
  thread0::matmul_grad/mutable_var                          95243       245.919     245.919410 (1.000000)   0.000000 (0.000000)     0.001893    2.13686     0.00258202  0.00137035
  thread0::matmul_grad/infer_shape                          95243       448.776     448.775658 (1.000000)   0.000000 (0.000000)     0.003282    21.5141     0.0047119   0.00250074
  thread0::matmul_grad/compute                              95243       10316.5     6831.259899 (0.662171)  3485.196014 (0.337829)  0.068233    91.1095     0.108317    0.057487
thread0::trace::matmul                                      95243       11746.7     10191.921382 (0.867640) 1554.793710 (0.132360)  0.090045    22.4605     0.123334    0.0654569
  matmul                                                    95243       9798.42     8243.627965 (0.841322)  1554.793710 (0.158678)  0.075643    22.3954     0.102878    0.0546003
    matmul/mutable_var                                      95243       276.706     276.705942 (1.000000)   0.000000 (0.000000)     0.001888    5.48664     0.00290526  0.0015419
    matmul/infer_shape                                      95243       608.525     608.525453 (1.000000)   0.000000 (0.000000)     0.004591    11.1404     0.00638919  0.00339092
    matmul/compute                                          95243       7764.74     6209.941411 (0.799762)  1554.793710 (0.200238)  0.058763    22.3688     0.0815255   0.0432679
thread0::elementwise_mul_grad                               278760      11402.4     10812.645272 (0.948278) 589.760714 (0.051722)   0.028939    33.1241     0.040904    0.0635383
  thread0::elementwise_mul_grad/mutable_var                 278760      693.415     693.414540 (1.000000)   0.000000 (0.000000)     0.001777    12.6198     0.0024875   0.00386395
  thread0::elementwise_mul_grad/infer_shape                 278760      1130.91     1130.914327 (1.000000)  0.000000 (0.000000)     0.002773    12.5793     0.00405695  0.00630186
  thread0::elementwise_mul_grad/compute                     278760      6759.91     6170.152945 (0.912756)  589.760714 (0.087244)   0.016099    33.0981     0.0242499   0.0376686
thread0::sigmoid_grad                                       278760      10411.2     9976.661146 (0.958259)  434.573065 (0.041741)   0.027713    18.5712     0.0373484   0.0580151
  thread0::sigmoid_grad/mutable_var                         278760      675.252     675.251732 (1.000000)   0.000000 (0.000000)     0.001856    9.11445     0.00242234  0.00376274
  thread0::sigmoid_grad/infer_shape                         278760      826.263     826.262670 (1.000000)   0.000000 (0.000000)     0.002369    3.83187     0.00296406  0.00460423
  thread0::sigmoid_grad/compute                             278760      6145.62     5711.049729 (0.929287)  434.573065 (0.070713)   0.015307    18.5486     0.0220463   0.0342456

RecordEvent的开销

基于该PR,关闭profiler功能之后,ptb模型的耗时,因为机器多人共用,会有一些波动。

One epoch cost 93.4636731148
指派人
分配到
审核者
Request review from
无
里程碑
无
分配里程碑
工时统计
标识: paddlepaddle/Paddle!22953
Source branch: github/fork/Xreki/profiler/dygraph
渝ICP备2023009037号

京公网安备11010502055752号

网络110报警服务 Powered by GitLab CE v13.7
开源知识
Git 入门 Pro Git 电子书 在线学 Git
Markdown 基础入门 IT 技术知识开源图谱
帮助
使用手册 反馈建议 博客
《GitCode 隐私声明》 《GitCode 服务条款》 关于GitCode
Powered by GitLab CE v13.7