Feature/memory profiler !7983

!7983 · Closed · Jan 30, 2018 · created by saxon_zh@saxon_zh
  • Overview 16
  • Commits 9
  • Changes 5

Created by: dzhwinter

Hack the memory profiler into place for memory debugging and benchmarking. While doing this work, I found that the profiler's structure needs to be refactored to make it more readable.

Use with profiler.profiler('CPU', 'total') as prof to wrap the code being profiled; profiler.reset_profiler() can be used to clear the previously collected records.

A simple usage is as follows:

import numpy as np
import paddle.fluid as fluid              # paddle.v2.fluid in releases contemporary with this PR
import paddle.fluid.profiler as profiler

image = fluid.layers.data(name='x', shape=[784], dtype='float32')
# ... (the omitted part defines the label layer 'y', the network output 'predict',
#      and the per-sample 'cost' used below)
avg_cost = fluid.layers.mean(x=cost)
optimizer = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9)
opts = optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=predict, label=label)

place = fluid.CPUPlace()  # or fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
accuracy.reset(exe)

# Profile the training loop on CPU, with the report sorted by total time.
with profiler.profiler('CPU', 'total') as prof:
    for iter in range(10):
        if iter == 2:
            # Drop the records of the first two warm-up iterations.
            profiler.reset_profiler()
        x = np.random.random((32, 784)).astype("float32")
        y = np.random.randint(0, 10, (32, 1)).astype("int64")

        outs = exe.run(fluid.default_main_program(),
                       feed={'x': x, 'y': y},
                       fetch_list=[avg_cost] + accuracy.metrics)
        acc = np.array(outs[1])
        pass_acc = accuracy.eval(exe)
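
The with-statement interface follows the familiar enable/collect/report pattern: recording starts when the block is entered and the report is printed, sorted by the requested key, when it exits. For intuition only, below is a rough, Paddle-independent analogue of that pattern built on Python's tracemalloc, which tracks Python-level heap allocations. It is not the implementation in this MR, just an illustration of the interface shape; memory_block and its label argument are hypothetical names.

import tracemalloc
from contextlib import contextmanager

@contextmanager
def memory_block(label):
    # Trace Python heap allocations for the duration of the block.
    tracemalloc.start()
    try:
        yield
    finally:
        current, peak = tracemalloc.get_traced_memory()  # both in bytes
        tracemalloc.stop()
        print("%s: current=%.2f MB, peak=%.2f MB"
              % (label, current / 2.0 ** 20, peak / 2.0 ** 20))

with memory_block("toy workload"):
    buffers = [bytearray(1 << 20) for _ in range(32)]    # roughly 32 MB at the peak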

Please see https://github.com/dzhwinter/benchmark/pull/80 for a demo of the usage. A sample profiling report looks like this:

-----------  Configuration Arguments -----------
batch_size: 16
data_format: NCHW
data_set: flowers
device: GPU
learning_rate: 0.001
pass_num: 1
------------------------------------------------

------------------------->     Profiling Report     <-------------------------

Place: CUDA Total Time:0ms  Total Memory:9065.64MB  Sorted by total time in descending order in the same thread

Event                            Calls       Total       Min.        Max.        Ave.        Total Memory. Min Memory. Max Memory. Ave Memory.
thread0::elementwise_add_grad    16          206.254     0.037888    63.5791     12.8909     4648.4      0.00683594  196.001     290.525
thread0::conv2d_grad             13          112.87      2.88768     19.3096     8.68234     4727.98     0.00683594  196.141     363.691
thread0::dropout                 10          77.7945     0.043072    35.1161     7.77945     1121.58     0.0629883   392         112.158
thread0::conv2d                  13          55.5693     0.971776    12.2081     4.27456     337.579     6.12524     196         25.9676
thread0::batch_norm_grad         14          14.7364     0.074752    3.76525     1.0526      4649.8      0.0358887   196.001     332.129
thread0::batch_norm              14          13.8792     0.083968    4.15642     0.991369    729.579     0.0358887   196.001     52.1128
thread0::relu_grad               14          8.91693     0.027648    2.32448     0.636923    4649.76     0.0314941   196         332.126
thread0::elementwise_add         16          7.9863      0.027648    2.15555     0.499144    533.579     0.00634766  196         33.3487
thread0::relu                    14          7.06045     0.022528    1.92        0.504318    925.581     0.0314941   196         66.1129
thread0::pool2d_grad             5           5.06368     0.152576    2.57126     1.01274     4703.47     6.12524     196         940.693
thread0::dropout_grad            10          4.85581     0.034816    2.26202     0.485581    4649.73     0.0314941   196         464.973
thread0::adam                    60          3.85926     0.026624    0.975872    0.0643211   9065.61     0           0           151.094
thread0::fill_zeros_like         66          2.56307     0.017408    0.636928    0.0388344   4649.7      0.000488281 196         70.45
thread0::pool2d                  5           1.72131     0.079872    0.797664    0.344262    2297.58     1.53149     49.0002     459.517
thread0::elementwise_mul         60          1.66064     0.0256      0.048128    0.0276773   9065.61     0.000244141 0.000244141 151.094
thread0::fill_constant           61          0.998304    0.014336    0.035744    0.0163656   4648.38     0.000244141 0.000244141 76.203
thread0::mul_grad                3           0.518048    0.067584    0.36352     0.172683    4648.4      0.230957    50.5317     1549.47
thread0::mul                     3           0.433088    0.041984    0.344       0.144363    4648.1      0.00634766  0.0314941   1549.37
thread0::fetch                   2           0.275616    0.027136    0.24848     0.137808    9065.64     0           0           4532.82
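
One way to read the report (an observation from the numbers in this particular run, not documented output semantics): the Ave. and Ave Memory. columns appear to be the corresponding Total columns divided by Calls. A quick check against the thread0::elementwise_add_grad row:

# Worked check against the first data row of the report above.
calls = 16
total_time_ms = 206.254     # "Total" column (milliseconds)
total_memory_mb = 4648.4    # "Total Memory." column (MB)

print(total_time_ms / calls)    # 12.8909 -> matches the Ave. column
print(total_memory_mb / calls)  # 290.525 -> matches the Ave Memory. column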
Reference: paddlepaddle/Paddle!7983
Source branch: github/fork/dzhwinter/feature/memory_profile