Created by: dzhwinter

This change hacks the memory statistics into the profiler for memory debugging and benchmarking. While doing this work, I found that the profiler's output structure needed reworking to be readable.

Wrap the code to be profiled with `with profiler.profiler('CPU', 'total') as prof:`. Call `profiler.reset_profiler()` to clear all previously collected records.

A simple usage is as follows:
```python
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler

image = fluid.layers.data(name='x', shape=[784], dtype='float32')
# ...
avg_cost = fluid.layers.mean(x=cost)
optimizer = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9)
opts = optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=predict, label=label)

place = fluid.CPUPlace()  # or fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
accuracy.reset(exe)

with profiler.profiler('CPU', 'total') as prof:
    for iter in range(10):
        if iter == 2:
            profiler.reset_profiler()
        x = np.random.random((32, 784)).astype("float32")
        y = np.random.randint(0, 10, (32, 1)).astype("int64")
        outs = exe.run(fluid.default_main_program(),
                       feed={'x': x, 'y': y},
                       fetch_list=[avg_cost] + accuracy.metrics)
        acc = np.array(outs[1])
        pass_acc = accuracy.eval(exe)
```
Please see https://github.com/dzhwinter/benchmark/pull/80 for a demo of this usage.
```
----------- Configuration Arguments -----------
batch_size: 16
data_format: NCHW
data_set: flowers
device: GPU
learning_rate: 0.001
pass_num: 1
------------------------------------------------

-------------------------> Profiling Report <-------------------------

Place: CUDA  Total Time: 0ms  Total Memory: 9065.64MB  Sorted by total time in descending order in the same thread

Event                          Calls  Total     Min.      Max.      Ave.       Total Memory.  Min Memory.  Max Memory.  Ave Memory.
thread0::elementwise_add_grad  16     206.254   0.037888  63.5791   12.8909    4648.4         0.00683594   196.001      290.525
thread0::conv2d_grad           13     112.87    2.88768   19.3096   8.68234    4727.98        0.00683594   196.141      363.691
thread0::dropout               10     77.7945   0.043072  35.1161   7.77945    1121.58        0.0629883    392          112.158
thread0::conv2d                13     55.5693   0.971776  12.2081   4.27456    337.579        6.12524      196          25.9676
thread0::batch_norm_grad       14     14.7364   0.074752  3.76525   1.0526     4649.8         0.0358887    196.001      332.129
thread0::batch_norm            14     13.8792   0.083968  4.15642   0.991369   729.579        0.0358887    196.001      52.1128
thread0::relu_grad             14     8.91693   0.027648  2.32448   0.636923   4649.76        0.0314941    196          332.126
thread0::elementwise_add       16     7.9863    0.027648  2.15555   0.499144   533.579        0.00634766   196          33.3487
thread0::relu                  14     7.06045   0.022528  1.92      0.504318   925.581        0.0314941    196          66.1129
thread0::pool2d_grad           5      5.06368   0.152576  2.57126   1.01274    4703.47        6.12524      196          940.693
thread0::dropout_grad          10     4.85581   0.034816  2.26202   0.485581   4649.73        0.0314941    196          464.973
thread0::adam                  60     3.85926   0.026624  0.975872  0.0643211  9065.61        0            0            151.094
thread0::fill_zeros_like       66     2.56307   0.017408  0.636928  0.0388344  4649.7         0.000488281  196          70.45
thread0::pool2d                5      1.72131   0.079872  0.797664  0.344262   2297.58        1.53149      49.0002      459.517
thread0::elementwise_mul       60     1.66064   0.0256    0.048128  0.0276773  9065.61        0.000244141  0.000244141  151.094
thread0::fill_constant         61     0.998304  0.014336  0.035744  0.0163656  4648.38        0.000244141  0.000244141  76.203
thread0::mul_grad              3      0.518048  0.067584  0.36352   0.172683   4648.4         0.230957     50.5317      1549.47
thread0::mul                   3      0.433088  0.041984  0.344     0.144363   4648.1         0.00634766   0.0314941    1549.37
thread0::fetch                 2      0.275616  0.027136  0.24848   0.137808   9065.64        0            0            4532.82
```
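Each data row pairs timing columns (Calls, Total, Min., Max., Ave., in ms) with memory columns (in MB). A small sketch of pulling the fields out of one such row (the field names are my own shorthand for the header columns above, not part of the profiler's output):

```python
# Parse one whitespace-separated report row into named fields.
FIELDS = ["event", "calls", "total_ms", "min_ms", "max_ms", "ave_ms",
          "total_mb", "min_mb", "max_mb", "ave_mb"]

def parse_row(line):
    parts = line.split()
    row = {"event": parts[0]}
    for name, value in zip(FIELDS[1:], parts[1:]):
        row[name] = float(value)
    return row

row = parse_row("thread0::elementwise_add_grad 16 206.254 0.037888 "
                "63.5791 12.8909 4648.4 0.00683594 196.001 290.525")
print(row["calls"], row["ave_ms"])  # 16.0 12.8909
```

As a sanity check on the columns, Total divided by Calls reproduces Ave. (206.254 / 16 ≈ 12.8909).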