Profiling tools survey.
Created by: qingqing01
Profiling tools are important for performance tuning. Last week I did a survey of profiling tools.
- Caffe2: there are CUDA and CPU profiling operators.
  - CUDA: nvprof_op: https://github.com/caffe2/caffe2/blob/master/caffe2/contrib/prof/cuda_profile_ops.cc
  - CPU: stats_op: https://github.com/caffe2/caffe2/blob/master/caffe2/operators/stats_ops.cc However, I think this operator is not convenient for profiling across multiple mini-batches.
  - prof_net_op: https://github.com/caffe2/caffe2/blob/master/caffe2/contrib/prof/prof_dag_net.cc This operator counts the execution time of each op in the network.
- PyTorch:
  - CUDA: nvprof
    - https://github.com/pytorch/pytorch/blob/master/torch/cuda/profiler.py
    - https://github.com/pytorch/pytorch/blob/master/torch/autograd/profiler.py
  - CPU: they write their own profiling tools: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/profiler.h
- TensorFlow: they write their own specialized, complex, and flexible tools: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/profiler
For profiling a deep learning system with both Python and C++ code, the main statistics are likely the following:
- Time of each operator, and its ratio of the total execution time.
- Time of each operator across multiple mini-batches, from which the average time of each operator can be calculated.
- Time of the Python execution process.
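As a minimal sketch of how these statistics could be collected (the operator names and the timing loop below are hypothetical, not code from any of the frameworks above):

```python
import time
from collections import defaultdict

class OpProfiler:
    """Accumulates per-operator wall time across mini-batches."""
    def __init__(self):
        self.total = defaultdict(float)  # op name -> total seconds
        self.count = defaultdict(int)    # op name -> number of calls

    def record(self, name, seconds):
        self.total[name] += seconds
        self.count[name] += 1

    def report(self):
        # Per-op total time, average time per call, and ratio of the total.
        grand_total = sum(self.total.values())
        if grand_total == 0:
            grand_total = 1.0  # avoid division by zero when nothing was timed
        return {
            name: {
                "total": self.total[name],
                "avg": self.total[name] / self.count[name],
                "ratio": self.total[name] / grand_total,
            }
            for name in self.total
        }

# Hypothetical usage: time each op in every mini-batch.
profiler = OpProfiler()
for batch in range(3):
    for op in ("conv", "fc"):
        start = time.perf_counter()
        # ... run the operator here ...
        profiler.record(op, time.perf_counter() - start)

stats = profiler.report()
```

The same accumulator works for a single batch or many; the average per call is what makes multi-mini-batch numbers comparable.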
@reyoung wrote up how to use Yep + pprof for profiling; with this approach, developers do not need to add extra code. I have also added nvprof tooling to our framework.
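In the same spirit of profiling without modifying the model code, Python's built-in cProfile can wrap an existing entry point (the `forward` function below is just a placeholder for a network's forward pass):

```python
import cProfile
import io
import pstats

def forward():
    # Placeholder for a network's forward pass.
    return sum(i * i for i in range(10000))

pr = cProfile.Profile()
pr.enable()
for _ in range(5):  # e.g. five mini-batches
    forward()
pr.disable()

# Print the most expensive calls by cumulative time.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

cProfile can also be attached entirely from the command line (`python -m cProfile script.py`), which, like Yep + pprof, requires no source changes at all; it only sees the Python side, though, so C++ operator internals still need nvprof or a native profiler.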
In addition, I think the Timer in PaddlePaddle's old framework is good and convenient for timing each operator and accumulating across mini-batches.
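The idea behind such a timer can be sketched as follows (this is an illustration, not the actual PaddlePaddle code):

```python
import time

class Timer:
    """Accumulates elapsed time across start/stop intervals, so one
    timer can cover the same operator across many mini-batches."""
    def __init__(self):
        self.elapsed = 0.0
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        if self._start is not None:
            self.elapsed += time.perf_counter() - self._start
            self._start = None

    def reset(self):
        self.elapsed = 0.0
        self._start = None

# Usage: keep one Timer per operator and start/stop it every batch;
# `elapsed` then holds the cross-mini-batch total.
t = Timer()
for batch in range(3):
    t.start()
    # ... run the operator ...
    t.stop()
```

Dividing `elapsed` by the number of batches gives the per-operator average discussed above.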