Profiling tools survey.
Created by: qingqing01
Profiling tools are important for performance tuning. Last week I did a survey of profiling tools.
- Caffe2: there are CUDA and CPU profiling operators.
  - CUDA: nvprof_op: https://github.com/caffe2/caffe2/blob/master/caffe2/contrib/prof/cuda_profile_ops.cc
  - CPU: stats_op: https://github.com/caffe2/caffe2/blob/master/caffe2/operators/stats_ops.cc However, I think this operator is not convenient for profiling across multiple mini-batches.
  - prof_net_op: https://github.com/caffe2/caffe2/blob/master/caffe2/contrib/prof/prof_dag_net.cc This operator counts the execution time of each op in the network.
- PyTorch:
  - CUDA: nvprof
    - https://github.com/pytorch/pytorch/blob/master/torch/cuda/profiler.py
    - https://github.com/pytorch/pytorch/blob/master/torch/autograd/profiler.py
  - CPU: they write their own profiling tools: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/profiler.h
- TensorFlow: they write their own specialized, complex, and flexible tools: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/profiler
For profiling a deep learning system with both Python and C++ code, the main statistics are likely the following:
- Time of each operator, and its ratio of the total execution time.
- Time of each operator across multiple mini-batches, from which the average time of each operator can be calculated.
- Time of the Python execution process.
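As a minimal sketch of how these statistics could be collected (the operator names and the timing loop below are hypothetical, not code from any of the frameworks above):

```python
import time
from collections import defaultdict

class OpProfiler:
    """Accumulates per-operator wall time across mini-batches."""
    def __init__(self):
        self.total = defaultdict(float)  # op name -> total seconds
        self.count = defaultdict(int)    # op name -> number of calls

    def record(self, name, seconds):
        self.total[name] += seconds
        self.count[name] += 1

    def report(self):
        # Per-op total time, average time per call, and ratio of the total.
        grand_total = sum(self.total.values())
        if grand_total == 0:
            grand_total = 1.0  # avoid division by zero when nothing was timed
        return {
            name: {
                "total": self.total[name],
                "avg": self.total[name] / self.count[name],
                "ratio": self.total[name] / grand_total,
            }
            for name in self.total
        }

# Hypothetical usage: time each op in every mini-batch.
profiler = OpProfiler()
for batch in range(3):
    for op in ("conv", "fc"):
        start = time.perf_counter()
        # ... run the operator here ...
        profiler.record(op, time.perf_counter() - start)

stats = profiler.report()
```

The same accumulator works for a single batch or many; the average per call is what makes multi-mini-batch numbers comparable.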
@reyoung wrote up how to use Yep + pprof for profiling; with this approach, developers do not need to add extra code. I have also added nvprof tooling to our framework.
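In the same spirit of profiling without modifying the model code, Python's built-in cProfile can wrap an existing entry point (the `forward` function below is just a placeholder for a network's forward pass):

```python
import cProfile
import io
import pstats

def forward():
    # Placeholder for a network's forward pass.
    return sum(i * i for i in range(10000))

pr = cProfile.Profile()
pr.enable()
for _ in range(5):  # e.g. five mini-batches
    forward()
pr.disable()

# Print the most expensive calls by cumulative time.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

cProfile can also be attached entirely from the command line (`python -m cProfile script.py`), which, like Yep + pprof, requires no source changes at all; it only sees the Python side, though, so C++ operator internals still need nvprof or a native profiler.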
In addition, I think the Timer in PaddlePaddle's old framework is good and convenient for timing each operator and accumulating across mini-batches.
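The idea behind such a timer can be sketched as follows (this is an illustration, not the actual PaddlePaddle code):

```python
import time

class Timer:
    """Accumulates elapsed time across start/stop intervals, so one
    timer can cover the same operator across many mini-batches."""
    def __init__(self):
        self.elapsed = 0.0
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        if self._start is not None:
            self.elapsed += time.perf_counter() - self._start
            self._start = None

    def reset(self):
        self.elapsed = 0.0
        self._start = None

# Usage: keep one Timer per operator and start/stop it every batch;
# `elapsed` then holds the cross-mini-batch total.
t = Timer()
for batch in range(3):
    t.start()
    # ... run the operator ...
    t.stop()
```

Dividing `elapsed` by the number of batches gives the per-operator average discussed above.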