diff --git a/doc/index.md b/doc/index.md
index a4dffb0405a6b23c88473307a1d199e3caaadf55..9c5ebba9bed7672764c1b689dc7948c71c2d116d 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -17,6 +17,7 @@ Development Guide
 * [Layer Documents](layer.md)
 * [Writing New Layers](dev/new_layer/index.rst)
 * [Source Code Documents](source/index.md)
+* [GPU Profiling Documents](optimization/index.rst)
 
 Algorithm Tutorial
 ------------------
diff --git a/doc/optimization/gpu_profiling.rst b/doc/optimization/gpu_profiling.rst
new file mode 100644
index 0000000000000000000000000000000000000000..583c2d6caee460331aea366d4d4f65be81e553b0
--- /dev/null
+++ b/doc/optimization/gpu_profiling.rst
@@ -0,0 +1,77 @@
+GPU Profiling
+=============
+
+This tutorial guides you step by step through profiling and performance tuning with :code:`nvprof` and :code:`nvvp`.
+
+- What is profiling?
+- Why do we need profiling?
+- How to do profiling?
+- Profiler tools
+- Hands-on approach
+
+What is profiling?
+==================
+In software engineering, profiling is a form of dynamic program analysis that measures the space (memory) or time
+complexity of a program, the usage of particular instructions, or the frequency and duration of function calls.
+Most commonly, profiling information serves to aid program optimization.
+
+In short, a profiler measures application performance. Program analysis tools are extremely important for
+understanding program behavior. Simple profiling tells you how long an operation takes; advanced
+profiling can explain why an operation takes a long time.
+
+Why do we need profiling?
+=========================
+Because training a deep neural network typically takes a very long time, performance is becoming
+an increasingly important concern in the deep learning field. The first step in improving performance is to understand which parts
+are slow. There is no point in optimizing a region that does not take much time!
+
+
+How to do profiling?
+====================
+To achieve maximum performance, follow these five steps:
+
+- Profile the code
+- Find the slow parts
+- Work out why they are slow
+- Make them fast
+- Profile the code again
+
+A processor usually has two key performance limits: floating-point throughput and
+memory throughput. A GPU also needs a high degree of parallelism to reach its potential;
+exploiting that parallelism is what makes GPUs so fast.
+
+Profiler Tools
+==============
+For general GPU profiling, a number of tools are available from both NVIDIA and third parties.
+
+:code:`nvprof` is the NVIDIA command-line profiler and :code:`nvvp` is the GUI-based NVIDIA Visual Profiler.
+In this tutorial, we focus on :code:`nvprof` and :code:`nvvp`.
+
+The :code:`test_GpuProfiler` test in the :code:`paddle/math/tests` directory is used to evaluate
+the profilers above.
+
+.. code-block:: c++
+
+    TEST(Profiler, BilinearFwdBwd) {
+        hl_profiler_start();  // begin the profiled region
+        auto numSamples = 10;
+        auto channels = 16;
+        auto imgSize = 64;
+        testBilinearFwdBwd(numSamples, imgSize, imgSize, channels);
+        hl_profiler_end();  // end the profiled region
+    }
+
+:code:`hl_profiler_start` and :code:`hl_profiler_end` can be used to profile only the regions of interest
+in PaddlePaddle. They wrap :code:`cudaProfilerStart` and :code:`cudaProfilerStop`,
+respectively, so that the program does not crash when the CPU version of PaddlePaddle invokes them.
+
+Hands-on Approach
+=================
+
+.. image:: nvprof.png
+    :align: center
+    :scale: 30%
+
+.. image:: nvvp1.png
+    :align: center
+    :scale: 30%
\ No newline at end of file
diff --git a/doc/optimization/index.rst b/doc/optimization/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..c9e87e0778dfe44fa3d1bb84d0ad340aa6f25d08
--- /dev/null
+++ b/doc/optimization/index.rst
@@ -0,0 +1,7 @@
+Performance Tuning
+==================
+
+.. toctree::
+  :maxdepth: 3
+
+  gpu_profiling.rst
diff --git a/doc/optimization/nvprof.png b/doc/optimization/nvprof.png
new file mode 100644
index 0000000000000000000000000000000000000000..5931a9b7dc43e6438c9c2105020f59eb3367f0d9
Binary files /dev/null and b/doc/optimization/nvprof.png differ
diff --git a/doc/optimization/nvvp1.png b/doc/optimization/nvvp1.png
new file mode 100644
index 0000000000000000000000000000000000000000..1af23ac3c52929b2b0645d2f9fa4d4c6db1f6e77
Binary files /dev/null and b/doc/optimization/nvvp1.png differ