diff --git a/doc/howto/optimization/cpu_profiling.md b/doc/howto/optimization/cpu_profiling.md
index e1d91c668e9c6711cc3a529168c8a3f6338de59d..1775374cf6e518586c28bbd8e04946c74df7e4c5 100644
--- a/doc/howto/optimization/cpu_profiling.md
+++ b/doc/howto/optimization/cpu_profiling.md
@@ -1,13 +1,13 @@
-This tutorial introduces techniques we used to profile and tune the
+This tutorial introduces techniques we use to profile and tune the
 CPU performance of PaddlePaddle. We will use Python packages
-`cProfile` and `yep`, and Google `perftools`.
+`cProfile` and `yep`, and Google's `perftools`.
 
-Profiling is the process that reveals the performance bottlenecks,
+Profiling is the process that reveals performance bottlenecks,
 which could be very different from what's in the developers' mind.
-Performance tuning is to fix the bottlenecks. Performance optimization
+Performance tuning is done to fix these bottlenecks. Performance optimization
 repeats the steps of profiling and tuning alternatively.
 
-PaddlePaddle users program AI by calling the Python API, which calls
+PaddlePaddle users program AI applications by calling the Python API, which calls
 into `libpaddle.so.` written in C++. In this tutorial, we focus on
 the profiling and tuning of
 
@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:
 
 We can see that the most time-consuming function is the `built-in
 method run`, which is a C++ function in `libpaddle.so`. We will
-explain how to profile C++ code in the next section. At the right
+explain how to profile C++ code in the next section. At this
 moment, let's look into the third function `sync_with_cpp`, which is a
 Python function. We can click it to understand more about it:
 
@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
 `main.py.prof`.
 
 Please be aware of the `-v` command line option, which prints the
-analysis results after generating the profiling file. By taking a
-glance at the print result, we'd know that if we stripped debug
+analysis results after generating the profiling file. By examining the
+ the print result, we'd know that if we stripped debug
 information from `libpaddle.so` at build time. The following hints
 help make sure that the analysis results are readable:
 
@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
 variable `OMP_NUM_THREADS=1` to prevents OpenMP from automatically
 starting multiple threads.
 
-### Look into the Profiling File
+### Examining the Profiling File
 
-The tool we used to look into the profiling file generated by
+The tool we used to examine the profiling file generated by
 `perftools` is [`pprof`](https://github.com/google/pprof), which
 provides a Web-based GUI like `cprofilev`.
 
@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
 optimize `MomentumOp`.
 
 `pprof` would mark performance critical parts of the program in
-red. It's a good idea to follow the hint.
+red. It's a good idea to follow the hints.