提交 6dc5b34e 编写于 作者: A Abhinav Arora 提交者: Yi Wang

Polishing the cpu profiling doc (#6116)

上级 0d40a4db
This tutorial introduces techniques we used to profile and tune the
This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use Python packages
`cProfile` and `yep`, and Google `perftools`.
`cProfile` and `yep`, and Google's `perftools`.
Profiling is the process that reveals the performance bottlenecks,
Profiling is the process that reveals performance bottlenecks,
which could be very different from what's in the developers' mind.
Performance tuning is to fix the bottlenecks. Performance optimization
Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternatively.
PaddlePaddle users program AI by calling the Python API, which calls
PaddlePaddle users program AI applications by calling the Python API, which calls
into `libpaddle.so.` written in C++. In this tutorial, we focus on
the profiling and tuning of
......@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:
We can see that the most time-consuming function is the `built-in
method run`, which is a C++ function in `libpaddle.so`. We will
explain how to profile C++ code in the next section. At the right
explain how to profile C++ code in the next section. At this
moment, let's look into the third function `sync_with_cpp`, which is a
Python function. We can click it to understand more about it:
......@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
`main.py.prof`.
Please be aware of the `-v` command line option, which prints the
analysis results after generating the profiling file. By taking a
glance at the print result, we'd know that if we stripped debug
analysis results after generating the profiling file. By examining the
the print result, we'd know that if we stripped debug
information from `libpaddle.so` at build time. The following hints
help make sure that the analysis results are readable:
......@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
variable `OMP_NUM_THREADS=1` to prevents OpenMP from automatically
starting multiple threads.
### Look into the Profiling File
### Examining the Profiling File
The tool we used to look into the profiling file generated by
The tool we used to examine the profiling file generated by
`perftools` is [`pprof`](https://github.com/google/pprof), which
provides a Web-based GUI like `cprofilev`.
......@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
optimize `MomentumOp`.
`pprof` would mark performance critical parts of the program in
red. It's a good idea to follow the hint.
red. It's a good idea to follow the hints.
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册