Commit 00b17acd, authored by Travis CI

Deploy to GitHub Pages: 6dc5b34e

Parent dac32632
This tutorial introduces techniques we used to profile and tune the This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use Python packages CPU performance of PaddlePaddle. We will use Python packages
`cProfile` and `yep`, and Google `perftools`. `cProfile` and `yep`, and Google's `perftools`.
Profiling is the process that reveals the performance bottlenecks, Profiling is the process that reveals performance bottlenecks,
which could be very different from what's in the developers' mind. which could be very different from what's in the developers' mind.
Performance tuning is to fix the bottlenecks. Performance optimization Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternatively. repeats the steps of profiling and tuning alternatively.
PaddlePaddle users program AI by calling the Python API, which calls PaddlePaddle users program AI applications by calling the Python API, which calls
into `libpaddle.so.` written in C++. In this tutorial, we focus on into `libpaddle.so.` written in C++. In this tutorial, we focus on
the profiling and tuning of the profiling and tuning of
...@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime: ...@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:
We can see that the most time-consuming function is the `built-in We can see that the most time-consuming function is the `built-in
method run`, which is a C++ function in `libpaddle.so`. We will method run`, which is a C++ function in `libpaddle.so`. We will
explain how to profile C++ code in the next section. At the right explain how to profile C++ code in the next section. At this
moment, let's look into the third function `sync_with_cpp`, which is a moment, let's look into the third function `sync_with_cpp`, which is a
Python function. We can click it to understand more about it: Python function. We can click it to understand more about it:
...@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is ...@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
`main.py.prof`. `main.py.prof`.
Please be aware of the `-v` command line option, which prints the Please be aware of the `-v` command line option, which prints the
analysis results after generating the profiling file. By taking a analysis results after generating the profiling file. By examining the
glance at the print result, we'd know that if we stripped debug the print result, we'd know that if we stripped debug
information from `libpaddle.so` at build time. The following hints information from `libpaddle.so` at build time. The following hints
help make sure that the analysis results are readable: help make sure that the analysis results are readable:
...@@ -155,9 +155,9 @@ help make sure that the analysis results are readable: ...@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
variable `OMP_NUM_THREADS=1` to prevents OpenMP from automatically variable `OMP_NUM_THREADS=1` to prevents OpenMP from automatically
starting multiple threads. starting multiple threads.
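
The `OMP_NUM_THREADS=1` hint above can be applied from a small wrapper so the setting is not forgotten between profiling runs. This is only a sketch: the helper names `single_thread_env` and `run_single_threaded` are hypothetical and not part of PaddlePaddle or `yep`, and the command to run is whatever profiling command you already use.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OpenMP limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    # OMP_NUM_THREADS=1 keeps OpenMP from starting extra threads,
    # which makes the profile easier to read.
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_single_threaded(argv):
    """Run a command (for example, the profiling command) single-threaded."""
    return subprocess.run(argv, env=single_thread_env(), check=True)
```

Passing the environment to the child process, rather than changing `os.environ` in place, leaves the calling shell or script unaffected.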
### Look into the Profiling File ### Examining the Profiling File
The tool we used to look into the profiling file generated by The tool we used to examine the profiling file generated by
`perftools` is [`pprof`](https://github.com/google/pprof), which `perftools` is [`pprof`](https://github.com/google/pprof), which
provides a Web-based GUI like `cprofilev`. provides a Web-based GUI like `cprofilev`.
...@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to ...@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
optimize `MomentumOp`. optimize `MomentumOp`.
`pprof` would mark performance critical parts of the program in `pprof` would mark performance critical parts of the program in
red. It's a good idea to follow the hint. red. It's a good idea to follow the hints.
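
The `tottime` sorting mentioned for the Python-side profile can also be reproduced without a GUI, using only the standard-library `cProfile` and `pstats` modules. The sketch below makes assumptions: `busy_work` is a hypothetical stand-in for a real training script, and `profile_sorted_by_tottime` is an illustrative helper, not something the tutorial defines.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical workload standing in for a training script."""
    return sum(i * i for i in range(n))


def profile_sorted_by_tottime(func, *args, limit=5):
    """Profile func(*args) and return a text report sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" is time spent inside a function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


report = profile_sorted_by_tottime(busy_work, 100000)
print(report)
```

Sorting by `tottime` rather than `cumtime` points at the functions that spend the most time in their own bodies, which is usually where tuning should start.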
...@@ -188,14 +188,14 @@ ...@@ -188,14 +188,14 @@
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody"> <div itemprop="articleBody">
<p>This tutorial introduces techniques we used to profile and tune the <p>This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use Python packages CPU performance of PaddlePaddle. We will use Python packages
<code class="docutils literal"><span class="pre">cProfile</span></code> and <code class="docutils literal"><span class="pre">yep</span></code>, and Google <code class="docutils literal"><span class="pre">perftools</span></code>.</p> <code class="docutils literal"><span class="pre">cProfile</span></code> and <code class="docutils literal"><span class="pre">yep</span></code>, and Google&#8217;s <code class="docutils literal"><span class="pre">perftools</span></code>.</p>
<p>Profiling is the process that reveals the performance bottlenecks, <p>Profiling is the process that reveals performance bottlenecks,
which could be very different from what&#8217;s in the developers&#8217; mind. which could be very different from what&#8217;s in the developers&#8217; mind.
Performance tuning is to fix the bottlenecks. Performance optimization Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternatively.</p> repeats the steps of profiling and tuning alternatively.</p>
<p>PaddlePaddle users program AI by calling the Python API, which calls <p>PaddlePaddle users program AI applications by calling the Python API, which calls
into <code class="docutils literal"><span class="pre">libpaddle.so.</span></code> written in C++. In this tutorial, we focus on into <code class="docutils literal"><span class="pre">libpaddle.so.</span></code> written in C++. In this tutorial, we focus on
the profiling and tuning of</p> the profiling and tuning of</p>
<ol class="simple"> <ol class="simple">
...@@ -259,7 +259,7 @@ focus on. We can sort above profiling file by tottime:</p> ...@@ -259,7 +259,7 @@ focus on. We can sort above profiling file by tottime:</p>
</pre></div> </pre></div>
</div> </div>
<p>We can see that the most time-consuming function is the <code class="docutils literal"><span class="pre">built-in</span> <span class="pre">method</span> <span class="pre">run</span></code>, which is a C++ function in <code class="docutils literal"><span class="pre">libpaddle.so</span></code>. We will <p>We can see that the most time-consuming function is the <code class="docutils literal"><span class="pre">built-in</span> <span class="pre">method</span> <span class="pre">run</span></code>, which is a C++ function in <code class="docutils literal"><span class="pre">libpaddle.so</span></code>. We will
explain how to profile C++ code in the next section. At the right explain how to profile C++ code in the next section. At this
moment, let&#8217;s look into the third function <code class="docutils literal"><span class="pre">sync_with_cpp</span></code>, which is a moment, let&#8217;s look into the third function <code class="docutils literal"><span class="pre">sync_with_cpp</span></code>, which is a
Python function. We can click it to understand more about it:</p> Python function. We can click it to understand more about it:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">Called</span> <span class="n">By</span><span class="p">:</span> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">Called</span> <span class="n">By</span><span class="p">:</span>
...@@ -305,8 +305,8 @@ pip install yep ...@@ -305,8 +305,8 @@ pip install yep
<p>to generate the profiling file. The default filename is <p>to generate the profiling file. The default filename is
<code class="docutils literal"><span class="pre">main.py.prof</span></code>.</p> <code class="docutils literal"><span class="pre">main.py.prof</span></code>.</p>
<p>Please be aware of the <code class="docutils literal"><span class="pre">-v</span></code> command line option, which prints the <p>Please be aware of the <code class="docutils literal"><span class="pre">-v</span></code> command line option, which prints the
analysis results after generating the profiling file. By taking a analysis results after generating the profiling file. By examining the
glance at the print result, we&#8217;d know that if we stripped debug the print result, we&#8217;d know that if we stripped debug
information from <code class="docutils literal"><span class="pre">libpaddle.so</span></code> at build time. The following hints information from <code class="docutils literal"><span class="pre">libpaddle.so</span></code> at build time. The following hints
help make sure that the analysis results are readable:</p> help make sure that the analysis results are readable:</p>
<ol class="simple"> <ol class="simple">
...@@ -324,9 +324,9 @@ variable <code class="docutils literal"><span class="pre">OMP_NUM_THREADS=1</spa ...@@ -324,9 +324,9 @@ variable <code class="docutils literal"><span class="pre">OMP_NUM_THREADS=1</spa
starting multiple threads.</li> starting multiple threads.</li>
</ol> </ol>
</div> </div>
<div class="section" id="look-into-the-profiling-file"> <div class="section" id="examining-the-profiling-file">
<span id="id1"></span><h2>Look into the Profiling File<a class="headerlink" href="#look-into-the-profiling-file" title="Permalink to this headline"></a></h2> <span id="examining-the-profiling-file"></span><h2>Examining the Profiling File<a class="headerlink" href="#examining-the-profiling-file" title="Permalink to this headline"></a></h2>
<p>The tool we used to look into the profiling file generated by <p>The tool we used to examine the profiling file generated by
<code class="docutils literal"><span class="pre">perftools</span></code> is <a class="reference external" href="https://github.com/google/pprof"><code class="docutils literal"><span class="pre">pprof</span></code></a>, which <code class="docutils literal"><span class="pre">perftools</span></code> is <a class="reference external" href="https://github.com/google/pprof"><code class="docutils literal"><span class="pre">pprof</span></code></a>, which
provides a Web-based GUI like <code class="docutils literal"><span class="pre">cprofilev</span></code>.</p> provides a Web-based GUI like <code class="docutils literal"><span class="pre">cprofilev</span></code>.</p>
<p>We can rely on the standard Go toolchain to retrieve the source code <p>We can rely on the standard Go toolchain to retrieve the source code
...@@ -354,7 +354,7 @@ of the gradient of multiplication takes 2% to 4% of the total running ...@@ -354,7 +354,7 @@ of the gradient of multiplication takes 2% to 4% of the total running
time, and <code class="docutils literal"><span class="pre">MomentumOp</span></code> takes about 17%. Obviously, we&#8217;d want to time, and <code class="docutils literal"><span class="pre">MomentumOp</span></code> takes about 17%. Obviously, we&#8217;d want to
optimize <code class="docutils literal"><span class="pre">MomentumOp</span></code>.</p> optimize <code class="docutils literal"><span class="pre">MomentumOp</span></code>.</p>
<p><code class="docutils literal"><span class="pre">pprof</span></code> would mark performance critical parts of the program in <p><code class="docutils literal"><span class="pre">pprof</span></code> would mark performance critical parts of the program in
red. It&#8217;s a good idea to follow the hint.</p> red. It&#8217;s a good idea to follow the hints.</p>
</div> </div>
</div> </div>
......
The source diff is too large to display. You can view the blob instead.
This tutorial introduces techniques we used to profile and tune the This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use Python packages CPU performance of PaddlePaddle. We will use Python packages
`cProfile` and `yep`, and Google `perftools`. `cProfile` and `yep`, and Google's `perftools`.
Profiling is the process that reveals the performance bottlenecks, Profiling is the process that reveals performance bottlenecks,
which could be very different from what's in the developers' mind. which could be very different from what's in the developers' mind.
Performance tuning is to fix the bottlenecks. Performance optimization Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternatively. repeats the steps of profiling and tuning alternatively.
PaddlePaddle users program AI by calling the Python API, which calls PaddlePaddle users program AI applications by calling the Python API, which calls
into `libpaddle.so.` written in C++. In this tutorial, we focus on into `libpaddle.so.` written in C++. In this tutorial, we focus on
the profiling and tuning of the profiling and tuning of
...@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime: ...@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:
We can see that the most time-consuming function is the `built-in We can see that the most time-consuming function is the `built-in
method run`, which is a C++ function in `libpaddle.so`. We will method run`, which is a C++ function in `libpaddle.so`. We will
explain how to profile C++ code in the next section. At the right explain how to profile C++ code in the next section. At this
moment, let's look into the third function `sync_with_cpp`, which is a moment, let's look into the third function `sync_with_cpp`, which is a
Python function. We can click it to understand more about it: Python function. We can click it to understand more about it:
...@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is ...@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
`main.py.prof`. `main.py.prof`.
Please be aware of the `-v` command line option, which prints the Please be aware of the `-v` command line option, which prints the
analysis results after generating the profiling file. By taking a analysis results after generating the profiling file. By examining the
glance at the print result, we'd know that if we stripped debug the print result, we'd know that if we stripped debug
information from `libpaddle.so` at build time. The following hints information from `libpaddle.so` at build time. The following hints
help make sure that the analysis results are readable: help make sure that the analysis results are readable:
...@@ -155,9 +155,9 @@ help make sure that the analysis results are readable: ...@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
variable `OMP_NUM_THREADS=1` to prevents OpenMP from automatically variable `OMP_NUM_THREADS=1` to prevents OpenMP from automatically
starting multiple threads. starting multiple threads.
### Look into the Profiling File ### Examining the Profiling File
The tool we used to look into the profiling file generated by The tool we used to examine the profiling file generated by
`perftools` is [`pprof`](https://github.com/google/pprof), which `perftools` is [`pprof`](https://github.com/google/pprof), which
provides a Web-based GUI like `cprofilev`. provides a Web-based GUI like `cprofilev`.
...@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to ...@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
optimize `MomentumOp`. optimize `MomentumOp`.
`pprof` would mark performance critical parts of the program in `pprof` would mark performance critical parts of the program in
red. It's a good idea to follow the hint. red. It's a good idea to follow the hints.
...@@ -202,14 +202,14 @@ ...@@ -202,14 +202,14 @@
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody"> <div itemprop="articleBody">
<p>This tutorial introduces techniques we used to profile and tune the <p>This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use Python packages CPU performance of PaddlePaddle. We will use Python packages
<code class="docutils literal"><span class="pre">cProfile</span></code> and <code class="docutils literal"><span class="pre">yep</span></code>, and Google <code class="docutils literal"><span class="pre">perftools</span></code>.</p> <code class="docutils literal"><span class="pre">cProfile</span></code> and <code class="docutils literal"><span class="pre">yep</span></code>, and Google&#8217;s <code class="docutils literal"><span class="pre">perftools</span></code>.</p>
<p>Profiling is the process that reveals the performance bottlenecks, <p>Profiling is the process that reveals performance bottlenecks,
which could be very different from what&#8217;s in the developers&#8217; mind. which could be very different from what&#8217;s in the developers&#8217; mind.
Performance tuning is to fix the bottlenecks. Performance optimization Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternatively.</p> repeats the steps of profiling and tuning alternatively.</p>
<p>PaddlePaddle users program AI by calling the Python API, which calls <p>PaddlePaddle users program AI applications by calling the Python API, which calls
into <code class="docutils literal"><span class="pre">libpaddle.so.</span></code> written in C++. In this tutorial, we focus on into <code class="docutils literal"><span class="pre">libpaddle.so.</span></code> written in C++. In this tutorial, we focus on
the profiling and tuning of</p> the profiling and tuning of</p>
<ol class="simple"> <ol class="simple">
...@@ -273,7 +273,7 @@ focus on. We can sort above profiling file by tottime:</p> ...@@ -273,7 +273,7 @@ focus on. We can sort above profiling file by tottime:</p>
</pre></div> </pre></div>
</div> </div>
<p>We can see that the most time-consuming function is the <code class="docutils literal"><span class="pre">built-in</span> <span class="pre">method</span> <span class="pre">run</span></code>, which is a C++ function in <code class="docutils literal"><span class="pre">libpaddle.so</span></code>. We will <p>We can see that the most time-consuming function is the <code class="docutils literal"><span class="pre">built-in</span> <span class="pre">method</span> <span class="pre">run</span></code>, which is a C++ function in <code class="docutils literal"><span class="pre">libpaddle.so</span></code>. We will
explain how to profile C++ code in the next section. At the right explain how to profile C++ code in the next section. At this
moment, let&#8217;s look into the third function <code class="docutils literal"><span class="pre">sync_with_cpp</span></code>, which is a moment, let&#8217;s look into the third function <code class="docutils literal"><span class="pre">sync_with_cpp</span></code>, which is a
Python function. We can click it to understand more about it:</p> Python function. We can click it to understand more about it:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">Called</span> <span class="n">By</span><span class="p">:</span> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">Called</span> <span class="n">By</span><span class="p">:</span>
...@@ -319,8 +319,8 @@ pip install yep ...@@ -319,8 +319,8 @@ pip install yep
<p>to generate the profiling file. The default filename is <p>to generate the profiling file. The default filename is
<code class="docutils literal"><span class="pre">main.py.prof</span></code>.</p> <code class="docutils literal"><span class="pre">main.py.prof</span></code>.</p>
<p>Please be aware of the <code class="docutils literal"><span class="pre">-v</span></code> command line option, which prints the <p>Please be aware of the <code class="docutils literal"><span class="pre">-v</span></code> command line option, which prints the
analysis results after generating the profiling file. By taking a analysis results after generating the profiling file. By examining the
glance at the print result, we&#8217;d know that if we stripped debug the print result, we&#8217;d know that if we stripped debug
information from <code class="docutils literal"><span class="pre">libpaddle.so</span></code> at build time. The following hints information from <code class="docutils literal"><span class="pre">libpaddle.so</span></code> at build time. The following hints
help make sure that the analysis results are readable:</p> help make sure that the analysis results are readable:</p>
<ol class="simple"> <ol class="simple">
...@@ -338,9 +338,9 @@ variable <code class="docutils literal"><span class="pre">OMP_NUM_THREADS=1</spa ...@@ -338,9 +338,9 @@ variable <code class="docutils literal"><span class="pre">OMP_NUM_THREADS=1</spa
starting multiple threads.</li> starting multiple threads.</li>
</ol> </ol>
</div> </div>
<div class="section" id="look-into-the-profiling-file"> <div class="section" id="examining-the-profiling-file">
<span id="id1"></span><h2>Look into the Profiling File<a class="headerlink" href="#look-into-the-profiling-file" title="永久链接至标题"></a></h2> <span id="examining-the-profiling-file"></span><h2>Examining the Profiling File<a class="headerlink" href="#examining-the-profiling-file" title="永久链接至标题"></a></h2>
<p>The tool we used to look into the profiling file generated by <p>The tool we used to examine the profiling file generated by
<code class="docutils literal"><span class="pre">perftools</span></code> is <a class="reference external" href="https://github.com/google/pprof"><code class="docutils literal"><span class="pre">pprof</span></code></a>, which <code class="docutils literal"><span class="pre">perftools</span></code> is <a class="reference external" href="https://github.com/google/pprof"><code class="docutils literal"><span class="pre">pprof</span></code></a>, which
provides a Web-based GUI like <code class="docutils literal"><span class="pre">cprofilev</span></code>.</p> provides a Web-based GUI like <code class="docutils literal"><span class="pre">cprofilev</span></code>.</p>
<p>We can rely on the standard Go toolchain to retrieve the source code <p>We can rely on the standard Go toolchain to retrieve the source code
...@@ -368,7 +368,7 @@ of the gradient of multiplication takes 2% to 4% of the total running ...@@ -368,7 +368,7 @@ of the gradient of multiplication takes 2% to 4% of the total running
time, and <code class="docutils literal"><span class="pre">MomentumOp</span></code> takes about 17%. Obviously, we&#8217;d want to time, and <code class="docutils literal"><span class="pre">MomentumOp</span></code> takes about 17%. Obviously, we&#8217;d want to
optimize <code class="docutils literal"><span class="pre">MomentumOp</span></code>.</p> optimize <code class="docutils literal"><span class="pre">MomentumOp</span></code>.</p>
<p><code class="docutils literal"><span class="pre">pprof</span></code> would mark performance critical parts of the program in <p><code class="docutils literal"><span class="pre">pprof</span></code> would mark performance critical parts of the program in
red. It&#8217;s a good idea to follow the hint.</p> red. It&#8217;s a good idea to follow the hints.</p>
</div> </div>
</div> </div>
......
The source diff is too large to display. You can view the blob instead.