<p>This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use the Python packages
<code class="docutils literal"><span class="pre">cProfile</span></code> and <code class="docutils literal"><span class="pre">yep</span></code>, and Google’s <code class="docutils literal"><span class="pre">perftools</span></code>.</p>
<p>Profiling is the process that reveals performance bottlenecks,
which can be very different from what developers expect.
Performance tuning is done to fix these bottlenecks. Performance
optimization alternates between profiling and tuning.</p>
<p>PaddlePaddle users program AI applications by calling the Python API, which calls
into <code class="docutils literal"><span class="pre">libpaddle.so</span></code>, a shared library written in C++. In this tutorial, we focus on
the profiling and tuning of</p>
<ol class="simple">
...
...
focus on. We can sort the above profiling file by <code class="docutils literal"><span class="pre">tottime</span></code>:</p>
</pre></div>
</div>
<p>We can see that the most time-consuming function is the <code class="docutils literal"><span class="pre">built-in</span> <span class="pre">method</span> <span class="pre">run</span></code>, which is a C++ function in <code class="docutils literal"><span class="pre">libpaddle.so</span></code>. We will
explain how to profile C++ code in the next section. For the
moment, let’s look into the third function, <code class="docutils literal"><span class="pre">sync_with_cpp</span></code>, which is a
Python function. We can click it to understand more about it:</p>
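<p>For readers who prefer the standard library over
<code class="docutils literal"><span class="pre">cprofilev</span></code>, the built-in <code class="docutils literal"><span class="pre">pstats</span></code> module can do the same
sort by <code class="docutils literal"><span class="pre">tottime</span></code> and list the callers of <code class="docutils literal"><span class="pre">sync_with_cpp</span></code>; the
sketch below assumes the hypothetical <code class="docutils literal"><span class="pre">profile.out</span></code> file from the
earlier example.</p>
<div class="highlight-python"><div class="highlight"><pre>import pstats

# Load the cProfile output and list the ten functions with the
# largest total (self) time.
stats = pstats.Stats('profile.out')
stats.sort_stats('tottime').print_stats(10)

# Show which functions call sync_with_cpp and how much time the
# calls account for.
stats.print_callers('sync_with_cpp')
</pre></div>
</div>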
<spanid="id1"></span><h2>Look into the Profiling File<aclass="headerlink"href="#look-into-the-profiling-file"title="Permalink to this headline">¶</a></h2>
<p>The tool we used to look into the profiling file generated by
<spanid="examining-the-profiling-file"></span><h2>Examining the Profiling File<aclass="headerlink"href="#examining-the-profiling-file"title="Permalink to this headline">¶</a></h2>
<p>The tool we used to examine the profiling file generated by
<codeclass="docutils literal"><spanclass="pre">perftools</span></code> is <aclass="reference external"href="https://github.com/google/pprof"><codeclass="docutils literal"><spanclass="pre">pprof</span></code></a>, which
provides a Web-based GUI like <codeclass="docutils literal"><spanclass="pre">cprofilev</span></code>.</p>
<p>We can rely on the standard Go toolchain to retrieve the source code
...
...
of the gradient of multiplication takes 2% to 4% of the total running
time, and <code class="docutils literal"><span class="pre">MomentumOp</span></code> takes about 17%. Obviously, we’d want to