<p>This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use the Python packages
<code class="docutils literal"><span class="pre">cProfile</span></code> and <code class="docutils literal"><span class="pre">yep</span></code>, and Google’s <code class="docutils literal"><span class="pre">perftools</span></code>.</p>
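<p>As a rough sketch of how these profilers are typically invoked on a training
script (the script name <code class="docutils literal"><span class="pre">train.py</span></code> and the output file names
below are placeholders for illustration, not values fixed by this tutorial):</p>
<div class="highlight-bash"><div class="highlight"><pre># Profile the Python side with cProfile; the saved profile.out can be
# inspected later with cprofilev or pstats.
python -m cProfile -o profile.out train.py

# Profile the C++ side (libpaddle.so) with yep, which wraps perftools;
# by default it writes a perftools profile such as train.py.prof that
# pprof can read. Exact flags may vary across yep versions.
python -m yep -- train.py
</pre></div>
</div>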
<p>Profiling is the process that reveals performance bottlenecks,
which can be very different from what developers expect.
Performance tuning is done to fix these bottlenecks. Performance
optimization alternates between profiling and tuning.</p>
<p>PaddlePaddle users program AI applications by calling the Python API, which calls
into <code class="docutils literal"><span class="pre">libpaddle.so</span></code>, written in C++. In this tutorial, we focus on
the profiling and tuning of</p>
<ol class="simple">
...
</pre></div>
</div>
<p>We can see that the most time-consuming function is the <code class="docutils literal"><span class="pre">built-in</span> <span class="pre">method</span> <span class="pre">run</span></code>, which is a C++ function in <code class="docutils literal"><span class="pre">libpaddle.so</span></code>. We will
explain how to profile C++ code in the next section. At this
moment, let’s look into the third function <code class="docutils literal"><span class="pre">sync_with_cpp</span></code>, which is a
Python function. We can click it to understand more about it.</p>
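<p>If the web UI is not at hand, the same cProfile output can also be sorted by
<code class="docutils literal"><span class="pre">tottime</span></code> with the standard-library <code class="docutils literal"><span class="pre">pstats</span></code>
module. A minimal sketch, assuming the profile was saved as
<code class="docutils literal"><span class="pre">profile.out</span></code>:</p>
<div class="highlight-bash"><div class="highlight"><pre># Print the 10 entries with the largest tottime from the saved profile.
python -c "import pstats; pstats.Stats('profile.out').sort_stats('tottime').print_stats(10)"
</pre></div>
</div>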
<span id="examining-the-profiling-file"></span><h2>Examining the Profiling File<a class="headerlink" href="#examining-the-profiling-file" title="Permalink to this headline">¶</a></h2>
<p>The tool we use to examine the profiling file generated by
<code class="docutils literal"><span class="pre">perftools</span></code> is <a class="reference external" href="https://github.com/google/pprof"><code class="docutils literal"><span class="pre">pprof</span></code></a>, which
provides a Web-based GUI like <code class="docutils literal"><span class="pre">cprofilev</span></code>.</p>
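<p>For reference, <code class="docutils literal"><span class="pre">pprof</span></code> is typically pointed at the Python
interpreter together with the perftools output. This is a sketch only; the
profile name <code class="docutils literal"><span class="pre">main.py.prof</span></code> and the port are assumptions:</p>
<div class="highlight-bash"><div class="highlight"><pre># Plain-text report of the hottest functions in the perftools profile.
pprof --text `which python` main.py.prof

# Or serve an interactive web view on a local port.
pprof -http=:8080 `which python` main.py.prof
</pre></div>
</div>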
<p>We can rely on the standard Go toolchain to retrieve the source code
...
of the gradient of multiplication takes 2% to 4% of the total running
time, and <code class="docutils literal"><span class="pre">MomentumOp</span></code> takes about 17%. Obviously, we’d want to