<code class="docutils literal"><span class="pre">cProfile</span></code> and <code class="docutils literal"><span class="pre">yep</span></code>, and Google <code class="docutils literal"><span class="pre">perftools</span></code>.</p>
<span id="profiling-the-python-code"></span><h1>Profiling the Python Code<a class="headerlink" href="#profiling-the-python-code" title="Permalink to this headline">¶</a></h1>
<span id="generate-the-performance-profiling-file"></span><h2>Generate the Performance Profiling File<a class="headerlink" href="#generate-the-performance-profiling-file" title="Permalink to this headline">¶</a></h2>
<p>where <code class="docutils literal"><span class="pre">main.py</span></code> is the program we are going to profile and <code class="docutils literal"><span class="pre">-o</span></code> specifies
the output file. Without <code class="docutils literal"><span class="pre">-o</span></code>, <code class="docutils literal"><span class="pre">cProfile</span></code> would output to standard output.</p>
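<p>The same choice between a file and standard output exists when profiling from inside Python. The sketch below is a rough programmatic counterpart using the standard-library <code class="docutils literal"><span class="pre">cProfile.runctx</span></code>; the <code class="docutils literal"><span class="pre">work</span></code> function and the temporary output path are hypothetical stand-ins for <code class="docutils literal"><span class="pre">main.py</span></code> and its profile file.</p>

```python
import cProfile
import os
import tempfile


def work():
    # Hypothetical stand-in for the body of main.py.
    return sum(i * i for i in range(100_000))


# With a filename, the binary statistics are saved to that file,
# which plays the same role as the file named after -o.
out_path = os.path.join(tempfile.mkdtemp(), "profile.out")
cProfile.runctx("work()", globals(), locals(), filename=out_path)

# Without a filename, the report is printed to standard output instead.
cProfile.runctx("work()", globals(), locals())
```

<p>The saved file is a binary dump meant for tools such as <code class="docutils literal"><span class="pre">pstats</span></code>, not for reading directly.</p>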
<span id="look-into-the-profiling-file"></span><h2>Look into the Profiling File<a class="headerlink" href="#look-into-the-profiling-file" title="Permalink to this headline">¶</a></h2>
<p><code class="docutils literal"><span class="pre">cProfile</span></code> generates <code class="docutils literal"><span class="pre">profile.out</span></code> after <code class="docutils literal"><span class="pre">main.py</span></code> completes. We can
use <a class="reference external" href="https://github.com/ymichael/cprofilev"><code class="docutils literal"><span class="pre">cprofilev</span></code></a> to look into
the details:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>cprofilev -a <span class="m">0</span>.0.0.0 -p <span class="m">3214</span> -f profile.out main.py
</pre></div>
</div>
<span id="identify-performance-bottlenecks"></span><h2>Identify Performance Bottlenecks<a class="headerlink" href="#identify-performance-bottlenecks" title="Permalink to this headline">¶</a></h2>
<p>Usually, <code class="docutils literal"><span class="pre">tottime</span></code> and the related <code class="docutils literal"><span class="pre">percall</span></code> time are what we want to
focus on. We can sort the profiling file above by <code class="docutils literal"><span class="pre">tottime</span></code>:</p>
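<p>The same ordering can be produced without the web UI. The following is a minimal sketch using the standard-library <code class="docutils literal"><span class="pre">pstats</span></code> module; the functions <code class="docutils literal"><span class="pre">slow</span></code>, <code class="docutils literal"><span class="pre">fast</span></code> and <code class="docutils literal"><span class="pre">main</span></code> and the temporary file path are assumptions made for illustration, not part of the profiled program.</p>

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow():
    # Hypothetical hot spot that does most of the work.
    return sum(i * i for i in range(200_000))


def fast():
    return sum(range(100))


def main():
    slow()
    fast()


# Profile the toy program and save the statistics to a file.
path = os.path.join(tempfile.mkdtemp(), "profile.out")
profiler = cProfile.Profile()
profiler.runcall(main)
profiler.dump_stats(path)

# Load the file and sort by tottime: the time spent inside each
# function itself, excluding the functions it calls.
buffer = io.StringIO()
stats = pstats.Stats(path, stream=buffer)
stats.sort_stats("tottime").print_stats(5)  # keep only the top 5 rows
report = buffer.getvalue()
print(report)
```

<p>The report for these toy functions will differ from the one for <code class="docutils literal"><span class="pre">main.py</span></code>; only the sort key is the same.</p>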
<p>We can see that the most time-consuming function is the <code class="docutils literal"><span class="pre">built-in</span> <span class="pre">method</span> <span class="pre">run</span></code>, which is a C++ function in <code class="docutils literal"><span class="pre">libpaddle.so</span></code>. We will
<span id="profiling-python-and-c-code"></span><h1>Profiling Python and C++ Code<a class="headerlink" href="#profiling-python-and-c-code" title="Permalink to this headline">¶</a></h1>
<span id="generate-the-profiling-file"></span><h2>Generate the Profiling File<a class="headerlink" href="#generate-the-profiling-file" title="Permalink to this headline">¶</a></h2>
package, <code class="docutils literal"><span class="pre">yep</span></code>, that can work with Google’s <code class="docutils literal"><span class="pre">perftools</span></code>, which is a
<p>On Ubuntu systems, we can install <code class="docutils literal"><span class="pre">yep</span></code> and <code class="docutils literal"><span class="pre">perftools</span></code> by running the
<li>Use the GCC command-line option <code class="docutils literal"><span class="pre">-g</span></code> when building <code class="docutils literal"><span class="pre">libpaddle.so</span></code> so as to
<li>Use the GCC command-line option <code class="docutils literal"><span class="pre">-O2</span></code> or <code class="docutils literal"><span class="pre">-O3</span></code> to generate optimized
binary code. It doesn’t make sense to profile <code class="docutils literal"><span class="pre">libpaddle.so</span></code>
without optimization, because it would run slowly anyway.</li>
<li>Profile the single-threaded binary before the
multi-threaded version, because the latter often produces tangled
profiling results. You might want to set the environment
variable <code class="docutils literal"><span class="pre">OMP_NUM_THREADS=1</span></code> to prevent OpenMP from automatically
<span id="id1"></span><h2>Look into the Profiling File<a class="headerlink" href="#look-into-the-profiling-file" title="Permalink to this headline">¶</a></h2>
<code class="docutils literal"><span class="pre">perftools</span></code> is <a class="reference external" href="https://github.com/google/pprof"><code class="docutils literal"><span class="pre">pprof</span></code></a>, which
provides a Web-based GUI like <code class="docutils literal"><span class="pre">cprofilev</span></code>.</p>
<p>We can rely on the standard Go toolchain to retrieve the source code
of <code class="docutils literal"><span class="pre">pprof</span></code> and build it:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>go get github.com/google/pprof
</pre></div>
</div>
<p>Then we can use it to profile <code class="docutils literal"><span class="pre">main.py.prof</span></code> generated in the previous
<span id="identifying-the-performance-bottlenecks"></span><h2>Identifying the Performance Bottlenecks<a class="headerlink" href="#identifying-the-performance-bottlenecks" title="Permalink to this headline">¶</a></h2>
<p>Similar to how we work with <code class="docutils literal"><span class="pre">cprofilev</span></code>, we’d focus on <code class="docutils literal"><span class="pre">tottime</span></code> and