Commit 8996c8d6 authored by baiyfbupt

Deployed 9bcff00c with MkDocs version: 1.0.4

Parent 7f247c6d
...@@ -89,17 +89,19 @@
<a class="current" href="./">量化</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#_1">量化配置</a></li>
<li class="toctree-l3"><a href="#quant_aware">quant_aware</a></li>
<li class="toctree-l3"><a href="#convert">convert</a></li>
<li class="toctree-l3"><a href="#quant_post">quant_post</a></li>
<li class="toctree-l3"><a href="#quant_embedding">quant_embedding</a></li>
</ul>
...@@ -175,10 +177,9 @@
<div role="main">
<div class="section">
<h2 id="_1">量化配置<a class="headerlink" href="#_1" title="Permanent link">#</a></h2>
<p>通过字典配置量化参数</p>
<table class="codehilitetable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span> 1
 2
 3
 4
...@@ -194,10 +195,7 @@
14
15
16
17</pre></div></td><td class="code"><div class="codehilite"><pre><span></span><span class="nv">quant_config_default</span> <span class="o">=</span> {
<span class="s1">&#39;</span><span class="s">weight_quantize_type</span><span class="s1">&#39;</span>: <span class="s1">&#39;</span><span class="s">abs_max</span><span class="s1">&#39;</span>,
<span class="s1">&#39;</span><span class="s">activation_quantize_type</span><span class="s1">&#39;</span>: <span class="s1">&#39;</span><span class="s">abs_max</span><span class="s1">&#39;</span>,
<span class="s1">&#39;</span><span class="s">weight_bits</span><span class="s1">&#39;</span>: <span class="mi">8</span>,
...@@ -213,34 +211,33 @@
<span class="s1">&#39;</span><span class="s">window_size</span><span class="s1">&#39;</span>: <span class="mi">10000</span>,
# <span class="nv">The</span> <span class="nv">decay</span> <span class="nv">coefficient</span> <span class="nv">of</span> <span class="nv">moving</span> <span class="nv">average</span>, <span class="nv">default</span> <span class="nv">is</span> <span class="mi">0</span>.<span class="mi">9</span>
<span class="s1">&#39;</span><span class="s">moving_rate</span><span class="s1">&#39;</span>: <span class="mi">0</span>.<span class="mi">9</span>,
}
</pre></div>
</td></tr></table>
<p><strong>参数:</strong></p>
<ul>
<li><strong>weight_quantize_type(str)</strong> - 参数量化方式。可选<code>'abs_max'</code>, <code>'channel_wise_abs_max'</code>, <code>'range_abs_max'</code>, <code>'moving_average_abs_max'</code>。默认为<code>'abs_max'</code>。</li>
<li><strong>activation_quantize_type(str)</strong> - 激活量化方式,可选<code>'abs_max'</code>, <code>'range_abs_max'</code>, <code>'moving_average_abs_max'</code>,默认为<code>'abs_max'</code>。</li>
<li><strong>weight_bits(int)</strong> - 参数量化bit数,默认8,推荐设为8。</li>
<li><strong>activation_bits(int)</strong> - 激活量化bit数,默认8,推荐设为8。</li>
<li><strong>not_quant_pattern(str | list[str])</strong> - 所有<code>name_scope</code>包含<code>'not_quant_pattern'</code>字符串的<code>op</code>都不量化,设置方式请参考<a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/name_scope_cn.html#name-scope"><em>fluid.name_scope</em></a>。</li>
<li><strong>quantize_op_types(list[str])</strong> - 需要进行量化的<code>op</code>类型,目前支持<code>'conv2d', 'depthwise_conv2d', 'mul'</code>。</li>
<li><strong>dtype(str)</strong> - 量化后的参数类型,默认<code>'int8'</code>,目前仅支持<code>'int8'</code>。</li>
<li><strong>window_size(int)</strong> - <code>'range_abs_max'</code>量化方式的<code>window size</code>,默认10000。</li>
<li><strong>moving_rate(float)</strong> - <code>'moving_average_abs_max'</code>量化方式的衰减系数,默认 0.9。</li>
</ul>
<h2 id="quant_aware">quant_aware<a class="headerlink" href="#quant_aware" title="Permanent link">#</a></h2>
<dl>
<dt>paddleslim.quant.quant_aware(program, place, config, scope=None, for_test=False)<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py">[源代码]</a></dt>
<dd>在<code>program</code>中加入量化和反量化<code>op</code>,用于量化训练。</dd>
</dl>
<p><strong>参数:</strong></p>
<ul>
<li><strong>program (fluid.Program)</strong> - 传入训练或测试<code>program</code>。</li>
<li><strong>place(fluid.CPUPlace | fluid.CUDAPlace)</strong> - 该参数表示<code>Executor</code>执行所在的设备。</li>
<li><strong>config(dict)</strong> - 量化配置表。</li>
<li><strong>scope(fluid.Scope, optional)</strong> - 传入用于存储<code>Variable</code>的<code>scope</code>,需要传入<code>program</code>所使用的<code>scope</code>,一般情况下,是<a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html"><em>fluid.global_scope()</em></a>。设置为<code>None</code>时将使用<a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html"><em>fluid.global_scope()</em></a>,默认值为<code>None</code>。</li>
<li><strong>for_test(bool)</strong> - 如果<code>program</code>参数是一个测试<code>program</code>,<code>for_test</code>应设为<code>True</code>,否则设为<code>False</code>。</li>
</ul>
...@@ -250,27 +247,38 @@
<li><code>for_test=False</code>,返回类型为<code>fluid.CompiledProgram</code>,<strong>注意,此返回值不能用于保存参数</strong>。</li>
<li><code>for_test=True</code>,返回类型为<code>fluid.Program</code>。</li>
</ul>
<div class="admonition note">
<p class="admonition-title">注意事项</p>
</div>
<ul>
<li>此接口会改变<code>program</code>结构,并且可能增加一些<code>persistable</code>的变量,所以加载模型参数时请注意和相应的<code>program</code>对应。</li>
<li>此接口底层经历了<code>fluid.Program</code> -&gt; <code>fluid.framework.IrGraph</code> -&gt; <code>fluid.Program</code>的转变,在<code>fluid.framework.IrGraph</code>中没有<code>Parameter</code>的概念,<code>Variable</code>只有<code>persistable</code>和<code>not persistable</code>的区别,所以在保存和加载参数时,请使用<code>fluid.io.save_persistables</code>和<code>fluid.io.load_persistables</code>接口。</li>
<li>由于此接口会根据<code>program</code>的结构和量化配置来对<code>program</code>添加op,所以<code>Paddle</code>中一些通过<code>fuse op</code>来加速训练的策略不能使用。已知以下策略在使用量化时必须设为<code>False</code>:<code>fuse_all_reduce_ops, sync_batch_norm</code>。</li>
<li>如果传入的<code>program</code>中存在和任何op都没有连接的<code>Variable</code>,则会在量化的过程中被优化掉。</li>
</ul>
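作为原理示意(非 Paddle 内部实现),下面的纯 Python 代码演示<code>'abs_max'</code>方式下,量化和反量化<code>op</code>(伪量化)对一个参数张量的处理过程:

```python
def fake_quant_abs_max(weights, bits=8):
    # 'abs_max' 方式:以绝对值最大值作为 Scale
    scale = max(abs(w) for w in weights)
    qmax = (1 << (bits - 1)) - 1        # 8bit 时为 127
    # 先量化为整数,再反量化回浮点,以模拟低比特运算带来的精度损失
    quantized = [int(round(w / scale * qmax)) for w in weights]
    dequantized = [q * scale / qmax for q in quantized]
    return scale, quantized, dequantized

scale, q, dq = fake_quant_abs_max([0.5, -1.0, 0.25])
print(scale)  # 1.0
print(q)      # [64, -127, 32]
```

训练中网络的前向计算使用反量化后的数值,因此梯度能感知量化误差,这正是量化训练的核心思想。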
<h2 id="convert">convert<a class="headerlink" href="#convert" title="Permanent link">#</a></h2>
<dl>
<dt>paddleslim.quant.convert(program, place, config, scope=None, save_int8=False)<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py">[源代码]</a></dt>
<dd>
<p>把训练好的量化<code>program</code>,转换为可用于保存<code>inference model</code>的<code>program</code>。</p>
</dd>
</dl>
<p><strong>参数:</strong></p>
<ul>
<li><strong>program (fluid.Program)</strong> - 传入测试<code>program</code>。</li>
<li><strong>place(fluid.CPUPlace | fluid.CUDAPlace)</strong> - 该参数表示<code>Executor</code>执行所在的设备。</li>
<li><strong>config(dict)</strong> - 量化配置表。</li>
<li><strong>scope(fluid.Scope)</strong> - 传入用于存储<code>Variable</code>的<code>scope</code>,需要传入<code>program</code>所使用的<code>scope</code>,一般情况下,是<a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html"><em>fluid.global_scope()</em></a>。设置为<code>None</code>时将使用<a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html"><em>fluid.global_scope()</em></a>,默认值为<code>None</code>。</li>
<li><strong>save_int8(bool)</strong> - 是否需要返回参数为<code>int8</code>的<code>program</code>。该功能目前只能用于确认模型大小。默认值为<code>False</code>。</li>
</ul>
<p><strong>返回</strong></p>
<ul>
<li><strong>program (fluid.Program)</strong> - freezed program,可用于保存inference model,参数为<code>float32</code>类型,但其数值范围可用int8表示。</li>
<li><strong>int8_program (fluid.Program)</strong> - freezed program,可用于保存inference model,参数为<code>int8</code>类型。当<code>save_int8</code>为<code>False</code>时,不返回该值。</li>
</ul>
<div class="admonition note">
<p class="admonition-title">注意事项</p>
</div>
<p>因为该接口会对<code>op</code>和<code>Variable</code>做相应的删除和修改,所以此接口只能在训练完成之后调用。如果想转化训练的中间模型,可加载相应的参数之后再使用此接口。</p>
<p><strong>代码示例</strong></p>
<table class="codehilitetable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span> 1
...@@ -335,9 +343,9 @@
<span class="n">build_strategy</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">BuildStrategy</span><span class="p">()</span>
<span class="n">exec_strategy</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">ExecutionStrategy</span><span class="p">()</span>
<span class="c1">#调用api</span>
<span class="hll"><span class="n">quant_train_program</span> <span class="o">=</span> <span class="n">quant</span><span class="o">.</span><span class="n">quant_aware</span><span class="p">(</span><span class="n">train_program</span><span class="p">,</span> <span class="n">place</span><span class="p">,</span> <span class="n">config</span><span class="p">,</span> <span class="n">for_test</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</span><span class="hll"><span class="n">quant_eval_program</span> <span class="o">=</span> <span class="n">quant</span><span class="o">.</span><span class="n">quant_aware</span><span class="p">(</span><span class="n">eval_program</span><span class="p">,</span> <span class="n">place</span><span class="p">,</span> <span class="n">config</span><span class="p">,</span> <span class="n">for_test</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</span><span class="c1">#关闭策略</span>
<span class="n">build_strategy</span><span class="o">.</span><span class="n">fuse_all_reduce_ops</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">build_strategy</span><span class="o">.</span><span class="n">sync_batch_norm</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">quant_train_program</span> <span class="o">=</span> <span class="n">quant_train_program</span><span class="o">.</span><span class="n">with_data_parallel</span><span class="p">(</span>
...@@ -349,47 +357,33 @@
</pre></div>
</td></tr></table>
<p>更详细的用法请参考 <a href='https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_aware'>量化训练demo</a></p>
<h2 id="quant_post">quant_post<a class="headerlink" href="#quant_post" title="Permanent link">#</a></h2>
<dl>
<dt>paddleslim.quant.quant_post(executor, model_dir, quantize_model_path, sample_generator, model_filename=None, params_filename=None, batch_size=16, batch_nums=None, scope=None, algo='KL', quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"])<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py">[源代码]</a></dt>
<dd>
<p>对保存在<code>${model_dir}</code>下的模型进行量化,使用<code>sample_generator</code>的数据进行参数校正。</p>
</dd>
</dl>
<p><strong>参数:</strong></p>
<ul>
<li><strong>executor (fluid.Executor)</strong> - 执行模型的executor,可以在cpu或者gpu上执行。</li>
<li><strong>model_dir(str)</strong> - 需要量化的模型所在的文件夹。</li>
<li><strong>quantize_model_path(str)</strong> - 保存量化后的模型的路径。</li>
<li><strong>sample_generator(python generator)</strong> - 读取数据样本,每次返回一个样本。</li>
<li><strong>model_filename(str, optional)</strong> - 模型文件名,如果需要量化的模型的参数存在一个文件中,则需要设置<code>model_filename</code>为模型文件的名称,否则设置为<code>None</code>即可。默认值是<code>None</code>。</li>
<li><strong>params_filename(str, optional)</strong> - 参数文件名,如果需要量化的模型的参数存在一个文件中,则需要设置<code>params_filename</code>为参数文件的名称,否则设置为<code>None</code>即可。默认值是<code>None</code>。</li>
<li><strong>batch_size(int)</strong> - 每个batch的图片数量。默认值为16。</li>
<li><strong>batch_nums(int, optional)</strong> - 迭代次数。如果设置为<code>None</code>,则会一直运行到<code>sample_generator</code>迭代结束,否则迭代次数为<code>batch_nums</code>,也就是说参与对<code>Scale</code>进行校正的样本个数为<code>'batch_nums' * 'batch_size'</code>。</li>
<li><strong>scope(fluid.Scope, optional)</strong> - 用来获取和写入<code>Variable</code>,如果设置为<code>None</code>,则使用<a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html"><em>fluid.global_scope()</em></a>。默认值是<code>None</code>。</li>
<li><strong>algo(str)</strong> - 量化时使用的算法名称,可为<code>'KL'</code>或者<code>'direct'</code>。该参数仅针对激活值的量化,因为参数值的量化使用的方式为<code>'channel_wise_abs_max'</code>。当<code>algo</code>设置为<code>'direct'</code>时,使用校正数据的激活值的绝对值的最大值当作<code>Scale</code>值;当设置为<code>'KL'</code>时,则使用<code>KL</code>散度的方法来计算<code>Scale</code>值。默认值为<code>'KL'</code>。</li>
<li><strong>quantizable_op_type(list[str])</strong> - 需要量化的<code>op</code>类型列表。默认值为<code>["conv2d", "depthwise_conv2d", "mul"]</code>。</li>
</ul>
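<code>sample_generator</code>是一个每次 yield 一个样本的 Python generator,参与<code>Scale</code>校正的样本个数为<code>batch_nums * batch_size</code>。下面是一个纯 Python 示意(样本内容为随机数,仅用于说明接口形式,与真实的图片预处理无关):

```python
import random

def sample_generator():
    # 示意:每次 yield 一个样本;真实场景中通常是一张预处理后的图片
    rng = random.Random(0)
    for _ in range(100):
        yield [rng.random() for _ in range(4)]

batch_size, batch_nums = 16, 5
# 参与 Scale 校正的样本个数 = batch_nums * batch_size
num_calib_samples = batch_nums * batch_size
samples = [s for _, s in zip(range(num_calib_samples), sample_generator())]
print(len(samples))  # 80
```

当<code>batch_nums</code>为<code>None</code>时,generator 会被消费到迭代结束,即全部样本都参与校正。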
<p><strong>返回</strong></p>
<p>无。</p>
<div class="admonition note">
<p class="admonition-title">注意事项</p>
</div>
<p>因为该接口会收集校正数据的所有的激活值,所以使用的校正图片不能太多。<code>'KL'</code>散度的计算也比较耗时。</p>
<p><strong>代码示例</strong></p>
<blockquote>
...@@ -419,8 +413,8 @@
<span class="n">place</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">CUDAPlace</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="k">if</span> <span class="n">use_gpu</span> <span class="k">else</span> <span class="n">fluid</span><span class="o">.</span><span class="n">CPUPlace</span><span class="p">()</span>
<span class="n">exe</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Executor</span><span class="p">(</span><span class="n">place</span><span class="p">)</span>
<span class="hll"><span class="n">quant_post</span><span class="p">(</span>
</span>    <span class="n">executor</span><span class="o">=</span><span class="n">exe</span><span class="p">,</span>
    <span class="n">model_dir</span><span class="o">=</span><span class="s1">&#39;./model_path&#39;</span><span class="p">,</span>
    <span class="n">quantize_model_path</span><span class="o">=</span><span class="s1">&#39;./save_path&#39;</span><span class="p">,</span>
    <span class="n">sample_generator</span><span class="o">=</span><span class="n">val_reader</span><span class="p">,</span>
...@@ -430,22 +424,26 @@
    <span class="n">batch_nums</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
</pre></div>
</td></tr></table>
更详细的用法请参考 <a href='https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post'>离线量化demo</a></p>
<h2 id="quant_embedding">quant_embedding<a class="headerlink" href="#quant_embedding" title="Permanent link">#</a></h2>
<dl>
<dt>paddleslim.quant.quant_embedding(program, place, config, scope=None)<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quant_embedding.py">[源代码]</a></dt>
<dd>对<code>Embedding</code>参数进行量化。</dd>
</dl>
<p><strong>参数:</strong></p>
<ul>
<li><strong>program(fluid.Program)</strong> - 需要量化的program。</li>
<li><strong>scope(fluid.Scope, optional)</strong> - 用来获取和写入<code>Variable</code>,如果设置为<code>None</code>,则使用<a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html"><em>fluid.global_scope()</em></a>。</li>
<li><strong>place(fluid.CPUPlace | fluid.CUDAPlace)</strong> - 运行program的设备。</li>
<li><strong>config(dict)</strong> - 定义量化的配置。可以配置的参数有:<ul>
<li><code>'params_name'</code> (str, required): 需要进行量化的参数名称,此参数必须设置。</li>
<li><code>'quantize_type'</code> (str, optional): 量化的类型,目前支持的类型是<code>'abs_max'</code>,待支持的类型有<code>'log', 'product_quantization'</code>。默认值是<code>'abs_max'</code>。</li>
<li><code>'quantize_bits'</code> (int, optional): 量化的<code>bit</code>数,目前支持的<code>bit</code>数为8。默认值是8。</li>
<li><code>'dtype'</code> (str, optional): 量化之后的数据类型,目前支持的是<code>'int8'</code>。默认值是<code>'int8'</code>。</li>
<li><code>'threshold'</code> (float, optional): 量化之前将根据此阈值对需要量化的参数值进行<code>clip</code>。如果不设置,则跳过<code>clip</code>过程直接量化。</li>
</ul>
</li>
</ul>
<p><strong>返回</strong></p>
<p>量化之后的program</p>
<p><strong>返回类型</strong></p>
...@@ -493,10 +491,10 @@
<span class="n">exe</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">fluid</span><span class="o">.</span><span class="n">default_startup_program</span><span class="p">())</span>
<span class="n">config</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;params_name&#39;</span><span class="p">:</span> <span class="s1">&#39;emb&#39;</span><span class="p">,</span> <span class="s1">&#39;quantize_type&#39;</span><span class="p">:</span> <span class="s1">&#39;abs_max&#39;</span><span class="p">}</span>
<span class="hll"><span class="n">quant_program</span> <span class="o">=</span> <span class="n">quant</span><span class="o">.</span><span class="n">quant_embedding</span><span class="p">(</span><span class="n">infer_program</span><span class="p">,</span> <span class="n">place</span><span class="p">,</span> <span class="n">config</span><span class="p">)</span>
</span></pre></div>
</td></tr></table></p>
<p>更详细的用法请参考 <a href='https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_embedding'>Embedding量化demo</a></p>
</div>
</div>
...
...@@ -288,5 +288,5 @@
<!--
MkDocs version : 1.0.4
Build Date UTC : 2019-12-31 04:07:55
-->