Commit c8e905e6 authored by baiyfbupt

Deployed 09a1807b with MkDocs version: 1.0.4

Parent 4c0695a7
......@@ -125,6 +125,9 @@
<a class="current" href="./">算法原理</a>
<ul class="subnav">
<li class="toctree-l2"><a href="#_1">目录</a></li>
<li class="toctree-l2"><a href="#1-quantization-aware-training">1. Quantization Aware Training量化介绍</a></li>
<ul>
......@@ -196,7 +199,7 @@
<li>Algorithm Principles</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/algo/algo.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/algo/algo.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -206,7 +209,14 @@
<div role="main">
<div class="section">
<h2 id="1-quantization-aware-training">1. Quantization Aware Training量化介绍<a class="headerlink" href="#1-quantization-aware-training" title="Permanent link">#</a></h2>
<h2 id="_1">目录<a class="headerlink" href="#_1" title="Permanent link">#</a></h2>
<ul>
<li><a href="#1-quantization-aware-training量化介绍">量化原理介绍</a></li>
<li><a href="#2-卷积核剪裁原理">剪裁原理介绍</a></li>
<li><a href="#3-蒸馏">蒸馏原理介绍</a></li>
<li><a href="#4-轻量级模型结构搜索">轻量级模型结构搜索原理介绍</a></li>
</ul>
<h2 id="1-quantization-aware-training">1. Quantization Aware Training量化介绍<a class="headerlink" href="#1-quantization-aware-training" title="Permanent link">#</a></h2>
<h3 id="11">1.1 背景<a class="headerlink" href="#11" title="Permanent link">#</a></h3>
<p>近年来,定点量化使用更少的比特数(如8-bit、3-bit、2-bit等)表示神经网络的权重和激活已被验证是有效的。定点量化的优点包括低内存带宽、低功耗、低计算资源占用以及低模型存储需求等。</p>
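<p>For intuition, the following is a minimal NumPy sketch of abs_max quantization (an illustration only, not PaddleSlim's implementation): a float tensor is mapped to 8-bit integers using its maximum absolute value as the scale, and mapped back on dequantization.</p>
<div class="highlight"><pre><span></span>import numpy as np

def quantize_abs_max(x, bits=8):
    # scale is the max absolute value; n is the largest representable integer
    scale = np.abs(x).max()
    n = 2 ** (bits - 1) - 1
    q = np.round(x / scale * n).astype(np.int8)
    return q, scale

def dequantize(q, scale, bits=8):
    # recover an approximation of the original float values
    n = 2 ** (bits - 1) - 1
    return q.astype(np.float32) * scale / n
</pre></div>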
<p align="center">
......@@ -338,7 +348,7 @@ Y_{dq} = \frac{Y_q}{(n - 1) * (n - 1)} * X_m * W_m \
Before pruning, the filters of a convolution layer are sorted by l1_norm from high to low; filters nearer the end of the order are less important, and the later-ranked filters are pruned first.</p>
<h3 id="23">2.3 Sensitivity-based pruning of convolutional networks<a class="headerlink" href="#23" title="Permanent link">#</a></h3>
<p>A different proportion of filters is pruned from each convolution layer according to its sensitivity.</p>
<h4 id="_1">两个假设<a class="headerlink" href="#_1" title="Permanent link">#</a></h4>
<h4 id="_2">两个假设<a class="headerlink" href="#_2" title="Permanent link">#</a></h4>
<ul>
<li>Within a conv layer's parameter, filters are sorted by l1_norm from high to low; filters nearer the end of the order are less important.</li>
<li>When two layers are pruned by the same proportion of filters, the layer whose pruning hurts model accuracy more is said to be more sensitive.</li>
......@@ -348,7 +358,7 @@ Y_{dq} = \frac{Y_q}{(n - 1) * (n - 1)} * X_m * W_m \
<li>A layer's pruning ratio is inversely proportional to its sensitivity</li>
<li>Within a layer, filters with relatively low l1_norm are pruned first</li>
</ul>
<h4 id="_2">敏感度的理解<a class="headerlink" href="#_2" title="Permanent link">#</a></h4>
<h4 id="_3">敏感度的理解<a class="headerlink" href="#_3" title="Permanent link">#</a></h4>
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/pruning_3.png" height=200 width=400 hspace='10'/> <br />
<strong>Figure 7</strong>
......@@ -356,7 +366,7 @@ Y_{dq} = \frac{Y_q}{(n - 1) * (n - 1)} * X_m * W_m \
<p>As shown in <strong>Figure 7</strong>, the x-axis is the proportion of filters pruned away and the y-axis is the loss in accuracy; each colored dashed line corresponds to one convolution layer in the network.
Each convolution layer is pruned <strong>individually</strong> at different ratios, its accuracy loss on the validation set is observed, and the dashed lines in <strong>Figure 7</strong> are plotted from these observations. A layer whose dashed line rises more slowly is relatively insensitive, and we prune the filters of insensitive layers first; a sketch of the procedure follows.</p>
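<p>In pseudocode, the procedure reads roughly as follows (a sketch: <code>conv_layers</code>, <code>prune_single_layer</code>, <code>evaluate</code> and <code>baseline_acc</code> are hypothetical helpers, not PaddleSlim APIs):</p>
<div class="highlight"><pre><span></span>ratios = [0.1, 0.2, 0.3, 0.4, 0.5]
sensitivities = {}
for layer in conv_layers:
    losses = []
    for r in ratios:
        # prune only this layer at ratio r, leaving all other layers intact
        pruned = prune_single_layer(model, layer, ratio=r)
        losses.append(baseline_acc - evaluate(pruned, val_data))
    sensitivities[layer] = losses
</pre></div>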
<h4 id="_3">选择最优的剪裁率组合<a class="headerlink" href="#_3" title="Permanent link">#</a></h4>
<h4 id="_4">选择最优的剪裁率组合<a class="headerlink" href="#_4" title="Permanent link">#</a></h4>
<p>我们将**图7**中的折线拟合为**图8**中的曲线,每在竖坐标轴上选取一个精度损失值,就在横坐标轴上对应着一组剪裁率,如**图8**中黑色实线所示。
用户给定一个模型整体的剪裁率,我们通过移动**图5**中的黑色实线来找到一组满足条件的且合法的剪裁率。</p>
<p align="center">
......@@ -364,7 +374,7 @@ Y_{dq} = \frac{Y_q}{(n - 1) * (n - 1)} * X_m * W_m \
<strong>Figure 8</strong>
</p>
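<p>A hedged sketch of this selection, assuming <code>curves[name]</code> maps an accuracy-loss value back to the prune ratio that causes it on the fitted curve, and <code>overall_ratio</code> and <code>target_ratio</code> are hypothetical helpers:</p>
<div class="highlight"><pre><span></span>def ratios_for_loss(curves, loss):
    # one accuracy-loss value on the y-axis yields one ratio per layer
    return {name: f(loss) for name, f in curves.items()}

loss = 0.0
ratios = ratios_for_loss(curves, loss)
while overall_ratio(ratios) &lt; target_ratio:
    loss += 0.01   # move the black solid line in Figure 8
    ratios = ratios_for_loss(curves, loss)
</pre></div>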
<h4 id="_4">迭代剪裁<a class="headerlink" href="#_4" title="Permanent link">#</a></h4>
<h4 id="_5">迭代剪裁<a class="headerlink" href="#_5" title="Permanent link">#</a></h4>
<p>考虑到多个卷积层间的相关性,一个卷积层的修改可能会影响其它卷积层的敏感度,我们采取了多次剪裁的策略,步骤如下:</p>
<ul>
<li>step1: collect sensitivity statistics for each convolution layer</li>
......
This diff is collapsed.
......@@ -150,7 +150,7 @@
<li>PaddleSlim API documentation guide</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/api/api_guide.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/api/api_guide.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......
......@@ -163,7 +163,7 @@
<li>SA Search</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/api/nas_api.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/api/nas_api.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -182,16 +182,12 @@
<li><strong>block_num(int|None)</strong> - <code>block_num</code> is the number of blocks in the search space.</li>
<li><strong>block_mask(list|None)</strong> - <code>block_mask</code> is a list of 0s and 1s, where 0 marks the current block as a normal block and 1 marks it as a reduction block. If <code>block_mask</code> is set, it is taken as the primary configuration and the <code>input_size</code>, <code>output_size</code>, and <code>block_num</code> settings have no effect.</li>
</ul>
<div class="admonition note">
<p class="admonition-title">Note</p>
<ol>
<li>reduction block表示经过这个block之后的feature map大小下降为之前的一半,normal block表示经过这个block之后feature map大小不变。<br></li>
<li><code>input_size</code><code>output_size</code>用来计算整个模型结构中reduction block数量。</li>
</ol>
</div>
<p>Note:<br>
1. reduction block表示经过这个block之后的feature map大小下降为之前的一半,normal block表示经过这个block之后feature map大小不变。<br>
2. <code>input_size</code><code>output_size</code>用来计算整个模型结构中reduction block数量。</p>
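<p>For illustration, a configuration using <code>block_mask</code> might look like the following (the space name and mask values are only an example, not a recommended setting):</p>
<div class="highlight"><pre><span></span>config = [(&#39;MobileNetV2BlockSpace&#39;, {&#39;block_mask&#39;: [0, 1, 1, 0, 1, 0]})]
</pre></div>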
<h2 id="sanas">SANAS<a class="headerlink" href="#sanas" title="Permanent link">#</a></h2>
<dl>
<dt>paddleslim.nas.SANAS(configs, server_addr=("", 8881), init_temperature=100, reduce_rate=0.85, search_steps=300, save_checkpoint='./nas_checkpoint', load_checkpoint=None, is_server=True)<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/nas/sa_nas.py#L36">[Source]</a></dt>
<dt>paddleslim.nas.SANAS(configs, server_addr=("", 8881), init_temperature=100, reduce_rate=0.85, search_steps=300, save_checkpoint='./nas_checkpoint', load_checkpoint=None, is_server=True)<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/nas/sa_nas.py#L36">Source</a></dt>
<dd>SANAS (Simulated Annealing Neural Architecture Search) searches for model architectures with a simulated annealing algorithm and is generally used for discrete search tasks.</dd>
</dl>
<p><strong>Parameters:</strong></p>
......@@ -208,18 +204,16 @@
<p><strong>Returns:</strong>
an instance of the SANAS class</p>
<p><strong>Example:</strong>
<div class="codehilite"><pre><span></span><span class="kn">from</span> <span class="nn">paddleslim.nas</span> <span class="kn">import</span> <span class="n">SANAS</span>
<span class="n">config</span> <span class="o">=</span> <span class="p">[(</span><span class="s1">&#39;MobileNetV2Space&#39;</span><span class="p">)]</span>
<span class="n">sanas</span> <span class="o">=</span> <span class="n">SANAS</span><span class="p">(</span><span class="n">config</span><span class="o">=</span><span class="n">config</span><span class="p">)</span>
<div class="highlight"><pre><span></span>from paddleslim.nas import SANAS
config = [(&#39;MobileNetV2Space&#39;)]
sanas = SANAS(configs=config)
</pre></div></p>
<dl>
<dt>paddleslim.nas.SANAS.tokens2arch(tokens)</dt>
<dd>Converts a group of tokens into an actual model architecture; typically used to turn the best tokens found by the search into a model architecture for final training.</dd>
</dl>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>tokens是一个列表,token映射到搜索空间转换成相应的网络结构,一组token对应唯一的一个网络结构。</p>
</div>
<p>Note:<br>
tokens是一个列表,token映射到搜索空间转换成相应的网络结构,一组token对应唯一的一个网络结构。</p>
<p><strong>Parameters:</strong></p>
<ul>
<li><strong>tokens(list):</strong> - a group of tokens.</li>
......@@ -227,12 +221,12 @@
<p><strong>Returns:</strong>
a model architecture instance built from the given tokens.</p>
<p><strong>Example:</strong>
<div class="codehilite"><pre><span></span><span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<span class="nb">input</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;input&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">None</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">archs</span> <span class="o">=</span> <span class="n">sanas</span><span class="o">.</span><span class="n">token2arch</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
<span class="k">for</span> <span class="n">arch</span> <span class="ow">in</span> <span class="n">archs</span><span class="p">:</span>
<span class="n">output</span> <span class="o">=</span> <span class="n">arch</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span>
<span class="nb">input</span> <span class="o">=</span> <span class="n">output</span>
<div class="highlight"><pre><span></span>import paddle.fluid as fluid
input = fluid.data(name=&#39;input&#39;, shape=[None, 3, 32, 32], dtype=&#39;float32&#39;)
archs = sanas.tokens2arch(tokens)
for arch in archs:
output = arch(input)
input = output
</pre></div></p>
<dl>
<dt>paddleslim.nas.SANAS.next_archs()</dt>
......@@ -241,12 +235,12 @@
<p><strong>Returns:</strong>
a list of model architecture instances.</p>
<p><strong>Example:</strong>
<div class="codehilite"><pre><span></span><span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<span class="nb">input</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;input&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">None</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">archs</span> <span class="o">=</span> <span class="n">sanas</span><span class="o">.</span><span class="n">next_archs</span><span class="p">()</span>
<span class="k">for</span> <span class="n">arch</span> <span class="ow">in</span> <span class="n">archs</span><span class="p">:</span>
<span class="n">output</span> <span class="o">=</span> <span class="n">arch</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span>
<span class="nb">input</span> <span class="o">=</span> <span class="n">output</span>
<div class="highlight"><pre><span></span>import paddle.fluid as fluid
input = fluid.data(name=&#39;input&#39;, shape=[None, 3, 32, 32], dtype=&#39;float32&#39;)
archs = sanas.next_archs()
for arch in archs:
output = arch(input)
input = output
</pre></div></p>
<dl>
<dt>paddleslim.nas.SANAS.reward(score)</dt>
......
This diff is collapsed.
......@@ -172,7 +172,7 @@
<li>Quantization</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/api/quantization_api.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/api/quantization_api.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -184,29 +184,50 @@
<h2 id="_1">量化配置<a class="headerlink" href="#_1" title="Permanent link">#</a></h2>
<p>通过字典配置量化参数</p>
<div class="codehilite"><pre><span></span><span class="nv">quant_config_default</span> <span class="o">=</span> {
<span class="s1">&#39;</span><span class="s">weight_quantize_type</span><span class="s1">&#39;</span>: <span class="s1">&#39;</span><span class="s">abs_max</span><span class="s1">&#39;</span>,
<span class="s1">&#39;</span><span class="s">activation_quantize_type</span><span class="s1">&#39;</span>: <span class="s1">&#39;</span><span class="s">abs_max</span><span class="s1">&#39;</span>,
<span class="s1">&#39;</span><span class="s">weight_bits</span><span class="s1">&#39;</span>: <span class="mi">8</span>,
<span class="s1">&#39;</span><span class="s">activation_bits</span><span class="s1">&#39;</span>: <span class="mi">8</span>,
# <span class="nv">ops</span> <span class="nv">of</span> <span class="nv">name_scope</span> <span class="nv">in</span> <span class="nv">not_quant_pattern</span> <span class="nv">list</span>, <span class="nv">will</span> <span class="nv">not</span> <span class="nv">be</span> <span class="nv">quantized</span>
<span class="s1">&#39;</span><span class="s">not_quant_pattern</span><span class="s1">&#39;</span>: [<span class="s1">&#39;</span><span class="s">skip_quant</span><span class="s1">&#39;</span>],
# <span class="nv">ops</span> <span class="nv">of</span> <span class="nv">type</span> <span class="nv">in</span> <span class="nv">quantize_op_types</span>, <span class="nv">will</span> <span class="nv">be</span> <span class="nv">quantized</span>
<span class="s1">&#39;</span><span class="s">quantize_op_types</span><span class="s1">&#39;</span>:
[<span class="s1">&#39;</span><span class="s">conv2d</span><span class="s1">&#39;</span>, <span class="s1">&#39;</span><span class="s">depthwise_conv2d</span><span class="s1">&#39;</span>, <span class="s1">&#39;</span><span class="s">mul</span><span class="s1">&#39;</span>, <span class="s1">&#39;</span><span class="s">elementwise_add</span><span class="s1">&#39;</span>, <span class="s1">&#39;</span><span class="s">pool2d</span><span class="s1">&#39;</span>],
# <span class="nv">data</span> <span class="nv">type</span> <span class="nv">after</span> <span class="nv">quantization</span>, <span class="nv">such</span> <span class="nv">as</span> <span class="s1">&#39;</span><span class="s">uint8</span><span class="s1">&#39;</span>, <span class="s1">&#39;</span><span class="s">int8</span><span class="s1">&#39;</span>, <span class="nv">etc</span>. <span class="nv">default</span> <span class="nv">is</span> <span class="s1">&#39;</span><span class="s">int8</span><span class="s1">&#39;</span>
<span class="s1">&#39;</span><span class="s">dtype</span><span class="s1">&#39;</span>: <span class="s1">&#39;</span><span class="s">int8</span><span class="s1">&#39;</span>,
# <span class="nv">window</span> <span class="nv">size</span> <span class="k">for</span> <span class="s1">&#39;</span><span class="s">range_abs_max</span><span class="s1">&#39;</span> <span class="nv">quantization</span>. <span class="nv">defaulf</span> <span class="nv">is</span> <span class="mi">10000</span>
<span class="s1">&#39;</span><span class="s">window_size</span><span class="s1">&#39;</span>: <span class="mi">10000</span>,
# <span class="nv">The</span> <span class="nv">decay</span> <span class="nv">coefficient</span> <span class="nv">of</span> <span class="nv">moving</span> <span class="nv">average</span>, <span class="nv">default</span> <span class="nv">is</span> <span class="mi">0</span>.<span class="mi">9</span>
<span class="s1">&#39;</span><span class="s">moving_rate</span><span class="s1">&#39;</span>: <span class="mi">0</span>.<span class="mi">9</span>,
<div class="highlight"><pre><span></span>TENSORRT_OP_TYPES = [
&#39;mul&#39;, &#39;conv2d&#39;, &#39;pool2d&#39;, &#39;depthwise_conv2d&#39;, &#39;elementwise_add&#39;,
&#39;leaky_relu&#39;
]
TRANSFORM_PASS_OP_TYPES = [&#39;conv2d&#39;, &#39;depthwise_conv2d&#39;, &#39;mul&#39;]
QUANT_DEQUANT_PASS_OP_TYPES = [
&quot;pool2d&quot;, &quot;elementwise_add&quot;, &quot;concat&quot;, &quot;softmax&quot;, &quot;argmax&quot;, &quot;transpose&quot;,
&quot;equal&quot;, &quot;gather&quot;, &quot;greater_equal&quot;, &quot;greater_than&quot;, &quot;less_equal&quot;,
&quot;less_than&quot;, &quot;mean&quot;, &quot;not_equal&quot;, &quot;reshape&quot;, &quot;reshape2&quot;,
&quot;bilinear_interp&quot;, &quot;nearest_interp&quot;, &quot;trilinear_interp&quot;, &quot;slice&quot;,
&quot;squeeze&quot;, &quot;elementwise_sub&quot;, &quot;relu&quot;, &quot;relu6&quot;, &quot;leaky_relu&quot;, &quot;tanh&quot;, &quot;swish&quot;
]
_quant_config_default = {
# weight quantize type, default is &#39;channel_wise_abs_max&#39;
&#39;weight_quantize_type&#39;: &#39;channel_wise_abs_max&#39;,
# activation quantize type, default is &#39;moving_average_abs_max&#39;
&#39;activation_quantize_type&#39;: &#39;moving_average_abs_max&#39;,
# weight quantize bit num, default is 8
&#39;weight_bits&#39;: 8,
# activation quantize bit num, default is 8
&#39;activation_bits&#39;: 8,
# ops of name_scope in not_quant_pattern list, will not be quantized
&#39;not_quant_pattern&#39;: [&#39;skip_quant&#39;],
# ops of type in quantize_op_types, will be quantized
&#39;quantize_op_types&#39;: [&#39;conv2d&#39;, &#39;depthwise_conv2d&#39;, &#39;mul&#39;],
# data type after quantization, such as &#39;uint8&#39;, &#39;int8&#39;, etc. default is &#39;int8&#39;
&#39;dtype&#39;: &#39;int8&#39;,
# window size for &#39;range_abs_max&#39; quantization. default is 10000
&#39;window_size&#39;: 10000,
# The decay coefficient of moving average, default is 0.9
&#39;moving_rate&#39;: 0.9,
# if True, &#39;quantize_op_types&#39; will be TENSORRT_OP_TYPES
&#39;for_tensorrt&#39;: False,
# if True, &#39;quantize_op_types&#39; will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES
&#39;is_full_quantize&#39;: False
}
</pre></div>
<p><strong>Parameters:</strong></p>
<ul>
<li><strong>weight_quantize_type(str)</strong> - Weight quantization method. Options: <code>'abs_max'</code>, <code>'channel_wise_abs_max'</code>, <code>'range_abs_max'</code>, <code>'moving_average_abs_max'</code>. Default: <code>'abs_max'</code>.</li>
<li><strong>activation_quantize_type(str)</strong> - Activation quantization method. Options: <code>'abs_max'</code>, <code>'range_abs_max'</code>, <code>'moving_average_abs_max'</code>. Default: <code>'abs_max'</code>.</li>
<li><strong>weight_quantize_type(str)</strong> - Weight quantization method. Options: <code>'abs_max'</code>, <code>'channel_wise_abs_max'</code>, <code>'range_abs_max'</code>, <code>'moving_average_abs_max'</code>. If the quantized model will be loaded with <code>TensorRT</code> for inference, use <code>'channel_wise_abs_max'</code>. Default: <code>'channel_wise_abs_max'</code>.</li>
<li><strong>activation_quantize_type(str)</strong> - Activation quantization method. Options: <code>'abs_max'</code>, <code>'range_abs_max'</code>, <code>'moving_average_abs_max'</code>. If the quantized model will be loaded with <code>TensorRT</code> for inference, use <code>'range_abs_max'</code> or <code>'moving_average_abs_max'</code>. Default: <code>'moving_average_abs_max'</code>.</li>
<li><strong>weight_bits(int)</strong> - Number of bits for weight quantization. Default: 8; 8 is recommended.</li>
<li><strong>activation_bits(int)</strong> - Number of bits for activation quantization. Default: 8; 8 is recommended.</li>
<li><strong>not_quant_pattern(str | list[str])</strong> - Any <code>op</code> whose <code>name_scope</code> contains a string from <code>'not_quant_pattern'</code> is left unquantized. For how to set this, see <a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/name_scope_cn.html#name-scope"><em>fluid.name_scope</em></a>.</li>
......@@ -214,6 +235,14 @@
<li><strong>dtype(int8)</strong> - Data type of the quantized parameters. Default: <code>int8</code>; currently only <code>int8</code> is supported.</li>
<li><strong>window_size(int)</strong> - <code>window size</code> for <code>'range_abs_max'</code> quantization. Default: 10000.</li>
<li><strong>moving_rate(float)</strong> - Decay coefficient for <code>'moving_average_abs_max'</code> quantization. Default: 0.9.</li>
<li><strong>for_tensorrt(bool)</strong> - Whether the quantized model will be run with <code>TensorRT</code>. If True, the quantized op types are <code>TENSORRT_OP_TYPES</code>. Default: False.</li>
<li><strong>is_full_quantize(bool)</strong> - Whether to quantize all supported op types. Default: False.</li>
</ul>
<div class="admonition note">
<p class="admonition-title">注意事项</p>
</div>
<ul>
<li>Currently the only ops for which <code>Paddle-Lite</code> has accelerating int8 kernels are <code>['conv2d', 'depthwise_conv2d', 'mul']</code>; int8 kernels for other ops will be supported over time.</li>
</ul>
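<p>In practice you typically override only the keys you need and keep the remaining defaults shown above; a sketch, assuming the default config:</p>
<div class="highlight"><pre><span></span>config = {
    &#39;weight_quantize_type&#39;: &#39;abs_max&#39;,
    &#39;activation_quantize_type&#39;: &#39;moving_average_abs_max&#39;,
    &#39;not_quant_pattern&#39;: [&#39;skip_quant&#39;],
    &#39;quantize_op_types&#39;: [&#39;conv2d&#39;, &#39;depthwise_conv2d&#39;, &#39;mul&#39;]
}
</pre></div>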
<h2 id="quant_aware">quant_aware<a class="headerlink" href="#quant_aware" title="Permanent link">#</a></h2>
<dl>
......@@ -237,13 +266,13 @@
</ul>
<div class="admonition note">
<p class="admonition-title">注意事项</p>
</div>
<ul>
<li>This API changes the structure of the <code>program</code> and may add some <code>persistable</code> variables, so when loading model parameters make sure they match the corresponding <code>program</code>.</li>
<li>Under the hood this API goes through a <code>fluid.Program</code> -&gt; <code>fluid.framework.IrGraph</code> -&gt; <code>fluid.Program</code> conversion. <code>fluid.framework.IrGraph</code> has no notion of <code>Parameter</code>; a <code>Variable</code> is only <code>persistable</code> or <code>not persistable</code>. Therefore, use the <code>fluid.io.save_persistables</code> and <code>fluid.io.load_persistables</code> APIs to save and load parameters.</li>
<li>Because this API adds ops to the <code>program</code> according to its structure and the quantization config, some Paddle strategies that speed up training via <code>fuse op</code> cannot be used. The following options are known to require <code>False</code> when quantizing: <code>fuse_all_reduce_ops, sync_batch_norm</code>.</li>
<li>Any <code>Variable</code> in the given <code>program</code> that is not connected to any op will be optimized away during quantization.</li>
</ul>
</div>
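<p>A minimal sketch of calling quant_aware with such a config (assuming <code>train_program</code>, <code>place</code> and a <code>config</code> dict are already defined, and the signature <code>quant_aware(program, place, config, scope=None, for_test=False)</code>):</p>
<div class="highlight"><pre><span></span>import paddleslim.quant as quant
# insert fake quantize/dequantize ops into the training program
quant_program = quant.quant_aware(train_program, place, config, for_test=False)
</pre></div>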
<h2 id="convert">convert<a class="headerlink" href="#convert" title="Permanent link">#</a></h2>
<dl>
<dt>paddleslim.quant.convert(program, place, config, scope=None, save_int8=False)<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py">[Source]</a></dt>
......@@ -266,10 +295,10 @@
</ul>
<div class="admonition note">
<p class="admonition-title">注意事项</p>
<p>因为该接口会对<code>op</code><code>Variable</code>做相应的删除和修改,所以此接口只能在训练完成之后调用。如果想转化训练的中间模型,可加载相应的参数之后再使用此接口。</p>
</div>
<p>Because this API deletes and modifies <code>op</code>s and <code>Variable</code>s, it can only be called after training has finished. To convert an intermediate checkpoint, load the corresponding parameters first and then call this API.</p>
<p><strong>Example</strong></p>
<div class="codehilite"><pre><span></span><span class="c1">#encoding=utf8</span>
<div class="highlight"><pre><span></span><span class="c1">#encoding=utf8</span>
<span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<span class="kn">import</span> <span class="nn">paddleslim.quant</span> <span class="kn">as</span> <span class="nn">quant</span>
......@@ -311,7 +340,7 @@
<p>For more detailed usage, see the <a href='https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_aware'>quantization-aware training demo</a>.</p>
<h2 id="quant_post">quant_post<a class="headerlink" href="#quant_post" title="Permanent link">#</a></h2>
<dl>
<dt>paddleslim.quant.quant_post(executor, model_dir, quantize_model_path, sample_generator, model_filename=None, params_filename=None, batch_size=16, batch_nums=None, scope=None, algo='KL', quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"])<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py">[Source]</a></dt>
<dt>paddleslim.quant.quant_post(executor, model_dir, quantize_model_path, sample_generator, model_filename=None, params_filename=None, batch_size=16, batch_nums=None, scope=None, algo='KL', quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"], is_full_quantize=False, is_use_cache_file=False, cache_dir="./temp_post_training")<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py">[Source]</a></dt>
<dd>
<p>Quantizes the model saved under <code>${model_dir}</code>, using data produced by <code>sample_generator</code> to calibrate the parameters.</p>
</dd>
......@@ -329,18 +358,24 @@
<li><strong>scope(fluid.Scope, optional)</strong> - Used to read and write <code>Variable</code>s. If set to <code>None</code>, <a href="https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html"><em>fluid.global_scope()</em></a> is used. Default: <code>None</code>.</li>
<li><strong>algo(str)</strong> - Name of the algorithm used for quantization, either <code>'KL'</code> or <code>'direct'</code>. This parameter only affects activation quantization, since weights are quantized with <code>'channel_wise_abs_max'</code>. With <code>'direct'</code>, the maximum absolute activation value over the calibration data is used as the <code>Scale</code>; with <code>'KL'</code>, the <code>Scale</code> is computed via <code>KL</code> divergence. Default: <code>'KL'</code>.</li>
<li><strong>quantizable_op_type(list[str])</strong> - List of <code>op</code> types to quantize. Default: <code>["conv2d", "depthwise_conv2d", "mul"]</code>.</li>
<li><strong>is_full_quantize(bool)</strong> - Whether to quantize all supported op types. If set to False, quantization follows the <code>'quantizable_op_type'</code> setting.</li>
<li><strong>is_use_cache_file(bool)</strong> - Whether to store intermediate results on disk. If False, intermediate results are kept in memory.</li>
<li><strong>cache_dir(str)</strong> - If <code>'is_use_cache_file'</code> is True, intermediate results are stored under this path.</li>
</ul>
<p><strong>Returns</strong></p>
<p>None.</p>
<div class="admonition note">
<p class="admonition-title">注意事项</p>
<p>因为该接口会收集校正数据的所有的激活值,所以使用的校正图片不能太多。<code>'KL'</code>散度的计算也比较耗时。</p>
</div>
<ul>
<li>Because this API collects all activation values over the calibration data, set <code>'is_use_cache_file'</code> to True when there are many calibration images, so that intermediate results are stored on disk. Also, computing the <code>'KL'</code> divergence is time-consuming.</li>
<li>Currently the only ops for which <code>Paddle-Lite</code> has accelerating int8 kernels are <code>['conv2d', 'depthwise_conv2d', 'mul']</code>; int8 kernels for other ops will be supported over time.</li>
</ul>
<p><strong>Example</strong></p>
<blockquote>
<p>Note: this example cannot be run directly, because it needs to load the model under <code>${model_dir}</code>.</p>
</blockquote>
<p><div class="codehilite"><pre><span></span><span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<p><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<span class="kn">import</span> <span class="nn">paddle.dataset.mnist</span> <span class="kn">as</span> <span class="nn">reader</span>
<span class="kn">from</span> <span class="nn">paddleslim.quant</span> <span class="kn">import</span> <span class="n">quant_post</span>
<span class="n">val_reader</span> <span class="o">=</span> <span class="n">reader</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
......@@ -383,7 +418,7 @@
<p><strong>Return type</strong></p>
<p><code>fluid.Program</code></p>
<p><strong>Example</strong>
<div class="codehilite"><pre><span></span><span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<span class="kn">import</span> <span class="nn">paddleslim.quant</span> <span class="kn">as</span> <span class="nn">quant</span>
<span class="n">train_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
......
This diff is collapsed.
......@@ -168,7 +168,7 @@
<li>Home</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/index.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/index.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -211,15 +211,15 @@
<ul>
<li>Install the develop version</li>
</ul>
<div class="codehilite"><pre><span></span><span class="n">git</span> <span class="n">clone</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">PaddlePaddle</span><span class="o">/</span><span class="n">PaddleSlim</span><span class="p">.</span><span class="n">git</span>
<span class="n">cd</span> <span class="n">PaddleSlim</span>
<span class="n">python</span> <span class="n">setup</span><span class="p">.</span><span class="n">py</span> <span class="n">install</span>
<div class="highlight"><pre><span></span>git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
</pre></div>
<ul>
<li>Install the latest official release</li>
</ul>
<div class="codehilite"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">paddleslim</span> <span class="o">-</span><span class="n">i</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">pypi</span><span class="p">.</span><span class="n">org</span><span class="o">/</span><span class="k">simple</span>
<div class="highlight"><pre><span></span>pip install paddleslim -i https://pypi.org/simple
</pre></div>
<ul>
......@@ -289,5 +289,5 @@
<!--
MkDocs version : 1.0.4
Build Date UTC : 2020-01-16 05:32:44
Build Date UTC : 2020-01-16 06:38:06
-->
......@@ -58,7 +58,7 @@
<a class="current" href="./">模型库</a>
<ul class="subnav">
<li class="toctree-l2"><a href="#1">1. 图分类</a></li>
<li class="toctree-l2"><a href="#1">1. 图分类</a></li>
<ul>
......@@ -190,7 +190,7 @@
<li>Model Zoo</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/model_zoo.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/model_zoo.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -200,7 +200,7 @@
<div role="main">
<div class="section">
<h2 id="1">1. 图分类<a class="headerlink" href="#1" title="Permanent link">#</a></h2>
<h2 id="1">1. 图分类<a class="headerlink" href="#1" title="Permanent link">#</a></h2>
<p>数据集:ImageNet1000类</p>
<h3 id="11">1.1 量化<a class="headerlink" href="#11" title="Permanent link">#</a></h3>
<table>
......@@ -216,7 +216,7 @@
<tbody>
<tr>
<td align="center">MobileNetV1</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">70.99%/89.68%</td>
<td align="center">xx</td>
<td align="center"><a href="">下载链接</a></td>
......@@ -237,7 +237,7 @@
</tr>
<tr>
<td align="center">MobileNetV2</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">72.15%/90.65%</td>
<td align="center">xx</td>
<td align="center"><a href="">下载链接</a></td>
......@@ -258,7 +258,7 @@
</tr>
<tr>
<td align="center">ResNet50</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">76.50%/93.00%</td>
<td align="center">xx</td>
<td align="center"><a href="">下载链接</a></td>
......@@ -294,7 +294,7 @@
<tbody>
<tr>
<td align="center">MobileNetV1</td>
<td align="center">baseline</td>
<td align="center">Baseline</td>
<td align="center">70.99%/89.68%</td>
<td align="center">17</td>
<td align="center">1.11</td>
......@@ -326,7 +326,7 @@
</tr>
<tr>
<td align="center">MobileNetV2</td>
<td align="center">baseline</td>
<td align="center">-</td>
<td align="center">72.15%/90.65%</td>
<td align="center">15</td>
<td align="center">0.59</td>
......@@ -342,7 +342,7 @@
</tr>
<tr>
<td align="center">ResNet34</td>
<td align="center">baseline</td>
<td align="center">-</td>
<td align="center">72.15%/90.65%</td>
<td align="center">84</td>
<td align="center">7.36</td>
......@@ -460,7 +460,7 @@
<tbody>
<tr>
<td align="center">MobileNet-V1-YOLOv3</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">COCO</td>
<td align="center">8</td>
<td align="center">29.3</td>
......@@ -493,7 +493,7 @@
</tr>
<tr>
<td align="center">R50-dcn-YOLOv3 obj365_pretrain</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">COCO</td>
<td align="center">8</td>
<td align="center">41.4</td>
......@@ -542,7 +542,7 @@
<tbody>
<tr>
<td align="center">BlazeFace</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">8</td>
<td align="center">640</td>
<td align="center">0.915/0.892/0.797</td>
......@@ -569,7 +569,7 @@
</tr>
<tr>
<td align="center">BlazeFace-Lite</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">8</td>
<td align="center">640</td>
<td align="center">0.909/0.885/0.781</td>
......@@ -596,7 +596,7 @@
</tr>
<tr>
<td align="center">BlazeFace-NAS</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">8</td>
<td align="center">640</td>
<td align="center">0.837/0.807/0.658</td>
......@@ -643,7 +643,7 @@
<tbody>
<tr>
<td align="center">MobileNet-V1-YOLOv3</td>
<td align="center">baseline</td>
<td align="center">Baseline</td>
<td align="center">Pascal VOC</td>
<td align="center">8</td>
<td align="center">76.2</td>
......@@ -667,7 +667,7 @@
</tr>
<tr>
<td align="center">MobileNet-V1-YOLOv3</td>
<td align="center">baseline</td>
<td align="center">-</td>
<td align="center">COCO</td>
<td align="center">8</td>
<td align="center">29.3</td>
......@@ -691,7 +691,7 @@
</tr>
<tr>
<td align="center">R50-dcn-YOLOv3</td>
<td align="center">baseline</td>
<td align="center">-</td>
<td align="center">COCO</td>
<td align="center">8</td>
<td align="center">39.1</td>
......@@ -727,7 +727,7 @@
</tr>
<tr>
<td align="center">R50-dcn-YOLOv3 obj365_pretrain</td>
<td align="center">baseline</td>
<td align="center">-</td>
<td align="center">COCO</td>
<td align="center">8</td>
<td align="center">41.4</td>
......@@ -782,7 +782,7 @@
<tbody>
<tr>
<td align="center">MobileNet-V1-YOLOv3</td>
<td align="center">student</td>
<td align="center">-</td>
<td align="center">Pascal VOC</td>
<td align="center">8</td>
<td align="center">76.2</td>
......@@ -793,7 +793,7 @@
</tr>
<tr>
<td align="center">ResNet34-YOLOv3</td>
<td align="center">teacher</td>
<td align="center">-</td>
<td align="center">Pascal VOC</td>
<td align="center">8</td>
<td align="center">82.6</td>
......@@ -815,7 +815,7 @@
</tr>
<tr>
<td align="center">MobileNet-V1-YOLOv3</td>
<td align="center">student</td>
<td align="center">-</td>
<td align="center">COCO</td>
<td align="center">8</td>
<td align="center">29.3</td>
......@@ -826,7 +826,7 @@
</tr>
<tr>
<td align="center">ResNet34-YOLOv3</td>
<td align="center">teacher</td>
<td align="center">-</td>
<td align="center">COCO</td>
<td align="center">8</td>
<td align="center">36.2</td>
......@@ -864,7 +864,7 @@
<tbody>
<tr>
<td align="center">DeepLabv3+/MobileNetv1</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">63.26</td>
<td align="center">xx</td>
<td align="center"><a href="">下载链接</a></td>
......@@ -885,7 +885,7 @@
</tr>
<tr>
<td align="center">DeepLabv3+/MobileNetv2</td>
<td align="center">FP32 baseline</td>
<td align="center">-</td>
<td align="center">69.81</td>
<td align="center">xx</td>
<td align="center"><a href="">下载链接</a></td>
......
This diff is collapsed.
......@@ -177,7 +177,7 @@
<li>Search Space</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/search_space.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/search_space.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -243,7 +243,7 @@
&emsp; 2. The length of the search list for each number in the token (the <code>range_table</code> function), i.e. the index range of each token in tokens. <br>
&emsp; 3. Generating the model architecture from tokens (the <code>token2arch</code> function), which produces a model architecture from the searched token list. <br></p>
<p>Taking the addition of a resnet block as an example, here is how to construct your own search space. A custom search space must not have the same name as an existing search space.</p>
<div class="codehilite"><pre><span></span><span class="c1">### 引入搜索空间基类函数和search space的注册类函数</span>
<div class="highlight"><pre><span></span><span class="c1">### 引入搜索空间基类函数和search space的注册类函数</span>
<span class="kn">from</span> <span class="nn">.search_space_base</span> <span class="kn">import</span> <span class="n">SearchSpaceBase</span>
<span class="kn">from</span> <span class="nn">.search_space_registry</span> <span class="kn">import</span> <span class="n">SEARCHSPACE</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
......
Preview is not available for this file type.
......@@ -172,7 +172,7 @@
<li>Hardware latency lookup table</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/table_latency.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/table_latency.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -208,7 +208,7 @@
<p>Fields within the operator information are separated by commas; the operator information and the latency are separated by a tab.</p>
<h3 id="conv2d">conv2d<a class="headerlink" href="#conv2d" title="Permanent link">#</a></h3>
<p><strong>Format</strong></p>
<div class="codehilite"><pre><span></span><span class="n">op_type</span><span class="p">,</span><span class="n">flag_bias</span><span class="p">,</span><span class="n">flag_relu</span><span class="p">,</span><span class="n">n_in</span><span class="p">,</span><span class="n">c_in</span><span class="p">,</span><span class="n">h_in</span><span class="p">,</span><span class="n">w_in</span><span class="p">,</span><span class="n">c_out</span><span class="p">,</span><span class="n">groups</span><span class="p">,</span><span class="n">kernel</span><span class="p">,</span><span class="n">padding</span><span class="p">,</span><span class="n">stride</span><span class="p">,</span><span class="n">dilation</span><span class="err">\</span><span class="n">tlatency</span>
<div class="highlight"><pre><span></span>op_type,flag_bias,flag_relu,n_in,c_in,h_in,w_in,c_out,groups,kernel,padding,stride,dilation\tlatency
</pre></div>
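<p>A hypothetical entry in this format (illustrative values, not measured data): a 3x3, stride-2 convolution with bias and relu, on a 1x3x224x224 input, producing 32 channels, taking 10.21 ms:</p>
<div class="highlight"><pre><span></span>conv2d,1,1,1,3,224,224,32,1,3,1,2,1\t10.21
</pre></div>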
<p><strong>Field description</strong></p>
......@@ -230,7 +230,7 @@
</ul>
<h3 id="activation">activation<a class="headerlink" href="#activation" title="Permanent link">#</a></h3>
<p><strong>Format</strong></p>
<div class="codehilite"><pre><span></span><span class="n">op_type</span><span class="p">,</span><span class="n">n_in</span><span class="p">,</span><span class="n">c_in</span><span class="p">,</span><span class="n">h_in</span><span class="p">,</span><span class="n">w_in</span><span class="err">\</span><span class="n">tlatency</span>
<div class="highlight"><pre><span></span>op_type,n_in,c_in,h_in,w_in\tlatency
</pre></div>
<p><strong>Field description</strong></p>
......@@ -244,7 +244,7 @@
</ul>
<h3 id="batch_norm">batch_norm<a class="headerlink" href="#batch_norm" title="Permanent link">#</a></h3>
<p><strong>Format</strong></p>
<div class="codehilite"><pre><span></span><span class="n">op_type</span><span class="p">,</span><span class="n">active_type</span><span class="p">,</span><span class="n">n_in</span><span class="p">,</span><span class="n">c_in</span><span class="p">,</span><span class="n">h_in</span><span class="p">,</span><span class="n">w_in</span><span class="err">\</span><span class="n">tlatency</span>
<div class="highlight"><pre><span></span>op_type,active_type,n_in,c_in,h_in,w_in\tlatency
</pre></div>
<p><strong>Field description</strong></p>
......@@ -259,7 +259,7 @@
</ul>
<h3 id="eltwise">eltwise<a class="headerlink" href="#eltwise" title="Permanent link">#</a></h3>
<p><strong>Format</strong></p>
<div class="codehilite"><pre><span></span><span class="n">op_type</span><span class="p">,</span><span class="n">n_in</span><span class="p">,</span><span class="n">c_in</span><span class="p">,</span><span class="n">h_in</span><span class="p">,</span><span class="n">w_in</span><span class="err">\</span><span class="n">tlatency</span>
<div class="highlight"><pre><span></span>op_type,n_in,c_in,h_in,w_in\tlatency
</pre></div>
<p><strong>Field description</strong></p>
......@@ -273,7 +273,7 @@
</ul>
<h3 id="pooling">pooling<a class="headerlink" href="#pooling" title="Permanent link">#</a></h3>
<p><strong>Format</strong></p>
<div class="codehilite"><pre><span></span><span class="n">op_type</span><span class="p">,</span><span class="n">flag_global_pooling</span><span class="p">,</span><span class="n">n_in</span><span class="p">,</span><span class="n">c_in</span><span class="p">,</span><span class="n">h_in</span><span class="p">,</span><span class="n">w_in</span><span class="p">,</span><span class="n">kernel</span><span class="p">,</span><span class="n">padding</span><span class="p">,</span><span class="n">stride</span><span class="p">,</span><span class="n">ceil_mode</span><span class="p">,</span><span class="n">pool_type</span><span class="err">\</span><span class="n">tlatency</span>
<div class="highlight"><pre><span></span>op_type,flag_global_pooling,n_in,c_in,h_in,w_in,kernel,padding,stride,ceil_mode,pool_type\tlatency
</pre></div>
<p><strong>Field description</strong></p>
......@@ -293,7 +293,7 @@
</ul>
<h3 id="softmax">softmax<a class="headerlink" href="#softmax" title="Permanent link">#</a></h3>
<p><strong>Format</strong></p>
<div class="codehilite"><pre><span></span><span class="n">op_type</span><span class="p">,</span><span class="n">axis</span><span class="p">,</span><span class="n">n_in</span><span class="p">,</span><span class="n">c_in</span><span class="p">,</span><span class="n">h_in</span><span class="p">,</span><span class="n">w_in</span><span class="err">\</span><span class="n">tlatency</span>
<div class="highlight"><pre><span></span>op_type,axis,n_in,c_in,h_in,w_in\tlatency
</pre></div>
<p><strong>Field description</strong></p>
......
......@@ -150,7 +150,7 @@
<li>Demo guide</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/demo_guide.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/tutorials/demo_guide.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......
......@@ -177,7 +177,7 @@
<li>Knowledge Distillation</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/distillation_demo.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/tutorials/distillation_demo.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -194,7 +194,7 @@
<p>In general, the more parameters a model has and the more complex its architecture, the better it performs, but also the more computation and resources it consumes. <strong>Knowledge distillation</strong> is a method that compresses the useful information (dark knowledge) learned by a large model into a smaller, faster model, obtaining results that can rival the large model's.</p>
<p>In this example, the large, more accurate model is called the teacher, and the smaller model, slightly less accurate but faster, is called the student.</p>
<h3 id="1-student_program">1. Define the student_program<a class="headerlink" href="#1-student_program" title="Permanent link">#</a></h3>
<div class="codehilite"><pre><span></span><span class="n">student_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<div class="highlight"><pre><span></span><span class="n">student_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<span class="n">student_startup</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<span class="k">with</span> <span class="n">fluid</span><span class="o">.</span><span class="n">program_guard</span><span class="p">(</span><span class="n">student_program</span><span class="p">,</span> <span class="n">student_startup</span><span class="p">):</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">data</span><span class="p">(</span>
......@@ -210,7 +210,7 @@
<h3 id="2-teacher_program">2. 定义teacher_program<a class="headerlink" href="#2-teacher_program" title="Permanent link">#</a></h3>
<p>在定义好<code>teacher_program</code>后,可以一并加载训练好的pretrained_model。</p>
<p><code>teacher_program</code>内需要加上<code>with fluid.unique_name.guard():</code>,保证teacher的变量命名不被<code>student_program</code>影响,从而能够正确地加载预训练参数。</p>
<div class="codehilite"><pre><span></span><span class="n">teacher_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<div class="highlight"><pre><span></span><span class="n">teacher_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<span class="n">teacher_startup</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<span class="k">with</span> <span class="n">fluid</span><span class="o">.</span><span class="n">program_guard</span><span class="p">(</span><span class="n">teacher_program</span><span class="p">,</span> <span class="n">teacher_startup</span><span class="p">):</span>
<span class="k">with</span> <span class="n">fluid</span><span class="o">.</span><span class="n">unique_name</span><span class="o">.</span><span class="n">guard</span><span class="p">():</span>
......@@ -232,7 +232,7 @@
<h3 id="3">3.选择特征图<a class="headerlink" href="#3" title="Permanent link">#</a></h3>
<p>定义好<code>student_program</code><code>teacher_program</code>后,我们需要从中两两对应地挑选出若干个特征图,留待后续为其添加知识蒸馏损失函数。</p>
<div class="codehilite"><pre><span></span><span class="c1"># get all student variables</span>
<div class="highlight"><pre><span></span><span class="c1"># get all student variables</span>
<span class="n">student_vars</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">student_program</span><span class="o">.</span><span class="n">list_vars</span><span class="p">():</span>
<span class="k">try</span><span class="p">:</span>
......@@ -255,14 +255,14 @@
<h3 id="4-programmerge">4. 合并Program(merge)<a class="headerlink" href="#4-programmerge" title="Permanent link">#</a></h3>
<p>PaddlePaddle使用Program来描述计算图,为了同时计算student和teacher两个Program,这里需要将其两者合并(merge)为一个Program。</p>
<p>merge过程操作较多,具体细节请参考<a href="https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/#merge">merge API文档</a></p>
<div class="codehilite"><pre><span></span><span class="n">data_name_map</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;data&#39;</span><span class="p">:</span> <span class="s1">&#39;image&#39;</span><span class="p">}</span>
<span class="n">student_program</span> <span class="o">=</span> <span class="n">merge</span><span class="p">(</span><span class="n">teacher_program</span><span class="p">,</span> <span class="n">student_program</span><span class="p">,</span> <span class="n">data_name_map</span><span class="p">,</span> <span class="n">place</span><span class="p">)</span>
<div class="highlight"><pre><span></span><span class="n">data_name_map</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;data&#39;</span><span class="p">:</span> <span class="s1">&#39;image&#39;</span><span class="p">}</span>
<span class="n">merge</span><span class="p">(</span><span class="n">teacher_program</span><span class="p">,</span> <span class="n">student_program</span><span class="p">,</span> <span class="n">data_name_map</span><span class="p">,</span> <span class="n">place</span><span class="p">)</span>
</pre></div>
<h3 id="5loss">5.添加蒸馏loss<a class="headerlink" href="#5loss" title="Permanent link">#</a></h3>
<p>在添加蒸馏loss的过程中,可能还会引入部分变量(Variable),为了避免命名重复这里可以使用<code>with fluid.name_scope("distill"):</code>为新引入的变量加一个命名作用域。</p>
<p>另外需要注意的是,merge过程为<code>teacher_program</code>的变量统一加了名称前缀,默认是<code>"teacher_"</code>, 这里在添加<code>l2_loss</code>时也要为teacher的变量加上这个前缀。</p>
<div class="codehilite"><pre><span></span><span class="k">with</span> <span class="n">fluid</span><span class="o">.</span><span class="n">program_guard</span><span class="p">(</span><span class="n">student_program</span><span class="p">,</span> <span class="n">student_startup</span><span class="p">):</span>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">fluid</span><span class="o">.</span><span class="n">program_guard</span><span class="p">(</span><span class="n">student_program</span><span class="p">,</span> <span class="n">student_startup</span><span class="p">):</span>
<span class="k">with</span> <span class="n">fluid</span><span class="o">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s2">&quot;distill&quot;</span><span class="p">):</span>
<span class="n">distill_loss</span> <span class="o">=</span> <span class="n">l2_loss</span><span class="p">(</span><span class="s1">&#39;teacher_bn5c_branch2b.output.1.tmp_3&#39;</span><span class="p">,</span>
<span class="s1">&#39;depthwise_conv2d_11.tmp_0&#39;</span><span class="p">,</span> <span class="n">student_program</span><span class="p">)</span>
......
......@@ -166,7 +166,7 @@
<li>SA Search</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/nas_demo.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/tutorials/nas_demo.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -181,57 +181,57 @@
<h2 id="_2">接口介绍<a class="headerlink" href="#_2" title="Permanent link">#</a></h2>
<p>请参考。</p>
<h3 id="1">1. 配置搜索空间<a class="headerlink" href="#1" title="Permanent link">#</a></h3>
<p>详细的搜索空间配置可以参考<a href="https://paddlepaddle.github.io/PaddleSlim/api/nas_api/">神经网络搜索API文档</a>
<div class="codehilite"><pre><span></span><span class="n">config</span> <span class="o">=</span> <span class="p">[(</span><span class="s1">&#39;MobileNetV2Space&#39;</span><span class="p">)]</span>
<p>For detailed search-space configuration, see the <a href='../../../paddleslim/nas/nas_api.md'>NAS API documentation</a>
<div class="highlight"><pre><span></span>config = [(&#39;MobileNetV2Space&#39;)]
</pre></div></p>
<h3 id="2-sanas">2. 利用搜索空间初始化SANAS实例<a class="headerlink" href="#2-sanas" title="Permanent link">#</a></h3>
<div class="codehilite"><pre><span></span><span class="kn">from</span> <span class="nn">paddleslim.nas</span> <span class="kn">import</span> <span class="n">SANAS</span>
<div class="highlight"><pre><span></span>from paddleslim.nas import SANAS
<span class="n">sa_nas</span> <span class="o">=</span> <span class="n">SANAS</span><span class="p">(</span>
<span class="n">config</span><span class="p">,</span>
<span class="n">server_addr</span><span class="o">=</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="mi">8881</span><span class="p">),</span>
<span class="n">init_temperature</span><span class="o">=</span><span class="mf">10.24</span><span class="p">,</span>
<span class="n">reduce_rate</span><span class="o">=</span><span class="mf">0.85</span><span class="p">,</span>
<span class="n">search_steps</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span>
<span class="n">is_server</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
sa_nas = SANAS(
config,
server_addr=(&quot;&quot;, 8881),
init_temperature=10.24,
reduce_rate=0.85,
search_steps=300,
is_server=True)
</pre></div>
<h3 id="3-nas">3. 根据实例化的NAS得到当前的网络结构<a class="headerlink" href="#3-nas" title="Permanent link">#</a></h3>
<div class="codehilite"><pre><span></span><span class="n">archs</span> <span class="o">=</span> <span class="n">sa_nas</span><span class="p">.</span><span class="n">next_archs</span><span class="p">()</span>
<div class="highlight"><pre><span></span>archs = sa_nas.next_archs()
</pre></div>
<h3 id="4-program">4. 根据得到的网络结构和输入构造训练和测试program<a class="headerlink" href="#4-program" title="Permanent link">#</a></h3>
<div class="codehilite"><pre><span></span><span class="kn">import</span> <span class="nn">paddle.fluid</span> <span class="kn">as</span> <span class="nn">fluid</span>
<div class="highlight"><pre><span></span>import paddle.fluid as fluid
<span class="n">train_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<span class="n">test_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
<span class="n">startup_program</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">Program</span><span class="p">()</span>
train_program = fluid.Program()
test_program = fluid.Program()
startup_program = fluid.Program()
<span class="k">with</span> <span class="n">fluid</span><span class="o">.</span><span class="n">program_guard</span><span class="p">(</span><span class="n">train_program</span><span class="p">,</span> <span class="n">startup_program</span><span class="p">):</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">None</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">label</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;label&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">None</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;int64&#39;</span><span class="p">)</span>
<span class="k">for</span> <span class="n">arch</span> <span class="ow">in</span> <span class="n">archs</span><span class="p">:</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">arch</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">output</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">softmax_out</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">output</span><span class="p">,</span> <span class="n">use_cudnn</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">cost</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">cross_entropy</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">softmax_out</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">label</span><span class="p">)</span>
<span class="n">avg_cost</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">cost</span><span class="p">)</span>
<span class="n">acc_top1</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">accuracy</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">softmax_out</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">label</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
with fluid.program_guard(train_program, startup_program):
data = fluid.data(name=&#39;data&#39;, shape=[None, 3, 32, 32], dtype=&#39;float32&#39;)
label = fluid.data(name=&#39;label&#39;, shape=[None, 1], dtype=&#39;int64&#39;)
for arch in archs:
data = arch(data)
output = fluid.layers.fc(data, 10)
softmax_out = fluid.layers.softmax(input=output, use_cudnn=False)
cost = fluid.layers.cross_entropy(input=softmax_out, label=label)
avg_cost = fluid.layers.mean(cost)
acc_top1 = fluid.layers.accuracy(input=softmax_out, label=label, k=1)
<span class="n">test_program</span> <span class="o">=</span> <span class="n">train_program</span><span class="o">.</span><span class="n">clone</span><span class="p">(</span><span class="n">for_test</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">sgd</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">optimizer</span><span class="o">.</span><span class="n">SGD</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">)</span>
<span class="n">sgd</span><span class="o">.</span><span class="n">minimize</span><span class="p">(</span><span class="n">avg_cost</span><span class="p">)</span>
test_program = train_program.clone(for_test=True)
sgd = fluid.optimizer.SGD(learning_rate=1e-3)
sgd.minimize(avg_cost)
</pre></div>
<h3 id="5-program">5. 根据构造的训练program添加限制条件<a class="headerlink" href="#5-program" title="Permanent link">#</a></h3>
<div class="codehilite"><pre><span></span><span class="kn">from</span> <span class="nn">paddleslim.analysis</span> <span class="kn">import</span> <span class="n">flops</span>
<div class="highlight"><pre><span></span>from paddleslim.analysis import flops
<span class="k">if</span> <span class="n">flops</span><span class="p">(</span><span class="n">train_program</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">321208544</span><span class="p">:</span>
<span class="k">continue</span>
if flops(train_program) &gt; 321208544:
continue
</pre></div>
<h3 id="6-score">6. 回传score<a class="headerlink" href="#6-score" title="Permanent link">#</a></h3>
<div class="codehilite"><pre><span></span><span class="n">sa_nas</span><span class="p">.</span><span class="n">reward</span><span class="p">(</span><span class="n">score</span><span class="p">)</span>
<div class="highlight"><pre><span></span>sa_nas.reward(score)
</pre></div>
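<p>Putting steps 2 to 6 together, a condensed search loop might look like this (<code>train_one_arch</code> is a hypothetical helper standing in for the program construction, FLOPs check, training and evaluation shown above):</p>
<div class="highlight"><pre><span></span>for step in range(300):
    archs = sa_nas.next_archs()
    # build programs from archs, skip if FLOPs exceed the limit, train, eval
    score = train_one_arch(archs)
    sa_nas.reward(float(score))
</pre></div>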
</div>
......
......@@ -150,7 +150,7 @@
<li>Convolution channel pruning example</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/pruning_demo.md"
<a href="https://github.com/PaddlePaddle/PaddleSlim/edit/master/docs/tutorials/pruning_demo.md"
class="icon icon-github"> Edit on GitHub</a>
</li>
......@@ -173,15 +173,15 @@
<p>This example uses the <code>paddleslim.Pruner</code> utility class. For details on the user API, see the <a href="https://paddlepaddle.github.io/PaddleSlim/api/prune_api/">API documentation</a>.</p>
<h2 id="_3">Identify the parameters to prune<a class="headerlink" href="#_3" title="Permanent link">#</a></h2>
<p>Parameter names differ across models, so before pruning you need to determine the names of the convolution parameters to be pruned. All parameter names can be listed as follows:</p>
<div class="highlight"><pre><span></span>for param in program.global_block().all_parameters():
    print(&quot;param name: {}; shape: {}&quot;.format(param.name, param.shape))
</pre></div>
<p>The <code>train.py</code> script provides a <code>get_pruned_params</code> method that selects the parameters to prune according to the <code>--model</code> option.</p>
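<p>Once the parameter names are known, they can be pruned directly. Below is a minimal sketch, assuming the <code>paddleslim.prune.Pruner</code> interface; the two parameter names are placeholders standing in for names found with the listing loop above:</p>
<div class="highlight"><pre><span></span># Sketch: prune two convolution parameters by 30% each (Pruner API assumed)
import paddle.fluid as fluid
from paddleslim.prune import Pruner

pruner = Pruner()  # ranks filters by l1_norm by default
pruned_program, _, _ = pruner.prune(
    train_program,
    fluid.global_scope(),
    params=[&#39;conv2_1_sep_weights&#39;, &#39;conv2_2_sep_weights&#39;],  # placeholder names
    ratios=[0.3, 0.3],       # fraction of filters to remove per parameter
    place=fluid.CPUPlace())
</pre></div>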
<h2 id="_4">Launch the pruning task<a class="headerlink" href="#_4" title="Permanent link">#</a></h2>
<p>Start the pruning task with the following commands:</p>
<div class="highlight"><pre><span></span>export CUDA_VISIBLE_DEVICES=0
python train.py
</pre></div>
<p>Run <code>python train.py --help</code> to see more options.</p>
<h1 id="_1">Quantization-Aware Training (Online Quantization) Example<a class="headerlink" href="#_1" title="Permanent link">#</a></h1>
<p>This example shows how to use the quantization-aware training (online quantization) API to quantize a trained classification model, reducing its storage size and GPU memory usage.</p>
<h2 id="_2">API introduction<a class="headerlink" href="#_2" title="Permanent link">#</a></h2>
<p>Please refer to the <a href="https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/">quantization API documentation</a>.</p>
<h2 id="_3">Quantization-aware training workflow for a classification model<a class="headerlink" href="#_3" title="Permanent link">#</a></h2>
<h3 id="1">1. Configure quantization parameters<a class="headerlink" href="#1" title="Permanent link">#</a></h3>
<div class="highlight"><pre><span></span>quant_config = {
    &#39;weight_quantize_type&#39;: &#39;abs_max&#39;,                      # quantization scheme for weights
    &#39;activation_quantize_type&#39;: &#39;moving_average_abs_max&#39;,   # quantization scheme for activations
    &#39;weight_bits&#39;: 8,
    &#39;activation_bits&#39;: 8,
    &#39;not_quant_pattern&#39;: [&#39;skip_quant&#39;],                    # ops matching this name pattern are skipped
    &#39;quantize_op_types&#39;: [&#39;conv2d&#39;, &#39;depthwise_conv2d&#39;, &#39;mul&#39;],
    &#39;dtype&#39;: &#39;int8&#39;,
    &#39;window_size&#39;: 10000,                                   # window for range statistics
    &#39;moving_rate&#39;: 0.9,                                     # decay rate of the moving-average statistics
    &#39;quant_weight_only&#39;: False
}
</pre></div>
<h3 id="2-programop">2. Insert trainable quantization ops into the training and test programs<a class="headerlink" href="#2-programop" title="Permanent link">#</a></h3>
<div class="highlight"><pre><span></span>val_program = quant_aware(val_program, place, quant_config, scope=None, for_test=True)
compiled_train_prog = quant_aware(train_prog, place, quant_config, scope=None, for_test=False)
</pre></div>
<h3 id="3build">3. Disable certain build strategies<a class="headerlink" href="#3build" title="Permanent link">#</a></h3>
<div class="highlight"><pre><span></span># The quantized program is not compatible with these strategies,
# so disable them before compiling for data-parallel training.
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_all_reduce_ops = False
build_strategy.sync_batch_norm = False
exec_strategy = fluid.ExecutionStrategy()
compiled_train_prog = compiled_train_prog.with_data_parallel(
    loss_name=avg_cost.name,
    build_strategy=build_strategy,
    exec_strategy=exec_strategy)
</pre></div>
<h3 id="4-freeze-program">4. Freeze the program<a class="headerlink" href="#4-freeze-program" title="Permanent link">#</a></h3>
<div class="highlight"><pre><span></span># float_program: weights stored as quantize-dequantized floats, for accuracy evaluation
# int8_program: weights saved as int8 (returned when save_int8=True), for deployment
float_program, int8_program = convert(val_program,
                                      place,
                                      quant_config,
                                      scope=None,
                                      save_int8=True)
</pre></div>
<h3 id="5">5. Save the inference models<a class="headerlink" href="#5" title="Permanent link">#</a></h3>
<div class="highlight"><pre><span></span>fluid.io.save_inference_model(
    dirname=float_path,
    feeded_var_names=[image.name],
    target_vars=[out], executor=exe,
    main_program=float_program,
    model_filename=float_path + &#39;/model&#39;,
    params_filename=float_path + &#39;/params&#39;)

fluid.io.save_inference_model(
    dirname=int8_path,
    feeded_var_names=[image.name],
    target_vars=[out], executor=exe,
    main_program=int8_program,
    model_filename=int8_path + &#39;/model&#39;,
    params_filename=int8_path + &#39;/params&#39;)
</pre></div>
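<p>To sanity-check the result, the saved model can be loaded back for prediction. A usage sketch, where <code>input_data</code> stands for a properly shaped numpy array (hypothetical):</p>
<div class="highlight"><pre><span></span># Sketch: load the saved int8 inference model and run one prediction
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
infer_prog, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname=int8_path,
    executor=exe,
    model_filename=int8_path + &#39;/model&#39;,
    params_filename=int8_path + &#39;/params&#39;)
results = exe.run(infer_prog,
                  feed={feed_names[0]: input_data},  # input_data: hypothetical input batch
                  fetch_list=fetch_targets)
</pre></div>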
</div>
<h1 id="_1">Post-Training (Offline) Quantization Example<a class="headerlink" href="#_1" title="Permanent link">#</a></h1>
<p>This example shows how to use the post-training quantization API <code>paddleslim.quant.quant_post</code> to quantize a trained classification model. The API produces a quantized model without any retraining, reducing model storage size and GPU memory usage.</p>
<h2 id="_2">API introduction<a class="headerlink" href="#_2" title="Permanent link">#</a></h2>
<p>Please refer to the <a href="https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/">quantization API documentation</a>.</p>
<h2 id="_3">Post-training quantization workflow for a classification model<a class="headerlink" href="#_3" title="Permanent link">#</a></h2>
<h3 id="_4">Prepare the data<a class="headerlink" href="#_4" title="Permanent link">#</a></h3>
<p>Create a <code>data</code> folder in the current directory and extract the <code>imagenet</code> dataset into it. After extraction, the <code>data</code> folder should contain the required dataset files.</p>
<p>Create a <code>pretrain</code> folder in the current directory and extract the <code>mobilenetv1</code> pretrained model into it; the extracted directory is <code>pretrain/MobileNetV1_pretrained</code>.</p>
<h3 id="_6">Export the model<a class="headerlink" href="#_6" title="Permanent link">#</a></h3>
<p>Run the following command to convert the model into the format expected by the post-training quantization API:
<div class="highlight"><pre><span></span>python export_model.py --model &quot;MobileNet&quot; --pretrained_model ./pretrain/MobileNetV1_pretrained --data imagenet
</pre></div>
The converted model is stored under <code>inference_model/MobileNet/</code>, which should contain the two files <code>model</code> and <code>weights</code>.</p>
<h3 id="_7">Post-training quantization<a class="headerlink" href="#_7" title="Permanent link">#</a></h3>
<p>Next, quantize the exported model offline. The script is <a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/quant/quant_post/quant_post.py">quant_post.py</a>, which uses the <code>paddleslim.quant.quant_post</code> API to quantize the model without any training. Run:
<div class="highlight"><pre><span></span>python quant_post.py --model_path ./inference_model/MobileNet --save_path ./quant_model_train/MobileNet --model_filename model --params_filename weights
</pre></div></p>
<ul>
<li><code>model_path</code>: folder containing the model to be quantized</li>
<p>The quantization algorithm used is <code>&#39;KL&#39;</code>; 160 images from the training set are used to calibrate the quantization parameters.</p>
</blockquote>
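<p>Under the hood, the script reduces to a single API call. A minimal sketch, assuming the <code>paddleslim.quant.quant_post</code> signature and a user-supplied calibration reader <code>sample_generator</code> (hypothetical):</p>
<div class="highlight"><pre><span></span># Sketch: offline quantization via the quant_post API (signature assumed)
import paddle.fluid as fluid
from paddleslim.quant import quant_post

exe = fluid.Executor(fluid.CPUPlace())
quant_post(
    executor=exe,
    model_dir=&#39;./inference_model/MobileNet&#39;,
    quantize_model_path=&#39;./quant_model_train/MobileNet&#39;,
    sample_generator=sample_generator,  # yields calibration samples (hypothetical)
    model_filename=&#39;model&#39;,
    params_filename=&#39;weights&#39;,
    batch_size=16,
    batch_nums=10,   # 16 x 10 = 160 calibration images, as noted above
    algo=&#39;KL&#39;)
</pre></div>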
<h3 id="_8">Evaluate accuracy<a class="headerlink" href="#_8" title="Permanent link">#</a></h3>
<p>Use the <a href="https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/quant/quant_post/eval.py">eval.py</a> script to evaluate the model before and after quantization and compare the classification accuracy.</p>
<p>First, evaluate the accuracy of the model before quantization:
<div class="highlight"><pre><span></span>python eval.py --model_path ./inference_model/MobileNet --model_name model --params_name weights
</pre></div>
The accuracy output is:
<div class="highlight"><pre><span></span>top1_acc/top5_acc= [0.70913923 0.89548034]
</pre></div></p>
<p>Evaluate the accuracy of the quantized model with:</p>
<div class="highlight"><pre><span></span>python eval.py --model_path ./quant_model_train/MobileNet
</pre></div>
<p>The accuracy output is:
<div class="highlight"><pre><span></span>top1_acc/top5_acc= [0.70141864 0.89086477]
</pre></div>
The comparison above shows that post-training quantization of the <code>mobilenet</code> classification model on <code>imagenet</code> costs <code>0.77%</code> of <code>top1</code> accuracy and <code>0.46%</code> of <code>top5</code> accuracy.</p>
<h1 id="sensitivity-demo">Sensitivity Demo<a class="headerlink" href="#sensitivity-demo" title="Permanent link">#</a></h1>
<h2 id="2">2. Run the example<a class="headerlink" href="#2" title="Permanent link">#</a></h2>
<p>Run the example from the <code>PaddleSlim/demo/sensitive</code> directory with:</p>
<div class="highlight"><pre><span></span>export CUDA_VISIBLE_DEVICES=0
python train.py --model &quot;MobileNetV1&quot;
</pre></div>
<p>Run <code>python train.py --help</code> for more options.</p>
<h3 id="31">3.1 Compute sensitivities<a class="headerlink" href="#31" title="Permanent link">#</a></h3>
<p>Call the <code>paddleslim.prune.sensitivity</code> API to compute sensitivities. The sensitivity information is appended to the file specified by the <code>sensitivities_file</code> option; to recompute sensitivities from scratch, delete that file first.</p>
<p>If model evaluation is slow, the sensitivity computation can be parallelized across processes. For example, set <code>pruned_ratios=[0.1, 0.2, 0.3, 0.4]</code> in process 1 and store its sensitivity information in <code>sensitivities_0.data</code>, and set <code>pruned_ratios=[0.5, 0.6, 0.7]</code> in process 2 with results stored in <code>sensitivities_1.data</code>. Each process then only computes sensitivities for its assigned pruning ratios. The processes can run on multiple GPUs on one machine or across machines.</p>
<p>The code is as follows:</p>
<div class="highlight"><pre><span></span># process 1
sensitivity(
    val_program,
    place,
    params,
    test,
    sensitivities_file=&quot;sensitivities_0.data&quot;,
    pruned_ratios=[0.1, 0.2, 0.3, 0.4])
</pre></div>
<div class="highlight"><pre><span></span># process 2
sensitivity(
    val_program,
    place,
    params,
    test,
    sensitivities_file=&quot;sensitivities_1.data&quot;,
    pruned_ratios=[0.5, 0.6, 0.7])
</pre></div>
<h3 id="32">3.2 Merge sensitivities<a class="headerlink" href="#32" title="Permanent link">#</a></h3>
<p>If multiple sensitivity files were produced by the multi-process approach above, they can be merged with <code>paddleslim.prune.merge_sensitive</code>. The merged sensitivity information is stored in a <code>dict</code>. The code is as follows:</p>
<div class="highlight"><pre><span></span>sens = merge_sensitive([&quot;./sensitivities_0.data&quot;, &quot;./sensitivities_1.data&quot;])
</pre></div>
<h3 id="33">3.3 Compute pruning ratios<a class="headerlink" href="#33" title="Permanent link">#</a></h3>
<p>Call the <code>paddleslim.prune.get_ratios_by_loss</code> API to compute a set of pruning ratios:</p>
<div class="highlight"><pre><span></span>ratios = get_ratios_by_loss(sens, 0.01)
</pre></div>
<p>Here <code>0.01</code> is a loss threshold: for each convolution layer, the selected pruning ratio is the largest ratio whose accuracy loss stays below <code>0.01</code>.</p>
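<p>The returned <code>ratios</code> is a dict mapping parameter names to per-layer pruning ratios, so it can be fed straight into a pruner. A minimal sketch, assuming the <code>paddleslim.prune.Pruner</code> interface:</p>
<div class="highlight"><pre><span></span># Sketch: apply the sensitivity-derived ratios (Pruner API assumed)
import paddle.fluid as fluid
from paddleslim.prune import Pruner

pruner = Pruner()
pruned_program, _, _ = pruner.prune(
    val_program,
    fluid.global_scope(),
    params=list(ratios.keys()),    # parameter names from the merged dict
    ratios=list(ratios.values()),  # matching per-layer pruning ratios
    place=place)
</pre></div>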