The second is the FSP-based distillation method (reference paper: [A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf)).
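As a rough illustration of the idea behind FSP distillation (not PaddleSlim's API), the FSP matrix between two feature maps of the same spatial size is the matrix of channel-pair inner products averaged over spatial positions, as defined in the cited paper; a minimal NumPy sketch:

```python
import numpy as np

def fsp_matrix(feat_a, feat_b):
    """Compute the FSP (Flow of Solution Procedure) matrix between two
    feature maps of shape (h, w, c1) and (h, w, c2) with the same h, w.

    Returns a (c1, c2) matrix of channel-pair inner products averaged
    over the h*w spatial positions.
    """
    h, w, c1 = feat_a.shape
    _, _, c2 = feat_b.shape
    a = feat_a.reshape(h * w, c1)  # flatten spatial dimensions
    b = feat_b.reshape(h * w, c2)
    return a.T @ b / (h * w)

# the distillation loss is then, e.g., the mean squared error between
# the teacher's and the student's FSP matrices for paired layers
```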
# Training-aware Quantization of image classification model - quick start
This tutorial shows how to do training-aware quantization using the [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_aware) in PaddleSlim. We use MobileNetV1 to train an image classification model as an example. The tutorial contains the following sections:
1. Necessary imports
2. Model architecture
...
...
## 4. Quantization
We call the ``quant_aware`` API to add quantization and dequantization operators to ``train_program`` and ``val_program`` according to the [default configuration](https://paddlepaddle.github.io/PaddleSlim/api_cn/quantization_api.html#id2).
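A minimal sketch of this step, assuming an executor ``exe`` whose place was used to build the programs (the call follows the ``quant_aware`` signature in the linked docs; variable names are illustrative):

```python
import paddleslim as slim

# insert fake quantize/dequantize ops into both graphs;
# for_test=True builds the evaluation version of the graph
quant_train_program = slim.quant.quant_aware(train_program, exe.place, for_test=False)
quant_val_program = slim.quant.quant_aware(val_program, exe.place, for_test=True)
```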
## 6. Save model after quantization

The model obtained in ``4. Quantization`` by calling the ``slim.quant.quant_aware`` API is only suitable for training. To get the inference model, we should use the [slim.quant.convert](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.convert) API to change the model architecture and [fluid.io.save_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/save_inference_model_cn.html#save-inference-model) to save the model. ``float_prog``'s parameters are float32 dtype but within the int8 range, so it can be used in ``fluid`` or ``paddle-lite``; ``paddle-lite`` will cast the parameters from float32 to int8 when loading the inference model. ``int8_prog``'s parameters are int8 dtype, and saving it lets us measure the model size after quantization, but ``int8_prog`` cannot be used in ``fluid`` or ``paddle-lite``.
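A minimal sketch of converting and saving, assuming the quantized evaluation program from above, an executor ``exe``, and hypothetical input/output variables ``image`` and ``out``:

```python
import paddle.fluid as fluid
import paddleslim as slim

# strip training-only ops and freeze quantization parameters;
# save_int8=True additionally returns a program with int8 weights
float_prog, int8_prog = slim.quant.convert(
    quant_val_program, exe.place, save_int8=True)

# float_prog can be deployed with fluid or paddle-lite
fluid.io.save_inference_model(
    dirname='./inference_model/float',
    feeded_var_names=[image.name],  # hypothetical input variable
    target_vars=[out],              # hypothetical output variable
    executor=exe,
    main_program=float_prog)

# saving int8_prog is only useful for checking the quantized model size
fluid.io.save_inference_model(
    dirname='./inference_model/int8',
    feeded_var_names=[image.name],
    target_vars=[out],
    executor=exe,
    main_program=int8_prog)
```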
# Post-training Quantization of image classification model - quick start
This tutorial shows how to do post-training quantization using the [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_post) in PaddleSlim. We use MobileNetV1 to train an image classification model as an example. The tutorial contains the following sections:
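As a preview of the core step, a minimal sketch of calling ``quant_post``, assuming an executor ``exe``, a float inference model saved in ``./inference_model/float``, and a calibration ``sample_generator`` (paths and names are illustrative; see the linked API docs for the full argument list):

```python
import paddleslim as slim

# run calibration over a few batches and save the quantized model
slim.quant.quant_post(
    executor=exe,
    model_dir='./inference_model/float',
    quantize_model_path='./quant_post_model',
    sample_generator=sample_generator,  # yields calibration samples
    batch_size=16,
    batch_nums=10)
```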
The search space is used in neural architecture search. A search space is a collection of model architectures, and the purpose of SANAS is to find a model in this collection whose FLOPs or latency is smaller, or whose precision is higher.

## Search spaces provided by paddleslim.nas
Based on an origin model architecture:

1. MobileNetV2Space<br>
&emsp; MobileNetV2's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v2.py#L29), [paper](https://arxiv.org/abs/1801.04381)

2. MobileNetV1Space<br>
&emsp; MobileNetV1's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v1.py#L29), [paper](https://arxiv.org/abs/1704.04861)

3. ResNetSpace<br>
&emsp; ResNet's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/resnet.py#L30), [paper](https://arxiv.org/pdf/1512.03385.pdf)
Based on blocks from different models:

1. MobileNetV1BlockSpace<br>
&emsp; MobileNetV1 block's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v1.py#L173)

2. MobileNetV2BlockSpace<br>
&emsp; MobileNetV2 block's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v2.py#L174)

3. ResNetBlockSpace<br>
&emsp; ResNet block's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/resnet.py#L148)

4. InceptionABlockSpace<br>
&emsp; InceptionA block's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/inception_v4.py#L140)

5. InceptionCBlockSpace<br>
&emsp; InceptionC block's architecture can be referenced in: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/inception_v4.py#L291)
## How to use the search space
1. If the search space is based on an origin model architecture, you only need to specify the name of the search space. For example, the configs for class SANAS is [('MobileNetV2Space')] if you want to use the origin MobileNetV2 as the search space.
2. If the search space paddleslim.nas provides is based on blocks, construct it in one of two ways (see the sketch after this list):<br>
&emsp; 2.1 Use ``input_size``, ``output_size`` and ``block_num``. For example, the configs for class SANAS is [('MobileNetV2BlockSpace', {'input_size': 224, 'output_size': 32, 'block_num': 10})].<br>
&emsp; 2.2 Use ``block_mask``. For example, the configs for class SANAS is [('MobileNetV2BlockSpace', {'block_mask': [0, 1, 1, 1, 1, 0, 1, 0]})].
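A minimal sketch of passing such configs to SANAS (constructor arguments other than ``configs`` are omitted or illustrative; check the SANAS API docs for the exact signature):

```python
import paddleslim as slim

# search space based on the origin MobileNetV2 architecture
config = [('MobileNetV2Space')]

# or a block-based search space, constrained by input/output size:
# config = [('MobileNetV2BlockSpace',
#            {'input_size': 224, 'output_size': 32, 'block_num': 10})]

sanas = slim.nas.SANAS(configs=config)

# each search step asks the controller for candidate architectures;
# after training/evaluating them, report the score via sanas.reward(...)
archs = sanas.next_archs()
```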
## How to write your own search space

If you want to write your own search space, you need to inherit from the base class ``SearchSpaceBase`` and override the following functions:<br>
&emsp; 1. A function to get the initial tokens (``init_tokens``): set the initial tokens you want. Every token is an index into a search list; for example, if tokens=[0, 3, 5], the channel list of the current model architecture is [8, 40, 128].
&emsp; 2. A function giving the range of every token (``range_table``): the number of valid values for each position in tokens.
&emsp; 3. A function to get the model architecture according to tokens (``token2arch``): builds the model architecture from tokens during the search.

For example, the following sketch shows how to add a search space based on a ResNet block. A new search space must NOT have the same name as an existing one.
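A minimal sketch of such a class, assuming PaddleSlim's ``SearchSpaceBase`` base class and ``SEARCHSPACE`` registry (import paths and registration details may differ across versions, and the channel list is made up for illustration):

```python
import paddle.fluid as fluid
from paddleslim.nas.search_space.search_space_base import SearchSpaceBase
from paddleslim.nas.search_space.search_space_registry import SEARCHSPACE

@SEARCHSPACE.register
class ResNetBlockSpaceExample(SearchSpaceBase):
    def __init__(self, input_size, output_size, block_num, block_mask):
        super(ResNetBlockSpaceExample, self).__init__(
            input_size, output_size, block_num, block_mask)
        # candidate output channels for each block (illustrative values)
        self.filter_num = [8, 16, 32, 40, 64, 128]

    def init_tokens(self):
        # one token per block, each an index into self.filter_num
        return [0] * self.block_num

    def range_table(self):
        # number of valid values for every token
        return [len(self.filter_num)] * self.block_num

    def token2arch(self, tokens=None):
        if tokens is None:
            tokens = self.init_tokens()

        def net_arch(input):
            # stack simple conv blocks whose widths are chosen by tokens
            conv = input
            for i in range(self.block_num):
                conv = fluid.layers.conv2d(
                    conv,
                    num_filters=self.filter_num[tokens[i]],
                    filter_size=3,
                    padding=1,
                    act='relu')
            return conv

        return net_arch
```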