# Classification Framework
## I. Introduction
Models in Paddle are stored in several different forms, which can be roughly divided into two categories:
1. persistable model (the models saved by `fluid.save_persistables`)
The weights are saved as a checkpoint that can be loaded to resume training. Each scattered file saved by the persistable model corresponds to one persistable variable in the network; these files contain no structure information, so the weights must be used together with the model structure.
```
resnet50-vd-persistable/
├── bn2a_branch1_mean
├── bn2a_branch1_offset
├── bn2a_branch1_scale
├── bn2a_branch1_variance
├── bn2a_branch2a_mean
├── bn2a_branch2a_offset
├── bn2a_branch2a_scale
├── ...
└── res5c_branch2c_weights
```
2. inference model (the models saved by `fluid.io.save_inference_model`)
The model saved by this function can be used for inference directly. Compared with the persistable model, the model structure is additionally saved along with the weights, so the model with trained weights can be reconstructed from these files alone. As shown in the following listing, the structure information is saved in `model`:
```
resnet50-vd-persistable/
├── bn2a_branch1_mean
├── bn2a_branch1_offset
├── bn2a_branch1_scale
├── bn2a_branch1_variance
├── bn2a_branch2a_mean
├── bn2a_branch2a_offset
├── bn2a_branch2a_scale
├── ...
├── res5c_branch2c_weights
└── model
```
For convenience, all weight files can be combined into a single `params` file when saving the inference model in Paddle, as shown below:
```
resnet50-vd
├── model
└── params
```
Both the training engine and the prediction engine in Paddle can run model inference, but back propagation is not performed during inference, so customized optimizations (such as layer fusion and kernel selection) can be applied to achieve low latency and high throughput. The training engine supports both the persistable model and the inference model, while the prediction engine only supports the inference model, so three different inference methods are derived:
1. prediction engine + inference model
2. training engine + persistable model
3. training engine + inference model
Regardless of the inference method, it basically includes the following main steps:
+ Engine Build
+ Prepare the Data to Be Predicted
+ Perform Predictions
+ Result Analysis
The main differences between the inference methods lie in building the engine and performing predictions. The following sections introduce them in detail.
## II. Model Transformation
During training, we usually save some checkpoints (persistable models). These are just model weight files and cannot be loaded directly by the prediction engine, so we usually pick a suitable checkpoint after training and convert it to an inference model. There are two main steps: 1. build a training engine, 2. save the inference model, as shown below.
```python
import paddle.fluid as fluid
from ppcls.modeling.architectures.resnet_vd import ResNet50_vd

place = fluid.CPUPlace()
exe = fluid.Executor(place)

startup_prog = fluid.Program()
infer_prog = fluid.Program()

# 1. build the training engine: construct the network structure first,
#    since the persistable model only contains the weights
with fluid.program_guard(infer_prog, startup_prog):
    with fluid.unique_name.guard():
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        model = ResNet50_vd()
        out = model.net(input=image, class_dim=1000)

infer_prog = infer_prog.clone(for_test=True)
exe.run(startup_prog)
# load the trained weights (persistable model)
fluid.load(program=infer_prog, model_path='the path of persistable model', executor=exe)

# 2. save the inference model: the structure is saved to `model`,
#    all weights are combined into `params`
fluid.io.save_inference_model(
    dirname='./output/',
    feeded_var_names=[image.name],
    main_program=infer_prog,
    target_vars=out,
    executor=exe,
    model_filename='model',
    params_filename='params')
```
A complete example is provided in `tools/export_model.py`; just execute the following command to complete the conversion:
```shell
python tools/export_model.py \
    --m=the name of model \
    --p=the path of persistable model \
    --o=the path to save the exported model and params
```
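For example, with the `MobileNetV1` pretrained weights used in the Paddle-Lite section later in this document, the conversion command looks as follows (the weight path `pretrained/MobileNetV1_pretrained/` is simply the location used in that section):
```shell
python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
```
After it finishes, the `model` and `params` files are written to `inference/MobileNetV1`.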
## III. Prediction Engine + Inference Model
A complete example is provided in `tools/infer/predict.py`; just execute the following command to complete the prediction:
```
python ./predict.py \
-i=./test.jpeg \
-m=./resnet50-vd/model \
-p=./resnet50-vd/params \
--use_gpu=1 \
--use_tensorrt=True
```
Parameter Description:
+ `image_file` (shorthand i): the path of the image to be predicted, such as `./test.jpeg`.
+ `model_file` (shorthand m): the path of the model file, such as `./resnet50-vd/model`.
+ `params_file` (shorthand p): the path of the weights file, such as `./resnet50-vd/params`.
+ `batch_size` (shorthand b): the batch size, such as `1`.
+ `ir_optim`: whether to use `IR` optimization, default: True.
+ `use_tensorrt`: whether to use the TensorRT prediction engine, default: True.
+ `gpu_mem`: the initial GPU memory allocation, in MB.
+ `use_gpu`: whether to use GPU, default: True.
+ `enable_benchmark`: whether to enable benchmark, default: False.
+ `model_name`: the name of the model.
NOTE:
When benchmark is enabled, TensorRT is used by default for prediction in Paddle.
Building prediction engine:
```python
from paddle.fluid.core import AnalysisConfig
from paddle.fluid.core import create_paddle_predictor

config = AnalysisConfig('the path of model file', 'the path of params file')
# use GPU 0 and allocate 8000 MB of GPU memory initially
config.enable_use_gpu(8000, 0)
config.disable_glog_info()
# enable IR optimization
config.switch_ir_optim(True)
# enable the TensorRT engine with FP32 precision
config.enable_tensorrt_engine(
    precision_mode=AnalysisConfig.Precision.Float32,
    max_batch_size=1)

# the feed and fetch ops must be removed when using the zero-copy API
config.switch_use_feed_fetch_ops(False)

predictor = create_paddle_predictor(config)
```
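If no GPU is available, the predictor can also be built in CPU mode. The sketch below is a minimal CPU configuration, assuming the installed Paddle wheel supports MKLDNN; it is not taken from `tools/infer/predict.py` itself.
```python
from paddle.fluid.core import AnalysisConfig
from paddle.fluid.core import create_paddle_predictor

config = AnalysisConfig('the path of model file', 'the path of params file')
config.disable_gpu()                        # run on CPU
config.set_cpu_math_library_num_threads(4)  # number of CPU math threads
config.enable_mkldnn()                      # optional: MKLDNN acceleration
config.disable_glog_info()
config.switch_ir_optim(True)
config.switch_use_feed_fetch_ops(False)     # required for the zero-copy API

predictor = create_paddle_predictor(config)
```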
Prediction Execution:
```python
import numpy as np

input_names = predictor.get_input_names()
input_tensor = predictor.get_input_tensor(input_names[0])

# a random array is used here as a placeholder input image
input_data = np.random.randn(1, 3, 224, 224).astype("float32")
input_tensor.reshape([1, 3, 224, 224])
input_tensor.copy_from_cpu(input_data)

# run inference with the zero-copy API
predictor.zero_copy_run()
```
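After `zero_copy_run()` returns, the outputs are fetched from the predictor in the same zero-copy fashion. The following is a minimal sketch of the result-analysis step, assuming a single softmax output of shape `[1, 1000]`:
```python
import numpy as np

# fetch the output tensor and copy it back to the host
output_names = predictor.get_output_names()
output_tensor = predictor.get_output_tensor(output_names[0])
output = output_tensor.copy_to_cpu()  # numpy array of shape [1, 1000]

# print the top-5 class ids and their scores
top5 = np.argsort(output[0])[::-1][:5]
for class_id in top5:
    print("class id: {}, score: {:.4f}".format(class_id, output[0][class_id]))
```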
More information about the parameters can be found in the [Paddle Python prediction API](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/python_infer_cn.html). If you need to run prediction in a production environment, we recommend the [Paddle C++ prediction API](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html); a rich set of pre-compiled prediction libraries is provided on the official website: [Paddle C++ prediction library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
By default, Paddle's wheel package does not include the TensorRT prediction engine. If you need to use TensorRT for prediction optimization, you need to compile the corresponding wheel package yourself. For the compilation method, please refer to Paddle's compilation guide: [Paddle compilation](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/fromsource.html).
## IV. Training Engine + Persistable Model Prediction
A complete example is provided in `tools/infer/infer.py`; just execute the following command to complete the prediction:
```shell
python tools/infer/infer.py \
    --i=the path of the image to be predicted \
    --m=the name of model \
    --p=the path of persistable model \
    --use_gpu=True
```
Parameter Description:
+ `--i`: the path of the image to be predicted, such as `./test.jpeg`.
+ `--m`: the name of the model, such as `ResNet50_vd`.
+ `--p`: the path of the persistable model (the trained weights).
+ `--use_gpu`: whether to use GPU, default: True.
Training Engine Construction:
Since the persistable model does not contain the structural information of the model, the network structure must be constructed first, and then the weights are loaded to build the training engine.
```python
import paddle.fluid as fluid
from ppcls.modeling.architectures.resnet_vd import ResNet50_vd

place = fluid.CPUPlace()
exe = fluid.Executor(place)

startup_prog = fluid.Program()
infer_prog = fluid.Program()

# construct the network structure first, since the persistable model
# only contains the weights
with fluid.program_guard(infer_prog, startup_prog):
    with fluid.unique_name.guard():
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        model = ResNet50_vd()
        out = model.net(input=image, class_dim=1000)

infer_prog = infer_prog.clone(for_test=True)
exe.run(startup_prog)
# load the trained weights (persistable model)
fluid.load(program=infer_prog, model_path='the path of persistable model', executor=exe)
```
Perform inference:
```python
# `data` is the preprocessed image, a float32 array of shape [N, 3, 224, 224]
outputs = exe.run(infer_prog,
                  feed={image.name: data},
                  fetch_list=[out.name],
                  return_numpy=False)
```
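The `data` fed above is the preprocessed image. The following is a minimal preprocessing sketch, assuming the common ImageNet evaluation pipeline (resize the short edge to 256, center-crop to 224, normalize with the ImageNet mean/std, convert to NCHW) and that OpenCV and numpy are installed; adjust it to match your own training configuration.
```python
import cv2
import numpy as np

def preprocess(img_path):
    # read a BGR image and convert it to RGB
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # resize the short edge to 256, then center-crop to 224 x 224
    h, w = img.shape[:2]
    scale = 256.0 / min(h, w)
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    h, w = img.shape[:2]
    top, left = (h - 224) // 2, (w - 224) // 2
    img = img[top:top + 224, left:left + 224, :]

    # normalize with the ImageNet mean/std and convert HWC -> CHW
    img = img.astype("float32") / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype="float32")
    std = np.array([0.229, 0.224, 0.225], dtype="float32")
    img = (img - mean) / std
    img = img.transpose((2, 0, 1))

    # add the batch dimension: [1, 3, 224, 224]
    return img[np.newaxis, :]

data = preprocess("./test.jpeg")
```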
For the above parameter descriptions, please refer to the official website [fluid.Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/executor_cn/Executor_cn.html)
## V. Training Engine + Inference Model Prediction
A complete example is provided in `tools/infer/py_infer.py`; just execute the following command to complete the prediction:
```shell
python tools/infer/py_infer.py \
    --i=the path of the image to be predicted \
    --d=the directory of the saved inference model \
    --m=the path of the saved model file \
    --p=the path of the saved weights file \
    --use_gpu=True
```
Parameter Description:
+ `image_file` (shorthand i): the path of the image to be predicted, such as `./test.jpeg`.
+ `model_file` (shorthand m): the path of the model file, such as `./resnet50_vd/model`.
+ `params_file` (shorthand p): the path of the weights file, such as `./resnet50_vd/params`.
+ `model_dir` (shorthand d): the directory of the inference model, such as `./resnet50_vd`.
+ `use_gpu`: whether to use GPU, default: True.
Training Engine Construction:
Since the inference model contains the model structure, there is no need to construct the network first; the model file and weights file can be loaded directly to build the training engine.
```python
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# load both the model structure and the weights of the inference model
[program, feed_names, fetch_lists] = fluid.io.load_inference_model(
    'the directory of the saved inference model',
    exe,
    model_filename='the name of the saved model file',
    params_filename='the name of the saved weights file')
compiled_program = fluid.compiler.CompiledProgram(program)
```
> `load_inference_model` supports both a collection of scattered weight files and a single combined weights file.
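For example, if the inference model was saved with scattered weight files (the default layout of `save_inference_model`: the structure in a `__model__` file and one file per weight), the `model_filename` and `params_filename` arguments can simply be omitted. A minimal sketch under that assumption:
```python
# scattered-file layout: the structure is read from `__model__` in the
# model directory and the weights are read from the individual files
[program, feed_names, fetch_lists] = fluid.io.load_inference_model(
    'the directory of the saved inference model', exe)
```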
Perform inference:
```python
# `data` is the preprocessed image, a float32 array of shape [N, 3, 224, 224]
outputs = exe.run(compiled_program,
                  feed={feed_names[0]: data},
                  fetch_list=fetch_lists,
                  return_numpy=False)
```
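Because `return_numpy=False`, each element of `outputs` is an `LoDTensor`. The following is a minimal sketch of the result-analysis step, assuming the model outputs softmax probabilities over 1000 classes:
```python
import numpy as np

# convert the fetched LoDTensor to a numpy array of shape [N, 1000]
probs = np.array(outputs[0])

# print the top-5 class ids and scores for the first image in the batch
top5 = np.argsort(probs[0])[::-1][:5]
for class_id in top5:
    print("class id: {}, score: {:.4f}".format(class_id, probs[0][class_id]))
```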
For the above parameter descriptions, please refer to the official website [fluid.Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/executor_cn/Executor_cn.html)
# Paddle-Lite
## I. Introduction
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) is a lightweight inference engine that is fully functional, easy to use, and performs well. Being lightweight means using fewer bits to represent the weights and activations of the neural network, which greatly reduces the model size and addresses the limited storage space of mobile devices, while the overall inference speed is better than that of other frameworks.
In [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), we use Paddle-Lite to [evaluate the performance on mobile devices](../models/Mobile.md). In this section, we take the `MobileNetV1` model trained on the `ImageNet1k` dataset as an example to introduce how to use Paddle-Lite to evaluate the model speed on a mobile device (evaluated on an SD855).
## II. Evaluation Steps
### I. Export the Inference Model
* First, the model saved during training should be converted into an inference model that can be used for prediction. The inference model can be exported with `tools/export_model.py`, as follows.
```shell
python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
```
Finally, the `model` and `params` files are saved in `inference/MobileNetV1`.
### II. Download Benchmark Binary File
* Use the adb (Android Debug Bridge) tool to connect the Android phone to the PC for development and debugging. After installing adb and ensuring that the PC and the phone are successfully connected, use the following command to view the ARM version of the phone (typically `arm64-v8a` for ARMv8 or `armeabi-v7a` for ARMv7) and select the matching pre-compiled library.
```shell
adb shell getprop ro.product.cpu.abi
```
* Download Benchmark_bin File
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
```
If the ARM version is v7, the v7 benchmark_bin file should be downloaded instead, as follows.
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
```
### III. Inference Speed Evaluation
After the PC and the mobile phone are successfully connected, use the following command to start the model evaluation.
```
sh tools/lite/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
```
Here `./benchmark_bin_v8` is the path of the benchmark binary file, `./inference` is the directory containing all the models to be evaluated, `result_armv8.txt` is the result file, and the final parameter `true` means that the model will be optimized before evaluation. The evaluation result file `result_armv8.txt` is saved in the current directory, with contents as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
Threads=2 Warmup=10 Repeats=30
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
Threads=4 Warmup=10 Repeats=30
MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
```
The results above show the model inference time under different numbers of threads, in milliseconds (ms). Taking the single-thread result as an example, the average inference time of MobileNetV1 on SD855 is `30.79750 ms`.
### IV. Model Optimization and Speed Evaluation
* In section II.III, we mentioned that the model is optimized before evaluation. Here, you can first optimize the model and then directly load the optimized model for the speed evaluation.
* Paddle-Lite provides multiple strategies to automatically optimize the original training model, including quantization, subgraph fusion, hybrid scheduling, kernel optimization and so on. To make the optimization convenient and easy to use, an `opt` tool is provided to automatically complete these optimization steps and output a lightweight, optimized and executable Paddle-Lite model; it can be downloaded from the [Paddle-Lite Model Optimization Page](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html). Here we take `MacOS` as the development environment, download the [opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac) model optimization tool, and use the following commands to optimize the model.
```shell
model_file="../MobileNetV1/model"
param_file="../MobileNetV1/params"
opt_models_dir="./opt_models"
mkdir ${opt_models_dir}
./opt_mac --model_file=${model_file} \
--param_file=${param_file} \
--valid_targets=arm \
--optimize_out_type=naive_buffer \
--prefer_int8_kernel=false \
--optimize_out=${opt_models_dir}/MobileNetV1
```
Here `model_file` and `param_file` are the paths of the exported model file and weights file respectively. After the conversion succeeds, `MobileNetV1.nb` is saved in `opt_models`.
Use the benchmark_bin file to load the optimized model for evaluation. The command is as follows.
```shell
bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
```
Finally, the results are saved in `result_armv8.txt`, as shown below.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173
Threads=2 Warmup=10 Repeats=30
MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017
Threads=4 Warmup=10 Repeats=30
MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
```
Taking the single-thread result as an example, the average inference time of MobileNetV1 on SD855 is `30.84173 ms`.
For more detailed parameter explanations and Paddle-Lite usage, please refer to the [Paddle-Lite docs](https://paddle-lite.readthedocs.io/zh/latest/).