# Fluid Inference User Guide

- Python Inference API
- Building the Fluid Inference Library
- Linking the Fluid Inference Library
- C++ Inference API
- Inference Examples
- Optimizing Inference Computation
- Optimizing Memory Usage

## Python Inference API **[work in progress]**
- [Saving an inference model](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/io.py#L295)

  ```python
  def save_inference_model(dirname,
                           feeded_var_names,
                           target_vars,
                           executor,
                           main_program=None,
                           model_filename=None,
                           params_filename=None):
  ```
  The inference model and parameters are saved under the `dirname` directory:
  - Serialized model
    - If `model_filename` is `None`, the model is saved to `dirname/__model__`
    - Otherwise, it is saved to `dirname/model_filename`
  - Parameters
    - If `params_filename` is `None`, each parameter is saved separately to its own file, named after the parameter variable
    - Otherwise, all parameters are saved to a single file, `dirname/params_filename`

- Two storage formats
  - Parameters saved to separate files
    - e.g., set `model_filename` to `None` and `params_filename` to `None`

    ```bash
    $ cd recognize_digits_conv.inference.model
    $ ls
    $ __model__ batch_norm_1.w_0 batch_norm_1.w_2 conv2d_2.w_0 conv2d_3.w_0 fc_1.w_0 batch_norm_1.b_0 batch_norm_1.w_1 conv2d_2.b_0 conv2d_3.b_0 fc_1.b_0
    ```
  - Parameters saved to a single file
    - e.g., set `model_filename` to `None` and `params_filename` to `'__params__'`

    ```bash
    $ cd recognize_digits_conv.inference.model
    $ ls
    $ __model__ __params__
    ```
- [Loading an inference model](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/io.py#L380) (a combined save/load usage sketch follows below)
  ```python
  def load_inference_model(dirname,
                           executor,
                           model_filename=None,
                           params_filename=None):
    ...
    return [program, feed_target_names, fetch_targets]
  ```
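
Below is a minimal, self-contained sketch combining the two APIs. The one-layer network is hypothetical (it is not the recognize_digits model referenced above) and exists only so the snippet runs end to end:

```python
import numpy
import paddle.fluid as fluid

# A toy network (hypothetical): one fc layer over an MNIST-shaped input.
image = fluid.layers.data(name="image", shape=[1, 28, 28], dtype="float32")
prediction = fluid.layers.fc(input=image, size=10, act="softmax")

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())  # initialize the parameters

# Save: the serialized program goes to `__model__`; because params_filename
# is given, all parameters go to a single `__params__` file.
fluid.io.save_inference_model(dirname="toy.inference.model",
                              feeded_var_names=["image"],
                              target_vars=[prediction],
                              executor=exe,
                              params_filename="__params__")

# Load it back; the feed/fetch names are recovered from the program itself.
[inference_program, feed_target_names, fetch_targets] = (
    fluid.io.load_inference_model(dirname="toy.inference.model",
                                  executor=exe,
                                  params_filename="__params__"))

# Run inference on a random input.
tensor_img = numpy.random.rand(1, 1, 28, 28).astype("float32")
results = exe.run(inference_program,
                  feed={feed_target_names[0]: tensor_img},
                  fetch_list=fetch_targets)
```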


## Building the Fluid Inference Library

  - **No additional CMake options are required**
    - 1. Configure the CMake command. For more configuration options, see [Build PaddlePaddle from source](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/build_from_source_cn.html)
      ```bash
      $ git clone https://github.com/PaddlePaddle/Paddle.git
      $ cd Paddle
      $ mkdir build
      $ cd build
      $ cmake -DCMAKE_INSTALL_PREFIX=your/path/to/paddle_inference_lib \
          -DCMAKE_BUILD_TYPE=Release \
          -DWITH_PYTHON=ON \
          -DWITH_MKL=OFF \
          -DWITH_GPU=OFF \
          ..
      ```

    - 2. Build PaddlePaddle
      ```bash
      $ make
      ```

    - 3. Deploy. Run the following command to deploy the PaddlePaddle Fluid inference library to the `your/path/to/paddle_inference_lib` directory.
      ```bash
      $ make inference_lib_dist
      ```

- Directory structure

  ```bash
  $ cd your/path/to/paddle_inference_lib
  $ tree
  .
  |-- paddle
  |   `-- fluid
  |       |-- framework
  |       |-- inference
  |       |   |-- io.h
  |       |   `-- libpaddle_fluid.so
  |       |-- memory
  |       |-- platform
  |       `-- string
  |-- third_party
  |   |-- eigen3
  |   `-- install
  |       |-- gflags
  |       |-- glog
  |       `-- protobuf
  `-- ...
  ```

  In the following, assume `PADDLE_ROOT=your/path/to/paddle_inference_lib`.



## Linking the Fluid Inference Library
- [Example project](https://github.com/luotao1/fluid_inference_example.git)

  - GCC configuration
    ```bash
    $ g++ -o a.out -std=c++11 main.cc \
          -I${PADDLE_ROOT}/ \
          -I${PADDLE_ROOT}/third_party/install/gflags/include \
          -I${PADDLE_ROOT}/third_party/install/glog/include \
          -I${PADDLE_ROOT}/third_party/install/protobuf/include \
          -I${PADDLE_ROOT}/third_party/eigen3 \
          -L${PADDLE_ROOT}/paddle/fluid/inference -lpaddle_fluid \
          -lrt -ldl -lpthread
    ```

  - CMake configuration
    ```cmake
    include_directories(${PADDLE_ROOT}/)
    include_directories(${PADDLE_ROOT}/third_party/install/gflags/include)
    include_directories(${PADDLE_ROOT}/third_party/install/glog/include)
    include_directories(${PADDLE_ROOT}/third_party/install/protobuf/include)
    include_directories(${PADDLE_ROOT}/third_party/eigen3)
    target_link_libraries(${TARGET_NAME}
                          ${PADDLE_ROOT}/paddle/fluid/inference/libpaddle_fluid.so
                          -lrt -ldl -lpthread)
    ```

  - Set the environment variable:
  `export LD_LIBRARY_PATH=${PADDLE_ROOT}/paddle/fluid/inference:$LD_LIBRARY_PATH`



## C++ Inference API

- [Inference workflow](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/test_helper.h#L91)

  - 1. Initialize the device
    ```cpp
    #include "paddle/fluid/framework/init.h"
    paddle::framework::InitDevices(false);
    ```

  - 2. Define the `place`, `executor`, and `scope`
    ```cpp
    auto place = paddle::platform::CPUPlace();
    auto executor = paddle::framework::Executor(place);
    auto* scope = new paddle::framework::Scope();
    ```

  - 3. Load the model
    ```cpp
    #include "paddle/fluid/inference/io.h"
    auto inference_program = paddle::inference::Load(executor, *scope, dirname);
    // or
    auto inference_program = paddle::inference::Load(executor,
                                                     *scope,
                                                     dirname + "/" + model_filename,
                                                     dirname + "/" + params_filename);
    ```

  - 4. Get the `feed_target_names` and `fetch_target_names`
    ```cpp
    const std::vector<std::string>& feed_target_names = inference_program->GetFeedTargetNames();
    const std::vector<std::string>& fetch_target_names = inference_program->GetFetchTargetNames();
    ```

  - 5. Prepare the `feed` data
    ```cpp
    #include "paddle/fluid/framework/lod_tensor.h"
    std::vector<paddle::framework::LoDTensor*> cpu_feeds;
    ...
    std::map<std::string, const paddle::framework::LoDTensor*> feed_targets;
    for (size_t i = 0; i < feed_target_names.size(); ++i) {
      // Please make sure that cpu_feeds[i] is right for feed_target_names[i]
      feed_targets[feed_target_names[i]] = cpu_feeds[i];
    }
    ```

  - 6. Define `Tensor`s to `fetch` the results
    ```cpp
    std::vector<paddle::framework::LoDTensor*> cpu_fetchs;
    std::map<std::string, paddle::framework::LoDTensor*> fetch_targets;
    for (size_t i = 0; i < fetch_target_names.size(); ++i) {
      fetch_targets[fetch_target_names[i]] = cpu_fetchs[i];
    }
    ```

  - 7. Run the `inference_program`
    ```cpp
    executor.Run(*inference_program, scope, feed_targets, fetch_targets);
    ```

  - 8. Use the `fetch` data
    ```cpp
    for (size_t i = 0; i < cpu_fetchs.size(); ++i) {
      std::cout << "lod_i: " << cpu_fetchs[i]->lod();
      std::cout << "dims_i: " << cpu_fetchs[i]->dims();
      std::cout << "result:";
      float* output_ptr = cpu_fetchs[i]->data<float>();
      for (int j = 0; j < cpu_fetchs[i]->numel(); ++j) {
        std::cout << " " << output_ptr[j];
      }
      std::cout << std::endl;
    }
    ```
    Steps 4-8 can be executed repeatedly for different input data.

  - 9. Release the memory
    ```cpp
    delete scope;
    ```


- Interface notes

  ```cpp
  void Run(const ProgramDesc& program, Scope* scope,
           std::map<std::string, const LoDTensor*>& feed_targets,
           std::map<std::string, LoDTensor*>& fetch_targets,
           bool create_vars = true,
           const std::string& feed_holder_name = "feed",
           const std::string& fetch_holder_name = "fetch");
  ```
  - A `program` saved with the Python API `save_inference_model` contains `feed_op`s and `fetch_op`s; the `feed_targets` and `fetch_targets` provided by the user must be consistent with the `feed_op`s and `fetch_op`s in the `inference_program`.
  - The `feed_holder_name` and `fetch_holder_name` provided by the user must also be consistent with those of the `feed_op`s and `fetch_op`s in the `inference_program`; they can be reset on the `inference_program` via the `SetFeedHolderName` and `SetFetchHolderName` interfaces.
  - By default, apart from the `Variable`s whose `persistable` attribute is `True`, each call to `executor.Run` creates a local `Scope` in which all other `Variable`s are created and destroyed, minimizing the memory footprint when idle.
  - The `Variable`s whose `persistable` attribute is `True` are (see the inspection sketch after this list):
    - operator parameters such as `w` and `b`
    - the input variables of `feed_op`s
    - the output variables of `fetch_op`s
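
  As a quick way to see these in practice, here is a small Python sketch (assuming `inference_program` was obtained from `fluid.io.load_inference_model`, as in the earlier example) that lists the persistable variables of a loaded program:

  ```python
  # Print the names of all persistable variables in the program:
  # parameters (w, b, ...), the feed_op input, and the fetch_op output.
  for name, var in inference_program.global_block().vars.items():
      if var.persistable:
          print(name)
  ```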


- **Do not create and destroy variables on every run [PR](https://github.com/PaddlePaddle/Paddle/pull/9301)**
  - Run the `inference_program`
    ```cpp
    // Call once
    executor.CreateVariables(*inference_program, scope, 0);
    // Call as many times as you like
    executor.Run(
        *inference_program, scope, feed_targets, fetch_targets, false);
    ```
  - **Pros**
    - Saves the time of repeatedly creating and destroying variables (roughly 1% to 12% of the total time of each `Run`)
    - The computation results of all operators remain accessible after execution
  - **Cons**
    - A large amount of memory stays occupied even when idle
    - Within the same `Scope`, variables with the same name share the same memory, which can easily cause unexpected errors


- **Do not create operators on every run [PR](https://github.com/PaddlePaddle/Paddle/pull/9630)**
  - Run the `inference_program`
    ```cpp
    // Call once
    auto ctx = executor.Prepare(*inference_program, 0);
    // Call as many times as you like if you have no need to change the inference_program
    executor.RunPreparedContext(ctx.get(), scope, feed_targets, fetch_targets);
    ```
  - **Pros**
    - Saves the time of repeatedly creating and destroying operators
  - **Cons**
    - Once the `inference_program` is modified, `ctx` must be recreated


- **[Sharing parameters across multiple threads](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/test_multi_thread_helper.h)**
  - Main thread
    - 1. Initialize the device
    - 2. Define the `place`, `executor`, and `scope`
    - 3. Load the model to obtain the `inference_program`
  - Worker threads
    - **Copy the `inference_program` to get a `copy_program`, and modify the `feed_holder_name` and `fetch_holder_name` of the `copy_program`:**
      ```cpp
      auto copy_program = std::unique_ptr<paddle::framework::ProgramDesc>(
                 new paddle::framework::ProgramDesc(*inference_program));
      std::string feed_holder_name = "feed_" + paddle::string::to_string(thread_id);
      std::string fetch_holder_name = "fetch_" + paddle::string::to_string(thread_id);
      copy_program->SetFeedHolderName(feed_holder_name);
      copy_program->SetFetchHolderName(fetch_holder_name);
      ```
    - 4. Get the `feed_target_names` and `fetch_target_names` of the `copy_program`
    - 5. Prepare the feed data and define `Tensor`s to fetch the results
    - 6. Run the `copy_program`
      ```cpp
      executor->Run(*copy_program, scope, feed_targets, fetch_targets, true, feed_holder_name, fetch_holder_name);
      ```
    - 7. Use the fetched data
  - Main thread
    - 8. Release the resources


- Basic concepts
  - Data:
    - [Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/concepts/tensor.md): an N-dimensional array whose elements can be of any type (int, float, double, etc.)
    - [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/concepts/lod_tensor.md): a Tensor carrying LoD (Level of Detail), i.e. sequence, information
    - [Scope](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md): holds the `Variable`s
  - Execution:
    - [Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/concepts/executor.md): a stateless executor, tied only to the device
    - Place
      - CPUPlace: a CPU device
      - CUDAPlace: a CUDA GPU device
  - Neural network representation:
    - [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/concepts/program.md)

  For a detailed introduction, see the [**Paddle Fluid Developer's Guide**](https://github.com/lcy-seso/learning_notes/blob/master/Fluid/developer's_guid_for_Fluid/Developer's_Guide_to_Paddle_Fluid.md).



## Inference Examples

  1. fit a line: [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_fit_a_line.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/book/test_inference_fit_a_line.cc)
  1. image classification: [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_image_classification.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/book/test_inference_image_classification.cc)
  1. label semantic roles: [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_label_semantic_roles.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/book/test_inference_label_semantic_roles.cc)
  1. recognize digits: [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_recognize_digits.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/book/test_inference_recognize_digits.cc)
  1. recommender system: [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_recommender_system.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/book/test_inference_recommender_system.cc)
  1. understand sentiment: [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_understand_sentiment.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/book/test_inference_understand_sentiment.cc)
  1. word2vec: [Python](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_word2vec.py), [C++](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/book/test_inference_word2vec.cc)


## Optimizing Inference Computation
- Use the Python inference optimization tool [inference_transpiler](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/inference_transpiler.py)
  ```python
  class InferenceTranspiler:
    def transpile(self, program, place, scope=None):
        ...
        if scope is None:
            scope = global_scope()
        ...
  ```
  - `InferenceTranspiler` modifies the `program` in place.
  - `InferenceTranspiler` modifies the values of parameters; make sure the parameters of the `program` are present in the `scope`.
- Supported optimizations
  - Fusing the computation of `batch_norm` ops
- [Usage example](https://github.com/Xreki/Xreki.github.io/blob/master/fluid/inference/inference_transpiler.py)
  ```python
  import paddle.fluid as fluid
  # NOTE: Applying the inference transpiler will change the inference_program.
  t = fluid.InferenceTranspiler()
  t.transpile(inference_program, place, inference_scope)
  ```




## Optimizing Memory Usage
- Use the Python memory optimization tool [memory_optimization_transpiler](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/memory_optimization_transpiler.py)
  ```python
  fluid.memory_optimize(inference_program)
  ```
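
  A minimal sketch of how this might fit into an inference flow (assuming `inference_program` was saved as in the earlier Python example; `fluid.memory_optimize` rewrites the program in place, so apply it before running):

  ```python
  import numpy
  import paddle.fluid as fluid

  place = fluid.CPUPlace()
  exe = fluid.Executor(place)
  [inference_program, feed_target_names, fetch_targets] = (
      fluid.io.load_inference_model(dirname="toy.inference.model",
                                    executor=exe,
                                    params_filename="__params__"))

  # Let variables with non-overlapping lifetimes reuse the same memory.
  fluid.memory_optimize(inference_program)

  tensor_img = numpy.random.rand(1, 1, 28, 28).astype("float32")
  results = exe.run(inference_program,
                    feed={feed_target_names[0]: tensor_img},
                    fetch_list=fetch_targets)
  ```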