Unverified commit 10e73342, authored by Tingquan Gao, committed by GitHub

Add MKL-DNN (#398)

add mkldnn config

Parent d3168835
@@ -3,7 +3,8 @@
     "clas_system": {
         "init_args": {
             "version": "1.0.0",
-            "use_gpu": true
+            "use_gpu": true,
+            "enable_mkldnn": false
         },
         "predict_args": {
         }
...
@@ -37,13 +37,15 @@ from deploy.hubserving.clas.params import read_params
     author_email="paddle-dev@baidu.com",
     type="cv/class")
 class ClasSystem(hub.Module):
-    def _initialize(self, use_gpu=None):
+    def _initialize(self, use_gpu=None, enable_mkldnn=None):
         """
         initialize with the necessary elements
         """
         cfg = read_params()
         if use_gpu is not None:
             cfg.use_gpu = use_gpu
+        if enable_mkldnn is not None:
+            cfg.enable_mkldnn = enable_mkldnn
         cfg.hubserving = True
         cfg.enable_benchmark = False
         self.args = cfg
@@ -59,6 +61,7 @@ class ClasSystem(hub.Module):
             )
         else:
             print("Use CPU")
+            if enable_mkldnn: print("Enable MKL-DNN")

     def read_images(self, paths=[]):
         images = []
...
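For context, a minimal usage sketch of the extended interface. This is hedged: it assumes PaddleHub 1.x behavior, where extra keyword arguments to `hub.Module` are forwarded to `_initialize`, and the `predict` call signature is assumed from the surrounding hubserving code rather than shown in this commit.

```python
# Hedged sketch: load the serving module on CPU with MKL-DNN enabled.
# Assumes PaddleHub 1.x kwarg forwarding to _initialize; the predict
# signature and the sample image path are assumptions.
import paddlehub as hub

clas = hub.Module(name="clas_system", use_gpu=False, enable_mkldnn=True)
result = clas.predict(paths=["./docs/images/whl/demo.jpg"])  # assumed image path
print(result)
```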
@@ -28,6 +28,7 @@ def read_params():
     cfg.params_file = "./inference/cls_infer.pdiparams"
     cfg.batch_size = 1
     cfg.use_gpu = False
+    cfg.enable_mkldnn = False
     cfg.ir_optim = True
     cfg.gpu_mem = 8000
     cfg.use_fp16 = False
...
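The new default can also be overridden from code; a small sketch mirroring what `_initialize` does above (the import path is the one used in `module.py`):

```python
# Hedged sketch: read the default config and flip the new flag in code,
# mirroring what _initialize does when enable_mkldnn is passed in.
from deploy.hubserving.clas.params import read_params

cfg = read_params()
cfg.enable_mkldnn = True  # the default added by this commit is False
```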
@@ -76,7 +76,8 @@ $ hub serving start --modules Module1==Version1 \
     "clas_system": {
         "init_args": {
             "version": "1.0.0",
-            "use_gpu": true
+            "use_gpu": true,
+            "enable_mkldnn": false
         },
         "predict_args": {
         }
@@ -88,13 +89,16 @@ $ hub serving start --modules Module1==Version1 \
     }
 }
 ```
-- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them, **when `use_gpu` is `true`, the service is started with the GPU**.
+- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them:
+  - when `use_gpu` is `true`, the service is started with the GPU;
+  - when `enable_mkldnn` is `true`, MKL-DNN is used to accelerate inference.
 - The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.

 **Note:**
 - When the service is started with a configuration file, other parameters are ignored.
 - If you use GPU prediction (i.e. `use_gpu` is set to `true`), you need to set the CUDA_VISIBLE_DEVICES environment variable before starting the service, e.g. ```export CUDA_VISIBLE_DEVICES=0```; otherwise it does not need to be set.
 - **`use_gpu` and `use_multiprocess` cannot both be `true` at the same time.**
+- **When `use_gpu` and `enable_mkldnn` are both `true`, `enable_mkldnn` is ignored and the GPU is used.**

 For example, to start the series service on GPU card 3:
 ```shell
...
@@ -78,7 +78,8 @@ Wherein, the format of `config.json` is as follows:
     "clas_system": {
         "init_args": {
             "version": "1.0.0",
-            "use_gpu": true
+            "use_gpu": true,
+            "enable_mkldnn": false
         },
         "predict_args": {
         }
@@ -89,13 +90,16 @@ Wherein, the format of `config.json` is as follows:
         "workers": 2
     }
 }
 ```
-- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them, **when `use_gpu` is `true`, it means that the GPU is used to start the service**.
+- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them:
+  - when `use_gpu` is `true`, the GPU is used to start the service;
+  - when `enable_mkldnn` is `true`, MKL-DNN is used to accelerate inference.
 - The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.

 **Note:**
 - When using the configuration file to start the service, other parameters will be ignored.
 - If you use GPU prediction (i.e. `use_gpu` is set to `true`), you need to set the environment variable CUDA_VISIBLE_DEVICES before starting the service, e.g. ```export CUDA_VISIBLE_DEVICES=0```; otherwise you do not need to set it.
 - **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.**
+- **When `use_gpu` and `enable_mkldnn` are both set to `true`, the GPU is used and `enable_mkldnn` is ignored.**

 For example, use GPU card No. 3 to start the 2-stage series service:
 ```shell
...
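Once the service is up, it can be queried over HTTP. A hedged sketch follows: the `/predict/clas_system` endpoint pattern, the default port 8866, and the base64 payload layout are assumptions based on general PaddleHub serving conventions, not something this commit specifies.

```python
# Hedged sketch: query the running clas_system service.
# Endpoint pattern, port, and payload layout are assumed from
# PaddleHub serving conventions and may differ in practice.
import base64
import requests

with open("test.jpg", "rb") as f:  # test.jpg is an assumed sample image
    img_b64 = base64.b64encode(f.read()).decode("utf8")

resp = requests.post("http://127.0.0.1:8866/predict/clas_system",
                     json={"images": [img_b64]})
print(resp.json())
```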
@@ -258,6 +258,8 @@ Among them:
 + `model_file`: Model file path, such as `./MobileNetV3_large_x1_0/cls_infer.pdmodel`;
 + `params_file`: Weight file path, such as `./MobileNetV3_large_x1_0/cls_infer.pdiparams`;
 + `use_tensorrt`: Whether to use TensorRT, `True` by default;
-+ `use_gpu`: Whether to use the GPU, `True` by default.
++ `use_gpu`: Whether to use the GPU, `True` by default;
++ `enable_mkldnn`: Whether to use `MKL-DNN`, `False` by default. When `use_gpu` and `enable_mkldnn` are both set to `True`, the GPU is used and `enable_mkldnn` is ignored.

 If you want to evaluate the speed of the model, it is recommended to use [predict.py](../../../tools/infer/predict.py) and enable TensorRT to accelerate.
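A hedged invocation example of the options listed above: the `--image_file` flag and the sample image path are assumptions, while the model/params paths and the other flags follow the documentation in this hunk.

```python
# Hedged sketch: run predict.py on CPU with MKL-DNN enabled.
# --image_file and the image path are assumptions; the remaining
# flags mirror the parameter list documented above.
import subprocess

subprocess.run([
    "python", "tools/infer/predict.py",
    "--image_file", "./docs/images/whl/demo.jpg",  # assumed sample image
    "--model_file", "./MobileNetV3_large_x1_0/cls_infer.pdmodel",
    "--params_file", "./MobileNetV3_large_x1_0/cls_infer.pdiparams",
    "--use_gpu", "False",
    "--use_tensorrt", "False",
    "--enable_mkldnn", "True",
], check=True)
```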
@@ -269,6 +269,7 @@ python tools/infer/predict.py \
 + `model_file`: path of the model structure file, e.g. `./inference/cls_infer.pdmodel`
 + `params_file`: path of the model weights file, e.g. `./inference/cls_infer.pdiparams`
 + `use_tensorrt`: whether to use the TensorRT inference engine, default: `True`
 + `use_gpu`: whether to use the GPU for inference, default: `True`
++ `enable_mkldnn`: whether to enable `MKL-DNN` acceleration, default: `False`. Note that when `enable_mkldnn` and `use_gpu` are both `True`, `enable_mkldnn` is ignored and the GPU is used.

 * If you want to evaluate the model speed, it is recommended to use this script (`tools/infer/predict.py`) and enable TensorRT to accelerate prediction.
@@ -30,6 +30,10 @@ def create_paddle_predictor(args):
         config.enable_use_gpu(args.gpu_mem, 0)
     else:
         config.disable_gpu()
+        if args.enable_mkldnn:
+            # cache 10 different shapes for mkldnn to avoid memory leak
+            config.set_mkldnn_cache_capacity(10)
+            config.enable_mkldnn()

     config.disable_glog_info()
     config.switch_ir_optim(args.ir_optim)  # default true
...
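Standalone, the same MKL-DNN setup looks like the sketch below. It assumes the `paddle.inference` API of Paddle 2.x and the exported model paths from `params.py` above; the helper patched in this commit may be built on an older config class instead.

```python
# Hedged sketch: build a CPU predictor with MKL-DNN, mirroring the branch above.
# Assumes the Paddle 2.x paddle.inference API and an exported model on disk.
from paddle.inference import Config, create_predictor

config = Config("./inference/cls_infer.pdmodel", "./inference/cls_infer.pdiparams")
config.disable_gpu()
config.set_mkldnn_cache_capacity(10)  # bound the shape cache to limit memory growth
config.enable_mkldnn()
predictor = create_predictor(config)
```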
@@ -41,6 +41,7 @@ def parse_args():
     parser.add_argument("--gpu_mem", type=int, default=8000)
     parser.add_argument("--enable_benchmark", type=str2bool, default=False)
     parser.add_argument("--top_k", type=int, default=1)
+    parser.add_argument("--enable_mkldnn", type=str2bool, default=False)
     parser.add_argument("--hubserving", type=str2bool, default=False)

     # params for infer
...
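The new flag uses the `str2bool` helper already applied to the neighboring arguments: argparse's `type=bool` would treat any non-empty string, including "False", as `True`. A sketch of what such a helper typically looks like; the actual definition lives elsewhere in the file and may differ.

```python
# Hedged sketch of a typical str2bool helper; the real one may differ.
def str2bool(v):
    # argparse passes the raw command-line string; map truthy spellings to True
    return str(v).lower() in ("true", "t", "yes", "y", "1")
```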