update picodet ncnn and mnn demo (#5721)

a6189b12 · Guanghua Yu · GitHub · 23f982e0
14 changed files
--- a/configs/picodet/README.md
+++ b/configs/picodet/README.md
@@ -226,11 +226,16 @@ paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \
 ### Deployment
- OpenVINO demo [Python](../../deploy/third_engine/demo_openvino/python)
+| Inference library | Python | C++  | Prediction with post-processing |
- [PaddleLite C++ demo](../../deploy/lite)
+| :-------- | :--------: | :---------------------: | :----------------: |
- [Android demo(Paddle Lite)](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo)
+| OpenVINO | [Python](../../deploy/third_engine/demo_openvino/python) | [C++](../../deploy/third_engine/demo_openvino) (post-processing in development) |  ✔︎ |
- ONNXRuntime demo [Python](../../deploy/third_engine/demo_onnxruntime)
+| Paddle Lite |  -    |  [C++](../../deploy/lite) | ✔︎ |
- PaddleInference demo [Python](../../deploy/python) & [C++](../../deploy/cpp)
+| Android Demo |  -  |  [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) | ✔︎ |
+| PaddleInference | [Python](../../deploy/python) |  [C++](../../deploy/cpp) | ✔︎ |
+| ONNXRuntime  | [Python](../../deploy/third_engine/demo_onnxruntime) | Coming soon | ✔︎ |
+| NCNN |  Coming soon  | [C++](../../deploy/third_engine/demo_ncnn) | ✘ |
+| MNN  | Coming soon | [C++](../../deploy/third_engine/demo_mnn) |  ✘ |
 Android demo visualization:

--- a/configs/picodet/README_en.md
+++ b/configs/picodet/README_en.md
@@ -222,11 +222,15 @@ paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \
 ### Deploy
- OpenVINO demo [Python](../../deploy/third_engine/demo_openvino/python)
+| Infer Engine     | Python | C++  | Predict With Postprocess |
- [PaddleLite C++ demo](../../deploy/lite)
+| :-------- | :--------: | :---------------------: | :----------------: |
- [Android demo(Paddle Lite)](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo)
+| OpenVINO | [Python](../../deploy/third_engine/demo_openvino/python) | [C++](../../deploy/third_engine/demo_openvino) (post-processing coming soon) |  ✔︎ |
- ONNXRuntime demo [Python](../../deploy/third_engine/demo_onnxruntime)
+| Paddle Lite |  -    |  [C++](../../deploy/lite) | ✔︎ |
- PaddleInference demo [Python](../../deploy/python) & [C++](../../deploy/cpp)
+| Android Demo |  -  |  [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) | ✔︎ |
+| PaddleInference | [Python](../../deploy/python) |  [C++](../../deploy/cpp) | ✔︎ |
+| ONNXRuntime  | [Python](../../deploy/third_engine/demo_onnxruntime) | Coming soon | ✔︎ |
+| NCNN |  Coming soon  | [C++](../../deploy/third_engine/demo_ncnn) | ✘ |
+| MNN  | Coming soon | [C++](../../deploy/third_engine/demo_mnn) |  ✘ |
 Android demo visualization:

--- a/deploy/third_engine/demo_mnn/CMakeLists.txt
+++ b/deploy/third_engine/demo_mnn/CMakeLists.txt
@@ -2,13 +2,14 @@ cmake_minimum_required(VERSION 3.9)
 project(picodet-mnn)
 set(CMAKE_CXX_STANDARD 17)
+set(MNN_DIR "${CMAKE_SOURCE_DIR}/mnn")
 # find_package(OpenCV REQUIRED PATHS "/work/dependence/opencv/opencv-3.4.3/build")
 find_package(OpenCV REQUIRED)
 include_directories(
-        /path/to/MNN/include/MNN
+        ${MNN_DIR}/include
-        /path/to/MNN/include
+        ${MNN_DIR}/include/MNN
-        .
+        ${CMAKE_SOURCE_DIR}
 )
 link_directories(mnn/lib)

--- a/deploy/third_engine/demo_mnn/README.md
+++ b/deploy/third_engine/demo_mnn/README.md
 # PicoDet MNN Demo
-This fold provides PicoDet inference code using
+This demo provides PicoDet inference code built on [Alibaba's MNN framework](https://github.com/alibaba/MNN).
-[Alibaba's MNN framework](https://github.com/alibaba/MNN). Most of the implements in
-this fold are same as *demo_ncnn*.
-## Install MNN
+## C++ Demo
-### Python library
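+- Step 1: Build the MNN inference library by following the [official MNN build documentation](https://www.yuque.com/mnn/en/build_linux).
+- Step 2: Build or download the OpenCV library (see the OpenCV website). For convenience, if your environment is gcc 8.2 on x86, you can download the following prebuilt library directly: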
-Just run:
+```shell
+wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
-``` shell
+tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
-pip install MNN
 ```
-### C++ library
+- Step 3: Prepare the model
-Please follow the [official document](https://www.yuque.com/mnn/en/build_linux) to build MNN engine.
- Create picodet_m_416_coco.onnx
    ```shell
-    modelName=picodet_m_416_coco
+    modelName=picodet_s_320_coco_lcnet
-    # export model
+    # Export the inference model
    python tools/export_model.py \
            -c configs/picodet/${modelName}.yml \
            -o weights=${modelName}.pdparams \
            --output_dir=inference_model
-    # convert to onnx
+    # Convert to ONNX
    paddle2onnx --model_dir inference_model/${modelName} \
            --model_filename model.pdmodel  \
            --params_filename model.pdiparams \
            --opset_version 11 \
            --save_file ${modelName}.onnx
-    # onnxsim
+    # Simplify the model
    python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx
+    # Convert the model to MNN format
+    python -m MNN.tools.mnnconvert -f ONNX --modelFile ${modelName}_processed.onnx --MNNModel picodet_s_320_lcnet.mnn
    ```
+For a quick test, you can download a converted model directly: [picodet_s_320_lcnet.mnn](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet.mnn) (without post-processing).
- Convert model
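+**Note:** MNN's Matmul operator computes incorrectly when its input shapes do not match, so the demo with post-processing is still being updated and will be released soon.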
-   ``` shell
-   python -m MNN.tools.mnnconvert -f ONNX --modelFile picodet-416.onnx --MNNModel picodet-416.mnn
-   ```
-Here are converted model [download link](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416.mnn).
-## Build
+## Build the executable
-The python code *demo_mnn.py* can run directly and independently without main PicoDet repo.
-`PicoDetONNX` and `PicoDetTorch` are two classes used to check the similarity of MNN inference results
-with ONNX model and Pytorch model. They can be remove with no side effects.
-For C++ code, replace `libMNN.so` under *./mnn/lib* with the one you just compiled, modify OpenCV path and MNN path at CMake file,
-and run
+- Step 1: Import the library files
+```
+mkdir mnn && cd mnn && mkdir lib
+cp /path/to/MNN/build/libMNN.so lib/
+cp -r /path/to/MNN/include .
+cd ..
+```
+- Step 2: Update the OpenCV and MNN paths in CMakeLists.txt
+- Step 3: Build
 ``` shell
 mkdir build && cd build
 cmake ..
 make
 ```
+The build succeeded if the `picodet-mnn` executable appears in the build directory.
-Note that a flag at `main.cpp` is used to control whether to show the detection result or save it into a fold.
+## Run
-``` c++
-#define __SAVE_RESULT__ // if defined save drawed results to ../results, else show it in windows
-```
-## Run
-### Python
-`demo_mnn.py` provide an inference class `PicoDetMNN` that combines preprocess, post process, visualization.
-Besides it can be used in command line with the form:
+First, copy the sample images and create a directory for the prediction results:
 ```shell
-demo_mnn.py [-h] [--model_path MODEL_PATH] [--cfg_path CFG_PATH]
+cp -r ../demo_onnxruntime/imgs .
-    [--img_fold IMG_FOLD] [--result_fold RESULT_FOLD]
+cd build
-    [--input_shape INPUT_SHAPE INPUT_SHAPE]
+mkdir ../results
-    [--backend {MNN,ONNX,torch}]
 ```
-For example:
+- Run prediction on a single image
 ``` shell
-# run MNN 416 model
+./picodet-mnn 0 ../picodet_s_320_lcnet.mnn 320 320 ../imgs/dog.jpg
-python ./demo_mnn.py --model_path ../model/picodet-416.mnn --img_fold ../imgs --result_fold ../results
-# run MNN 320 model
-python ./demo_mnn.py --model_path ../model/picodet-320.mnn --input_shape 320 320 --backend MNN
-# run onnx model
-python ./demo_mnn.py --model_path ../model/sim.onnx --backend ONNX
 ```
-### C++
+- Speed benchmark
-C++ inference interface is same with NCNN code, to detect images in a fold, run:
 ``` shell
-./picodet-mnn "1" "../imgs/test.jpg"
+./picodet-mnn 1 ../picodet_s_320_lcnet.mnn 320 320
 ```
-For speed benchmark
+## FAQ
-``` shell
+- Prediction accuracy is wrong:
-./picodet-mnn "3" "0"
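+First check that the model input shape and the model output names match what the demo expects. The output names of the enhanced PicoDet model without post-processing are: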
+```shell
+# classification branch  |  detection branch
+{"transpose_0.tmp_0", "transpose_1.tmp_0"},
+{"transpose_2.tmp_0", "transpose_3.tmp_0"},
+{"transpose_4.tmp_0", "transpose_5.tmp_0"},
+{"transpose_6.tmp_0", "transpose_7.tmp_0"},
 ```
+Use [netron](https://netron.app) to inspect the actual names, then update the `non_postprocess_heads_info` array in `picodet_mnn.hpp` accordingly.
 ## Reference
 [MNN](https://github.com/alibaba/MNN)
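
Review note on the run commands in the README above: the binary is invoked as `./picodet-mnn 1 <model> <height> <width>` for the benchmark and as `./picodet-mnn 0 <model> <height> <width> <image>` for a single image. A minimal sketch of argument validation for those two forms might look like the following. `DemoArgs` and `ParseArgs` are hypothetical names that do not appear in the commit, and the defaults of 320 are taken from `main.cpp`; this is an illustration under those assumptions, not the demo's actual code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical container for the parsed command line (not part of the demo).
struct DemoArgs {
  int mode = 0;            // 0: single-image demo, 1: benchmark
  std::string model_path;  // path to the .mnn model file
  int height = 320;        // default input size, as in main.cpp
  int width = 320;
  std::string image_path;  // only filled in when mode == 0
};

// Accepts the two invocation forms shown in the README:
//   ./picodet-mnn 1 model.mnn [height width]
//   ./picodet-mnn 0 model.mnn height width image
// Returns false instead of reading past the end of argv.
bool ParseArgs(int argc, const char *const *argv, DemoArgs *out) {
  assert(argv != nullptr && out != nullptr);
  if (argc < 3) return false;  // mode and model path are mandatory
  DemoArgs args;
  args.mode = std::atoi(argv[1]);
  args.model_path = argv[2];
  if (argc >= 5) {  // argv[3] and argv[4] exist only from argc == 5 onwards
    args.height = std::atoi(argv[3]);
    args.width = std::atoi(argv[4]);
  }
  if (args.mode == 0) {
    if (argc != 6) return false;  // the image path is argv[5]
    args.image_path = argv[5];
  } else if (args.mode != 1) {
    return false;  // unknown mode
  }
  *out = args;
  return true;
}
```

The point of the sketch is that every `argv[i]` read is guarded by a matching `argc` check, so a missing image path is reported as an error rather than dereferenced.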
--- a/deploy/third_engine/demo_mnn/main.cpp
+++ b/deploy/third_engine/demo_mnn/main.cpp
@@ -11,7 +11,6 @@
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
-// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn
 #include "picodet_mnn.hpp"
 #include <iostream>
@@ -19,354 +18,186 @@
 #include <opencv2/highgui/highgui.hpp>
 #include <opencv2/imgproc/imgproc.hpp>
-#define __SAVE_RESULT__ // if defined save drawed results to ../results, else show it in windows
+#define __SAVE_RESULT__ // if defined, save drawn results to ../results;
+                        // otherwise show them in a window
 struct object_rect {
-    int x;
+  int x;
-    int y;
+  int y;
-    int width;
+  int width;
-    int height;
+  int height;
 };
-int resize_uniform(cv::Mat& src, cv::Mat& dst, cv::Size dst_size, object_rect& effect_area)
+std::vector<int> GenerateColorMap(int num_class) {
-{
+  auto colormap = std::vector<int>(3 * num_class, 0);
-    int w = src.cols;
+  for (int i = 0; i < num_class; ++i) {
-    int h = src.rows;
+    int j = 0;
-    int dst_w = dst_size.width;
+    int lab = i;
-    int dst_h = dst_size.height;
+    while (lab) {
-    dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0));
+      colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j));
+      colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j));
-    float ratio_src = w * 1.0 / h;
+      colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j));
-    float ratio_dst = dst_w * 1.0 / dst_h;
+      ++j;
+      lab >>= 3;
-    int tmp_w = 0;
-    int tmp_h = 0;
-    if (ratio_src > ratio_dst) {
-        tmp_w = dst_w;
-        tmp_h = floor((dst_w * 1.0 / w) * h);
-    }
-    else if (ratio_src < ratio_dst) {
-        tmp_h = dst_h;
-        tmp_w = floor((dst_h * 1.0 / h) * w);
-    }
-    else {
-        cv::resize(src, dst, dst_size);
-        effect_area.x = 0;
-        effect_area.y = 0;
-        effect_area.width = dst_w;
-        effect_area.height = dst_h;
-        return 0;
-    }
-    cv::Mat tmp;
-    cv::resize(src, tmp, cv::Size(tmp_w, tmp_h));
-    if (tmp_w != dst_w) {
-        int index_w = floor((dst_w - tmp_w) / 2.0);
-        for (int i = 0; i < dst_h; i++) {
-            memcpy(dst.data + i * dst_w * 3 + index_w * 3, tmp.data + i * tmp_w * 3, tmp_w * 3);
-        }
-        effect_area.x = index_w;
-        effect_area.y = 0;
-        effect_area.width = tmp_w;
-        effect_area.height = tmp_h;
-    }
-    else if (tmp_h != dst_h) {
-        int index_h = floor((dst_h - tmp_h) / 2.0);
-        memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3);
-        effect_area.x = 0;
-        effect_area.y = index_h;
-        effect_area.width = tmp_w;
-        effect_area.height = tmp_h;
-    }
-    else {
-        printf("error\n");
    }
-    return 0;
+  }
+  return colormap;
 }
-const int color_list[80][3] =
+void draw_bboxes(const cv::Mat &im, const std::vector<BoxInfo> &bboxes,
-{
+                 std::string save_path = "None") {
-    {216 , 82 , 24},
+  static const char *class_names[] = {
-    {236 ,176 , 31},
+      "person",        "bicycle",      "car",
-    {125 , 46 ,141},
+      "motorcycle",    "airplane",     "bus",
-    {118 ,171 , 47},
+      "train",         "truck",        "boat",
-    { 76 ,189 ,237},
+      "traffic light", "fire hydrant", "stop sign",
-    {238 , 19 , 46},
+      "parking meter", "bench",        "bird",
-    { 76 , 76 , 76},
+      "cat",           "dog",          "horse",
-    {153 ,153 ,153},
+      "sheep",         "cow",          "elephant",
-    {255 ,  0 ,  0},
+      "bear",          "zebra",        "giraffe",
-    {255 ,127 ,  0},
+      "backpack",      "umbrella",     "handbag",
-    {190 ,190 ,  0},
+      "tie",           "suitcase",     "frisbee",
-    {  0 ,255 ,  0},
+      "skis",          "snowboard",    "sports ball",
-    {  0 ,  0 ,255},
+      "kite",          "baseball bat", "baseball glove",
-    {170 ,  0 ,255},
+      "skateboard",    "surfboard",    "tennis racket",
-    { 84 , 84 ,  0},
+      "bottle",        "wine glass",   "cup",
-    { 84 ,170 ,  0},
+      "fork",          "knife",        "spoon",
-    { 84 ,255 ,  0},
+      "bowl",          "banana",       "apple",
-    {170 , 84 ,  0},
+      "sandwich",      "orange",       "broccoli",
-    {170 ,170 ,  0},
+      "carrot",        "hot dog",      "pizza",
-    {170 ,255 ,  0},
+      "donut",         "cake",         "chair",
-    {255 , 84 ,  0},
+      "couch",         "potted plant", "bed",
-    {255 ,170 ,  0},
+      "dining table",  "toilet",       "tv",
-    {255 ,255 ,  0},
+      "laptop",        "mouse",        "remote",
-    {  0 , 84 ,127},
+      "keyboard",      "cell phone",   "microwave",
-    {  0 ,170 ,127},
+      "oven",          "toaster",      "sink",
-    {  0 ,255 ,127},
+      "refrigerator",  "book",         "clock",
-    { 84 ,  0 ,127},
+      "vase",          "scissors",     "teddy bear",
-    { 84 , 84 ,127},
+      "hair drier",    "toothbrush"};
-    { 84 ,170 ,127},
-    { 84 ,255 ,127},
+  cv::Mat image = im.clone();
-    {170 ,  0 ,127},
+  int src_w = image.cols;
-    {170 , 84 ,127},
+  int src_h = image.rows;
-    {170 ,170 ,127},
+  int thickness = 2;
-    {170 ,255 ,127},
+  auto colormap =
+      GenerateColorMap(sizeof(class_names) / sizeof(class_names[0]));
-    {255 ,  0 ,127},
-    {255 , 84 ,127},
+  for (size_t i = 0; i < bboxes.size(); i++) {
-    {255 ,170 ,127},
+    const BoxInfo &bbox = bboxes[i];
-    {255 ,255 ,127},
+    std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". "
-    {  0 , 84 ,255},
+              << bbox.y2 << ". " << std::endl;
-    {  0 ,170 ,255},
+    int c1 = colormap[3 * bbox.label + 0];
-    {  0 ,255 ,255},
+    int c2 = colormap[3 * bbox.label + 1];
-    { 84 ,  0 ,255},
+    int c3 = colormap[3 * bbox.label + 2];
-    { 84 , 84 ,255},
+    cv::Scalar color = cv::Scalar(c1, c2, c3);
-    { 84 ,170 ,255},
+    // cv::Scalar color = cv::Scalar(0, 0, 255);
-    { 84 ,255 ,255},
+    cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1),
-    {170 ,  0 ,255},
+                                  cv::Point(bbox.x2, bbox.y2)),
-    {170 , 84 ,255},
+                  color, 1, cv::LINE_AA);
-    {170 ,170 ,255},
-    {170 ,255 ,255},
+    char text[256];
-    {255 ,  0 ,255},
+    snprintf(text, sizeof(text), "%s %.1f%%", class_names[bbox.label],
+             bbox.score * 100);
-    {255 , 84 ,255},
-    {255 ,170 ,255},
+    int baseLine = 0;
-    { 42 ,  0 ,  0},
+    cv::Size label_size =
-    { 84 ,  0 ,  0},
+        cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);
-    {127 ,  0 ,  0},
-    {170 ,  0 ,  0},
+    int x = bbox.x1;
-    {212 ,  0 ,  0},
+    int y = bbox.y1 - label_size.height - baseLine;
-    {255 ,  0 ,  0},
+    if (y < 0)
-    {  0 , 42 ,  0},
+      y = 0;
-    {  0 , 84 ,  0},
+    if (x + label_size.width > image.cols)
-    {  0 ,127 ,  0},
+      x = image.cols - label_size.width;
-    {  0 ,170 ,  0},
-    {  0 ,212 ,  0},
+    cv::rectangle(image, cv::Rect(cv::Point(x, y),
-    {  0 ,255 ,  0},
+                                  cv::Size(label_size.width,
-    {  0 ,  0 , 42},
+                                           label_size.height + baseLine)),
-    {  0 ,  0 , 84},
+                  color, -1);
-    {  0 ,  0 ,127},
-    {  0 ,  0 ,170},
+    cv::putText(image, text, cv::Point(x, y + label_size.height),
-    {  0 ,  0 ,212},
+                cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1,
-    {  0 ,  0 ,255},
+                cv::LINE_AA);
-    {  0 ,  0 ,  0},
+  }
-    { 36 , 36 , 36},
-    { 72 , 72 , 72},
+  if (save_path == "None") {
-    {109 ,109 ,109},
+    cv::imshow("image", image);
-    {145 ,145 ,145},
+  } else {
-    {182 ,182 ,182},
+    cv::imwrite(save_path, image);
-    {218 ,218 ,218},
+    std::cout << save_path << std::endl;
-    {  0 ,113 ,188},
+  }
-    { 80 ,182 ,188},
-    {127 ,127 ,  0},
-};
-void draw_bboxes(const cv::Mat& bgr, const std::vector<BoxInfo>& bboxes, object_rect effect_roi, std::string save_path="None")
-{
-    static const char* class_names[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus",
-                                        "train", "truck", "boat", "traffic light", "fire hydrant",
-                                        "stop sign", "parking meter", "bench", "bird", "cat", "dog",
-                                        "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
-                                        "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
-                                        "skis", "snowboard", "sports ball", "kite", "baseball bat",
-                                        "baseball glove", "skateboard", "surfboard", "tennis racket",
-                                        "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl",
-                                        "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
-                                        "hot dog", "pizza", "donut", "cake", "chair", "couch",
-                                        "potted plant", "bed", "dining table", "toilet", "tv", "laptop",
-                                        "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
-                                        "toaster", "sink", "refrigerator", "book", "clock", "vase",
-                                        "scissors", "teddy bear", "hair drier", "toothbrush"
-    };
-    cv::Mat image = bgr.clone();
-    int src_w = image.cols;
-    int src_h = image.rows;
-    int dst_w = effect_roi.width;
-    int dst_h = effect_roi.height;
-    float width_ratio = (float)src_w / (float)dst_w;
-    float height_ratio = (float)src_h / (float)dst_h;
-    for (size_t i = 0; i < bboxes.size(); i++)
-    {
-        const BoxInfo& bbox = bboxes[i];
-        cv::Scalar color = cv::Scalar(color_list[bbox.label][0], color_list[bbox.label][1], color_list[bbox.label][2]);
-        cv::rectangle(image, cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio, (bbox.y1 - effect_roi.y) * height_ratio),
-                                      cv::Point((bbox.x2 - effect_roi.x) * width_ratio, (bbox.y2 - effect_roi.y) * height_ratio)), color);
-        char text[256];
-        sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100);
-        int baseLine = 0;
-        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);
-        int x = (bbox.x1 - effect_roi.x) * width_ratio;
-        int y = (bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine;
-        if (y < 0)
-            y = 0;
-        if (x + label_size.width > image.cols)
-            x = image.cols - label_size.width;
-        cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
-            color, -1);
-        cv::putText(image, text, cv::Point(x, y + label_size.height),
-            cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255));
-    }
-    if (save_path == "None")
-    {
-        cv::imshow("image", image);
-    }
-    else
-    {
-        cv::imwrite(save_path, image);
-        std::cout << save_path << std::endl;
-    }
-}
-int image_demo(PicoDet &detector, const char* imagepath)
-{
-    std::vector<cv::String> filenames;
-    cv::glob(imagepath, filenames, false);
-    for (auto img_name : filenames)
-    {
-        cv::Mat image = cv::imread(img_name);
-        if (image.empty())
-        {
-            fprintf(stderr, "cv::imread %s failed\n", img_name.c_str());
-            return -1;
-        }
-        object_rect effect_roi;
-        cv::Mat resized_img;
-        resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
-        std::vector<BoxInfo> results;
-        detector.detect(resized_img, results);
-        #ifdef __SAVE_RESULT__
-            std::string save_path = img_name;
-            draw_bboxes(image, results, effect_roi, save_path.replace(3, 4, "results"));
-        #else
-            draw_bboxes(image, results, effect_roi);
-            cv::waitKey(0);
-        #endif
-    }
-    return 0;
 }
-int webcam_demo(PicoDet& detector, int cam_id)
+int image_demo(PicoDet &detector, const char *imagepath) {
-{
+  std::vector<cv::String> filenames;
-    cv::Mat image;
+  cv::glob(imagepath, filenames, false);
-    cv::VideoCapture cap(cam_id);
-    while (true)
+  for (auto img_name : filenames) {
-    {
+    cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR);
-        cap >> image;
+    if (image.empty()) {
-        object_rect effect_roi;
+      fprintf(stderr, "cv::imread %s failed\n", img_name.c_str());
-        cv::Mat resized_img;
+      return -1;
-        resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
-        std::vector<BoxInfo> results;
-        detector.detect(resized_img, results);
-        draw_bboxes(image, results, effect_roi);
-        cv::waitKey(1);
    }
-    return 0;
+    std::vector<BoxInfo> results;
+    detector.detect(image, results, false);
+    std::cout << "detect done." << std::endl;
+#ifdef __SAVE_RESULT__
+    std::string save_path = img_name;
+    draw_bboxes(image, results, save_path.replace(3, 4, "results"));
+#else
+    draw_bboxes(image, results);
+    cv::waitKey(0);
+#endif
+  }
+  return 0;
 }
-int video_demo(PicoDet& detector, const char* path)
+int benchmark(PicoDet &detector, int width, int height) {
-{
+  int loop_num = 100;
-    cv::Mat image;
+  int warm_up = 8;
-    cv::VideoCapture cap(path);
+  double time_min = DBL_MAX;
-    while (true)
+  double time_max = -DBL_MAX;
-    {
+  double time_avg = 0;
-        cap >> image;
+  cv::Mat image(height, width, CV_8UC3, cv::Scalar(1, 1, 1));
-        object_rect effect_roi;
+  for (int i = 0; i < warm_up + loop_num; i++) {
-        cv::Mat resized_img;
+    auto start = std::chrono::steady_clock::now();
-        resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
+    std::vector<BoxInfo> results;
-        std::vector<BoxInfo> results;
+    detector.detect(image, results, false);
-        detector.detect(resized_img, results);
+    auto end = std::chrono::steady_clock::now();
-        draw_bboxes(image, results, effect_roi);
-        cv::waitKey(1);
+    std::chrono::duration<double> elapsed = end - start;
+    double time = elapsed.count();
+    if (i >= warm_up) {
+      time_min = (std::min)(time_min, time);
+      time_max = (std::max)(time_max, time);
+      time_avg += time;
    }
-    return 0;
+  }
+  time_avg /= loop_num;
+  fprintf(stderr, "%20s  min = %7.2f  max = %7.2f  avg = %7.2f\n", "picodet",
+          time_min, time_max, time_avg);
+  return 0;
 }
-int benchmark(PicoDet& detector)
+int main(int argc, char **argv) {
-{
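+  if (argc < 3) {
+    std::cout << "Usage: ./picodet-mnn [mode] [model_path] [height width] "
+                 "[image]"
+              << std::endl;
+    return -1;
+  }
+  int mode = atoi(argv[1]);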
-    int loop_num = 100;
+  std::string model_path = argv[2];
-    int warm_up = 8;
+  int height = 320;
+  int width = 320;
-    double time_min = DBL_MAX;
+  if (argc >= 5) {
-    double time_max = -DBL_MAX;
+    height = atoi(argv[3]);
-    double time_avg = 0;
+    width = atoi(argv[4]);
-    cv::Mat image(320, 320, CV_8UC3, cv::Scalar(1, 1, 1));
+  }
-    for (int i = 0; i < warm_up + loop_num; i++)
+  PicoDet detector = PicoDet(model_path, width, height, 4, 0.45, 0.3);
-    {
+  if (mode == 1) {
-        auto start = std::chrono::steady_clock::now();
+    benchmark(detector, width, height);
-        std::vector<BoxInfo> results;
+  } else {
-        detector.detect(image, results);
+    if (argc != 6) {
-        auto end = std::chrono::steady_clock::now();
+      std::cout << "Must set image file, such as ./picodet-mnn 0 "
+                   "../picodet_s_320_lcnet.mnn 320 320 img.jpg"
-        std::chrono::duration<double> elapsed = end - start;
+                << std::endl;
+      return -1;
-        double time = elapsed.count();
-        if (i >= warm_up)
-        {
-            time_min = (std::min)(time_min, time);
-            time_max = (std::max)(time_max, time);
-            time_avg += time;
-        }
-    }
-    time_avg /= loop_num;
-    fprintf(stderr, "%20s  min = %7.2f  max = %7.2f  avg = %7.2f\n", "picodet", time_min, time_max, time_avg);
-    return 0;
-}
-int main(int argc, char** argv)
-{
-    if (argc != 3)
-    {
-        fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]);
-        return -1;
-    }
-    PicoDet detector = PicoDet("../weight/picodet-416.mnn", 416, 416, 4, 0.45, 0.3);
-    int mode = atoi(argv[1]);
-    switch (mode)
-    {
-    case 0:{
-        int cam_id = atoi(argv[2]);
-        webcam_demo(detector, cam_id);
-        break;
-        }
-    case 1:{
-        const char* images = argv[2];
-        image_demo(detector, images);
-        break;
-        }
-    case 2:{
-        const char* path = argv[2];
-        video_demo(detector, path);
-        break;
-        }
-    case 3:{
-        benchmark(detector);
-        break;
-        }
-    default:{
-        fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]);
-        break;
-        }
    }
+    const char *images = argv[5];
+    image_demo(detector, images);
+  }
 }
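
Review note on `GenerateColorMap` in `main.cpp`: the function appears to spread the bits of each label index across three color channels, filling each channel from its most significant bit downwards, so that neighbouring labels get clearly different colors. The standalone sketch below reproduces that scheme so it can be checked without OpenCV. `MakeColorMap` and `ArrayLength` are hypothetical names introduced here for illustration; `ArrayLength` is one possible way to pass the number of classes rather than the array's size in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements in a C array, as opposed to sizeof(array), which is the
// size in bytes.
template <typename T, std::size_t N>
constexpr std::size_t ArrayLength(const T (&)[N]) {
  return N;
}

// Returns 3 * num_class color components, three consecutive values per label.
// Each group of three label bits contributes one bit to each channel; the
// first group lands on bit 7, the next on bit 6, and so on.
// Assumes num_class is small enough that at most 8 groups are needed.
std::vector<int> MakeColorMap(int num_class) {
  assert(num_class >= 0);
  std::vector<int> colormap(3 * static_cast<std::size_t>(num_class), 0);
  for (int i = 0; i < num_class; ++i) {
    int lab = i;
    for (int j = 0; lab != 0; ++j, lab >>= 3) {
      colormap[i * 3 + 0] |= ((lab >> 0) & 1) << (7 - j);
      colormap[i * 3 + 1] |= ((lab >> 1) & 1) << (7 - j);
      colormap[i * 3 + 2] |= ((lab >> 2) & 1) << (7 - j);
    }
  }
  return colormap;
}
```

For example, `MakeColorMap(3)` yields the triples (0, 0, 0), (128, 0, 0) and (0, 128, 0) for labels 0, 1 and 2, and label 8 maps to (64, 0, 0) because its only set bit falls in the second group of three.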
--- a/deploy/third_engine/demo_mnn/picodet_mnn.cpp
+++ b/deploy/third_engine/demo_mnn/picodet_mnn.cpp
@@ -44,7 +44,8 @@ PicoDet::~PicoDet() {
  PicoDet_interpreter->releaseSession(PicoDet_session);
 }
-int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list) {
+int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list,
+                    bool has_postprocess) {
  if (raw_image.empty()) {
    std::cout << "image is empty, please check!" << std::endl;
    return -1;
@@ -70,22 +71,57 @@ int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list) {
  std::vector<std::vector<BoxInfo>> results;
  results.resize(num_class);
-  for (const auto &head_info : heads_info) {
+  if (has_postprocess) {
-    MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput(
+    auto bbox_out_tensor = PicoDet_interpreter->getSessionOutput(
-        PicoDet_session, head_info.cls_layer.c_str());
+        PicoDet_session, nms_heads_info[0].c_str());
-    MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput(
+    auto class_out_tensor = PicoDet_interpreter->getSessionOutput(
-        PicoDet_session, head_info.dis_layer.c_str());
+        PicoDet_session, nms_heads_info[1].c_str());
+    // bbox branch
-    MNN::Tensor tensor_scores_host(tensor_scores,
+    auto tensor_bbox_host =
-                                   tensor_scores->getDimensionType());
+        new MNN::Tensor(bbox_out_tensor, MNN::Tensor::CAFFE);
-    tensor_scores->copyToHostTensor(&tensor_scores_host);
+    bbox_out_tensor->copyToHostTensor(tensor_bbox_host);
+    auto bbox_output_shape = tensor_bbox_host->shape();
-    MNN::Tensor tensor_boxes_host(tensor_boxes,
+    int output_size = 1;
-                                  tensor_boxes->getDimensionType());
+    for (int j = 0; j < bbox_output_shape.size(); ++j) {
-    tensor_boxes->copyToHostTensor(&tensor_boxes_host);
+      output_size *= bbox_output_shape[j];
+    }
-    decode_infer(&tensor_scores_host, &tensor_boxes_host, head_info.stride,
+    std::cout << "output_size:" << output_size << std::endl;
-                 score_threshold, results);
+    bbox_output_data_.resize(output_size);
+    std::copy_n(tensor_bbox_host->host<float>(), output_size,
+                bbox_output_data_.data());
+    delete tensor_bbox_host;
+    // class branch
+    auto tensor_class_host =
+        new MNN::Tensor(class_out_tensor, MNN::Tensor::CAFFE);
+    class_out_tensor->copyToHostTensor(tensor_class_host);
+    auto class_output_shape = tensor_class_host->shape();
+    output_size = 1;
+    for (int j = 0; j < class_output_shape.size(); ++j) {
+      output_size *= class_output_shape[j];
+    }
+    std::cout << "output_size:" << output_size << std::endl;
+    class_output_data_.resize(output_size);
+    std::copy_n(tensor_class_host->host<float>(), output_size,
+                class_output_data_.data());
+    delete tensor_class_host;
+  } else {
+    for (const auto &head_info : non_postprocess_heads_info) {
+      MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput(
+          PicoDet_session, head_info.cls_layer.c_str());
+      MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput(
+          PicoDet_session, head_info.dis_layer.c_str());
+      MNN::Tensor tensor_scores_host(tensor_scores,
+                                     tensor_scores->getDimensionType());
+      tensor_scores->copyToHostTensor(&tensor_scores_host);
+      MNN::Tensor tensor_boxes_host(tensor_boxes,
+                                    tensor_boxes->getDimensionType());
+      tensor_boxes->copyToHostTensor(&tensor_boxes_host);
+      decode_infer(&tensor_scores_host, &tensor_boxes_host, head_info.stride,
+                   score_threshold, results);
+    }
  }
  auto end = chrono::steady_clock::now();
@@ -188,8 +224,6 @@ void PicoDet::nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH) {
  }
 }
-string PicoDet::get_label_str(int label) { return labels[label]; }
 inline float fast_exp(float x) {
  union {
    uint32_t i;

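
Review note on the `has_postprocess` branch in `picodet_mnn.cpp`: both output tensors are flattened by multiplying the entries of `shape()` and then copied into a `std::vector<float>` with `std::copy_n`. The sketch below isolates that step so it can be tested without MNN. `ElementCount` and `CopyToVector` are hypothetical helper names, and the raw `float` pointer stands in for the host tensor's data; treat it as an illustration of the pattern rather than the demo's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Product of all dimensions; an empty shape counts as a single scalar.
std::size_t ElementCount(const std::vector<int> &shape) {
  return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                         [](std::size_t acc, int dim) {
                           assert(dim >= 0);
                           return acc * static_cast<std::size_t>(dim);
                         });
}

// Copies ElementCount(shape) floats from a raw buffer into an owned vector.
// The caller must guarantee that `data` holds at least that many values.
std::vector<float> CopyToVector(const float *data,
                                const std::vector<int> &shape) {
  const std::size_t n = ElementCount(shape);
  assert(data != nullptr || n == 0);
  return std::vector<float>(data, data + n);
}
```

Factoring the shape product into one helper would also avoid repeating the loop for the bbox and class branches, and returning the vector by value removes the need to resize a member buffer first.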
--- a/deploy/third_engine/demo_mnn/picodet_mnn.hpp
+++ b/deploy/third_engine/demo_mnn/picodet_mnn.hpp
@@ -11,7 +11,6 @@
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
-// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn
 #ifndef __PicoDet_H__
 #define __PicoDet_H__
@@ -20,90 +19,84 @@
 #include "Interpreter.hpp"
+#include "ImageProcess.hpp"
 #include "MNNDefine.h"
 #include "Tensor.hpp"
-#include "ImageProcess.hpp"
-#include <opencv2/opencv.hpp>
 #include <algorithm>
+#include <chrono>
 #include <iostream>
+#include <memory>
+#include <opencv2/opencv.hpp>
 #include <string>
 #include <vector>
-#include <memory>
-#include <chrono>
+typedef struct NonPostProcessHeadInfo_ {
-typedef struct HeadInfo_
+  std::string cls_layer;
-{
+  std::string dis_layer;
-    std::string cls_layer;
+  int stride;
-    std::string dis_layer;
+} NonPostProcessHeadInfo;
-    int stride;
-} HeadInfo;
+typedef struct BoxInfo_ {
+  float x1;
-typedef struct BoxInfo_
+  float y1;
-{
+  float x2;
-    float x1;
+  float y2;
-    float y1;
+  float score;
-    float x2;
+  int label;
-    float y2;
-    float score;
-    int label;
 } BoxInfo;
 class PicoDet {
 public:
-    PicoDet(const std::string &mnn_path,
+  PicoDet(const std::string &mnn_path, int input_width, int input_length,
-            int input_width, int input_length, int num_thread_ = 4, float score_threshold_ = 0.5, float nms_threshold_ = 0.3);
+          int num_thread_ = 4, float score_threshold_ = 0.5,
+          float nms_threshold_ = 0.3);
-    ~PicoDet();
+  ~PicoDet();
-    int detect(cv::Mat &img, std::vector<BoxInfo> &result_list);
+  int detect(cv::Mat &img, std::vector<BoxInfo> &result_list,
-    std::string get_label_str(int label);
+             bool has_postprocess);
 private:
-    void decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred, int stride, float threshold, std::vector<std::vector<BoxInfo>> &results);
+  void decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred, int stride,
-    BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x, int y, int stride);
+                    float threshold,
-    void nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH);
+                    std::vector<std::vector<BoxInfo>> &results);
+  BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x,
+                       int y, int stride);
+  void nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH);
 private:
+  std::shared_ptr<MNN::Interpreter> PicoDet_interpreter;
-    std::shared_ptr<MNN::Interpreter> PicoDet_interpreter;
+  MNN::Session *PicoDet_session = nullptr;
-    MNN::Session *PicoDet_session = nullptr;
+  MNN::Tensor *input_tensor = nullptr;
-    MNN::Tensor *input_tensor = nullptr;
+  int num_thread;
-    int num_thread;
+  int image_w;
-    int image_w;
+  int image_h;
-    int image_h;
+  int in_w = 320;
-    int in_w = 320;
+  int in_h = 320;
-    int in_h = 320;
+  float score_threshold;
-    float score_threshold;
+  float nms_threshold;
-    float nms_threshold;
+  const float mean_vals[3] = {103.53f, 116.28f, 123.675f};
-    const float mean_vals[3] = { 103.53f, 116.28f, 123.675f };
+  const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f};
-    const float norm_vals[3] = { 0.017429f, 0.017507f, 0.017125f };
+  const int num_class = 80;
-    const int num_class = 80;
+  const int reg_max = 7;
-    const int reg_max = 7;
+  std::vector<float> bbox_output_data_;
-    std::vector<HeadInfo> heads_info{
+  std::vector<float> class_output_data_;
-        // cls_pred|dis_pred|stride
-        {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8},
+  std::vector<std::string> nms_heads_info{"tmp_16", "concat_4.tmp_0"};
-        {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16},
+  // If the model was exported without post-processing,
+  // non_postprocess_heads_info is used instead
-        {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32},
+  std::vector<NonPostProcessHeadInfo> non_postprocess_heads_info{
-        {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64},
+      // cls_pred|dis_pred|stride
-    };
+      {"transpose_0.tmp_0", "transpose_1.tmp_0", 8},
+      {"transpose_2.tmp_0", "transpose_3.tmp_0", 16},
-    std::vector<std::string>
+      {"transpose_4.tmp_0", "transpose_5.tmp_0", 32},
-    labels{"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
+      {"transpose_6.tmp_0", "transpose_7.tmp_0", 64},
-           "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
+  };
-           "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
-           "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
-           "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
-           "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
-           "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
-           "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
-           "hair drier", "toothbrush"};
 };
 template <typename _Tp>
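
Review note: the `mean_vals` and `norm_vals` members above are the usual per-channel normalization constants in B, G, R order. The sketch below shows the arithmetic they imply, `(pixel - mean) * norm`, on a plain interleaved buffer. The helper name `NormalizeBgr` and the constant names are hypothetical and not part of this PR; how the demo actually applies these values is in the `.cpp` file, which is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values copied from the header above; channel order is B, G, R.
constexpr std::array<float, 3> kMeanVals = {103.53f, 116.28f, 123.675f};
constexpr std::array<float, 3> kNormVals = {0.017429f, 0.017507f, 0.017125f};

// Hypothetical helper: converts interleaved 8-bit BGR pixels to normalized
// floats, computing (pixel - mean) * norm for each channel.
std::vector<float> NormalizeBgr(const std::vector<std::uint8_t> &pixels) {
  assert(pixels.size() % 3 == 0); // expect whole B, G, R triplets
  std::vector<float> out(pixels.size());
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    const std::size_t c = i % 3; // channel index within the triplet
    out[i] = (static_cast<float>(pixels[i]) - kMeanVals[c]) * kNormVals[c];
  }
  return out;
}
```

With these constants, a black pixel maps to negative values and a white pixel to positive values in every channel.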

--- a/deploy/third_engine/demo_mnn/python/demo_mnn.py
+++ b/deploy/third_engine/demo_mnn/python/demo_mnn.py
--- a/deploy/third_engine/demo_ncnn/CMakeLists.txt
+++ b/deploy/third_engine/demo_ncnn/CMakeLists.txt
-cmake_minimum_required(VERSION 3.4.1)
+cmake_minimum_required(VERSION 3.9)
 set(CMAKE_CXX_STANDARD 17)
 project(picodet_demo)
@@ -11,9 +11,11 @@ if(OPENMP_FOUND)
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
 endif()
-find_package(OpenCV REQUIRED)
+# find_package(OpenCV REQUIRED)
+find_package(OpenCV REQUIRED PATHS "/path/to/opencv-3.4.16_gcc8.2_ffmpeg")
-find_package(ncnn REQUIRED)
+# find_package(ncnn REQUIRED)
+find_package(ncnn REQUIRED PATHS "/path/to/ncnn/build/install/lib/cmake/ncnn")
 if(NOT TARGET ncnn)
    message(WARNING "ncnn NOT FOUND!  Please set ncnn_DIR environment variable")
 else()

--- a/deploy/third_engine/demo_ncnn/README.md
+++ b/deploy/third_engine/demo_ncnn/README.md
 # PicoDet NCNN Demo
-This project provides PicoDet image inference, webcam inference and benchmark using
+This demo provides inference code built on the [Tencent's NCNN framework](https://github.com/Tencent/ncnn) inference library.
-[Tencent's NCNN framework](https://github.com/Tencent/ncnn).
-# How to build
+# Step 1: Build
 ## Windows
 ### Step1.
 Download and Install Visual Studio from https://visualstudio.microsoft.com/vs/community/
@@ -12,11 +10,16 @@ Download and Install Visual Studio from https://visualstudio.microsoft.com/vs/co
 ### Step2.
 Download and install OpenCV from https://github.com/opencv/opencv/releases
-### Step3(Optional).
+For convenience, if your environment is gcc 8.2 on x86, you can download the following prebuilt library directly:
+```shell
+wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
+tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
+```
+### Step3(Optional).
 Download and install Vulkan SDK from https://vulkan.lunarg.com/sdk/home
-### Step4.
+### Step4: Build NCNN
-Clone NCNN repository
 ``` shell script
 git clone --recursive https://github.com/Tencent/ncnn.git
@@ -25,7 +28,7 @@ Build NCNN following this tutorial: [Build for Windows x64 using VS2017](https:/
 ### Step5.
-Add `ncnn_DIR` = `YOUR_NCNN_PATH/build/install/lib/cmake/ncnn` to system environment variables.
+Add `ncnn_DIR` = `YOUR_NCNN_PATH/build/install/lib/cmake/ncnn` to the system environment variables.
 Build project: Open x64 Native Tools Command Prompt for VS 2019 or 2017
@@ -42,10 +45,10 @@ msbuild picodet_demo.vcxproj /p:configuration=release /p:platform=x64
 ### Step1.
 Build and install OpenCV from https://github.com/opencv/opencv
-### Step2(Optional).
+### Step2(Optional).
 Download Vulkan SDK from https://vulkan.lunarg.com/sdk/home
-### Step3.
+### Step3: Build NCNN
 Clone NCNN repository
 ``` shell script
@@ -54,15 +57,7 @@ git clone --recursive https://github.com/Tencent/ncnn.git
 Build NCNN following this tutorial: [Build for Linux / NVIDIA Jetson / Raspberry Pi](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-linux)
-### Step4.
+### Step4: Build the executable
-Set environment variables. Run:
-``` shell script
-export ncnn_DIR=YOUR_NCNN_PATH/build/install/lib/cmake/ncnn
-```
-Build project
 ``` shell script
 cd <this-folder>
@@ -71,47 +66,64 @@ cd build
 cmake ..
 make
 ```
 # Run demo
-Download PicoDet ncnn model.
+- Prepare the model
-* [PicoDet ncnn model download link](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_ncnn.zip)
+    ```shell
+    modelName=picodet_s_320_coco_lcnet
+    # Export the inference model
-## Webcam
+    python tools/export_model.py \
+            -c configs/picodet/${modelName}.yml \
-```shell script
+            -o weights=${modelName}.pdparams \
-picodet_demo 0 0
+            --output_dir=inference_model
+    # Convert to ONNX
+    paddle2onnx --model_dir inference_model/${modelName} \
+            --model_filename model.pdmodel  \
+            --params_filename model.pdiparams \
+            --opset_version 11 \
+            --save_file ${modelName}.onnx
+    # Simplify the model
+    python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx
+    # Convert the model to NCNN format:
+    # run onnx2ncnn from the ncnn tools to generate the ncnn .param and .bin files.
+    ```
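+You can also convert the model to NCNN with the online conversion tool [https://convertmodel.com](https://convertmodel.com/)
+For a quick test, you can download the converted files directly: [picodet_s_320_coco_lcnet-opt.bin](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.bin)/ [picodet_s_320_coco_lcnet-opt.param](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.param) (without post-processing).
+**Note:** NCNN inference currently produces NaN when the model includes post-processing, so use the demo without post-processing for now. The demo with post-processing is being upgraded and will be released soon.
+## Run
+First, create the directory that stores the prediction results: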
+```shell
+cp -r ../demo_onnxruntime/imgs .
+cd build
+mkdir ../results
 ```
-## Inference images
+- Run inference on a single image
+``` shell
-```shell script
+./picodet_demo 0 ../picodet_s_320_coco_lcnet.bin ../picodet_s_320_coco_lcnet.param 320 320 ../imgs/dog.jpg 0
-picodet_demo 1 IMAGE_FOLDER/*.jpg
 ```
+See `main.cpp` for how the arguments are parsed.
-## Inference video
+- Benchmark inference speed
-```shell script
+``` shell
-picodet_demo 2 VIDEO_PATH
+./picodet_demo 1 ../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param 320 320  0
 ```
-## Benchmark
+## FAQ
-```shell script
-picodet_demo 3 0
-result: picodet  min = 17.74  max = 22.71  avg = 18.16
-```
-****
-Notice:
-If benchmark speed is slow, try to limit omp thread num.
-Linux:
-```shell script
+- Prediction accuracy is wrong:
-export OMP_THREAD_LIMIT=4
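+First check that the model input shape matches and that the model output names match. For the enhanced PicoDet model without post-processing, the output names are: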
+```shell
+# classification branch  |  detection branch
+{"transpose_0.tmp_0", "transpose_1.tmp_0"},
+{"transpose_2.tmp_0", "transpose_3.tmp_0"},
+{"transpose_4.tmp_0", "transpose_5.tmp_0"},
+{"transpose_6.tmp_0", "transpose_7.tmp_0"},
 ```
+Use [netron](https://netron.app) to look up the actual names, then update the corresponding `non_postprocess_heads_info` array in `picodet_mnn.hpp`.
--- a/deploy/third_engine/demo_ncnn/main.cpp
+++ b/deploy/third_engine/demo_ncnn/main.cpp
@@ -13,353 +13,198 @@
 // limitations under the License.
 // reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn
+#include "picodet.h"
+#include <benchmark.h>
+#include <iostream>
+#include <net.h>
 #include <opencv2/core/core.hpp>
 #include <opencv2/highgui/highgui.hpp>
 #include <opencv2/imgproc/imgproc.hpp>
-#include <iostream>
-#include <net.h>
-#include "picodet.h"
-#include <benchmark.h>
+#define __SAVE_RESULT__ // if defined, save drawn results to ../results;
+                        // otherwise show them in a window
 struct object_rect {
-    int x;
+  int x;
-    int y;
+  int y;
-    int width;
+  int width;
-    int height;
+  int height;
-};
-int resize_uniform(cv::Mat& src, cv::Mat& dst, cv::Size dst_size, object_rect& effect_area)
-{
-    int w = src.cols;
-    int h = src.rows;
-    int dst_w = dst_size.width;
-    int dst_h = dst_size.height;
-    dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0));
-    float ratio_src = w * 1.0 / h;
-    float ratio_dst = dst_w * 1.0 / dst_h;
-    int tmp_w = 0;
-    int tmp_h = 0;
-    if (ratio_src > ratio_dst) {
-        tmp_w = dst_w;
-        tmp_h = floor((dst_w * 1.0 / w) * h);
-    }
-    else if (ratio_src < ratio_dst) {
-        tmp_h = dst_h;
-        tmp_w = floor((dst_h * 1.0 / h) * w);
-    }
-    else {
-        cv::resize(src, dst, dst_size);
-        effect_area.x = 0;
-        effect_area.y = 0;
-        effect_area.width = dst_w;
-        effect_area.height = dst_h;
-        return 0;
-    }
-    cv::Mat tmp;
-    cv::resize(src, tmp, cv::Size(tmp_w, tmp_h));
-    if (tmp_w != dst_w) {
-        int index_w = floor((dst_w - tmp_w) / 2.0);
-        for (int i = 0; i < dst_h; i++) {
-            memcpy(dst.data + i * dst_w * 3 + index_w * 3, tmp.data + i * tmp_w * 3, tmp_w * 3);
-        }
-        effect_area.x = index_w;
-        effect_area.y = 0;
-        effect_area.width = tmp_w;
-        effect_area.height = tmp_h;
-    }
-    else if (tmp_h != dst_h) {
-        int index_h = floor((dst_h - tmp_h) / 2.0);
-        memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3);
-        effect_area.x = 0;
-        effect_area.y = index_h;
-        effect_area.width = tmp_w;
-        effect_area.height = tmp_h;
-    }
-    else {
-        printf("error\n");
-    }
-    return 0;
-}
-const int color_list[80][3] =
-{
-    {216 , 82 , 24},
-    {236 ,176 , 31},
-    {125 , 46 ,141},
-    {118 ,171 , 47},
-    { 76 ,189 ,237},
-    {238 , 19 , 46},
-    { 76 , 76 , 76},
-    {153 ,153 ,153},
-    {255 ,  0 ,  0},
-    {255 ,127 ,  0},
-    {190 ,190 ,  0},
-    {  0 ,255 ,  0},
-    {  0 ,  0 ,255},
-    {170 ,  0 ,255},
-    { 84 , 84 ,  0},
-    { 84 ,170 ,  0},
-    { 84 ,255 ,  0},
-    {170 , 84 ,  0},
-    {170 ,170 ,  0},
-    {170 ,255 ,  0},
-    {255 , 84 ,  0},
-    {255 ,170 ,  0},
-    {255 ,255 ,  0},
-    {  0 , 84 ,127},
-    {  0 ,170 ,127},
-    {  0 ,255 ,127},
-    { 84 ,  0 ,127},
-    { 84 , 84 ,127},
-    { 84 ,170 ,127},
-    { 84 ,255 ,127},
-    {170 ,  0 ,127},
-    {170 , 84 ,127},
-    {170 ,170 ,127},
-    {170 ,255 ,127},
-    {255 ,  0 ,127},
-    {255 , 84 ,127},
-    {255 ,170 ,127},
-    {255 ,255 ,127},
-    {  0 , 84 ,255},
-    {  0 ,170 ,255},
-    {  0 ,255 ,255},
-    { 84 ,  0 ,255},
-    { 84 , 84 ,255},
-    { 84 ,170 ,255},
-    { 84 ,255 ,255},
-    {170 ,  0 ,255},
-    {170 , 84 ,255},
-    {170 ,170 ,255},
-    {170 ,255 ,255},
-    {255 ,  0 ,255},
-    {255 , 84 ,255},
-    {255 ,170 ,255},
-    { 42 ,  0 ,  0},
-    { 84 ,  0 ,  0},
-    {127 ,  0 ,  0},
-    {170 ,  0 ,  0},
-    {212 ,  0 ,  0},
-    {255 ,  0 ,  0},
-    {  0 , 42 ,  0},
-    {  0 , 84 ,  0},
-    {  0 ,127 ,  0},
-    {  0 ,170 ,  0},
-    {  0 ,212 ,  0},
-    {  0 ,255 ,  0},
-    {  0 ,  0 , 42},
-    {  0 ,  0 , 84},
-    {  0 ,  0 ,127},
-    {  0 ,  0 ,170},
-    {  0 ,  0 ,212},
-    {  0 ,  0 ,255},
-    {  0 ,  0 ,  0},
-    { 36 , 36 , 36},
-    { 72 , 72 , 72},
-    {109 ,109 ,109},
-    {145 ,145 ,145},
-    {182 ,182 ,182},
-    {218 ,218 ,218},
-    {  0 ,113 ,188},
-    { 80 ,182 ,188},
-    {127 ,127 ,  0},
 };
-void draw_bboxes(const cv::Mat& bgr, const std::vector<BoxInfo>& bboxes, object_rect effect_roi)
+std::vector<int> GenerateColorMap(int num_class) {
-{
+  auto colormap = std::vector<int>(3 * num_class, 0);
-    static const char* class_names[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus",
+  for (int i = 0; i < num_class; ++i) {
-                                        "train", "truck", "boat", "traffic light", "fire hydrant",
+    int j = 0;
-                                        "stop sign", "parking meter", "bench", "bird", "cat", "dog",
+    int lab = i;
-                                        "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
+    while (lab) {
-                                        "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
+      colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j));
-                                        "skis", "snowboard", "sports ball", "kite", "baseball bat",
+      colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j));
-                                        "baseball glove", "skateboard", "surfboard", "tennis racket",
+      colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j));
-                                        "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl",
+      ++j;
-                                        "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
+      lab >>= 3;
-                                        "hot dog", "pizza", "donut", "cake", "chair", "couch",
-                                        "potted plant", "bed", "dining table", "toilet", "tv", "laptop",
-                                        "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
-                                        "toaster", "sink", "refrigerator", "book", "clock", "vase",
-                                        "scissors", "teddy bear", "hair drier", "toothbrush"
-    };
-    cv::Mat image = bgr.clone();
-    int src_w = image.cols;
-    int src_h = image.rows;
-    int dst_w = effect_roi.width;
-    int dst_h = effect_roi.height;
-    float width_ratio = (float)src_w / (float)dst_w;
-    float height_ratio = (float)src_h / (float)dst_h;
-    for (size_t i = 0; i < bboxes.size(); i++)
-    {
-        const BoxInfo& bbox = bboxes[i];
-        cv::Scalar color = cv::Scalar(color_list[bbox.label][0], color_list[bbox.label][1], color_list[bbox.label][2]);
-        cv::rectangle(image, cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio, (bbox.y1 - effect_roi.y) * height_ratio),
-                                      cv::Point((bbox.x2 - effect_roi.x) * width_ratio, (bbox.y2 - effect_roi.y) * height_ratio)), color);
-        char text[256];
-        sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100);
-        int baseLine = 0;
-        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);
-        int x = (bbox.x1 - effect_roi.x) * width_ratio;
-        int y = (bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine;
-        if (y < 0)
-            y = 0;
-        if (x + label_size.width > image.cols)
-            x = image.cols - label_size.width;
-        cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
-            color, -1);
-        cv::putText(image, text, cv::Point(x, y + label_size.height),
-            cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255));
-    }
-    cv::imwrite("../result/test_picodet.jpg", image);
-    printf("************infer image success!!!**********\n");
-}
-int image_demo(PicoDet &detector, const char* imagepath)
-{
-    std::vector<std::string> filenames;
-    cv::glob(imagepath, filenames, false);
-    for (auto img_name : filenames)
-    {
-        cv::Mat image = cv::imread(img_name);
-        if (image.empty())
-        {
-            fprintf(stderr, "cv::imread %s failed\n", img_name);
-            return -1;
-        }
-        object_rect effect_roi;
-        cv::Mat resized_img;
-        resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
-        auto results = detector.detect(resized_img, 0.4, 0.5);
-        char imgName[20] = {};
-        draw_bboxes(image, results, effect_roi);
-        cv::waitKey(0);
    }
-    return 0;
+  }
+  return colormap;
 }
-int webcam_demo(PicoDet& detector, int cam_id)
+void draw_bboxes(const cv::Mat &im, const std::vector<BoxInfo> &bboxes,
-{
+                 std::string save_path = "None") {
-    cv::Mat image;
+  static const char *class_names[] = {
-    cv::VideoCapture cap(cam_id);
+      "person",        "bicycle",      "car",
+      "motorcycle",    "airplane",     "bus",
-    while (true)
+      "train",         "truck",        "boat",
-    {
+      "traffic light", "fire hydrant", "stop sign",
-        cap >> image;
+      "parking meter", "bench",        "bird",
-        object_rect effect_roi;
+      "cat",           "dog",          "horse",
-        cv::Mat resized_img;
+      "sheep",         "cow",          "elephant",
-        resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
+      "bear",          "zebra",        "giraffe",
-        auto results = detector.detect(resized_img, 0.4, 0.5);
+      "backpack",      "umbrella",     "handbag",
-        draw_bboxes(image, results, effect_roi);
+      "tie",           "suitcase",     "frisbee",
-        cv::waitKey(1);
+      "skis",          "snowboard",    "sports ball",
-    }
+      "kite",          "baseball bat", "baseball glove",
-    return 0;
+      "skateboard",    "surfboard",    "tennis racket",
+      "bottle",        "wine glass",   "cup",
+      "fork",          "knife",        "spoon",
+      "bowl",          "banana",       "apple",
+      "sandwich",      "orange",       "broccoli",
+      "carrot",        "hot dog",      "pizza",
+      "donut",         "cake",         "chair",
+      "couch",         "potted plant", "bed",
+      "dining table",  "toilet",       "tv",
+      "laptop",        "mouse",        "remote",
+      "keyboard",      "cell phone",   "microwave",
+      "oven",          "toaster",      "sink",
+      "refrigerator",  "book",         "clock",
+      "vase",          "scissors",     "teddy bear",
+      "hair drier",    "toothbrush"};
+  cv::Mat image = im.clone();
+  int src_w = image.cols;
+  int src_h = image.rows;
+  int thickness = 2;
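+  // Pass the number of classes, not the byte size of the pointer array.
+  auto colormap =
+      GenerateColorMap(sizeof(class_names) / sizeof(class_names[0]));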
+  for (size_t i = 0; i < bboxes.size(); i++) {
+    const BoxInfo &bbox = bboxes[i];
+    std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". "
+              << bbox.y2 << ". " << std::endl;
+    int c1 = colormap[3 * bbox.label + 0];
+    int c2 = colormap[3 * bbox.label + 1];
+    int c3 = colormap[3 * bbox.label + 2];
+    cv::Scalar color = cv::Scalar(c1, c2, c3);
+    // cv::Scalar color = cv::Scalar(0, 0, 255);
+    cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1),
+                                  cv::Point(bbox.x2, bbox.y2)),
+                  color, 1);
+    char text[256];
+    sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100);
+    int baseLine = 0;
+    cv::Size label_size =
+        cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);
+    int x = bbox.x1;
+    int y = bbox.y1 - label_size.height - baseLine;
+    if (y < 0)
+      y = 0;
+    if (x + label_size.width > image.cols)
+      x = image.cols - label_size.width;
+    cv::rectangle(image, cv::Rect(cv::Point(x, y),
+                                  cv::Size(label_size.width,
+                                           label_size.height + baseLine)),
+                  color, -1);
+    cv::putText(image, text, cv::Point(x, y + label_size.height),
+                cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1);
+  }
+  if (save_path == "None") {
+    cv::imshow("image", image);
+  } else {
+    cv::imwrite(save_path, image);
+    std::cout << "Result saved in: " << save_path << std::endl;
+  }
 }
-int video_demo(PicoDet& detector, const char* path)
+int image_demo(PicoDet &detector, const char *imagepath,
-{
+               int has_postprocess = 0) {
-    cv::Mat image;
+  std::vector<cv::String> filenames;
-    cv::VideoCapture cap(path);
+  cv::glob(imagepath, filenames, false);
+  bool is_postprocess = has_postprocess > 0 ? true : false;
-    while (true)
+  for (auto img_name : filenames) {
-    {
+    cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR);
-        cap >> image;
+    if (image.empty()) {
-        object_rect effect_roi;
+      fprintf(stderr, "cv::imread %s failed\n", img_name.c_str());
-        cv::Mat resized_img;
+      return -1;
-        resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
-        auto results = detector.detect(resized_img, 0.4, 0.5);
-        draw_bboxes(image, results, effect_roi);
-        cv::waitKey(1);
    }
-    return 0;
+    std::vector<BoxInfo> results;
+    detector.detect(image, results, is_postprocess);
+    std::cout << "detect done." << std::endl;
+#ifdef __SAVE_RESULT__
+    std::string save_path = img_name;
+    draw_bboxes(image, results, save_path.replace(3, 4, "results"));
+#else
+    draw_bboxes(image, results);
+    cv::waitKey(0);
+#endif
+  }
+  return 0;
 }
-int benchmark(PicoDet& detector)
+int benchmark(PicoDet &detector, int width, int height,
-{
+              int has_postprocess = 0) {
-    int loop_num = 100;
+  int loop_num = 100;
-    int warm_up = 8;
+  int warm_up = 8;
-    double time_min = DBL_MAX;
+  double time_min = DBL_MAX;
-    double time_max = -DBL_MAX;
+  double time_max = -DBL_MAX;
-    double time_avg = 0;
+  double time_avg = 0;
-    ncnn::Mat input = ncnn::Mat(320, 320, 3);
+  // cv::Mat takes (rows, cols), i.e. (height, width).
+  cv::Mat image(height, width, CV_8UC3, cv::Scalar(1, 1, 1));
-    input.fill(0.01f);
+  bool is_postprocess = has_postprocess > 0 ? true : false;
-    for (int i = 0; i < warm_up + loop_num; i++)
+  for (int i = 0; i < warm_up + loop_num; i++) {
-    {
+    double start = ncnn::get_current_time();
-        double start = ncnn::get_current_time();
+    std::vector<BoxInfo> results;
-        ncnn::Extractor ex = detector.Net->create_extractor();
+    detector.detect(image, results, is_postprocess);
-        ex.input("image", input); // picodet
+    double end = ncnn::get_current_time();
-        for (const auto& head_info : detector.heads_info)
-        {
+    double time = end - start;
-            ncnn::Mat dis_pred;
+    if (i >= warm_up) {
-            ncnn::Mat cls_pred;
+      time_min = (std::min)(time_min, time);
-            ex.extract(head_info.dis_layer.c_str(), dis_pred);
+      time_max = (std::max)(time_max, time);
-            ex.extract(head_info.cls_layer.c_str(), cls_pred);
+      time_avg += time;
-        }
-        double end = ncnn::get_current_time();
-        double time = end - start;
-        if (i >= warm_up)
-        {
-            time_min = (std::min)(time_min, time);
-            time_max = (std::max)(time_max, time);
-            time_avg += time;
-        }
    }
-    time_avg /= loop_num;
+  }
-    fprintf(stderr, "%20s  min = %7.2f  max = %7.2f  avg = %7.2f\n", "picodet", time_min, time_max, time_avg);
+  time_avg /= loop_num;
-    return 0;
+  fprintf(stderr, "%20s  min = %7.2f  max = %7.2f  avg = %7.2f\n", "picodet",
+          time_min, time_max, time_avg);
+  return 0;
 }
+int main(int argc, char **argv) {
-int main(int argc, char** argv)
+  int mode = atoi(argv[1]);
-{
+  char *bin_model_path = argv[2];
-    if (argc != 3)
+  char *param_model_path = argv[3];
-    {
+  int height = 320;
-        fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]);
+  int width = 320;
-        return -1;
+  if (argc >= 6) { // argv[4] and argv[5] must both exist
-    }
+    height = atoi(argv[4]);
-    PicoDet detector = PicoDet("../weight/picodet_m_416.param", "../weight/picodet_m_416.bin", true);
+    width = atoi(argv[5]);
-    int mode = atoi(argv[1]);
+  }
-    switch (mode)
+  PicoDet detector =
-    {
+      PicoDet(param_model_path, bin_model_path, width, height, true, 0.45, 0.3);
-    case 0:{
+  if (mode == 1) {
-        int cam_id = atoi(argv[2]);
-        webcam_demo(detector, cam_id);
+    benchmark(detector, width, height, atoi(argv[6]));
-        break;
+  } else {
-        }
+    if (argc != 6) {
-    case 1:{
+      std::cout << "Must set image file, such as ./picodet_demo 0 "
-        const char* images = argv[2];
+                   "../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param "
-        image_demo(detector, images);
+                   "320 320 img.jpg"
-        break;
+                << std::endl;
-        }
-    case 2:{
-        const char* path = argv[2];
-        video_demo(detector, path);
-        break;
-        }
-    case 3:{
-        benchmark(detector);
-        break;
-        }
-    default:{
-        fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]);
-        break;
-        }
    }
+    const char *images = argv[6];
+    image_demo(detector, images, atoi(argv[7]));
+  }
 }
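
Review note: `GenerateColorMap` in this file spreads the bits of the class index across three channels, three bits per round, starting from the most significant bit of each channel. The standalone sketch below restates that logic without OpenCV so the resulting palette can be inspected. The name `MakeColorMap` is hypothetical, and the sketch assumes a small class count (a few hundred at most) so that the shift `7 - j` stays non-negative.

```cpp
#include <cassert>
#include <vector>

// Standalone restatement of the bit-spreading palette logic from the diff.
// Returns 3 * num_class values: three channel intensities per class index.
std::vector<int> MakeColorMap(int num_class) {
  assert(num_class >= 0);
  std::vector<int> colormap(3 * num_class, 0);
  for (int i = 0; i < num_class; ++i) {
    int lab = i;
    // Each round consumes the three lowest bits of lab and writes them
    // into bit (7 - j) of the three channels.
    for (int j = 0; lab != 0; ++j, lab >>= 3) {
      colormap[i * 3 + 0] |= ((lab >> 0) & 1) << (7 - j);
      colormap[i * 3 + 1] |= ((lab >> 1) & 1) << (7 - j);
      colormap[i * 3 + 2] |= ((lab >> 2) & 1) << (7 - j);
    }
  }
  return colormap;
}
```

The argument is a class count, so a caller holding an array of names would pass the number of elements rather than the array's size in bytes.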
--- a/deploy/third_engine/demo_ncnn/picodet.cpp
+++ b/deploy/third_engine/demo_ncnn/picodet.cpp
@@ -48,7 +48,9 @@ int activation_function_softmax(const _Tp *src, _Tp *dst, int length) {
 bool PicoDet::hasGPU = false;
 PicoDet *PicoDet::detector = nullptr;
-PicoDet::PicoDet(const char *param, const char *bin, bool useGPU) {
+PicoDet::PicoDet(const char *param, const char *bin, int input_width,
+                 int input_hight, bool useGPU, float score_threshold_ = 0.5,
+                 float nms_threshold_ = 0.3) {
  this->Net = new ncnn::Net();
 #if NCNN_VULKAN
  this->hasGPU = ncnn::get_gpu_count() > 0;
@@ -57,21 +59,28 @@ PicoDet::PicoDet(const char *param, const char *bin, bool useGPU) {
  this->Net->opt.use_fp16_arithmetic = true;
  this->Net->load_param(param);
  this->Net->load_model(bin);
+  this->in_w = input_width;
+  this->in_h = input_hight;
+  this->score_threshold = score_threshold_;
+  this->nms_threshold = nms_threshold_;
 }
 PicoDet::~PicoDet() { delete this->Net; }
 void PicoDet::preprocess(cv::Mat &image, ncnn::Mat &in) {
+  // cv::resize(image, image, cv::Size(this->in_w, this->in_h), 0.f, 0.f);
  int img_w = image.cols;
  int img_h = image.rows;
-  in = ncnn::Mat::from_pixels(image.data, ncnn::Mat::PIXEL_BGR, img_w, img_h);
+  in = ncnn::Mat::from_pixels_resize(image.data, ncnn::Mat::PIXEL_BGR, img_w,
+                                     img_h, this->in_w, this->in_h);
  const float mean_vals[3] = {103.53f, 116.28f, 123.675f};
  const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f};
  in.substract_mean_normalize(mean_vals, norm_vals);
 }
-std::vector<BoxInfo> PicoDet::detect(cv::Mat image, float score_threshold,
+int PicoDet::detect(cv::Mat image, std::vector<BoxInfo> &result_list,
-                                     float nms_threshold) {
+                    bool has_postprocess) {
  ncnn::Mat input;
  preprocess(image, input);
  auto ex = this->Net->create_extractor();
@@ -82,34 +91,76 @@ std::vector<BoxInfo> PicoDet::detect(cv::Mat image, float score_threshold,
 #endif
  ex.input("image", input); // picodet
+  this->image_h = image.rows;
+  this->image_w = image.cols;
  std::vector<std::vector<BoxInfo>> results;
  results.resize(this->num_class);
-  for (const auto &head_info : this->heads_info) {
+  if (has_postprocess) {
    ncnn::Mat dis_pred;
    ncnn::Mat cls_pred;
-    ex.extract(head_info.dis_layer.c_str(), dis_pred);
+    ex.extract(this->nms_heads_info[0].c_str(), dis_pred);
-    ex.extract(head_info.cls_layer.c_str(), cls_pred);
+    ex.extract(this->nms_heads_info[1].c_str(), cls_pred);
-    this->decode_infer(cls_pred, dis_pred, head_info.stride, score_threshold,
+    std::cout << dis_pred.h << "  " << dis_pred.w << std::endl;
-                       results);
+    std::cout << cls_pred.h << "  " << cls_pred.w << std::endl;
+    this->nms_boxes(cls_pred, dis_pred, this->score_threshold, results);
+  } else {
+    for (const auto &head_info : this->non_postprocess_heads_info) {
+      ncnn::Mat dis_pred;
+      ncnn::Mat cls_pred;
+      ex.extract(head_info.dis_layer.c_str(), dis_pred);
+      ex.extract(head_info.cls_layer.c_str(), cls_pred);
+      this->decode_infer(cls_pred, dis_pred, head_info.stride,
+                         this->score_threshold, results);
+    }
  }
-  std::vector<BoxInfo> dets;
  for (int i = 0; i < (int)results.size(); i++) {
-    this->nms(results[i], nms_threshold);
+    this->nms(results[i], this->nms_threshold);
    for (auto box : results[i]) {
-      dets.push_back(box);
+      box.x1 = box.x1 / this->in_w * this->image_w;
+      box.x2 = box.x2 / this->in_w * this->image_w;
+      box.y1 = box.y1 / this->in_h * this->image_h;
+      box.y2 = box.y2 / this->in_h * this->image_h;
+      result_list.push_back(box);
+    }
+  }
+  return 0;
+}
+void PicoDet::nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred,
+                        float score_threshold,
+                        std::vector<std::vector<BoxInfo>> &result_list) {
+  BoxInfo bbox;
+  int i, j;
+  for (i = 0; i < dis_pred.h; i++) {
+    bbox.x1 = dis_pred.row(i)[0];
+    bbox.y1 = dis_pred.row(i)[1];
+    bbox.x2 = dis_pred.row(i)[2];
+    bbox.y2 = dis_pred.row(i)[3];
+    const float *scores = cls_pred.row(i);
+    float score = 0;
+    int cur_label = 0;
+    for (int label = 0; label < this->num_class; label++) {
+      float score_ = cls_pred.row(label)[i];
+      if (score_ > score) {
+        score = score_;
+        cur_label = label;
+      }
    }
+    bbox.score = score;
+    bbox.label = cur_label;
+    result_list[cur_label].push_back(bbox);
  }
-  return dets;
 }
 void PicoDet::decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride,
                           float threshold,
                           std::vector<std::vector<BoxInfo>> &results) {
-  int feature_h = ceil((float)this->input_size[1] / stride);
+  int feature_h = ceil((float)this->in_h / stride);
-  int feature_w = ceil((float)this->input_size[0] / stride);
+  int feature_w = ceil((float)this->in_w / stride);
  for (int idx = 0; idx < feature_h * feature_w; idx++) {
    const float *scores = cls_pred.row(idx);
@@ -151,8 +202,8 @@ BoxInfo PicoDet::disPred2Bbox(const float *&dfl_det, int label, float score,
  }
  float xmin = (std::max)(ct_x - dis_pred[0], .0f);
  float ymin = (std::max)(ct_y - dis_pred[1], .0f);
-  float xmax = (std::min)(ct_x + dis_pred[2], (float)this->input_size[0]);
+  float xmax = (std::min)(ct_x + dis_pred[2], (float)this->in_w);
-  float ymax = (std::min)(ct_y + dis_pred[3], (float)this->input_size[1]);
+  float ymax = (std::min)(ct_y + dis_pred[3], (float)this->in_h);
  return BoxInfo{xmin, ymin, xmax, ymax, score, label};
 }
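
The per-box class selection in `nms_boxes` can be illustrated without ncnn. The following is a minimal sketch, not the demo's implementation: it assumes a class-major score layout (`scores[label][i]`, mirroring `cls_pred.row(label)[i]`) and uses plain `std::vector` in place of `ncnn::Mat`. The `Box` struct and `GroupByBestClass` are hypothetical names, and the threshold filter is an addition that the hunk above does not show.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for BoxInfo.
struct Box {
  float x1, y1, x2, y2, score;
  int label;
};

// scores[label][i] is the score of class `label` for box `i` (assumed layout).
// boxes[i] holds {x1, y1, x2, y2}.
// Returns one bucket per class; each box lands in the bucket of its best class.
std::vector<std::vector<Box>> GroupByBestClass(
    const std::vector<std::vector<float>> &scores,
    const std::vector<std::vector<float>> &boxes, float score_threshold) {
  std::vector<std::vector<Box>> grouped(scores.size());
  if (scores.empty()) return grouped;
  for (std::size_t i = 0; i < boxes.size(); ++i) {
    float best = 0.f;
    int best_label = 0;
    // Argmax over classes for this box.
    for (std::size_t label = 0; label < scores.size(); ++label) {
      if (scores[label][i] > best) {
        best = scores[label][i];
        best_label = static_cast<int>(label);
      }
    }
    // Assumption: drop low-confidence boxes before per-class NMS.
    if (best < score_threshold) continue;
    grouped[best_label].push_back(
        {boxes[i][0], boxes[i][1], boxes[i][2], boxes[i][3], best, best_label});
  }
  return grouped;
}
```

Each bucket can then be passed to a per-class NMS routine such as `PicoDet::nms`.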

--- a/deploy/third_engine/demo_ncnn/picodet.h
+++ b/deploy/third_engine/demo_ncnn/picodet.h
@@ -16,66 +16,72 @@
 #ifndef PICODET_H
 #define PICODET_H
-#include <opencv2/core/core.hpp>
 #include <net.h>
+#include <opencv2/core/core.hpp>
-typedef struct HeadInfo
+typedef struct NonPostProcessHeadInfo {
-{
+  std::string cls_layer;
-    std::string cls_layer;
+  std::string dis_layer;
-    std::string dis_layer;
+  int stride;
-    int stride;
+} NonPostProcessHeadInfo;
-};
-typedef struct BoxInfo
+typedef struct BoxInfo {
-{
+  float x1;
-    float x1;
+  float y1;
-    float y1;
+  float x2;
-    float x2;
+  float y2;
-    float y2;
+  float score;
-    float score;
+  int label;
-    int label;
 } BoxInfo;
-class PicoDet
+class PicoDet {
-{
 public:
-    PicoDet(const char* param, const char* bin, bool useGPU);
+  PicoDet(const char *param, const char *bin, int input_width, int input_height,
+          bool useGPU, float score_threshold_, float nms_threshold_);
-    ~PicoDet();
-    static PicoDet* detector;
+  ~PicoDet();
-    ncnn::Net* Net;
-    static bool hasGPU;
-    std::vector<HeadInfo> heads_info{
+  static PicoDet *detector;
-        // cls_pred|dis_pred|stride
+  ncnn::Net *Net;
-        {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8},
+  static bool hasGPU;
-        {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16},
-        {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32},
-        {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64},
-    };
-    std::vector<BoxInfo> detect(cv::Mat image, float score_threshold, float nms_threshold);
+  int detect(cv::Mat image, std::vector<BoxInfo> &result_list,
+             bool has_postprocess);
-    std::vector<std::string> labels{ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
-                                    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
-                                    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
-                                    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
-                                    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
-                                    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
-                                    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
-                                    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
-                                    "hair drier", "toothbrush" };
 private:
-    void preprocess(cv::Mat& image, ncnn::Mat& in);
+  void preprocess(cv::Mat &image, ncnn::Mat &in);
-    void decode_infer(ncnn::Mat& cls_pred, ncnn::Mat& dis_pred, int stride, float threshold, std::vector<std::vector<BoxInfo>>& results);
+  void decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride,
-    BoxInfo disPred2Bbox(const float*& dfl_det, int label, float score, int x, int y, int stride);
+                    float threshold,
-    static void nms(std::vector<BoxInfo>& result, float nms_threshold);
+                    std::vector<std::vector<BoxInfo>> &results);
-    int input_size[2] = {320, 320};
+  BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x,
-    int num_class = 80;
+                       int y, int stride);
-    int reg_max = 7;
+  static void nms(std::vector<BoxInfo> &result, float nms_threshold);
+  void nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred,
+                 float score_threshold,
+                 std::vector<std::vector<BoxInfo>> &result_list);
-};
+  int image_w;
+  int image_h;
+  int in_w = 320;
+  int in_h = 320;
+  int num_class = 80;
+  int reg_max = 7;
+  float score_threshold;
+  float nms_threshold;
+  std::vector<float> bbox_output_data_;
+  std::vector<float> class_output_data_;
+  std::vector<std::string> nms_heads_info{"tmp_16", "concat_4.tmp_0"};
+  // Used when the model is exported without post-processing.
+  std::vector<NonPostProcessHeadInfo> non_postprocess_heads_info{
+      // cls_pred|dis_pred|stride
+      {"transpose_0.tmp_0", "transpose_1.tmp_0", 8},
+      {"transpose_2.tmp_0", "transpose_3.tmp_0", 16},
+      {"transpose_4.tmp_0", "transpose_5.tmp_0", 32},
+      {"transpose_6.tmp_0", "transpose_7.tmp_0", 64},
+  };
+};
 #endif
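
The strides listed in `non_postprocess_heads_info` determine the feature-map size that `decode_infer` iterates over. The sketch below shows that arithmetic in isolation, assuming the height-based dimension uses `in_h` and the width-based one uses `in_w`. `FeatureMapSize`, `ComputeFeatureMapSize`, and `TotalGridPoints` are hypothetical helper names, not part of the demo.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper type: feature-map height and width for one head.
struct FeatureMapSize {
  int h;
  int w;
};

// Mirrors ceil((float)in_h / stride) and ceil((float)in_w / stride).
FeatureMapSize ComputeFeatureMapSize(int in_h, int in_w, int stride) {
  return {static_cast<int>(std::ceil(static_cast<float>(in_h) / stride)),
          static_cast<int>(std::ceil(static_cast<float>(in_w) / stride))};
}

// Total number of grid points decoded across all heads.
int TotalGridPoints(int in_h, int in_w, const std::vector<int> &strides) {
  int total = 0;
  for (int stride : strides) {
    FeatureMapSize size = ComputeFeatureMapSize(in_h, in_w, stride);
    total += size.h * size.w;
  }
  return total;
}
```

For the default 320x320 input and strides 8, 16, 32, and 64, the feature maps are 40x40, 20x20, 10x10, and 5x5, so 2125 grid points are decoded in total.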
--- a/deploy/third_engine/demo_ncnn/python/demo_ncnn.py
+++ b/deploy/third_engine/demo_ncnn/python/demo_ncnn.py