f6969885
7 changed files
--- a/README.md
+++ b/README.md
@@ -21,30 +21,7 @@ Guidelines:

 ## Models & Benchmark Results

-| Model                                                   | Task                          | Input Size | CPU-INTEL (ms) | CPU-RPI (ms) | GPU-JETSON (ms) | NPU-KV3 (ms) | NPU-Ascend310 (ms) | CPU-D1 (ms) |
-| ------------------------------------------------------- | ----------------------------- | ---------- | -------------- | ------------ | --------------- | ------------ | ------------------ | ----------- |
-| [YuNet](./models/face_detection_yunet)                  | Face Detection                | 160x120    | 0.72           | 5.43         | 12.18           | 4.04         | 2.24               | 86.69       |
-| [SFace](./models/face_recognition_sface)                | Face Recognition              | 112x112    | 6.04           | 78.83        | 24.88           | 46.25        | 2.66               | ---         |
-| [FER](./models/facial_expression_recognition/)          | Facial Expression Recognition | 112x112    | 3.16           | 32.53        | 31.07           | 29.80        | 2.19               | ---         |
-| [LPD-YuNet](./models/license_plate_detection_yunet/)    | License Plate Detection       | 320x240    | 8.63           | 167.70       | 56.12           | 29.53        | 7.63               | ---         |
-| [YOLOX](./models/object_detection_yolox/)               | Object Detection              | 640x640    | 141.20         | 1805.87      | 388.95          | 420.98       | 28.59              | ---         |
-| [NanoDet](./models/object_detection_nanodet/)           | Object Detection              | 416x416    | 66.03          | 225.10       | 64.94           | 116.64       | 20.62              | ---         |
-| [DB-IC15](./models/text_detection_db) (EN)              | Text Detection                | 640x480    | 71.03          | 1862.75      | 208.41          | ---          | 17.15              | ---         |
-| [DB-TD500](./models/text_detection_db) (EN&CN)          | Text Detection                | 640x480    | 72.31          | 1878.45      | 210.51          | ---          | 17.95              | ---         |
-| [CRNN-EN](./models/text_recognition_crnn)               | Text Recognition              | 100x32     | 20.16          | 278.11       | 196.15          | 125.30       | ---                | ---         |
-| [CRNN-CN](./models/text_recognition_crnn)               | Text Recognition              | 100x32     | 23.07          | 297.48       | 239.76          | 166.79       | ---                | ---         |
-| [PP-ResNet](./models/image_classification_ppresnet)     | Image Classification          | 224x224    | 34.71          | 463.93       | 98.64           | 75.45        | 6.99               | ---         |
-| [MobileNet-V1](./models/image_classification_mobilenet) | Image Classification          | 224x224    | 5.90           | 72.33        | 33.18           | 145.66\*     | 5.15               | ---         |
-| [MobileNet-V2](./models/image_classification_mobilenet) | Image Classification          | 224x224    | 5.97           | 66.56        | 31.92           | 146.31\*     | 5.41               | ---         |
-| [PP-HumanSeg](./models/human_segmentation_pphumanseg)   | Human Segmentation            | 192x192    | 8.81           | 73.13        | 67.97           | 74.77        | 6.94               | ---         |
-| [WeChatQRCode](./models/qrcode_wechatqrcode)            | QR Code Detection and Parsing | 100x100    | 1.29           | 5.71         | ---             | ---          | ---                | ---         |
-| [DaSiamRPN](./models/object_tracking_dasiamrpn)         | Object Tracking               | 1280x720   | 29.05          | 712.94       | 76.82           | ---          | ---                | ---         |
-| [YoutuReID](./models/person_reid_youtureid)             | Person Re-Identification      | 128x256    | 30.39          | 625.56       | 90.07           | 44.61        | 5.58               | ---         |
-| [MP-PalmDet](./models/palm_detection_mediapipe)         | Palm Detection                | 192x192    | 6.29           | 86.83        | 83.20           | 33.81        | 5.17               | ---         |
-| [MP-HandPose](./models/handpose_estimation_mediapipe)   | Hand Pose Estimation          | 224x224    | 4.68           | 43.57        | 40.10           | 19.47        | 6.27               | ---         |
-| [MP-PersonDet](./models/person_detection_mediapipe)     | Person Detection              | 224x224    | 13.88          | 98.52        | 56.69           | ---          | 16.45              | ---         |
-
-\*: Models are quantized in per-channel mode, which run slower than per-tensor quantized models on NPU.
+![](benchmark/color_table.svg?raw=true)

 Hardware Setup:


--- a/benchmark/README.md
+++ b/benchmark/README.md
@@ -57,6 +57,32 @@ python benchmark.py --all --cfg_overwrite_backend_target 1

 Benchmark is done with latest `opencv-python==4.7.0.72` and `opencv-contrib-python==4.7.0.72` on the following platforms. Some models are excluded because of support issues.

+
+| Model                                                    | Task                          | Input Size | [CPU-INTEL (ms)](#intel-12700k) | [CPU-RPI (ms)](#rasberry-pi-4b) | [GPU-JETSON (ms)](#jetson-nano-b01) | [NPU-KV3 (ms)](#khadas-vim3) | [NPU-Ascend310 (ms)](#atlas-200-dk) | CPU-D1 (ms) |
+|----------------------------------------------------------| ----------------------------- | ---------- |---------------------------------|---------------------------------|-------------------------------------|------------------------------|-------------------------------------|-------------|
+| [YuNet](../models/face_detection_yunet)                  | Face Detection                | 160x120    | 0.72                            | 5.43                            | 12.18                               | 4.04                         | 2.24                                | 86.69       |
+| [SFace](../models/face_recognition_sface)                | Face Recognition              | 112x112    | 6.04                            | 78.83                           | 24.88                               | 46.25                        | 2.66                                | ---         |
+| [FER](../models/facial_expression_recognition/)          | Facial Expression Recognition | 112x112    | 3.16                            | 32.53                           | 31.07                               | 29.80                        | 2.19                                | ---         |
+| [LPD-YuNet](../models/license_plate_detection_yunet/)    | License Plate Detection       | 320x240    | 8.63                            | 167.70                          | 56.12                               | 29.53                        | 7.63                                | ---         |
+| [YOLOX](../models/object_detection_yolox/)               | Object Detection              | 640x640    | 141.20                          | 1805.87                         | 388.95                              | 420.98                       | 28.59                               | ---         |
+| [NanoDet](../models/object_detection_nanodet/)           | Object Detection              | 416x416    | 66.03                           | 225.10                          | 64.94                               | 116.64                       | 20.62                               | ---         |
+| [DB-IC15](../models/text_detection_db) (EN)              | Text Detection                | 640x480    | 71.03                           | 1862.75                         | 208.41                              | ---                          | 17.15                               | ---         |
+| [DB-TD500](../models/text_detection_db) (EN&CN)          | Text Detection                | 640x480    | 72.31                           | 1878.45                         | 210.51                              | ---                          | 17.95                               | ---         |
+| [CRNN-EN](../models/text_recognition_crnn)               | Text Recognition              | 100x32     | 20.16                           | 278.11                          | 196.15                              | 125.30                       | ---                                 | ---         |
+| [CRNN-CN](../models/text_recognition_crnn)               | Text Recognition              | 100x32     | 23.07                           | 297.48                          | 239.76                              | 166.79                       | ---                                 | ---         |
+| [PP-ResNet](../models/image_classification_ppresnet)     | Image Classification          | 224x224    | 34.71                           | 463.93                          | 98.64                               | 75.45                        | 6.99                                | ---         |
+| [MobileNet-V1](../models/image_classification_mobilenet) | Image Classification          | 224x224    | 5.90                            | 72.33                           | 33.18                               | 145.66\*                     | 5.15                                | ---         |
+| [MobileNet-V2](../models/image_classification_mobilenet) | Image Classification          | 224x224    | 5.97                            | 66.56                           | 31.92                               | 146.31\*                     | 5.41                                | ---         |
+| [PP-HumanSeg](../models/human_segmentation_pphumanseg)   | Human Segmentation            | 192x192    | 8.81                            | 73.13                           | 67.97                               | 74.77                        | 6.94                                | ---         |
+| [WeChatQRCode](../models/qrcode_wechatqrcode)            | QR Code Detection and Parsing | 100x100    | 1.29                            | 5.71                            | ---                                 | ---                          | ---                                 | ---         |
+| [DaSiamRPN](../models/object_tracking_dasiamrpn)         | Object Tracking               | 1280x720   | 29.05                           | 712.94                          | 76.82                               | ---                          | ---                                 | ---         |
+| [YoutuReID](../models/person_reid_youtureid)             | Person Re-Identification      | 128x256    | 30.39                           | 625.56                          | 90.07                               | 44.61                        | 5.58                                | ---         |
+| [MP-PalmDet](../models/palm_detection_mediapipe)         | Palm Detection                | 192x192    | 6.29                            | 86.83                           | 83.20                               | 33.81                        | 5.17                                | ---         |
+| [MP-HandPose](../models/handpose_estimation_mediapipe)   | Hand Pose Estimation          | 224x224    | 4.68                            | 43.57                           | 40.10                               | 19.47                        | 6.27                                | ---         |
+| [MP-PersonDet](../models/person_detection_mediapipe)     | Person Detection              | 224x224    | 13.88                           | 98.52                           | 56.69                               | ---                          | 16.45                               | ---         |
+
+\*: Models are quantized in per-channel mode, which run slower than per-tensor quantized models on NPU.
+
 ### Intel 12700K

 Specs: [details](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html)

--- a/benchmark/color_table.svg
+++ b/benchmark/color_table.svg
--- a/benchmark/generate_table.py
+++ b/benchmark/generate_table.py
+import re
+import matplotlib.pyplot as plt
+import matplotlib as mpl
+import numpy as np
+
+mpl.use("svg")
+
+# parse a Markdown file and return the header row and body rows of its first pipe table
+def parse_table(filepath):
+    with open(filepath, "r", encoding="utf-8") as f:
+        content = f.read()
+    lines = content.split("\n")
+
+    header = []
+    body = []
+
+    found_start = False  # if found table start line
+    parse_done = False  # if parse table done
+    for l in lines:
+        if found_start and parse_done:
+            break
+        l = l.strip()
+        if not l:
+            continue
+        if l.startswith("|") and l.endswith("|"):
+            if not found_start:
+                found_start = True
+            row = [c.strip() for c in l.split("|") if c.strip()]
+            if not header:
+                header = row
+            else:
+                body.append(row)
+        elif found_start:
+            parse_done = True
+    return header, body
+
+
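+# parse models information: for each row, find the indices of the lowest
+# (fastest) and highest (slowest) latencies and map every numeric cell onto
+# the colormap. Relies on the module-level `cmap` set in `__main__`.
+def parse_data(models_info):
+    min_list = []
+    max_list = []
+    colors = []
+    for model in models_info:
+        # remove the "\*" footnote marker so the value parses as a float
+        data = [x.replace("\\*", "") for x in model]
+        # find the min and max latency of this row
+        max_data = float("-inf")
+        max_idx = -1
+        min_data = float("inf")
+        min_idx = -1
+
+        for i in range(len(data)):
+            try:
+                d = float(data[i])
+            except ValueError:
+                continue  # text cell: model name, task, input size or "---"
+            if d > max_data:
+                max_data = d
+                max_idx = i
+            if d < min_data:
+                min_data = d
+                min_idx = i
+
+        min_list.append(min_idx)
+        max_list.append(max_idx)
+
+        # calculate colors: scale each latency to [0, 1] within its own row
+        value_range = max_data - min_data
+        color = []
+        for t in data:
+            try:
+                v = float(t)
+            except ValueError:
+                color.append('white')
+                continue
+            # a row with a single latency has no range; color it as fastest
+            color.append(cmap((v - min_data) / value_range if value_range > 0 else 0.0))
+        colors.append(color)
+    return colors, min_list, max_list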
+
+
+if __name__ == '__main__':
+    hardware_info, models_info = parse_table("./README.md")
+    cmap = mpl.colormaps.get_cmap("RdYlGn_r")
+    # drop the markdown separator row (|---|---|...) that follows the header
+    models_info.pop(0)
+    # strip markdown links, keeping only the link text
+    hardware_info = [re.sub(r'\[(.+?)]\(.+?\)', r'\1', r) for r in hardware_info]
+    models_info = [[re.sub(r'\[(.+?)]\(.+?\)', r'\1', c) for c in r] for r in models_info]
+
+    table_colors, min_list, max_list = parse_data(models_info)
+    table_texts = [hardware_info] + models_info
+    table_colors = [['white'] * len(hardware_info)] + table_colors
+    # create a color bar. base width set to 1000, color map height set to 80
+    fig, axs = plt.subplots(nrows=3, figsize=(10, 0.8))
+    gradient = np.linspace(0, 1, 256)
+    gradient = np.vstack((gradient, gradient))
+    axs[0].imshow(gradient, aspect='auto', cmap=cmap)
+    axs[0].text(-0.01, 0.5, "Faster", va='center', ha='right', fontsize=11, transform=axs[0].transAxes)
+    axs[0].text(1.01, 0.5, "Slower", va='center', ha='left', fontsize=11, transform=axs[0].transAxes)
+
+    # initialize a table
+    table = axs[1].table(cellText=table_texts,
+                         cellColours=table_colors,
+                         cellLoc="left",
+                         loc="upper left")
+
+    # adjust table position
+    table_pos = axs[1].get_position()
+    axs[1].set_position([
+        table_pos.x0,
+        table_pos.y0 - table_pos.height,
+        table_pos.width,
+        table_pos.height
+    ])
+
+    table.set_fontsize(11)
+    table.auto_set_font_size(False)
+    table.scale(1, 2)
+    table.auto_set_column_width(list(range(len(table_texts[0]))))
+    table.AXESPAD = 0  # cancel padding
+
+    # highlight the best (lowest) latency in each row
+    for i in range(len(min_list)):
+        if min_list[i] < 0:
+            continue  # no numeric value in this row
+        cell = table.get_celld()[(i + 1, min_list[i])]
+        cell.set_text_props(weight='bold', color='white')
+
+    table_height = 0
+    table_width = 0
+    # calculate table height and width
+    for i in range(len(table_texts)):
+        cell = table.get_celld()[(i, 0)]
+        table_height += cell.get_height()
+    for i in range(len(table_texts[0])):
+        cell = table.get_celld()[(0, i)]
+        table_width += cell.get_width() + 0.1
+
+    # add notes for table
+    axs[2].text(0, -table_height - 0.8, "*: Models are quantized in per-channel mode, which run slower than per-tensor quantized models on NPU.", va='bottom', ha='left', fontsize=11, transform=axs[1].transAxes)
+
+    # turn off labels
+    for ax in axs:
+        ax.set_axis_off()
+        ax.set_xticks([])
+        ax.set_yticks([])
+
+    # adjust color map position to center
+    cm_pos = axs[0].get_position()
+    axs[0].set_position([
+        (table_width - 1) / 2,
+        cm_pos.y0,
+        cm_pos.width,
+        cm_pos.height
+    ])
+
+    plt.rcParams['svg.fonttype'] = 'none'
+    plt.savefig("./color_table.svg", format='svg', bbox_inches="tight", pad_inches=0, metadata={'Date': None, 'Creator': None})
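The coloring computed by `generate_table.py` is relative to each row: every latency is min-max scaled within its own model row, so green marks the fastest platform for that model and red the slowest, and colors cannot be compared across rows. The sketch below reproduces that per-row scaling and the link stripping on two rows copied from the table. `split_row` and `row_scale` are hypothetical helpers written for illustration, not functions from this PR, and unlike the script they return scaled values rather than colormap colors.

```python
import re

# Same pattern generate_table.py uses to turn "[text](target)" into "text".
LINK = re.compile(r"\[(.+?)]\(.+?\)")


def split_row(line):
    """Hypothetical helper: split a Markdown pipe-table row, stripping links."""
    return [LINK.sub(r"\1", c.strip()) for c in line.split("|") if c.strip()]


def row_scale(cells):
    """Hypothetical helper: min-max scale the latencies of one table row.

    Returns (scaled, fastest): scaled[i] is in [0, 1] for numeric cells and
    None for text cells (model name, task, input size, "---").
    Assumes the row contains at least one numeric cell.
    """
    values = {}
    for i, cell in enumerate(cells):
        try:
            # drop the "\*" footnote marker before parsing, as the script does
            values[i] = float(cell.replace("\\*", ""))
        except ValueError:
            continue  # not a latency
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    scaled = []
    for i in range(len(cells)):
        if i not in values:
            scaled.append(None)
        elif span > 0:
            scaled.append((values[i] - lo) / span)
        else:
            scaled.append(0.0)  # single latency in the row: treat it as fastest
    fastest = min(values, key=values.get)
    return scaled, fastest


header = split_row(
    "| Model | Task | Input Size | [CPU-INTEL (ms)](#intel-12700k) "
    "| [CPU-RPI (ms)](#rasberry-pi-4b) | [GPU-JETSON (ms)](#jetson-nano-b01) "
    "| [NPU-KV3 (ms)](#khadas-vim3) | [NPU-Ascend310 (ms)](#atlas-200-dk) | CPU-D1 (ms) |"
)
rows = [
    "| [YuNet](../models/face_detection_yunet) | Face Detection | 160x120 "
    "| 0.72 | 5.43 | 12.18 | 4.04 | 2.24 | 86.69 |",
    "| [SFace](../models/face_recognition_sface) | Face Recognition | 112x112 "
    "| 6.04 | 78.83 | 24.88 | 46.25 | 2.66 | --- |",
]
for line in rows:
    cells = split_row(line)
    scaled, fastest = row_scale(cells)
    print(cells[0], "is fastest on", header[fastest])
```

Running it prints `YuNet is fastest on CPU-INTEL (ms)` and `SFace is fastest on NPU-Ascend310 (ms)`, matching the cells the script renders in bold.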
--- a/benchmark/requirements.txt
+++ b/benchmark/requirements.txt
 numpy
-opencv-python==4.5.4.58
+opencv-python<5.0
 pyyaml
-requests
\ No newline at end of file
+requests
+matplotlib>=3.7.1
\ No newline at end of file
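Because `generate_table.py` reads `./README.md` and writes `./color_table.svg` relative to the working directory, it has to be run from inside `benchmark/`. A minimal sketch that regenerates the SVG from the repository root, assuming the directory layout in this diff and that the updated `requirements.txt` (including `matplotlib>=3.7.1`) is installed; `regenerate_color_table` is a hypothetical helper, not part of the PR:

```python
import subprocess
import sys
from pathlib import Path


def regenerate_color_table(repo_root="."):
    """Hypothetical helper: run benchmark/generate_table.py in its own folder.

    The script opens "./README.md" and saves "./color_table.svg" relative to
    the current working directory, so cwd must be the benchmark folder.
    Returns the path of the generated SVG.
    """
    bench_dir = Path(repo_root) / "benchmark"
    subprocess.run(
        [sys.executable, "generate_table.py"],  # same interpreter as the caller
        cwd=bench_dir,
        check=True,  # raise CalledProcessError if the script fails
    )
    return bench_dir / "color_table.svg"
```

Calling `regenerate_color_table(".")` from the repository root refreshes the image that the top-level README embeds; running the script through `sys.executable` keeps it on the same environment where the requirements were installed.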
--- a/models/text_recognition_crnn/text_recognition_CRNN_CH_2023feb_fp16.onnx
+++ b/models/text_recognition_crnn/text_recognition_CRNN_CH_2023feb_fp16.onnx
--- a/models/text_recognition_crnn/text_recognition_CRNN_EN_2023feb_fp16.onnx
+++ b/models/text_recognition_crnn/text_recognition_CRNN_EN_2023feb_fp16.onnx