Unverified commit 05a07912, authored by Wanli, committed by GitHub

Update handpose estimation model from MediaPipe (2023feb) (#133)

* update handpose model

* update quantize model

* fix quantize path

* update readme of quantization and benchmark result

* fix document
Parent d7232530
@@ -35,7 +35,7 @@ Guidelines:
| [DaSiamRPN](./models/object_tracking_dasiamrpn) | Object Tracking | 1280x720 | 36.15 | 705.48 | 76.82 | --- | --- |
| [YoutuReID](./models/person_reid_youtureid) | Person Re-Identification | 128x256 | 35.81 | 521.98 | 90.07 | 44.61 | --- |
| [MP-PalmDet](./models/palm_detection_mediapipe) | Palm Detection | 192x192 | 11.09 | 63.79 | 83.20 | 33.81 | --- |
-| [MP-HandPose](./models/handpose_estimation_mediapipe) | Hand Pose Estimation | 256x256 | 20.16 | 148.24 | 156.30 | 42.70 | --- |
+| [MP-HandPose](./models/handpose_estimation_mediapipe) | Hand Pose Estimation | 224x224 | 4.28 | 36.19 | 40.10 | 19.47 | --- |

\*: Models are quantized in per-channel mode, which run slower than per-tensor quantized models on NPU.
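For reference, per-channel vs. per-tensor mode is selected by the `per_channel` argument of ONNX Runtime's `quantize_static`, which the quantization scripts further down in this diff rely on. A minimal sketch, assuming placeholder model paths and a calibration data reader `dr` such as the `DataReader` class defined in `tools/quantize/quantize-ort.py`:

```python
from onnxruntime.quantization import quantize_static, QuantFormat, QuantType

# dr: a CalibrationDataReader instance (e.g. the DataReader defined in quantize-ort.py)
quantize_static('model_fp32.onnx', 'model_int8.onnx', dr,
                quant_format=QuantFormat.QDQ,
                per_channel=True,            # per-channel quantization; per_channel=False gives per-tensor
                weight_type=QuantType.QInt8)
```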
@@ -91,7 +91,7 @@ Some examples are listed below. You can find more in the directory of each model

### Hand Pose Estimation with [MP-HandPose](models/handpose_estimation_mediapipe/)

-![handpose estimation](models/handpose_estimation_mediapipe/examples/mphandpose_demo.gif)
+![handpose estimation](models/handpose_estimation_mediapipe/examples/mphandpose_demo.webp)

### QR Code Detection and Parsing with [WeChatQRCode](./models/qrcode_wechatqrcode/)
...
@@ -5,7 +5,7 @@ Benchmark:
  path: "data/palm_detection_20230125"
  files: ["palm1.jpg", "palm2.jpg", "palm3.jpg"]
  sizes: # [[w1, h1], ...], Omit to run at original scale
-    - [256, 256]
+    - [224, 224]
  metric:
    warmup: 30
    repeat: 10
...
@@ -4,11 +4,14 @@ This model estimates 21 hand keypoints per detected hand from [palm detector](..

![MediaPipe Hands Keypoints](./examples/hand_keypoints.png)

-This model is converted from Tensorflow-JS to ONNX using following tools:
-- tfjs to tf_saved_model: https://github.com/patlevin/tfjs-to-tf/
-- tf_saved_model to ONNX: https://github.com/onnx/tensorflow-onnx
+This model is converted from TFLite to ONNX using the following tools:
+- TFLite model to ONNX: https://github.com/onnx/tensorflow-onnx
- simplified by [onnx-simplifier](https://github.com/daquexian/onnx-simplifier)
**Note**:
- The int8-quantized model may produce invalid results due to a significant drop in accuracy.
- Visit https://google.github.io/mediapipe/solutions/models.html#hands for models of larger scale.
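For reference, a minimal sketch of how the TFLite-to-ONNX conversion and simplification listed above are typically scripted. The `.tflite` file name is a placeholder, and the `tf2onnx.convert.from_tflite` / `onnxsim.simplify` calls are assumptions about those tools' Python APIs rather than the exact steps used for this model:

```python
import onnx
import onnxsim                  # onnx-simplifier
from tf2onnx import convert     # tensorflow-onnx

# TFLite -> ONNX (the .tflite file name here is a placeholder)
model_proto, _ = convert.from_tflite("hand_landmark.tflite")
onnx.save(model_proto, "handpose_estimation_mediapipe_2023feb.onnx")

# Simplify the exported graph with onnx-simplifier
model = onnx.load("handpose_estimation_mediapipe_2023feb.onnx")
model_simplified, ok = onnxsim.simplify(model)
assert ok, "simplified model failed the correctness check"
onnx.save(model_simplified, "handpose_estimation_mediapipe_2023feb.onnx")
```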
## Demo

Run the following commands to try the demo:
@@ -21,7 +24,7 @@ python demo.py -i /path/to/image

### Example outputs

-![webcam demo](./examples/mphandpose_demo.gif)
+![webcam demo](./examples/mphandpose_demo.webp)

## License
@@ -30,3 +33,5 @@ All files in this directory are licensed under [Apache 2.0 License](./LICENSE).

## Reference
- MediaPipe Handpose: https://github.com/tensorflow/tfjs-models/tree/master/handpose
- MediaPipe hands model and model card: https://google.github.io/mediapipe/solutions/models.html#hands
- Int8 model quantized with rgb evaluation set of FreiHAND: https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html
@@ -31,69 +31,126 @@ except:

parser = argparse.ArgumentParser(description='Hand Pose Estimation from MediaPipe')
parser.add_argument('--input', '-i', type=str, help='Path to the input image. Omit for using default camera.')
-parser.add_argument('--model', '-m', type=str, default='./handpose_estimation_mediapipe_2022may.onnx', help='Path to the model.')
+parser.add_argument('--model', '-m', type=str, default='./handpose_estimation_mediapipe_2023feb.onnx', help='Path to the model.')
parser.add_argument('--backend', '-b', type=int, default=backends[0], help=help_msg_backends.format(*backends))
parser.add_argument('--target', '-t', type=int, default=targets[0], help=help_msg_targets.format(*targets))
-parser.add_argument('--conf_threshold', type=float, default=0.8, help='Filter out hands of confidence < conf_threshold.')
+parser.add_argument('--conf_threshold', type=float, default=0.9, help='Filter out hands of confidence < conf_threshold.')
parser.add_argument('--save', '-s', type=str, default=False, help='Set true to save results. This flag is invalid when using camera.')
parser.add_argument('--vis', '-v', type=str2bool, default=True, help='Set true to open a window for result visualization. This flag is invalid when using camera.')
args = parser.parse_args()

def visualize(image, hands, print_result=False):
    display_screen = image.copy()
    display_3d = np.zeros((400, 400, 3), np.uint8)
    cv.line(display_3d, (200, 0), (200, 400), (255, 255, 255), 2)
    cv.line(display_3d, (0, 200), (400, 200), (255, 255, 255), 2)
    cv.putText(display_3d, 'Main View', (0, 12), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
    cv.putText(display_3d, 'Top View', (200, 12), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
    cv.putText(display_3d, 'Left View', (0, 212), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
    cv.putText(display_3d, 'Right View', (200, 212), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
    is_draw = False  # ensure only one hand is drawn

    def draw_lines(image, landmarks, is_draw_point=True, thickness=2):
        cv.line(image, landmarks[0], landmarks[1], (255, 255, 255), thickness)
        cv.line(image, landmarks[1], landmarks[2], (255, 255, 255), thickness)
        cv.line(image, landmarks[2], landmarks[3], (255, 255, 255), thickness)
        cv.line(image, landmarks[3], landmarks[4], (255, 255, 255), thickness)
        cv.line(image, landmarks[0], landmarks[5], (255, 255, 255), thickness)
        cv.line(image, landmarks[5], landmarks[6], (255, 255, 255), thickness)
        cv.line(image, landmarks[6], landmarks[7], (255, 255, 255), thickness)
        cv.line(image, landmarks[7], landmarks[8], (255, 255, 255), thickness)
        cv.line(image, landmarks[0], landmarks[9], (255, 255, 255), thickness)
        cv.line(image, landmarks[9], landmarks[10], (255, 255, 255), thickness)
        cv.line(image, landmarks[10], landmarks[11], (255, 255, 255), thickness)
        cv.line(image, landmarks[11], landmarks[12], (255, 255, 255), thickness)
        cv.line(image, landmarks[0], landmarks[13], (255, 255, 255), thickness)
        cv.line(image, landmarks[13], landmarks[14], (255, 255, 255), thickness)
        cv.line(image, landmarks[14], landmarks[15], (255, 255, 255), thickness)
        cv.line(image, landmarks[15], landmarks[16], (255, 255, 255), thickness)
        cv.line(image, landmarks[0], landmarks[17], (255, 255, 255), thickness)
        cv.line(image, landmarks[17], landmarks[18], (255, 255, 255), thickness)
        cv.line(image, landmarks[18], landmarks[19], (255, 255, 255), thickness)
        cv.line(image, landmarks[19], landmarks[20], (255, 255, 255), thickness)
        if is_draw_point:
            for p in landmarks:
                cv.circle(image, p, thickness, (0, 0, 255), -1)

    for idx, handpose in enumerate(hands):
        conf = handpose[-1]
        bbox = handpose[0:4].astype(np.int32)
        handedness = handpose[-2]
        if handedness <= 0.5:
            handedness_text = 'Left'
        else:
            handedness_text = 'Right'
        landmarks_screen = handpose[4:67].reshape(21, 3).astype(np.int32)
        landmarks_word = handpose[67:130].reshape(21, 3)

        # Print results
        if print_result:
            print('-----------hand {}-----------'.format(idx + 1))
            print('conf: {:.2f}'.format(conf))
            print('handedness: {}'.format(handedness_text))
            print('hand box: {}'.format(bbox))
            print('hand landmarks: ')
            for l in landmarks_screen:
                print('\t{}'.format(l))
            print('hand world landmarks: ')
            for l in landmarks_word:
                print('\t{}'.format(l))

        # draw box
        cv.rectangle(display_screen, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (0, 255, 0), 2)
        # draw handedness
        cv.putText(display_screen, '{}'.format(handedness_text), (bbox[0], bbox[1] + 12), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
        # Draw line between each key points
        landmarks_xy = landmarks_screen[:, 0:2]
        draw_lines(display_screen, landmarks_xy, is_draw_point=False)

        # z value is relative to WRIST
        for p in landmarks_screen:
            r = max(5 - p[2] // 5, 0)
            r = min(r, 14)
            cv.circle(display_screen, np.array([p[0], p[1]]), r, (0, 0, 255), -1)

        if is_draw is False:
            is_draw = True
            # Main view
            landmarks_xy = landmarks_word[:, [0, 1]]
            landmarks_xy = (landmarks_xy * 1000 + 100).astype(np.int32)
            draw_lines(display_3d, landmarks_xy, thickness=5)

            # Top view
            landmarks_xz = landmarks_word[:, [0, 2]]
            landmarks_xz[:, 1] = -landmarks_xz[:, 1]
            landmarks_xz = (landmarks_xz * 1000 + np.array([300, 100])).astype(np.int32)
            draw_lines(display_3d, landmarks_xz, thickness=5)

            # Left view
            landmarks_yz = landmarks_word[:, [2, 1]]
            landmarks_yz[:, 0] = -landmarks_yz[:, 0]
            landmarks_yz = (landmarks_yz * 1000 + np.array([100, 300])).astype(np.int32)
            draw_lines(display_3d, landmarks_yz, thickness=5)

            # Right view
            landmarks_zy = landmarks_word[:, [2, 1]]
            landmarks_zy = (landmarks_zy * 1000 + np.array([300, 300])).astype(np.int32)
            draw_lines(display_3d, landmarks_zy, thickness=5)

    return display_screen, display_3d

if __name__ == '__main__':
    # palm detector
    palm_detector = MPPalmDet(modelPath='../palm_detection_mediapipe/palm_detection_mediapipe_2023feb.onnx',
                              nmsThreshold=0.3,
-                             scoreThreshold=0.8,
+                             scoreThreshold=0.6,
                              backendId=args.backend,
                              targetId=args.target)
    # handpose detector
@@ -108,7 +165,7 @@ if __name__ == '__main__':

        # Palm detector inference
        palms = palm_detector.infer(image)
-        hands = np.empty(shape=(0, 47))
+        hands = np.empty(shape=(0, 132))

        # Estimate the pose of each hand
        for palm in palms:
@@ -117,10 +174,12 @@ if __name__ == '__main__':
            if handpose is not None:
                hands = np.vstack((hands, handpose))
        # Draw results on the input image
-        image = visualize(image, hands, True)
+        image, view_3d = visualize(image, hands, True)

        if len(palms) == 0:
            print('No palm detected!')
        else:
            print('Palm detected!')

        # Save results
        if args.save:
@@ -131,6 +190,7 @@ if __name__ == '__main__':
        if args.vis:
            cv.namedWindow(args.input, cv.WINDOW_AUTOSIZE)
            cv.imshow(args.input, image)
            cv.imshow('3D HandPose Demo', view_3d)
            cv.waitKey(0)
    else: # Omit input to call default camera
        deviceId = 0
@@ -145,7 +205,7 @@ if __name__ == '__main__':

            # Palm detector inference
            palms = palm_detector.infer(frame)
-            hands = np.empty(shape=(0, 47))
+            hands = np.empty(shape=(0, 132))

            tm.start()
            # Estimate the pose of each hand
@@ -156,12 +216,14 @@ if __name__ == '__main__':
                hands = np.vstack((hands, handpose))
            tm.stop()
            # Draw results on the input image
-            frame = visualize(frame, hands)
+            frame, view_3d = visualize(frame, hands)

            if len(palms) == 0:
                print('No palm detected!')
            else:
                print('Palm detected!')
                cv.putText(frame, 'FPS: {:.2f}'.format(tm.getFPS()), (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))

            cv.imshow('MediaPipe Handpose Detection Demo', frame)
            cv.imshow('3D HandPose Demo', view_3d)
            tm.reset()
@@ -9,7 +9,7 @@ class MPHandPose:
        self.backend_id = backendId
        self.target_id = targetId

-        self.input_size = np.array([256, 256])  # wh
+        self.input_size = np.array([224, 224])  # wh
        self.PALM_LANDMARK_IDS = [0, 5, 9, 13, 17, 1, 2]
        self.PALM_LANDMARKS_INDEX_OF_PALM_BASE = 0
        self.PALM_LANDMARKS_INDEX_OF_MIDDLE_FINGER_BASE = 2
@@ -115,20 +115,25 @@ class MPHandPose:
        return results  # [bbox_coords, landmarks_coords, conf]

    def _postprocess(self, blob, rotated_palm_bbox, angle, rotation_matrix):
-        landmarks, conf = blob
+        landmarks, conf, handedness, landmarks_word = blob
        conf = conf[0][0]
        if conf < self.conf_threshold:
            return None

-        landmarks = landmarks.reshape(-1, 3)  # shape: (1, 63) -> (21, 3)
+        landmarks = landmarks[0].reshape(-1, 3)  # shape: (1, 63) -> (21, 3)
        landmarks_word = landmarks_word[0].reshape(-1, 3)  # shape: (1, 63) -> (21, 3)

        # transform coords back to the input coords
        wh_rotated_palm_bbox = rotated_palm_bbox[1] - rotated_palm_bbox[0]
        scale_factor = wh_rotated_palm_bbox / self.input_size
        landmarks[:, :2] = (landmarks[:, :2] - self.input_size / 2) * scale_factor
        landmarks[:, 2] = landmarks[:, 2] * max(scale_factor)  # depth scaling
        coords_rotation_matrix = cv.getRotationMatrix2D((0, 0), angle, 1.0)
        rotated_landmarks = np.dot(landmarks[:, :2], coords_rotation_matrix[:, :2])
        rotated_landmarks = np.c_[rotated_landmarks, landmarks[:, 2]]
        rotated_landmarks_world = np.dot(landmarks_word[:, :2], coords_rotation_matrix[:, :2])
        rotated_landmarks_world = np.c_[rotated_landmarks_world, landmarks_word[:, 2]]

        # invert rotation
        rotation_component = np.array([
            [rotation_matrix[0][0], rotation_matrix[1][0]],
@@ -144,12 +149,12 @@ class MPHandPose:
        original_center = np.array([
            np.dot(center, inverse_rotation_matrix[0]),
            np.dot(center, inverse_rotation_matrix[1])])
-        landmarks = rotated_landmarks[:, :2] + original_center
+        landmarks[:, :2] = rotated_landmarks[:, :2] + original_center

        # get bounding box from rotated_landmarks
        bbox = np.array([
-            np.amin(landmarks, axis=0),
-            np.amax(landmarks, axis=0)])  # [top-left, bottom-right]
+            np.amin(landmarks[:, :2], axis=0),
+            np.amax(landmarks[:, :2], axis=0)])  # [top-left, bottom-right]
        # shift bounding box
        wh_bbox = bbox[1] - bbox[0]
        shift_vector = self.HAND_BOX_SHIFT_VECTOR * wh_bbox
@@ -162,4 +167,9 @@ class MPHandPose:
            center_bbox - new_half_size,
            center_bbox + new_half_size])

-        return np.r_[bbox.reshape(-1), landmarks.reshape(-1), conf[0]]
        # [0: 4]: hand bounding box found in image of format [x1, y1, x2, y2] (top-left and bottom-right points)
        # [4: 67]: screen landmarks with format [x1, y1, z1, x2, y2 ... x21, y21, z21], z value is relative to WRIST
        # [67: 130]: world landmarks with format [x1, y1, z1, x2, y2 ... x21, y21, z21], 3D metric x, y, z coordinate
        # [130]: handedness, (left)[0, 1](right) hand
        # [131]: confidence
        return np.r_[bbox.reshape(-1), landmarks.reshape(-1), rotated_landmarks_world.reshape(-1), handedness[0][0], conf]
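To make the new output layout concrete, here is a small sketch (not part of the repository) of how one row returned by the model wrapper can be unpacked, mirroring what demo.py does:

```python
import numpy as np

def unpack_handpose(result):
    """Split one 132-element row returned by MPHandPose.infer() into named parts (sketch only)."""
    bbox = result[0:4].reshape(2, 2).astype(np.int32)      # [[x1, y1], [x2, y2]]
    landmarks_screen = result[4:67].reshape(21, 3)         # x, y in image pixels; z relative to WRIST
    landmarks_world = result[67:130].reshape(21, 3)        # 3D metric coordinates
    handedness = 'Right' if result[130] > 0.5 else 'Left'  # same 0.5 threshold as demo.py
    conf = result[131]
    return bbox, landmarks_screen, landmarks_world, handedness, conf
```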
@@ -7,6 +7,9 @@ This model detects palm bounding boxes and palm landmarks, and is converted from
- SSD Anchors are generated from [GenMediaPipePalmDectionSSDAnchors](https://github.com/VimalMollyn/GenMediaPipePalmDectionSSDAnchors)

**Note**:
- Visit https://google.github.io/mediapipe/solutions/models.html#hands for models of larger scale.

## Demo

Run the following commands to try the demo:
...
@@ -54,4 +54,4 @@ python quantize-inc.py model1

## Dataset

Some models are quantized with extra datasets.
-- [MP-PalmDet](../../models/palm_detection_mediapipe) int8 model quantized with evaluation set of [FreiHAND](https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html). The dataset downloaded from [link](https://lmb.informatik.uni-freiburg.de/data/freihand/FreiHAND_pub_v2_eval.zip). Unpack it and path to `FreiHAND_pub_v2_eval/evaluation/rgb`.
+- [MP-PalmDet](../../models/palm_detection_mediapipe) and [MP-HandPose](../../models/handpose_estimation_mediapipe) are quantized with the evaluation set of [FreiHAND](https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html). Download the dataset from [this link](https://lmb.informatik.uni-freiburg.de/data/freihand/FreiHAND_pub_v2_eval.zip). Unpack it and replace `path/to/dataset` with the path to `FreiHAND_pub_v2_eval/evaluation/rgb`.
\ No newline at end of file
@@ -14,7 +14,7 @@ from onnx import version_converter
import onnxruntime
from onnxruntime.quantization import quantize_static, CalibrationDataReader, QuantType, QuantFormat

-from transform import Compose, Resize, CenterCrop, Normalize, ColorConvert
+from transform import Compose, Resize, CenterCrop, Normalize, ColorConvert, HandAlign

class DataReader(CalibrationDataReader):
    def __init__(self, model_path, image_dir, transforms, data_dim):
@@ -37,6 +37,8 @@ class DataReader(CalibrationDataReader):
                continue
            img = cv.imread(os.path.join(image_dir, image_name))
            img = self.transforms(img)
            if img is None:
                continue
            blob = cv.dnn.blobFromImage(img)
            if self.data_dim == 'hwc':
                blob = cv.transposeND(blob, [0, 2, 3, 1])
@@ -110,7 +112,10 @@ models=dict(
                        calibration_image_dir='path/to/dataset',
                        transforms=Compose([Resize(size=(192, 192)), Normalize(std=[255, 255, 255]),
                                            ColorConvert(ctype=cv.COLOR_BGR2RGB)]), data_dim='hwc'),
    mp_handpose=Quantize(model_path='../../models/handpose_estimation_mediapipe/handpose_estimation_mediapipe_2023feb.onnx',
                         calibration_image_dir='path/to/dataset',
                         transforms=Compose([HandAlign("mp_handpose"), Resize(size=(224, 224)), Normalize(std=[255, 255, 255]),
                                             ColorConvert(ctype=cv.COLOR_BGR2RGB)]), data_dim='hwc'),
)

if __name__ == '__main__':
...
@@ -5,8 +5,9 @@
# Third party copyrights are property of their respective owners.

import collections
-import numpy as numpy
+import numpy as np
import cv2 as cv
import sys

class Compose:
    def __init__(self, transforms=[]):
@@ -15,6 +16,8 @@ class Compose:
    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
            if img is None:
                break
        return img

class Resize:
@@ -58,3 +61,69 @@ class ColorConvert:
    def __call__(self, img):
        return cv.cvtColor(img, self.ctype)
class HandAlign:
    def __init__(self, model):
        self.model = model
        sys.path.append('../../models/palm_detection_mediapipe')
        from mp_palmdet import MPPalmDet
        self.palm_detector = MPPalmDet(modelPath='../../models/palm_detection_mediapipe/palm_detection_mediapipe_2023feb.onnx', nmsThreshold=0.3, scoreThreshold=0.9)

    def __call__(self, img):
        return self.mp_handpose_align(img)

    def mp_handpose_align(self, img):
        palms = self.palm_detector.infer(img)
        if len(palms) == 0:
            return None
        palm = palms[0]
        palm_bbox = palm[0:4].reshape(2, 2)
        palm_landmarks = palm[4:18].reshape(7, 2)
        p1 = palm_landmarks[0]
        p2 = palm_landmarks[2]
        radians = np.pi / 2 - np.arctan2(-(p2[1] - p1[1]), p2[0] - p1[0])
        radians = radians - 2 * np.pi * np.floor((radians + np.pi) / (2 * np.pi))
        angle = np.rad2deg(radians)
        # get bbox center
        center_palm_bbox = np.sum(palm_bbox, axis=0) / 2
        # get rotation matrix
        rotation_matrix = cv.getRotationMatrix2D(center_palm_bbox, angle, 1.0)
        # get rotated image
        rotated_image = cv.warpAffine(img, rotation_matrix, (img.shape[1], img.shape[0]))
        # get bounding boxes from rotated palm landmarks
        homogeneous_coord = np.c_[palm_landmarks, np.ones(palm_landmarks.shape[0])]
        rotated_palm_landmarks = np.array([
            np.dot(homogeneous_coord, rotation_matrix[0]),
            np.dot(homogeneous_coord, rotation_matrix[1])])
        # get landmark bounding box
        rotated_palm_bbox = np.array([
            np.amin(rotated_palm_landmarks, axis=1),
            np.amax(rotated_palm_landmarks, axis=1)])  # [top-left, bottom-right]
        # shift bounding box
        wh_rotated_palm_bbox = rotated_palm_bbox[1] - rotated_palm_bbox[0]
        shift_vector = [0, -0.1] * wh_rotated_palm_bbox
        rotated_palm_bbox = rotated_palm_bbox + shift_vector
        # squarify bounding box
        center_rotated_plam_bbox = np.sum(rotated_palm_bbox, axis=0) / 2
        wh_rotated_palm_bbox = rotated_palm_bbox[1] - rotated_palm_bbox[0]
        new_half_size = np.amax(wh_rotated_palm_bbox) / 2
        rotated_palm_bbox = np.array([
            center_rotated_plam_bbox - new_half_size,
            center_rotated_plam_bbox + new_half_size])
        # enlarge bounding box
        center_rotated_plam_bbox = np.sum(rotated_palm_bbox, axis=0) / 2
        wh_rotated_palm_bbox = rotated_palm_bbox[1] - rotated_palm_bbox[0]
        new_half_size = wh_rotated_palm_bbox * 1.5
        rotated_palm_bbox = np.array([
            center_rotated_plam_bbox - new_half_size,
            center_rotated_plam_bbox + new_half_size])
        # Crop the rotated image by the bounding box
        [[x1, y1], [x2, y2]] = rotated_palm_bbox.astype(np.int32)
        diff = np.maximum([-x1, -y1, x2 - rotated_image.shape[1], y2 - rotated_image.shape[0]], 0)
        [x1, y1, x2, y2] = [x1, y1, x2, y2] + diff
        crop = rotated_image[y1:y2, x1:x2, :]
        crop = cv.copyMakeBorder(crop, diff[1], diff[3], diff[0], diff[2], cv.BORDER_CONSTANT, value=(0, 0, 0))
        return crop
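A quick way to sanity-check `HandAlign` on a single calibration image might look like the sketch below; it assumes the snippet is appended to `transform.py` (so `cv`, `np`, and `HandAlign` are in scope) and run from `tools/quantize/` so the relative model path above resolves, and the image path is a placeholder:

```python
if __name__ == '__main__':
    # Minimal sanity check for HandAlign (sketch only; not part of the original script)
    aligner = HandAlign('mp_handpose')
    img = cv.imread('path/to/dataset/sample.jpg')  # placeholder path
    crop = aligner(img)
    if crop is None:
        print('No palm detected; the calibration pipeline would skip this image.')
    else:
        print('Aligned hand crop shape:', crop.shape)  # square crop, fed to Resize(size=(224, 224))
```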