add break in counting for pphuman (#6372)

* add break in counting for pphuman
* fix break in doc and codes
* fix mot break in region, test=document_fix
* fix region_polygon and region_type, test=document_fix
* add tools for get_video_info, test=document_fix

aa78ab80 · Feng Ni · GitHub · d174b1c2
11 changed files
--- a/deploy/pipeline/README.md
+++ b/deploy/pipeline/README.md
@@ -70,9 +70,12 @@ PP-Human supports image, single-camera video, and multi-camera video input; features
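  * Data preparation
  * Model optimization
-### People flow counting and trajectory recording
+### Pedestrian tracking, people flow counting and trajectory recording
 * [Quick start](docs/tutorials/mot.md)
+  * Pedestrian tracking
+  * People flow counting and trajectory recording
+  * Region break-in detection and counting
 * [Customization tutorial](../../docs/advanced_tutorials/customization/mot.md)
  * Data preparation
  * Model optimization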
--- a/deploy/pipeline/docs/tutorials/action.md
+++ b/deploy/pipeline/docs/tutorials/action.md
@@ -155,6 +155,14 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
 1: 其他
 ```
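+## Detection-based action recognition: break-in recognition
+For usage, see `5. Region break-in detection and counting` in the [PP-Human detection and tracking module](mot.md).
+### Method
+1. Object detection and multi-object tracking provide the pedestrian bounding boxes and tracking IDs from the video input. The detector is PP-YOLOE; see [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md) for details.
+2. A break-in is recognized from whether the midpoint of the lower edge of a pedestrian's bounding box lies inside or outside the user-selected region in adjacent frames.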
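
The break-in rule in step 2 can be illustrated with a minimal sketch. It assumes boxes in `(x, y, w, h)` form and uses a ray-casting point-in-polygon test in plain Python. That test is a swapped-in technique: the commit itself rasterizes the polygon with `cv2.fillPoly` and looks the point up in the mask. The helper names `point_in_polygon`, `bottom_center`, and `is_break_in` are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x, y), ...]."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal ray through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bottom_center(tlwh):
    """Midpoint of the lower edge of a (x, y, w, h) box."""
    x1, y1, w, h = tlwh
    return (x1 + w / 2.0, y1 + h)


def is_break_in(prev_tlwh, curr_tlwh, polygon):
    """True when the foot point moves from outside the region to inside it."""
    was_inside = point_in_polygon(bottom_center(prev_tlwh), polygon)
    now_inside = point_in_polygon(bottom_center(curr_tlwh), polygon)
    return (not was_inside) and now_inside


# Same polygon as the example command: 200 200 400 200 300 400 100 400
region = [(200, 200), (400, 200), (300, 400), (100, 400)]
print(is_break_in((40, 50, 40, 100), (230, 150, 40, 100), region))
```

A track that is already inside the region in the previous frame does not count again, which mirrors the outside-to-inside condition described above.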
 ## Video-classification-based action recognition: fight recognition

--- a/deploy/pipeline/docs/tutorials/mot.md
+++ b/deploy/pipeline/docs/tutorials/mot.md
@@ -35,14 +35,16 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
 python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
                                                   --video_file=test_video.mp4 \
                                                   --device=gpu \
+                                                   --region_type=horizontal \
                                                   --do_entrance_counting \
                                                   --draw_center_traj \
                                                   --model_dir det=ppyoloe/
 ```
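 **Note:**
- - `--do_entrance_counting` specifies whether to count traffic through the entrance; it defaults to False when not set
+ - `--do_entrance_counting` specifies whether to count traffic through the entrance; it defaults to False when not set.
 - `--draw_center_traj` specifies whether to draw tracking trajectories; it defaults to False when not set. Test videos for trajectory drawing should preferably be shot by a static camera.
+ - `--region_type` specifies the region used for flow counting. With `--do_entrance_counting` it can be `horizontal` or `vertical`; the default is `horizontal`, which uses the horizontal center line of the video frame as the entrance. The count is incremented when the center point of the same object's box lies on opposite sides of that line in adjacent frames.
 The test result is shown below: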
@@ -52,10 +54,34 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
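 Data source and copyright: 天覆科技. Thanks for providing and open-sourcing real-scene data, for academic research use only.
+5. Region break-in detection and counting
+First set `enable=True` in the MOT section of infer_cfg_pphuman.yml, then run the following command: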
+```python
+python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
+                                                   --video_file=test_video.mp4 \
+                                                   --device=gpu \
+                                                   --draw_center_traj \
+                                                   --do_break_in_counting \
+                                                   --region_type=custom \
+                                                   --region_polygon 200 200 400 200 300 400 100 400
+```
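+**Note:**
+ - `--do_break_in_counting` specifies whether to count entries into the region; it defaults to False when not set.
+ - `--region_type` specifies the region used for flow counting. With `--do_break_in_counting` only `custom` is valid, and it must be passed explicitly because the argument parser defaults to `horizontal`. The user-defined region acts as the entrance: the count is incremented when the midpoint of the lower edge of the same object's box moves from outside the region to inside it in adjacent frames.
+ - `--region_polygon` gives the vertex coordinates of the user-defined polygon as a sequence of integers, two per point (x, y), joined in clockwise order into a closed region. At least 3 points (6 integers) are required. The default is `[]`, so users must set the coordinates themselves. Users can run [this script](../../tools/get_video_info.py) to get the resolution and frame count of the test video, and to visualize and adjust the polygon they want to draw.
+ Run the polygon visualization script as follows:
+  <details>
+  ```python
+  python3.7 get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400
+  ```
+  </details>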
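 ## Method
 1. Object detection/multi-object tracking provides the pedestrian bounding boxes from image/video input. The detector is PP-YOLOE; see [PP-YOLOE](../../../../configs/ppyoloe/) for details.
-2. The multi-object tracking solution is based on [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf), with PP-YOLOE replacing the original YOLOX as the detector and BYTETracker as the tracker; see [ByteTrack](../../../../configs/mot/bytetrack) for details.
+2. The multi-object tracking solution uses [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf) and [OC-SORT](https://arxiv.org/pdf/2203.14360.pdf), with PP-YOLOE replacing the original YOLOX as the detector and BYTETracker and OCSORTTracker as the trackers; see [ByteTrack](../../../../configs/mot/bytetrack) and [OC-SORT](../../../../configs/mot/ocsort) for details.
 ## References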
 ```
@@ -65,4 +91,11 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
  journal={arXiv preprint arXiv:2110.06864},
  year={2021}
 }
+@article{cao2022observation,
+  title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
+  author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
+  journal={arXiv preprint arXiv:2203.14360},
+  year={2022}
+}
 ```
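
The `horizontal` and `vertical` entrance modes documented in this file reduce to a one-axis comparison against the frame's center line. The following is a minimal sketch of that comparison for a single track, following the inequalities in `flow_statistic`; the function name `crossing` and its return values are hypothetical and not part of the commit.

```python
def crossing(prev_center, curr_center, region_type, width, height):
    """Return 'in', 'out', or None for one track between two observations.

    Centers are (x, y) tuples. 'horizontal' compares y against height / 2,
    'vertical' compares x against width / 2.
    """
    if region_type == 'horizontal':
        axis, line = 1, height / 2.0
    elif region_type == 'vertical':
        axis, line = 0, width / 2.0
    else:
        raise ValueError("unsupported region_type: {}".format(region_type))

    prev, curr = prev_center[axis], curr_center[axis]
    if prev <= line and curr > line:
        return 'in'   # moved down (horizontal) or right (vertical) across the line
    if prev >= line and curr < line:
        return 'out'  # moved up (horizontal) or left (vertical) across the line
    return None


print(crossing((100, 200), (100, 400), 'horizontal', 1280, 720))
print(crossing((700, 100), (600, 100), 'vertical', 1280, 720))
```

Under this rule, "in" means crossing the line toward larger coordinates, so which physical direction counts as entering depends on how the camera is mounted.
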
--- a/deploy/pipeline/pipe_utils.py
+++ b/deploy/pipeline/pipe_utils.py
@@ -102,8 +102,30 @@ def argsparser():
        "--do_entrance_counting",
        action='store_true',
        help="Whether counting the numbers of identifiers entering "
-        "or getting out from the entrance. Note that only support one-class"
+        "or getting out from the entrance. Note that only support single-class MOT."
-        "counting, multi-class counting is coming soon.")
+    )
+    parser.add_argument(
+        "--do_break_in_counting",
+        action='store_true',
+        help="Whether to count the identifiers that break into "
+        "the area. Only single-class MOT is supported, and "
+        "the video should be taken by a static camera.")
+    parser.add_argument(
+        "--region_type",
+        type=str,
+        default='horizontal',
+        help="Area type for entrance counting or break-in counting. 'horizontal' and "
+        "'vertical' are used for entrance counting; 'custom' is used for break-in counting. "
+        "Only single-class MOT is supported, and the video should be taken by a static camera."
+    )
+    parser.add_argument(
+        '--region_polygon',
+        nargs='+',
+        type=int,
+        default=[],
+        help="Clockwise point coords (x0,y0,x1,y1...) of the polygon area used for "
+        "break-in counting. Only single-class MOT is supported, and "
+        "the video should be taken by a static camera.")
    parser.add_argument(
        "--secs_interval",
        type=int,
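
To see how the three new flags parse, here is a standalone sketch that re-declares only those arguments with the same `action`, `nargs`, `type`, and `default` values as the hunk above. The `build_parser` wrapper is hypothetical; the real `argsparser()` defines many more options.

```python
import argparse


def build_parser():
    # Only the three arguments added in this commit, with the same settings.
    parser = argparse.ArgumentParser()
    parser.add_argument("--do_break_in_counting", action="store_true")
    parser.add_argument("--region_type", type=str, default="horizontal")
    parser.add_argument("--region_polygon", nargs="+", type=int, default=[])
    return parser


args = build_parser().parse_args([
    "--do_break_in_counting",
    "--region_type=custom",
    "--region_polygon", "200", "200", "400", "200", "300", "400", "100", "400",
])
print(args.region_type, args.region_polygon)

defaults = build_parser().parse_args([])
print(defaults.region_type, defaults.region_polygon)
```

Because `--region_type` defaults to `horizontal`, break-in counting needs `--region_type=custom` on the command line; the polygon arrives as a flat list of integers that is paired up later.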

--- a/deploy/pipeline/pipeline.py
+++ b/deploy/pipeline/pipeline.py
@@ -111,6 +111,13 @@ class Pipeline(object):
        self.draw_center_traj = args.draw_center_traj
        self.secs_interval = args.secs_interval
        self.do_entrance_counting = args.do_entrance_counting
+        self.do_break_in_counting = args.do_break_in_counting
+        self.region_type = args.region_type
+        self.region_polygon = args.region_polygon
+        if self.region_type == 'custom':
+            assert len(
+                self.region_polygon
+            ) >= 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.'
    def _parse_input(self, image_file, image_dir, video_file, video_dir,
                     camera_id):
@@ -261,6 +268,9 @@ class PipePredictor(object):
        draw_center_traj = args.draw_center_traj
        secs_interval = args.secs_interval
        do_entrance_counting = args.do_entrance_counting
+        do_break_in_counting = args.do_break_in_counting
+        region_type = args.region_type
+        region_polygon = args.region_polygon
        # general module for pphuman and ppvehicle
        self.with_mot = cfg.get('MOT', False)['enable'] if cfg.get(
@@ -326,6 +336,9 @@ class PipePredictor(object):
        self.draw_center_traj = draw_center_traj
        self.secs_interval = secs_interval
        self.do_entrance_counting = do_entrance_counting
+        self.do_break_in_counting = do_break_in_counting
+        self.region_type = region_type
+        self.region_polygon = region_polygon
        self.warmup_frame = self.cfg['warmup_frame']
        self.pipeline_res = Result()
@@ -527,7 +540,10 @@ class PipePredictor(object):
                    enable_mkldnn,
                    draw_center_traj=draw_center_traj,
                    secs_interval=secs_interval,
-                    do_entrance_counting=do_entrance_counting)
+                    do_entrance_counting=do_entrance_counting,
+                    do_break_in_counting=do_break_in_counting,
+                    region_type=region_type,
+                    region_polygon=region_polygon)
            if self.with_video_action:
                video_action_cfg = self.cfg['VIDEO_ACTION']
@@ -667,7 +683,24 @@ class PipePredictor(object):
        out_id_list = list()
        prev_center = dict()
        records = list()
-        entrance = [0, height / 2., width, height / 2.]
+        if self.do_entrance_counting or self.do_break_in_counting:
+            if self.region_type == 'horizontal':
+                entrance = [0, height / 2., width, height / 2.]
+            elif self.region_type == 'vertical':
+                entrance = [width / 2, 0., width / 2, height]
+            elif self.region_type == 'custom':
+                entrance = []
+                assert len(
+                    self.region_polygon
+                ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting."
+                for i in range(0, len(self.region_polygon), 2):
+                    entrance.append(
+                        [self.region_polygon[i], self.region_polygon[i + 1]])
+                entrance.append([width, height])
+            else:
+                raise ValueError("region_type:{} unsupported.".format(
+                    self.region_type))
        video_fps = fps
        video_action_imgs = []
@@ -704,8 +737,9 @@ class PipePredictor(object):
                              ids[0])  # single class
                statistic = flow_statistic(
                    mot_result, self.secs_interval, self.do_entrance_counting,
-                    video_fps, entrance, id_set, interval_id_set, in_id_list,
+                    self.do_break_in_counting, self.region_type, video_fps,
-                    out_id_list, prev_center, records)
+                    entrance, id_set, interval_id_set, in_id_list, out_id_list,
+                    prev_center, records)
                records = statistic['records']
                # nothing detected
@@ -933,6 +967,7 @@ class PipePredictor(object):
                frame_id=frame_id,
                fps=fps,
                do_entrance_counting=self.do_entrance_counting,
+                do_break_in_counting=self.do_break_in_counting,
                entrance=entrance,
                records=records,
                center_traj=center_traj)
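
The way `entrance` is assembled for each `region_type` can be summarized in one helper. This is a sketch under the assumption that the layout matches the hunk above: a four-number line for `horizontal` and `vertical`, and a list of `[x, y]` pairs followed by `[width, height]` for `custom`. The function `build_entrance` is hypothetical, and it raises `ValueError` where the commit uses `assert`.

```python
def build_entrance(region_type, width, height, region_polygon=()):
    """Build the `entrance` structure consumed by the counting code."""
    if region_type == 'horizontal':
        return [0, height / 2.0, width, height / 2.0]
    if region_type == 'vertical':
        return [width / 2.0, 0.0, width / 2.0, height]
    if region_type == 'custom':
        if len(region_polygon) < 6 or len(region_polygon) % 2 != 0:
            raise ValueError(
                "region_polygon needs at least 3 (x, y) pairs, got {} values"
                .format(len(region_polygon)))
        entrance = [[region_polygon[i], region_polygon[i + 1]]
                    for i in range(0, len(region_polygon), 2)]
        # The image size rides along as the last pair and is stripped
        # again before the polygon is used.
        entrance.append([width, height])
        return entrance
    raise ValueError("region_type:{} unsupported.".format(region_type))


print(build_entrance('horizontal', 1280, 720))
print(build_entrance('custom', 1280, 720, [200, 200, 400, 200, 300, 400]))
```

Appending `[width, height]` to the polygon lets the existing `entrance` argument carry the frame size, at the cost of every consumer having to remember to drop the last element.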

--- a/deploy/pipeline/tools/get_video_info.py
+++ b/deploy/pipeline/tools/get_video_info.py
+import os
+import sys
+import cv2
+import numpy as np
+import argparse
+def argsparser():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--video_file",
+        type=str,
+        default=None,
+        help="Path of video file, `video_file` or `camera_id` has a highest priority."
+    )
+    parser.add_argument(
+        '--region_polygon',
+        nargs='+',
+        type=int,
+        default=[],
+        help="Clockwise point coords (x0,y0,x1,y1...) of polygon of area when "
+        "do_break_in_counting. Note that only support single-class MOT and "
+        "the video should be taken by a static camera.")
+    return parser
+def get_video_info(video_file, region_polygon):
+    entrance = []
+    assert len(region_polygon
+               ) % 2 == 0, "region_polygon should be pairs of coords points."
+    for i in range(0, len(region_polygon), 2):
+        entrance.append([region_polygon[i], region_polygon[i + 1]])
+    if not os.path.exists(video_file):
+        print("video path '{}' not exists".format(video_file))
+        sys.exit(-1)
+    capture = cv2.VideoCapture(video_file)
+    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+    print("video width: %d, height: %d" % (width, height))
+    np_masks = np.zeros((height, width, 1), np.uint8)
+    entrance = np.array(entrance)
+    cv2.fillPoly(np_masks, [entrance], 255)
+    fps = int(capture.get(cv2.CAP_PROP_FPS))
+    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
+    print("video fps: %d, frame_count: %d" % (fps, frame_count))
+    cnt = 0
+    while (1):
+        ret, frame = capture.read()
+        cnt += 1
+        if cnt == 3: break
+    alpha = 0.3
+    img = np.array(frame).astype('float32')
+    mask = np_masks[:, :, 0]
+    color_mask = [0, 0, 255]
+    idx = np.nonzero(mask)
+    color_mask = np.array(color_mask)
+    img[idx[0], idx[1], :] *= 1.0 - alpha
+    img[idx[0], idx[1], :] += alpha * color_mask
+    cv2.imwrite('region_vis.jpg', img)
+if __name__ == "__main__":
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    get_video_info(FLAGS.video_file, FLAGS.region_polygon)
+    # python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400
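
The region overlay written to `region_vis.jpg` is a per-pixel alpha blend of the frame with a solid color. The arithmetic for a single BGR pixel is sketched below in plain Python; `blend_pixel` is a hypothetical helper, while the script applies the same two steps to whole NumPy arrays with `alpha = 0.3` and `color_mask = [0, 0, 255]`.

```python
def blend_pixel(pixel, color=(0, 0, 255), alpha=0.3):
    """Blend one BGR pixel toward `color`: p * (1 - alpha) + alpha * color."""
    return tuple(p * (1.0 - alpha) + alpha * c for p, c in zip(pixel, color))


# A mid-gray pixel inside the polygon is tinted toward red (BGR order).
print(blend_pixel((100, 100, 100)))
# With alpha = 0 the pixel is unchanged; with alpha = 1 it becomes the color.
print(blend_pixel((100, 100, 100), alpha=0.0))
print(blend_pixel((100, 100, 100), alpha=1.0))
```

Pixels outside the polygon mask are left untouched, so only the selected region is tinted.
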
--- a/deploy/pptracking/python/mot/utils.py
+++ b/deploy/pptracking/python/mot/utils.py
@@ -211,6 +211,8 @@ def preprocess_reid(imgs,
 def flow_statistic(result,
                   secs_interval,
                   do_entrance_counting,
+                   do_break_in_counting,
+                   region_type,
                   video_fps,
                   entrance,
                   id_set,
@@ -221,39 +223,84 @@ def flow_statistic(result,
                   records,
                   data_type='mot',
                   num_classes=1):
-    # Count in and out number: 
+    # Count in/out number: 
-    # Use horizontal center line as the entrance just for simplification.
+    # Note that 'region_type' should be one of ['horizontal', 'vertical', 'custom'],
-    # If a person located in the above the horizontal center line 
+    # 'horizontal' and 'vertical' mean the entrance is the corresponding center line when do_entrance_counting,
-    # at the previous frame and is in the below the line at the current frame,
+    # 'custom' means entrance is a region defined by users when do_break_in_counting.
-    # the in number is increased by one.
-    # If a person was in the below the horizontal center line 
-    # at the previous frame and locates in the below the line at the current frame,
-    # the out number is increased by one.
-    # TODO: if the entrance is not the horizontal center line,
-    # the counting method should be optimized.
    if do_entrance_counting:
-        entrance_y = entrance[1]  # xmin, ymin, xmax, ymax
+        assert region_type in [
+            'horizontal', 'vertical'
+        ], "region_type should be 'horizontal' or 'vertical' when do entrance counting."
+        entrance_x, entrance_y = entrance[0], entrance[1]
        frame_id, tlwhs, tscores, track_ids = result
        for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
            if track_id < 0: continue
            if data_type == 'kitti':
                frame_id -= 1
            x1, y1, w, h = tlwh
            center_x = x1 + w / 2.
            center_y = y1 + h / 2.
            if track_id in prev_center:
-                if prev_center[track_id][1] <= entrance_y and \
+                if region_type == 'horizontal':
-                   center_y > entrance_y:
+                    # horizontal center line
-                    in_id_list.append(track_id)
+                    if prev_center[track_id][1] <= entrance_y and \
-                if prev_center[track_id][1] >= entrance_y and \
+                    center_y > entrance_y:
-                   center_y < entrance_y:
+                        in_id_list.append(track_id)
-                    out_id_list.append(track_id)
+                    if prev_center[track_id][1] >= entrance_y and \
+                    center_y < entrance_y:
+                        out_id_list.append(track_id)
+                else:
+                    # vertical center line
+                    if prev_center[track_id][0] <= entrance_x and \
+                    center_x > entrance_x:
+                        in_id_list.append(track_id)
+                    if prev_center[track_id][0] >= entrance_x and \
+                    center_x < entrance_x:
+                        out_id_list.append(track_id)
                prev_center[track_id][0] = center_x
                prev_center[track_id][1] = center_y
            else:
                prev_center[track_id] = [center_x, center_y]
-    # Count totol number, number at a manual-setting interval
+    if do_break_in_counting:
+        assert region_type in [
+            'custom'
+        ], "region_type should be 'custom' when do break_in counting."
+        assert len(
+            entrance
+        ) >= 4, "entrance should be at least 3 points and (w,h) of image when do break_in counting."
+        im_w, im_h = entrance[-1][:]
+        entrance = np.array(entrance[:-1])
+        frame_id, tlwhs, tscores, track_ids = result
+        for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
+            if track_id < 0: continue
+            if data_type == 'kitti':
+                frame_id -= 1
+            x1, y1, w, h = tlwh
+            center_x = min(x1 + w / 2., im_w - 1)
+            center_down_y = min(y1 + h, im_h - 1)
+            # counting objects in region of the first frame
+            if frame_id == 1:
+                if in_quadrangle([center_x, center_down_y], entrance, im_h,
+                                 im_w):
+                    in_id_list.append(-1)
+                else:
+                    prev_center[track_id] = [center_x, center_down_y]
+            else:
+                if track_id in prev_center:
+                    if not in_quadrangle(prev_center[track_id], entrance, im_h,
+                                         im_w) and in_quadrangle(
+                                             [center_x, center_down_y],
+                                             entrance, im_h, im_w):
+                        in_id_list.append(track_id)
+                    prev_center[track_id] = [center_x, center_down_y]
+                else:
+                    prev_center[track_id] = [center_x, center_down_y]
+    # Count total number, number at a manual-setting interval
    frame_id, tlwhs, tscores, track_ids = result
    for tlwh, score, track_id in zip(tlwhs, tscores, track_ids):
        if track_id < 0: continue
@@ -268,6 +315,8 @@ def flow_statistic(result,
    if do_entrance_counting:
        info += ", In count: {}, Out count: {}".format(
            len(in_id_list), len(out_id_list))
+    if do_break_in_counting:
+        info += ", Break_in count: {}".format(len(in_id_list))
    if frame_id % video_fps == 0 and frame_id / video_fps % secs_interval == 0:
        info += ", Count during {} secs: {}".format(secs_interval,
                                                    curr_interval_count)
@@ -282,5 +331,15 @@ def flow_statistic(result,
        "in_id_list": in_id_list,
        "out_id_list": out_id_list,
        "prev_center": prev_center,
-        "records": records
+        "records": records,
    }
+def in_quadrangle(point, entrance, im_h, im_w):
+    mask = np.zeros((im_h, im_w, 1), np.uint8)
+    cv2.fillPoly(mask, [entrance], 255)
+    p = tuple(map(int, point))
+    if mask[p[1], p[0], :] > 0:
+        return True
+    else:
+        return False
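
The interval report in `flow_statistic` fires only on frames that satisfy `frame_id % video_fps == 0 and frame_id / video_fps % secs_interval == 0`. A small sketch makes the resulting schedule concrete; `is_report_frame` is a hypothetical wrapper around that expression, and the 25 fps and 10 second values are illustrative assumptions.

```python
def is_report_frame(frame_id, video_fps, secs_interval):
    """Same condition as in flow_statistic, isolated for inspection."""
    return (frame_id % video_fps == 0 and
            frame_id / video_fps % secs_interval == 0)


# Assumed values: a 25 fps video with a 10 second reporting interval.
report_frames = [f for f in range(1, 1001) if is_report_frame(f, 25, 10)]
print(report_frames)
```

The condition relies on `video_fps` being an integer; the pipeline passes the value read from the video, so a fractional frame rate would need rounding first for the modulo test to hit.
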
--- a/deploy/pptracking/python/mot/visualize.py
+++ b/deploy/pptracking/python/mot/visualize.py
@@ -191,13 +191,16 @@ def plot_tracking_dict(image,
                       scores_dict,
                       frame_id=0,
                       fps=0.,
-                       ids2names=[],
+                       ids2names=['pedestrian'],
                       do_entrance_counting=False,
+                       do_break_in_counting=False,
                       entrance=None,
                       records=None,
                       center_traj=None):
    im = np.ascontiguousarray(np.copy(image))
    im_h, im_w = im.shape[:2]
+    if do_break_in_counting:
+        entrance = np.array(entrance[:-1])  # last pair is [im_w, im_h] 
    text_scale = max(0.5, image.shape[1] / 3000.)
    text_thickness = 2
@@ -231,6 +234,30 @@ def plot_tracking_dict(image,
            text_scale, (0, 0, 255),
            thickness=text_thickness)
+    if num_classes == 1 and do_break_in_counting:
+        np_masks = np.zeros((im_h, im_w, 1), np.uint8)
+        cv2.fillPoly(np_masks, [entrance], 255)
+        # Draw region mask
+        alpha = 0.3
+        im = np.array(im).astype('float32')
+        mask = np_masks[:, :, 0]
+        color_mask = [0, 0, 255]
+        idx = np.nonzero(mask)
+        color_mask = np.array(color_mask)
+        im[idx[0], idx[1], :] *= 1.0 - alpha
+        im[idx[0], idx[1], :] += alpha * color_mask
+        im = np.array(im).astype('uint8')
+        # find start location for break in counting data
+        start = records[-1].find('Break_in')
+        cv2.putText(
+            im,
+            records[-1][start:-1], (entrance[0][0] - 10, entrance[0][1] - 10),
+            cv2.FONT_ITALIC,
+            text_scale, (0, 0, 255),
+            thickness=text_thickness)
    for cls_id in range(num_classes):
        tlwhs = tlwhs_dict[cls_id]
        obj_ids = obj_ids_dict[cls_id]
@@ -262,7 +289,17 @@ def plot_tracking_dict(image,
                id_text = 'class{}_{}'.format(cls_id, id_text)
            _line_thickness = 1 if obj_id <= 0 else line_thickness
-            color = get_color(abs(obj_id))
+            in_region = False
+            if do_break_in_counting:
+                center_x = min(x1 + w / 2., im_w - 1)
+                center_down_y = min(y1 + h, im_h - 1)
+                if in_quadrangle([center_x, center_down_y], entrance, im_h,
+                                 im_w):
+                    in_region = True
+            color = get_color(abs(obj_id)) if in_region == False else (0, 0,
+                                                                       255)
            cv2.rectangle(
                im,
                intbox[0:2],
@@ -273,16 +310,26 @@ def plot_tracking_dict(image,
                im,
                id_text, (intbox[0], intbox[1] - 25),
                cv2.FONT_ITALIC,
-                text_scale, (0, 255, 255),
+                text_scale,
+                color,
                thickness=text_thickness)
+            if do_break_in_counting and in_region:
+                cv2.putText(
+                    im,
+                    'Break in now.', (intbox[0], intbox[1] - 50),
+                    cv2.FONT_ITALIC,
+                    text_scale, (0, 0, 255),
+                    thickness=text_thickness)
            if scores is not None:
                text = 'score: {:.2f}'.format(float(scores[i]))
                cv2.putText(
                    im,
                    text, (intbox[0], intbox[1] - 6),
                    cv2.FONT_ITALIC,
-                    text_scale, (0, 255, 0),
+                    text_scale,
+                    color,
                    thickness=text_thickness)
        if center_traj is not None:
            for traj in center_traj:
@@ -292,3 +339,13 @@ def plot_tracking_dict(image,
                    for point in traj[i]:
                        cv2.circle(im, point, 3, (0, 0, 255), -1)
    return im
+def in_quadrangle(point, entrance, im_h, im_w):
+    mask = np.zeros((im_h, im_w, 1), np.uint8)
+    cv2.fillPoly(mask, [entrance], 255)
+    p = tuple(map(int, point))
+    if mask[p[1], p[0], :] > 0:
+        return True
+    else:
+        return False
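
The overlay text for the region is cut out of the latest record with `records[-1][start:-1]`. The sketch below shows what that slice yields, assuming a record shaped like the `info` string built in `flow_statistic` and ending in a newline; the exact record layout and the helper name `break_in_text` are assumptions, not taken from the commit.

```python
def break_in_text(record):
    """Slice from the first 'Break_in' to just before the final character."""
    start = record.find('Break_in')
    return record[start:-1]


# Hypothetical record; the trailing newline is an assumption.
record = "Frame id: 12, Total count: 3, Break_in count: 1\n"
print(repr(break_in_text(record)))

# str.find returns -1 when the marker is missing, so the slice is empty.
print(repr(break_in_text("Frame id: 12, Total count: 3\n")))
```

The `-1` end index drops the last character unconditionally, so a record without a trailing newline would lose its final digit.
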
--- a/deploy/pptracking/python/mot_jde_infer.py
+++ b/deploy/pptracking/python/mot_jde_infer.py
@@ -64,28 +64,39 @@ class JDE_Detector(Detector):
        do_entrance_counting(bool): Whether counting the numbers of identifiers entering 
            or getting out from the entrance, default as False，only support single class
            counting in MOT.
+        do_break_in_counting(bool): Whether to count the identifiers that break into
+            the area, default as False. Only single-class counting in MOT is supported,
+            and the video should be taken by a static camera.
+        region_type (str): Area type for entrance counting or break-in counting. 'horizontal'
+            and 'vertical' are used for entrance counting; 'custom' is used for break-in counting.
+            Only single-class MOT is supported, and the video should be taken by a static camera.
+        region_polygon (list): Clockwise point coords (x0,y0,x1,y1...) of the polygon area used
+            for break-in counting. Only single-class MOT is supported, and
+            the video should be taken by a static camera.
    """
-    def __init__(
+    def __init__(self,
-            self,
+                 model_dir,
-            model_dir,
+                 tracker_config=None,
-            tracker_config=None,
+                 device='CPU',
-            device='CPU',
+                 run_mode='paddle',
-            run_mode='paddle',
+                 batch_size=1,
-            batch_size=1,
+                 trt_min_shape=1,
-            trt_min_shape=1,
+                 trt_max_shape=1088,
-            trt_max_shape=1088,
+                 trt_opt_shape=608,
-            trt_opt_shape=608,
+                 trt_calib_mode=False,
-            trt_calib_mode=False,
+                 cpu_threads=1,
-            cpu_threads=1,
+                 enable_mkldnn=False,
-            enable_mkldnn=False,
+                 output_dir='output',
-            output_dir='output',
+                 threshold=0.5,
-            threshold=0.5,
+                 save_images=False,
-            save_images=False,
+                 save_mot_txts=False,
-            save_mot_txts=False,
+                 draw_center_traj=False,
-            draw_center_traj=False,
+                 secs_interval=10,
-            secs_interval=10,
+                 do_entrance_counting=False,
-            do_entrance_counting=False, ):
+                 do_break_in_counting=False,
+                 region_type='horizontal',
+                 region_polygon=[]):
        super(JDE_Detector, self).__init__(
            model_dir=model_dir,
            device=device,
@@ -104,6 +115,13 @@ class JDE_Detector(Detector):
        self.draw_center_traj = draw_center_traj
        self.secs_interval = secs_interval
        self.do_entrance_counting = do_entrance_counting
+        self.do_break_in_counting = do_break_in_counting
+        self.region_type = region_type
+        self.region_polygon = region_polygon
+        if self.region_type == 'custom':
+            assert len(
+                self.region_polygon
+            ) >= 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.'
        assert batch_size == 1, "MOT model only supports batch_size=1."
        self.det_times = Timer(with_tracker=True)
@@ -310,7 +328,24 @@ class JDE_Detector(Detector):
            out_id_list = list()
            prev_center = dict()
            records = list()
-            entrance = [0, height / 2., width, height / 2.]
+            if self.do_entrance_counting or self.do_break_in_counting:
+                if self.region_type == 'horizontal':
+                    entrance = [0, height / 2., width, height / 2.]
+                elif self.region_type == 'vertical':
+                    entrance = [width / 2, 0., width / 2, height]
+                elif self.region_type == 'custom':
+                    entrance = []
+                    assert len(
+                        self.region_polygon
+                    ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting."
+                    for i in range(0, len(self.region_polygon), 2):
+                        entrance.append([
+                            self.region_polygon[i], self.region_polygon[i + 1]
+                        ])
+                    entrance.append([width, height])
+                else:
+                    raise ValueError("region_type:{} is not supported.".format(
+                        self.region_type))
        video_fps = fps
@@ -340,8 +375,9 @@ class JDE_Detector(Detector):
                          online_ids[0])
                statistic = flow_statistic(
                    result, self.secs_interval, self.do_entrance_counting,
-                    video_fps, entrance, id_set, interval_id_set, in_id_list,
+                    self.do_break_in_counting, self.region_type, video_fps,
-                    out_id_list, prev_center, records, data_type, num_classes)
+                    entrance, id_set, interval_id_set, in_id_list, out_id_list,
+                    prev_center, records, data_type, num_classes)
                records = statistic['records']
            fps = 1. / timer.duration
@@ -403,7 +439,10 @@ def main():
        save_mot_txts=FLAGS.save_mot_txts,
        draw_center_traj=FLAGS.draw_center_traj,
        secs_interval=FLAGS.secs_interval,
-        do_entrance_counting=FLAGS.do_entrance_counting, )
+        do_entrance_counting=FLAGS.do_entrance_counting,
+        do_break_in_counting=FLAGS.do_break_in_counting,
+        region_type=FLAGS.region_type,
+        region_polygon=FLAGS.region_polygon)
    # predict from video file or camera video stream
    if FLAGS.video_file is not None or FLAGS.camera_id != -1:

--- a/deploy/pptracking/python/mot_sde_infer.py
+++ b/deploy/pptracking/python/mot_sde_infer.py
@@ -64,7 +64,16 @@ class SDE_Detector(Detector):
        secs_interval (int): The seconds interval to count after tracking, default as 10
        do_entrance_counting(bool): Whether counting the numbers of identifiers entering 
            or getting out from the entrance, default as False，only support single class
-            counting in MOT.
+            counting in MOT, and the video should be taken by a static camera.
+        do_break_in_counting(bool): Whether to count the identifiers that break into
+            the area, default as False. Only single-class counting in MOT is supported,
+            and the video should be taken by a static camera.
+        region_type (str): Area type for entrance counting or break-in counting. 'horizontal'
+            and 'vertical' are used for entrance counting; 'custom' is used for break-in counting.
+            Only single-class MOT is supported, and the video should be taken by a static camera.
+        region_polygon (list): Clockwise point coords (x0,y0,x1,y1...) of the polygon area used
+            for break-in counting. Only single-class MOT is supported, and
+            the video should be taken by a static camera.
        reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT
        mtmct_dir (str): MTMCT dir, default None, set for doing MTMCT
    """
@@ -88,6 +97,9 @@ class SDE_Detector(Detector):
                 draw_center_traj=False,
                 secs_interval=10,
                 do_entrance_counting=False,
+                 do_break_in_counting=False,
+                 region_type='horizontal',
+                 region_polygon=[],
                 reid_model_dir=None,
                 mtmct_dir=None):
        super(SDE_Detector, self).__init__(
@@ -108,6 +120,13 @@ class SDE_Detector(Detector):
        self.draw_center_traj = draw_center_traj
        self.secs_interval = secs_interval
        self.do_entrance_counting = do_entrance_counting
+        self.do_break_in_counting = do_break_in_counting
+        self.region_type = region_type
+        self.region_polygon = region_polygon
+        if self.region_type == 'custom':
+            assert len(
+                self.region_polygon
+            ) >= 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.'
        assert batch_size == 1, "MOT model only supports batch_size=1."
        self.det_times = Timer(with_tracker=True)
@@ -552,7 +571,25 @@ class SDE_Detector(Detector):
            out_id_list = list()
            prev_center = dict()
            records = list()
-            entrance = [0, height / 2., width, height / 2.]
+            if self.do_entrance_counting or self.do_break_in_counting:
+                if self.region_type == 'horizontal':
+                    entrance = [0, height / 2., width, height / 2.]
+                elif self.region_type == 'vertical':
+                    entrance = [width / 2, 0., width / 2, height]
+                elif self.region_type == 'custom':
+                    entrance = []
+                    assert len(
+                        self.region_polygon
+                    ) % 2 == 0, "region_polygon should be pairs of point coords when doing break-in counting."
+                    for i in range(0, len(self.region_polygon), 2):
+                        entrance.append([
+                            self.region_polygon[i], self.region_polygon[i + 1]
+                        ])
+                    entrance.append([width, height])
+                else:
+                    raise ValueError("region_type:{} is not supported.".format(
+                        self.region_type))
        video_fps = fps
        while (1):
@@ -578,8 +615,9 @@ class SDE_Detector(Detector):
                          online_ids[0])
                statistic = flow_statistic(
                    result, self.secs_interval, self.do_entrance_counting,
-                    video_fps, entrance, id_set, interval_id_set, in_id_list,
+                    self.do_break_in_counting, self.region_type, video_fps,
-                    out_id_list, prev_center, records, data_type, num_classes)
+                    entrance, id_set, interval_id_set, in_id_list, out_id_list,
+                    prev_center, records, data_type, num_classes)
                records = statistic['records']
            fps = 1. / timer.duration
@@ -764,6 +802,9 @@ def main():
        draw_center_traj=FLAGS.draw_center_traj,
        secs_interval=FLAGS.secs_interval,
        do_entrance_counting=FLAGS.do_entrance_counting,
+        do_break_in_counting=FLAGS.do_break_in_counting,
+        region_type=FLAGS.region_type,
+        region_polygon=FLAGS.region_polygon,
        reid_model_dir=FLAGS.reid_model_dir,
        mtmct_dir=FLAGS.mtmct_dir, )

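The region setup in the hunks above can be sketched as a standalone function. This is a minimal illustration, not the code in the repository: the helper name `build_region`, the 1920x1080 frame size, and the sample polygon are assumptions, and the sketch accepts any polygon with at least three points (six values). The two arguments are parsed the same way the diff declares them (`nargs='+'`, `type=int`).

```python
import argparse


def build_region(region_type, width, height, region_polygon=()):
    """Hypothetical helper mirroring the region_type branches in the diff."""
    if region_type == 'horizontal':
        # Horizontal line through the middle of the frame: x0, y0, x1, y1.
        return [0, height / 2., width, height / 2.]
    if region_type == 'vertical':
        # Vertical line through the middle of the frame.
        return [width / 2, 0., width / 2, height]
    if region_type == 'custom':
        # Assumption: a polygon needs at least 3 points, i.e. 6 values.
        assert len(region_polygon) >= 6, 'need at least 3 pairs of coords'
        assert len(region_polygon) % 2 == 0, 'coords must come in x,y pairs'
        # Turn the flat list x0,y0,x1,y1,... into [[x0, y0], [x1, y1], ...].
        points = [[region_polygon[i], region_polygon[i + 1]]
                  for i in range(0, len(region_polygon), 2)]
        # As in the diff, the frame size is appended as the last element.
        points.append([width, height])
        return points
    raise ValueError("region_type:{} is not supported.".format(region_type))


parser = argparse.ArgumentParser()
parser.add_argument('--region_type', type=str, default='horizontal')
parser.add_argument('--region_polygon', nargs='+', type=int, default=[])

# Sample command line: a square listed clockwise in image coordinates.
args = parser.parse_args([
    '--region_type', 'custom',
    '--region_polygon', '200', '200', '400', '200', '400', '400', '200', '400',
])
region = build_region(args.region_type, 1920, 1080, args.region_polygon)
print(region)
```

Note that with `region_type='custom'` the returned list holds the polygon vertices followed by one extra `[width, height]` entry, so a consumer has to treat the last element separately from the vertices.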
--- a/deploy/pptracking/python/mot_utils.py
+++ b/deploy/pptracking/python/mot_utils.py
@@ -141,8 +141,30 @@ def argsparser():
        "--do_entrance_counting",
        action='store_true',
        help="Whether counting the numbers of identifiers entering "
-        "or getting out from the entrance. Note that only support one-class"
+        "or getting out from the entrance. Note that only single-class MOT is supported."
-        "counting, multi-class counting is coming soon.")
+    )
+    parser.add_argument(
+        "--do_break_in_counting",
+        action='store_true',
+        help="Whether to count the identifiers that break into "
+        "the area. Note that only single-class MOT is supported and "
+        "the video should be taken by a static camera.")
+    parser.add_argument(
+        "--region_type",
+        type=str,
+        default='horizontal',
+        help="Region type for entrance counting or break-in counting. 'horizontal' and "
+        "'vertical' are used for entrance counting; 'custom' is used for break-in counting. "
+        "Note that only single-class MOT is supported, and the video should be taken by a static camera."
+    )
+    parser.add_argument(
+        '--region_polygon',
+        nargs='+',
+        type=int,
+        default=[],
+        help="Clockwise point coords (x0,y0,x1,y1...) of the region polygon, used with "
+        "do_break_in_counting. Note that only single-class MOT is supported and "
+        "the video should be taken by a static camera.")
    parser.add_argument(
        "--secs_interval",
        type=int,