# PaddleSeg Semantic Segmentation Library
## Introduction
PaddleSeg is a semantic segmentation library built on [PaddlePaddle](https://www.paddlepaddle.org.cn). It covers the three mainstream segmentation models DeepLabv3+, U-Net and ICNet, and its unified configuration helps users complete the full image-segmentation workflow from training to deployment more conveniently.
Its key traits are high performance, rich data augmentation, industrial-grade deployment and full-workflow applicability.
- **Rich data augmentation**
  - Built on the practical business experience of Baidu's Vision Technology Department, PaddleSeg ships with 10+ data augmentation strategies that can be combined and customized for real business scenarios to improve model generalization and robustness.
- **Mainstream model coverage**
  - Supports the three mainstream segmentation networks U-Net, DeepLabv3+ and ICNet; combined with pretrained models and adjustable backbones, they satisfy different performance and accuracy requirements.
- **High performance**
  - PaddleSeg supports training acceleration strategies such as multi-process IO, multi-GPU parallelism, cross-GPU Batch Norm synchronization and FP16 mixed precision. Together with the memory optimization algorithms of the PaddlePaddle core framework, this substantially reduces the GPU memory footprint of segmentation models and speeds up training.
- **Industrial-grade deployment**
  - Based on [Paddle Serving](https://github.com/PaddlePaddle/Serving) and the PaddlePaddle high-performance inference engine, combined with Baidu's open AI capabilities, you can easily build portrait-segmentation and lane-line-segmentation services.

For more model information and technical details, see the [model introduction](./docs/models.md) and [pretrained models](./docs/model_zoo.md).
## AI Studio Tutorials
### Quick Start
The [PaddleSeg portrait segmentation](https://aistudio.baidu.com/aistudio/projectDetail/100798) tutorial gives a quick taste of the PaddleSeg portrait segmentation model.
### Beginner Tutorial
The beginner tutorial uses the classic U-Net model and the Oxford-IIIT Pet dataset to quickly walk through the PaddleSeg workflow; for details, see [U-Net pet segmentation](https://aistudio.baidu.com/aistudio/projectDetail/102889).
### Advanced Tutorial
The advanced tutorial uses the DeepLabv3+ model and the Cityscapes dataset to quickly cover strategies such as ASPP, backbone switching and cross-GPU Batch Norm synchronization; for details, see [DeepLabv3+ image segmentation](https://aistudio.baidu.com/aistudio/projectDetail/101696).
### Vertical Models
For more specialized vertical segmentation models such as LIP human-part segmentation, portrait segmentation and lane-line segmentation, see [contrib](./contrib/README.md).
## Documentation
* [Installation](./docs/installation.md)
* [Data Preparation](./docs/data_prepare.md)
* [Data Augmentation](./docs/data_aug.md)
* [Pretrained Models](./docs/model_zoo.md)
* [Training/Evaluation/Prediction (Visualization)](./docs/usage.md)
* [Inference Library Integration](./inference/README.md)
* [Server-Side Deployment](./serving/README.md)
* [Vertical Segmentation Models](./contrib/README.md)
## FAQ
#### Q: How do I configure data augmentation for image segmentation, and how do unpadding, step scaling and range scaling work?
A: See the [Data Augmentation](./docs/data_aug.md) document for augmentation configuration.
#### Q: The input image is too large at prediction time and GPU memory runs out. What can I do?
A: Reduce the batch size, use the Group Norm strategy, and so on.
## Changelog
### 2019.08.25
#### v0.1.0
* Initial release of the PaddleSeg segmentation library, covering the three segmentation models DeepLabv3+, U-Net and ICNet; DeepLabv3+ supports the two adjustable backbone networks Xception and MobileNet.
* Released [ACE2P](./contrib/ACE2P), the champion prediction model of the CVPR'19 LIP human-part segmentation challenge.
* Released built-in [portrait segmentation](./contrib/HumanSeg/) and [lane-line segmentation](./contrib/RoadLine) prediction models based on the DeepLabv3+ network.
## Contributing
We warmly welcome code contributions to PaddleSeg as well as suggestions on how to use it.
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: "stepscaling" # one of: unpadding, rangescaling, stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
DATA_DIR: "./dataset/cityscapes/"
    IMAGE_TYPE: "rgb" # one of: rgb, rgba
NUM_CLASSES: 19
TEST_FILE_LIST: "dataset/cityscapes/val.list"
TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
VAL_FILE_LIST: "dataset/cityscapes/val.list"
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "gn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
ASPP_WITH_SEP_CONV: True
DECODER_USE_SEP_CONV: True
TEST:
TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/cityscape_v7/"
PRETRAINED_MODEL: u"pretrain/deeplabv3plus_gn_init"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 700
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: u"stepscaling" # one of: unpadding, rangescaling, stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 8
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./data/COCO2014/"
    IMAGE_TYPE: "rgb" # one of: rgb, rgba
NUM_CLASSES: 21
TEST_FILE_LIST: "data/COCO2014/VOC_ImageSets/val.txt"
TRAIN_FILE_LIST: "data/COCO2014/ImageSets/train.txt"
VAL_FILE_LIST: "data/COCO2014/VOC_ImageSets/val.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
TEST:
TEST_MODEL: "snapshots/coco_v1/final"
TRAIN:
MODEL_SAVE_DIR: "snapshots/coco_v1/"
PRETRAINED_MODEL: "pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.007
WEIGHT_DECAY: 0.00004
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "SGD"
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: u"unpadding" # one of: unpadding, rangescaling, stepscaling
FIX_RESIZE_SIZE: (513, 513) # (width, height), for unpadding
INF_RESIZE_VALUE: 513 # for rangescaling
    MAX_RESIZE_VALUE: 513 # for rangescaling
    MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: True
ASPECT_RATIO: 0
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 45
MIN_AREA_RATIO: 0
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 24
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: u"./data/humanseg/"
    IMAGE_TYPE: "rgb" # one of: rgb, rgba
NUM_CLASSES: 2
TEST_FILE_LIST: u"data/humanseg/list/val.txt"
TRAIN_FILE_LIST: u"data/humanseg/list/train.txt"
VAL_FILE_LIST: u"data/humanseg/list/val.txt"
IGNORE_INDEX: 255
SEPARATOR: "|"
FREEZE:
MODEL_FILENAME: u"model"
PARAMS_FILENAME: u"params"
SAVE_DIR: u"human_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: u"bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "xception_65"
TEST:
TEST_MODEL: "snapshots/humanseg/aic_v2/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/humanseg/aic_v2/"
PRETRAINED_MODEL: u"pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.1
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "sgd"
EVAL_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: u"unpadding" # one of: unpadding, rangescaling, stepscaling
FIX_RESIZE_SIZE: (1536, 576) # (width, height), for unpadding
INF_RESIZE_VALUE: 1280 # for rangescaling
    MAX_RESIZE_VALUE: 1536 # for rangescaling
    MIN_RESIZE_VALUE: 1024 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 1
MEAN: [127.5, 127.5, 127.5]
STD: [127.5, 127.5, 127.5]
DATASET:
DATA_DIR: "./data/line/L4_lane_mask_dataset_app/L4_360_0_2class/"
    IMAGE_TYPE: "rgb" # one of: rgb, rgba
NUM_CLASSES: 2
TEST_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/val.txt"
TRAIN_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/train.txt"
VAL_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/val.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
SAVE_DIR: "line_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "mobilenet"
TEST:
TEST_MODEL: "snapshots/line_v4/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/line_v4/"
PRETRAINED_MODEL: u"pretrain/MobileNetV2_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.01
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 40
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: "unpadding" # one of: unpadding, rangescaling, stepscaling
FIX_RESIZE_SIZE: (512, 512) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 1.25 # for stepscaling
MIN_SCALE_FACTOR: 0.75 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./dataset/mini_pet/"
    IMAGE_TYPE: "rgb" # one of: rgb, rgba
NUM_CLASSES: 3
TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
IGNORE_INDEX: 255
SEPARATOR: " "
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
MODEL:
MODEL_NAME: "unet"
DEFAULT_NORM_TYPE: "bn"
TEST:
TEST_MODEL: "./test/saved_model/unet_pet/final/"
TRAIN:
MODEL_SAVE_DIR: "./test/saved_models/unet_pet/"
PRETRAINED_MODEL: "./test/models/unet_coco/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
NUM_EPOCHS: 500
LR: 0.005
LR_POLICY: "poly"
OPTIMIZER: "adam"
# Augmented Context Embedding with Edge Perceiving (ACE2P)
- Category: Image / Semantic Segmentation
- Network: ACE2P
- Dataset: LIP
## Model Overview
Human parsing is a fine-grained semantic segmentation task that aims to identify the components of a human image (e.g., body parts and clothing) at the pixel level. ACE2P learns the human parsing task end to end by fusing low-level features, global context information and edge details.
A solution built on the ACE2P single-person human parsing network won first place in all three human parsing tracks of the 3rd LIP Challenge at CVPR 2019.
## Model Architecture
![](imgs/net.jpg)
## Model Details
The ACE2P model consists of three branches:
* a semantic segmentation branch
* an edge detection branch
* a fusion branch

The semantic segmentation branch uses ResNet-101 as its backbone and fuses context information with a Pyramid Scene Parsing Network to obtain more accurate feature representations.
The edge detection branch takes intermediate backbone features as input and predicts binary edge information.
The fusion branch merges the features of the segmentation and edge branches to produce segmentation maps with more accurate edge details.
Segmentation is usually evaluated with mIoU, so an IoU loss is introduced alongside the cross-entropy loss to optimize this metric directly.
At test time, the results of multiple scales and horizontal flips are fused to produce the final prediction.
At training time, a cosine-annealing learning-rate schedule is used, with a linear warm-up in the initial phase.
For data preprocessing, aspect-preserving random scaling, random rotation and horizontal flipping are used as augmentation strategies.
## LIP Metrics
With test scales of '377,377,473,473,567,567' plus horizontal flipping, the model reaches a mean IoU of 62.63.
After multi-model ensembling the mean IoU reaches 65.18, ranking first on the LIP Single-Person Human Parsing Track leaderboard.
## Prediction Examples
![](imgs/result.jpg)
## References
**Paper**
*Devil in the Details: Towards Accurate Single and Multiple Human Parsing* https://arxiv.org/abs/1809.05996
**Code**
https://github.com/Microsoft/human-pose-estimation.pytorch
https://github.com/liutinglt/CE2P
# -*- coding: utf-8 -*-
from utils.util import AttrDict, merge_cfg_from_args, get_arguments
import os
args = get_arguments()
cfg = AttrDict()
# Directory containing the images to predict
cfg.data_dir = os.path.join(args.example, "data", "testing_images")
# List file with the names of the images to predict
cfg.data_list_file = os.path.join(args.example, "data", "test_id.txt")
# Path to load the model from
cfg.model_path = os.path.join(args.example, "ACE2P")
# Directory to save prediction results
cfg.vis_dir = os.path.join(args.example, "result")
# Number of prediction classes
cfg.class_num = 20
# Per-channel mean subtracted during image preprocessing
cfg.MEAN = 0.406, 0.456, 0.485
# Per-channel standard deviation divided by during image preprocessing
cfg.STD = 0.225, 0.224, 0.229
# Image sizes used for multi-scale prediction
cfg.multi_scales = (377, 377), (473, 473), (567, 567)
# Whether to also use horizontally flipped images during multi-scale prediction
cfg.flip = True
merge_cfg_from_args(args, cfg)
# -*- coding: utf-8 -*-
import numpy as np
import paddle.fluid as fluid
from ACE2P.config import cfg
import cv2
def get_affine_points(src_shape, dst_shape, rot_grad=0):
    # Compute three pairs of corresponding points between the source image
    # and its affine transform: the center of the transformed image,
    # [w/2, 0] and [0, 0], together with the matching points in the source image.
    if dst_shape[0] == 0 or dst_shape[1] == 0:
        raise Exception('scale shape should not be 0')
    # Rotation angle in radians
    rotation = rot_grad * np.pi / 180.0
sin_v = np.sin(rotation)
cos_v = np.cos(rotation)
dst_ratio = float(dst_shape[0]) / dst_shape[1]
h, w = src_shape
src_ratio = float(h) / w if w != 0 else 0
affine_shape = [h, h * dst_ratio] if src_ratio > dst_ratio \
else [w / dst_ratio, w]
    # The three points in the source image
points = [[0, 0]] * 3
points[0] = (np.array([w, h]) - 1) * 0.5
points[1] = points[0] + 0.5 * affine_shape[0] * np.array([sin_v, -cos_v])
points[2] = points[1] - 0.5 * affine_shape[1] * np.array([cos_v, sin_v])
    # The corresponding three points in the transformed image
points_trans = [[0, 0]] * 3
points_trans[0] = (np.array(dst_shape[::-1]) - 1) * 0.5
points_trans[1] = [points_trans[0][0], 0]
return points, points_trans
def preprocess(im):
    # Data preprocessing for the ACE2P model
im_shape = im.shape[:2]
input_images = []
for i, scale in enumerate(cfg.multi_scales):
        # Corresponding point coordinates between the image and its affine transform
        points, points_trans = get_affine_points(im_shape, scale)
        # Affine matrix derived from the point correspondences
        trans = cv2.getAffineTransform(np.float32(points),
                                       np.float32(points_trans))
        # Warp the image with the affine matrix
input = cv2.warpAffine(im,
trans,
scale[::-1],
flags=cv2.INTER_LINEAR)
        # Subtract the mean, divide by the std, and convert to NCHW
input = input.astype(np.float32)
input = (input / 255. - np.array(cfg.MEAN)) / np.array(cfg.STD)
input = input.transpose(2, 0, 1).astype(np.float32)
input = np.expand_dims(input, 0)
        # Horizontal flip
if cfg.flip:
flip_input = input[:, :, :, ::-1]
input_images.append(np.vstack((input, flip_input)))
else:
input_images.append(input)
return input_images
def multi_scale_test(exe, test_prog, feed_name, fetch_list,
                     input_ims, im_shape):
    # Some classes have left/right counterparts; flipped_idx gives the label
    # of each such class after horizontal flipping.
    flipped_idx = (15, 14, 17, 16, 19, 18)
    ms_outputs = []
    # Multi-scale prediction
for idx, scale in enumerate(cfg.multi_scales):
input_im = input_ims[idx]
parsing_output = exe.run(program=test_prog,
feed={feed_name[0]: input_im},
fetch_list=fetch_list)
output = parsing_output[0][0]
if cfg.flip:
            # With flipping enabled, swap the left/right classes in the flipped
            # output and average it with the original prediction
flipped_output = parsing_output[0][1]
flipped_output[14:20, :, :] = flipped_output[flipped_idx, :, :]
flipped_output = flipped_output[:, :, ::-1]
output += flipped_output
output *= 0.5
output = np.transpose(output, [1, 2, 0])
        # Affine-transform back to the original image size
points, points_trans = get_affine_points(im_shape, scale)
M = cv2.getAffineTransform(np.float32(points_trans), np.float32(points))
logits_result = cv2.warpAffine(output, M, im_shape[::-1], flags=cv2.INTER_LINEAR)
ms_outputs.append(logits_result)
    # Average the multi-scale predictions and take the most probable class
ms_fused_parsing_output = np.stack(ms_outputs)
ms_fused_parsing_output = np.mean(ms_fused_parsing_output, axis=0)
parsing = np.argmax(ms_fused_parsing_output, axis=2)
return parsing, ms_fused_parsing_output
# -*- coding: utf-8 -*-
from utils.util import AttrDict, get_arguments, merge_cfg_from_args
import os
args = get_arguments()
cfg = AttrDict()
# Directory containing the images to predict
cfg.data_dir = os.path.join(args.example, "data", "test_images")
# List file with the names of the images to predict
cfg.data_list_file = os.path.join(args.example, "data", "test.txt")
# Path to load the model from
cfg.model_path = os.path.join(args.example, "model")
# Directory to save prediction results
cfg.vis_dir = os.path.join(args.example, "result")
# Number of prediction classes
cfg.class_num = 2
# Per-channel mean subtracted during image preprocessing
cfg.MEAN = 104.008, 116.669, 122.675
# Per-channel standard deviation divided by during image preprocessing
cfg.STD = 1.0, 1.0, 1.0
# Input size of the images to predict
cfg.input_size = 513, 513
merge_cfg_from_args(args, cfg)
# PaddleSeg Specialized Vertical Segmentation Models
This directory provides the latest specialized segmentation models built on PaddlePaddle.
## Augmented Context Embedding with Edge Perceiving (ACE2P)
### 1. Model Overview
The champion model of the CVPR'19 Look into Person (LIP) single-person human parsing challenge; see [ACE2P/README](http://gitlab.baidu.com/Paddle/PaddleSeg/tree/master/contrib/ACE2P) for details.
### 2. Model Download
Click this [link](https://paddleseg.bj.bcebos.com/models/ACE2P.tgz) to download the model, then extract it under contrib/ACE2P with `tar -xzf ACE2P.tgz`.
### 3. Data Download
Visit the LIP dataset website at http://47.100.21.47:9999/overview.php or use [Baidu_Drive](https://pan.baidu.com/s/1nvqmZBN#list/path=%2Fsharelink2787269280-523292635003760%2FLIP%2FLIP&parentPath=%2Fsharelink2787269280-523292635003760),
download Testing_images.zip and extract it into the contrib/ACE2P/data folder.
### 4. Run
**NOTE:** Running this model requires at least 2.5 GB of GPU memory.
Prediction on GPU:
```
python -u infer.py --example ACE2P --use_gpu
```
Prediction on CPU:
```
python -u infer.py --example ACE2P
```
## Portrait Segmentation (HumanSeg)
### 1. Model Architecture
DeepLabv3+ with an Xception65 backbone.
### 2. Download Model and Data
Click this [link](https://paddleseg.bj.bcebos.com/models/HumanSeg.tgz) to download, then extract it under the contrib folder.
### 3. Run
Prediction on GPU:
```
python -u infer.py --example HumanSeg --use_gpu
```
Prediction on CPU:
```
python -u infer.py --example HumanSeg
```
### 4. Prediction Examples
Input: ![](imgs/Human.jpg)
Prediction: ![](imgs/HumanSeg.jpg)
## Lane-Line Segmentation (RoadLine)
### 1. Model Architecture
DeepLabv3+ with a MobileNetV2 backbone.
### 2. Download Model and Data
Click this [link](https://paddleseg.bj.bcebos.com/inference_model/RoadLine.tgz) to download, then extract it under the contrib folder.
### 3. Run
Prediction on GPU:
```
python -u infer.py --example RoadLine --use_gpu
```
Prediction on CPU:
```
python -u infer.py --example RoadLine
```
### 4. Prediction Examples
Input: ![](imgs/RoadLine.jpg)
Prediction: ![](imgs/RoadLine.png)
# Notes
1. Detailed configuration such as data and model paths can be found in the config.py files under ACE2P/HumanSeg/RoadLine.
2. The ACE2P model needs about 2 GB of reserved GPU memory; if memory is exceeded, reduce FLAGS_fraction_of_gpu_memory_to_use.
# -*- coding: utf-8 -*-
from utils.util import AttrDict, merge_cfg_from_args, get_arguments
import os
args = get_arguments()
cfg = AttrDict()
# Directory containing the images to predict
cfg.data_dir = os.path.join(args.example, "data", "test_images")
# List file with the names of the images to predict
cfg.data_list_file = os.path.join(args.example, "data", "test.txt")
# Path to load the model from
cfg.model_path = os.path.join(args.example, "model")
# Directory to save prediction results
cfg.vis_dir = os.path.join(args.example, "result")
# Number of prediction classes
cfg.class_num = 2
# Per-channel mean subtracted during image preprocessing
cfg.MEAN = 127.5, 127.5, 127.5
# Per-channel standard deviation divided by during image preprocessing
cfg.STD = 127.5, 127.5, 127.5
# Input size of the images to predict
cfg.input_size = 1536, 576
merge_cfg_from_args(args, cfg)
# -*- coding: utf-8 -*-
import os
import cv2
import numpy as np
from utils.util import get_arguments
from utils.palette import get_palette
from PIL import Image as PILImage
import importlib
args = get_arguments()
config = importlib.import_module(args.example+'.config')
cfg = getattr(config, 'cfg')
# Paddle garbage-collection flag; the ACE2P model is large, so enabling this
# is recommended when GPU memory is insufficient
os.environ['FLAGS_eager_delete_tensor_gb']='0.0'
import paddle.fluid as fluid
# Dataset class for prediction
class TestDataSet():
def __init__(self):
self.data_dir = cfg.data_dir
self.data_list_file = cfg.data_list_file
self.data_list = self.get_data_list()
self.data_num = len(self.data_list)
    def get_data_list(self):
        # Build the list of image paths to predict
data_list = []
data_file_handler = open(self.data_list_file, 'r')
for line in data_file_handler:
img_name = line.strip()
name_prefix = img_name.split('.')[0]
if len(img_name.split('.')) == 1:
img_name = img_name + '.jpg'
img_path = os.path.join(self.data_dir, img_name)
data_list.append(img_path)
return data_list
    def preprocess(self, img):
        # Image preprocessing
if cfg.example == 'ACE2P':
reader = importlib.import_module(args.example+'.reader')
ACE2P_preprocess = getattr(reader, 'preprocess')
img = ACE2P_preprocess(img)
else:
img = cv2.resize(img, cfg.input_size).astype(np.float32)
img -= np.array(cfg.MEAN)
img /= np.array(cfg.STD)
img = img.transpose((2, 0, 1))
img = np.expand_dims(img, axis=0)
return img
    def get_data(self, index):
        # Load one image together with its preprocessed tensor and metadata
img_path = self.data_list[index]
img = cv2.imread(img_path, cv2.IMREAD_COLOR)
if img is None:
            return img, img, img_path, None
img_name = img_path.split(os.sep)[-1]
name_prefix = img_name.replace('.'+img_name.split('.')[-1],'')
img_shape = img.shape[:2]
img_process = self.preprocess(img)
return img, img_process, name_prefix, img_shape
def infer():
if not os.path.exists(cfg.vis_dir):
os.makedirs(cfg.vis_dir)
palette = get_palette(cfg.class_num)
    # Threshold for displaying portrait segmentation results
thresh = 120
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
    # Load the inference model
test_prog, feed_name, fetch_list = fluid.io.load_inference_model(
dirname=cfg.model_path, executor=exe, params_filename='__params__')
    # Load the prediction dataset
test_dataset = TestDataSet()
data_num = test_dataset.data_num
for idx in range(data_num):
        # Fetch the data
ori_img, image, im_name, im_shape = test_dataset.get_data(idx)
if image is None:
print(im_name, 'is None')
continue
        # Prediction
        if cfg.example == 'ACE2P':
            # The ACE2P model uses multi-scale prediction
reader = importlib.import_module(args.example+'.reader')
multi_scale_test = getattr(reader, 'multi_scale_test')
parsing, logits = multi_scale_test(exe, test_prog, feed_name, fetch_list, image, im_shape)
else:
            # HumanSeg and RoadLine use single-scale prediction
result = exe.run(program=test_prog, feed={feed_name[0]: image}, fetch_list=fetch_list)
parsing = np.argmax(result[0][0], axis=0)
            # Labels must be resized with nearest-neighbor interpolation so
            # class ids are never blended
            parsing = cv2.resize(parsing.astype(np.uint8), im_shape[::-1],
                                 interpolation=cv2.INTER_NEAREST)
        # Save the prediction result
result_path = os.path.join(cfg.vis_dir, im_name + '.png')
if cfg.example == 'HumanSeg':
logits = result[0][0][1]*255
logits = cv2.resize(logits, im_shape[::-1])
ret, logits = cv2.threshold(logits, thresh, 0, cv2.THRESH_TOZERO)
logits = 255 *(logits - thresh)/(255 - thresh)
            # Put the segmentation result into the alpha channel
rgba = np.concatenate((ori_img, np.expand_dims(logits, axis=2)), axis=2)
cv2.imwrite(result_path, rgba)
else:
output_im = PILImage.fromarray(np.asarray(parsing, dtype=np.uint8))
output_im.putpalette(palette)
output_im.save(result_path)
        if idx % 100 == 0:
            print('%d processed' % (idx))
    print('all %d images processed' % (data_num))
return 0
if __name__ == "__main__":
infer()
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## Created by: RainbowSecret
## Microsoft Research
## yuyua@microsoft.com
## Copyright (c) 2018
##
## This source code is licensed under the MIT-style license found in the
## LICENSE file in the root directory of this source tree
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import cv2
def get_palette(num_cls):
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
n = num_cls
palette = [0] * (n * 3)
for j in range(0, n):
lab = j
palette[j * 3 + 0] = 0
palette[j * 3 + 1] = 0
palette[j * 3 + 2] = 0
i = 0
while lab:
palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
i += 1
lab >>= 3
return palette
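

if __name__ == '__main__':
    # Usage sketch (illustration only, not part of the original file): colorize
    # a synthetic class-id label map with the palette, mirroring how infer.py
    # saves its predictions.
    from PIL import Image
    label = np.zeros((64, 64), dtype=np.uint8)
    label[:, 32:] = 1                      # two classes: 0 and 1
    vis = Image.fromarray(label)           # single-channel 'L' image
    vis.putpalette(get_palette(256))       # attach the color map
    vis.save('palette_demo.png')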
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import os
def get_arguments():
parser = argparse.ArgumentParser()
parser.add_argument("--use_gpu",
action="store_true",
help="Use gpu or cpu to test.")
parser.add_argument('--example',
type=str,
help='RoadLine, HumanSeg or ACE2P')
return parser.parse_args()
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
def merge_cfg_from_args(args, cfg):
    """Merge config keys and values in args into the global config."""
    for k, v in vars(args).items():
        try:
            # Command-line values arrive as strings; try to evaluate them
            # into Python literals (numbers, tuples, booleans, ...)
            value = eval(v)
        except Exception:
            value = v
        if value is not None:
            cfg[k] = value
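

if __name__ == '__main__':
    # Usage sketch (illustration only, not part of the original file):
    # attribute-style access plus merging parsed command-line arguments
    # into the config. Run e.g. as:
    #   python util.py --example HumanSeg --use_gpu
    cfg = AttrDict()
    cfg.class_num = 2                  # set via attribute ...
    args = get_arguments()
    merge_cfg_from_args(args, cfg)     # string values are eval'ed into literals
    print(cfg.example, cfg.use_gpu, cfg.class_num)   # ... read back as keys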
# PaddleSeg Data Annotation
Collect the images used for training, evaluation and testing in advance, annotate them with the [LabelMe](https://github.com/wkentaro/labelme) annotation tool, and finally use the conversion script we provide to turn the data produced by LabelMe into the format required for model training.
## 1 Installing LabelMe
After collecting the images used for training, evaluation and prediction, annotate them with the [LabelMe](https://github.com/wkentaro/labelme) annotation tool. LabelMe runs on Windows, macOS and Linux, and the annotation format is identical across the three systems. For installation, see the [official installation guide](https://github.com/wkentaro/labelme).
## 2 Using LabelMe
Open a terminal and type `labelme` to bring up the LabelMe UI. You can first preview the pre-annotated images that `LabelMe` ships with before annotating your own dataset.
<div align="center">
<img src="./docs/imgs/annotation/image-1.png" width="600px"/>
<p>Figure 1: The LabelMe UI</p>
</div>
* Preview annotated images
Get the `LabelMe` source code:
```
git clone https://github.com/wkentaro/labelme
```
Type `labelme` in the terminal to open the LabelMe UI, click `OpenDir` and open `<path/to/labelme>/examples/semantic_segmentation/data_annotated`, where `<path/to/labelme>` is the path of the cloned `labelme` repository; what opens is an example of semantic-segmentation ground-truth annotation.
<div align="center">
<img src="./docs/imgs/annotation/image-2.png" width="600px"/>
<p>Figure 2: An annotated image</p>
</div>
* Start annotating
Follow the steps below to annotate your dataset:
(1) Click `OpenDir` to open the directory of images to annotate, click `Create Polygons`, draw a polygon along the target's edge, and enter the target's class when done. If a point is misplaced during annotation, press the undo shortcut to remove it; on Mac the shortcut is `command+Z`.
<div align="center">
<img src="./docs/imgs/annotation/image-3.png" width="600px"/>
<p>Figure 3: Annotating a single target</p>
</div>
(2) Right-click and choose `Edit Polygons` to move a whole polygon or individual points; right-click and choose `Edit Label` to change the class of a target. Perform this step only as needed; skip it if no changes are required.
<div align="center">
<img src="./docs/imgs/annotation/image-4-1.png" width="600px" />
<img src="./docs/imgs/annotation/image-4-2.png" width="600px"/>
<p>Figure 4: Editing an annotation</p>
</div>
(3) Once all targets in an image are annotated, click `Save` to save the JSON file (**keep the JSON file in the same folder as the image**), then click `Next Image` to move on to the next one.
The ground-truth files produced by LabelMe can be compared against the `data_annotated` folder we provide.
<div align="center">
<img src="./docs/imgs/annotation/image-5.png" width="600px"/>
<p>Figure 5: Ground-truth files produced by LabelMe</p>
</div>
## 3 Data Format Conversion
* The directory structure of the dataset we use for semantic segmentation is as follows:
```
my_dataset                  # root directory
|-- JPEGImages              # dataset images
|-- SegmentationClassPNG    # dataset ground truth
|   |-- xxx.png             # pixel-level ground-truth annotation
|   |...
|-- class_names.txt         # class names of the dataset
```
<div align="center">
<img src="./docs/imgs/annotation/image-6.png" width="600px"/>
<p>Figure 6: Directory structure of the dataset required for training</p>
</div>
* The conversion script depends on labelme and pillow; install them first if missing. For labelme installation, see the [official installation guide](https://github.com/wkentaro/labelme). To install pillow:
```shell
pip install pillow
```
* Run the following command to convert the annotated data into a dataset in the format above:
```
python labelme2seg.py <path/to/label_json_file> <path/to/output_dataset>
```
Here `<path/to/label_json_file>` is the directory containing the images and the JSON files produced by LabelMe, and `<path/to/output_dataset>` is the directory of the converted dataset. **Note: do not create `<path/to/output_dataset>` in advance; the script creates it automatically and reports an error if it already exists.**
The converted dataset can be compared against the `my_dataset` folder we provide. The file `class_names.txt` lists the names of all annotated classes in the dataset, including the background class; the `JPEGImages` folder holds the dataset images; the `SegmentationClassPNG` folder holds the pixel-level ground truth of each image, where the background class `_background_` maps to 0 and the other classes increase from 1 up to at most 255.
<div align="center">
<img src="./docs/imgs/annotation/image-7.png" width="600px"/>
<p>Figure 7: Contents of the dataset directories required for training</p>
</div>
{
"shapes": [
{
"label": "bus",
"line_color": null,
"fill_color": null,
"points": [
[
260.936170212766,
22.563829787234056
],
[
193.936170212766,
19.563829787234056
],
[
124.93617021276599,
39.563829787234056
],
[
89.93617021276599,
101.56382978723406
],
[
81.93617021276599,
150.56382978723406
],
[
108.93617021276599,
145.56382978723406
],
[
88.93617021276599,
244.56382978723406
],
[
89.93617021276599,
322.56382978723406
],
[
116.93617021276599,
367.56382978723406
],
[
158.936170212766,
368.56382978723406
],
[
165.936170212766,
337.56382978723406
],
[
347.936170212766,
335.56382978723406
],
[
349.936170212766,
369.56382978723406
],
[
391.936170212766,
373.56382978723406
],
[
403.936170212766,
335.56382978723406
],
[
425.936170212766,
332.56382978723406
],
[
421.936170212766,
281.56382978723406
],
[
428.936170212766,
252.56382978723406
],
[
428.936170212766,
236.56382978723406
],
[
409.936170212766,
220.56382978723406
],
[
409.936170212766,
150.56382978723406
],
[
430.936170212766,
143.56382978723406
],
[
433.936170212766,
112.56382978723406
],
[
431.936170212766,
96.56382978723406
],
[
408.936170212766,
90.56382978723406
],
[
395.936170212766,
50.563829787234056
],
[
338.936170212766,
25.563829787234056
]
]
},
{
"label": "bus",
"line_color": null,
"fill_color": null,
"points": [
[
88.93617021276599,
115.56382978723406
],
[
0.9361702127659877,
96.56382978723406
],
[
0.0,
251.968085106388
],
[
0.9361702127659877,
265.56382978723406
],
[
27.936170212765987,
265.56382978723406
],
[
29.936170212765987,
283.56382978723406
],
[
63.93617021276599,
281.56382978723406
],
[
89.93617021276599,
252.56382978723406
],
[
100.93617021276599,
183.56382978723406
],
[
108.93617021276599,
145.56382978723406
],
[
81.93617021276599,
151.56382978723406
]
]
},
{
"label": "car",
"line_color": null,
"fill_color": null,
"points": [
[
413.936170212766,
168.56382978723406
],
[
497.936170212766,
168.56382978723406
],
[
497.936170212766,
256.56382978723406
],
[
431.936170212766,
258.56382978723406
],
[
430.936170212766,
236.56382978723406
],
[
408.936170212766,
218.56382978723406
]
]
}
],
"lineColor": [
0,
255,
0,
128
],
"fillColor": [
255,
0,
0,
128
],
"imagePath": "2011_000025.jpg",
"imageData": null
}
#!/usr/bin/env python
from __future__ import print_function
import argparse
import glob
import json
import os
import os.path as osp
import sys
import numpy as np
import PIL.Image
import labelme
def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument('input_dir', help='input annotated directory')
parser.add_argument('output_dir', help='output dataset directory')
args = parser.parse_args()
if osp.exists(args.output_dir):
print('Output directory already exists:', args.output_dir)
sys.exit(1)
os.makedirs(args.output_dir)
os.makedirs(osp.join(args.output_dir, 'JPEGImages'))
os.makedirs(osp.join(args.output_dir, 'SegmentationClassPNG'))
print('Creating dataset:', args.output_dir)
    # get all class names for the given dataset
class_names = ['_background_']
for label_file in glob.glob(osp.join(args.input_dir, '*.json')):
with open(label_file) as f:
data = json.load(f)
for shape in data['shapes']:
points = shape['points']
label = shape['label']
cls_name = label
                if cls_name not in class_names:
class_names.append(cls_name)
class_name_to_id = {}
for i, class_name in enumerate(class_names):
class_id = i # starts with 0
class_name_to_id[class_name] = class_id
if class_id == 0:
assert class_name == '_background_'
class_names = tuple(class_names)
print('class_names:', class_names)
out_class_names_file = osp.join(args.output_dir, 'class_names.txt')
with open(out_class_names_file, 'w') as f:
f.writelines('\n'.join(class_names))
print('Saved class_names:', out_class_names_file)
for label_file in glob.glob(osp.join(args.input_dir, '*.json')):
print('Generating dataset from:', label_file)
with open(label_file) as f:
base = osp.splitext(osp.basename(label_file))[0]
out_img_file = osp.join(
args.output_dir, 'JPEGImages', base + '.jpg')
out_png_file = osp.join(
args.output_dir, 'SegmentationClassPNG', base + '.png')
data = json.load(f)
img_file = osp.join(osp.dirname(label_file), data['imagePath'])
img = np.asarray(PIL.Image.open(img_file))
PIL.Image.fromarray(img).save(out_img_file)
lbl = labelme.utils.shapes_to_label(
img_shape=img.shape,
shapes=data['shapes'],
label_name_to_value=class_name_to_id,
)
if osp.splitext(out_png_file)[1] != '.png':
out_png_file += '.png'
        # Labels are assumed to fit in the uint8 range [0, 255]
if lbl.min() >= 0 and lbl.max() <= 255:
lbl_pil = PIL.Image.fromarray(lbl.astype(np.uint8), mode='L')
lbl_pil.save(out_png_file)
else:
raise ValueError(
'[%s] Cannot save the pixel-wise class label as PNG. '
'Please consider using the .npy format.' % out_png_file
)
if __name__ == '__main__':
main()
_background_
bus
car
# PaddleSeg Performance Benchmark
## Training Performance
### Multi-GPU Speedup
### GPU Memory Footprint Comparison
## Inference Performance Comparison
### Windows
### Linux
#### Naive
#### Analysis
# PaddleSeg Configuration Guide
PaddleSeg provides a unified configuration for training/evaluation/visualization/model export.
The configuration consists of the following groups:
* [Basic](./configs/basic_group.md)
* [DATASET](./configs/dataset_group.md)
* [DATALOADER](./configs/dataloader_group.md)
* [FREEZE](./configs/freeze_group.md)
* [MODEL](./configs/model_group.md)
* [SOLVER](./configs/solver_group.md)
* [TRAIN](./configs/train_group.md)
* [TEST](./configs/test_group.md)
`Note`:
See pdseg/utils/config.py for the implementation.
# cfg
The BASIC group holds all general-purpose configuration options.
## `MEAN`
Per-channel mean subtracted during image preprocessing (format: *[R, G, B]*)
### Default
[104.008, 116.669, 122.675]
<br/>
<br/>
## `STD`
Per-channel standard deviation divided by during image preprocessing (format: *[R, G, B]*)
### Default
[1.000, 1.000, 1.000]
<br/>
<br/>
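Conceptually, the two values above are applied per channel during preprocessing. A minimal sketch with the default values (assuming a channel-last image with pixel values in the 0-255 range):
```python
import numpy as np

MEAN = np.array([104.008, 116.669, 122.675])
STD = np.array([1.0, 1.0, 1.0])

img = np.random.randint(0, 256, (513, 513, 3)).astype(np.float32)  # dummy image
img = (img - MEAN) / STD   # subtract the mean, then divide by the std
```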
## `EVAL_CROP_SIZE`
Crop size applied to images during evaluation (format: *[width, height]*)
### Default
None (must be set by the user)
### Notes
* The crop size must not be smaller than the original image; set this field to the largest width and height found in the evaluation data.
<br/>
<br/>
## `TRAIN_CROP_SIZE`
Crop size applied to images during training (format: *[width, height]*)
### Default
None (must be set by the user)
<br/>
<br/>
## `BATCH_SIZE`
Batch size used for training, evaluation and visualization
### Default
1 (adjust to your actual needs)
### Notes
* When running on multiple GPUs, PaddleSeg splits the data evenly across the cards, so each card processes BATCH_SIZE // dev_count samples per step.
* When running on multiple GPUs, make sure BATCH_SIZE is divisible by dev_count.
* A larger BATCH_SIZE speeds up training convergence but costs more GPU memory; choose a suitable value for your situation.
<br/>
<br/>
# cfg.DATALOADER
The DATALOADER group holds all configuration related to data loading.
## `NUM_WORKERS`
Number of concurrent workers used for data loading
### Default
8
### Notes
* This option is only used by `pdseg/train.py` and `pdseg/eval.py`
* With multi-threading it is the number of threads; with multi-processing it is the number of processes. The default is usually fine
<br/>
<br/>
## `BUF_SIZE`
Size of the buffer queue used during data loading
### Default
256
<br/>
<br/>
# cfg.DATASET
The DATASET group holds all configuration related to datasets.
## `DATA_DIR`
Dataset root directory. When PaddleSeg reads a file list, it joins each file name in the list with this directory to obtain the absolute image path.
### Default
None (must be set by the user)
<br/>
<br/>
## `TRAIN_FILE_LIST`
Training file list. `pdseg/train.py` reads the images in this list for training.
The file list consists of multiple lines, each with the format
```
<img_path><sep><label_path>
```
### Default
None (must be set by the user)
<br/>
<br/>
## `VAL_FILE_LIST`
Validation file list. `pdseg/eval.py` reads the images in this list for evaluation.
The file list consists of multiple lines, each with the format
```
<img_path><sep><label_path>
```
### Default
None (must be set by the user)
<br/>
<br/>
## `TEST_FILE_LIST`
Test file list. `pdseg/vis.py` reads the images in this list for prediction and visualization.
The file list consists of multiple lines, each with the format
```
<img_path><sep><label_path>
```
### Default
None (must be set by the user)
<br/>
<br/>
## `VIS_FILE_LIST`
Visualization file list. When `pdseg/train.py` is run with the --use_tbx switch enabled, the images in this list are visualized every time the model is saved.
The file list consists of multiple lines, each with the format
```
<img_path><sep><label_path>
```
### Default
None (must be set by the user)
<br/>
<br/>
## `NUM_CLASSES`
Number of classes, required to build the network
### Default
19 (usually needs to be changed to the number of classes in your own dataset)
### Notes
Labels in the dataset must lie in the range 0 ~ NUM_CLASSES - 1; incorrect labels cause exceptions when computing IoU.
<br/>
<br/>
## `IMAGE_TYPE`
Image type; `rgb`, `rgba` and `gray` are supported
### Default
`rgb`
<br/>
<br/>
## `SEPARATOR`
Separator between the input image and the label image in a file list
### Default
A space ` `
### Example
Given the training file list below, `SEPARATOR` should be set to `|`
```
mydata/train/image1.jpg|mydata/train/image1.label.jpg
mydata/train/image2.jpg|mydata/train/image2.label.jpg
mydata/train/image3.jpg|mydata/train/image3.label.jpg
mydata/train/image4.jpg|mydata/train/image4.label.jpg
...
```
<br/>
<br/>
## `IGNORE_INDEX`
Pixel label value to ignore. Pixels marked with this value in the label do not contribute to the loss or to metrics such as IoU and Acc.
### Default
255
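A sketch of how ignored pixels drop out of the computation (illustration only, not the actual pdseg implementation):
```python
import numpy as np

IGNORE_INDEX = 255
label = np.array([[0, 1, 255],
                  [1, 255, 0]])
pred = np.array([[0, 1, 1],
                 [0, 0, 0]])

mask = (label != IGNORE_INDEX)            # keep only valid pixels
acc = (pred[mask] == label[mask]).mean()  # ignored pixels never count
print(acc)                                # 0.75
```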
# cfg.FREEZE
The FREEZE group holds all configuration related to model export.
## `MODEL_FILENAME`
File name of the exported model
### Default
`__model__`
### Notes
* Required only when exporting a model with the `pdseg/export_model.py` script
<br/>
<br/>
## `PARAMS_FILENAME`
File name of the exported parameters
### Default
`__params__`
### Notes
* Required only when exporting a model with the `pdseg/export_model.py` script
<br/>
<br/>
## `SAVE_DIR`
Root directory for the exported model
### Default
`freeze_model`
### Notes
* Required only when exporting a model with the `pdseg/export_model.py` script
<br/>
<br/>
# cfg.MODEL.DEEPLAB
The MODEL.DEEPLAB subgroup holds all configuration related to the DeepLabv3+ model.
## `BACKBONE`
Backbone network used by DeepLabv3+; `mobilenetv2` and `xception65` are supported
### Default
`xception65`
<br/>
<br/>
## `OUTPUT_STRIDE`
Downsampling rate of DeepLabv3+; 8 and 16 are supported
### Default
16
<br/>
<br/>
## `DEPTH_MULTIPLIER`
Depth multiplier of MobileNet V2; only effective when `BACKBONE` is `mobilenetv2`
### Default
1.0
<br/>
<br/>
## `ENCODER_WITH_ASPP`
Whether the DeepLabv3+ encoder uses ASPP
### Default
True
### Notes
* Setting this to False speeds up the model but lowers accuracy
<br/>
<br/>
## `ENABLE_DECODER`
Whether the DeepLabv3+ model uses the decoder
### Default
True
### Notes
* Setting this to False speeds up the model but lowers accuracy
<br/>
<br/>
## `ASPP_WITH_SEP_CONV`
Whether the ASPP module of DeepLabv3+ uses separable convolutions
### Default
False
<br/>
<br/>
## `DECODER_WITH_SEP_CONV`
Whether the decoder module of DeepLabv3+ uses separable convolutions
### Default
False
<br/>
<br/>
# cfg.MODEL
The MODEL group holds all model-related configuration and contains three subgroups:
* [DeepLabv3p](./model_deeplabv3p_group.md)
* [UNet](./model_unet_group.md)
* [ICNet](./model_icnet_group.md)
## `MODEL_NAME`
Selected model; `deeplabv3p`, `unet` and `icnet` are supported
### Default
None (must be set by the user)
<br/>
<br/>
## `DEFAULT_NORM_TYPE`
Normalization type used by the model; `bn` and `gn` are supported
### Default
`bn`
<br/>
<br/>
## `DEFAULT_GROUP_NUMBER`
Default number of groups; only effective when `DEFAULT_NORM_TYPE` is `gn`
### Default
32
<br/>
<br/>
## `BN_MOMENTUM`
BatchNorm momentum; usually no need to change
### Default
0.99
<br/>
<br/>
## `DEFAULT_EPSILON`
Small constant used in BatchNorm to avoid division by zero; usually no need to change
### Default
1e-5
<br/>
<br/>
## `FP16`
Whether to enable FP16 training
### Default
False
<br/>
<br/>
## `SCALE_LOSS`
Loss scaling factor
### Default
1.0
### Notes
* When FP16 training is enabled, setting this field to 8 is recommended
<br/>
<br/>
## `MULTI_LOSS_WEIGHT`
Weights of the multi-branch losses
### Default
[1.0]
### Notes
* This field is only effective for models with multiple loss branches
* Among the currently supported models, only `icnet` uses multiple (3) loss branches
* When the model is `icnet` and the length of this field is not 3, PaddleSeg forces it to [1.0, 0.4, 0.16]
### Example
Suppose the model has three loss branches whose values are loss1/loss2/loss3, and `MULTI_LOSS_WEIGHT` is [1.0, 0.4, 0.16]; the final loss is computed as
```math
loss = 1.0 * loss1 + 0.4 * loss2 + 0.16 * loss3
```
<br/>
<br/>
# cfg.MODEL.ICNET
The MODEL.ICNET subgroup holds all configuration related to the ICNet model.
## `DEPTH_MULTIPLIER`
Depth multiplier of the ResNet backbone
### Default
0.5
<br/>
<br/>
## `LAYERS`
Number of layers of the ResNet backbone; `18`, `34`, `50`, `101` and `152` are supported
### Default
50
<br/>
<br/>
# cfg.MODEL.UNET
The MODEL.UNET subgroup holds all configuration related to the UNet model.
## `UPSAMPLE_MODE`
Upsampling mode; either `bilinear` or unset
### Default
`bilinear`
### Notes
* When `UPSAMPLE_MODE` is `bilinear`, UNet upsamples with bilinear interpolation; otherwise it uses transposed convolutions
<br/>
<br/>
# cfg.SOLVER
The SOLVER group defines all configuration related to training optimization.
## `LR`
Initial learning rate
### Default
0.1
<br/>
<br/>
## `LR_POLICY`
Learning-rate decay policy; `poly`, `piecewise` and `cosine` are supported
### Default
`poly`
### Example
* With `poly` decay, assuming an initial learning rate of 0.1 and 10000 total training steps, the decay curves for power values of `0.4`, `0.8`, `1`, `1.2` and `1.6` are shown below:
  * power = 1: the decay curve is a straight line
  * power > 1: the decay curve is concave
  * power < 1: the decay curve is convex
<p align="center">
<img src="../imgs/poly_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
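With these conventions, the `poly` policy is commonly computed as
```math
lr = LR \times (1 - \frac{step}{total\_steps})^{POWER}
```
where `POWER` is the exponent documented below.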
* With `piecewise` decay, assuming an initial learning rate of 0.1, GAMMA of 0.9, 100 total epochs and DECAY_EPOCH of [10, 20], the decay curve is shown below:
<p align="center">
<img src="../imgs/piecewise_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
* With `cosine` decay, assuming an initial learning rate of 0.1 and 100 total epochs, the decay curve is shown below:
<p align="center">
<img src="../imgs/cosine_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
<br/>
<br/>
## `POWER`
Exponent of the poly learning-rate decay; only effective when [`LR_POLICY`](#LR_POLICY) is `poly`
### Default
0.9
<br/>
<br/>
## `GAMMA`
Decay factor of the piecewise learning-rate schedule; only effective when [`LR_POLICY`](#LR_POLICY) is `piecewise`
### Default
0.1
<br/>
<br/>
## `DECAY_EPOCH`
Decay milestones of the piecewise learning-rate schedule; only effective when [`LR_POLICY`](#LR_POLICY) is `piecewise`
### Default
[10, 20]
<br/>
<br/>
## `WEIGHT_DECAY`
L2 regularization coefficient
### Default
0.00004
<br/>
<br/>
## `BEGIN_EPOCH`
Starting epoch
### Default
0
<br/>
<br/>
## `NUM_EPOCHS`
Number of training epochs
### Default
30 (adjust to your actual needs)
<br/>
<br/>
## `SNAPSHOT`
Interval, in epochs, at which the model is saved during training
### Default
10 (the model is saved every 10 epochs)
<br/>
<br/>
# cfg.TEST
The TEST group holds all configuration related to model testing.
## `TEST_MODEL`
Path of the model to test
### Default
None (must be set by the user)
### Notes
* Required when evaluating, visualizing or exporting a model with scripts such as `pdseg/export_model.py`, `pdseg/eval.py` and `pdseg/vis.py`
<br/>
<br/>
# cfg.TRAIN
The TRAIN group holds all configuration related to training.
## `MODEL_SAVE_DIR`
Root directory where the model is periodically saved during training
### Default
None (must be set by the user)
<br/>
<br/>
## `PRETRAINED_MODEL`
Path of the pretrained model
### Default
None (unset)
### Notes
* If this field is not set, all model parameters are randomly initialized and training starts from scratch
* If this field is set but the path does not exist, parameter loading fails and the parameters are still randomly initialized
* If this field is set and the path exists, but some parameters are missing or their shapes do not match, those parameters are randomly initialized
<br/>
<br/>
## `RESUME`
Whether to restore parameters from the pretrained model and continue training
### Default
False
### Notes
* When this field is True but `PRETRAINED_MODEL` does not exist, the option has no effect
* When this field is True and `PRETRAINED_MODEL` exists, PaddleSeg resumes from the most recent epoch of the previous run and restores the temporary training state (the already-decayed learning rate, the optimizer's momentum buffers, and so on)
* When this field is True and `PRETRAINED_MODEL` exists, the last component of the `PRETRAINED_MODEL` path must be an integer or the string final; PaddleSeg uses the integer as the starting epoch, and does not continue training when the directory is final. If neither condition holds, PaddleSeg raises an error.
<br/>
<br/>
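A small sketch of the directory-name rule above (a hypothetical helper, not part of pdseg):
```python
import os

def resolve_resume_epoch(pretrained_model):
    last = os.path.basename(os.path.normpath(pretrained_model))
    if last == 'final':
        return None          # training already finished; nothing to resume
    if last.isdigit():
        return int(last)     # resume training from this epoch
    raise ValueError('RESUME expects the last path component to be an '
                     'integer epoch or "final": %s' % pretrained_model)
```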
## `SYNC_BATCH_NORM`
Whether to synchronize the BN mean and variance across GPUs
### Default
False
### Notes
* Enabling this option incurs some performance cost (caused by cross-GPU data synchronization)
* The switch is only effective for multi-GPU training (Windows does not support multi-GPU training, so there is no need to enable it there)
* For multi-GPU training, enabling this switch is recommended; it can improve training quality
# PaddleSeg Data Augmentation
## Basic Augmentation Pipeline
![](imgs/data_aug_flow.png)
## resize
The resize step first resizes the input image according to a chosen rule; PaddleSeg supports the following 3 resize modes:
![](imgs/aug_method.png)
- unpadding
Resize the input image directly to a fixed size before feeding it to the network; the corresponding parameter is AUG.FIX_RESIZE_SIZE. The same operation is applied at prediction time.
- stepscaling
Resize the input image by a ratio that varies randomly within a range at a given step size. The minimum ratio is `AUG.MIN_SCALE_FACTOR`, the maximum ratio is `AUG.MAX_SCALE_FACTOR`, and the step size is `AUG.SCALE_STEP_SIZE`. No processing is applied to the input image at prediction time.
- rangescaling
Resize with a fixed aspect ratio: the long edge of the image is aligned to a target size and the short edge scales proportionally. The minimum size is `AUG.MIN_RESIZE_VALUE` and the maximum size is `AUG.MAX_RESIZE_VALUE`. At prediction time the long edge is aligned to the size given by `AUG.INF_RESIZE_VALUE`, which must lie within the range [`AUG.MIN_RESIZE_VALUE`, `AUG.MAX_RESIZE_VALUE`]. A sketch of rangescaling (see also the code sketch after the figure below):
![](imgs/rangescale.png)
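The three modes can be summarized by the following sketch (an assumed simplification of the actual implementation, reusing the parameter names above):
```python
import random
import cv2

def resize_unpadding(img, fix_resize_size):        # AUG.FIX_RESIZE_SIZE = (w, h)
    return cv2.resize(img, fix_resize_size)

def resize_stepscaling(img, min_factor, max_factor, step):
    # pick a scale in [MIN_SCALE_FACTOR, MAX_SCALE_FACTOR] on a SCALE_STEP_SIZE grid
    n_steps = int((max_factor - min_factor) / step) + 1
    factor = min_factor + step * random.randint(0, n_steps - 1)
    return cv2.resize(img, None, fx=factor, fy=factor)

def resize_rangescaling(img, min_value, max_value):
    # align the long edge to a random size in [MIN_RESIZE_VALUE, MAX_RESIZE_VALUE]
    long_side = random.randint(min_value, max_value)
    scale = float(long_side) / max(img.shape[:2])
    return cv2.resize(img, None, fx=scale, fy=scale)
```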
## rich crop
Rich Crop is a set of data-augmentation strategies that PaddleSeg distilled from real business experience, aimed at segmentation scenarios with little annotated data and messy test data. The pipeline is shown below:
![Rich Crop overview](imgs/data_aug_example.png)
Rich crop applies multiple transformations to the image to keep the training data rich and diverse. PaddleSeg supports the transformations below; when `AUG.RICH_CROP.ENABLE` is False the whole step is skipped.
- blur
Image blurring, toggled by `AUG.RICH_CROP.BLUR`; False disables it. `AUG.RICH_CROP.BLUR_RATIO` controls the probability of applying the blur.
- flip
Vertical image flipping, toggled by `AUG.RICH_CROP.FLIP`; False disables it. `AUG.RICH_CROP.FLIP_RATIO` controls the probability of applying the flip.
- rotation
Image rotation; `AUG.RICH_CROP.MAX_ROTATION` controls the maximum rotation angle. The extra area produced by rotation is filled with the mean value.
- aspect
Aspect-ratio adjustment: a region is cropped from the image and then resized within a certain aspect-ratio range. Controlled by `AUG.RICH_CROP.MIN_AREA_RATIO` and `AUG.RICH_CROP.ASPECT_RATIO`.
- color jitter
Color adjustment, controlled by `AUG.RICH_CROP.BRIGHTNESS_JITTER_RATIO`, `AUG.RICH_CROP.SATURATION_JITTER_RATIO` and `AUG.RICH_CROP.CONTRAST_JITTER_RATIO`.
## random crop
This step crops the input image to a fixed size before it enters the network. The size is controlled by TRAIN_CROP_SIZE, a tuple in the form (width, height). When the input image is smaller than the crop size, it is padded, with the padding value set to the mean. A code sketch follows.
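A sketch of the crop-with-padding behavior described above (an assumed simplification; label images are padded with the ignore value, and any label resizing elsewhere should use nearest-neighbor interpolation so class ids are never blended):
```python
import random
import cv2

def random_crop(img, label, crop_w, crop_h, mean, ignore_index=255):
    h, w = img.shape[:2]
    pad_h, pad_w = max(crop_h - h, 0), max(crop_w - w, 0)
    if pad_h > 0 or pad_w > 0:
        # pad the image with the dataset mean and the label with IGNORE_INDEX
        img = cv2.copyMakeBorder(img, 0, pad_h, 0, pad_w,
                                 cv2.BORDER_CONSTANT, value=mean)
        label = cv2.copyMakeBorder(label, 0, pad_h, 0, pad_w,
                                   cv2.BORDER_CONSTANT, value=ignore_index)
        h, w = img.shape[:2]
    y = random.randint(0, h - crop_h)
    x = random.randint(0, w - crop_w)
    return (img[y:y + crop_h, x:x + crop_w],
            label[y:y + crop_h, x:x + crop_w])
```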
- preprocess
  - mean subtraction
  - division by std
  - horizontal flip
- input image formats
  - source images
    - Image formats: both 3-channel rgb images and 4-channel rgba images can be used for training, but only one format may appear within a single training run.
    - Image conversion: grayscale images are converted to 3-channel images during preprocessing.
    - Image parameters: for 3-channel images set IMAGE_TYPE to rgb, and MEAN and STD must then be lists of length 3; for 4-channel images set IMAGE_TYPE to rgba, and MEAN and STD must be lists of length 4.
  - label images
    - Image format: label images must be single-channel, multi-valued PNG images whose pixel values denote the class each pixel belongs to.
    - Image conversion: any resize or rotation applied to label images in the data layer must use nearest-neighbor interpolation.
    - Ignored pixels: the IGNORE_INDEX parameter selectively ignores all pixels of a given class; it is usually set to 255.
# PaddleSeg Data Preparation
## Data Annotation
LabelMe is the recommended annotation tool; see [PaddleSeg Data Annotation](./annotation/README.md) for details.
## Semantic Segmentation Annotation Conventions
PaddleSeg organizes the training, validation and test sets with a generic file-list scheme. Pixel label classes must increase from 0.
**NOTE:** Please use losslessly compressed PNG images for annotations.
Taking the Cityscapes dataset as an example, we only need to prepare file lists pairing the source images and annotation files of the training, validation and test sets to train with PaddleSeg.
`DATASET.DATA_DIR` is the dataset root directory, and the paths in the file lists are relative to it.
```
./cityscapes/           # dataset root directory
├── gtFine              # annotation directory
│   ├── test
│   │   ├── berlin
│   │   └── ...
│   ├── train
│   │   ├── aachen
│   │   └── ...
│   └── val
│   ├── frankfurt
│   └── ...
└── leftImg8bit         # source image directory
├── test
│   ├── berlin
│   └── ...
├── train
│   ├── aachen
│   └── ...
└── val
├── frankfurt
└── ...
```
The file list is organized as follows
```
source image path [SEP] label image path
```
Here `[SEP]` is the file-path separator, which can be configured via `DATASET.SEPARATOR` and defaults to a space.
If a file name contains **spaces**, use a character that cannot appear in file names, such as '|', as the separator.
**Notes**
* Make sure the separator appears exactly once per line in the file list; if file names contain spaces, use a character such as '|' that cannot appear in file names
* Save the file lists in **UTF-8**; PaddleSeg reads file_list files as UTF-8 by default
As shown below, the left side is the source image path and the right side is the corresponding annotation path.
![cityscapes_filelist](./docs/imgs/file_list.png)
For a complete configuration, see the yaml file and file lists under [`./dataset/cityscapes_demo`](../dataset/cityscapes_demo/).
## Data Validation
The user-defined dataset and yaml configuration are validated from 7 aspects to help users troubleshoot basic data and configuration problems.
The validation script is shown below; the configuration file is specified via `YAML_FILE_PATH`.
```
# YAML_FILE_PATH is the path of the yaml configuration file
python pdseg/check.py --cfg ${YAML_FILE_PATH}
```
### 1 Basic Dataset Validation
* Dataset path checks, including whether `DATASET.TRAIN_FILE_LIST`, `DATASET.VAL_FILE_LIST` and `DATASET.TEST_FILE_LIST` are set correctly.
* List separator checks: whether the separator `DATASET.SEPARATOR` is set correctly for the `TRAIN_FILE_LIST`, `VAL_FILE_LIST` and `TEST_FILE_LIST` list files.
### 2 Annotation Class Validation
Check whether the actual annotated classes match the configuration parameters `DATASET.NUM_CLASSES` and `DATASET.IGNORE_INDEX`.
**NOTE:**
Annotation values must lie within [0 ~ (`DATASET.NUM_CLASSES`-1)] or equal `DATASET.IGNORE_INDEX`.
Annotation classes should preferably start from 0; otherwise accuracy may suffer.
### 3 Annotation Pixel Statistics
Count the number of pixels of each class and display the counts for reference.
### 4 Annotation Format Validation
Check whether annotation images are in PNG format.
**NOTE:** Please use losslessly compressed PNG images for annotations; other formats may hurt accuracy.
### 5 Image Format Validation
Check whether the image type `DATASET.IMAGE_TYPE` is set correctly.
**NOTE:** When the dataset contains 3-channel images, set `DATASET.IMAGE_TYPE` to rgb;
when the dataset consists entirely of 4-channel images, set `DATASET.IMAGE_TYPE` to rgba.
### 6 Image/Annotation Size Consistency Validation
Verify that each image and its annotation have the same size.
### 7 Validation of the `EVAL_CROP_SIZE` Parameter
Verify that `EVAL_CROP_SIZE` is set correctly; there are 3 cases (a code sketch follows the list):
- When `AUG.AUG_METHOD` is unpadding, the width and height of `EVAL_CROP_SIZE` should be no smaller than those of `AUG.FIX_RESIZE_SIZE`.
- When `AUG.AUG_METHOD` is stepscaling, the width and height of `EVAL_CROP_SIZE` should be no smaller than the largest width and height among the source images.
- When `AUG.AUG_METHOD` is rangescaling, the width and height of `EVAL_CROP_SIZE` should be no smaller than the largest width and height among the rescaled images.
A suggested value of `EVAL_CROP_SIZE` is computed and reported.
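The three rules amount to the check sketched below (a simplified assumption of what pdseg/check.py does; `max_img_w`/`max_img_h` stand for the largest width/height found in the evaluation data):
```python
def check_eval_crop_size(cfg, max_img_w, max_img_h):
    eval_w, eval_h = cfg.EVAL_CROP_SIZE
    if cfg.AUG.AUG_METHOD == 'unpadding':
        fix_w, fix_h = cfg.AUG.FIX_RESIZE_SIZE
        assert eval_w >= fix_w and eval_h >= fix_h
    elif cfg.AUG.AUG_METHOD == 'stepscaling':
        assert eval_w >= max_img_w and eval_h >= max_img_h
    elif cfg.AUG.AUG_METHOD == 'rangescaling':
        # the long edge is aligned to INF_RESIZE_VALUE at prediction time
        scale = float(cfg.AUG.INF_RESIZE_VALUE) / max(max_img_w, max_img_h)
        assert eval_w >= max_img_w * scale and eval_h >= max_img_h * scale
```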
# PaddleSeg Inference Library Deployment
# PaddleSeg Installation Guide
## Recommended Development Environment
* Python 2.7 or 3.5+
* CUDA 9.2
* cuDNN v7.1
## 1. Install PaddlePaddle
### pip installation
Because image segmentation models are computationally heavy, using PaddleSeg with the GPU version of paddlepaddle is strongly recommended.
```
pip install paddlepaddle-gpu
```
### Conda installation
PaddlePaddle 1.5, the latest version, supports Conda installation, which reduces the cost of installing dependencies; see [Anaconda](https://www.anaconda.com/distribution/) for Conda usage.
```
conda install -c paddle paddlepaddle-gpu cudatoolkit=9.0
```
See [PaddlePaddle Quick Start](https://www.paddlepaddle.org.cn/start) for more installation options.
## 2. Download the PaddleSeg code
```
git clone https://github.com/PaddlePaddle/PaddleSeg
```
## 3. Install PaddleSeg dependencies
```
pip install -r requirements.txt
```
## 4. Local smoke test
The following command runs the complete pipeline of data download, training, visualization and model export to verify that PaddleSeg and its dependencies are installed correctly.
```
python test/local_test_cityscapes.py
```
# PaddleSeg Pretrained Models
PaddleSeg provides pretrained models on public datasets for all of its built-in segmentation models; initializing training from a pretrained model gives more stable results on custom datasets.
## ImageNet Pretrained Models
All ImageNet pretrained models come from the PaddlePaddle image classification library; for more details, see [here](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification).
| Model | Dataset | Depth multiplier | Config for loading the model | Download | Accuracy Top1/5 Error |
|---|---|---|---|---|---|
| MobileNetV2_1.0x | ImageNet | 1.0x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEFAULT_NORM_TYPE: bn| [MobileNetV2_1.0x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar) | 72.15%/90.65% |
| MobileNetV2_0.25x | ImageNet | 0.25x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.25 <br> MODEL.DEFAULT_NORM_TYPE: bn |[MobileNetV2_0.25x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar) | 53.21%/76.52% |
| MobileNetV2_0.5x | ImageNet | 0.5x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.5 <br> MODEL.DEFAULT_NORM_TYPE: bn | [MobileNetV2_0.5x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar) | 65.03%/85.72% |
| MobileNetV2_1.5x | ImageNet | 1.5x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.5 <br> MODEL.DEFAULT_NORM_TYPE: bn| [MobileNetV2_1.5x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar) | 74.12%/91.67% |
| MobileNetV2_2.0x | ImageNet | 2.0x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 2.0 <br> MODEL.DEFAULT_NORM_TYPE: bn | [MobileNetV2_2.0x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar) | 75.23%/92.58% |
You can choose a MobileNet model with a different `Depth multiplier` according to the accuracy and inference-performance requirements of your scenario.
| Model | Dataset | Config for loading the model | Download | Accuracy Top1/5 Error |
|---|---|---|---|---|
| Xception41 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_41 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception41_pretrained.tgz](https://paddleseg.bj.bcebos.com/models/Xception41_pretrained.tgz) | 79.5%/94.38% |
| Xception65 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception65_pretrained.tgz](https://paddleseg.bj.bcebos.com/models/Xception65_pretrained.tgz) | 80.32%/94.47% |
| Xception71 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_71 <br> MODEL.DEFAULT_NORM_TYPE: bn| coming soon | -- |
## COCO Pretrained Models
The training set is a semantic segmentation dataset converted from the COCO instance segmentation dataset.
| Model | Dataset | Config for loading the model | Download | Output Stride | Multi-scale test | mIoU |
|---|---|---|---|---|---|---|
| DeepLabv3+/MobileNetv2/bn | COCO | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEFAULT_NORM_TYPE: bn|[deeplabv3plus_coco_bn_init.tgz](https://bj.bcebos.com/v1/paddleseg/deeplabv3plus_coco_bn_init.tgz) | 16 | --| -- |
| DeeplabV3+/Xception65/bn | COCO | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn | [xception65_coco.tgz](https://paddleseg.bj.bcebos.com/models/xception65_coco.tgz)| 16 | -- | -- |
| UNet/bn | COCO | MODEL.MODEL_NAME: unet <br> MODEL.DEFAULT_NORM_TYPE: bn | [unet](https://paddleseg.bj.bcebos.com/models/unet_coco_v2.tgz) | 16 | -- | -- |
## Cityscapes Pretrained Models
The training set is the Cityscapes training split; testing uses the Cityscapes validation split.
| Model | Dataset | Config for loading the model | Download | Output Stride | Multi-scale test | mIoU on val |
|---|---|---|---|---|---|---|
| DeepLabv3+/MobileNetv2/bn | Cityscapes |MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEEPLAB.ENCODER_WITH_ASPP: False <br> MODEL.DEEPLAB.ENABLE_DECODER: False <br> MODEL.DEFAULT_NORM_TYPE: bn|[mobilenet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz) |16|false| 0.698|
| DeepLabv3+/Xception65/gn | Cityscapes |MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: gn | [deeplabv3p_xception65_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/deeplabv3p_xception65_cityscapes.tgz) |16|false| 0.7804 |
| DeepLabv3+/Xception65/bn | Cityscapes | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception65_deeplab_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/Xception65_deeplab_cityscapes.tgz) | 16 | false | 0.7715 |
| ICNet/bn | Cityscapes | MODEL.MODEL_NAME: icnet <br> MODEL.DEFAULT_NORM_TYPE: bn | [icnet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/icnet_cityscapes.tgz) |16|false| 0.6854 |
# PaddleSeg Model List
### U-Net
U-Net originated in medical image segmentation. The whole network is a standard encoder-decoder network with few parameters, fast computation and strong applicability, making it a good fit for most common scenarios.
![](./imgs/unet.png)
### DeepLabv3+
DeepLabv3+ is the latest work in the DeepLab series, following DeepLabv1, DeepLabv2 and DeepLabv3.
In this work, the DeepLab authors fuse multi-scale information through an encoder-decoder structure while keeping the original dilated convolutions and ASPP layers,
and use an Xception backbone, improving the robustness and speed of semantic segmentation. It achieved new state-of-the-art performance on the PASCAL VOC 2012 dataset with 89.0 mIoU.
![](./imgs/deeplabv3p.png)
The current PaddleSeg implementation supports switching between two classification backbone networks:
- MobileNetV2:
A fast network for mobile devices; use this backbone if segmentation speed matters most.
- Xception:
The backbone of the original DeepLabv3+ implementation, balancing accuracy and performance; suitable for server-side deployment.
### ICNet
Image Cascade Network (ICNet) targets real-time semantic segmentation. Compared with other computation-reduction approaches, ICNet considers both speed and accuracy. Its main idea is to transform the input into different resolutions, process each resolution with a subnetwork of matching computational complexity, and merge the results: a computationally heavy subnetwork handles the low-resolution input while a light subnetwork handles the high-resolution input, balancing accuracy on high-resolution images against the efficiency of low-complexity networks.
The overall architecture is shown below:
![](./imgs/icnet.png)
## References
- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)
- [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597)
- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
# Special Network Structures in PaddleSeg
### Group Norm
![](./imgs/gn.png)
For an introduction to Group Norm, see the paper: https://arxiv.org/abs/1803.08494
GN divides channels into groups and computes the mean and variance within each group for normalization. Its computation is independent of the batch size, and its accuracy remains stable across batch sizes. It suits parameter-heavy models such as DeepLabv3+, which can reach good training results even with a small batch size.
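As a minimal numpy sketch of the idea (NCHW input split into G groups; illustration only):
```python
import numpy as np

def group_norm(x, groups=32, eps=1e-5):
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)   # statistics per group,
    var = x.var(axis=(2, 3, 4), keepdims=True)     # independent of batch size
    x = (x - mean) / np.sqrt(var + eps)
    return x.reshape(n, c, h, w)
```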
### Synchronized Batch Norm
Synchronized Batch Norm, a cross-GPU batch-normalization strategy, was first proposed in [MegDet: A Large Mini-Batch Object Detector](https://arxiv.org/abs/1711.07240)
and validated on YOLOv3 in [Bag of Freebies for Training Object Detection Neural Networks](https://arxiv.org/pdf/1902.04103.pdf); [PaddleCV/yolov3](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/yolov3) implements this family of strategies and beats the Darknet version by 5.9 mAP on COCO17.
Built on the sync_batch_norm strategy of the PaddlePaddle framework, PaddleSeg supports training segmentation models with large batch sizes across multiple GPUs, yielding higher mIoU.
PaddleSeg provides scripts for four tasks: `training`, `evaluation`, `prediction (visualization)` and `model export`. All four support Flags to enable specific features, plus Options to override the default [training configuration](./config.md). Their usage is very similar:
```shell
# training
python pdseg/train.py ${FLAGS} ${OPTIONS}
# evaluation
python pdseg/eval.py ${FLAGS} ${OPTIONS}
# prediction / visualization
python pdseg/vis.py ${FLAGS} ${OPTIONS}
# model export
python pdseg/export_model.py ${FLAGS} ${OPTIONS}
```
`Note`:
> * FLAGS must come before OPTIONS, otherwise an error will occur, as in the example below:
>
> ```shell
> # The FLAGS "--cfg configs/cityscapes.yaml" must come before the OPTIONS "BATCH_SIZE 1"
> python pdseg/train.py BATCH_SIZE 1 --cfg configs/cityscapes.yaml
> ```
## FLAGS
|Flag|Scripts|Purpose|Default|Remarks|
|-|-|-|-|-|
|--cfg|ALL|Path of the configuration file|None||
|--use_gpu|train/eval/vis|Whether to run on GPU|False||
|--use_mpio|train/eval|Whether to use multi-process IO|False|Enabling this switch consumes some CPU memory but speeds up training.</br> NOTE: not supported on Windows. Keep it off when training on a custom dataset for the first time; otherwise data-reading errors may go unnoticed. </br> |
|--use_tbx|train|Whether to log training data with tensorboardX|False||
|--log_steps|train|Interval, in steps, for printing training logs|10||
|--debug|train|Whether to print debug information|False|Metrics such as IoU require computing a confusion matrix, which slows down training|
|--tbx_log_dir|train|Log directory for tensorboardX|None||
|--do_eval|train|Whether to evaluate whenever the model is saved|False||
|--vis_dir|vis|Directory for saving visualization images|"visual"||
|--also_save_raw_results|vis|Whether to save the raw prediction images|False||
## OPTIONS
See the [training configuration](./config.md) for details.
## Example
The simple example below shows how to finetune with a pretrained model provided by PaddleSeg. We pick the unet model pretrained on the COCO dataset and finetune it on an Oxford-IIIT Pet dataset.
**Note:** For a quick experience we built a small dataset out of Oxford-IIIT Pet; all subsequent steps use this small dataset.
### Preparation
Before starting the tutorial, make sure the following preparation is done:
1. A suitable version of paddlepaddle has been downloaded
2. The PaddleSeg dependencies have been installed
If anything is unclear, see the [installation guide](./docs/installation.md).
### Download the pretrained model
```shell
# download the pretrained model
wget https://bj.bcebos.com/v1/paddleseg/models/unet_coco_init.tgz
# extract into the current directory
tar xvzf unet_coco_init.tgz
```
### Download the Oxford-IIIT dataset
```shell
# download the Oxford-IIIT Pet dataset
wget https://paddleseg.bj.bcebos.com/dataset/mini_pet.zip --no-check-certificate
# extract into the current directory
unzip mini_pet.zip
```
### Finetune
Now start finetuning. For convenience, the configuration file `unet_pet.yaml` for Oxford-IIIT Pet is placed under the configs directory; point `--cfg` at it to set the training configuration.
We train on two GPUs, selected via the `CUDA_VISIBLE_DEVICES` environment variable.
We also set the total BATCH_SIZE to 4. PaddleSeg splits the data evenly across the available GPUs, so make sure BATCH_SIZE is an integer multiple of the number of GPUs (here each card gets a BATCH_SIZE of 2).
```
export CUDA_VISIBLE_DEVICES=0,1
python pdseg/train.py --use_gpu \
--do_eval \
--use_tbx \
--tbx_log_dir train_log \
--cfg configs/unet_pet.yaml \
BATCH_SIZE 4 \
TRAIN.PRETRAINED_MODEL unet_coco_init \
DATASET.DATA_DIR mini_pet \
DATASET.TEST_FILE_LIST mini_pet/file_list/test_list.txt \
DATASET.TRAIN_FILE_LIST mini_pet/file_list/train_list.txt \
DATASET.VAL_FILE_LIST mini_pet/file_list/val_list.txt \
DATASET.VIS_FILE_LIST mini_pet/file_list/val_list.txt \
TRAIN.SYNC_BATCH_NORM True \
SOLVER.LR 5e-5
```
`NOTE`:
> * The example above involves three layers of configuration: the PaddleSeg defaults, unet_pet.yaml and OPTIONS. Their priority is OPTIONS > yaml > defaults; this rule applies equally to train.py/eval.py/vis.py/export_model.py
>
> * If training crashes because of insufficient memory, lower BATCH_SIZE appropriately. If the local GPU has plenty of memory, raise BATCH_SIZE for faster training
>
> * Windows does not support multi-GPU training
### Visualizing the training process
With the do_eval and use_tbx switches enabled, training progress can be followed in TensorBoard
```shell
tensorboard --logdir train_log --host {$HOST_IP} --port {$PORT}
```
NOTE:
1. In the example above, $HOST_IP is the machine's IP address (replace it with the real IP) and $PORT should be a reachable port
2. With large amounts of data the front end loads slowly; please be patient
After starting the TensorBoard command, the training data can be inspected in a browser.
The `SCALAR` tab shows the trends of the training loss, IoU and accuracy:
![](docs/imgs/tensorboard_scalar.JPG)
The `IMAGE` tab shows predictions on the sample images:
![](docs/imgs/tensorboard_image.JPG)
### Model evaluation
After training we can evaluate the model with eval.py. Since the training epoch count is set to 500 with a save interval of 10, there are 50 periodically saved models plus the final model, 51 in total. We evaluate the last saved model:
```shell
python pdseg/eval.py --use_gpu \
--cfg configs/unet_pet.yaml \
DATASET.DATA_DIR mini_pet \
DATASET.VAL_FILE_LIST mini_pet/file_list/val_list.txt \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
### Model prediction / visualization
vis.py inspects the model's predictions; again we use the last saved model:
```shell
python pdseg/vis.py --use_gpu \
--cfg configs/unet_pet.yaml \
DATASET.DATA_DIR mini_pet \
DATASET.TEST_FILE_LIST mini_pet/file_list/test_list.txt \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
`NOTE`
1. Visualization images are saved under visual/visual_results by default; use `--vis_dir` to choose a different output directory
2. During training the images in DATASET.VIS_FILE_LIST are visualized, whereas vis.py uses DATASET.TEST_FILE_LIST
### Model export
Once the model meets expectations, use export_model.py to export a model that can be deployed for server-side prediction:
```shell
python pdseg/export_model.py --cfg configs/unet_pet.yaml \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
The model is exported into the freeze_model directory. The next step is deployment; for the relevant steps, see [Model Deployment](./inference/README.md).
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBLAS support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
include(external-cmake/yaml-cpp.cmake)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_inference_dir")
endif()
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/yaml-cpp/include")
include_directories("${PADDLE_DIR}/")
include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
include_directories("${PADDLE_DIR}/third_party/install/glog/include")
include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
include_directories("${PADDLE_DIR}/third_party/boost")
include_directories("${PADDLE_DIR}/third_party/eigen3")
link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
link_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${OPENCV_DIR}/build/include")
include_directories("${OPENCV_DIR}/opencv/build/include")
link_directories("${OPENCV_DIR}/build/x64/vc14/lib")
else ()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
include_directories("${OPENCV_DIR}/include")
link_directories("${OPENCV_DIR}/lib64")
endif ()
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
# TODO let users define cuda lib path
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
if (NOT WIN32)
if (USE_TENSORRT AND WITH_GPU)
include_directories("${PADDLE_DIR}/third_party/install/tensorrt/include")
link_directories("${PADDLE_DIR}/third_party/install/tensorrt/lib")
endif()
endif(NOT WIN32)
if (NOT WIN32)
set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
if(EXISTS ${NGRAPH_PATH})
include(GNUInstallDirs)
include_directories("${NGRAPH_PATH}/include")
link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_MKL)
include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif ()
set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if(WITH_STATIC_LIB)
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
else()
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf snappystream snappy z xxhash
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
opencv_world346 glog libyaml-cppmt gflags_static libprotobuf snappy zlibstatic xxhash snappystream ${EXTERNAL_LIB})
set(DEPS ${DEPS} libcmt shlwapi)
set(DEPS ${DEPS} ${YAML_CPP_LIBRARY})
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_imgcodecs${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_imgproc${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_core${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_highgui${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libIlmImf${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibjasper${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibpng${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibtiff${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libittnotify${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibjpeg-turbo${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibwebp${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libzlib${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
SET(PADDLESEG_INFERENCE_SRCS preprocessor/preprocessor.cpp preprocessor/preprocessor_seg.cpp predictor/seg_predictor.cpp)
ADD_LIBRARY(libpaddleseg_inference STATIC ${PADDLESEG_INFERENCE_SRCS})
target_link_libraries(libpaddleseg_inference ${DEPS})
add_executable(demo demo.cpp)
ADD_DEPENDENCIES(libpaddleseg_inference yaml-cpp)
ADD_DEPENDENCIES(demo yaml-cpp libpaddleseg_inference)
target_link_libraries(demo ${DEPS} libpaddleseg_inference)
add_custom_command(TARGET demo POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/bin/mkldnn.dll ./mkldnn.dll
)
{
"configurations": [
{
"name": "x64-Release",
"generator": "Ninja",
"configurationType": "RelWithDebInfo",
"inheritEnvironments": [ "msvc_x64_x64" ],
"buildRoot": "${projectDir}\\out\\build\\${name}",
"installRoot": "${projectDir}\\out\\install\\${name}",
"cmakeCommandArgs": "",
"buildCommandArgs": "-v",
"ctestCommandArgs": "",
"variables": [
{
"name": "CUDA_LIB",
"value": "C:/PaddleDeploy/cudalib/v8.0/lib/x64",
"type": "PATH"
},
{
"name": "OPENCV_DIR",
"value": "C:/PaddleDeploy/opencv",
"type": "PATH"
},
{
"name": "PADDLE_DIR",
"value": "C:/PaddleDeploy/fluid_inference",
"type": "PATH"
},
{
"name": "CMAKE_BUILD_TYPE",
"value": "Release",
"type": "STRING"
}
]
}
]
}
# Installing Dependencies
## OpenCV
Official OpenCV release page: https://opencv.org/releases/
### Windows
1. Download the Windows installer: OpenCV-3.4.6
2. Double-click it and install to a location of your choice, e.g. D:\opencv
3. Configure the environment variable
> 1. My Computer -> Properties -> Advanced system settings -> Environment Variables
> 2. Under system variables, find Path (create it if missing) and double-click to edit
> 3. Append the OpenCV path and save, e.g. D:\opencv\build\x64\vc14\bin
### Linux
1. Download the OpenCV-3.4.6 sources and extract them, e.g. to /home/user/opencv-3.4.6
2. cd opencv-3.4.6 && mkdir build && mkdir release
3. Edit modules/videoio/src/cap_v4l.cpp and insert the following right below line 253:
```
#ifndef V4L2_CID_ROTATE
#define V4L2_CID_ROTATE (V4L2_CID_BASE+34)
#endif
#ifndef V4L2_CID_IRIS_ABSOLUTE
#define V4L2_CID_IRIS_ABSOLUTE (V4L2_CID_CAMERA_CLASS_BASE+17)
#endif
```
4. cd build
5. cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/user/opencv-3.4.6/release/ -DOPENCV_FORCE_3RDPARTY_BUILD=OFF
6. make -j10
7. make install
The headers and libraries produced by the build are installed under /home/user/opencv-3.4.6/release
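To sanity-check the build before moving on, a few lines of C++ linked against the freshly installed headers and libraries suffice. This is a minimal sketch; the image path is a placeholder and the exact link flags depend on your build options:
```cpp
// build_check.cpp -- verify the OpenCV build is usable.
// Example compile line (adjust paths and flags to your install prefix;
// a static build may require additional dependency libraries):
//   g++ build_check.cpp -o build_check \
//       -I/home/user/opencv-3.4.6/release/include \
//       -L/home/user/opencv-3.4.6/release/lib64 \
//       -lopencv_imgcodecs -lopencv_core
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    std::cout << "OpenCV version: " << CV_VERSION << std::endl;  // expect 3.4.6
    cv::Mat img = cv::imread("test.jpg");  // placeholder image path
    if (img.empty()) {
        std::cout << "test.jpg not found or unreadable" << std::endl;
        return 1;
    }
    std::cout << "loaded " << img.cols << "x" << img.rows << " image" << std::endl;
    return 0;
}
```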
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# PaddleSeg C++ Inference Deployment
## Overview
This directory provides a cross-platform C++ inference deployment solution for image segmentation models. With a small amount of configuration and code, users can integrate a model into their own service and perform image segmentation (a minimal usage sketch follows the design goals below).
The three main design goals are:
- Cross-platform: building, development, and deployment work on both Windows and Linux
- Support for mainstream segmentation tasks: with minimal configuration, users can load a model and run common prediction tasks, such as human segmentation
- Extensibility: users can implement custom data preprocessing and postprocessing logic for new models
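For a concrete picture of the integration, a minimal caller looks like the following. This is a sketch based on demo.cpp in this directory; the config and image paths are placeholders:
```cpp
// Minimal integration sketch, following demo.cpp: load a yaml config,
// then run prediction over a list of image paths.
#include <string>
#include <vector>
#include <predictor/seg_predictor.h>

int main() {
    PaddleSolution::Predictor predictor;
    // conf/humanseg.yaml is the sample configuration shipped with this demo
    if (predictor.init("./conf/humanseg.yaml") != 0) {
        return -1;
    }
    std::vector<std::string> imgs = {"./images/humanseg/demo.jpg"};
    predictor.predict(imgs);  // result images are written next to the inputs
    return 0;
}
```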
## Key Directories and Files
| File | Purpose |
|-------|----------|
| CMakeList.txt | cmake build configuration |
| external-cmake | cmake for external dependencies (currently only yaml-cpp) |
| demo.cpp | sample C++ code that loads a model and runs prediction |
| predictor | classes that load the model and run prediction |
| preprocess | data preprocessing classes |
| utils | common utility functions |
| images/humanseg | test images for the sample human segmentation model |
| conf/humanseg.yaml | sample configuration for the human segmentation model |
| tools/visualize.py | script for colorized visualization of prediction results |
## Building on Windows
### Prerequisites
* Visual Studio 2015+
* CUDA 8.0 / CUDA 9.0 + CuDNN 7
* CMake 3.0+
We tested with both `Visual Studio 2015` and `Visual Studio 2019 Community`.
**All examples below use `D:\` as the root directory.**
### Step 1: Get the Code
1. `git clone http://gitlab.baidu.com/Paddle/PaddleSeg.git`
2. Copy the `D:\PaddleSeg\inference\` directory into `D:\PaddleDeploy`
The `D:\PaddleDeploy\inference` directory then contains `CMakeList.txt` and the project source files.
### Step 2: Download the PaddlePaddle Inference Library fluid_inference
Download the PaddlePaddle inference library matching your Windows environment and extract it into `D:\PaddleDeploy\`
| CUDA | GPU | Download |
|------|------|--------|
| 8.0 | Yes | [fluid_inference.zip](https://bj.bcebos.com/v1/paddleseg/fluid_inference_win.zip) |
| 9.0 | Yes | [fluid_inference_cuda90.zip](https://paddleseg.bj.bcebos.com/fluid_inference_cuda9_cudnn7.zip) |
The `D:\PaddleDeploy\fluid_inference` directory contains:
```bash
paddle # paddle core directory
third_party # paddle third-party dependencies
version.txt # build version information
```
### Step 3: Install and Configure OpenCV
1. Download OpenCV 3.4.6 for Windows from the official site: [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\PaddleDeploy\opencv`
3. Configure the environment variable as follows
    1. My Computer -> Properties -> Advanced system settings -> Environment Variables
    2. Under system variables, find Path (create it if missing) and double-click to edit
    3. Append the OpenCV path and save, e.g. `D:\PaddleDeploy\opencv\build\x64\vc14\bin`
### Step 4: Build the Code (VS2015 Example)
Adjust the paths in the commands below to the dependency locations on your own system.
* Set up the VS2015 environment by running the following in a cmd window; adjust to your actual VS install path
* For other VS versions, substitute the path of the matching `vcvarsall.bat`
```
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64
```
* Configure the project with CMake
    * PADDLE_DIR: the fluid_inference library directory
    * CUDA_LIB: the CUDA dynamic library directory; adjust to your installation
    * OPENCV_DIR: the OpenCV extraction directory
```
# create the CMake build directory
D:
cd PaddleDeploy\inference
mkdir build
cd build
D:\PaddleDeploy\inference\build> cmake .. -G "Visual Studio 14 2015 Win64" -DWITH_GPU=ON -DPADDLE_DIR=D:\PaddleDeploy\fluid_inference -DCUDA_LIB=D:\PaddleDeploy\cudalib\v8.0\lib\x64 -DOPENCV_DIR=D:\PaddleDeploy\opencv -T host=x64
```
The cmake `-G` option can be changed to match your `VS` version; see the [cmake documentation](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html) for details.
* Build the executable
```
D:\PaddleDeploy\inference\build> msbuild /m /p:Configuration=Release cpp_inference_demo.sln
```
### Step 5: Prediction and Visualization
The executable and related dynamic libraries built in the previous step are placed in the build/Release directory and can be invoked directly from the Windows command line.
You can download and extract a sample model for testing; the sample human segmentation model is available at this [download link](https://paddleseg.bj.bcebos.com/inference_model/deeplabv3p_xception65_humanseg.tgz).
Assuming it is extracted to `D:\PaddleDeploy\models\deeplabv3p_xception65_humanseg`, run the following:
```
cd Release
D:\PaddleDeploy\inference\build\Release> demo.exe --conf=D:\\PaddleDeploy\\inference\\conf\\humanseg.yaml --input_dir=D:\\PaddleDeploy\\inference\\images\\humanseg\\
```
The two command-line parameters are:
| Parameter | Meaning |
|-------|----------|
| conf | path to the model's yaml configuration file |
| input_dir | directory of images to predict |
For a sample **configuration file** with field-by-field annotations, see: [conf/humanseg.yaml](inference/conf/humanseg.yaml)
The demo scans all images under input_dir and generates a prediction result image for each.
For an input file `14.jpg`, the predicted mask is stored in `14_jpg.png`, the visualized score map in `14_jpg_scoremap.png`, and the original-resolution result in `14_jpg_recover.png`; this naming convention is sketched after the example images below.
Input image
![avatar](inference/images/humanseg/demo.jpg)
Output prediction
![avatar](inference/images/humanseg/demo_jpg_recover.png)
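The output naming convention can be summarized in a few lines. The sketch below mirrors the logic of `Predictor::output_mask` in seg_predictor.cpp, shown in full later in this document:
```cpp
#include <string>

// Mirror of the demo's naming scheme: the first '.' in the input file name
// is replaced by '_' and a result-specific suffix is appended.
std::string output_name(const std::string& fname, const std::string& suffix) {
    std::string nname(fname);
    std::string::size_type pos = nname.find('.');
    if (pos != std::string::npos) {
        nname[pos] = '_';  // "14.jpg" -> "14_jpg"
    }
    return nname + suffix;
}
// output_name("14.jpg", ".png")          -> "14_jpg.png"          (mask)
// output_name("14.jpg", "_scoremap.png") -> "14_jpg_scoremap.png" (score map)
// output_name("14.jpg", "_recover.png")  -> "14_jpg_recover.png"  (original size)
```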
DEPLOY:
USE_GPU: 1
MODEL_PATH: "C:\\PaddleDeploy\\models\\deeplabv3p_xception65_humanseg"
MODEL_NAME: "unet"
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
EVAL_CROP_SIZE: (513, 513)
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
IMAGE_TYPE: "rgb"
NUM_CLASSES: 2
CHANNELS : 3
PRE_PROCESSOR: "SegPreProcessor"
PREDICTOR_MODE: "ANALYSIS"
BATCH_SIZE : 3
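The MEAN and STD fields above are per-channel normalization constants applied during preprocessing; with STD of 1.0 this reduces to mean subtraction on raw 0-255 pixel values. Below is a minimal sketch of that step, an assumption inferred from these values rather than a verbatim copy of SegPreProcessor's code:
```cpp
#include <opencv2/opencv.hpp>

// Per-channel normalization as suggested by the MEAN/STD fields above.
// Assumes img has been converted to CV_32FC3 with values in [0, 255].
void normalize_inplace(cv::Mat& img, const float mean[3], const float stddev[3]) {
    for (int r = 0; r < img.rows; ++r) {
        cv::Vec3f* row = img.ptr<cv::Vec3f>(r);
        for (int c = 0; c < img.cols; ++c) {
            for (int ch = 0; ch < 3; ++ch) {
                row[c][ch] = (row[c][ch] - mean[ch]) / stddev[ch];
            }
        }
    }
}
```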
#include <iostream>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <utils/utils.h>
#include <predictor/seg_predictor.h>
DEFINE_string(conf, "", "Configuration File Path");
DEFINE_string(input_dir, "", "Directory of Input Images");
int main(int argc, char** argv) {
// 0. parse args
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_conf.empty() || FLAGS_input_dir.empty()) {
std::cout << "Usage: ./predictor --conf=/config/path/to/your/model --input_dir=/directory/of/your/input/images";
return -1;
}
// 1. create a predictor and init it with conf
PaddleSolution::Predictor predictor;
if (predictor.init(FLAGS_conf) != 0) {
LOG(FATAL) << "Fail to init predictor";
return -1;
}
// 2. get all the images with extension '.jpeg' or '.jpg' at input_dir
auto imgs = PaddleSolution::utils::get_directory_images(FLAGS_input_dir, ".jpeg|.jpg");
// 3. predict
predictor.predict(imgs);
return 0;
}
find_package(Git REQUIRED)
include(ExternalProject)
message("${CMAKE_BUILD_TYPE}")
ExternalProject_Add(
yaml-cpp
GIT_REPOSITORY https://github.com/jbeder/yaml-cpp.git
GIT_TAG e0e01d53c27ffee6c86153fa41e7f5e57d3e5c90
CMAKE_ARGS
-DYAML_CPP_BUILD_TESTS=OFF
-DYAML_CPP_BUILD_TOOLS=OFF
-DYAML_CPP_INSTALL=OFF
-DYAML_CPP_BUILD_CONTRIB=OFF
-DMSVC_SHARED_RT=OFF
-DBUILD_SHARED_LIBS=OFF
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
-DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
-DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp"
# Disable install step
INSTALL_COMMAND ""
LOG_DOWNLOAD ON
)
\ No newline at end of file
#include "seg_predictor.h"
namespace PaddleSolution {
int Predictor::init(const std::string& conf) {
if (!_model_config.load_config(conf)) {
LOG(FATAL) << "Fail to load config file: [" << conf << "]";
return -1;
}
_preprocessor = PaddleSolution::create_processor(conf);
if (_preprocessor == nullptr) {
LOG(FATAL) << "Failed to create_processor";
return -1;
}
_mask.resize(_model_config._resize[0] * _model_config._resize[1]);
_scoremap.resize(_model_config._resize[0] * _model_config._resize[1]);
bool use_gpu = _model_config._use_gpu;
const auto& model_dir = _model_config._model_path;
const auto& model_filename = _model_config._model_file_name;
const auto& params_filename = _model_config._param_file_name;
// load paddle model file
if (_model_config._predictor_mode == "NATIVE") {
paddle::NativeConfig config;
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.prog_file = prog_file;
config.param_file = param_file;
config.fraction_of_gpu_memory = 0;
config.use_gpu = use_gpu;
config.device = 0;
_main_predictor = paddle::CreatePaddlePredictor(config);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
paddle::AnalysisConfig config;
if (use_gpu) {
config.EnableUseGpu(100, 0);
}
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.SetModel(prog_file, param_file);
config.SwitchUseFeedFetchOps(false);
_main_predictor = paddle::CreatePaddlePredictor(config);
}
else {
return -1;
}
return 0;
}
int Predictor::predict(const std::vector<std::string>& imgs) {
if (_model_config._predictor_mode == "NATIVE") {
return native_predict(imgs);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
return analysis_predict(imgs);
}
return -1;
}
int Predictor::output_mask(const std::string& fname, float* p_out, int length, int* height, int* width) {
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
int eval_num_class = _model_config._class_num;
int blob_out_len = length;
int seg_out_len = eval_height * eval_width * eval_num_class;
if (blob_out_len != seg_out_len) {
LOG(ERROR) << " [FATAL] unequal: input vs output [" <<
seg_out_len << "|" << blob_out_len << "]" << std::endl;
return -1;
}
// post process: per-pixel argmax over the class score map
int out_img_len = eval_height * eval_width;
// use assign() rather than clear(): operator[] below requires a valid size
_mask.assign(out_img_len, 0);
_scoremap.assign(out_img_len, 0);
for (int i = 0; i < out_img_len; ++i) {
float max_value = -1;
int label = 0;
for (int j = 0; j < eval_num_class; ++j) {
int index = i + j * out_img_len;
if (index >= blob_out_len) {
break;
}
float value = p_out[index];
if (value > max_value) {
max_value = value;
label = j;
}
}
if (label == 0) max_value = 0;
_mask[i] = uchar(label);
_scoremap[i] = uchar(max_value * 255);
}
cv::Mat mask_png = cv::Mat(eval_height, eval_width, CV_8UC1);
mask_png.data = _mask.data();
std::string nname(fname);
auto pos = fname.find(".");
nname[pos] = '_';
std::string mask_save_name = nname + ".png";
cv::imwrite(mask_save_name, mask_png);
cv::Mat scoremap_png = cv::Mat(eval_height, eval_width, CV_8UC1);
scoremap_png.data = _scoremap.data();
std::string scoremap_save_name = nname + std::string("_scoremap.png");
cv::imwrite(scoremap_save_name, scoremap_png);
std::cout << "save mask of [" << fname << "] done" << std::endl;
if (height && width) {
int recover_height = *height;
int recover_width = *width;
cv::Mat recover_png = cv::Mat(recover_height, recover_width, CV_8UC1);
cv::resize(scoremap_png, recover_png, cv::Size(recover_width, recover_height),
0, 0, cv::INTER_CUBIC);
std::string recover_name = nname + std::string("_recover.png");
cv::imwrite(recover_name, recover_png);
}
return 0;
}
int Predictor::native_predict(const std::vector<std::string>& imgs)
{
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
std::size_t total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& org_width = _org_width;
auto& org_height = _org_height;
auto& imgs_batch = _imgs_batch;
input_buffer.resize(batch_buffer_size);
org_width.resize(default_batch_size);
org_height.resize(default_batch_size);
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.resize(real_buffer_size);
org_height.resize(batch_size);
org_width.resize(batch_size);
for (int i = 0; i < batch_size; ++i) {
org_width[i] = org_height[i] = 0;
}
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
if (!_preprocessor->batch_process(imgs_batch, input_buffer.data(), org_width.data(), org_height.data())) {
return -1;
}
paddle::PaddleTensor im_tensor;
im_tensor.name = "image";
im_tensor.shape = std::vector<int>({ batch_size, channels, eval_height, eval_width });
im_tensor.data.Reset(input_buffer.data(), real_buffer_size * sizeof(float));
im_tensor.dtype = paddle::PaddleDType::FLOAT32;
feeds.push_back(im_tensor);
_outputs.clear();
auto t1 = std::chrono::high_resolution_clock::now();
if (!_main_predictor->Run(feeds, &_outputs, batch_size)) {
LOG(ERROR) << "Failed: NativePredictor->Run() return false at batch: " << u;
continue;
}
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
int out_num = 1;
// print shape of first output tensor for debugging
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < _outputs[0].shape.size(); ++j) {
out_num *= _outputs[0].shape[j];
std::cout << _outputs[0].shape[j] << ",";
}
std::cout << ")" << std::endl;
const size_t nums = _outputs.front().data.length() / sizeof(float);
if (out_num % batch_size != 0 || out_num != nums) {
LOG(ERROR) << "outputs data size mismatch with shape size.";
return -1;
}
for (int i = 0; i < batch_size; ++i) {
float* output_addr = (float*)(_outputs[0].data.data()) + i * (out_num / batch_size);
output_mask(imgs_batch[i], output_addr, out_num / batch_size, &org_height[i], &org_width[i]);
}
}
return 0;
}
int Predictor::analysis_predict(const std::vector<std::string>& imgs) {
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
auto total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& org_width = _org_width;
auto& org_height = _org_height;
auto& imgs_batch = _imgs_batch;
input_buffer.resize(batch_buffer_size);
org_width.resize(default_batch_size);
org_height.resize(default_batch_size);
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.resize(real_buffer_size);
org_height.resize(batch_size);
org_width.resize(batch_size);
for (int i = 0; i < batch_size; ++i) {
org_width[i] = org_height[i] = 0;
}
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
if (!_preprocessor->batch_process(imgs_batch, input_buffer.data(), org_height.data(), org_width.data())) {
return -1;
}
auto im_tensor = _main_predictor->GetInputTensor("image");
im_tensor->Reshape({ batch_size, channels, eval_height, eval_width });
im_tensor->copy_from_cpu(input_buffer.data());
auto t1 = std::chrono::high_resolution_clock::now();
_main_predictor->ZeroCopyRun();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
auto output_names = _main_predictor->GetOutputNames();
auto output_t = _main_predictor->GetOutputTensor(output_names[0]);
std::vector<float> out_data;
std::vector<int> output_shape = output_t->shape();
int out_num = 1;
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < output_shape.size(); ++j) {
out_num *= output_shape[j];
std::cout << output_shape[j] << ",";
}
std::cout << ")" << std::endl;
out_data.resize(out_num);
output_t->copy_to_cpu(out_data.data());
for (int i = 0; i < batch_size; ++i) {
float* out_addr = out_data.data() + (out_num / batch_size) * i;
output_mask(imgs_batch[i], out_addr, out_num / batch_size, &org_height[i], &org_width[i]);
}
}
return 0;
}
}
#pragma once
#include <memory>
#include <string>
#include <vector>
#include <thread>
#include <chrono>
#include <algorithm>
#include <glog/logging.h>
#include <yaml-cpp/yaml.h>
#include <opencv2/opencv.hpp>
#include <paddle_inference_api.h>
#include <utils/seg_conf_parser.h>
#include <utils/utils.h>
#include <preprocessor/preprocessor.h>
namespace PaddleSolution {
class Predictor {
public:
// init a predictor with a yaml config file
int init(const std::string& conf);
// predict api
int predict(const std::vector<std::string>& imgs);
private:
int output_mask(
const std::string& fname,
float* p_out,
int length,
int* height = NULL,
int* width = NULL);
int native_predict(const std::vector<std::string>& imgs);
int analysis_predict(const std::vector<std::string>& imgs);
private:
std::vector<float> _buffer;
std::vector<int> _org_width;
std::vector<int> _org_height;
std::vector<std::string> _imgs_batch;
std::vector<paddle::PaddleTensor> _outputs;
std::vector<uchar> _mask;
std::vector<uchar> _scoremap;
PaddleSolution::PaddleSegModelConfigPaser _model_config;
std::shared_ptr<PaddleSolution::ImagePreProcessor> _preprocessor;
std::unique_ptr<paddle::PaddlePredictor> _main_predictor;
};
}
#include <glog/logging.h>
#include "preprocessor.h"
#include "preprocessor_seg.h"
namespace PaddleSolution {
std::shared_ptr<ImagePreProcessor> create_processor(const std::string& conf_file) {
auto config = std::make_shared<PaddleSolution::PaddleSegModelConfigPaser>();
if (!config->load_config(conf_file)) {
LOG(FATAL) << "fail to laod conf file [" << conf_file << "]";
return nullptr;
}
if (config->_pre_processor == "SegPreProcessor") {
auto p = std::make_shared<SegPreProcessor>();
if (!p->init(config)) {
return nullptr;
}
return p;
}
LOG(FATAL) << "unknown processor_name [" << config->_pre_processor << "]";
return nullptr;
}
}