# PaddleSeg 语义分割库
## 简介
PaddleSeg是基于[PaddlePaddle](https://www.paddlepaddle.org.cn)开发的语义分割库,覆盖了DeepLabv3+, U-Net, ICNet三类主流的分割模型。通过统一的配置,帮助用户更便捷地完成从训练到部署的全流程图像分割应用。
具备高性能、丰富的数据增强、工业级部署、全流程应用的特点。
- **丰富的数据增强**
- 基于百度视觉技术部的实际业务经验,内置10+种数据增强策略,可结合实际业务场景进行定制组合,提升模型泛化能力和鲁棒性。
- **主流模型覆盖**
- 支持U-Net, DeepLabv3+, ICNet三类主流分割网络,结合预训练模型和可调节的骨干网络,满足不同性能和精度的要求。
- **高性能**
- PaddleSeg支持多进程IO、多卡并行、多卡Batch Norm, FP16混合精度等训练加速策略,通过飞桨核心框架的显存优化算法,可以大幅度节约分割模型的显存开销,更快完成分割模型训练。
- **工业级部署**
- 基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)和PaddlePaddle高性能预测引擎, 结合百度开放的AI能力,轻松搭建人像分割和车道线分割服务。
更多模型信息与技术细节请查看[模型介绍](./docs/models.md)和[预训练模型](./docs/model_zoo.md)
## AI Studio教程
### 快速开始
通过 [PaddleSeg人像分割](https://aistudio.baidu.com/aistudio/projectDetail/100798) 教程可快速体验PaddleSeg人像分割模型的效果。
### 入门教程
入门教程以经典的U-Net模型为例, 结合Oxford-IIIT宠物数据集,快速熟悉PaddleSeg使用流程, 详情请点击[U-Net宠物分割](https://aistudio.baidu.com/aistudio/projectDetail/102889)
### 高级教程
高级教程以DeepLabv3+模型为例,结合Cityscapes数据集,快速了解ASPP, Backbone网络切换,多卡Batch Norm同步等策略,详情请点击[DeepLabv3+图像分割](https://aistudio.baidu.com/aistudio/projectDetail/101696)
### 垂类模型
更多特色垂类分割模型如LIP人体部件分割、人像分割、车道线分割模型可以参考[contrib](./contrib/README.md)
## 使用文档
* [安装说明](./docs/installation.md)
* [数据准备](./docs/data_prepare.md)
* [数据增强](./docs/data_aug.md)
* [预训练模型](./docs/model_zoo.md)
* [训练/评估/预测(可视化)](./docs/usage.md)
* [预测库集成](./inference/README.md)
* [服务端部署](./serving/README.md)
* [垂类分割模型](./contrib/README.md)
## FAQ
#### Q:图像分割的数据增强如何配置,unpadding, step scaling, range scaling的原理是什么?
A:数据增强的配置可以参考文档[数据增强](./docs/data_aug.md)
#### Q: 预测时图片过大,导致显存不足如何处理?
A: 降低Batch size,使用Group Norm策略等。
## 更新日志
### 2019.08.25
#### v0.1.0
* PaddleSeg分割库初始版本发布,包含DeepLabv3+, U-Net, ICNet三类分割模型, 其中DeepLabv3+支持Xception, MobileNet两种可调节的骨干网络。
* CVPR'19 LIP人体部件分割比赛冠军预测模型发布[ACE2P](./contrib/ACE2P)
* 预置基于DeepLabv3+网络的[人像分割](./contrib/HumanSeg/)和[车道线分割](./contrib/RoadLine)预测模型发布
## 如何贡献代码
我们非常欢迎您为PaddleSeg贡献代码或者提供使用建议。
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
DATA_DIR: "./dataset/cityscapes/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 19
TEST_FILE_LIST: "dataset/cityscapes/val.list"
TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
VAL_FILE_LIST: "dataset/cityscapes/val.list"
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "gn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
ASPP_WITH_SEP_CONV: True
DECODER_USE_SEP_CONV: True
TEST:
TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/cityscape_v7/"
PRETRAINED_MODEL: u"pretrain/deeplabv3plus_gn_init"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 700
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 8
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./data/COCO2014/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 21
TEST_FILE_LIST: "data/COCO2014/VOC_ImageSets/val.txt"
TRAIN_FILE_LIST: "data/COCO2014/ImageSets/train.txt"
VAL_FILE_LIST: "data/COCO2014/VOC_ImageSets/val.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
TEST:
TEST_MODEL: "snapshots/coco_v1/final"
TRAIN:
MODEL_SAVE_DIR: "snapshots/coco_v1/"
PRETRAINED_MODEL: "pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.007
WEIGHT_DECAY: 0.00004
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "SGD"
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (513, 513) # (width, height), for unpadding
INF_RESIZE_VALUE: 513 # for rangescaling
MAX_RESIZE_VALUE: 513 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: True
ASPECT_RATIO: 0
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 45
MIN_AREA_RATIO: 0
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 24
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: u"./data/humanseg/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: u"data/humanseg/list/val.txt"
TRAIN_FILE_LIST: u"data/humanseg/list/train.txt"
VAL_FILE_LIST: u"data/humanseg/list/val.txt"
IGNORE_INDEX: 255
SEPARATOR: "|"
FREEZE:
MODEL_FILENAME: u"model"
PARAMS_FILENAME: u"params"
SAVE_DIR: u"human_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: u"bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "xception_65"
TEST:
TEST_MODEL: "snapshots/humanseg/aic_v2/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/humanseg/aic_v2/"
PRETRAINED_MODEL: u"pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.1
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "sgd"
EVAL_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (1536, 576) # (width, height), for unpadding
INF_RESIZE_VALUE: 1280 # for rangescaling
MAX_RESIZE_VALUE: 1536 # for rangescaling
MIN_RESIZE_VALUE: 1024 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 1
MEAN: [127.5, 127.5, 127.5]
STD: [127.5, 127.5, 127.5]
DATASET:
DATA_DIR: "./data/line/L4_lane_mask_dataset_app/L4_360_0_2class/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/val.txt"
TRAIN_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/train.txt"
VAL_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/val.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
SAVE_DIR: "line_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "mobilenet"
TEST:
TEST_MODEL: "snapshots/line_v4/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/line_v4/"
PRETRAINED_MODEL: u"pretrain/MobileNetV2_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.01
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 40
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (512, 512) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 1.25 # for stepscaling
MIN_SCALE_FACTOR: 0.75 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./dataset/mini_pet/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 3
TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
IGNORE_INDEX: 255
SEPARATOR: " "
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
MODEL:
MODEL_NAME: "unet"
DEFAULT_NORM_TYPE: "bn"
TEST:
TEST_MODEL: "./test/saved_model/unet_pet/final/"
TRAIN:
MODEL_SAVE_DIR: "./test/saved_models/unet_pet/"
PRETRAINED_MODEL: "./test/models/unet_coco/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
NUM_EPOCHS: 500
LR: 0.005
LR_POLICY: "poly"
OPTIMIZER: "adam"
# Augmented Context Embedding with Edge Perceiving(ACE2P)
- 类别: 图像-语义分割
- 网络: ACE2P
- 数据集: LIP
## 模型概述
人体解析(Human Parsing)是细粒度的语义分割任务,旨在识别像素级别的人类图像的组成部分(例如,身体部位和服装)。ACE2P通过融合底层特征、全局上下文信息和边缘细节,
端到端训练学习人体解析任务。以ACE2P单人人体解析网络为基础的解决方案在CVPR2019第三届LIP挑战赛中赢得了全部三个人体解析任务的第一名
## 模型框架图
![](imgs/net.jpg)
## 模型细节
ACE2P模型包含三个分支:
* 语义分割分支
* 边缘检测分支
* 融合分支
语义分割分支采用resnet101作为backbone,通过Pyramid Scene Parsing Network融合上下文信息以获得更加精确的特征表征
边缘检测分支采用backbone的中间层特征作为输入,预测二值边缘信息
融合分支将语义分割分支以及边缘检测分支的特征进行融合,以获得边缘细节更加准确的分割图像。
分割问题一般采用mIoU作为评价指标,特别引入了IoU loss结合cross-entropy loss以针对性优化这一指标
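下面给出一个基于numpy的soft IoU loss示意实现(仅为说明该损失如何直接优化IoU指标的简化示例,并非ACE2P的官方实现):
```python
import numpy as np

def soft_iou_loss(probs, labels_onehot, eps=1e-6):
    # probs: softmax后的预测概率, 形状为[N, C, H, W]
    # labels_onehot: one-hot形式的标签, 形状为[N, C, H, W]
    inter = np.sum(probs * labels_onehot, axis=(0, 2, 3))
    union = np.sum(probs + labels_onehot - probs * labels_onehot, axis=(0, 2, 3))
    iou = inter / (union + eps)
    # 实际训练中与cross-entropy loss结合使用, 例如 total_loss = ce_loss + iou_loss
    return 1.0 - np.mean(iou)
```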
测试阶段,采用多尺度以及水平翻转的结果进行融合生成最终预测结果
训练阶段,采用余弦退火的学习率策略, 并且在学习初始阶段采用线性warm up
数据预处理方面,保持图片比例并进行随机缩放,随机旋转,水平翻转作为数据增强策略
## LIP指标
该模型在测试尺度为'377,377,473,473,567,567'且水平翻转的情况下,meanIoU为62.63
多模型ensemble后meanIoU为65.18, 居LIP Single-Person Human Parsing Track榜单第一
## 模型预测效果展示
![](imgs/result.jpg)
## 引用
**论文**
*Devil in the Details: Towards Accurate Single and Multiple Human Parsing* https://arxiv.org/abs/1809.05996
**代码**
https://github.com/Microsoft/human-pose-estimation.pytorch
https://github.com/liutinglt/CE2P
# -*- coding: utf-8 -*-
from utils.util import AttrDict, merge_cfg_from_args, get_arguments
import os
args = get_arguments()
cfg = AttrDict()
# 待预测图像所在路径
cfg.data_dir = os.path.join(args.example , "data", "testing_images")
# 待预测图像名称列表
cfg.data_list_file = os.path.join(args.example , "data", "test_id.txt")
# 模型加载路径
cfg.model_path = os.path.join(args.example , "ACE2P")
# 预测结果保存路径
cfg.vis_dir = os.path.join(args.example , "result")
# 预测类别数
cfg.class_num = 20
# 均值, 图像预处理减去的均值
cfg.MEAN = 0.406, 0.456, 0.485
# 标准差,图像预处理除以标准差
cfg.STD = 0.225, 0.224, 0.229
# 多尺度预测时图像尺寸
cfg.multi_scales = (377,377), (473,473), (567,567)
# 多尺度预测时图像是否水平翻转
cfg.flip = True
merge_cfg_from_args(args, cfg)
# -*- coding: utf-8 -*-
import numpy as np
import paddle.fluid as fluid
from ACE2P.config import cfg
import cv2
def get_affine_points(src_shape, dst_shape, rot_grad=0):
# 获取图像和仿射后图像的三组对应点坐标
# 三组点为仿射变换后图像的中心点, [w/2,0], [0,0],及对应原始图像的点
if dst_shape[0] == 0 or dst_shape[1] == 0:
raise Exception('scale shape should not be 0')
# 旋转角度
rotation = rot_grad * np.pi / 180.0
sin_v = np.sin(rotation)
cos_v = np.cos(rotation)
dst_ratio = float(dst_shape[0]) / dst_shape[1]
h, w = src_shape
src_ratio = float(h) / w if w != 0 else 0
affine_shape = [h, h * dst_ratio] if src_ratio > dst_ratio \
else [w / dst_ratio, w]
# 原始图像三组点
points = [[0, 0]] * 3
points[0] = (np.array([w, h]) - 1) * 0.5
points[1] = points[0] + 0.5 * affine_shape[0] * np.array([sin_v, -cos_v])
points[2] = points[1] - 0.5 * affine_shape[1] * np.array([cos_v, sin_v])
# 仿射变换后图三组点
points_trans = [[0, 0]] * 3
points_trans[0] = (np.array(dst_shape[::-1]) - 1) * 0.5
points_trans[1] = [points_trans[0][0], 0]
return points, points_trans
def preprocess(im):
# ACE2P模型数据预处理
im_shape = im.shape[:2]
input_images = []
for i, scale in enumerate(cfg.multi_scales):
# 获取图像和仿射变换后图像的对应点坐标
points, points_trans = get_affine_points(im_shape, scale)
# 根据对应点集获得仿射矩阵
trans = cv2.getAffineTransform(np.float32(points),
np.float32(points_trans))
# 根据仿射矩阵对图像进行仿射
input = cv2.warpAffine(im,
trans,
scale[::-1],
flags=cv2.INTER_LINEAR)
# 减均值,除以标准差,并转换数据格式为NCHW
input = input.astype(np.float32)
input = (input / 255. - np.array(cfg.MEAN)) / np.array(cfg.STD)
input = input.transpose(2, 0, 1).astype(np.float32)
input = np.expand_dims(input, 0)
# 水平翻转
if cfg.flip:
flip_input = input[:, :, :, ::-1]
input_images.append(np.vstack((input, flip_input)))
else:
input_images.append(input)
return input_images
def multi_scale_test(exe, test_prog, feed_name, fetch_list,
input_ims, im_shape):
# 由于部分类别分左右部位, flipped_idx为其水平翻转后对应的标签
flipped_idx = (15, 14, 17, 16, 19, 18)
ms_outputs = []
# 多尺度预测
for idx, scale in enumerate(cfg.multi_scales):
input_im = input_ims[idx]
parsing_output = exe.run(program=test_prog,
feed={feed_name[0]: input_im},
fetch_list=fetch_list)
output = parsing_output[0][0]
if cfg.flip:
# 若水平翻转,对部分类别进行翻转,与原始预测结果取均值
flipped_output = parsing_output[0][1]
flipped_output[14:20, :, :] = flipped_output[flipped_idx, :, :]
flipped_output = flipped_output[:, :, ::-1]
output += flipped_output
output *= 0.5
output = np.transpose(output, [1, 2, 0])
# 仿射变换回图像原始尺寸
points, points_trans = get_affine_points(im_shape, scale)
M = cv2.getAffineTransform(np.float32(points_trans), np.float32(points))
logits_result = cv2.warpAffine(output, M, im_shape[::-1], flags=cv2.INTER_LINEAR)
ms_outputs.append(logits_result)
# 多尺度预测结果求均值,求预测概率最大的类别
ms_fused_parsing_output = np.stack(ms_outputs)
ms_fused_parsing_output = np.mean(ms_fused_parsing_output, axis=0)
parsing = np.argmax(ms_fused_parsing_output, axis=2)
return parsing, ms_fused_parsing_output
# -*- coding: utf-8 -*-
from utils.util import AttrDict, get_arguments, merge_cfg_from_args
import os
args = get_arguments()
cfg = AttrDict()
# 待预测图像所在路径
cfg.data_dir = os.path.join(args.example , "data", "test_images")
# 待预测图像名称列表
cfg.data_list_file = os.path.join(args.example , "data", "test.txt")
# 模型加载路径
cfg.model_path = os.path.join(args.example , "model")
# 预测结果保存路径
cfg.vis_dir = os.path.join(args.example , "result")
# 预测类别数
cfg.class_num = 2
# 均值, 图像预处理减去的均值
cfg.MEAN = 104.008, 116.669, 122.675
# 标准差,图像预处理除以标准差
cfg.STD = 1.0, 1.0, 1.0
# 待预测图像输入尺寸
cfg.input_size = 513, 513
merge_cfg_from_args(args, cfg)
# PaddleSeg 特色垂类分割模型
提供基于PaddlePaddle最新的分割特色模型
## Augmented Context Embedding with Edge Perceiving (ACE2P)
### 1. 模型概述
CVPR 19 Look into Person (LIP) 单人人像分割比赛冠军模型,详见[ACE2P/README](./ACE2P)
### 2. 模型下载
点击[链接](https://paddleseg.bj.bcebos.com/models/ACE2P.tgz)下载模型,在contrib/ACE2P目录下解压:`tar -xzf ACE2P.tgz`
### 3. 数据下载
前往LIP数据集官网: http://47.100.21.47:9999/overview.php 或点击 [Baidu_Drive](https://pan.baidu.com/s/1nvqmZBN#list/path=%2Fsharelink2787269280-523292635003760%2FLIP%2FLIP&parentPath=%2Fsharelink2787269280-523292635003760),
下载Testing_images.zip,解压到contrib/ACE2P/data文件夹下
### 4. 运行
**NOTE:** 运行该模型需要至少2.5G显存
使用GPU预测:
```
python -u infer.py --example ACE2P --use_gpu
```
使用CPU预测:
```
python -u infer.py --example ACE2P
```
## 人像分割 (HumanSeg)
### 1. 模型结构
DeepLabv3+ backbone为Xception65
### 2. 下载模型和数据
点击[链接](https://paddleseg.bj.bcebos.com/models/HumanSeg.tgz),下载解压到contrib文件夹下
### 3. 运行
使用GPU预测:
```
python -u infer.py --example HumanSeg --use_gpu
```
使用CPU预测:
```
python -u infer.py --example HumanSeg
```
### 4. 预测结果示例:
原图:![](imgs/Human.jpg)
预测结果:![](imgs/HumanSeg.jpg)
## 车道线分割 (RoadLine)
### 1. 模型结构
Deeplabv3+ backbone为MobileNetv2
### 2. 下载模型和数据
点击[链接](https://paddleseg.bj.bcebos.com/inference_model/RoadLine.tgz),下载解压在contrib文件夹下
### 3. 运行
使用GPU预测:
```
python -u infer.py --example RoadLine --use_gpu
```
使用CPU预测:
```
python -u infer.py --example RoadLine
```
### 4. 预测结果示例:
原图:![](imgs/RoadLine.jpg)
预测结果:![](imgs/RoadLine.png)
## 备注
1. 数据及模型路径等详细配置见ACE2P/HumanSeg/RoadLine下的config.py文件
2. ACE2P模型需预留2G显存,若显存不足,可调小FLAGS_fraction_of_gpu_memory_to_use
# -*- coding: utf-8 -*-
from utils.util import AttrDict, merge_cfg_from_args, get_arguments
import os
args = get_arguments()
cfg = AttrDict()
# 待预测图像所在路径
cfg.data_dir = os.path.join(args.example , "data", "test_images")
# 待预测图像名称列表
cfg.data_list_file = os.path.join(args.example , "data", "test.txt")
# 模型加载路径
cfg.model_path = os.path.join(args.example , "model")
# 预测结果保存路径
cfg.vis_dir = os.path.join(args.example , "result")
# 预测类别数
cfg.class_num = 2
# 均值, 图像预处理减去的均值
cfg.MEAN = 127.5, 127.5, 127.5
# 标准差,图像预处理除以标准差
cfg.STD = 127.5, 127.5, 127.5
# 待预测图像输入尺寸
cfg.input_size = 1536, 576
merge_cfg_from_args(args, cfg)
# -*- coding: utf-8 -*-
import os
import cv2
import numpy as np
from utils.util import get_arguments
from utils.palette import get_palette
from PIL import Image as PILImage
import importlib
args = get_arguments()
config = importlib.import_module(args.example+'.config')
cfg = getattr(config, 'cfg')
# paddle垃圾回收策略FLAG,ACE2P模型较大,当显存不够时建议开启
os.environ['FLAGS_eager_delete_tensor_gb']='0.0'
import paddle.fluid as fluid
# 预测数据集类
class TestDataSet():
def __init__(self):
self.data_dir = cfg.data_dir
self.data_list_file = cfg.data_list_file
self.data_list = self.get_data_list()
self.data_num = len(self.data_list)
def get_data_list(self):
# 获取预测图像路径列表
data_list = []
data_file_handler = open(self.data_list_file, 'r')
for line in data_file_handler:
img_name = line.strip()
name_prefix = img_name.split('.')[0]
if len(img_name.split('.')) == 1:
img_name = img_name + '.jpg'
img_path = os.path.join(self.data_dir, img_name)
data_list.append(img_path)
return data_list
def preprocess(self, img):
# 图像预处理
if cfg.example == 'ACE2P':
reader = importlib.import_module(args.example+'.reader')
ACE2P_preprocess = getattr(reader, 'preprocess')
img = ACE2P_preprocess(img)
else:
img = cv2.resize(img, cfg.input_size).astype(np.float32)
img -= np.array(cfg.MEAN)
img /= np.array(cfg.STD)
img = img.transpose((2, 0, 1))
img = np.expand_dims(img, axis=0)
return img
def get_data(self, index):
# 获取图像信息
img_path = self.data_list[index]
img = cv2.imread(img_path, cv2.IMREAD_COLOR)
if img is None:
return img, img,img_path, None
img_name = img_path.split(os.sep)[-1]
name_prefix = img_name.replace('.'+img_name.split('.')[-1],'')
img_shape = img.shape[:2]
img_process = self.preprocess(img)
return img, img_process, name_prefix, img_shape
def infer():
if not os.path.exists(cfg.vis_dir):
os.makedirs(cfg.vis_dir)
palette = get_palette(cfg.class_num)
# 人像分割结果显示阈值
thresh = 120
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
# 加载预测模型
test_prog, feed_name, fetch_list = fluid.io.load_inference_model(
dirname=cfg.model_path, executor=exe, params_filename='__params__')
#加载预测数据集
test_dataset = TestDataSet()
data_num = test_dataset.data_num
for idx in range(data_num):
# 数据获取
ori_img, image, im_name, im_shape = test_dataset.get_data(idx)
if image is None:
print(im_name, 'is None')
continue
# 预测
if cfg.example == 'ACE2P':
# ACE2P模型使用多尺度预测
reader = importlib.import_module(args.example+'.reader')
multi_scale_test = getattr(reader, 'multi_scale_test')
parsing, logits = multi_scale_test(exe, test_prog, feed_name, fetch_list, image, im_shape)
else:
# HumanSeg,RoadLine模型单尺度预测
result = exe.run(program=test_prog, feed={feed_name[0]: image}, fetch_list=fetch_list)
parsing = np.argmax(result[0][0], axis=0)
parsing = cv2.resize(parsing.astype(np.uint8), im_shape[::-1])
# 预测结果保存
result_path = os.path.join(cfg.vis_dir, im_name + '.png')
if cfg.example == 'HumanSeg':
logits = result[0][0][1]*255
logits = cv2.resize(logits, im_shape[::-1])
ret, logits = cv2.threshold(logits, thresh, 0, cv2.THRESH_TOZERO)
logits = 255 *(logits - thresh)/(255 - thresh)
# 将分割结果添加到alpha通道
rgba = np.concatenate((ori_img, np.expand_dims(logits, axis=2)), axis=2)
cv2.imwrite(result_path, rgba)
else:
output_im = PILImage.fromarray(np.asarray(parsing, dtype=np.uint8))
output_im.putpalette(palette)
output_im.save(result_path)
if idx % 100 == 0:
print('%d processed' % (idx))
print('%d processed done' % (idx))
return 0
if __name__ == "__main__":
infer()
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## Created by: RainbowSecret
## Microsoft Research
## yuyua@microsoft.com
## Copyright (c) 2018
##
## This source code is licensed under the MIT-style license found in the
## LICENSE file in the root directory of this source tree
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import cv2
def get_palette(num_cls):
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
n = num_cls
palette = [0] * (n * 3)
for j in range(0, n):
lab = j
palette[j * 3 + 0] = 0
palette[j * 3 + 1] = 0
palette[j * 3 + 2] = 0
i = 0
while lab:
palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
i += 1
lab >>= 3
return palette
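
if __name__ == '__main__':
    # 使用示例(示意): get_palette返回长度为3*num_cls的调色板列表,
    # 可直接传给PIL的putpalette, 与contrib下infer.py保存预测结果的方式一致
    import numpy as np
    from PIL import Image as PILImage

    fake_parsing = np.zeros((64, 64), dtype=np.uint8)  # 假设为[H, W]的类别id矩阵
    output_im = PILImage.fromarray(fake_parsing)
    output_im.putpalette(get_palette(20))
    output_im.save('palette_demo.png')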
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import os
def get_arguments():
parser = argparse.ArgumentParser()
parser.add_argument("--use_gpu",
action="store_true",
help="Use gpu or cpu to test.")
parser.add_argument('--example',
type=str,
help='RoadLine, HumanSeg or ACE2P')
return parser.parse_args()
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
def merge_cfg_from_args(args, cfg):
"""Merge config keys, values in args into the global config."""
for k, v in vars(args).items():
d = cfg
try:
value = eval(v)
except:
value = v
if value is not None:
cfg[k] = value
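
if __name__ == '__main__':
    # 使用示例(示意): AttrDict支持属性和下标两种访问方式,
    # merge_cfg_from_args则会把命令行参数合并进该配置对象
    demo_cfg = AttrDict()
    demo_cfg.class_num = 20
    assert demo_cfg['class_num'] == demo_cfg.class_num
    print(demo_cfg)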
# PaddleSeg 数据标注
用户需预先采集好用于训练、评估和测试的图片,并使用数据标注工具[LabelMe](https://github.com/wkentaro/labelme)完成数据标注,最后用我们提供的数据转换脚本将LabelMe产出的数据格式转换为模型训练时所需的数据格式。
## 1 LabelMe的安装
用户在采集完用于训练、评估和预测的图片之后,需使用数据标注工具[LabelMe](https://github.com/wkentaro/labelme)完成数据标注。LabelMe支持在Windows/macOS/Linux三个系统上使用,且三个系统下的标注格式是一致的。具体的安装流程请参见[官方安装指南](https://github.com/wkentaro/labelme)
## 2 LabelMe的使用
打开终端输入`labelme`会出现LabelMe的交互界面,可以先预览`LabelMe`给出的已标注好的图片,再开始标注自定义数据集。
<div align="center">
<img src="./docs/imgs/annotation/image-1.png" width="600px"/>
<p>图1 LabelMe交互界面的示意图</p>
</div>
* 预览已标注图片
获取`LabelMe`的源码:
```
git clone https://github.com/wkentaro/labelme
```
终端输入`labelme`会出现LabelMe的交互界面,点击`OpenDir`打开`<path/to/labelme>/examples/semantic_segmentation/data_annotated`,其中`<path/to/labelme>`为克隆下来的`labelme`的路径,打开后显示的是语义分割的真值标注。
<div align="center">
<img src="./docs/imgs/annotation/image-2.png" width="600px"/>
<p>图2 已标注图片的示意图</p>
</div>
* 开始标注
请按照下述步骤标注数据集:
(1) 点击`OpenDir`打开待标注图片所在目录,点击`Create Polygons`,沿着目标的边缘画多边形,完成后输入目标的类别。在标注过程中,如果某个点画错了,可以按撤销快捷键撤销该点。Mac下的撤销快捷键为`command+Z`
<div align="center">
<img src="./docs/imgs/annotation/image-3.png" width="600px"/>
<p>图3 标注单个目标的示意图</p>
</div>
​ (2) 右击选择`Edit Polygons`可以整体移动多边形的位置,也可以移动某个点的位置;右击选择`Edit Label`可以修改每个目标的类别。请根据自己的需要执行这一步骤,若不需要修改,可跳过。
<div align="center">
<img src="./docs/imgs/annotation/image-4-1.png" width="00px" />
<img src="./docs/imgs/annotation/image-4-2.png" width="600px"/>
<p>图4 修改标注的示意图</p>
</div>
​ (3) 图片中所有目标的标注都完成后,点击`Save`保存json文件,**请将json文件和图片放在同一个文件夹里**,点击`Next Image`标注下一张图片。
LabelMe产出的真值文件可参考我们给出的文件夹`data_annotated`。
<div align="center">
<img src="./docs/imgs/annotation/image-5.png" width="600px"/>
<p>图5 LabelMe产出的真值文件的示意图</p>
</div>
## 3 数据格式转换
* 我们用于完成语义分割的数据集目录结构如下:
```
my_dataset # 根目录
|-- JPEGImages # 数据集图片
|-- SegmentationClassPNG # 数据集真值
| |-- xxx.png # 像素级别的真值信息
| |...
|-- class_names.txt # 数据集的类别名称
```
<div align="center">
<img src="./docs/imgs/annotation/image-6.png" width="600px"/>
<p>图6 训练所需的数据集目录的结构示意图</p>
</div>
* 运行转换脚本需要依赖labelme和pillow,如未安装,请先安装。Labelme的具体安装流程请参见[官方安装指南](https://github.com/wkentaro/labelme)。Pillow的安装:
```shell
pip install pillow
```
* 运行以下代码,将标注后的数据转换成满足以上格式的数据集:
```
python labelme2seg.py <path/to/label_json_file> <path/to/output_dataset>
```
其中,`<path/to/label_json_file>`为图片以及LabelMe产出的json文件所在文件夹的目录,`<path/to/output_dataset>`为转换后的数据集所在文件夹的目录。**需注意的是:`<path/to/output_dataset>`请勿预先创建,脚本运行时会自动创建,若该目录已存在则会报错。**
转换得到的数据集可参考我们给出的文件夹`my_dataset`。其中,文件`class_names.txt`是数据集中所有标注类别的名称,包含背景类;文件夹`JPEGImages`保存的是数据集的图片;文件夹`SegmentationClassPNG`保存的是各图片的像素级别的真值信息,背景类`_background_`对应为0,其它目标类别从1开始递增,至多为255。
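若需要在自己的脚本中读取类别名称与类别id的对应关系,可参考下面的示意代码(假设转换后的数据集目录为`my_dataset`,类别id即类别名称所在的行号):
```python
# 读取labelme2seg.py生成的class_names.txt, 第i行对应类别id i
with open('my_dataset/class_names.txt') as f:
    class_names = [line.strip() for line in f if line.strip()]

name_to_id = {name: i for i, name in enumerate(class_names)}
print(name_to_id)  # 例如 {'_background_': 0, 'bus': 1, 'car': 2}
```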
<div align="center">
<img src="./docs/imgs/annotation/image-7.png" width="600px"/>
<p>图7 训练所需的数据集各目录的内容示意图</p>
</div>
{
"shapes": [
{
"label": "bus",
"line_color": null,
"fill_color": null,
"points": [
[
260.936170212766,
22.563829787234056
],
[
193.936170212766,
19.563829787234056
],
[
124.93617021276599,
39.563829787234056
],
[
89.93617021276599,
101.56382978723406
],
[
81.93617021276599,
150.56382978723406
],
[
108.93617021276599,
145.56382978723406
],
[
88.93617021276599,
244.56382978723406
],
[
89.93617021276599,
322.56382978723406
],
[
116.93617021276599,
367.56382978723406
],
[
158.936170212766,
368.56382978723406
],
[
165.936170212766,
337.56382978723406
],
[
347.936170212766,
335.56382978723406
],
[
349.936170212766,
369.56382978723406
],
[
391.936170212766,
373.56382978723406
],
[
403.936170212766,
335.56382978723406
],
[
425.936170212766,
332.56382978723406
],
[
421.936170212766,
281.56382978723406
],
[
428.936170212766,
252.56382978723406
],
[
428.936170212766,
236.56382978723406
],
[
409.936170212766,
220.56382978723406
],
[
409.936170212766,
150.56382978723406
],
[
430.936170212766,
143.56382978723406
],
[
433.936170212766,
112.56382978723406
],
[
431.936170212766,
96.56382978723406
],
[
408.936170212766,
90.56382978723406
],
[
395.936170212766,
50.563829787234056
],
[
338.936170212766,
25.563829787234056
]
]
},
{
"label": "bus",
"line_color": null,
"fill_color": null,
"points": [
[
88.93617021276599,
115.56382978723406
],
[
0.9361702127659877,
96.56382978723406
],
[
0.0,
251.968085106388
],
[
0.9361702127659877,
265.56382978723406
],
[
27.936170212765987,
265.56382978723406
],
[
29.936170212765987,
283.56382978723406
],
[
63.93617021276599,
281.56382978723406
],
[
89.93617021276599,
252.56382978723406
],
[
100.93617021276599,
183.56382978723406
],
[
108.93617021276599,
145.56382978723406
],
[
81.93617021276599,
151.56382978723406
]
]
},
{
"label": "car",
"line_color": null,
"fill_color": null,
"points": [
[
413.936170212766,
168.56382978723406
],
[
497.936170212766,
168.56382978723406
],
[
497.936170212766,
256.56382978723406
],
[
431.936170212766,
258.56382978723406
],
[
430.936170212766,
236.56382978723406
],
[
408.936170212766,
218.56382978723406
]
]
}
],
"lineColor": [
0,
255,
0,
128
],
"fillColor": [
255,
0,
0,
128
],
"imagePath": "2011_000025.jpg",
"imageData": null
}
#!/usr/bin/env python
from __future__ import print_function
import argparse
import glob
import json
import os
import os.path as osp
import sys
import numpy as np
import PIL.Image
import labelme
def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument('input_dir', help='input annotated directory')
parser.add_argument('output_dir', help='output dataset directory')
args = parser.parse_args()
if osp.exists(args.output_dir):
print('Output directory already exists:', args.output_dir)
sys.exit(1)
os.makedirs(args.output_dir)
os.makedirs(osp.join(args.output_dir, 'JPEGImages'))
os.makedirs(osp.join(args.output_dir, 'SegmentationClassPNG'))
print('Creating dataset:', args.output_dir)
# get the all class names for the given dataset
class_names = ['_background_']
for label_file in glob.glob(osp.join(args.input_dir, '*.json')):
with open(label_file) as f:
data = json.load(f)
for shape in data['shapes']:
points = shape['points']
label = shape['label']
cls_name = label
if cls_name not in class_names:
class_names.append(cls_name)
class_name_to_id = {}
for i, class_name in enumerate(class_names):
class_id = i # starts with 0
class_name_to_id[class_name] = class_id
if class_id == 0:
assert class_name == '_background_'
class_names = tuple(class_names)
print('class_names:', class_names)
out_class_names_file = osp.join(args.output_dir, 'class_names.txt')
with open(out_class_names_file, 'w') as f:
f.writelines('\n'.join(class_names))
print('Saved class_names:', out_class_names_file)
for label_file in glob.glob(osp.join(args.input_dir, '*.json')):
print('Generating dataset from:', label_file)
with open(label_file) as f:
base = osp.splitext(osp.basename(label_file))[0]
out_img_file = osp.join(
args.output_dir, 'JPEGImages', base + '.jpg')
out_png_file = osp.join(
args.output_dir, 'SegmentationClassPNG', base + '.png')
data = json.load(f)
img_file = osp.join(osp.dirname(label_file), data['imagePath'])
img = np.asarray(PIL.Image.open(img_file))
PIL.Image.fromarray(img).save(out_img_file)
lbl = labelme.utils.shapes_to_label(
img_shape=img.shape,
shapes=data['shapes'],
label_name_to_value=class_name_to_id,
)
if osp.splitext(out_png_file)[1] != '.png':
out_png_file += '.png'
# Assume label ranges within [0, 255] for uint8
if lbl.min() >= 0 and lbl.max() <= 255:
lbl_pil = PIL.Image.fromarray(lbl.astype(np.uint8), mode='L')
lbl_pil.save(out_png_file)
else:
raise ValueError(
'[%s] Cannot save the pixel-wise class label as PNG. '
'Please consider using the .npy format.' % out_png_file
)
if __name__ == '__main__':
main()
_background_
bus
car
# PaddleSeg 性能Benchmark
## 训练性能
### 多GPU加速比
### 显存开销对比
## 预测性能对比
### Windows
### Linux
#### Naive
#### Analysis
# PaddleSeg 分割库配置说明
PaddleSeg提供了统一的配置,用于训练/评估/可视化/导出模型
配置包含以下Group:
* [通用](./configs/basic_group.md)
* [DATASET](./configs/dataset_group.md)
* [DATALOADER](./configs/dataloader_group.md)
* [FREEZE](./configs/freeze_group.md)
* [MODEL](./configs/model_group.md)
* [SOLVER](./configs/solver_group.md)
* [TRAIN](./configs/train_group.md)
* [TEST](./configs/test_group.md)
`Note`:
代码详见pdseg/utils/config.py
# cfg
BASIC Group存放所有通用配置
## `MEAN`
图像预处理减去的均值(格式为*[R, G, B]*)
### 默认值
[104.008, 116.669, 122.675]
<br/>
<br/>
## `STD`
图像预处理除以的标准差(格式为*[R, G, B]*)
### 默认值
[1.000, 1.000, 1.000]
<br/>
<br/>
## `EVAL_CROP_SIZE`
评估时对图片裁剪的大小(格式为*[宽, 高]*)
### 默认值
无(需要用户自己填写)
### 注意事项
* 裁剪的大小不能小于原图,请将该字段的值填写为评估数据中最大的宽和高
<br/>
<br/>
## `TRAIN_CROP_SIZE`
训练时对图片裁剪的大小(格式为*[宽, 高]*)
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `BATCH_SIZE`
训练、评估、可视化时所用的BATCH大小
### 默认值
1(需要根据实际需求填写)
### 注意事项
* 当指定了多卡运行时,PaddleSeg会将数据平分到每张卡上运行,因此每张卡单次运行的数量为 BATCH_SIZE // dev_count
* 多卡运行时,请确保BATCH_SIZE可被dev_count整除
* 增大BATCH_SIZE有利于模型训练时的收敛速度,但是会带来显存的开销。请根据实际情况评估后填写合适的值
<br/>
<br/>
# cfg.DATALOADER
DATALOADER Group存放所有与数据加载相关的配置
## `NUM_WORKERS`
数据载入时的并发数量
### 默认值
8
### 注意事项
* 该选项只在`pdseg/train.py``pdseg/eval.py`中使用到
* 当使用多线程时,该字段表示线程数量;使用多进程时,该字段表示进程数量。一般该字段使用默认值即可
<br/>
<br/>
## `BUF_SIZE`
数据载入时的缓存队列大小
### 默认值
256
<br/>
<br/>
# cfg.DATASET
DATASET Group存放所有与数据集相关的配置
## `DATA_DIR`
数据集主目录,PaddleSeg在读取数据文件列表时,会将列表中的文件名与主目录拼接得到图片的绝对路径
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `TRAIN_FILE_LIST`
训练集列表,调用`pdseg/train.py`进行训练时,会读取该列表中的图片进行训练
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `VAL_FILE_LIST`
验证集列表,调用`pdseg/eval.py`进行效果评估时,会读取该列表中的图片进行评估
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `TEST_FILE_LIST`
测试集列表,调用`pdseg/vis.py`进行可视化展示时,会读取该列表中的图片进行预测
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `VIS_FILE_LIST`
可视化列表,调用`pdseg/train.py`进行训练时,如果打开了--use_tbx开关,则在每次模型保存的时候,会读取该列表中的图片进行可视化
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `NUM_CLASSES`
类别数量,构建网络所需
### 默认值
19(但是一般需要用户修改为自己数据集的类别数量)
### 注意事项
数据集中的label标注必须为0 ~ NUM_CLASSES - 1,如果label设置错误,会导致计算IOU时出现异常
<br/>
<br/>
## `IMAGE_TYPE`
图片类型,支持`rgb``rgba``gray`三种格式
### 默认值
`rgb`
<br/>
<br/>
## `SEPARATOR`
文件列表中用于分隔输入图片和标签图片的分隔符
### 默认值
空格符` `
### 例子
假设训练文件列表如下,则 `SEPARATOR` 应该填写 `|`
```
mydata/train/image1.jpg|mydata/train/image1.label.jpg
mydata/train/image2.jpg|mydata/train/image2.label.jpg
mydata/train/image3.jpg|mydata/train/image3.label.jpg
mydata/train/image4.jpg|mydata/train/image4.label.jpg
...
```
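读取文件列表时,大致按照如下方式使用该分隔符切分出原图路径和标注图路径(示意代码,并非PaddleSeg源码,其中train_list.txt为假设的列表文件名):
```python
separator = '|'
with open('train_list.txt') as f:
    for line in f:
        img_path, label_path = line.strip().split(separator)
        # 之后img_path、label_path会再与DATASET.DATA_DIR拼接得到绝对路径
```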
<br/>
<br/>
## `IGNORE_INDEX`
需要忽略的像素标签值,label中所有标记为该值的像素不会参与到loss的计算以及IOU、Acc等指标的计算
### 默认值
255
# cfg.FREEZE
FREEZE Group存放所有与模型导出相关的配置
## `MODEL_FILENAME`
导出模型后所保存的模型文件名
### 默认值
`__model__`
### 注意事项
* 仅在使用`pdseg/export_model.py` 脚本导出模型时,该字段必填
<br/>
<br/>
## `PARAMS_FILENAME`
导出模型后所保存的参数文件名
### 默认值
`__params__`
### 注意事项
* 仅在使用`pdseg/export_model.py` 脚本导出模型时,该字段必填
<br/>
<br/>
## `SAVE_DIR`
保存导出模型的主目录
### 默认值
`freeze_model`
### 注意事项
* 仅在使用`pdseg/export_model.py` 脚本导出模型时,该字段必填
<br/>
<br/>
# cfg.MODEL.DEEPLAB
MODEL.DEEPLAB 子Group存放所有和DeepLabv3+模型相关的配置
## `BACKBONE`
DeepLabV3+所用骨干网络,支持`mobilenetv2` `xception65`两种
### 默认值
`xception65`
<br/>
<br/>
## `OUTPUT_STRIDE`
DeepLabV3+下采样率,支持8/16两种选择
### 默认值
16
<br/>
<br/>
## `DEPTH_MULTIPLIER`
MobileNet V2的depth multiplier值,仅当`BACKBONE`为`mobilenetv2`时生效
### 默认值
1.0
<br/>
<br/>
## `ENCODER_WITH_ASPP`
DeepLabv3+的模型Encoder中是否使用ASPP
### 默认值
True
### 注意事项
* 将该功能置为False可以提升模型计算速度,但是会降低精度
<br/>
<br/>
## `ENABLE_DECODER`
DeepLabv3+模型是否使用Decoder模块
### 默认值
True
### 注意事项
* 将该功能置为False可以提升模型计算速度,但是会降低精度
<br/>
<br/>
## `ASPP_WITH_SEP_CONV`
DeepLabv3+的模型的ASPP模块是否使用可分离卷积
### 默认值
False
<br/>
<br/>
## `DECODER_WITH_SEP_CONV`
DeepLabv3+的模型的Decoder模块是否使用可分离卷积
### 默认值
False
<br/>
<br/>
# cfg.MODEL
MODEL Group存放所有和模型相关的配置,该Group还包含三个子Group
* [DeepLabv3p](./model_deeplabv3p_group.md)
* [UNet](./model_unet_group.md)
* [ICNet](./model_icnet_group.md)
## `MODEL_NAME`
所选模型,支持`deeplabv3p` `unet` `icnet`三种模型
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `DEFAULT_NORM_TYPE`
模型所用norm类型,支持`bn`和`gn`两种
### 默认值
`bn`
<br/>
<br/>
## `DEFAULT_GROUP_NUMBER`
默认GROUP数量,仅在`DEFAULT_NORM_TYPE``gn`时生效
### 默认值
32
<br/>
<br/>
## `BN_MOMENTUM`
BatchNorm动量, 一般无需改动
### 默认值
0.99
<br/>
<br/>
## `DEFAULT_EPSILON`
BatchNorm计算时所用的极小值, 防止分母除0溢出,一般无需改动
### 默认值
1e-5
<br/>
<br/>
## `FP16`
是否开启FP16训练
### 默认值
False
<br/>
<br/>
## `SCALE_LOSS`
对损失进行缩放的系数
### 默认值
1.0
### 注意事项
* 启动fp16训练时,建议设置该字段为8
<br/>
<br/>
## `MULTI_LOSS_WEIGHT`
多路损失的权重
### 默认值
[1.0]
### 注意事项
* 该字段仅在模型存在多路损失的情况下生效
* 目前支持的模型中只有`icnet`使用多路(3路)损失
* 当选择模型为`icnet`且该字段的长度不为3时,PaddleSeg会强制设置该字段为[1.0, 0.4, 0.16]
### 示例
假设模型存在三路损失,计算结果分别为loss1/loss2/loss3,并且`MULTI_LOSS_WEIGHT`的值为[1.0, 0.4, 0.16],则最终损失的计算结果为
```math
loss = 1.0 * loss1 + 0.4 * loss2 + 0.16 * loss3
```
<br/>
<br/>
# cfg.MODEL.ICNET
MODEL.ICNET 子Group存放所有和ICNet模型相关的配置
## `DEPTH_MULTIPLIER`
ResNet backbone的depth multiplier值
### 默认值
0.5
<br/>
<br/>
## `LAYERS`
ResNet backbone的层数,支持`18` `34` `50` `101` `152`等五种
### 默认值
50
<br/>
<br/>
# cfg.MODEL.UNET
MODEL.UNET 子Group存放所有和UNet模型相关的配置
## `UPSAMPLE_MODE`
上采样方式,支持`bilinear`或者不设置
### 默认值
`bilinear`
### 注意事项
* 当`UPSAMPLE_MODE`值为`bilinear`时,UNet上采样方法为双线性插值法,否则使用转置卷积进行上采样,可参考下方示意代码
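两种上采样方式的区别大致如下(基于paddle.fluid的示意代码,并非PaddleSeg源码实现):
```python
import paddle.fluid as fluid

def up_sample(data, out_shape, num_filters, upsample_mode='bilinear'):
    # UPSAMPLE_MODE为bilinear时使用双线性插值, 否则使用转置卷积(反卷积)上采样
    if upsample_mode == 'bilinear':
        return fluid.layers.resize_bilinear(data, out_shape=out_shape)
    return fluid.layers.conv2d_transpose(
        data, num_filters=num_filters, filter_size=2, stride=2)
```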
<br/>
<br/>
# cfg.SOLVER
SOLVER Group定义所有和训练优化相关的配置
## `LR`
初始学习率
### 默认值
0.1
<br/>
<br/>
## `LR_POLICY`
学习率的衰减策略,支持`poly` `piecewise` `cosine`三种策略
### 默认值
`poly`
### 示例
* 当使用`poly`衰减时,假设初始学习率为0.1,训练总步数为10000,则在power分别为`0.4``0.8``1``1.2``1.6`时,衰减曲线如下图:
* power = 1 衰减曲线为直线
* power > 1 衰减曲线内凹
* power < 1 衰减曲线外凸
<p align="center">
<img src="../imgs/poly_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
* 当使用`piecewise`衰减时,假设初始学习率为0.1,GAMMA为0.9,总EPOCH数量为100,DECAY_EPOCH为[10, 20],衰减曲线如下图:
<p align="center">
<img src="../imgs/piecewise_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
* 当使用`cosine`衰减时,假设初始学习率为0.1,总EPOCH数量为100,衰减曲线如下图:
<p align="center">
<img src="../imgs/cosine_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
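以`poly`策略为例,学习率大致按照如下公式衰减(其中step为当前训练步数,total_step为总步数):
```math
lr = LR \times (1 - step / total\_step)^{POWER}
```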
<br/>
<br/>
## `POWER`
学习率Poly下降指数,仅当策略为[`LR_POLICY`](#LR_POLICY)`poly`时有效
### 默认值
0.9
<br/>
<br/>
## `GAMMA`
学习率piecewise下降指数,仅当策略为[`LR_POLICY`](#LR_POLICY)`piecewise`时有效
### 默认值
0.1
<br/>
<br/>
## `DECAY_EPOCH`
学习率piecewise下降间隔,仅当策略为[`LR_POLICY`](#LR_POLICY)`piecewise`时有效
### 默认值
[10, 20]
<br/>
<br/>
## `WEIGHT_DECAY`
L2正则化系数
### 默认值
0.00004
<br/>
<br/>
## `BEGIN_EPOCH`
起始EPOCH值
### 默认值
0
<br/>
<br/>
## `NUM_EPOCHS`
训练EPOCH数
### 默认值
30(需要根据实际需求进行调整)
<br/>
<br/>
## `SNAPSHOT`
训练时,保存模型的间隔(单位为EPOCH)
### 默认值
10(意味着每训练10个EPOCH保存一次模型)
<br/>
<br/>
# cfg.TEST
TEST Group存放所有和测试模型相关的配置
## `TEST_MODEL`
待测试模型的路径
### 默认值
无(需要用户自己填写)
### 注意事项
* 使用`pdseg/export_model.py` `pdseg/eval.py` `pdseg/vis.py`等脚本进行模型的评估、可视化和导出时,该字段必填
<br/>
<br/>
# cfg.TRAIN
TRAIN Group存放所有和训练相关的配置
## `MODEL_SAVE_DIR`
在训练周期内定期保存模型的主目录
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `PRETRAINED_MODEL`
预训练模型路径
### 默认值
无
### 注意事项
* 若未指定该字段,则模型会随机初始化所有的参数,从头开始训练
* 若指定了该字段,但是路径不存在,则参数加载失败,仍然会被随机初始化
* 若指定了该字段,且路径存在,但是部分参数不存在或者shape无法对应,则该部分参数随机初始化
<br/>
<br/>
## `RESUME`
是否从预训练模型中恢复参数并继续训练
### 默认值
False
### 注意事项
* 当该字段被置为True且`PRETRAINED_MODEL`不存在时,该选项不生效
* 当该字段被置为True且`PRETRAINED_MODEL`存在时,PaddleSeg会恢复到上一次训练的最近一个epoch,并且恢复训练过程中的临时变量(如已经衰减过的学习率,Optimizer的动量数据等)
* 当该字段被置为True且`PRETRAINED_MODEL`存在时,`PRETRAINED_MODEL`路径的最后一个目录必须为int数值或者字符串final,PaddleSeg会将int数值作为当前起始EPOCH继续训练,若目录为final,则不会继续训练。若目录不满足上述条件,PaddleSeg会抛出错误。
<br/>
<br/>
## `SYNC_BATCH_NORM`
是否在多卡间同步BN的均值和方差
### 默认值
False
### 注意事项
* 打开该选项会带来一定的性能消耗(多卡间同步数据导致)
* 仅在GPU多卡训练时该开关有效(Windows不支持多卡训练,因此无需打开该开关)
* GPU多卡训练时,建议开启该开关,可以提升模型的训练效果
# PaddleSeg 数据增强
## 数据增强基本流程
![](imgs/data_aug_flow.png)
## resize
resize 步骤是指将输入图像按照某种规则先进行resize,PaddleSeg支持以下3种resize方式:
![](imgs/aug_method.png)
- unpadding
将输入图像直接resize到某一固定大小,再送入网络进行训练,对应参数为AUG.FIX_RESIZE_SIZE。预测时同样操作。
- stepscaling
将输入图像按照某一个比例resize,这个比例以某一个步长在一定范围内随机变动。设定最小比例参数为`AUG.MIN_SCALE_FACTOR`, 最大比例参数`AUG.MAX_SCALE_FACTOR`,步长参数为`AUG.SCALE_STEP_SIZE`。预测时不对输入图像做处理。
- rangescaling
固定长宽比resize,即图像长边对齐到某一个固定大小,短边随同样的比例变化。设定最小大小参数为`AUG.MIN_RESIZE_VALUE`,设定最大大小参数为`AUG.MAX_RESIZE_VALUE`。预测时需要将长边对齐到`AUG.INF_RESIZE_VALUE`所指定的大小,其中`AUG.INF_RESIZE_VALUE``AUG.MIN_RESIZE_VALUE``AUG.MAX_RESIZE_VALUE`范围内。
rangescaling示意图如下:
![](imgs/rangescale.png)
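rangescaling的处理逻辑大致如下(示意代码,并非PaddleSeg源码实现;训练时在最小、最大值之间随机取长边目标值,预测时长边固定对齐到`AUG.INF_RESIZE_VALUE`):
```python
import random
import cv2

def range_scaling(img, min_value=400, max_value=600, inf_value=500, is_train=True):
    # 固定长宽比resize: 长边对齐到目标值, 短边按相同比例缩放
    target_long = random.randint(min_value, max_value) if is_train else inf_value
    h, w = img.shape[:2]
    scale = float(target_long) / max(h, w)
    return cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
```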
## rich crop
Rich Crop是PaddleSeg结合实际业务经验开发的一套数据增强策略,面向标注数据少、测试数据情况繁杂的分割业务场景。流程如下图所示:
![RichCrop示意图](imgs/data_aug_example.png)
rich crop是指对图像进行多种变换,保证在训练过程中数据的丰富多样性,PaddleSeg支持以下几种变换。`AUG.RICH_CROP.ENABLE`为False时会直接跳过该步骤。
- blur
图像加模糊,使用开关`AUG.RICH_CROP.BLUR`,为False时该项功能关闭。`AUG.RICH_CROP.BLUR_RATIO`控制加入模糊的概率。
- flip
图像上下翻转,使用开关`AUG.RICH_CROP.FLIP`,为False时该项功能关闭。`AUG.RICH_CROP.FLIP_RATIO`控制加入翻转的概率。
- rotation
图像旋转,`AUG.RICH_CROP.MAX_ROTATION`控制最大旋转角度。旋转产生的多余的区域的填充值为均值。
- aspect
图像长宽比调整,从图像中crop一定区域出来之后在某一长宽比内进行resize。控制参数`AUG.RICH_CROP.MIN_AREA_RATIO``AUG.RICH_CROP.ASPECT_RATIO`
- color jitter
图像颜色调整,控制参数`AUG.RICH_CROP.BRIGHTNESS_JITTER_RATIO``AUG.RICH_CROP.SATURATION_JITTER_RATIO``AUG.RICH_CROP.CONTRAST_JITTER_RATIO`
## random crop
该步骤主要是通过crop的方式使得输入到网络中的图像为某一固定大小,控制该大小的参数为TRAIN_CROP_SIZE,类型为tuple,格式为(width, height)。当输入图像尺寸小于TRAIN_CROP_SIZE时,会对输入图像进行padding,padding值为均值。
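random crop的处理逻辑大致如下(示意代码,并非PaddleSeg源码实现,假设输入为HWC三通道图像):
```python
import random
import numpy as np

def random_crop(img, crop_size=(769, 769), mean=(104.008, 116.669, 122.675)):
    # crop_size格式为(width, height); 输入图像小于裁剪尺寸时, 先用均值padding到足够大
    crop_w, crop_h = crop_size
    h, w = img.shape[:2]
    if h < crop_h or w < crop_w:
        padded = np.zeros((max(h, crop_h), max(w, crop_w), 3), dtype=np.float32)
        padded[...] = mean
        padded[:h, :w, :] = img
        img = padded
        h, w = img.shape[:2]
    y = random.randint(0, h - crop_h)
    x = random.randint(0, w - crop_w)
    return img[y:y + crop_h, x:x + crop_w]
```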
- preprocess
- 减均值
- 除以标准差
- 水平翻转
- 输入图片格式
- 原图
- 图片格式:支持rgb三通道图片和rgba四通道图片两种类型的图片进行训练,但是同一次训练过程中只能使用一种格式。
- 图片转换:灰度图片经过预处理后会转变成三通道图片
- 图片参数设置:当图片为三通道图片时IMAGE_TYPE设置为rgb, 对应MEAN和STD也必须是一个长度为3的list,当图片为四通道图片时IMAGE_TYPE设置为rgba,对应的MEAN和STD必须是一个长度为4的list。
- 标注图
- 图片格式:标注图片必须为png格式的单通道多值图,元素值代表的是这个元素所属于的类别。
- 图片转换:在datalayer层对label图片进行的任何resize,以及旋转的操作,都必须采用最近邻的插值方式。
- 图片ignore:设置DATASET.IGNORE_INDEX参数可以选择性忽略掉属于某一类别的所有像素点,该参数一般设置为255
# PaddleSeg 数据准备
## 数据标注
数据标注推荐使用LabelMe工具,具体可参考文档[PaddleSeg 数据标注](./annotation/README.md)
## 语义分割标注规范
PaddleSeg采用通用的文件列表方式组织训练集、验证集和测试集。像素标注类别需要从0开始递增。
**NOTE:** 标注图像请使用PNG无损压缩格式的图片
以Cityscapes数据集为例, 我们需要整理出训练集、验证集、测试集对应的原图和标注文件列表用于PaddleSeg训练即可。
其中`DATASET.DATA_DIR`为数据根目录,文件列表的路径以数据集根目录作为相对路径起始点。
```
./cityscapes/ # 数据集根目录
├── gtFine # 标注目录
│   ├── test
│   │   ├── berlin
│   │   └── ...
│   ├── train
│   │   ├── aachen
│   │   └── ...
│   └── val
│   ├── frankfurt
│   └── ...
└── leftImg8bit # 原图目录
├── test
│   ├── berlin
│   └── ...
├── train
│   ├── aachen
│   └── ...
└── val
├── frankfurt
└── ...
```
文件列表组织形式如下
```
原始图片路径 [SEP] 标注图片路径
```
其中`[SEP]`是文件路径分隔符,可以在`DATASET.SEPARATOR`配置项中进行配置,默认为空格。
如果文件名中存在**空格**,推荐使用'|'等文件名不可用字符进行切分。
**注意事项**
* 务必保证分隔符在文件列表中每行只存在一次, 如文件名中存在空格,请使用'|'等文件名不可用字符进行切分
* 文件列表请使用**UTF-8**格式保存, PaddleSeg默认使用UTF-8编码读取file_list文件
如下图所示,左边为原图的图片路径,右边为图片对应的标注路径。
![cityscapes_filelist](./docs/imgs/file_list.png)
完整的配置信息可以参考[`./dataset/cityscapes_demo`](../dataset/cityscapes_demo/)目录下的yaml和文件列表。
## 数据校验
数据校验会从7个方面对用户自定义的数据集和yaml配置进行校验,帮助用户排查基本的数据和配置问题。
数据校验脚本如下,支持通过`YAML_FILE_PATH`来指定配置文件。
```
# YAML_FILE_PATH为yaml配置文件路径
python pdseg/check.py --cfg ${YAML_FILE_PATH}
```
### 1 数据集基本校验
* 数据集路径检查,包括`DATASET.TRAIN_FILE_LIST``DATASET.VAL_FILE_LIST``DATASET.TEST_FILE_LIST`设置是否正确。
* 列表分割符检查,判断在`TRAIN_FILE_LIST``VAL_FILE_LIST``TEST_FILE_LIST`列表文件中的分隔符`DATASET.SEPARATOR`设置是否正确。
### 2 标注类别校验
检查实际标注类别是否和配置参数`DATASET.NUM_CLASSES``DATASET.IGNORE_INDEX`匹配。
**NOTE:**
标注图像类别数值必须在[0~(`DATASET.NUM_CLASSES`-1)]范围内或者为`DATASET.IGNORE_INDEX`
标注类别最好从0开始,否则可能影响精度。
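可以用类似下面的示意代码快速检查单张标注图的取值是否符合要求(假设NUM_CLASSES为19,IGNORE_INDEX为255,label.png为待检查的标注图):
```python
import numpy as np
from PIL import Image

num_classes, ignore_index = 19, 255
label = np.asarray(Image.open('label.png'))  # 单通道PNG标注图
values = np.unique(label)
invalid = [v for v in values if v != ignore_index and v >= num_classes]
print('标注中出现的取值:', values)
print('非法取值:', invalid)
```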
### 3 标注像素统计
统计每种类别像素数量,显示以供参考。
### 4 标注格式校验
检查标注图像是否为PNG格式。
**NOTE:** 标注图像请使用PNG无损压缩格式的图片,若使用其他格式则可能影响精度。
### 5 图像格式校验
检查图片类型`DATASET.IMAGE_TYPE`是否设置正确。
**NOTE:** 当数据集包含三通道图片时`DATASET.IMAGE_TYPE`设置为rgb;
当数据集全部为四通道图片时`DATASET.IMAGE_TYPE`设置为rgba;
### 6 图像与标注图尺寸一致性校验
验证图像尺寸和对应标注图尺寸是否一致。
### 7 模型验证参数`EVAL_CROP_SIZE`校验
验证`EVAL_CROP_SIZE`是否设置正确,共有3种情形:
- 当`AUG.AUG_METHOD`为unpadding时,`EVAL_CROP_SIZE`的宽高应不小于`AUG.FIX_RESIZE_SIZE`的宽高。
- 当`AUG.AUG_METHOD`为stepscaling时,`EVAL_CROP_SIZE`的宽高应不小于原图中最大的宽高。
- 当`AUG.AUG_METHOD`为rangescaling时,`EVAL_CROP_SIZE`的宽高应不小于缩放后图像中最大的宽高。
我们将计算并给出`EVAL_CROP_SIZE`的建议值。
# PaddleSeg预测库部署
# PaddleSeg 安装说明
## 推荐开发环境
* Python2.7 or 3.5+
* CUDA 9.2
* cudnn v7.1
## 1. 安装PaddlePaddle
### pip安装
由于图像分割任务模型计算量大,强烈推荐在GPU版本的paddlepaddle下使用PaddleSeg.
```
pip install paddlepaddle-gpu
```
### Conda安装
PaddlePaddle最新版本1.5支持Conda安装,可以减少相关依赖安装成本,conda相关使用说明可以参考[Anaconda](https://www.anaconda.com/distribution/)
```
conda install -c paddle paddlepaddle-gpu cudatoolkit=9.0
```
更多安装方式详情可以查看 [PaddlePaddle快速开始](https://www.paddlepaddle.org.cn/start)
## 2. 下载PaddleSeg代码
```
git clone https://github.com/PaddlePaddle/PaddleSeg
```
## 3. 安装PaddleSeg依赖
```
pip install -r requirements.txt
```
## 4. 本地流程测试
通过执行以下命令,会完整执行数据下载,训练,可视化,预测模型导出四个环节,用于验证PaddleSeg安装和依赖是否正常。
```
python test/local_test_cityscapes.py
```
# PaddleSeg 预训练模型
PaddleSeg对所有内置的分割模型都提供了公开数据集下的预训练模型,加载预训练模型后再训练,可以在自定义数据集上得到更稳定的效果。
## ImageNet预训练模型
所有ImageNet预训练模型来自于PaddlePaddle图像分类库,想获取更多细节请点击[这里](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification)
| 模型 | 数据集合 | Depth multiplier | 模型加载config设置 | 下载地址 | Top-1/Top-5 Accuracy |
|---|---|---|---|---|---|
| MobileNetV2_1.0x | ImageNet | 1.0x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEFAULT_NORM_TYPE: bn| [MobileNetV2_1.0x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar) | 72.15%/90.65% |
| MobileNetV2_0.25x | ImageNet | 0.25x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.25 <br> MODEL.DEFAULT_NORM_TYPE: bn |[MobileNetV2_0.25x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar) | 53.21%/76.52% |
| MobileNetV2_0.5x | ImageNet | 0.5x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.5 <br> MODEL.DEFAULT_NORM_TYPE: bn | [MobileNetV2_0.5x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar) | 65.03%/85.72% |
| MobileNetV2_1.5x | ImageNet | 1.5x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.5 <br> MODEL.DEFAULT_NORM_TYPE: bn| [MobileNetV2_1.5x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar) | 74.12%/91.67% |
| MobileNetV2_2.0x | ImageNet | 2.0x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 2.0 <br> MODEL.DEFAULT_NORM_TYPE: bn | [MobileNetV2_2.0x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar) | 75.23%/92.58% |
用户可以结合实际场景的精度和预测性能要求,选取不同`Depth multiplier`参数的MobileNet模型。
| 模型 | 数据集合 | 模型加载config设置 | 下载地址 | Top-1/Top-5 Accuracy |
|---|---|---|---|---|
| Xception41 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_41 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception41_pretrained.tgz](https://paddleseg.bj.bcebos.com/models/Xception41_pretrained.tgz) | 79.5%/94.38% |
| Xception65 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception65_pretrained.tgz](https://paddleseg.bj.bcebos.com/models/Xception65_pretrained.tgz) | 80.32%/94.47% |
| Xception71 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_71 <br> MODEL.DEFAULT_NORM_TYPE: bn| coming soon | -- |
## COCO预训练模型
train数据集为coco instance分割数据集合转换成的语义分割数据集合
| 模型 | 数据集合 | 模型加载config设置 | 下载地址 | Output Stride | multi-scale test | mIoU |
|---|---|---|---|---|---|---|
| DeepLabv3+/MobileNetv2/bn | COCO | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEFAULT_NORM_TYPE: bn|[deeplabv3plus_coco_bn_init.tgz](https://bj.bcebos.com/v1/paddleseg/deeplabv3plus_coco_bn_init.tgz) | 16 | --| -- |
| DeeplabV3+/Xception65/bn | COCO | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn | [xception65_coco.tgz](https://paddleseg.bj.bcebos.com/models/xception65_coco.tgz)| 16 | -- | -- |
| UNet/bn | COCO | MODEL.MODEL_NAME: unet <br> MODEL.DEFAULT_NORM_TYPE: bn | [unet](https://paddleseg.bj.bcebos.com/models/unet_coco_v2.tgz) | 16 | -- | -- |
## Cityscapes预训练模型
train数据集合为Cityscapes 训练集合,测试为Cityscapes的验证集合
| 模型 | 数据集合 | 模型加载config设置 | 下载地址 | Output Stride | multi-scale test | mIoU on val |
|---|---|---|---|---|---|---|
| DeepLabv3+/MobileNetv2/bn | Cityscapes |MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEEPLAB.ENCODER_WITH_ASPP: False <br> MODEL.DEEPLAB.ENABLE_DECODER: False <br> MODEL.DEFAULT_NORM_TYPE: bn|[mobilenet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz) |16|false| 0.698|
| DeepLabv3+/Xception65/gn | Cityscapes |MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: gn | [deeplabv3p_xception65_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/deeplabv3p_xception65_cityscapes.tgz) |16|false| 0.7804 |
| DeepLabv3+/Xception65/bn | Cityscapes | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception65_deeplab_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/Xception65_deeplab_cityscapes.tgz) | 16 | false | 0.7715 |
| ICNet/bn | Cityscapes | MODEL.MODEL_NAME: icnet <br> MODEL.DEFAULT_NORM_TYPE: bn | [icnet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/icnet_cityscapes.tgz) |16|false| 0.6854 |
# PaddleSeg 模型列表
### U-Net
U-Net 起源于医疗图像分割,整个网络是标准的encoder-decoder网络,特点是参数少,计算快,应用性强,对于一般场景适应度很高。
![](./imgs/unet.png)
### DeepLabv3+
DeepLabv3+ 是DeepLab系列的最后一篇文章,其前作有 DeepLabv1,DeepLabv2, DeepLabv3,
在最新作中,DeepLab的作者通过encoder-decoder进行多尺度信息的融合,同时保留了原来的空洞卷积和ASPP层,
其骨干网络使用了Xception模型,提高了语义分割的健壮性和运行速率,在 PASCAL VOC 2012 数据集上取得了新的state-of-the-art表现:89.0% mIoU。
![](./imgs/deeplabv3p.png)
在PaddleSeg当前实现中,支持两种分类Backbone网络的切换
- MobileNetv2:
适用于移动设备的快速网络,如果对分割预测速度有较高的要求,请使用这一backbone网络。
- Xception:
DeepLabv3+原始实现的backbone网络,兼顾了精度和性能,适用于服务端部署。
### ICNet
Image Cascade Network(ICNet)主要用于图像实时语义分割。相较于其它压缩计算的方法,ICNet既考虑了速度,也考虑了准确性。ICNet的主要思想是将输入图像变换为不同的分辨率,然后用不同计算复杂度的子网络计算不同分辨率的输入,再将结果合并。ICNet由三个子网络组成,计算复杂度高的网络处理低分辨率输入,计算复杂度低的网络处理高分辨率输入,通过这种方式在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡。
整个网络结构如下:
![](./imgs/icnet.png)
## 参考
- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)
- [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597)
- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
# PaddleSeg特殊网络结构介绍
### Group Norm
![](./imgs/gn.png)
关于Group Norm的介绍可以参考论文:https://arxiv.org/abs/1803.08494
GN 把通道分为组,并计算每一组之内的均值和方差,以进行归一化。GN 的计算与批量大小无关,其精度在各种批量大小下也保持稳定。因此适用于网络参数量较大、单卡batch size较小的模型(比如DeepLabv3+),可以在小batch下取得较好的训练效果。
### Synchronized Batch Norm
Synchronized Batch Norm跨GPU批归一化策略最早在[MegDet: A Large Mini-Batch Object Detector](https://arxiv.org/abs/1711.07240)
论文中提出,在[Bag of Freebies for Training Object Detection Neural Networks](https://arxiv.org/pdf/1902.04103.pdf)论文中以Yolov3验证了这一策略的有效性,[PaddleCV/yolov3](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/yolov3)实现了这一系列策略并比Darknet框架版本在COCO17数据上mAP高5.9.
PaddleSeg基于PaddlePaddle框架的sync_batch_norm策略,可以支持通过多卡实现大batch size的分割模型训练,可以得到更高的mIoU精度。
PaddleSeg提供了 `训练`/`评估`/`预测(可视化)`/`模型导出` 等四个功能的使用脚本。四个脚本都支持通过不同的Flags来开启特定功能,也支持通过Options来修改默认的[训练配置](./config.md)。四者的使用方式非常接近,如下:
```shell
# 训练
python pdseg/train.py ${FLAGS} ${OPTIONS}
# 评估
python pdseg/eval.py ${FLAGS} ${OPTIONS}
# 预测/可视化
python pdseg/vis.py ${FLAGS} ${OPTIONS}
# 模型导出
python pdseg/export_model.py ${FLAGS} ${OPTIONS}
```
`Note`:
> * FLAGS必须位于OPTIONS之前,否则将会遇到报错,例如如下的例子:
>
> ```shell
> # FLAGS "--cfg configs/cityscapes.yaml" 必须在 OPTIONS "BATCH_SIZE 1" 之前
> python pdseg/train.py BATCH_SIZE 1 --cfg configs/cityscapes.yaml
> ```
## FLAGS
|FLAG|支持脚本|用途|默认值|备注|
|-|-|-|-|-|
|--cfg|ALL|配置文件路径|None||
|--use_gpu|train/eval/vis|是否使用GPU进行训练|False||
|--use_mpio|train/eval|是否使用多进程进行IO处理|False|打开该开关会占用一定量的CPU内存,但是可以提高训练速度。</br> NOTE:windows平台下不支持该功能;使用自定义数据进行初次训练时建议不打开该开关,否则数据读取异常将不可见。 </br> |
|--use_tbx|train|是否使用tensorboardX记录训练数据|False||
|--log_steps|train|训练日志的打印周期(单位为step)|10||
|--debug|train|是否打印debug信息|False|IOU等指标涉及到混淆矩阵的计算,会降低训练速度|
|--tbx_log_dir|train|tensorboardX的日志路径|None||
|--do_eval|train|是否在保存模型时进行效果评估|False||
|--vis_dir|vis|保存可视化图片的路径|"visual"||
|--also_save_raw_results|vis|是否保存原始的预测图片|False||
## OPTIONS
详见[训练配置](./config.md)
## 使用示例
下面通过一个简单的示例,说明如何使用PaddleSeg提供的预训练模型进行finetune。我们选择基于COCO数据集预训练的unet模型作为pretrained模型,在一个Oxford-IIIT Pet数据集上进行finetune。
**Note:** 为了快速体验,我们使用Oxford-IIIT Pet做了一个小型数据集,后续数据都使用该小型数据集。
### 准备工作
在开始教程前,请先确认准备工作已经完成:
1. 下载合适版本的paddlepaddle
2. PaddleSeg相关依赖已经安装
如果有不确定的地方,请参考[安装说明](./docs/installation.md)
### 下载预训练模型
```shell
# 下载预训练模型
wget https://bj.bcebos.com/v1/paddleseg/models/unet_coco_init.tgz
# 解压缩到当前路径下
tar xvzf unet_coco_init.tgz
```
### 下载Oxford-IIIT数据集
```shell
# 下载Oxford-IIIT Pet数据集
wget https://paddleseg.bj.bcebos.com/dataset/mini_pet.zip --no-check-certificate
# 解压缩到当前路径下
unzip mini_pet.zip
```
### Finetune
接着开始Finetune,为了方便体验,我们在configs目录下放置了Oxford-IIIT Pet所对应的配置文件`unet_pet.yaml`,可以通过`--cfg`指向该文件来设置训练配置。
我们选择两张GPU进行训练,这可以通过环境变量`CUDA_VISIBLE_DEVICES`来指定。
除此之外,我们指定总BATCH_SIZE为4,PaddleSeg会根据可用的GPU数量,将数据平分到每张卡上,务必确保BATCH_SIZE为GPU数量的整数倍(在本例中,每张卡的BATCH_SIZE为2)。
```
export CUDA_VISIBLE_DEVICES=0,1
python pdseg/train.py --use_gpu \
--do_eval \
--use_tbx \
--tbx_log_dir train_log \
--cfg configs/unet_pet.yaml \
BATCH_SIZE 4 \
TRAIN.PRETRAINED_MODEL unet_coco_init \
DATASET.DATA_DIR mini_pet \
DATASET.TEST_FILE_LIST mini_pet/file_list/test_list.txt \
DATASET.TRAIN_FILE_LIST mini_pet/file_list/train_list.txt \
DATASET.VAL_FILE_LIST mini_pet/file_list/val_list.txt \
DATASET.VIS_FILE_LIST mini_pet/file_list/val_list.txt \
TRAIN.SYNC_BATCH_NORM True \
SOLVER.LR 5e-5
```
`NOTE`:
> * 上述示例中,一共存在三套配置方案: PaddleSeg默认配置/unet_pet.yaml/OPTIONS,三者的优先级顺序为 OPTIONS > yaml > 默认配置。这个原则对于train.py/eval.py/vis.py/export_model.py都适用
>
> * 如果发现因为显存不足而Crash,请适当调低BATCH_SIZE。如果本机GPU显存充足,则可以调高BATCH_SIZE以获得更快的训练速度
>
> * windows并不支持多卡训练
### 训练过程可视化
当打开do_eval和use_tbx两个开关后,我们可以通过TensorBoard查看训练的效果
```shell
tensorboard --logdir train_log --host ${HOST_IP} --port ${PORT}
```
NOTE:
1. 上述示例中,$HOST_IP为机器IP地址,请替换为实际IP,$PORT请替换为可访问的端口
2. 数据量较大时,前端加载速度会比较慢,请耐心等待
启动TensorBoard命令后,我们可以在浏览器中查看对应的训练数据
`SCALAR`这个tab中,查看训练loss、iou、acc的变化趋势
![](docs/imgs/tensorboard_scalar.JPG)
`IMAGE`这个tab中,查看样本的预测情况
![](docs/imgs/tensorboard_image.JPG)
### 模型评估
训练完成后,我们可以通过eval.py来评估模型效果。由于我们设置的训练EPOCH数量为500,保存间隔为10,因此一共会产生50个定期保存的模型,加上最终保存的final模型,一共有51个模型。我们选择最后保存的模型进行效果的评估:
```shell
python pdseg/eval.py --use_gpu \
--cfg configs/unet_pet.yaml \
DATASET.DATA_DIR mini_pet \
DATASET.VAL_FILE_LIST mini_pet/file_list/val_list.txt \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
### 模型预测/可视化
通过vis.py来可视化模型的预测效果,我们选择最后保存的模型进行预测:
```shell
python pdseg/vis.py --use_gpu \
--cfg configs/unet_pet.yaml \
DATASET.DATA_DIR mini_pet \
DATASET.TEST_FILE_LIST mini_pet/file_list/test_list.txt \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
`NOTE`
1. 可视化的图片会默认保存在visual/visual_results目录下,可以通过`--vis_dir`来指定输出目录
2. 训练过程中会使用DATASET.VIS_FILE_LIST中的图片进行可视化显示,而vis.py则会使用DATASET.TEST_FILE_LIST
### 模型导出
当确定模型效果满足预期后,我们需要通过export_model.py来导出一个可用于部署到服务端预测的模型:
```shell
python pdseg/export_model.py --cfg configs/unet_pet.yaml \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
模型会导出到freeze_model目录,接下来就是进行模型的部署,相关步骤,请查看[模型部署](./inference/README.md)
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
include(external-cmake/yaml-cpp.cmake)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/yaml-cpp/include")
include_directories("${PADDLE_DIR}/")
include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
include_directories("${PADDLE_DIR}/third_party/install/glog/include")
include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
include_directories("${PADDLE_DIR}/third_party/boost")
include_directories("${PADDLE_DIR}/third_party/eigen3")
link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
link_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${OPENCV_DIR}/build/include")
include_directories("${OPENCV_DIR}/opencv/build/include")
link_directories("${OPENCV_DIR}/build/x64/vc14/lib")
else ()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
include_directories("${OPENCV_DIR}/include")
link_directories("${OPENCV_DIR}/lib64")
endif ()
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
# TODO let users define cuda lib path
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
if (NOT WIN32)
if (USE_TENSORRT AND WITH_GPU)
include_directories("${PADDLE_DIR}/third_party/install/tensorrt/include")
link_directories("${PADDLE_DIR}/third_party/install/tensorrt/lib")
endif()
endif(NOT WIN32)
if (NOT WIN32)
set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
if(EXISTS ${NGRAPH_PATH})
include(GNUInstallDirs)
include_directories("${NGRAPH_PATH}/include")
link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_MKL)
include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif ()
set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if(WITH_STATIC_LIB)
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
else()
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf snappystream snappy z xxhash
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
opencv_world346 glog libyaml-cppmt gflags_static libprotobuf snappy zlibstatic xxhash snappystream ${EXTERNAL_LIB})
set(DEPS ${DEPS} libcmt shlwapi)
set(DEPS ${DEPS} ${YAML_CPP_LIBRARY})
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_imgcodecs${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_imgproc${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_core${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_highgui${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libIlmImf${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibjasper${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibpng${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibtiff${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libittnotify${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibjpeg-turbo${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibwebp${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libzlib${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
SET(PADDLESEG_INFERENCE_SRCS preprocessor/preprocessor.cpp preprocessor/preprocessor_seg.cpp predictor/seg_predictor.cpp)
ADD_LIBRARY(libpaddleseg_inference STATIC ${PADDLESEG_INFERENCE_SRCS})
target_link_libraries(libpaddleseg_inference ${DEPS})
add_executable(demo demo.cpp)
ADD_DEPENDENCIES(libpaddleseg_inference yaml-cpp)
ADD_DEPENDENCIES(demo yaml-cpp libpaddleseg_inference)
target_link_libraries(demo ${DEPS} libpaddleseg_inference)
add_custom_command(TARGET demo POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/bin/mkldnn.dll ./mkldnn.dll
)
{
"configurations": [
{
"name": "x64-Release",
"generator": "Ninja",
"configurationType": "RelWithDebInfo",
"inheritEnvironments": [ "msvc_x64_x64" ],
"buildRoot": "${projectDir}\\out\\build\\${name}",
"installRoot": "${projectDir}\\out\\install\\${name}",
"cmakeCommandArgs": "",
"buildCommandArgs": "-v",
"ctestCommandArgs": "",
"variables": [
{
"name": "CUDA_LIB",
"value": "C:/PaddleDeploy/cudalib/v8.0/lib/x64",
"type": "PATH"
},
{
"name": "OPENCV_DIR",
"value": "C:/PaddleDeploy/opencv",
"type": "PATH"
},
{
"name": "PADDLE_DIR",
"value": "C:/PaddleDeploy/fluid_inference",
"type": "PATH"
},
{
"name": "CMAKE_BUILD_TYPE",
"value": "Release",
"type": "STRING"
}
]
}
]
}
# Installing Dependencies
## OpenCV
Official OpenCV releases: https://opencv.org/releases/
### Windows
1. Download the Windows installer: OpenCV-3.4.6
2. Double-click it and extract to a directory of your choice, e.g. D:\opencv
3. Configure the environment variables
> 1. My Computer -> Properties -> Advanced system settings -> Environment Variables
> 2. Find Path in the system variables (create it if missing) and double-click to edit
> 3. Click New, add the opencv path and save, e.g. D:\opencv\build\x64\vc14\bin
### Linux
1. Download the OpenCV-3.4.6 sources and extract them, e.g. to /home/user/opencv-3.4.6
2. cd opencv-3.4.6 && mkdir build && mkdir release
3. Edit modules/videoio/src/cap_v4l.cpp and insert the following right after line 253
```
#ifndef V4L2_CID_ROTATE
#define V4L2_CID_ROTATE (V4L2_CID_BASE+34)
#endif
#ifndef V4L2_CID_IRIS_ABSOLUTE
#define V4L2_CID_IRIS_ABSOLUTE (V4L2_CID_CAMERA_CLASS_BASE+17)
#endif
```
4. cd build
5. cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/user/opencv-3.4.6/release/ -DOPENCV_FORCE_3RDPARTY_BUILD=OFF
6. make -j10
7. make install
The headers and libraries produced by the build are then installed under /home/user/opencv-3.4.6/release
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# PaddleSeg C++ Inference Deployment
## Overview
This directory provides a cross-platform C++ deployment solution for image segmentation models. With a small amount of configuration and code, a model can be integrated into your own service to perform segmentation tasks.
The main design goals are the following three points:
- Cross-platform: building, development and deployment work on both Windows and Linux
- Support for mainstream segmentation tasks: with a little configuration a model can be loaded for common prediction tasks such as human segmentation
- Extensibility: users can implement their own pre-processing and post-processing logic for new models
## Main Directories and Files
| File | Purpose |
|-------|----------|
| CMakeList.txt | cmake build configuration file |
| external-cmake| cmake files of external dependencies (currently only yaml-cpp)|
| demo.cpp | sample C++ code that loads a model and runs prediction |
| predictor | classes that load the model and run prediction|
| preprocess | data pre-processing classes|
| utils | basic utility functions|
| images/humanseg | test images for the sample human-segmentation model|
| conf/humanseg.yaml | sample configuration for the human-segmentation model|
| tools/visualize.py | script for pseudo-coloring prediction results |
## Building on Windows
### Prerequisites
* Visual Studio 2015+
* CUDA 8.0 / CUDA 9.0 + cuDNN 7
* CMake 3.0+
We have tested the build with both `Visual Studio 2015` and `Visual Studio 2019 Community`.
**All the examples below use `D:\` as the root directory.**
### Step 1: Get the code
1. `git clone http://gitlab.baidu.com/Paddle/PaddleSeg.git`
2. Copy the `D:\PaddleSeg\inference\` directory to `D:\PaddleDeploy`
The directory `D:\PaddleDeploy\inference` contains `CMakelist.txt` and the project source files.
### Step 2: Download the PaddlePaddle inference library fluid_inference
Download the PaddlePaddle inference library that matches your Windows environment and extract it to the `D:\PaddleDeploy\` directory
| CUDA | GPU | Download |
|------|------|--------|
| 8.0 | Yes | [fluid_inference.zip](https://bj.bcebos.com/v1/paddleseg/fluid_inference_win.zip) |
| 9.0 | Yes | [fluid_inference_cuda90.zip](https://paddleseg.bj.bcebos.com/fluid_inference_cuda9_cudnn7.zip) |
The `D:\PaddleDeploy\fluid_inference` directory contains:
```bash
paddle # paddle core libraries
third_party # third-party dependencies of paddle
version.txt # version information of the build
```
### Step 3: Install and configure OpenCV
1. Download the 3.4.6 Windows release from the OpenCV website, [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\PaddleDeploy\opencv`
3. Configure the environment variables as follows
    1. My Computer -> Properties -> Advanced system settings -> Environment Variables
    2. Find Path in the system variables (create it if missing) and double-click to edit
    3. Click New, add the opencv bin path and save, e.g. `D:\PaddleDeploy\opencv\build\x64\vc14\bin`
### Step 4: Build the code (using VS2015 as an example)
The commands below must be adapted to the paths of the dependencies on your own system
* Set up the VS2015 environment (adjust the path to your actual VS installation) by running the following command in a cmd window
* For other VS versions, locate the corresponding `vcvarsall.bat` and substitute its path in this command
```
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64
```
* Configure the project with CMake
    * PADDLE_DIR: directory of the fluid_inference library
    * CUDA_LIB: directory of the CUDA libraries, adjust to your installation
    * OPENCV_DIR: directory where OpenCV was extracted
```
# create the CMake build directory
D:
cd PaddleDeploy\inference
mkdir build
cd build
D:\PaddleDeploy\inference\build> cmake .. -G "Visual Studio 14 2015 Win64" -DWITH_GPU=ON -DPADDLE_DIR=D:\PaddleDeploy\fluid_inference -DCUDA_LIB=D:\PaddleDeploy\cudalib\v8.0\lib\x64 -DOPENCV_DIR=D:\PaddleDeploy\opencv -T host=x64
```
The cmake `-G` argument can be changed to match your `VS` version; see the [cmake documentation](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html) for details
* Build the executable
```
D:\PaddleDeploy\inference\build> msbuild /m /p:Configuration=Release cpp_inference_demo.sln
```
### Step 5: Prediction and visualization
The executable and the required dynamic libraries built in the previous step are placed under build/Release and can be run directly from the Windows command line.
A sample model can be downloaded and extracted for testing; the sample human-segmentation model is available here: [download link](https://paddleseg.bj.bcebos.com/inference_model/deeplabv3p_xception65_humanseg.tgz)
Assuming it is extracted to `D:\PaddleDeploy\models\deeplabv3p_xception65_humanseg`, run the following commands:
```
cd Release
D:\PaddleDeploy\inference\build\Release> demo.exe --conf=D:\\PaddleDeploy\\inference\\conf\\humanseg.yaml --input_dir=D:\\PaddleDeploy\\inference\\images\\humanseg\\
```
The two command-line parameters used for prediction are:
| Parameter | Meaning |
|-------|----------|
| conf | path to the model's yaml configuration file |
| input_dir | directory of the images to predict |
For a sample **configuration file** with comments explaining every field, see [conf/humanseg.yaml](inference/conf/humanseg.yaml)
The sample program scans all images under input_dir and generates a prediction result for each of them.
For an input file `14.jpg`, the predicted mask is stored in `14_jpg.png`, the visualized score map in `14_jpg_scoremap.png`, and the prediction resized back to the original image size in `14_jpg_recover.png` (a minimal pseudo-coloring sketch is shown after the example images below).
Input image
![avatar](inference/images/humanseg/demo.jpg)
Output prediction
![avatar](inference/images/humanseg/demo_jpg_recover.png)
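The saved mask (`*_jpg.png`) stores one class id per pixel, so it looks almost black when opened directly. The snippet below is a minimal pseudo-coloring sketch; the palette and file name are illustrative (tools/visualize.py in this directory does the same thing with a fixed 19-class palette):
```python
# Minimal sketch: pseudo-color a predicted mask whose pixel values are class ids.
# Assumptions: the mask is a single-channel PNG and the palette has at least
# NUM_CLASSES entries (humanseg.yaml uses NUM_CLASSES: 2).
import cv2
import numpy as np

palette = np.array([[0, 0, 0],        # class 0: background
                    [0, 255, 0]],     # class 1: person
                   dtype=np.uint8)

mask = cv2.imread("14_jpg.png", cv2.IMREAD_GRAYSCALE)
colored = palette[mask]               # look up a BGR color for every pixel
cv2.imwrite("14_jpg_color.png", colored)
```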
DEPLOY:
USE_GPU: 1
MODEL_PATH: "C:\\PaddleDeploy\\models\\deeplabv3p_xception65_humanseg"
MODEL_NAME: "unet"
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
EVAL_CROP_SIZE: (513, 513)
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
IMAGE_TYPE: "rgb"
NUM_CLASSES: 2
CHANNELS : 3
PRE_PROCESSOR: "SegPreProcessor"
PREDICTOR_MODE: "ANALYSIS"
BATCH_SIZE : 3
#include <glog/logging.h>
#include <utils/utils.h>
#include <predictor/seg_predictor.h>
DEFINE_string(conf, "", "Configuration File Path");
DEFINE_string(input_dir, "", "Directory of Input Images");
int main(int argc, char** argv) {
// 0. parse args
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_conf.empty() || FLAGS_input_dir.empty()) {
std::cout << "Usage: ./predictor --conf=/config/path/to/your/model --input_dir=/directory/of/your/input/images";
return -1;
}
// 1. create a predictor and init it with conf
PaddleSolution::Predictor predictor;
if (predictor.init(FLAGS_conf) != 0) {
LOG(FATAL) << "Fail to init predictor";
return -1;
}
// 2. get all the images with extension '.jpeg' at input_dir
auto imgs = PaddleSolution::utils::get_directory_images(FLAGS_input_dir, ".jpeg|.jpg");
// 3. predict
predictor.predict(imgs);
return 0;
}
find_package(Git REQUIRED)
include(ExternalProject)
message("${CMAKE_BUILD_TYPE}")
ExternalProject_Add(
yaml-cpp
GIT_REPOSITORY https://github.com/jbeder/yaml-cpp.git
GIT_TAG e0e01d53c27ffee6c86153fa41e7f5e57d3e5c90
CMAKE_ARGS
-DYAML_CPP_BUILD_TESTS=OFF
-DYAML_CPP_BUILD_TOOLS=OFF
-DYAML_CPP_INSTALL=OFF
-DYAML_CPP_BUILD_CONTRIB=OFF
-DMSVC_SHARED_RT=OFF
-DBUILD_SHARED_LIBS=OFF
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
-DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
-DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp"
# Disable install step
INSTALL_COMMAND ""
LOG_DOWNLOAD ON
)
#include "seg_predictor.h"
namespace PaddleSolution {
int Predictor::init(const std::string& conf) {
if (!_model_config.load_config(conf)) {
LOG(FATAL) << "Fail to load config file: [" << conf << "]";
return -1;
}
_preprocessor = PaddleSolution::create_processor(conf);
if (_preprocessor == nullptr) {
LOG(FATAL) << "Failed to create_processor";
return -1;
}
_mask.resize(_model_config._resize[0] * _model_config._resize[1]);
_scoremap.resize(_model_config._resize[0] * _model_config._resize[1]);
bool use_gpu = _model_config._use_gpu;
const auto& model_dir = _model_config._model_path;
const auto& model_filename = _model_config._model_file_name;
const auto& params_filename = _model_config._param_file_name;
// load paddle model file
if (_model_config._predictor_mode == "NATIVE") {
paddle::NativeConfig config;
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.prog_file = prog_file;
config.param_file = param_file;
config.fraction_of_gpu_memory = 0;
config.use_gpu = use_gpu;
config.device = 0;
_main_predictor = paddle::CreatePaddlePredictor(config);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
paddle::AnalysisConfig config;
if (use_gpu) {
config.EnableUseGpu(100, 0);
}
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.SetModel(prog_file, param_file);
config.SwitchUseFeedFetchOps(false);
_main_predictor = paddle::CreatePaddlePredictor(config);
}
else {
return -1;
}
return 0;
}
int Predictor::predict(const std::vector<std::string>& imgs) {
if (_model_config._predictor_mode == "NATIVE") {
return native_predict(imgs);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
return analysis_predict(imgs);
}
return -1;
}
int Predictor::output_mask(const std::string& fname, float* p_out, int length, int* height, int* width) {
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
int eval_num_class = _model_config._class_num;
int blob_out_len = length;
int seg_out_len = eval_height * eval_width * eval_num_class;
if (blob_out_len != seg_out_len) {
LOG(ERROR) << " [FATAL] unequal: input vs output [" <<
seg_out_len << "|" << blob_out_len << "]" << std::endl;
return -1;
}
//post process
int out_img_len = eval_height * eval_width;
// reset the per-pixel label and score buffers to the evaluation size
_mask.assign(out_img_len, 0);
_scoremap.assign(out_img_len, 0);
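// For every pixel, take the argmax over the class scores (the output layout is
// class-major: all pixels of class 0 first, then class 1, ...). The winning label
// goes into _mask and its score, scaled to 0-255, into _scoremap.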
for (int i = 0; i < out_img_len; ++i) {
float max_value = -1;
int label = 0;
for (int j = 0; j < eval_num_class; ++j) {
int index = i + j * out_img_len;
if (index >= blob_out_len) {
break;
}
float value = p_out[index];
if (value > max_value) {
max_value = value;
label = j;
}
}
if (label == 0) max_value = 0;
_mask[i] = uchar(label);
_scoremap[i] = uchar(max_value * 255);
}
cv::Mat mask_png = cv::Mat(eval_height, eval_width, CV_8UC1);
mask_png.data = _mask.data();
std::string nname(fname);
auto pos = fname.rfind(".");  // replace the extension dot, so "14.jpg" becomes "14_jpg"
nname[pos] = '_';
std::string mask_save_name = nname + ".png";
cv::imwrite(mask_save_name, mask_png);
cv::Mat scoremap_png = cv::Mat(eval_height, eval_width, CV_8UC1);
scoremap_png.data = _scoremap.data();
std::string scoremap_save_name = nname + std::string("_scoremap.png");
cv::imwrite(scoremap_save_name, scoremap_png);
std::cout << "save mask of [" << fname << "] done" << std::endl;
if (height && width) {
int recover_height = *height;
int recover_width = *width;
cv::Mat recover_png = cv::Mat(recover_height, recover_width, CV_8UC1);
cv::resize(scoremap_png, recover_png, cv::Size(recover_width, recover_height),
0, 0, cv::INTER_CUBIC);
std::string recover_name = nname + std::string("_recover.png");
cv::imwrite(recover_name, recover_png);
}
return 0;
}
int Predictor::native_predict(const std::vector<std::string>& imgs)
{
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
std::size_t total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
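// number of batches = ceil(total_size / default_batch_size); the last batch may hold fewer images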
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& org_width = _org_width;
auto& org_height = _org_height;
auto& imgs_batch = _imgs_batch;
input_buffer.resize(batch_buffer_size);
org_width.resize(default_batch_size);
org_height.resize(default_batch_size);
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.resize(real_buffer_size);
org_height.resize(batch_size);
org_width.resize(batch_size);
for (int i = 0; i < batch_size; ++i) {
org_width[i] = org_height[i] = 0;
}
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
if (!_preprocessor->batch_process(imgs_batch, input_buffer.data(), org_width.data(), org_height.data())) {
return -1;
}
paddle::PaddleTensor im_tensor;
im_tensor.name = "image";
im_tensor.shape = std::vector<int>({ batch_size, channels, eval_height, eval_width });
im_tensor.data.Reset(input_buffer.data(), real_buffer_size * sizeof(float));
im_tensor.dtype = paddle::PaddleDType::FLOAT32;
feeds.push_back(im_tensor);
_outputs.clear();
auto t1 = std::chrono::high_resolution_clock::now();
if (!_main_predictor->Run(feeds, &_outputs, batch_size)) {
LOG(ERROR) << "Failed: NativePredictor->Run() return false at batch: " << u;
continue;
}
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
int out_num = 1;
// print shape of first output tensor for debugging
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < _outputs[0].shape.size(); ++j) {
out_num *= _outputs[0].shape[j];
std::cout << _outputs[0].shape[j] << ",";
}
std::cout << ")" << std::endl;
const size_t nums = _outputs.front().data.length() / sizeof(float);
if (out_num % batch_size != 0 || out_num != nums) {
LOG(ERROR) << "outputs data size mismatch with shape size.";
return -1;
}
for (int i = 0; i < batch_size; ++i) {
float* output_addr = (float*)(_outputs[0].data.data()) + i * (out_num / batch_size);
output_mask(imgs_batch[i], output_addr, out_num / batch_size, &org_height[i], &org_width[i]);
}
}
return 0;
}
int Predictor::analysis_predict(const std::vector<std::string>& imgs) {
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
auto total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& org_width = _org_width;
auto& org_height = _org_height;
auto& imgs_batch = _imgs_batch;
input_buffer.resize(batch_buffer_size);
org_width.resize(default_batch_size);
org_height.resize(default_batch_size);
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.resize(real_buffer_size);
org_height.resize(batch_size);
org_width.resize(batch_size);
for (int i = 0; i < batch_size; ++i) {
org_width[i] = org_height[i] = 0;
}
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
// pass widths then heights, matching batch_process(imgs, data, ori_w, ori_h)
if (!_preprocessor->batch_process(imgs_batch, input_buffer.data(), org_width.data(), org_height.data())) {
return -1;
}
auto im_tensor = _main_predictor->GetInputTensor("image");
im_tensor->Reshape({ batch_size, channels, eval_height, eval_width });
im_tensor->copy_from_cpu(input_buffer.data());
auto t1 = std::chrono::high_resolution_clock::now();
_main_predictor->ZeroCopyRun();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
auto output_names = _main_predictor->GetOutputNames();
auto output_t = _main_predictor->GetOutputTensor(output_names[0]);
std::vector<float> out_data;
std::vector<int> output_shape = output_t->shape();
int out_num = 1;
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < output_shape.size(); ++j) {
out_num *= output_shape[j];
std::cout << output_shape[j] << ",";
}
std::cout << ")" << std::endl;
out_data.resize(out_num);
output_t->copy_to_cpu(out_data.data());
for (int i = 0; i < batch_size; ++i) {
float* out_addr = out_data.data() + (out_num / batch_size) * i;
output_mask(imgs_batch[i], out_addr, out_num / batch_size, &org_height[i], &org_width[i]);
}
}
return 0;
}
}
#pragma once
#include <memory>
#include <string>
#include <vector>
#include <thread>
#include <chrono>
#include <algorithm>
#include <glog/logging.h>
#include <yaml-cpp/yaml.h>
#include <opencv2/opencv.hpp>
#include <paddle_inference_api.h>
#include <utils/seg_conf_parser.h>
#include <utils/utils.h>
#include <preprocessor/preprocessor.h>
namespace PaddleSolution {
class Predictor {
public:
// init a predictor with a yaml config file
int init(const std::string& conf);
// predict api
int predict(const std::vector<std::string>& imgs);
private:
int output_mask(
const std::string& fname,
float* p_out,
int length,
int* height = NULL,
int* width = NULL);
int native_predict(const std::vector<std::string>& imgs);
int analysis_predict(const std::vector<std::string>& imgs);
private:
std::vector<float> _buffer;
std::vector<int> _org_width;
std::vector<int> _org_height;
std::vector<std::string> _imgs_batch;
std::vector<paddle::PaddleTensor> _outputs;
std::vector<uchar> _mask;
std::vector<uchar> _scoremap;
PaddleSolution::PaddleSegModelConfigPaser _model_config;
std::shared_ptr<PaddleSolution::ImagePreProcessor> _preprocessor;
std::unique_ptr<paddle::PaddlePredictor> _main_predictor;
};
}
#include <glog/logging.h>
#include "preprocessor.h"
#include "preprocessor_seg.h"
namespace PaddleSolution {
std::shared_ptr<ImagePreProcessor> create_processor(const std::string& conf_file) {
auto config = std::make_shared<PaddleSolution::PaddleSegModelConfigPaser>();
if (!config->load_config(conf_file)) {
LOG(FATAL) << "fail to laod conf file [" << conf_file << "]";
return nullptr;
}
if (config->_pre_processor == "SegPreProcessor") {
auto p = std::make_shared<SegPreProcessor>();
if (!p->init(config)) {
return nullptr;
}
return p;
}
LOG(FATAL) << "unknown processor_name [" << config->_pre_processor << "]";
return nullptr;
}
}
#pragma once
#include <vector>
#include <string>
#include <memory>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "utils/seg_conf_parser.h"
namespace PaddleSolution {
class ImagePreProcessor {
protected:
ImagePreProcessor() {};
public:
virtual ~ImagePreProcessor() {}
virtual bool single_process(const std::string& fname, float* data, int* ori_w, int* ori_h) = 0;
virtual bool batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h) = 0;
}; // end of class ImagePreProcessor
std::shared_ptr<ImagePreProcessor> create_processor(const std::string &config_file);
} // end of namespace paddle_solution
#include <thread>
#include <glog/logging.h>
#include "preprocessor_seg.h"
namespace PaddleSolution {
bool SegPreProcessor::single_process(const std::string& fname, float* data, int* ori_w, int* ori_h) {
cv::Mat im = cv::imread(fname, -1);
if (im.data == nullptr || im.empty()) {
LOG(ERROR) << "Failed to open image: " << fname;
return false;
}
int channels = im.channels();
*ori_w = im.cols;
*ori_h = im.rows;
if (channels == 1) {
cv::cvtColor(im, im, cv::COLOR_GRAY2BGR);
}
channels = im.channels();
if (channels != 3 && channels != 4) {
LOG(ERROR) << "Only support rgb(gray) and rgba image.";
return false;
}
cv::Size resize_size(_config->_resize[0], _config->_resize[1]);
int rw = resize_size.width;
int rh = resize_size.height;
if (*ori_h != rh || *ori_w != rw) {
cv::resize(im, im, resize_size, 0, 0, cv::INTER_LINEAR);
}
float* pmean = _config->_mean.data();
float* pscale = _config->_std.data();
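// Convert the HWC uchar image to CHW float, normalizing each channel with (pixel - mean) / std.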
for (int h = 0; h < rh; ++h) {
const uchar* ptr = im.ptr<uchar>(h);
int im_index = 0;
for (int w = 0; w < rw; ++w) {
for (int c = 0; c < channels; ++c) {
int top_index = (c * rh + h) * rw + w;
float pixel = static_cast<float>(ptr[im_index++]);
pixel = (pixel - pmean[c]) / pscale[c];
data[top_index] = pixel;
}
}
}
return true;
}
bool SegPreProcessor::batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h) {
auto ic = _config->_channels;
auto iw = _config->_resize[0];
auto ih = _config->_resize[1];
std::vector<std::thread> threads;
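// Preprocess each image of the batch in its own thread; every thread writes to its own slice of the shared input buffer.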
for (int i = 0; i < imgs.size(); ++i) {
std::string path = imgs[i];
float* buffer = data + i * ic * iw * ih;
int* width = &ori_w[i];
int* height = &ori_h[i];
threads.emplace_back([this, path, buffer, width, height] {
single_process(path, buffer, width, height);
});
}
for (auto& t : threads) {
if (t.joinable()) {
t.join();
}
}
return true;
}
bool SegPreProcessor::init(std::shared_ptr<PaddleSolution::PaddleSegModelConfigPaser> config) {
_config = config;
return true;
}
}
#pragma once
#include "preprocessor.h"
namespace PaddleSolution {
class SegPreProcessor : public ImagePreProcessor {
public:
SegPreProcessor() : _config(nullptr){
};
bool init(std::shared_ptr<PaddleSolution::PaddleSegModelConfigPaser> config);
bool single_process(const std::string &fname, float* data, int* ori_w, int* ori_h);
bool batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h);
private:
std::shared_ptr<PaddleSolution::PaddleSegModelConfigPaser> _config;
};
}
import cv2
import sys
# Color map that makes the visualization easier to read
color_map = [[128, 64, 128], [244, 35, 231], [69, 69, 69], [102, 102, 156],
[190, 153, 153], [153, 153, 153], [250, 170, 29], [219, 219, 0],
[106, 142, 35], [152, 250, 152], [69, 129, 180], [219, 19, 60],
[255, 0, 0], [0, 0, 142], [0, 0, 69], [0, 60, 100], [0, 79, 100],
[0, 0, 230], [119, 10, 32]]
im = cv2.imread(sys.argv[1])
# Note: the hard-coded (224, 224) size below is only valid for the daheng model
print("visualizing...")
for i in range(0, 224):
for j in range(0, 224):
im[i, j] = color_map[im[i, j, 0]]
cv2.imwrite(sys.argv[1], im)
print("visualizing done!")
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <yaml-cpp/yaml.h>
namespace PaddleSolution {
class PaddleSegModelConfigPaser {
public:
PaddleSegModelConfigPaser()
:_class_num(0),
_channels(0),
_use_gpu(0),
_batch_size(1),
_model_file_name("__model__"),
_param_file_name("__params__") {
}
~PaddleSegModelConfigPaser() {
}
void reset() {
_resize.clear();
_mean.clear();
_std.clear();
_img_type.clear();
_class_num = 0;
_channels = 0;
_use_gpu = 0;
_batch_size = 1;
_model_name.clear();
_model_file_name.clear();
_model_path.clear();
_param_file_name.clear();
}
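// Convert a Python-style tuple string such as "(513, 513)" into "[513, 513]"
// so that YAML::Load can parse it as a flow sequence.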
std::string process_parenthesis(const std::string& str) {
if (str.size() < 2) {
return str;
}
std::string nstr(str);
if (str[0] == '(' && str.back() == ')') {
nstr[0] = '[';
nstr[str.size() - 1] = ']';
}
return nstr;
}
template <typename T>
std::vector<T> parse_str_to_vec(const std::string& str) {
std::vector<T> data;
auto node = YAML::Load(str);
for (const auto& item : node) {
data.push_back(item.as<T>());
}
return data;
}
bool load_config(const std::string& conf_file) {
reset();
YAML::Node config = YAML::LoadFile(conf_file);
// 1. get resize
auto str = config["DEPLOY"]["EVAL_CROP_SIZE"].as<std::string>();
_resize = parse_str_to_vec<int>(process_parenthesis(str));
// 2. get mean
for (const auto& item : config["DEPLOY"]["MEAN"]) {
_mean.push_back(item.as<float>());
}
// 3. get std
for (const auto& item : config["DEPLOY"]["STD"]) {
_std.push_back(item.as<float>());
}
// 4. get image type
_img_type = config["DEPLOY"]["IMAGE_TYPE"].as<std::string>();
// 5. get class number
_class_num = config["DEPLOY"]["NUM_CLASSES"].as<int>();
// 6. get model_name
_model_name = config["DEPLOY"]["MODEL_NAME"].as<std::string>();
// 7. set model path
_model_path = config["DEPLOY"]["MODEL_PATH"].as<std::string>();
// 8. get model file_name
_model_file_name = config["DEPLOY"]["MODEL_FILENAME"].as<std::string>();
// 9. get model param file name
_param_file_name = config["DEPLOY"]["PARAMS_FILENAME"].as<std::string>();
// 10. get pre_processor
_pre_processor = config["DEPLOY"]["PRE_PROCESSOR"].as<std::string>();
// 11. use_gpu
_use_gpu = config["DEPLOY"]["USE_GPU"].as<int>();
// 12. predictor_mode
_predictor_mode = config["DEPLOY"]["PREDICTOR_MODE"].as<std::string>();
// 13. batch_size
_batch_size = config["DEPLOY"]["BATCH_SIZE"].as<int>();
// 14. channels
_channels = config["DEPLOY"]["CHANNELS"].as<int>();
return true;
}
void debug() const {
std::cout << "EVAL_CROP_SIZE: (" << _resize[0] << ", " << _resize[1] << ")" << std::endl;
std::cout << "MEAN: [";
for (int i = 0; i < _mean.size(); ++i) {
if (i != _mean.size() - 1) {
std::cout << _mean[i] << ", ";
} else {
std::cout << _mean[i];
}
}
std::cout << "]" << std::endl;
std::cout << "STD: [";
for (int i = 0; i < _std.size(); ++i) {
if (i != _std.size() - 1) {
std::cout << _std[i] << ", ";
}
else {
std::cout << _std[i];
}
}
std::cout << "]" << std::endl;
std::cout << "DEPLOY.IMAGE_TYPE: " << _img_type << std::endl;
std::cout << "DEPLOY.NUM_CLASSES: " << _class_num << std::endl;
std::cout << "DEPLOY.CHANNELS: " << _channels << std::endl;
std::cout << "DEPLOY.MODEL_PATH: " << _model_path << std::endl;
std::cout << "DEPLOY.MODEL_NAME: " << _model_name << std::endl;
std::cout << "DEPLOY.MODEL_FILENAME: " << _model_file_name << std::endl;
std::cout << "DEPLOY.PARAMS_FILENAME: " << _param_file_name << std::endl;
std::cout << "DEPLOY.PRE_PROCESSOR: " << _pre_processor << std::endl;
std::cout << "DEPLOY.USE_GPU: " << _use_gpu << std::endl;
std::cout << "DEPLOY.PREDICTOR_MODE: " << _predictor_mode << std::endl;
std::cout << "DEPLOY.BATCH_SIZE: " << _batch_size << std::endl;
}
// DEPLOY.EVAL_CROP_SIZE
std::vector<int> _resize;
// DEPLOY.MEAN
std::vector<float> _mean;
// DEPLOY.STD
std::vector<float> _std;
// DEPLOY.IMAGE_TYPE
std::string _img_type;
// DEPLOY.NUM_CLASSES
int _class_num;
// DEPLOY.CHANNELS
int _channels;
// DEPLOY.MODEL_PATH
std::string _model_path;
// DEPLOY.MODEL_NAME
std::string _model_name;
// DEPLOY.MODEL_FILENAME
std::string _model_file_name;
// DEPLOY.PARAMS_FILENAME
std::string _param_file_name;
// DEPLOY.PRE_PROCESSOR
std::string _pre_processor;
// DEPLOY.USE_GPU
int _use_gpu;
// DEPLOY.PREDICTOR_MODE
std::string _predictor_mode;
// DEPLOY.BATCH_SIZE
int _batch_size;
};
}
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <experimental/filesystem>
namespace PaddleSolution {
namespace utils {
inline std::string path_join(const std::string& dir, const std::string& path) {
std::string seperator = "/";
#ifdef _WIN32
seperator = "\\";
#endif
return dir + seperator + path;
}
// scan a directory and get all files with input extensions
inline std::vector<std::string> get_directory_images(const std::string& path, const std::string& exts)
{
std::vector<std::string> imgs;
for (const auto& item : std::experimental::filesystem::directory_iterator(path)) {
auto suffix = item.path().extension().string();
if (exts.find(suffix) != std::string::npos && suffix.size() > 0) {
auto fullname = path_join(path, item.path().filename().string());
imgs.push_back(item.path().string());
}
}
return imgs;
}
}
}
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import models
import utils
# coding: utf8
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os
import sys
import pprint
import argparse
import cv2
from tqdm import tqdm
import imghdr
from utils.config import cfg
def init_global_variable():
"""
Initialize the global variables
"""
global png_format_right_num # number of label images in the correct (png) format
global png_format_wrong_num # number of label images in a wrong format
global total_grt_classes # all label classes seen so far
global total_num_of_each_class # total pixel count of each class
global shape_unequal # images whose shape differs from their label
global png_format_wrong # label images with a wrong format
png_format_right_num = 0
png_format_wrong_num = 0
total_grt_classes = []
total_num_of_each_class = []
shape_unequal = []
png_format_wrong = []
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg check')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
return parser.parse_args()
def cv2_imread(file_path, flag=cv2.IMREAD_COLOR):
# resolve cv2.imread open Chinese file path issues on Windows Platform.
return cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), flag)
def get_image_max_height_width(img, max_height, max_width):
img_shape = img.shape
height, width = img_shape[0], img_shape[1]
max_height = max(height, max_height)
max_width = max(width, max_width)
return max_height, max_width
def get_image_min_max_aspectratio(img, min_aspectratio, max_aspectratio):
img_shape = img.shape
height, width = img_shape[0], img_shape[1]
min_aspectratio = min(width / height, min_aspectratio)
max_aspectratio = max(width / height, max_aspectratio)
return min_aspectratio, max_aspectratio
def get_image_dim(img, img_dim):
"""获取图像的维度"""
img_shape = img.shape
if img_shape[-1] not in img_dim:
img_dim.append(img_shape[-1])
def sum_gt_check(png_format, grt_classes, num_of_each_class):
"""
Accumulate, over all label images, the format check results, the classes present and the pixel count of each class
params:
png_format: whether the label image is in png format
grt_classes: label classes present in the image
num_of_each_class: pixel count of each class
"""
global png_format_right_num, png_format_wrong_num, total_grt_classes, total_num_of_each_class
if png_format:
png_format_right_num += 1
else:
png_format_wrong_num += 1
if cfg.DATASET.IGNORE_INDEX in grt_classes:
grt_classes2 = np.delete(
grt_classes, np.where(grt_classes == cfg.DATASET.IGNORE_INDEX))
if min(grt_classes2) < 0 or max(grt_classes2) > cfg.DATASET.NUM_CLASSES - 1:
print("fatal error: label class is out of range [0, {}]".format(
cfg.DATASET.NUM_CLASSES - 1))
add_class = []
add_num = []
for i in range(len(grt_classes)):
gi = grt_classes[i]
if gi in total_grt_classes:
j = total_grt_classes.index(gi)
total_num_of_each_class[j] += num_of_each_class[i]
else:
add_class.append(gi)
add_num.append(num_of_each_class[i])
total_num_of_each_class += add_num
total_grt_classes += add_class
def gt_check():
"""
Validate the labels and print the check results
params:
png_format_wrong_num: number of label images in a wrong format
png_format_right_num: number of label images in the correct format
total_grt_classes: all label classes
total_num_of_each_class: total pixel count of each class
return:
total_nc: label classes and their pixel counts, sorted in ascending order
"""
if png_format_wrong_num == 0:
print("Pass label png format check!")
else:
print("Not pass label png format check!")
print(
"total {} label imgs are png format, {} label imgs are not png format".
format(png_format_right_num, png_format_wrong_num))
total_nc = sorted(zip(total_grt_classes, total_num_of_each_class))
print("total label classes and their corresponding numbers:\n{} ".format(
total_nc))
if total_nc[0][0]:
print(
"Not pass label class check!\nWarning: label classes should start from 0 !!!"
)
else:
print("Pass label class check!")
def ground_truth_check(grt, grt_path):
"""
Verify that label values start from 0, i.e. take values 0, 1, ..., num_classes-1 plus ignore_index
Verify the format of the label image
Return the pixel counts of the label
Check whether the image consists entirely of ignore_index
params:
grt: label image
grt_path: path of the label image
return:
png_format: whether the label image is in png format
unique: the label classes present in the image
counts: the pixel count of each class
"""
if imghdr.what(grt_path) == "png":
png_format = True
else:
png_format = False
unique, counts = np.unique(grt, return_counts=True)
return png_format, unique, counts
def eval_crop_size_check(max_height, max_width, min_aspectratio,
max_aspectratio):
"""
Check EVAL_CROP_SIZE against the max_height and max_width of the validation and test sets
param
max_height: maximum image height in the dataset
max_width: maximum image width in the dataset
"""
if cfg.AUG.AUG_METHOD == "stepscaling":
flag = True
if max_width > cfg.EVAL_CROP_SIZE[0]:
print(
"ERROR: The EVAL_CROP_SIZE[0]: {} should be larger than the max width of the images: {}!"
.format(cfg.EVAL_CROP_SIZE[0], max_width))
flag = False
if max_height > cfg.EVAL_CROP_SIZE[1]:
print(
"ERROR: The EVAL_CROP_SIZE[1]: {} should be larger than the max height of the images: {}!"
.format(cfg.EVAL_CROP_SIZE[1], max_height))
flag = False
if flag:
print("EVAL_CROP_SIZE setting correct")
elif cfg.AUG.AUG_METHOD == "rangescaling":
if min_aspectratio <= 1 and max_aspectratio >= 1:
if cfg.EVAL_CROP_SIZE[
0] >= cfg.AUG.INF_RESIZE_VALUE and cfg.EVAL_CROP_SIZE[
1] >= cfg.AUG.INF_RESIZE_VALUE:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
cfg.AUG.INF_RESIZE_VALUE, cfg.AUG.INF_RESIZE_VALUE))
elif min_aspectratio > 1:
max_height_rangscaling = cfg.AUG.INF_RESIZE_VALUE / min_aspectratio
max_height_rangscaling = round(max_height_rangscaling)
if cfg.EVAL_CROP_SIZE[
0] >= cfg.AUG.INF_RESIZE_VALUE and cfg.EVAL_CROP_SIZE[
1] >= max_height_rangscaling:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
cfg.AUG.INF_RESIZE_VALUE, max_height_rangscaling))
elif max_aspectratio < 1:
max_width_rangscaling = cfg.AUG.INF_RESIZE_VALUE * max_aspectratio
max_width_rangscaling = round(max_width_rangscaling)
if cfg.EVAL_CROP_SIZE[
0] >= max_width_rangscaling and cfg.EVAL_CROP_SIZE[
1] >= cfg.AUG.INF_RESIZE_VALUE:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
max_width_rangscaling, cfg.AUG.INF_RESIZE_VALUE))
elif cfg.AUG.AUG_METHOD == "unpadding":
if cfg.EVAL_CROP_SIZE[0] >= cfg.AUG.FIX_RESIZE_SIZE[
0] and cfg.EVAL_CROP_SIZE[1] >= cfg.AUG.FIX_RESIZE_SIZE[1]:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
cfg.AUG.FIX_RESIZE_SIZE[0], cfg.AUG.FIX_RESIZE_SIZE[1]))
else:
print(
"ERROR: cfg.AUG.AUG_METHOD setting wrong, it should be one of [unpadding, stepscaling, rangescaling]"
)
def inf_resize_value_check():
if cfg.AUG.AUG_METHOD == "rangescaling":
if cfg.AUG.INF_RESIZE_VALUE < cfg.AUG.MIN_RESIZE_VALUE or \
cfg.AUG.INF_RESIZE_VALUE > cfg.AUG.MAX_RESIZE_VALUE:
print(
"ERROR: you set AUG.AUG_METHOD = 'rangescaling'"
"AUG.INF_RESIZE_VALUE: {} not in [AUG.MIN_RESIZE_VALUE, AUG.MAX_RESIZE_VALUE]: "
"[{}, {}].".format(cfg.AUG.INF_RESIZE_VALUE,
cfg.AUG.MIN_RESIZE_VALUE,
cfg.AUG.MAX_RESIZE_VALUE))
def image_type_check(img_dim):
"""
Check whether the image channels are consistent with DATASET.IMAGE_TYPE
param
img_dim: channel counts found in the images
return
"""
if (1 in img_dim or 3 in img_dim) and cfg.DATASET.IMAGE_TYPE == 'rgba':
print(
"ERROR: DATASET.IMAGE_TYPE is {} but the type of image has gray or rgb\n"
.format(cfg.DATASET.IMAGE_TYPE))
# elif (1 not in img_dim and 3 not in img_dim and 4 in img_dim) and cfg.DATASET.IMAGE_TYPE == 'rgb':
# print("ERROR: DATASET.IMAGE_TYPE is {} but the type of image is rgba\n".format(cfg.DATASET.IMAGE_TYPE))
else:
print("DATASET.IMAGE_TYPE setting correct")
def image_label_shape_check(img, grt):
"""
Check whether the image and its label have the same size
"""
flag = True
img_height = img.shape[0]
img_width = img.shape[1]
grt_height = grt.shape[0]
grt_width = grt.shape[1]
if img_height != grt_height or img_width != grt_width:
flag = False
return flag
def check_train_dataset():
train_list = cfg.DATASET.TRAIN_FILE_LIST
print("\ncheck train dataset...")
with open(train_list, 'r') as fid:
img_dim = []
lines = fid.readlines()
for line in tqdm(lines):
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
print(
line, "File list format incorrect! It should be"
" image_name{}label_name\\n ".format(cfg.DATASET.SEPARATOR))
continue
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
grt_path = os.path.join(cfg.DATASET.DATA_DIR, grt_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
get_image_dim(img, img_dim)
is_equal_img_grt_shape = image_label_shape_check(img, grt)
if not is_equal_img_grt_shape:
print(line,
"ERROR: source img and label img must has the same size")
png_format, grt_classes, num_of_each_class = ground_truth_check(
grt, grt_path)
sum_gt_check(png_format, grt_classes, num_of_each_class)
gt_check()
image_type_check(img_dim)
def check_val_dataset():
val_list = cfg.DATASET.VAL_FILE_LIST
with open(val_list) as fid:
max_height = 0
max_width = 0
min_aspectratio = sys.float_info.max
max_aspectratio = 0.0
img_dim = []
print("check val dataset...")
lines = fid.readlines()
for line in tqdm(lines):
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
print(
line, "File list format incorrect! It should be"
" image_name{}label_name\\n ".format(cfg.DATASET.SEPARATOR))
continue
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
grt_path = os.path.join(cfg.DATASET.DATA_DIR, grt_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
max_height, max_width = get_image_max_height_width(
img, max_height, max_width)
min_aspectratio, max_aspectratio = get_image_min_max_aspectratio(
img, min_aspectratio, max_aspectratio)
get_image_dim(img, img_dim)
is_equal_img_grt_shape = image_label_shape_check(img, grt)
if not is_equal_img_grt_shape:
print(line,
"ERROR: source img and label img must has the same size")
png_format, grt_classes, num_of_each_class = ground_truth_check(
grt, grt_path)
sum_gt_check(png_format, grt_classes, num_of_each_class)
gt_check()
eval_crop_size_check(max_height, max_width, min_aspectratio,
max_aspectratio)
image_type_check(img_dim)
def check_test_dataset():
test_list = cfg.DATASET.TEST_FILE_LIST
with open(test_list) as fid:
max_height = 0
max_width = 0
min_aspectratio = sys.float_info.max
max_aspectratio = 0.0
img_dim = []
print("check test dataset...")
lines = fid.readlines()
for line in tqdm(lines):
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) == 1:
img_name = parts[0]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
elif len(parts) == 2:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
grt_path = os.path.join(cfg.DATASET.DATA_DIR, grt_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
is_equal_img_grt_shape = image_label_shape_check(img, grt)
if not is_equal_img_grt_shape:
print(
line,
"ERROR: source img and label img must has the same size"
)
png_format, grt_classes, num_of_each_class = ground_truth_check(
grt, grt_path)
sum_gt_check(png_format, grt_classes, num_of_each_class)
else:
print(
line, "File list format incorrect! It should be"
" image_name{}label_name\\n or image_name\n ".format(
cfg.DATASET.SEPARATOR))
continue
max_height, max_width = get_image_max_height_width(
img, max_height, max_width)
min_aspectratio, max_aspectratio = get_image_min_max_aspectratio(
img, min_aspectratio, max_aspectratio)
get_image_dim(img, img_dim)
gt_check()
eval_crop_size_check(max_height, max_width, min_aspectratio,
max_aspectratio)
image_type_check(img_dim)
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
cfg.check_and_infer(reset_dataset=True)
print(pprint.pformat(cfg))
init_global_variable()
check_train_dataset()
init_global_variable()
check_val_dataset()
init_global_variable()
check_test_dataset()
inf_resize_value_check()
if __name__ == "__main__":
args = parse_args()
args.cfg_file = "../configs/cityscape.yaml"
main(args)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import cv2
import numpy as np
from utils.config import cfg
from models.model_builder import ModelPhase
def resize(img, grt=None, mode=ModelPhase.TRAIN):
"""
Resize the image and, optionally, its label image.
When AUG.AUG_METHOD is unpadding, every mode resizes directly to AUG.FIX_RESIZE_SIZE.
When AUG.AUG_METHOD is stepscaling, training resizes by a random factor between AUG.MIN_SCALE_FACTOR and AUG.MAX_SCALE_FACTOR sampled with step AUG.SCALE_STEP_SIZE; other modes return the original image.
When AUG.AUG_METHOD is rangescaling, the long side is aligned and the short side scales proportionally; during training the long side is aligned to a value between AUG.MIN_RESIZE_VALUE and AUG.MAX_RESIZE_VALUE, while other modes align it to AUG.INF_RESIZE_VALUE.
Args:
img(numpy.ndarray): input image
grt(numpy.ndarray): label image, default None
mode(string): phase, default is training, i.e. ModelPhase.TRAIN
Returns:
the resized image and label image
"""
if cfg.AUG.AUG_METHOD == 'unpadding':
target_size = cfg.AUG.FIX_RESIZE_SIZE
img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)
if grt is not None:
grt = cv2.resize(grt, target_size, interpolation=cv2.INTER_NEAREST)
elif cfg.AUG.AUG_METHOD == 'stepscaling':
if mode == ModelPhase.TRAIN:
min_scale_factor = cfg.AUG.MIN_SCALE_FACTOR
max_scale_factor = cfg.AUG.MAX_SCALE_FACTOR
step_size = cfg.AUG.SCALE_STEP_SIZE
scale_factor = get_random_scale(min_scale_factor, max_scale_factor,
step_size)
img, grt = randomly_scale_image_and_label(
img, grt, scale=scale_factor)
elif cfg.AUG.AUG_METHOD == 'rangescaling':
min_resize_value = cfg.AUG.MIN_RESIZE_VALUE
max_resize_value = cfg.AUG.MAX_RESIZE_VALUE
if mode == ModelPhase.TRAIN:
if min_resize_value == max_resize_value:
random_size = min_resize_value
else:
random_size = int(
np.random.uniform(min_resize_value, max_resize_value) + 0.5)
else:
random_size = cfg.AUG.INF_RESIZE_VALUE
value = max(img.shape[0], img.shape[1])
scale = float(random_size) / float(value)
img = cv2.resize(
img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
if grt is not None:
grt = cv2.resize(
grt, (0, 0),
fx=scale,
fy=scale,
interpolation=cv2.INTER_NEAREST)
else:
raise Exception("Unexpect data augmention method: {}".format(
cfg.AUG.AUG_METHOD))
return img, grt
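def _demo_rangescaling():
    # Illustrative sketch, not called by the library: 'rangescaling' picks a
    # target long side between AUG.MIN_RESIZE_VALUE and AUG.MAX_RESIZE_VALUE
    # (400-600 in the sample config) and scales both axes by target / max(h, w).
    h, w = 512, 1024
    target = int(np.random.uniform(400, 600) + 0.5)
    scale = float(target) / float(max(h, w))
    print('%dx%d -> %dx%d' % (w, h, int(w * scale + 0.5), int(h * scale + 0.5)))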
def get_random_scale(min_scale_factor, max_scale_factor, step_size):
"""
在一定范围内得到随机值,范围为min_scale_factor到max_scale_factor,间隔为step_size
Args:
min_scale_factor(float): 随机尺度下限,大于0
max_scale_factor(float): 随机尺度上限,不小于下限值
step_size(float): 尺度间隔,非负, 等于为0时直接返回min_scale_factor到max_scale_factor范围内任一值
Returns:
随机尺度值
"""
if min_scale_factor < 0 or min_scale_factor > max_scale_factor:
raise ValueError('Unexpected value of min_scale_factor.')
if min_scale_factor == max_scale_factor:
return min_scale_factor
if step_size == 0:
return np.random.uniform(min_scale_factor, max_scale_factor)
num_steps = int((max_scale_factor - min_scale_factor) / step_size + 1)
scale_factors = np.linspace(min_scale_factor, max_scale_factor,
num_steps).tolist()
np.random.shuffle(scale_factors)
return scale_factors[0]
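def _demo_stepscaling_grid():
    # Illustrative sketch, not called by the library: with the sample config
    # (MIN_SCALE_FACTOR=0.5, MAX_SCALE_FACTOR=2.0, SCALE_STEP_SIZE=0.25) the
    # training scale is drawn from this discrete grid, not a continuous range.
    print(np.linspace(0.5, 2.0, int((2.0 - 0.5) / 0.25 + 1)).tolist())
    # -> [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]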
def randomly_scale_image_and_label(image, label=None, scale=1.0):
"""
按比例resize图像和标签图, 如果scale为1,返回原图
Args:
image(numpy.ndarray): 输入图像
label(numpy.ndarray): 标签图,默认None
sclae(float): 图片resize的比例,非负,默认1.0
Returns:
resize后的图像和标签图
"""
if scale == 1.0:
return image, label
height = image.shape[0]
width = image.shape[1]
new_height = int(height * scale + 0.5)
new_width = int(width * scale + 0.5)
new_image = cv2.resize(
image, (new_width, new_height), interpolation=cv2.INTER_LINEAR)
    new_label = label
    if label is not None:
        height = label.shape[0]
        width = label.shape[1]
        new_height = int(height * scale + 0.5)
        new_width = int(width * scale + 0.5)
        new_label = cv2.resize(
            label, (new_width, new_height), interpolation=cv2.INTER_NEAREST)
    return new_image, new_label
def random_rotation(crop_img, crop_seg, rich_crop_max_rotation, mean_value):
"""
随机旋转图像和标签图
Args:
crop_img(numpy.ndarray): 输入图像
crop_seg(numpy.ndarray): 标签图
rich_crop_max_rotation(int):旋转最大角度,0-90
mean_value(list):均值, 对图片旋转产生的多余区域使用均值填充
Returns:
旋转后的图像和标签图
"""
ignore_index = cfg.DATASET.IGNORE_INDEX
if rich_crop_max_rotation > 0:
(h, w) = crop_img.shape[:2]
do_rotation = np.random.uniform(-rich_crop_max_rotation,
rich_crop_max_rotation)
pc = (w // 2, h // 2)
r = cv2.getRotationMatrix2D(pc, do_rotation, 1.0)
cos = np.abs(r[0, 0])
sin = np.abs(r[0, 1])
nw = int((h * sin) + (w * cos))
nh = int((h * cos) + (w * sin))
(cx, cy) = pc
r[0, 2] += (nw / 2) - cx
r[1, 2] += (nh / 2) - cy
dsize = (nw, nh)
crop_img = cv2.warpAffine(
crop_img,
r,
dsize=dsize,
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_CONSTANT,
borderValue=mean_value)
crop_seg = cv2.warpAffine(
crop_seg,
r,
dsize=dsize,
flags=cv2.INTER_NEAREST,
borderMode=cv2.BORDER_CONSTANT,
borderValue=(ignore_index, ignore_index, ignore_index))
return crop_img, crop_seg
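def _demo_rotation_canvas():
    # Illustrative sketch, not called by the library: a 200x100 (w x h) image
    # rotated by 30 degrees lands on an enlarged canvas of
    # int(h*sin + w*cos) x int(h*cos + w*sin), matching nw/nh in random_rotation.
    import math
    w, h, angle = 200, 100, 30
    rad = math.radians(angle)
    nw = int(h * abs(math.sin(rad)) + w * abs(math.cos(rad)))
    nh = int(h * abs(math.cos(rad)) + w * abs(math.sin(rad)))
    print(nw, nh)  # 223 186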
def rand_scale_aspect(crop_img,
crop_seg,
rich_crop_min_scale=0,
rich_crop_aspect_ratio=0):
"""
从输入图像和标签图像中裁取随机宽高比的图像,并reszie回原始尺寸
Args:
crop_img(numpy.ndarray): 输入图像
crop_seg(numpy.ndarray): 标签图像
rich_crop_min_scale(float):裁取图像占原始图像的面积比,0-1,默认0返回原图
rich_crop_aspect_ratio(float): 裁取图像的宽高比范围,非负,默认0返回原图
Returns:
裁剪并resize回原始尺寸的图像和标签图像
"""
if rich_crop_min_scale == 0 or rich_crop_aspect_ratio == 0:
return crop_img, crop_seg
else:
img_height = crop_img.shape[0]
img_width = crop_img.shape[1]
for i in range(0, 10):
area = img_height * img_width
target_area = area * np.random.uniform(rich_crop_min_scale, 1.0)
aspectRatio = np.random.uniform(rich_crop_aspect_ratio,
1.0 / rich_crop_aspect_ratio)
dw = int(np.sqrt(target_area * 1.0 * aspectRatio))
dh = int(np.sqrt(target_area * 1.0 / aspectRatio))
if (np.random.randint(10) < 5):
tmp = dw
dw = dh
dh = tmp
if (dh < img_height and dw < img_width):
h1 = np.random.randint(0, img_height - dh)
w1 = np.random.randint(0, img_width - dw)
crop_img = crop_img[h1:(h1 + dh), w1:(w1 + dw), :]
crop_seg = crop_seg[h1:(h1 + dh), w1:(w1 + dw)]
crop_img = cv2.resize(
crop_img, (img_width, img_height),
interpolation=cv2.INTER_LINEAR)
crop_seg = cv2.resize(
crop_seg, (img_width, img_height),
interpolation=cv2.INTER_NEAREST)
break
return crop_img, crop_seg
def saturation_jitter(cv_img, jitter_range):
"""
调节图像饱和度
Args:
cv_img(numpy.ndarray): 输入图像
jitter_range(float): 调节程度,0-1
Returns:
饱和度调整后的图像
"""
greyMat = cv2.cvtColor(cv_img, cv2.COLOR_BGR2GRAY)
greyMat = greyMat[:, :, None] * np.ones(3, dtype=int)[None, None, :]
cv_img = cv_img.astype(np.float32)
cv_img = cv_img * (1 - jitter_range) + jitter_range * greyMat
cv_img = np.where(cv_img > 255, 255, cv_img)
cv_img = cv_img.astype(np.uint8)
return cv_img
def brightness_jitter(cv_img, jitter_range):
"""
调节图像亮度
Args:
cv_img(numpy.ndarray): 输入图像
jitter_range(float): 调节程度,0-1
Returns:
亮度调整后的图像
"""
cv_img = cv_img.astype(np.float32)
cv_img = cv_img * (1.0 - jitter_range)
cv_img = np.where(cv_img > 255, 255, cv_img)
cv_img = cv_img.astype(np.uint8)
return cv_img
def contrast_jitter(cv_img, jitter_range):
"""
调节图像对比度
Args:
cv_img(numpy.ndarray): 输入图像
jitter_range(float): 调节程度,0-1
Returns:
对比度调整后的图像
"""
greyMat = cv2.cvtColor(cv_img, cv2.COLOR_BGR2GRAY)
mean = np.mean(greyMat)
cv_img = cv_img.astype(np.float32)
cv_img = cv_img * (1 - jitter_range) + jitter_range * mean
cv_img = np.where(cv_img > 255, 255, cv_img)
cv_img = cv_img.astype(np.uint8)
return cv_img
def random_jitter(cv_img, saturation_range, brightness_range, contrast_range):
"""
图像亮度、饱和度、对比度调节,在调整范围内随机获得调节比例,并随机顺序叠加三种效果
Args:
cv_img(numpy.ndarray): 输入图像
saturation_range(float): 饱和对调节范围,0-1
brightness_range(float): 亮度调节范围,0-1
contrast_range(float): 对比度调节范围,0-1
Returns:
亮度、饱和度、对比度调整后图像
"""
saturation_ratio = np.random.uniform(-saturation_range, saturation_range)
brightness_ratio = np.random.uniform(-brightness_range, brightness_range)
contrast_ratio = np.random.uniform(-contrast_range, contrast_range)
    order = [0, 1, 2]  # indices matching the saturation/brightness/contrast checks below
np.random.shuffle(order)
for i in range(3):
if order[i] == 0:
cv_img = saturation_jitter(cv_img, saturation_ratio)
if order[i] == 1:
cv_img = brightness_jitter(cv_img, brightness_ratio)
if order[i] == 2:
cv_img = contrast_jitter(cv_img, contrast_ratio)
return cv_img
def hsv_color_jitter(crop_img,
brightness_jitter_ratio=0,
saturation_jitter_ratio=0,
contrast_jitter_ratio=0):
"""
图像亮度、饱和度、对比度调节
Args:
crop_img(numpy.ndarray): 输入图像
brightness_jitter_ratio(float): 亮度调节度最大值,1-0,默认0
saturation_jitter_ratio(float): 饱和度调节度最大值,1-0,默认0
contrast_jitter_ratio(float): 对比度调节度最大值,1-0,默认0
Returns:
亮度、饱和度、对比度调节后图像
"""
if brightness_jitter_ratio > 0 or \
saturation_jitter_ratio > 0 or \
contrast_jitter_ratio > 0:
        crop_img = random_jitter(crop_img, saturation_jitter_ratio,
                                 brightness_jitter_ratio, contrast_jitter_ratio)
return crop_img
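def _demo_color_jitter():
    # Illustrative sketch, not called by the library: apply the three jitters
    # with the ratios from the sample config (0.5 each) to a random BGR image.
    img = np.random.randint(0, 256, size=(64, 64, 3)).astype(np.uint8)
    out = hsv_color_jitter(
        img,
        brightness_jitter_ratio=0.5,
        saturation_jitter_ratio=0.5,
        contrast_jitter_ratio=0.5)
    print(out.shape, out.dtype)  # (64, 64, 3) uint8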
def rand_crop(crop_img, crop_seg, mode=ModelPhase.TRAIN):
"""
随机裁剪图片和标签图, 若crop尺寸大于原始尺寸,分别使用均值和ignore值填充再进行crop,
crop尺寸与原始尺寸一致,返回原图,crop尺寸小于原始尺寸直接crop
Args:
crop_img(numpy.ndarray): 输入图像
crop_seg(numpy.ndarray): 标签图
mode(string): 模式, 默认训练模式,验证或预测模式时crop尺寸需大于原始图片尺寸, 其他模式无限制
Returns:
裁剪后的图片和标签图
"""
img_height = crop_img.shape[0]
img_width = crop_img.shape[1]
if ModelPhase.is_train(mode):
crop_width = cfg.TRAIN_CROP_SIZE[0]
crop_height = cfg.TRAIN_CROP_SIZE[1]
else:
crop_width = cfg.EVAL_CROP_SIZE[0]
crop_height = cfg.EVAL_CROP_SIZE[1]
if ModelPhase.is_eval(mode) or ModelPhase.is_predict(mode):
        if (crop_height < img_height or crop_width < img_width):
            raise Exception(
                "Crop size ({},{}) must be no smaller than image size ({},{}) in eval/predict phase."
                .format(crop_width, crop_height, img_width, img_height))
if img_height == crop_height and img_width == crop_width:
return crop_img, crop_seg
else:
pad_height = max(crop_height - img_height, 0)
pad_width = max(crop_width - img_width, 0)
if (pad_height > 0 or pad_width > 0):
crop_img = cv2.copyMakeBorder(
crop_img,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=cfg.MEAN)
if crop_seg is not None:
crop_seg = cv2.copyMakeBorder(
crop_seg,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=cfg.DATASET.IGNORE_INDEX)
img_height = crop_img.shape[0]
img_width = crop_img.shape[1]
if crop_height > 0 and crop_width > 0:
h_off = np.random.randint(img_height - crop_height + 1)
w_off = np.random.randint(img_width - crop_width + 1)
crop_img = crop_img[h_off:(crop_height + h_off), w_off:(
w_off + crop_width), :]
if crop_seg is not None:
crop_seg = crop_seg[h_off:(crop_height + h_off), w_off:(
w_off + crop_width)]
return crop_img, crop_seg
"""
This code is based on https://github.com/fchollet/keras/blob/master/keras/utils/data_utils.py
"""
import time
import numpy as np
import threading
import multiprocessing
try:
import queue
except ImportError:
import Queue as queue
class GeneratorEnqueuer(object):
"""
Multiple generators
Args:
generators:
wait_time (float): time to sleep in-between calls to `put()`.
"""
def __init__(self, generators, wait_time=0.05):
self.wait_time = wait_time
self._generators = generators
self._threads = []
self._stop_events = []
self.queue = None
self._manager = None
self.workers = 1
def start(self, workers=1, max_queue_size=16):
"""
Start worker threads which add data from the generator into the queue.
Args:
workers (int): number of worker threads
max_queue_size (int): queue size
(when full, threads could block on `put()`)
"""
self.workers = workers
def data_generator_task(pid):
"""
Data generator task.
"""
def task(pid):
if (self.queue is not None
and self.queue.qsize() < max_queue_size):
generator_output = next(self._generators[pid])
self.queue.put((generator_output))
else:
time.sleep(self.wait_time)
while not self._stop_events[pid].is_set():
try:
task(pid)
except Exception:
self._stop_events[pid].set()
break
try:
self._manager = multiprocessing.Manager()
self.queue = self._manager.Queue(maxsize=max_queue_size)
for pid in range(self.workers):
self._stop_events.append(multiprocessing.Event())
thread = multiprocessing.Process(
target=data_generator_task, args=(pid, ))
thread.daemon = True
self._threads.append(thread)
thread.start()
except:
self.stop()
raise
def is_running(self):
"""
Returns:
bool: Whether the worker theads are running.
"""
# If queue is not empty then still in runing state wait for consumer
if not self.queue.empty():
return True
for pid in range(self.workers):
if not self._stop_events[pid].is_set():
return True
return False
def stop(self, timeout=None):
"""
Stops running threads and wait for them to exit, if necessary.
Should be called by the same thread which called `start()`.
Args:
timeout(int|None): maximum time to wait on `thread.join()`.
"""
if self.is_running():
for pid in range(self.workers):
self._stop_events[pid].set()
for thread in self._threads:
if thread.is_alive():
thread.join(timeout)
if self._manager:
self._manager.shutdown()
self._threads = []
self._stop_events = []
self.queue = None
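# Illustrative usage sketch, an assumption about typical usage rather than part
# of the original file. It relies on a fork-based multiprocessing start method
# (Linux) so the worker processes inherit the Python generators.
if __name__ == '__main__':
    def _make_gen(offset):
        def _gen():
            i = 0
            while True:
                yield offset + i
                i += 1
        return _gen()

    gens = [_make_gen(100 * k) for k in range(2)]
    enqueuer = GeneratorEnqueuer(gens, wait_time=0.01)
    enqueuer.start(workers=2, max_queue_size=8)
    for _ in range(4):
        while enqueuer.queue.empty():
            time.sleep(0.01)
        print(enqueuer.queue.get())
    enqueuer.stop()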
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from models.model_builder import build_model
from models.model_builder import ModelPhase
from reader import SegDataset
from metrics import ConfusionMatrix
def parse_args():
    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = SegDataset(
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
        # TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, pred, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL)
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("Device count = {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if ckpt_dir is not None:
print('load test model:', ckpt_dir)
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
num_images = 0
step = 0
all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
timer = Timer()
timer.start()
py_reader.start()
while True:
try:
step += 1
loss, pred, grts, masks = exe.run(
test_prog, fetch_list=fetch_list, return_numpy=True)
loss = np.mean(np.array(loss))
num_images += pred.shape[0]
conf_mat.calculate(pred, grts, masks)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
speed = 1.0 / timer.elapsed_time()
print(
"[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
.format(step, loss, acc, iou, speed,
calculate_eta(all_step - step, speed)))
timer.restart()
sys.stdout.flush()
except fluid.core.EOFException:
break
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
num_images, avg_acc, avg_iou))
print("[EVAL]Category IoU:", category_iou)
print("[EVAL]Category Acc:", category_acc)
print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
return category_iou, avg_iou, category_acc, avg_acc
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import time
import pprint
import cv2
import argparse
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from models.model_builder import build_model
from models.model_builder import ModelPhase
def parse_args():
parser = argparse.ArgumentParser(
description='PaddleSeg Inference Model Exporter')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def export_inference_model(args):
"""
    Export PaddlePaddle inference model for prediction deployment and serving.
"""
print("Exporting inference model...")
startup_prog = fluid.Program()
infer_prog = fluid.Program()
image, logit_out = build_model(
infer_prog, startup_prog, phase=ModelPhase.PREDICT)
# Use CPU for exporting inference model instead of GPU
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)
infer_prog = infer_prog.clone(for_test=True)
if os.path.exists(cfg.TEST.TEST_MODEL):
fluid.io.load_params(exe, cfg.TEST.TEST_MODEL, main_program=infer_prog)
else:
print("TEST.TEST_MODEL diretory is empty!")
exit(-1)
fluid.io.save_inference_model(
cfg.FREEZE.SAVE_DIR,
feeded_var_names=[image.name],
target_vars=[logit_out],
executor=exe,
main_program=infer_prog,
model_filename=cfg.FREEZE.MODEL_FILENAME,
params_filename=cfg.FREEZE.PARAMS_FILENAME)
print("Inference model exported!")
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
export_inference_model(args)
if __name__ == '__main__':
main()
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from utils.config import cfg
def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2):
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
loss, probs = fluid.layers.softmax_with_cross_entropy(
logit,
label,
ignore_index=cfg.DATASET.IGNORE_INDEX,
return_softmax=True)
loss = loss * ignore_mask
if cfg.MODEL.FP16:
loss = fluid.layers.cast(loss, 'float32')
avg_loss = fluid.layers.mean(loss) / fluid.layers.mean(ignore_mask)
avg_loss = fluid.layers.cast(avg_loss, 'float16')
else:
avg_loss = fluid.layers.mean(loss) / fluid.layers.mean(ignore_mask)
if cfg.MODEL.SCALE_LOSS > 1.0:
avg_loss = avg_loss * cfg.MODEL.SCALE_LOSS
label.stop_gradient = True
ignore_mask.stop_gradient = True
return avg_loss
def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2):
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = softmax_with_loss(logit, logit_label, logit_mask,
num_classes)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes)
return avg_loss
# TODO: decide how to apply the ignore index and ignore mask here
def dice_loss(logit, label, ignore_mask=None, num_classes=2):
if num_classes != 2:
raise Exception("dice loss is only applicable to binary classfication")
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
logit = fluid.layers.softmax(logit)
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
loss = fluid.layers.dice_loss(logit, label)
return loss
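def _demo_masked_mean():
    # Illustrative numpy sketch, not part of the graph code above: the masked
    # mean used in softmax_with_loss is mean(loss * mask) / mean(mask), i.e. the
    # average loss over the non-ignored pixels only.
    loss = np.array([2.0, 4.0, 6.0, 100.0])
    mask = np.array([1.0, 1.0, 1.0, 0.0])  # the last pixel carries IGNORE_INDEX
    print((loss * mask).mean() / mask.mean())  # 4.0 == (2 + 4 + 6) / 3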
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
from scipy.sparse import csr_matrix
class ConfusionMatrix(object):
"""
Confusion Matrix for segmentation evaluation
"""
def __init__(self, num_classes=2, streaming=False):
self.confusion_matrix = np.zeros([num_classes, num_classes],
dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
def calculate(self, pred, label, ignore=None):
        # If not in streaming mode, clear the matrix every time `calculate` is called
if not self.streaming:
self.zero_matrix()
label = np.transpose(label, (0, 2, 3, 1))
ignore = np.transpose(ignore, (0, 2, 3, 1))
mask = np.array(ignore) == 1
label = np.asarray(label)[mask]
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
        # Accumulate ([row=label, col=pred], 1) into a sparse matrix
spm = csr_matrix((one, (label, pred)),
shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
dtype='int64')
def mean_iou(self):
iou_list = []
avg_iou = 0
        # TODO: use the numpy sum axis API to simplify this
vji = np.zeros(self.num_classes, dtype=int)
vij = np.zeros(self.num_classes, dtype=int)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
for c in range(self.num_classes):
total = vji[c] + vij[c] - self.confusion_matrix[c][c]
if total == 0:
iou = 0
else:
iou = float(self.confusion_matrix[c][c]) / total
avg_iou += iou
iou_list.append(iou)
avg_iou = float(avg_iou) / float(self.num_classes)
return np.array(iou_list), avg_iou
def accuracy(self):
total = self.confusion_matrix.sum()
total_right = 0
for c in range(self.num_classes):
total_right += self.confusion_matrix[c][c]
if total == 0:
avg_acc = 0
else:
avg_acc = float(total_right) / total
vij = np.zeros(self.num_classes, dtype=int)
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
acc_list = []
for c in range(self.num_classes):
if vij[c] == 0:
acc = 0
else:
acc = self.confusion_matrix[c][c] / float(vij[c])
acc_list.append(acc)
return np.array(acc_list), avg_acc
def kappa(self):
vji = np.zeros(self.num_classes)
vij = np.zeros(self.num_classes)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
total = self.confusion_matrix.sum()
        # scale counts down to avoid overflow
# TODO: is it reasonable to hard code 10000.0?
total = float(total) / 10000.0
vji = vji / 10000.0
vij = vij / 10000.0
tp = 0
tc = 0
for c in range(self.num_classes):
tp += vji[c] * vij[c]
tc += self.confusion_matrix[c][c]
tc = tc / 10000.0
pe = tp / (total * total)
po = tc / total
kappa = (po - pe) / (1 - pe)
return kappa
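def _demo_confusion_matrix():
    # Illustrative sketch, not called by the library: 2-class evaluation on a
    # single 2x2 prediction. Shapes follow the evaluator: pred is (N, H, W, 1)
    # while label/ignore are (N, 1, H, W) and get transposed inside calculate().
    cm = ConfusionMatrix(num_classes=2, streaming=True)
    pred = np.array([0, 1, 1, 0]).reshape(1, 2, 2, 1)
    label = np.array([0, 1, 0, 0]).reshape(1, 1, 2, 2)
    ignore = np.ones((1, 1, 2, 2), dtype='int64')  # 1 means the pixel is kept
    cm.calculate(pred, label, ignore)
    print(cm.mean_iou())  # per-class IoUs 2/3 and 1/2
    print(cm.accuracy())  # overall accuracy 0.75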
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import models.modeling
import models.libs
import models.backbone
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
from utils.config import cfg
__all__ = [
'MobileNetV2', 'MobileNetV2_x0_25', 'MobileNetV2_x0_5', 'MobileNetV2_x1_0',
'MobileNetV2_x1_5', 'MobileNetV2_x2_0', 'MobileNetV2_scale'
]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class MobileNetV2():
def __init__(self, scale=1.0, change_depth=False, output_stride=None):
self.params = train_parameters
self.scale = scale
self.change_depth = change_depth
self.bottleneck_params_list = [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 3, 2),
(6, 64, 4, 2),
(6, 96, 3, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
] if change_depth == False else [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 5, 2),
(6, 64, 7, 2),
(6, 96, 5, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
]
self.modify_bottle_params(output_stride)
def modify_bottle_params(self, output_stride=None):
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must to be even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s)
def net(self, input, class_dim=1000, end_points=None, decode_points=None):
scale = self.scale
change_depth = self.change_depth
#if change_depth is True, the new depth is 1.4 times as deep as before.
bottleneck_params_list = self.bottleneck_params_list
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
#conv1
input = self.conv_bn_layer(
input,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
if_act=True,
name='conv1_1')
layer_count = 1
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = input
if check_points(layer_count, end_points):
return input, decode_ends
# bottleneck sequences
i = 1
in_c = int(32 * scale)
for layer_setting in bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
input, depthwise_output = self.invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name='conv' + str(i))
in_c = int(c * scale)
layer_count += n
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, end_points):
return input, decode_ends
#last_conv
input = self.conv_bn_layer(
input=input,
num_filters=int(1280 * scale) if scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding=0,
if_act=True,
name='conv9')
input = fluid.layers.pool2d(
input=input,
pool_size=7,
pool_stride=1,
pool_type='avg',
global_pooling=True)
output = fluid.layers.fc(
input=input,
size=class_dim,
param_attr=ParamAttr(name='fc10_weights'),
bias_attr=ParamAttr(name='fc10_offset'))
return output
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def shortcut(self, input, data_residual):
return fluid.layers.elementwise_add(input, data_residual)
def inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
padding,
expansion_factor,
name=None):
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = self.conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name=name + '_expand')
bottleneck_conv = self.conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
if_act=True,
name=name + '_dwise',
use_cudnn=True if cfg.MODEL.FP16 else False)
depthwise_output = bottleneck_conv
linear_out = self.conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=False,
name=name + '_linear')
if ifshortcut:
out = self.shortcut(input=input, data_residual=linear_out)
return out, depthwise_output
else:
return linear_out, depthwise_output
def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
first_block, depthwise_output = self.inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self.inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
def MobileNetV2_x0_25():
model = MobileNetV2(scale=0.25)
return model
def MobileNetV2_x0_5():
model = MobileNetV2(scale=0.5)
return model
def MobileNetV2_x1_0():
model = MobileNetV2(scale=1.0)
return model
def MobileNetV2_x1_5():
model = MobileNetV2(scale=1.5)
return model
def MobileNetV2_x2_0():
model = MobileNetV2(scale=2.0)
return model
def MobileNetV2_scale():
model = MobileNetV2(scale=1.2, change_depth=True)
return model
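def _demo_output_stride():
    # Illustrative sketch, not called by the library: with output_stride=16 the
    # later stride-2 bottleneck stages are forced to stride 1, so the backbone
    # downsamples 16x instead of 32x (see modify_bottle_params above).
    model = MobileNetV2(scale=1.0, output_stride=16)
    for t, c, n, s in model.bottleneck_params_list:
        print('expansion=%d channels=%d repeats=%d stride=%d' % (t, c, n, s))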
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = MobileNetV2_x1_0()
logit, decode_ends = model.net(image)
#print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
__all__ = [
"ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"
]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet():
def __init__(self, layers=50, scale=1.0, stem=None):
self.params = train_parameters
self.layers = layers
self.scale = scale
self.stem = stem
def net(self,
input,
class_dim=1000,
end_points=None,
decode_points=None,
resize_points=None,
dilation_dict=None):
layers = self.layers
supported_layers = [18, 34, 50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
def get_dilated_rate(dilation_dict, idx):
if dilation_dict is None or idx not in dilation_dict:
return 1
else:
return dilation_dict[idx]
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
if self.stem == 'icnet':
conv = self.conv_bn_layer(
input=input,
num_filters=int(64 * self.scale),
filter_size=3,
stride=2,
act='relu',
name="conv1_1")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(64 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_2")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(128 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_3")
else:
conv = self.conv_bn_layer(
input=input,
num_filters=int(64 * self.scale),
filter_size=7,
stride=2,
act='relu',
name="conv1")
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
layer_count = 1
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if layers >= 50:
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "conv" + str(block + 2) + '_' + str(1 + i)
dilation_rate = get_dilated_rate(dilation_dict, block)
conv = self.bottleneck_block(
input=conv,
num_filters=int(num_filters[block] * self.scale),
stride=2
if i == 0 and block != 0 and dilation_rate == 1 else 1,
name=conv_name,
dilation=dilation_rate)
layer_count += 3
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if check_points(layer_count, resize_points):
conv = self.interp(
conv,
np.ceil(
np.array(conv.shape[2:]).astype('int32') / 2))
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
else:
for block in range(len(depth)):
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.basic_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
is_first=block == i == 0,
name=conv_name)
layer_count += 2
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
def zero_padding(self, input, padding):
return fluid.layers.pad(
input, [0, 0, 0, 0, padding, padding, padding, padding])
def interp(self, input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
dilation=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "/weights"),
bias_attr=False,
name=name + '.conv2d.output.1')
bn_name = name + '/BatchNorm/'
return fluid.layers.batch_norm(
input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=ParamAttr(name=bn_name + 'gamma'),
bias_attr=ParamAttr(bn_name + 'beta'),
moving_mean_name=bn_name + 'moving_mean',
moving_variance_name=bn_name + 'moving_variance',
)
def shortcut(self, input, ch_out, stride, is_first, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1 or is_first == True:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, dilation=1):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
dilation=1,
stride=stride,
act='relu',
name=name + "_branch2a")
if dilation > 1:
conv0 = self.zero_padding(conv0, dilation)
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
dilation=dilation,
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
dilation=1,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input,
num_filters * 4,
stride,
is_first=False,
name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short, y=conv2, act='relu', name=name + ".add.output.5")
def basic_block(self, input, num_filters, stride, is_first, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self.shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def ResNet18():
model = ResNet(layers=18)
return model
def ResNet34():
model = ResNet(layers=34)
return model
def ResNet50():
model = ResNet(layers=50)
return model
def ResNet101():
model = ResNet(layers=101)
return model
def ResNet152():
model = ResNet(layers=152)
return model
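# Illustrative sketch mirroring the __main__ demos in the sibling backbone
# files; an assumption about usage rather than part of the original file.
if __name__ == '__main__':
    image_shape = [3, 224, 224]
    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    model = ResNet50()
    logit = model.net(image)
    #print("logit:", logit.shape)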
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import math
import paddle.fluid as fluid
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv
from models.libs.model_libs import separate_conv
__all__ = ['xception_65', 'xception_41', 'xception_71']
def check_data(data, number):
if type(data) == int:
return [data] * number
assert len(data) == number
return data
def check_stride(s, os):
if s <= os:
return True
else:
return False
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
class Xception():
def __init__(self, backbone="xception_65"):
self.bottleneck_params = self.gen_bottleneck_params(backbone)
self.backbone = backbone
def gen_bottleneck_params(self, backbone='xception_65'):
if backbone == 'xception_65':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_41':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (8, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_71':
bottleneck_params = {
"entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
else:
raise Exception(
"xception backbont only support xception_41/xception_65/xception_71"
)
return bottleneck_params
def net(self,
input,
output_stride=32,
num_classes=1000,
end_points=None,
decode_points=None):
self.stride = 2
self.block_point = 0
self.output_stride = output_stride
self.decode_points = decode_points
self.short_cuts = dict()
with scope(self.backbone):
# Entry flow
data = self.entry_flow(input)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Middle flow
data = self.middle_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Exit flow
data = self.exit_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
data = fluid.layers.dropout(data, 0.5)
stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
with scope("logit"):
out = fluid.layers.fc(
input=data,
size=num_classes,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
name='weights',
initializer=fluid.initializer.Uniform(-stdv, stdv)),
bias_attr=fluid.param_attr.ParamAttr(name='bias'))
return out
def entry_flow(self, data):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
with scope("entry_flow"):
with scope("conv1"):
data = bn_relu(
conv(
data, 32, 3, stride=2, padding=1,
param_attr=param_attr))
with scope("conv2"):
data = bn_relu(
conv(
data, 64, 3, stride=1, padding=1,
param_attr=param_attr))
# get entry flow params
block_num = self.bottleneck_params["entry_flow"][0]
strides = self.bottleneck_params["entry_flow"][1]
chns = self.bottleneck_params["entry_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
#print("entry:", block_num, strides, chns)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
#print("entry:", s, block_point, output_stride)
with scope("entry_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("entry:", i, data.shape)
self.stride = s
self.block_point = block_point
#print("entry:", s, block_point, output_stride)
return data
def middle_flow(self, data):
block_num = self.bottleneck_params["middle_flow"][0]
strides = self.bottleneck_params["middle_flow"][1]
chns = self.bottleneck_params["middle_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
#print("middle:", block_num, strides, chns)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
#print("middle:", s, block_point, output_stride)
with scope("middle_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, strides[i]], skip_conv=False)
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("middle:", i, data.shape)
self.stride = s
self.block_point = block_point
#print("middle:", s, block_point, output_stride)
return data
def exit_flow(self, data):
block_num = self.bottleneck_params["exit_flow"][0]
strides = self.bottleneck_params["exit_flow"][1]
chns = self.bottleneck_params["exit_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
#print("exit:", block_num, strides, chns)
assert (block_num == 2)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
#print("exit:", s, block_point, output_stride)
with scope("exit_flow"):
with scope('block1'):
block_point += 1
stride = strides[0] if check_stride(s * strides[0],
output_stride) else 1
data, short_cuts = self.xception_block(data, chns[0],
[1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("exit:", 0, data.shape)
with scope('block2'):
block_point += 1
stride = strides[1] if check_stride(s * strides[1],
output_stride) else 1
data, short_cuts = self.xception_block(
data,
chns[1], [1, 1, stride],
dilation=2,
has_skip=False,
activation_fn_in_separable_conv=True)
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("exit:", 1, data.shape)
self.stride = s
self.block_point = block_point
#print("exit:", s, block_point, output_stride)
return data
def xception_block(self,
input,
channels,
strides=1,
filters=3,
dilation=1,
skip_conv=True,
has_skip=True,
activation_fn_in_separable_conv=False):
repeat_number = 3
channels = check_data(channels, repeat_number)
filters = check_data(filters, repeat_number)
strides = check_data(strides, repeat_number)
data = input
results = []
for i in range(repeat_number):
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation)
else:
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
act=relu)
results.append(data)
if not has_skip:
return data, results
if skip_conv:
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.09))
with scope('shortcut'):
skip = bn(
conv(
input,
channels[-1],
1,
strides[-1],
groups=1,
padding=0,
param_attr=param_attr))
else:
skip = input
return data + skip, results
def xception_65():
model = Xception("xception_65")
return model
def xception_41():
model = Xception("xception_41")
return model
def xception_71():
model = Xception("xception_71")
return model
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = xception_65()
logit = model.net(image)
#print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
from utils.config import cfg
import contextlib
bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
name_scope = ""
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
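def _demo_scope():
    # Illustrative sketch, not called by the library: nested scope() calls build
    # slash-separated prefixes that conv()/bn() below prepend to parameter names.
    with scope('xception_65'):
        with scope('entry_flow'):
            with scope('conv1'):
                print(name_scope)  # -> 'xception_65/entry_flow/conv1/'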
def max_pool(input, kernel, stride, padding):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='max',
pool_stride=stride,
pool_padding=padding)
return data
def avg_pool(input, kernel, stride, padding=0):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='avg',
pool_stride=stride,
pool_padding=padding)
return data
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape
if C % G != 0:
# print "group can not divide channle:", C, G
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
# print "use group size:", G
break
assert C % G == 0
x = fluid.layers.group_norm(
input,
groups=G,
param_attr=param_attr,
bias_attr=bias_attr,
name=name_scope + 'group_norm')
return x
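def _demo_group_fallback():
    # Illustrative sketch, not called by the library: the fallback in group_norm
    # above searches outwards from the requested group number G for the nearest
    # divisor of the channel count C, e.g. C=30 with G=32 falls back to 30 groups.
    def nearest_divisor(C, G):
        for d in range(10):
            for t in (d, -d):
                if G + t > 0 and C % (G + t) == 0:
                    return G + t
        return G
    print(nearest_divisor(30, 32))  # -> 30
    print(nearest_divisor(64, 32))  # -> 32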
def bn(*args, **kargs):
if cfg.MODEL.DEFAULT_NORM_TYPE == 'bn':
with scope('BatchNorm'):
return fluid.layers.batch_norm(
*args,
epsilon=cfg.MODEL.DEFAULT_EPSILON,
momentum=cfg.MODEL.BN_MOMENTUM,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
elif cfg.MODEL.DEFAULT_NORM_TYPE == 'gn':
with scope('GroupNorm'):
return group_norm(
args[0],
cfg.MODEL.DEFAULT_GROUP_NUMBER,
eps=cfg.MODEL.DEFAULT_EPSILON,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer))
else:
raise Exception("Unsupport norm type:" + cfg.MODEL.DEFAULT_NORM_TYPE)
def bn_relu(data):
return fluid.layers.relu(bn(data))
def relu(data):
return fluid.layers.relu(data)
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = fluid.ParamAttr(
name=name_scope + 'biases',
regularizer=None,
initializer=fluid.initializer.ConstantInitializer(value=0.0))
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d(*args, **kargs)
def deconv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d_transpose(*args, **kargs)
def separate_conv(input, channel, stride, filter, dilation=1, act=None):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter // 2) * dilation,
dilation=dilation,
use_cudnn=True if cfg.MODEL.FP16 else False,
param_attr=param_attr)
input = bn(input)
if act: input = act(input)
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
input = bn(input)
if act: input = act(input)
return input
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import struct
import importlib
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
import solver
from utils.config import cfg
from loss import multi_softmax_with_loss
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
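def _demo_phase_checks():
    # Illustrative sketch, not called by the library: the phase helpers are
    # plain string comparisons against the constants above.
    assert ModelPhase.is_train(ModelPhase.TRAIN)
    assert ModelPhase.is_valid_phase('eval')
    assert not ModelPhase.is_valid_phase('export')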
def map_model_name(model_name):
name_dict = {
"unet": "unet.unet",
"deeplabv3p": "deeplab.deeplabv3p",
"icnet": "icnet.icnet",
}
if model_name in name_dict.keys():
return name_dict[model_name]
else:
raise Exception(
"unknow model name, only support unet, deeplabv3p, icnet")
def get_func(func_name):
"""Helper to return a function object by name. func_name must identify a
function in this module or the path to a function relative to the base
'modeling' module.
"""
if func_name == '':
return None
try:
parts = func_name.split('.')
# Refers to a function in this module
if len(parts) == 1:
return globals()[parts[0]]
# Otherwise, assume we're referencing a module under modeling
module_name = 'models.' + '.'.join(parts[:-1])
module = importlib.import_module(module_name)
return getattr(module, parts[-1])
except Exception:
print('Failed to find function: {}'.format(func_name))
        raise
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.TRAIN_CROP_SIZE[0]
height = cfg.TRAIN_CROP_SIZE[1]
else:
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
with fluid.program_guard(main_prog, start_prog):
with fluid.unique_name.guard():
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(
name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data(
name='mask', shape=grt_shape, dtype='int32')
            # use PyReader when doing training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=False,
use_double_buffer=True)
if cfg.MODEL.FP16:
image = fluid.layers.cast(image, "float16")
model_name = map_model_name(cfg.MODEL.MODEL_NAME)
model_func = get_func("modeling." + model_name)
logits = model_func(image, class_num)
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
avg_loss = multi_softmax_with_loss(logits, label, mask,
class_num)
            # get the prediction at the original input size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
logit = softmax(logit)
return image, logit
out = fluid.layers.transpose(x=logit, perm=[0, 2, 3, 1])
if cfg.MODEL.FP16:
out = fluid.layers.cast(out, 'float32')
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
return py_reader, avg_loss, pred, label, mask
if ModelPhase.is_train(phase):
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
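# Illustrative sketch (not part of the original file): parse_shape_from_file reads
# the tensor shape stored in a persistable-variable file saved by fluid; train.py
# later uses it to skip pretrained weights whose shapes do not match the current
# network. The directory and variable name below are hypothetical.
def _shape_matches_example(pretrained_dir, var_name, expected_shape):
    import os
    var_file = os.path.join(pretrained_dir, var_name)
    if not os.path.exists(var_file):
        return False
    return parse_shape_from_file(var_file) == tuple(expected_shape)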
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv
from models.libs.model_libs import separate_conv
from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from models.backbone.xception import Xception as xception_backbone
def encoder(input):
    # Encoder: ASPP architecture. Image-level pooling, a 1x1 conv and three parallel
    # dilated convs at different rates are concatenated and fused by a 1x1 conv.
    # ASPP_WITH_SEP_CONV: True by default; use depthwise separable convs, otherwise plain convs
    # OUTPUT_STRIDE: downsampling factor, 8 or 16; determines aspp_ratios
    # aspp_ratios: dilation rates of the ASPP branches
if cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 16:
aspp_ratios = [6, 12, 18]
elif cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 8:
aspp_ratios = [12, 24, 36]
else:
raise Exception("deeplab only support stride 8 or 16")
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('encoder'):
channel = 256
with scope("image_pool"):
if cfg.MODEL.FP16:
image_avg = fluid.layers.reduce_mean(
fluid.layers.cast(input, 'float32'), [2, 3], keep_dim=True)
image_avg = fluid.layers.cast(image_avg, 'float16')
else:
image_avg = fluid.layers.reduce_mean(
input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
if cfg.MODEL.FP16:
image_avg = fluid.layers.cast(image_avg, 'float32')
image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
if cfg.MODEL.FP16:
image_avg = fluid.layers.cast(image_avg, 'float16')
with scope("aspp0"):
aspp0 = bn_relu(
conv(
input,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
with scope("aspp1"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp1 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
else:
aspp1 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[0],
padding=aspp_ratios[0],
param_attr=param_attr))
with scope("aspp2"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp2 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
else:
aspp2 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[1],
padding=aspp_ratios[1],
param_attr=param_attr))
with scope("aspp3"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp3 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
else:
aspp3 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[2],
padding=aspp_ratios[2],
param_attr=param_attr))
with scope("concat"):
data = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3],
axis=1)
data = bn_relu(
conv(
data,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
data = fluid.layers.dropout(data, 0.9)
return data
def decoder(encode_data, decode_shortcut):
    # Decoder configuration
    # encode_data: encoder output
    # decode_shortcut: low-level branch from the backbone, concatenated with the resized encode_data
    # DECODER_USE_SEP_CONV: True by default; use two separable convs after the concat, otherwise plain convs
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('decoder'):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut,
48,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
if cfg.MODEL.FP16:
encode_data = fluid.layers.cast(encode_data, 'float32')
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut.shape[2:])
if cfg.MODEL.FP16:
encode_data = fluid.layers.cast(encode_data, 'float16')
encode_data = fluid.layers.concat([encode_data, decode_shortcut],
axis=1)
if cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV:
with scope("separable_conv1"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
else:
with scope("decoder_conv1"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
with scope("decoder_conv2"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
return encode_data
def mobilenetv2(input):
    # Backbone: MobileNetV2 configuration
    # DEPTH_MULTIPLIER: MobileNetV2 scale, 1.0 by default
    # OUTPUT_STRIDE: downsampling factor
    # end_points: number of MobileNetV2 blocks to build
    # decode_point: block whose output is branched out as the decoder input
scale = cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER
output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
model = mobilenet_backbone(scale=scale, output_stride=output_stride)
end_points = 18
decode_point = 4
data, decode_shortcuts = model.net(
input, end_points=end_points, decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def xception(input):
    # Backbone: Xception configuration; xception_65, xception_41 and xception_71 are available
    # decode_point: block whose output is branched out as the decoder input
    # end_points: number of Xception blocks to build
cfg.MODEL.DEFAULT_EPSILON = 1e-3
model = xception_backbone(cfg.MODEL.DEEPLAB.BACKBONE)
backbone = cfg.MODEL.DEEPLAB.BACKBONE
output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
if '65' in backbone:
decode_point = 2
end_points = 21
if '41' in backbone:
decode_point = 2
end_points = 13
if '71' in backbone:
decode_point = 3
end_points = 23
data, decode_shortcuts = model.net(
input,
output_stride=output_stride,
end_points=end_points,
decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def deeplabv3p(img, num_classes):
    # Backbone: xception or mobilenetv2
if 'xception' in cfg.MODEL.DEEPLAB.BACKBONE:
data, decode_shortcut = xception(img)
elif 'mobilenet' in cfg.MODEL.DEEPLAB.BACKBONE:
data, decode_shortcut = mobilenetv2(img)
else:
raise Exception("deeplab only support xception and mobilenet backbone")
    # Encoder / decoder
cfg.MODEL.DEFAULT_EPSILON = 1e-5
if cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP:
data = encoder(data)
if cfg.MODEL.DEEPLAB.ENABLE_DECODER:
data = decoder(data, decode_shortcut)
    # Set the final conv output channels by the number of classes, then resize to the original image size
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
logit = conv(
data,
num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
if cfg.MODEL.FP16:
logit = fluid.layers.cast(logit, 'float32')
logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
if cfg.MODEL.FP16:
logit = fluid.layers.cast(logit, 'float16')
return logit
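# Smoke-test sketch mirroring the __main__ blocks of the ICNet and U-Net files below
# (not part of the original file): it builds the network symbolically with the default
# DeepLab settings from the config and prints the logit shape.
if __name__ == '__main__':
    image_shape = [3, 320, 320]
    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    logit = deeplabv3p(image, 19)
    print("logit:", logit.shape)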
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope
from models.libs.model_libs import bn, avg_pool, conv
from models.backbone.resnet import ResNet as resnet_backbone
import numpy as np
def interp(input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def pyramis_pooling(input, input_shape):
shape = np.ceil(input_shape / 32).astype("int32")
h, w = shape
pool1 = avg_pool(input, [h, w], [h, w])
pool1_interp = interp(pool1, shape)
pool2 = avg_pool(input, [h // 2, w // 2], [h // 2, w // 2])
pool3 = avg_pool(input, [h // 3, w // 3], [h // 3, w // 3])
pool4 = avg_pool(input, [h // 4, w // 4], [h // 4, w // 4])
    # the official caffe repo uses the following hyperparameters for eval
# pool2 = avg_pool(input, [17, 33], [16, 32])
# pool3 = avg_pool(input, [13, 25], [10, 20])
# pool4 = avg_pool(input, [8, 15], [5, 10])
pool2_interp = interp(pool2, shape)
pool3_interp = interp(pool3, shape)
pool4_interp = interp(pool4, shape)
conv5_3_sum = input + pool4_interp + pool3_interp + pool2_interp + pool1_interp
return conv5_3_sum
def zero_padding(input, padding):
return fluid.layers.pad(input,
[0, 0, 0, 0, padding, padding, padding, padding])
def sub_net_4(input, input_shape):
tmp = pyramis_pooling(input, input_shape)
with scope("conv5_4_k1"):
tmp = conv(tmp, 256, 1, 1)
tmp = bn(tmp, act='relu')
tmp = interp(tmp, out_shape=np.ceil(input_shape / 16))
return tmp
def sub_net_2(input):
with scope("conv3_1_sub2_proj"):
tmp = conv(input, 128, 1, 1)
tmp = bn(tmp)
return tmp
def sub_net_1(input):
with scope("conv1_sub1"):
tmp = conv(input, 32, 3, 2, padding=1)
tmp = bn(tmp, act='relu')
with scope("conv2_sub1"):
tmp = conv(tmp, 32, 3, 2, padding=1)
tmp = bn(tmp, act='relu')
with scope("conv3_sub1"):
tmp = conv(tmp, 64, 3, 2, padding=1)
tmp = bn(tmp, act='relu')
with scope("conv3_sub1_proj"):
tmp = conv(tmp, 128, 1, 1)
tmp = bn(tmp)
return tmp
def CCF24(sub2_out, sub4_out, input_shape):
with scope("conv_sub4"):
tmp = conv(sub4_out, 128, 3, dilation=2, padding=2)
tmp = bn(tmp)
tmp = tmp + sub2_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, np.ceil(input_shape / 8))
return tmp
def CCF124(sub1_out, sub24_out, input_shape):
tmp = zero_padding(sub24_out, padding=2)
with scope("conv_sub2"):
tmp = conv(tmp, 128, 3, dilation=2)
tmp = bn(tmp)
tmp = tmp + sub1_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, input_shape // 4)
return tmp
def resnet(input):
    # ICNet backbone: ResNet, ResNet50 by default
    # end_points: last ResNet layer to build
    # decode_point: layer whose output is branched out from the backbone
    # resize_points: layer at which the feature map is downscaled to 1/2
    # dilation_dict: ResNet stage indices and their dilation rates
scale = cfg.MODEL.ICNET.DEPTH_MULTIPLIER
layers = cfg.MODEL.ICNET.LAYERS
model = resnet_backbone(scale=scale, layers=layers, stem='icnet')
end_points = 49
decode_point = 13
resize_point = 13
dilation_dict = {2: 2, 3: 4}
data, decode_shortcuts = model.net(
input,
end_points=end_points,
decode_points=decode_point,
resize_points=resize_point,
dilation_dict=dilation_dict)
return data, decode_shortcuts[decode_point]
def encoder(data13, data49, input, input_shape):
    # ICNet encoder configuration
    # sub_net_4: apply pyramis_pooling (pyramid pooling) to the ResNet layer-49 feature
    # sub_net_2: apply a projection convolution to the ResNet layer-13 feature
    # sub_net_1: apply three strided convolutions to the original-size image
sub4_out = sub_net_4(data49, input_shape)
sub2_out = sub_net_2(data13)
sub1_out = sub_net_1(input)
return sub1_out, sub2_out, sub4_out
def decoder(sub1_out, sub2_out, sub4_out, input_shape):
    # ICNet decoder configuration
    # CCF: Cascade Feature Fusion
sub24_out = CCF24(sub2_out, sub4_out, input_shape)
sub124_out = CCF124(sub1_out, sub24_out, input_shape)
return sub24_out, sub124_out
def get_logit(data, num_classes, name="logit"):
param_attr = fluid.ParamAttr(
name=name + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
data = conv(
data,
num_classes,
1,
stride=1,
padding=0,
param_attr=param_attr,
bias_attr=True)
return data
def icnet(input, num_classes):
    # Backbone resnet: input image_sub2, the image downscaled to 1/2
    # outputs: data49, the ResNet layer-49 feature at 1/32 of the original size
    #          data13, the ResNet layer-13 feature at 1/16 of the original size
input_shape = input.shape[2:]
input_shape = np.array(input_shape).astype("float32")
image_sub2 = interp(input, out_shape=np.ceil(input_shape * 0.5))
data49, data13 = resnet(image_sub2)
    # encoder: takes input, data13 and data49 and applies downsampling convolutions,
    # a projection convolution and pyramid pooling, giving sub1_out, sub2_out and sub4_out
sub1_out, sub2_out, sub4_out = encoder(data13, data49, input, input_shape)
    # decoder: cascade feature fusion of the three encoder branches
sub24_out, sub124_out = decoder(sub1_out, sub2_out, sub4_out, input_shape)
    # get_logit: final conv output channels determined by the number of classes
logit124 = get_logit(sub124_out, num_classes, "logit124")
logit4 = get_logit(sub4_out, num_classes, "logit4")
logit24 = get_logit(sub24_out, num_classes, "logit24")
return logit124, logit24, logit4
if __name__ == '__main__':
image_shape = [3, 320, 320]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
logit = icnet(image, 4)
print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv, max_pool, deconv
def double_conv(data, out_ch):
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope("conv0"):
data = bn_relu(
conv(data, out_ch, 3, stride=1, padding=1, param_attr=param_attr))
with scope("conv1"):
data = bn_relu(
conv(data, out_ch, 3, stride=1, padding=1, param_attr=param_attr))
return data
def down(data, out_ch):
    # Downsampling: max_pool followed by two convolutions
with scope("down"):
data = max_pool(data, 2, 2, 0)
data = double_conv(data, out_ch)
return data
def up(data, short_cut, out_ch):
    # Upsampling: upsample data (resize or deconv), then concat with short_cut
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.XavierInitializer(),
)
with scope("up"):
if cfg.MODEL.UNET.UPSAMPLE_MODE == 'bilinear':
data = fluid.layers.resize_bilinear(data, short_cut.shape[2:])
else:
data = deconv(
data,
out_ch // 2,
filter_size=2,
stride=2,
padding=0,
param_attr=param_attr)
data = fluid.layers.concat([data, short_cut], axis=1)
data = double_conv(data, out_ch)
return data
def encode(data):
    # Encoder
short_cuts = []
with scope("encode"):
with scope("block1"):
data = double_conv(data, 64)
short_cuts.append(data)
with scope("block2"):
data = down(data, 128)
short_cuts.append(data)
with scope("block3"):
data = down(data, 256)
short_cuts.append(data)
with scope("block4"):
data = down(data, 512)
short_cuts.append(data)
with scope("block5"):
data = down(data, 512)
return data, short_cuts
def decode(data, short_cuts):
    # Decoder, symmetric with the encoder
with scope("decode"):
with scope("decode1"):
data = up(data, short_cuts[3], 256)
with scope("decode2"):
data = up(data, short_cuts[2], 128)
with scope("decode3"):
data = up(data, short_cuts[1], 64)
with scope("decode4"):
data = up(data, short_cuts[0], 64)
return data
def get_logit(data, num_classes):
    # Set the final conv output channels by the number of classes
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope("logit"):
data = conv(
data, num_classes, 3, stride=1, padding=1, param_attr=param_attr)
return data
def unet(input, num_classes):
    # U-Net: symmetric encoder-decoder
encode_data, short_cuts = encode(input)
decode_data = decode(encode_data, short_cuts)
logit = get_logit(decode_data, num_classes)
return logit
if __name__ == '__main__':
image_shape = [3, 320, 320]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
logit = unet(image, 4)
print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
import data_aug as aug
from utils.config import cfg
from data_utils import GeneratorEnqueuer
from models.model_builder import ModelPhase
def cv2_imread(file_path, flag=cv2.IMREAD_COLOR):
    # Resolve cv2.imread failing to open file paths that contain Chinese characters on Windows.
return cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), flag)
class SegDataset(object):
def __init__(self,
file_list,
data_dir,
shuffle=False,
mode=ModelPhase.TRAIN):
self.mode = mode
self.shuffle = shuffle
self.data_dir = data_dir
        # NOTE: please ensure the file list is saved in UTF-8 encoding
with codecs.open(file_list, 'r', 'utf-8') as flist:
self.lines = [line.strip() for line in flist]
if shuffle:
np.random.shuffle(self.lines)
def generator(self):
if self.shuffle:
np.random.shuffle(self.lines)
for line in self.lines:
yield self.process_image(line, self.data_dir, self.mode)
def sharding_generator(self, pid=0, num_processes=1):
"""
Use line id as shard key for multiprocess io
It's a normal generator if pid=0, num_processes=1
"""
for index, line in enumerate(self.lines):
# Use index and pid to shard file list
if index % num_processes == pid:
yield self.process_image(line, self.data_dir, self.mode)
def batch_reader(self, batch_size):
        # self.generator yields single samples, which self.batch groups into batches
        br = self.batch(self.generator, batch_size)
for batch in br:
yield batch[0], batch[1], batch[2]
def multiprocess_generator(self, max_queue_size=32, num_processes=8):
# Re-shuffle file list
if self.shuffle:
np.random.shuffle(self.lines)
# Create multiple sharding generators according to num_processes for multiple processes
generators = []
for pid in range(num_processes):
generators.append(self.sharding_generator(pid, num_processes))
try:
enqueuer = GeneratorEnqueuer(generators)
enqueuer.start(max_queue_size=max_queue_size, workers=num_processes)
while True:
generator_out = None
while enqueuer.is_running():
if not enqueuer.queue.empty():
generator_out = enqueuer.queue.get(timeout=5)
break
else:
time.sleep(0.01)
if generator_out is None:
break
yield generator_out
finally:
if enqueuer is not None:
enqueuer.stop()
def batch(self, reader, batch_size, is_test=False, drop_last=False):
def batch_reader(is_test=False, drop_last=drop_last):
if is_test:
imgs, img_names, valid_shapes, org_shapes = [], [], [], []
for img, img_name, valid_shape, org_shape in reader():
imgs.append(img)
img_names.append(img_name)
valid_shapes.append(valid_shape)
org_shapes.append(org_shape)
if len(imgs) == batch_size:
yield np.array(imgs), img_names, np.array(
valid_shapes), np.array(org_shapes)
imgs, img_names, valid_shapes, org_shapes = [], [], [], []
if not drop_last and len(imgs) > 0:
yield np.array(imgs), img_names, np.array(
valid_shapes), np.array(org_shapes)
else:
imgs, labs, ignore = [], [], []
bs = 0
for img, lab, ig in reader():
imgs.append(img)
labs.append(lab)
ignore.append(ig)
bs += 1
if bs == batch_size:
yield np.array(imgs), np.array(labs), np.array(ignore)
bs = 0
imgs, labs, ignore = [], [], []
if not drop_last and bs > 0:
yield np.array(imgs), np.array(labs), np.array(ignore)
return batch_reader(is_test, drop_last)
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
            # If the 4-channel rgba IMAGE_TYPE is used, read with the IMREAD_UNCHANGED
            # flag to preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
raise Exception("File list format incorrect! It should be"
" image_name{}label_name\\n".format(
cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
grt_path = os.path.join(src_dir, grt_name)
img = cv2_imread(img_path, cv2_imread_flag)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
if img is None or grt is None:
raise Exception(
"Empty image, src_dir: {}, img: {} & lab: {}".format(
src_dir, img_path, grt_path))
img_height = img.shape[0]
img_width = img.shape[1]
grt_height = grt.shape[0]
grt_width = grt.shape[1]
if img_height != grt_height or img_width != grt_width:
raise Exception(
"source img and label img must has the same size")
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
img_channels = img.shape[2]
if img_channels < 3:
raise Exception(
"PaddleSeg only supports gray, rgb or rgba image")
if img_channels != cfg.DATASET.DATA_DIM:
raise Exception(
"Input image channel({}) is not match cfg.DATASET.DATA_DIM({}), img_name={}"
.format(img_channels, cfg.DATASET.DATADIM, img_name))
if img_channels != len(cfg.MEAN):
raise Exception(
"img name {}, img chns {} mean size {}, size unequal".
format(img_name, img_channels, len(cfg.MEAN)))
if img_channels != len(cfg.STD):
raise Exception(
"img name {}, img chns {} std size {}, size unequal".format(
img_name, img_channels, len(cfg.STD)))
# visualization mode
elif mode == ModelPhase.VISUAL:
if cfg.DATASET.SEPARATOR in line:
parts = line.strip().split(cfg.DATASET.SEPARATOR)
img_name = parts[0]
else:
img_name = line.strip()
img_path = os.path.join(src_dir, img_name)
img = cv2_imread(img_path, cv2_imread_flag)
if img is None:
raise Exception("empty image, src_dir:{}, img: {}".format(
src_dir, img_name))
# Convert grayscale image to BGR 3 channel image
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
img_height = img.shape[0]
img_width = img.shape[1]
img_channels = img.shape[2]
if img_channels < 3:
raise Exception("this repo only recept gray, rgb or rgba image")
if img_channels != cfg.DATASET.DATA_DIM:
raise Exception("data dim must equal to image channels")
if img_channels != len(cfg.MEAN):
raise Exception(
"img name {}, img chns {} mean size {}, size unequal".
format(img_name, img_channels, len(cfg.MEAN)))
if img_channels != len(cfg.STD):
raise Exception(
"img name {}, img chns {} std size {}, size unequal".format(
img_name, img_channels, len(cfg.STD)))
grt = None
grt_name = None
else:
raise ValueError("mode error: {}".format(mode))
return img, grt, img_name, grt_name
def normalize_image(self, img):
""" 像素归一化后减均值除方差 """
img = img.transpose((2, 0, 1)).astype('float32') / 255.0
img_mean = np.array(cfg.MEAN).reshape((len(cfg.MEAN), 1, 1))
img_std = np.array(cfg.STD).reshape((len(cfg.STD), 1, 1))
img -= img_mean
img /= img_std
return img
def process_image(self, line, data_dir, mode):
""" process_image """
img, grt, img_name, grt_name = self.load_image(
line, data_dir, mode=mode)
if mode == ModelPhase.TRAIN:
img, grt = aug.resize(img, grt, mode)
if cfg.AUG.RICH_CROP.ENABLE:
if cfg.AUG.RICH_CROP.BLUR:
if cfg.AUG.RICH_CROP.BLUR_RATIO <= 0:
n = 0
elif cfg.AUG.RICH_CROP.BLUR_RATIO >= 1:
n = 1
else:
n = int(1.0 / cfg.AUG.RICH_CROP.BLUR_RATIO)
if n > 0:
if np.random.randint(0, n) == 0:
radius = np.random.randint(3, 10)
if radius % 2 != 1:
radius = radius + 1
if radius > 9:
radius = 9
img = cv2.GaussianBlur(img, (radius, radius), 0, 0)
img, grt = aug.random_rotation(
img,
grt,
rich_crop_max_rotation=cfg.AUG.RICH_CROP.MAX_ROTATION,
mean_value=cfg.MEAN)
img, grt = aug.rand_scale_aspect(
img,
grt,
rich_crop_min_scale=cfg.AUG.RICH_CROP.MIN_AREA_RATIO,
rich_crop_aspect_ratio=cfg.AUG.RICH_CROP.ASPECT_RATIO)
img = aug.hsv_color_jitter(
img,
brightness_jitter_ratio=cfg.AUG.RICH_CROP.
BRIGHTNESS_JITTER_RATIO,
saturation_jitter_ratio=cfg.AUG.RICH_CROP.
SATURATION_JITTER_RATIO,
contrast_jitter_ratio=cfg.AUG.RICH_CROP.
CONTRAST_JITTER_RATIO)
if cfg.AUG.RICH_CROP.FLIP:
if cfg.AUG.RICH_CROP.FLIP_RATIO <= 0:
n = 0
elif cfg.AUG.RICH_CROP.FLIP_RATIO >= 1:
n = 1
else:
n = int(1.0 / cfg.AUG.RICH_CROP.FLIP_RATIO)
if n > 0:
if np.random.randint(0, n) == 0:
img = img[::-1, :, :]
grt = grt[::-1, :]
if cfg.AUG.MIRROR:
if np.random.randint(0, 2) == 1:
img = img[:, ::-1, :]
grt = grt[:, ::-1]
img, grt = aug.rand_crop(img, grt, mode=mode)
elif ModelPhase.is_eval(mode):
img, grt = aug.resize(img, grt, mode=mode)
img, grt = aug.rand_crop(img, grt, mode=mode)
elif ModelPhase.is_visual(mode):
org_shape = [img.shape[0], img.shape[1]]
img, grt = aug.resize(img, grt, mode=mode)
valid_shape = [img.shape[0], img.shape[1]]
img, grt = aug.rand_crop(img, grt, mode=mode)
else:
raise ValueError("Dataset mode={} Error!".format(mode))
# Normalize image
img = self.normalize_image(img)
if ModelPhase.is_train(mode) or ModelPhase.is_eval(mode):
grt = np.expand_dims(np.array(grt).astype('int32'), axis=0)
ignore = (grt != cfg.DATASET.IGNORE_INDEX).astype('int32')
if ModelPhase.is_train(mode):
return (img, grt, ignore)
elif ModelPhase.is_eval(mode):
return (img, grt, ignore)
elif ModelPhase.is_visual(mode):
return (img, img_name, valid_shape, org_shape)
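# Illustrative usage sketch (not part of the original file): iterate over a training
# file list with SegDataset and inspect one sample. The file-list and data-dir paths
# match the defaults in utils/config.py but are placeholders here.
def _seg_dataset_example():
    dataset = SegDataset(
        file_list='./dataset/cityscapes/train.list',
        data_dir='./dataset/cityscapes/',
        shuffle=True,
        mode=ModelPhase.TRAIN)
    for img, grt, ignore in dataset.generator():
        # img is a CHW float32 array normalized with cfg.MEAN / cfg.STD,
        # grt is the 1xHxW int32 label, ignore masks out IGNORE_INDEX pixels
        print(img.shape, grt.shape, ignore.shape)
        break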
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from utils.config import cfg
from paddle.fluid.contrib.mixed_precision.fp16_utils import create_master_params_grads, master_param_to_train_param
class Solver(object):
def __init__(self, main_prog, start_prog):
total_images = cfg.DATASET.TRAIN_TOTAL_IMAGES
self.weight_decay = cfg.SOLVER.WEIGHT_DECAY
self.momentum = cfg.SOLVER.MOMENTUM
self.momentum2 = cfg.SOLVER.MOMENTUM2
self.step_per_epoch = total_images // cfg.BATCH_SIZE
if total_images % cfg.BATCH_SIZE != 0:
self.step_per_epoch += 1
self.total_step = cfg.SOLVER.NUM_EPOCHS * self.step_per_epoch
self.main_prog = main_prog
self.start_prog = start_prog
def piecewise_decay(self):
gamma = cfg.SOLVER.GAMMA
bd = [self.step_per_epoch * e for e in cfg.SOLVER.DECAY_EPOCH]
lr = [cfg.SOLVER.LR * (gamma**i) for i in range(len(bd) + 1)]
decayed_lr = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
return decayed_lr
def poly_decay(self):
power = cfg.SOLVER.POWER
decayed_lr = fluid.layers.polynomial_decay(
cfg.SOLVER.LR, self.total_step, end_learning_rate=0, power=power)
return decayed_lr
def cosine_decay(self):
decayed_lr = fluid.layers.cosine_decay(
cfg.SOLVER.LR, self.step_per_epoch, cfg.SOLVER.NUM_EPOCHS)
return decayed_lr
def get_lr(self, lr_policy):
if lr_policy.lower() == 'poly':
decayed_lr = self.poly_decay()
elif lr_policy.lower() == 'piecewise':
decayed_lr = self.piecewise_decay()
elif lr_policy.lower() == 'cosine':
decayed_lr = self.cosine_decay()
else:
raise Exception(
"unsupport learning decay policy! only support poly,piecewise,cosine"
)
return decayed_lr
def sgd_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Momentum(
learning_rate=decayed_lr,
momentum=self.momentum,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
if cfg.MODEL.FP16:
params_grads = optimizer.backward(loss, self.start_prog)
master_params_grads = create_master_params_grads(
params_grads, self.main_prog, self.start_prog,
cfg.MODEL.SCALE_LOSS)
optimizer.apply_gradients(master_params_grads)
master_param_to_train_param(master_params_grads, params_grads,
self.main_prog)
else:
optimizer.minimize(loss)
return decayed_lr
def adam_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Adam(
learning_rate=decayed_lr,
beta1=self.momentum,
beta2=self.momentum2,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
optimizer.minimize(loss)
return decayed_lr
def optimise(self, loss):
lr_policy = cfg.SOLVER.LR_POLICY
opt = cfg.SOLVER.OPTIMIZER
if opt.lower() == 'adam':
return self.adam_optimizer(lr_policy, loss)
elif opt.lower() == 'sgd':
return self.sgd_optimizer(lr_policy, loss)
else:
raise Exception(
"unsupport optimizer solver, only support adam and sgd")
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import argparse
import pprint
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from metrics import ConfusionMatrix
from reader import SegDataset
from models.model_builder import build_model
from models.model_builder import ModelPhase
from models.model_builder import parse_shape_from_file
from eval import evaluate
from vis import visualize
from utils.fp16_utils import load_fp16_vars
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg training')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess I/O or not',
action='store_true',
default=False)
parser.add_argument(
'--log_steps',
dest='log_steps',
help='Display logging information at every log_steps',
default=10,
type=int)
parser.add_argument(
'--debug',
dest='debug',
help='debug mode, display detail information of training',
action='store_true')
parser.add_argument(
'--use_tb',
dest='use_tb',
help='whether to record the data during training to Tensorboard',
action='store_true')
parser.add_argument(
'--tb_log_dir',
dest='tb_log_dir',
help='Tensorboard logging directory',
default=None,
type=str)
parser.add_argument(
'--do_eval',
dest='do_eval',
help='Evaluation models result on every new checkpoint',
action='store_true')
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
return parser.parse_args()
def save_vars(executor, dirname, program=None, vars=None):
"""
    Temporary workaround for Windows save-variables compatibility.
Will fix in PaddlePaddle v1.5.2
"""
save_program = fluid.Program()
save_block = save_program.global_block()
for each_var in vars:
# NOTE: don't save the variable which type is RAW
if each_var.type == fluid.core.VarDesc.VarType.RAW:
continue
new_var = save_block.create_var(
name=each_var.name,
shape=each_var.shape,
dtype=each_var.dtype,
type=each_var.type,
lod_level=each_var.lod_level,
persistable=True)
file_path = os.path.join(dirname, new_var.name)
file_path = os.path.normpath(file_path)
save_block.append_op(
type='save',
inputs={'X': [new_var]},
outputs={},
attrs={'file_path': file_path})
executor.run(save_program)
def save_checkpoint(exe, program, ckpt_name):
"""
Save checkpoint for evaluation or resume training
"""
ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
print("Save model checkpoint to {}".format(ckpt_dir))
if not os.path.isdir(ckpt_dir):
os.makedirs(ckpt_dir)
save_vars(
exe,
ckpt_dir,
program,
vars=list(filter(fluid.io.is_persistable, program.list_vars())))
return ckpt_dir
def load_checkpoint(exe, program):
"""
    Load checkpoint from the pretrained model directory to resume training
"""
print('Resume model training from:', cfg.TRAIN.PRETRAINED_MODEL)
if not os.path.exists(cfg.TRAIN.PRETRAINED_MODEL):
raise ValueError("TRAIN.PRETRAIN_MODEL {} not exist!".format(
cfg.TRAIN.PRETRAINED_MODEL))
fluid.io.load_persistables(
exe, cfg.TRAIN.PRETRAINED_MODEL, main_program=program)
model_path = cfg.TRAIN.PRETRAINED_MODEL
    # Check whether the path ends with a path separator
if model_path[-1] == os.sep:
model_path = model_path[0:-1]
epoch_name = os.path.basename(model_path)
# If resume model is final model
if epoch_name == 'final':
begin_epoch = cfg.SOLVER.NUM_EPOCHS
    # If the resume model path ends with digits, restore the epoch number
elif epoch_name.isdigit():
epoch = int(epoch_name)
begin_epoch = epoch + 1
else:
raise ValueError("Resume model path is not valid!")
print("Model checkpoint loaded successfully!")
return begin_epoch
def train(cfg):
startup_prog = fluid.Program()
train_prog = fluid.Program()
drop_last = True
dataset = SegDataset(
file_list=cfg.DATASET.TRAIN_FILE_LIST,
mode=ModelPhase.TRAIN,
shuffle=True,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
if args.use_mpio:
print("Use multiprocess reader")
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
print("Use multi-thread reader")
data_gen = dataset.generator()
batch_data = []
for b in data_gen:
batch_data.append(b)
if len(batch_data) == cfg.BATCH_SIZE:
for item in batch_data:
yield item[0], item[1], item[2]
batch_data = []
        # If the sync batch norm strategy is used, drop the last batch when the number
        # of samples in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
if not cfg.TRAIN.SYNC_BATCH_NORM:
for item in batch_data:
yield item[0], item[1], item[2]
# Get device environment
places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
place = places[0]
# Get number of GPU
dev_count = len(places)
print("#GPU-Devices: {}".format(dev_count))
    # Make sure BATCH_SIZE is divisible by the number of GPU cards
assert cfg.BATCH_SIZE % dev_count == 0, (
        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
cfg.BATCH_SIZE, dev_count))
    # In multi-GPU training mode, batch data is allocated to each GPU evenly
batch_size_per_dev = cfg.BATCH_SIZE // dev_count
print("batch_size_per_dev: {}".format(batch_size_per_dev))
py_reader, avg_loss, lr, pred, grts, masks = build_model(
train_prog, startup_prog, phase=ModelPhase.TRAIN)
py_reader.decorate_sample_generator(
data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
exe = fluid.Executor(place)
exe.run(startup_prog)
exec_strategy = fluid.ExecutionStrategy()
    # Clear temporary variables every 100 iterations
if args.use_gpu:
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
if dev_count > 1:
# Apply sync batch norm strategy
print("Sync BatchNorm strategy is effective.")
build_strategy.sync_batch_norm = True
else:
print("Sync BatchNorm strategy will not be effective if GPU device"
" count <= 1")
compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
loss_name=avg_loss.name,
exec_strategy=exec_strategy,
build_strategy=build_strategy)
# Resume training
begin_epoch = cfg.SOLVER.BEGIN_EPOCH
if cfg.TRAIN.RESUME:
begin_epoch = load_checkpoint(exe, train_prog)
# Load pretrained model
elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL):
print('Pretrained model dir:', cfg.TRAIN.PRETRAINED_MODEL)
load_vars = []
def var_shape_matched(var, shape):
"""
            Check whether the persistable variable shape matches the current network
"""
var_exist = os.path.exists(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL, var.name))
if var_exist:
var_shape = parse_shape_from_file(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL, var.name))
if var_shape == shape:
return True
else:
print(
"Variable[{}] shape does not match current network, skip"
" to load it.".format(var.name))
return False
for x in train_prog.list_vars():
if isinstance(x, fluid.framework.Parameter):
shape = tuple(fluid.global_scope().find_var(
x.name).get_tensor().shape())
if var_shape_matched(x, shape):
load_vars.append(x)
if cfg.MODEL.FP16:
            # In FP16 training mode, load FP16 variables separately
load_fp16_vars(exe, cfg.TRAIN.PRETRAINED_MODEL, train_prog)
else:
fluid.io.load_vars(
exe, dirname=cfg.TRAIN.PRETRAINED_MODEL, vars=load_vars)
print("Pretrained model loaded successfully!")
else:
        print('Pretrained model dir {} does not exist, training from scratch...'.
format(cfg.TRAIN.PRETRAINED_MODEL))
fetch_list = [avg_loss.name, lr.name]
if args.debug:
# Fetch more variable info and use streaming confusion matrix to
# calculate IoU results if in debug mode
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
fetch_list.extend([pred.name, grts.name, masks.name])
cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
if args.use_tb:
if not args.tb_log_dir:
print("Please specify the log directory by --tb_log_dir.")
exit(1)
from tb_paddle import SummaryWriter
if os.path.exists(args.tb_log_dir):
shutil.rmtree(args.tb_log_dir)
log_writer = SummaryWriter(args.tb_log_dir)
global_step = 0
all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
all_step += 1
all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
avg_loss = 0.0
timer = Timer()
timer.start()
if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
raise ValueError(
("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
begin_epoch, cfg.SOLVER.NUM_EPOCHS))
for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
py_reader.start()
while True:
try:
if args.debug:
# Print category IoU and accuracy to check whether the
                    # training process behaves as expected
loss, lr, pred, grts, masks = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
cm.calculate(pred, grts, masks)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0:
speed = args.log_steps / timer.elapsed_time()
avg_loss /= args.log_steps
category_acc, mean_acc = cm.accuracy()
category_iou, mean_iou = cm.mean_iou()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, mean_acc,
mean_iou, speed,
calculate_eta(all_step - global_step, speed)))
print("Category IoU:", category_iou)
print("Category Acc:", category_acc)
if args.use_tb:
log_writer.add_scalar('Train/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Train/mean_acc', mean_acc,
global_step)
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/step/sec', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
cm.zero_matrix()
timer.restart()
else:
                    # If not in debug mode, avoid unnecessary logging and calculation
loss, lr = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0:
avg_loss /= args.log_steps
speed = args.log_steps / timer.elapsed_time()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, speed,
calculate_eta(all_step - global_step, speed)))
if args.use_tb:
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/speed', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
timer.restart()
except fluid.core.EOFException:
py_reader.reset()
break
except Exception as e:
print(e)
if epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0:
ckpt_dir = save_checkpoint(exe, train_prog, epoch)
if args.do_eval:
print("Evaluation start")
_, mean_iou, _, mean_acc = evaluate(
cfg=cfg,
ckpt_dir=ckpt_dir,
use_gpu=args.use_gpu,
use_mpio=args.use_mpio)
if args.use_tb:
log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
global_step)
# Use Tensorboard to visualize results
if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
visualize(
cfg=cfg,
use_gpu=args.use_gpu,
vis_file_list=cfg.DATASET.VIS_FILE_LIST,
vis_dir="visual",
ckpt_dir=ckpt_dir,
log_writer=log_writer)
# save final model
save_checkpoint(exe, train_prog, 'final')
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer(reset_dataset=True)
print(pprint.pformat(cfg))
train(cfg)
if __name__ == '__main__':
args = parse_args()
    if args.use_gpu and not fluid.core.is_compiled_with_cuda():
print(
"You can not set use_gpu = True in the model because you are using paddlepaddle-cpu."
)
print(
"Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
)
sys.exit(1)
main(args)
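# Example invocation (illustrative; the config path is a placeholder):
#   python train.py --cfg configs/deeplabv3p_xception65_cityscapes.yaml \
#       --use_gpu --use_mpio --do_eval \
#       SOLVER.LR 0.001 BATCH_SIZE 4
# Trailing KEY VALUE pairs are consumed by cfg.update_from_list through the `opts`
# argument defined in parse_args above.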
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""A simple attribute dictionary used for representing configuration options."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import codecs
from ast import literal_eval
import yaml
import six
class SegConfig(dict):
def __init__(self, *args, **kwargs):
super(SegConfig, self).__init__(*args, **kwargs)
self.immutable = False
def __setattr__(self, key, value, create_if_not_exist=True):
if key in ["immutable"]:
self.__dict__[key] = value
return
t = self
keylist = key.split(".")
for k in keylist[:-1]:
t = t.__getattr__(k, create_if_not_exist)
t.__getattr__(keylist[-1], create_if_not_exist)
t[keylist[-1]] = value
def __getattr__(self, key, create_if_not_exist=True):
if key in ["immutable"]:
return self.__dict__[key]
if not key in self:
if not create_if_not_exist:
raise KeyError
self[key] = SegConfig()
return self[key]
def __setitem__(self, key, value):
#
if self.immutable:
raise AttributeError(
'Attempted to set "{}" to "{}", but SegConfig is immutable'.
format(key, value))
#
if isinstance(value, six.string_types):
try:
value = literal_eval(value)
except ValueError:
pass
except SyntaxError:
pass
super(SegConfig, self).__setitem__(key, value)
def update_from_segconfig(self, other):
if isinstance(other, dict):
other = SegConfig(other)
assert isinstance(other, SegConfig)
diclist = [("", other)]
while len(diclist):
prefix, tdic = diclist[0]
diclist = diclist[1:]
for key, value in tdic.items():
key = "{}.{}".format(prefix, key) if prefix else key
if isinstance(value, dict):
diclist.append((key, value))
continue
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def check_and_infer(self, reset_dataset=False):
if self.DATASET.IMAGE_TYPE in ['rgb', 'gray']:
self.DATASET.DATA_DIM = 3
elif self.DATASET.IMAGE_TYPE in ['rgba']:
self.DATASET.DATA_DIM = 4
else:
raise KeyError(
'DATASET.IMAGE_TYPE config error, only support `rgb`, `gray` and `rgba`'
)
if reset_dataset:
            # Ensure the file lists use UTF-8 encoding
train_sets = codecs.open(self.DATASET.TRAIN_FILE_LIST, 'r',
'utf-8').readlines()
val_sets = codecs.open(self.DATASET.VAL_FILE_LIST, 'r',
'utf-8').readlines()
test_sets = codecs.open(self.DATASET.TEST_FILE_LIST, 'r',
'utf-8').readlines()
self.DATASET.TRAIN_TOTAL_IMAGES = len(train_sets)
self.DATASET.VAL_TOTAL_IMAGES = len(val_sets)
self.DATASET.TEST_TOTAL_IMAGES = len(test_sets)
if self.MODEL.MODEL_NAME == 'icnet' and \
len(self.MODEL.MULTI_LOSS_WEIGHT) != 3:
self.MODEL.MULTI_LOSS_WEIGHT = [1.0, 0.4, 0.16]
def update_from_list(self, config_list):
if len(config_list) % 2 != 0:
raise ValueError(
"Command line options config format error! Please check it: {}".
format(config_list))
for key, value in zip(config_list[0::2], config_list[1::2]):
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def update_from_file(self, config_file):
with codecs.open(config_file, 'r', 'utf-8') as file:
dic = yaml.load(file)
self.update_from_segconfig(dic)
def set_immutable(self, immutable):
self.immutable = immutable
for value in self.values():
if isinstance(value, SegConfig):
value.set_immutable(immutable)
def is_immutable(self):
return self.immutable
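# Illustrative usage sketch (not part of the original file): nested sections are
# created on attribute access and can be overridden from dotted-key pairs, which is
# how command-line `opts` reach the configuration.
def _seg_config_example():
    conf = SegConfig()
    conf.SOLVER.LR = 0.1          # nested section created on first attribute access
    conf.BATCH_SIZE = 1
    conf.update_from_list(['SOLVER.LR', '0.01', 'BATCH_SIZE', '8'])
    assert conf.SOLVER.LR == 0.01 and conf.BATCH_SIZE == 8  # strings literal_eval'ed
    conf.set_immutable(True)      # further writes now raise AttributeError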
# -*- coding: utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from __future__ import unicode_literals
from utils.collect import SegConfig
import numpy as np
cfg = SegConfig()
########################## Basic configuration ################################
# Mean values subtracted from the image during preprocessing
cfg.MEAN = [104.008, 116.669, 122.675]
# Standard deviation the image is divided by during preprocessing
cfg.STD = [1.000, 1.000, 1.000]
# Batch size
cfg.BATCH_SIZE = 1
# Image crop size (width, height) during evaluation
cfg.EVAL_CROP_SIZE = tuple()
# Image crop size (width, height) during training
cfg.TRAIN_CROP_SIZE = tuple()
########################## Data loader configuration ##########################
# Number of concurrent workers for data loading, recommended value 8
cfg.DATALOADER.NUM_WORKERS = 8
# Buffer queue size for data loading, recommended value 256
cfg.DATALOADER.BUF_SIZE = 256
########################## Dataset configuration ##############################
# Dataset root directory
cfg.DATASET.DATA_DIR = './dataset/cityscapes/'
# Training set file list
cfg.DATASET.TRAIN_FILE_LIST = './dataset/cityscapes/train.list'
# Number of training images
cfg.DATASET.TRAIN_TOTAL_IMAGES = 2975
# Validation set file list
cfg.DATASET.VAL_FILE_LIST = './dataset/cityscapes/val.list'
# Number of validation images
cfg.DATASET.VAL_TOTAL_IMAGES = 500
# Test set file list
cfg.DATASET.TEST_FILE_LIST = './dataset/cityscapes/test.list'
# Number of test images
cfg.DATASET.TEST_TOTAL_IMAGES = 500
# File list of images visualized with Tensorboard
cfg.DATASET.VIS_FILE_LIST = None
# Number of classes (the background class included)
cfg.DATASET.NUM_CLASSES = 19
# Input image type: 3-channel 'rgb', 4-channel 'rgba' or single-channel 'gray'
cfg.DATASET.IMAGE_TYPE = 'rgb'
# Number of input image channels
cfg.DATASET.DATA_DIM = 3
# Separator used in the file lists, a space by default
cfg.DATASET.SEPARATOR = ' '
# Label value to be ignored, 255 by default; usually no need to change
cfg.DATASET.IGNORE_INDEX = 255
########################### Data augmentation configuration ###################
# Horizontal (left-right) mirror flip
cfg.AUG.MIRROR = True
# Fixed resize size (width, height) for unpadding, non-negative
cfg.AUG.FIX_RESIZE_SIZE = tuple()
# Three resize modes are supported:
# unpadding (fixed size), stepscaling (scale by a random factor), rangescaling (resize the long edge)
cfg.AUG.AUG_METHOD = 'rangescaling'
# Minimum scale factor for stepscaling, non-negative
cfg.AUG.MIN_SCALE_FACTOR = 0.5
# Maximum scale factor for stepscaling, not smaller than MIN_SCALE_FACTOR
cfg.AUG.MAX_SCALE_FACTOR = 2.0
# Scale step size for stepscaling, non-negative
cfg.AUG.SCALE_STEP_SIZE = 0.25
# Minimum long-edge size during training for rangescaling, non-negative
cfg.AUG.MIN_RESIZE_VALUE = 400
# Maximum long-edge size during training for rangescaling,
# not smaller than MIN_RESIZE_VALUE
cfg.AUG.MAX_RESIZE_VALUE = 600
# Long-edge size used by rangescaling in eval/visualization mode,
# within [MIN_RESIZE_VALUE, MAX_RESIZE_VALUE]
cfg.AUG.INF_RESIZE_VALUE = 500
# RichCrop augmentation switch, used to improve model robustness
cfg.AUG.RICH_CROP.ENABLE = False
# Maximum rotation angle, 0-90
cfg.AUG.RICH_CROP.MAX_ROTATION = 15
# Minimum area ratio of the crop to the original image, 0-1
cfg.AUG.RICH_CROP.MIN_AREA_RATIO = 0.5
# Aspect ratio range of the crop, non-negative
cfg.AUG.RICH_CROP.ASPECT_RATIO = 0.33
# Brightness jitter range, 0-1
cfg.AUG.RICH_CROP.BRIGHTNESS_JITTER_RATIO = 0.5
# Saturation jitter range, 0-1
cfg.AUG.RICH_CROP.SATURATION_JITTER_RATIO = 0.5
# Contrast jitter range, 0-1
cfg.AUG.RICH_CROP.CONTRAST_JITTER_RATIO = 0.5
# Gaussian blur switch, True/False
cfg.AUG.RICH_CROP.BLUR = False
# Ratio of images to which blur is applied, 0-1
cfg.AUG.RICH_CROP.BLUR_RATIO = 0.1
# Vertical (up-down) flip switch, True/False
cfg.AUG.RICH_CROP.FLIP = False
# Ratio of images to which the vertical flip is applied, 0-1
cfg.AUG.RICH_CROP.FLIP_RATIO = 0.2
########################### Training configuration ############################
# Directory for saving model checkpoints
cfg.TRAIN.MODEL_SAVE_DIR = ''
# Path to the pretrained model
cfg.TRAIN.PRETRAINED_MODEL = ''
# Whether to resume training from a checkpoint
cfg.TRAIN.RESUME = False
# Whether to synchronize BatchNorm mean and variance across GPUs
cfg.TRAIN.SYNC_BATCH_NORM = False
# Epoch interval for saving checkpoints, which can also be used to resume interrupted training
cfg.TRAIN.SNAPSHOT_EPOCH = 10
########################### Optimizer configuration ###########################
# Initial learning rate
cfg.SOLVER.LR = 0.1
# Learning rate decay policy, supports poly, piecewise and cosine
cfg.SOLVER.LR_POLICY = "poly"
# Optimization algorithm, supports SGD and Adam
cfg.SOLVER.OPTIMIZER = "sgd"
# Momentum
cfg.SOLVER.MOMENTUM = 0.9
# Exponential decay rate of the second moment estimate (Adam)
cfg.SOLVER.MOMENTUM2 = 0.999
# Power of the poly learning rate decay
cfg.SOLVER.POWER = 0.9
# Decay factor of the piecewise (step) decay
cfg.SOLVER.GAMMA = 0.1
# Epochs at which the piecewise decay is applied
cfg.SOLVER.DECAY_EPOCH = [10, 20]
# Weight decay, 0-1
cfg.SOLVER.WEIGHT_DECAY = 0.00004
# Starting epoch, 1 by default
cfg.SOLVER.BEGIN_EPOCH = 1
# Number of training epochs, a positive integer
cfg.SOLVER.NUM_EPOCHS = 30
########################## Test configuration #################################
# Path of the model used for testing
cfg.TEST.TEST_MODEL = ''
########################## Common model configuration #########################
# Model name, supports deeplab, unet and icnet
cfg.MODEL.MODEL_NAME = ''
# Normalization type: bn or gn (group_norm)
cfg.MODEL.DEFAULT_NORM_TYPE = 'bn'
# Weights of the multi-branch losses
cfg.MODEL.MULTI_LOSS_WEIGHT = [1.0]
# Number of groups when DEFAULT_NORM_TYPE is gn
cfg.MODEL.DEFAULT_GROUP_NUMBER = 32
# Small epsilon that prevents division by zero; usually no need to change
cfg.MODEL.DEFAULT_EPSILON = 1e-5
# BatchNorm momentum; usually no need to change
cfg.MODEL.BN_MOMENTUM = 0.99
# Whether to train with FP16
cfg.MODEL.FP16 = False
# Loss scaling used by FP16; 8.0 is a typical setting for FP16 training
cfg.MODEL.SCALE_LOSS = 1.0
########################## DeepLab model configuration ########################
# DeepLab backbone, one of xception_65, mobilenetv2
cfg.MODEL.DEEPLAB.BACKBONE = "xception_65"
# DeepLab output stride
cfg.MODEL.DEEPLAB.OUTPUT_STRIDE = 16
# MobileNet backbone scale (depth multiplier)
cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER = 1.0
# Whether the encoder uses the ASPP module
cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP = True
# Whether the decoder is enabled
cfg.MODEL.DEEPLAB.ENABLE_DECODER = True
# Whether ASPP uses separable convolutions
cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV = True
# Whether the decoder uses separable convolutions
cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV = True
########################## UNET model configuration ###########################
# Upsampling mode, bilinear interpolation by default
cfg.MODEL.UNET.UPSAMPLE_MODE = 'bilinear'
########################## ICNET model configuration ##########################
# ResNet backbone scale (depth multiplier)
cfg.MODEL.ICNET.DEPTH_MULTIPLIER = 0.5
# Number of ResNet layers
cfg.MODEL.ICNET.LAYERS = 50
########################## Inference deployment configuration #################
# Filename of the exported inference model
cfg.FREEZE.MODEL_FILENAME = '__model__'
# Filename of the exported inference parameters
cfg.FREEZE.PARAMS_FILENAME = '__params__'
# Directory where the exported inference model is saved
cfg.FREEZE.SAVE_DIR = 'freeze_model'
import os
from paddle import fluid
def load_fp16_vars(executor, dirname, program):
load_dirname = os.path.normpath(dirname)
def _if_exist(var):
name = var.name[:-7] if var.name.endswith('.master') else var.name
b = os.path.exists(os.path.join(load_dirname, name))
if not b and isinstance(var, fluid.framework.Parameter):
print("===== {} not found ====".format(var.name))
return b
load_prog = fluid.Program()
load_block = load_prog.global_block()
vars = list(filter(_if_exist, program.list_vars()))
for var in vars:
new_var = fluid.io._clone_var_in_block_(load_block, var)
name = var.name[:-7] if var.name.endswith('.master') else var.name
file_path = os.path.join(load_dirname, name)
load_block.append_op(
type='load',
inputs={},
outputs={'Out': [new_var]},
attrs={
'file_path': file_path,
'load_as_fp16': var.dtype == fluid.core.VarDesc.VarType.FP16
})
executor.run(load_prog)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
def calculate_eta(remaining_step, speed):
if remaining_step < 0:
remaining_step = 0
remaining_time = int(remaining_step / speed)
result = "{:0>2}:{:0>2}:{:0>2}"
arr = []
for i in range(2, -1, -1):
arr.append(int(remaining_time / 60**i))
remaining_time %= 60**i
return result.format(*arr)
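# Worked example (illustrative, not part of the original file):
#   calculate_eta(3600, 2.0) -> "00:30:00"   (1800 seconds remaining)
#   calculate_eta(90, 1.0)   -> "00:01:30"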
class Timer(object):
""" Simple timer class for measuring time consuming """
def __init__(self):
self._start_time = 0.0
self._end_time = 0.0
self._elapsed_time = 0.0
self._is_running = False
def start(self):
self._is_running = True
self._start_time = time.time()
def restart(self):
self.start()
def stop(self):
self._is_running = False
self._end_time = time.time()
    def elapsed_time(self):
        # 计时器未运行时返回0.0; 运行中返回距start()的耗时(秒)
        if not self.is_running:
            return 0.0
        self._end_time = time.time()
        self._elapsed_time = self._end_time - self._start_time
        return self._elapsed_time
@property
def is_running(self):
return self._is_running
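下面是Timer与calculate_eta的一个使用示意(仅为示意,step数与耗时均为假设值):

```python
# 仅为示意: 统计训练step的平均速度, 并估算剩余时间
timer = Timer()
timer.start()
total_step = 100
for step in range(1, total_step + 1):
    # 假设此处执行一个训练step
    speed = step / max(timer.elapsed_time(), 1e-6)  # 平均速度: steps/秒
    eta = calculate_eta(total_step - step, speed)   # 形如 "00:00:05"
    if step % 20 == 0:
        print("step={} speed={:.2f} steps/s ETA={}".format(step, speed, eta))
timer.stop()
```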
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import time
import argparse
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from PIL import Image as PILImage
from utils.config import cfg
from metrics import ConfusionMatrix
from reader import SegDataset
from models.model_builder import build_model
from models.model_builder import ModelPhase
def parse_args():
    parser = argparse.ArgumentParser(description='PaddleSeg visualization tools')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu', dest='use_gpu', help='Use gpu or cpu', action='store_true')
parser.add_argument(
'--vis_dir',
dest='vis_dir',
help='visual save dir',
type=str,
default='visual')
parser.add_argument(
'--also_save_raw_results',
dest='also_save_raw_results',
help='whether to save raw result',
action='store_true')
parser.add_argument(
'--local_test',
dest='local_test',
help='if in local test mode, only visualize 5 images for testing',
action='store_true')
parser.add_argument(
'opts',
help='See config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def makedirs(directory):
if not os.path.exists(directory):
os.makedirs(directory)
def get_color_map(num_classes):
""" Returns the color map for visualizing the segmentation mask,
which can support arbitrary number of classes.
Args:
num_classes: Number of classes
Returns:
The color map
"""
#color_map = num_classes * 3 * [0]
color_map = num_classes * [[0, 0, 0]]
for i in range(0, num_classes):
j = 0
color_map[i] = [0, 0, 0]
lab = i
while lab:
color_map[i][0] |= (((lab >> 0) & 1) << (7 - j))
color_map[i][1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i][2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
return color_map
def colorize(image, shape, color_map):
"""
Convert segment result to color image.
"""
color_map = np.array(color_map).astype("uint8")
# Use OpenCV LUT for color mapping
c1 = cv2.LUT(image, color_map[:, 0])
c2 = cv2.LUT(image, color_map[:, 1])
c3 = cv2.LUT(image, color_map[:, 2])
color_res = np.dstack((c1, c2, c3))
return color_res
def to_png_fn(fn):
"""
Append png as filename postfix
"""
directory, filename = os.path.split(fn)
basename, ext = os.path.splitext(filename)
return basename + ".png"
def visualize(cfg,
vis_file_list=None,
use_gpu=False,
vis_dir="visual",
also_save_raw_results=False,
ckpt_dir=None,
log_writer=None,
local_test=False,
**kwargs):
if vis_file_list is None:
vis_file_list = cfg.DATASET.TEST_FILE_LIST
dataset = SegDataset(
file_list=vis_file_list,
mode=ModelPhase.VISUAL,
data_dir=cfg.DATASET.DATA_DIR)
startup_prog = fluid.Program()
test_prog = fluid.Program()
pred, logit = build_model(test_prog, startup_prog, phase=ModelPhase.VISUAL)
# Clone forward graph
test_prog = test_prog.clone(for_test=True)
    # Generate full colormap for maximum 256 classes
color_map = get_color_map(256)
# Get device environment
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
save_dir = os.path.join(vis_dir, 'visual_results')
makedirs(save_dir)
if also_save_raw_results:
raw_save_dir = os.path.join(vis_dir, 'raw_results')
makedirs(raw_save_dir)
fetch_list = [pred.name]
test_reader = dataset.batch(dataset.generator, batch_size=1, is_test=True)
img_cnt = 0
for imgs, img_names, valid_shapes, org_shapes in test_reader:
pred_shape = (imgs.shape[2], imgs.shape[3])
pred, = exe.run(
program=test_prog,
feed={'image': imgs},
fetch_list=fetch_list,
return_numpy=True)
num_imgs = pred.shape[0]
# TODO: use multi-thread to write images
for i in range(num_imgs):
# Add more comments
res_map = np.squeeze(pred[i, :, :, :]).astype(np.uint8)
img_name = img_names[i]
res_shape = (res_map.shape[0], res_map.shape[1])
if res_shape[0] != pred_shape[0] or res_shape[1] != pred_shape[1]:
                res_map = cv2.resize(
                    res_map, (pred_shape[1], pred_shape[0]),
                    interpolation=cv2.INTER_NEAREST)
valid_shape = (valid_shapes[i, 0], valid_shapes[i, 1])
res_map = res_map[0:valid_shape[0], 0:valid_shape[1]]
org_shape = (org_shapes[i, 0], org_shapes[i, 1])
res_map = cv2.resize(
res_map, (org_shape[1], org_shape[0]),
interpolation=cv2.INTER_NEAREST)
png_fn = to_png_fn(img_names[i])
if also_save_raw_results:
raw_fn = os.path.join(raw_save_dir, png_fn)
dirname = os.path.dirname(raw_save_dir)
makedirs(dirname)
cv2.imwrite(raw_fn, res_map)
# colorful segment result visualization
vis_fn = os.path.join(save_dir, png_fn)
dirname = os.path.dirname(vis_fn)
makedirs(dirname)
pred_mask = colorize(res_map, org_shapes[i], color_map)
cv2.imwrite(vis_fn, pred_mask)
img_cnt += 1
print("#{} visualize image path: {}".format(img_cnt, vis_fn))
# Use Tensorboard to visualize image
if log_writer is not None:
                # Calculate epoch from ckpt_dir folder name
epoch = int(ckpt_dir.split(os.path.sep)[-1])
print("Tensorboard visualization epoch", epoch)
log_writer.add_image(
"Predict/{}".format(img_names[i]),
pred_mask[..., ::-1],
epoch,
dataformats='HWC')
# Original image
# BGR->RGB
img = cv2.imread(
os.path.join(cfg.DATASET.DATA_DIR, img_names[i]))[..., ::-1]
log_writer.add_image(
"Images/{}".format(img_names[i]),
img,
epoch,
dataformats='HWC')
#TODO: add ground truth (label) images
# If in local_test mode, only visualize 5 images just for testing
# procedure
if local_test and img_cnt >= 5:
break
if __name__ == '__main__':
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
visualize(cfg, **args.__dict__)
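除了作为脚本运行,visualize也可以在Python代码中直接调用。下面是一个最小调用示意(仅为示意,配置文件与模型目录为示例路径):

```python
# 仅为示意: 在Python中直接调用visualize进行可视化预测
from utils.config import cfg

cfg.update_from_file("configs/unet_pet.yaml")        # 配置文件路径仅为示例
cfg.check_and_infer()
visualize(
    cfg,
    use_gpu=False,
    vis_dir="visual",                                 # 可视化结果输出目录
    ckpt_dir="./test/saved_model/unet_pet/final/",    # 模型目录为示例路径
    local_test=True)                                  # 仅可视化5张图片便于验证
```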
# 源码编译安装及搭建服务流程
本文将介绍源码编译安装以及服务搭建流程。
## 1. 系统依赖项
依赖项 | 验证过的版本
-- | --
Linux | Centos 6.10 / 7
CMake | 3.0+
GCC | 4.8.2/5.4.0
Python| 2.7
GO编译器| 1.9.2
openssl| 1.0.1+
bzip2 | 1.0.6+
如果需要使用GPU预测,还需安装以下几个依赖库
GPU库 | 验证过的版本
-- | --
CUDA | 9.2
cuDNN | 7.1.4
nccl | 2.4.7
## 2. 安装依赖项
以下流程在百度云CentOS7.5+CUDA9.2环境下进行。
### 2.1. 安装openssl、Go编译器以及bzip2
```bash
yum -y install openssl openssl-devel golang bzip2-libs bzip2-devel
```
### 2.2. 安装GPU预测的依赖项(如果需要使用GPU预测,必须执行此步骤)
#### 2.2.1. 安装配置CUDA9.2以及cuDNN 7.1.4
该百度云机器已经安装CUDA以及cuDNN,仅需复制相关头文件与链接库
```bash
# 看情况确定是否需要安装 cudnn
# 进入 cudnn 根目录
cd /home/work/cudnn/cudnn7.1.4
# 拷贝头文件
cp include/cudnn.h /usr/local/cuda/include/
# 拷贝链接库
cp lib64/libcudnn* /usr/local/cuda/lib64/
# 修改头文件、链接库访问权限
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```
#### 2.2.2. 安装nccl库
```bash
# 下载文件 nccl-repo-rhel7-2.4.7-ga-cuda9.2-1-1.x86_64.rpm
wget -c https://paddlehub.bj.bcebos.com/serving/nccl-repo-rhel7-2.4.7-ga-cuda9.2-1-1.x86_64.rpm
# 安装nccl的repo
rpm -i nccl-repo-rhel7-2.4.7-ga-cuda9.2-1-1.x86_64.rpm
# 更新索引
yum -y update
# 安装包
yum -y install libnccl-2.4.7-1+cuda9.2 libnccl-devel-2.4.7-1+cuda9.2 libnccl-static-2.4.7-1+cuda9.2
```
### 2.3. 安装 cmake 3.15
如果机器没有安装cmake或者已安装cmake的版本低于3.0,请执行以下步骤
```bash
# 如果原来的已经安装低于3.0版本的cmake,请先卸载原有低版本 cmake
yum -y remove cmake
# 下载源代码并解压
wget -c https://github.com/Kitware/CMake/releases/download/v3.15.0/cmake-3.15.0.tar.gz
tar xvfz cmake-3.15.0.tar.gz
# 编译cmake
cd cmake-3.15.0
./configure
make -j4
# 安装并检查cmake版本
make install
cmake --version
# 在cmake-3.15.0目录中,将相应的头文件目录(curl目录,为PaddleServing的依赖头文件目录)拷贝到系统include目录下
cp -r Utilities/cmcurl/include/curl/ /usr/include/
```
### 2.4. 为依赖库增加相应的软连接
现在Linux系统中大部分链接库的名称都以版本号作为后缀,如libcurl.so.4.3.0。这种命名方式最大的问题是,CMakeLists.txt中的find_library命令无法识别这种命名方式的链接库,会导致CMake出错。由于本项目使用CMake构建,务必保证相应的链接库以 .so 或 .a 为后缀命名。解决这个问题最简单的方式是创建一个软连接指向相应的链接库。在百度云的机器中,只有curl库的命名方式有问题,对应命令如下(如果是其他库,解决方法也类似):
```bash
ln -s /usr/lib64/libcurl.so.4.3.0 /usr/lib64/libcurl.so
```
### 2.5. 编译安装PaddleServing
下列步骤介绍CPU版本以及GPU版本的PaddleServing编译安装过程。
```bash
# Step 1. 在~目录下下载paddle-serving代码
cd ~
git clone https://github.com/PaddlePaddle/serving.git
# Step 2. 进入serving目录,创建build目录编译、安装
cd serving
mkdir build
cd build
# Step 3. 以下为生成GPU版本的makefile,生成CPU版本的makefile执行 cmake -DWITH_GPU=OFF ..
cmake -DWITH_GPU=ON -DCUDNN_ROOT=/usr/local/cuda/lib64 ..
# Step 4. nproc 可以输出当前机器的核心数,利用多核进行编译。如果make时候报错退出,可以多执行几次make解决
make -j$(nproc)
# Step 5. 安装
make install
# Step 6. 安装后可以看到PaddleServing的目录结构如下
serving
├── build
├── cmake
├── CMakeLists.txt
├── configure
├── CONTRIBUTING.md
├── cube
├── demo-client
├── demo-serving
│ ├── CMakeLists.txt
│ ├── conf # demo-serving 的配置文件目录
│ ├── data # 模型文件以及参数文件的目录
│ ├── op # 数据处理的源文件目录
│ ├── proto # 数据传输的proto文件目录
│ └── scripts
├── doc
├── inferencer-fluid-cpu
├── inferencer-fluid-gpu
├── kvdb
├── LICENSE
├── pdcodegen
├── predictor
├── README.md
├── sdk-cpp
└── tools
```
### 2.6. 安装PaddleSegServing
```bash
# Step 1. 在~目录下下载PaddleSeg代码
git clone http://gitlab.baidu.com/Paddle/PaddleSeg.git
# Step 2. 进入PaddleSeg的serving目录(注意区分PaddleServing的serving目录),并将seg-serving目录复制到PaddleServing的serving目录下
cd PaddleSeg/serving
cp -r seg-serving ~/serving
# 复制后PaddleServing的目录结构如下
serving
├── build
├── cmake
├── CMakeLists.txt
├── configure
├── CONTRIBUTING.md
├── cube
├── demo-client
├── demo-serving
├── doc
├── inferencer-fluid-cpu
├── inferencer-fluid-gpu
├── kvdb
├── LICENSE
├── pdcodegen
├── predictor
├── README.md
├── sdk-cpp
├── seg-serving # 此为新增的目录
└── tools
# Step 3. 修改PaddleServing的serving目录下的CMakeLists.txt
cd ~/serving
vim CMakeLists.txt
# Step 4. 倒数第二行加入代码,使得seg-serving下的代码可与PaddleServing一起编译
add_subdirectory(seg-serving)
# Step 5. 进入PaddleServing的build目录,编译安装PaddleSegServing
cd ~/serving/build
make -j$(nproc)
make install
# Step 6. 完成安装后,可以看到执行文件的目录结构如下
build
├── boost_dummy.c
├── CMakeCache.txt
├── CMakeFiles
├── cmake_install.cmake
├── configure
├── demo-client
├── error
├── human-seg-serving
├── inferencer-fluid-cpu
├── inferencer-fluid-gpu
├── info
├── install_manifest.txt
├── kvdb
├── libboost.a
├── log
├── Makefile
├── output # 所有服务端的执行文件、配置文件、数据文件均安装到此目录下
│ ├── bin
│ ├── demo
│ │ ├── client
│ │ ├── db_func
│ │ ├── db_thread
│ │ ├── seg-serving
│ │ │ └── bin
│ │ │ ├── conf # 配置文件目录
│ │ │ ├── data # 数据模型文件、参数文件目录
│ │ │ ├── seg-serving #可执行文件
│ │ │ ├── kvdb
│ │ │ ├── libiomp5.so
│ │ │ ├── libmklml_gnu.so
│ │ │ ├── libmklml_intel.so
│ │ │ └── log
│ │ ├── kvdb_test
│ │ └── serving
│ ├── include
│ └── lib
├── Paddle
├── pdcodegen
├── predictor
├── sdk-cpp
├── seg-serving
└── third_party
```
## 3. 运行PaddleSegServing
### 3.1. 搭建人脸分割服务
搭建人脸分割服务只需完成一些配置文件的编写即可。与预编译版本的搭建大致相同,但模型文件、参数文件放置的目录略有不同。
#### 3.1.1. 下载人脸分割模型文件,并将其复制到PaddleSegServing相应目录。
可参考[预编译安装流程](./README.md)中2.2.1.1节。模型文件放置的目录在
~/serving/seg-serving/data/model/paddle/fluid/。
#### 3.1.2. 配置参数文件。
可参考[预编译安装流程](./README.md)中2.2.1.2节。配置文件的目录在~/serving/seg-serving/conf。
### 3.2 安装模型文件、配置文件。
```bash
cd ~/serving/build
make install
```
### 3.3 运行服务端程序
可参考[预编译安装流程](./README.md)中2.2.2节。可执行文件在该目录下:~/serving/build/output/demo/seg-serving/bin/。
### 3.4 运行客户端程序进行测试。
可参考[预编译安装流程](./README.md)中2.2.3节。
# PaddleSegServing
## 1.简介
PaddleSegServing是基于PaddleSeg开发的实时图像分割服务的企业级解决方案。用户仅需关注模型本身,无需理解模型的加载、预测以及GPU/CPU资源的并发调度等细节操作,通过设置不同的参数配置,即可根据自身的业务需求定制不同的图像分割服务。目前,PaddleSegServing支持人脸分割、城市道路分割、宠物外形分割模型。本文将通过一个人脸分割服务的搭建示例,展示PaddleSeg服务通用的搭建流程。
## 2.预编译版本安装及搭建服务流程
### 2.1. 下载预编译的PaddleSegServing
预编译版本在CentOS 7.6系统下编译。如果想快速体验PaddleSegServing,可在此系统下下载预编译版本进行安装。预编译版本有两个:一个是GPU版本,适用于有GPU的机器(推荐安装);另一个是CPU版本,适用于无GPU的机器。
#### 2.1.1. 下载并解压GPU版本PaddleSegServing
```bash
cd ~
wget -c XXXX/PaddleSegServing.centos7.6_cuda9.2_gpu.tar.gz
tar xvfz PaddleSegServing.centos7.6_cuda9.2_gpu.tar.gz
```
#### 2.1.2. 下载并解压CPU版本PaddleSegServing
```bash
cd ~
wget -c XXXX/PaddleSegServing.centos7.6_cuda9.2_cpu.tar.gz
tar xvfz PaddleSegServing.centos7.6_cuda9.2_cpu.tar.gz
```
解压后的PaddleSegServing目录如下。
```bash
├── seg-serving
└── bin
├── conf # 配置文件目录
├── data # 数据模型文件、参数文件目录
├── seg-serving #可执行文件
├── kvdb
├── libiomp5.so
├── libmklml_gnu.so
├── libmklml_intel.so
└── log
```
### 2.2. 运行PaddleSegServing
本节将介绍如何运行以及测试PaddleSegServing。
#### 2.2.1. 搭建人脸分割服务
搭建人脸分割服务只需完成一些配置文件的编写即可,其他分割服务的搭建流程类似。
##### 2.2.1.1. 下载人脸分割模型文件,并将其复制到相应目录。
```bash
# 下载人脸分割模型
wget -c https://paddleseg.bj.bcebos.com/inference_model/deeplabv3p_xception65_humanseg.tgz
tar xvfz deeplabv3p_xception65_humanseg.tgz
# 安装模型
cp -r deeplabv3p_xception65_humanseg seg-serving/bin/data/model/paddle/fluid
```
##### 2.2.1.2. 配置参数文件
参数文件如下所示。PaddleSegServing仅新增一个配置文件seg_conf.yaml,用来指定具体分割模型的一些参数,如均值、方差、图像尺寸等。该配置文件可在gflags.conf中通过--seg_conf_file指定。
其他配置文件的字段解释可参考以下链接:https://github.com/PaddlePaddle/Serving/blob/develop/doc/SERVING_CONFIGURE.md (TODO:介绍seg_conf.yaml中每个字段的含义)
```bash
conf/
├── gflags.conf
├── model_toolkit.prototxt
├── resource.prototxt
├── seg_conf.yaml
├── service.prototxt
└── workflow.prototxt
```
#### 2.2.2 运行服务端程序
```bash
# 1. 设置环境变量
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib64:$LD_LIBRARY_PATH
# 2. 切换到bin目录,运行服务端程序
cd ~/serving/build/output/demo/seg-serving/bin/
./seg-serving
```
#### 2.2.3. 运行客户端程序进行测试(建议在Windows、Mac上测试,可直接查看分割后的图像)
客户端程序是用Python3编写的,代码简洁易懂,可以通过运行客户端验证服务的正确性以及性能表现。
```bash
# 使用Python3.6,需要安装opencv-python、requests、numpy包(建议安装anaconda)
cd tools
vim image_seg_client.py (修改IMAGE_SEG_URL变量,改成服务端的ip地址)
python3.6 image_seg_client.py
# 当前目录下可以看到生成出分割结果的图片。
```
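如果不使用自带的客户端脚本,也可以按以下方式直接构造HTTP请求(仅为示意,服务地址与图片路径为假设值,请求与返回格式与image_seg_client.py一致):

```python
# 仅为示意: 向ImageSegService发送一张图片并保存返回的mask
import base64
import json

import requests

IMAGE_SEG_URL = 'http://127.0.0.1:8010/ImageSegService/inference'  # 服务地址为假设值

with open('test.jpg', 'rb') as fp:                                  # 图片路径为假设值
    img_b64 = str(base64.b64encode(fp.read()), 'utf-8')

data = {"instances": [{"image_length": len(img_b64), "image_binary": img_b64}]}
resp = json.loads(requests.post(IMAGE_SEG_URL, data=json.dumps(data)).text)

# 返回的mask为base64编码的png图片
mask_b64 = json.loads(resp["prediction"][0]["info"])["mask"]
with open('mask.png', 'wb') as fp:
    fp.write(base64.b64decode(mask_b64))
```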
## 3. 源码编译安装及搭建服务流程 (可选)
源码编译安装时间较长,一般推荐在centos7.6下安装预编译版本进行使用。如果您系统版本非centos7.6或者您想进行二次开发,请点击以下链接查看[源码编译安装流程](./COMPILE_GUIDE.md)
opencv-python
requests
numpy
# TODO:下载data数据
#if (NOT EXISTS
# ${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid/text_classification_lstm)
# execute_process(COMMAND wget
# --no-check-certificate https://paddle-serving.bj.bcebos.com/data/text_classification/text_classification_lstm.tar.gz
# --output-document
# ${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid/text_classification_lstm.tar.gz)
# execute_process(COMMAND ${CMAKE_COMMAND} -E tar xzf
# "${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid/text_classification_lstm.tar.gz"
# WORKING_DIRECTORY
# ${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid
# )
# endif()
include_directories(SYSTEM ${CMAKE_CURRENT_LIST_DIR}/../kvdb/include)
find_library(MKLML_LIBS NAMES libmklml_intel.so libiomp5.so)
include(op/CMakeLists.txt)
include(proto/CMakeLists.txt)
add_executable(seg-serving ${serving_srcs})
add_dependencies(seg-serving pdcodegen fluid_cpu_engine pdserving paddle_fluid
opencv_imgcodecs)
if (WITH_GPU)
add_dependencies(seg-serving fluid_gpu_engine)
endif()
target_include_directories(seg-serving PUBLIC
${CMAKE_CURRENT_BINARY_DIR}/../predictor
)
if(WITH_GPU)
target_link_libraries(seg-serving -Wl,--whole-archive fluid_gpu_engine
-Wl,--no-whole-archive)
endif()
target_link_libraries(seg-serving -Wl,--whole-archive fluid_cpu_engine
-Wl,--no-whole-archive)
target_link_libraries(seg-serving paddle_fluid ${paddle_depend_libs})
target_link_libraries(seg-serving opencv_imgcodecs
${opencv_depend_libs})
target_link_libraries(seg-serving pdserving)
target_link_libraries(seg-serving cube-api)
target_link_libraries(seg-serving kvdb rocksdb)
if(WITH_GPU)
target_link_libraries(seg-serving ${CUDA_LIBRARIES})
endif()
target_link_libraries(seg-serving ${MKLML_LIB} ${MKLML_IOMP_LIB} -lpthread
-lcrypto -lm -lrt -lssl -ldl -lz -lbz2)
install(TARGETS seg-serving
RUNTIME DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
install(DIRECTORY ${CMAKE_CURRENT_LIST_DIR}/conf DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
install(DIRECTORY ${CMAKE_CURRENT_LIST_DIR}/data DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
FILE(GLOB inc ${CMAKE_CURRENT_BINARY_DIR}/*.pb.h)
install(FILES ${inc}
DESTINATION ${PADDLE_SERVING_INSTALL_DIR}/include/seg-serving)
if (${WITH_MKL})
install(FILES ${THIRD_PARTY_PATH}/install/mklml/lib/libmklml_intel.so
${THIRD_PARTY_PATH}/install/mklml/lib/libmklml_gnu.so
${THIRD_PARTY_PATH}/install/mklml/lib/libiomp5.so DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
endif()
--enable_model_toolkit
--seg_conf_file=./conf/seg_conf.yaml
engines {
name: "human_segmentation"
type: "FLUID_GPU_NATIVE"
reloadable_meta: "./data/model/paddle/fluid_time_file"
reloadable_type: "timestamp_ne"
model_data_path: "./data/model/paddle/fluid/deeplabv3p_xception65_humanseg"
runtime_thread_num: 0
batch_infer_size: 0
enable_batch_align: 0
}
model_toolkit_path: "./conf/"
model_toolkit_file: "model_toolkit.prototxt"
%YAML:1.0
SIZE: [513, 513] # (width, height) crop size for test eval
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
CHANNELS: 3
CLASS_NUM: 2
MODEL_NAME: "human_segmentation"
%YAML:1.0
SIZE: [500, 500] # (width, height) crop size for test eval
MEAN: [127.5, 127.5, 127.5, 127.5]
STD: [1.0, 1.0, 1.0, 1.0]
CHANNELS: 4
CLASS_NUM: 2
MODEL_NAME: "image_segmentation"
services {
name: "ImageSegService"
workflows: "workflow1"
}
workflows {
name: "workflow1"
workflow_type: "Sequence"
nodes {
name: "image_reader_op"
type: "ReaderOp"
}
nodes {
name: "image_seg_op"
type: "ImageSegOp"
dependencies {
name: "image_reader_op"
mode: "RO"
}
}
nodes {
name: "image_writer_op"
type: "WriteJsonOp"
dependencies {
name: "image_seg_op"
mode: "RO"
}
}
}
FILE(GLOB op_srcs ${CMAKE_CURRENT_LIST_DIR}/*.cpp)
LIST(APPEND serving_srcs ${op_srcs})
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <vector>
#include "predictor/framework/infer.h"
#include "predictor/framework/memory.h"
#include "seg-serving/op/image_seg_op.h"
#include "seg-serving/op/reader_op.h"
#include "seg-serving/op/seg_conf.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::image_segmentation::ImageSegResItem;
using baidu::paddle_serving::image_segmentation::ImageSegResponse;
using baidu::paddle_serving::predictor::InferManager;
using baidu::utils::seg_conf::SegConf;
int ImageSegOp::inference() {
const ReaderOutput* reader_out =
get_depend_argument<ReaderOutput>("image_reader_op");
if (!reader_out) {
LOG(ERROR) << "Failed mutable depended argument, op:"
<< "reader_op";
return -1;
}
const TensorVector* in = &reader_out->tensors;
const std::vector<int> *width_vec = &reader_out->width_vec;
const std::vector<int> *height_vec = &reader_out->height_vec;
//debug
for(int i = 0; i < width_vec->size(); ++i){
LOG(INFO) << "width = " << (*width_vec)[i] << ", height = " << (*height_vec)[i];
}
TensorVector* out = butil::get_object<TensorVector>();
if (!out) {
    LOG(ERROR) << "Failed to get tls output object";
return -1;
}
if (in->size() != 1) {
LOG(ERROR) << "Samples should have been packed into a single tensor";
return -1;
}
int batch_size = in->at(0).shape[0];
static const SegConf *sc_ptr = SegConf::instance();
// call paddle fluid model for inferencing
std::string model_name;
sc_ptr->get_model_name(model_name);
LOG(INFO) << "model name = " << model_name;
int ret;
if ((ret = InferManager::instance().infer(
model_name.c_str(), in, out, batch_size))) {
LOG(ERROR) << "Failed do infer in fluid model: "
<< model_name;
return -1;
}
LOG(INFO) << "ret = " << ret;
if (out->size() != in->size()) {
    LOG(ERROR) << "Output size not eq input size: " << in->size()
               << " vs " << out->size();
return -1;
}
// copy output tensor into response
ImageSegResponse* res = mutable_data<ImageSegResponse>();
const paddle::PaddleTensor& out_tensor = (*out)[0];
int sample_size = out_tensor.shape[0];
uint32_t total_size = 1;
for (int i = 0; i < out_tensor.shape.size(); ++i) {
total_size *= out_tensor.shape[i];
}
LOG(INFO) << "total_size = " << total_size;
uint32_t item_size = total_size / sample_size;
for (uint32_t si = 0; si < sample_size; si++) {
ImageSegResItem* ins = res->add_item();
// res->add_width((*width_vec)[si]);
// res->add_height((*height_vec)[si]);
if (!ins) {
LOG(ERROR) << "Failed append new out tensor";
return -1;
}
// assign output data
float* data = reinterpret_cast<float*>(out_tensor.data.data() +
si * sizeof(float) * item_size);
std::vector<int> size_vec;
sc_ptr->get_size_vector(size_vec);
int width = size_vec[0];
int height = size_vec[1];
int class_num;
sc_ptr->get_class_num(class_num);
LOG(INFO) << "width = " << width << ", height = " << height << ", class_num = " << class_num;
uint32_t out_size = width * height;
mask_raw.clear();
mask_raw.resize(out_size);
for (uint32_t di = 0; di < out_size; ++di) {
float max_value = -1;
int label = 0;
for (int j = 0; j < class_num; ++j) {
int index = di + j * out_size;
if (index >= class_num * width * height) {
break;
}
float value = data[index];
if (value > max_value){
max_value = value;
label = j;
}
}
if (label == 0) max_value = 0;
mask_raw[di] = label;
}
cv::Mat mask_mat = cv::Mat(height, width, CV_8UC1);
mask_mat.data = mask_raw.data();
cv::Mat mask_temp_mat((*height_vec)[si], (*width_vec)[si], mask_mat.type());
//Size(cols, rows)
cv::resize(mask_mat, mask_temp_mat, mask_temp_mat.size());
// cv::resize(mask_mat, mask_temp_mat, cv::Size((*width_vec)[si], (*height_vec)[si]));
std::vector<uchar> mat_buff;
cv::imencode(".png", mask_temp_mat, mat_buff);
ins->set_mask(mat_buff.data(), mat_buff.size());
}
// release out tensor object resource
size_t out_size = out->size();
for (size_t oi = 0; oi < out_size; ++oi) {
(*out)[oi].shape.clear();
}
out->clear();
butil::return_object<TensorVector>(out);
return 0;
}
DEFINE_OP(ImageSegOp);
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <vector>
#include "paddle/fluid/inference/paddle_inference_api.h"
#include "seg-serving/image_seg.pb.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
// rename
static const char* IMAGE_CLASSIFICATION_MODEL_NAME =
"image_seg_deeplabv3p";
class ImageSegOp : public baidu::paddle_serving::predictor::OpWithChannel<
baidu::paddle_serving::image_segmentation::
ImageSegResponse> {
public:
typedef std::vector<paddle::PaddleTensor> TensorVector;
DECLARE_OP(ImageSegOp);
int inference();
private:
std::vector<unsigned char> mask_raw;
};
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <algorithm>
#include "predictor/framework/memory.h"
#include "seg-serving/op/reader_op.h"
#include "seg-serving/op/seg_conf.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::predictor::MempoolWrapper;
using baidu::paddle_serving::image_segmentation::Request;
using baidu::paddle_serving::image_segmentation::ImageSegReqItem;
using baidu::utils::seg_conf::SegConf;
int ReaderOp::inference() {
const Request* req = dynamic_cast<const Request*>(get_request_message());
// LOG(INFO) << "Receive request in dense service:" << req->ShortDebugString();
ReaderOutput* res = mutable_data<ReaderOutput>();
if (!res) {
LOG(ERROR) << "Failed get op tls reader object output";
return -1;
}
TensorVector* in = &res->tensors;
uint32_t batch_size = req->instances_size();
if (batch_size <= 0) {
LOG(WARNING) << "No instances need to inference!";
return -1;
}
static const SegConf *sc_ptr = SegConf::instance();
std::vector<double> pmean;
if(sc_ptr->get_mean_vector(pmean) != 0) {
LOG(ERROR) << "Can't load the mean items";
return -1;
}
std::vector<double> scale;
if(sc_ptr->get_std_vector(scale) != 0) {
LOG(ERROR) << "Can't load the scale items";
return -1;
}
std::vector<int> iresize;
if(sc_ptr->get_size_vector(iresize) != 0) {
LOG(ERROR) << "Can't load size vector";
return -1;
}
int channels;
if(sc_ptr->get_channels(channels) != 0) {
LOG(ERROR) << "Can't load channels";
return -1;
}
//bool enable_crop = SegConf._enable_crop;
cv::Size resize;
resize.height = iresize[1];
resize.width = iresize[0];
paddle::PaddleTensor in_tensor;
in_tensor.name = "image";
in_tensor.dtype = paddle::FLOAT32;
// shape assignment
in_tensor.shape.push_back(batch_size); // batch_size
in_tensor.shape.push_back(channels);
in_tensor.shape.push_back(resize.height);
in_tensor.shape.push_back(resize.width);
// tls resource assignment
size_t dense_capacity = channels * resize.width * resize.height;
size_t len = dense_capacity * sizeof(float) * batch_size;
// Allocate buffer in PaddleTensor, so that buffer will be managed by the
// Tensor
in_tensor.data.Resize(len);
float* data = reinterpret_cast<float*>(in_tensor.data.data());
if (in_tensor.data.data() == NULL) {
LOG(ERROR) << "Failed create temp float array, "
<< "size=" << dense_capacity * batch_size * sizeof(float);
return -1;
}
std::vector<int> *in_width_vec = &res->width_vec;
std::vector<int> *in_height_vec = &res->height_vec;
for (uint32_t si = 0; si < batch_size; si++) {
// parse image object from x-image
const ImageSegReqItem& ins = req->instances(si);
// read dense image from request bytes
const char* binary = ins.image_binary().c_str();
size_t length = ins.image_length();
if (length == 0) {
LOG(ERROR) << "Empty image, length is 0";
return -1;
}
_image_vec_tmp.clear();
_image_vec_tmp.assign(binary, binary + length);
_image_8u_tmp = cv::imdecode(cv::Mat(_image_vec_tmp),
CV_LOAD_IMAGE_UNCHANGED);
if (_image_8u_tmp.data == NULL) {
LOG(ERROR) << "Image decode failed!";
return -1;
}
// accumulate length
const int HH = _image_8u_tmp.rows;
const int WW = _image_8u_tmp.cols;
const int CC = _image_8u_tmp.channels();
    // 记录原图尺寸: WW为宽(cols), HH为高(rows)
    in_width_vec->push_back(WW);
    in_height_vec->push_back(HH);
// resize/crop
if (_image_8u_tmp.cols != resize.width ||
_image_8u_tmp.rows != resize.height) {
// int short_egde = std::min<int>(_image_8u_tmp.cols, _image_8u_tmp.rows);
// int yy = static_cast<int>((_image_8u_tmp.rows - short_egde) / 2);
// int xx = static_cast<int>((_image_8u_tmp.cols - short_egde) / 2);
// _image_8u_tmp =
// cv::Mat(_image_8u_tmp, cv::Rect(xx, yy, short_egde, short_egde));
// if (_image_8u_tmp.cols != resize.width ||
// _image_8u_tmp.rows != resize.height) {
cv::Mat resize_image;
// cv::resize(_image_8u_tmp, resize_image, resize);
// _image_8u_tmp = resize_image;
// }
//
cv::resize(_image_8u_tmp, resize_image, resize);
_image_8u_tmp = resize_image;
LOG(INFO) << "Succ crop one image[CHW=" << _image_8u_tmp.channels()
<< ", " << _image_8u_tmp.cols << ", " << _image_8u_tmp.rows
<< "]"
<< " from image[CHW=" << CC << ", " << HH << ", " << WW << "]";
}
// BGR->RGB transformer
//cv::cvtColor(_image_8u_tmp, _image_8u_rgb, cv::COLOR_GRAY2BGR);
_image_8u_rgb = _image_8u_tmp;
const int H = _image_8u_rgb.rows;
const int W = _image_8u_rgb.cols;
const int C = _image_8u_rgb.channels();
if (H != resize.height || W != resize.width || C != channels) {
      LOG(ERROR) << "Image " << si << " has incompatible size (" << H << ", " << W << ", " << C << ")";
return -1;
}
LOG(INFO) << "Succ read one image, C: " << C << ", W: " << W
<< ", H: " << H;
float* data_ptr = data + dense_capacity * si;
for (int h = 0; h < H; h++) {
// p points to a new line
unsigned char* p = _image_8u_rgb.ptr<unsigned char>(h);
for (int w = 0; w < W; w++) {
for (int c = 0; c < C; c++) {
          // HWC(row, column, channel) -> CHW
data_ptr[W * H * c + W * h + w] =
(p[C * w + c] - pmean[c]) / scale[c];
          // HWC -> CWH (未启用)
//data_ptr[W * H * c + w * H + h] =
// (p[C * w + c] - pmean[c]) / scale[c];
}
}
}
}
in->push_back(in_tensor);
return 0;
}
DEFINE_OP(ReaderOp);
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
// stl
#include <string>
#include <vector>
// paddle inference
#include "paddle/fluid/inference/paddle_inference_api.h"
#include "predictor/builtin_format.pb.h"
#include "predictor/common/inner_common.h"
#include "predictor/framework/channel.h"
#include "predictor/framework/op_repository.h"
#include "predictor/op/op.h"
// opencv
#include "opencv/cv.h"
#include "opencv/cv.hpp"
#include "opencv/cxcore.h"
#include "opencv/highgui.h"
// project related
#include "seg-serving/image_seg.pb.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
struct ReaderOutput {
std::vector<paddle::PaddleTensor> tensors;
std::vector<int> width_vec;
std::vector<int> height_vec;
void Clear() {
size_t tensor_count = tensors.size();
for (size_t ti = 0; ti < tensor_count; ++ti) {
tensors[ti].shape.clear();
}
tensors.clear();
width_vec.clear();
height_vec.clear();
}
std::string ShortDebugString() const { return "Not implemented!"; }
};
class ReaderOp
: public baidu::paddle_serving::predictor::OpWithChannel<ReaderOutput> {
public:
typedef std::vector<paddle::PaddleTensor> TensorVector;
DECLARE_OP(ReaderOp);
int inference();
private:
cv::Mat _image_8u_tmp;
cv::Mat _image_8u_rgb;
std::vector<char> _image_vec_tmp;
};
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
#include "seg_conf.h"
DEFINE_string(seg_conf_file, "conf/seg_conf.yaml", "seg configuration filename");
namespace baidu{
namespace utils{
namespace seg_conf{
SegConf::SegConf(const std::string &configuration_filename) {
std::cout << "filename: " << configuration_filename << std::endl;
try{
if(!_seg_conf_file.open(configuration_filename, cv::FileStorage::READ)){
std::cout << "Configuration file open error!" << std::endl;
}
} catch(...){
std::cout << "error" << std::endl;
}
}
SegConf::~SegConf(){
_seg_conf_file.release();
}
bool SegConf::get_item_by_name(const std::string &conf_node_name, cv::FileNode &return_file_node) const{
return_file_node = _seg_conf_file[conf_node_name];
if(return_file_node.isNone()) {
        std::cout << "You haven't configured this item" << std::endl;
return false;
}
return true;
}
int SegConf::get_mean_vector(std::vector<double> &mean_vec) const {
return get_array_from_file_node("MEAN", mean_vec);
}
int SegConf::get_std_vector(std::vector<double> &std_vec) const{
return get_array_from_file_node("STD", std_vec);
}
int SegConf::get_size_vector(std::vector<int> &size_vec) const{
return get_array_from_file_node("SIZE", size_vec);
}
int SegConf::get_channels(int &channels) const{
return get_scalar_from_file_node("CHANNELS", channels);
}
int SegConf::get_class_num(int &class_num) const {
return get_scalar_from_file_node("CLASS_NUM", class_num);
}
int SegConf::get_model_name(std::string &name) const {
return get_scalar_from_file_node("MODEL_NAME", name);
}
const SegConf* SegConf::instance() {
//lock
static const SegConf s_seg_conf_instance(FLAGS_seg_conf_file);
return &s_seg_conf_instance;
}
//SegConf SegConf::s_seg_conf_instance(FLAGS_seg_conf_file);
} //seg_conf
} //utils
} //baidu
#ifndef SRC_SEG_CONF_H
#define SRC_SEG_CONF_H
#include <string>
#include <vector>
#include "opencv2/opencv.hpp"
#include "gflags/gflags.h"
DECLARE_string(seg_conf_file);
namespace baidu{
namespace utils{
namespace seg_conf{
class SegConf{
private:
explicit SegConf(const std::string &configuration_filename);
public:
static const SegConf *instance();
bool get_item_by_name(const std::string &conf_node_name, cv::FileNode &return_file_node) const;
int get_mean_vector(std::vector<double> &mean_vec) const;
int get_std_vector(std::vector<double> &std_vec) const;
int get_size_vector(std::vector<int> &size_vec) const;
int get_channels(int &channels) const;
int get_class_num(int &class_num) const;
int get_model_name(std::string &name) const;
~SegConf();
private:
cv::FileStorage _seg_conf_file;
//static SegConf s_seg_conf_instance;
template <typename T>
int get_array_from_file_node(const std::string &conf_node_name, std::vector<T> &vec) const{
cv::FileNode node;
        if(!get_item_by_name(conf_node_name, node) || !node.isSeq()) {
return -1;
}
//node >> vec;
cv::FileNodeIterator start_file_node_iter = node.begin();
cv::FileNodeIterator end_file_node_iter = node.end();
for(cv::FileNodeIterator it = start_file_node_iter; it != end_file_node_iter; ++it) {
vec.push_back(static_cast<T>(*it));
}
return 0;
}
template<typename T>
int get_scalar_from_file_node(const std::string &conf_node_name, T &scalar) const{
cv::FileNode node;
        if(!get_item_by_name(conf_node_name, node) || !(node.isReal() || node.isInt() || node.isString())) {
return -1;
}
node >> scalar;
return 0;
}
};
} //seg_conf
} //utils
} //baidu
#endif
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <string>
#include <google/protobuf/text_format.h>
#include "predictor/framework/memory.h"
#include "json2pb/pb_to_json.h"
#include "seg-serving/op/write_json_op.h"
namespace baidu {
namespace paddle_serving {
namespace predictor {
using json2pb::ProtoMessageToJson;
using baidu::paddle_serving::image_segmentation::ImageSegResponse;
using baidu::paddle_serving::image_segmentation::ResponseItem;
using baidu::paddle_serving::image_segmentation::Response;
int WriteJsonOp::inference() {
const ImageSegResponse* seg_out =
get_depend_argument<ImageSegResponse>("image_seg_op");
if (!seg_out) {
LOG(ERROR) << "Failed mutable depended argument, op:"
<< "image_seg_op";
return -1;
}
Response* res = mutable_data<Response>();
if (!res) {
LOG(ERROR) << "Failed mutable output response in op:"
<< "WriteJsonOp";
return -1;
}
// transfer seg output message into json format
std::string err_string;
uint32_t batch_size = seg_out->item_size();
LOG(INFO) << "batch_size = " << batch_size;
LOG(INFO) << seg_out->ShortDebugString();
for (uint32_t si = 0; si < batch_size; si++) {
ResponseItem* ins = res->add_prediction();
//LOG(INFO) << "Original image width = " << seg_out->width(si) << ", height = " << seg_out->height(si);
if (!ins) {
LOG(ERROR) << "Failed add one prediction ins";
return -1;
}
std::string* text = ins->mutable_info();
if (!ProtoMessageToJson(seg_out->item(si), text, &err_string)) {
LOG(ERROR) << "Failed convert message["
<< seg_out->item(si).ShortDebugString()
<< "], err: " << err_string;
return -1;
}
}
return 0;
}
DEFINE_OP(WriteJsonOp);
} // namespace predictor
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "predictor/common/inner_common.h"
#include "predictor/framework/channel.h"
#include "predictor/framework/op_repository.h"
#include "predictor/op/op.h"
#include "seg-serving/image_seg.pb.h"
namespace baidu {
namespace paddle_serving {
namespace predictor {
class WriteJsonOp
: public OpWithChannel<
baidu::paddle_serving::image_segmentation::Response> {
public:
DECLARE_OP(WriteJsonOp);
int inference();
};
} // namespace predictor
} // namespace paddle_serving
}  // namespace baidu
LIST(APPEND protofiles
${CMAKE_CURRENT_LIST_DIR}/image_seg.proto
)
PROTOBUF_GENERATE_SERVING_CPP(TRUE PROTO_SRCS PROTO_HDRS ${protofiles})
LIST(APPEND serving_srcs ${PROTO_SRCS})
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
import "pds_option.proto";
package baidu.paddle_serving.image_segmentation;
option cc_generic_services = true;
message ImageSegReqItem {
required bytes image_binary = 1;
required uint32 image_length = 2;
};
message ImageSegResItem {
required bytes mask = 1;
};
message ImageSegResponse {
repeated ImageSegResItem item = 1;
// repeated int32 width = 2;
// repeated int32 height = 3;
};
message Request {
repeated ImageSegReqItem instances = 1;
};
message ResponseItem {
required string info = 1;
};
message Response {
repeated ResponseItem prediction = 1;
};
service ImageSegService {
rpc inference(Request) returns (Response);
rpc debug(Request) returns (Response);
option (pds.options).generate_impl = true;
};
/home/work/image-class/bin/image_class --workflow_path=/home/work/image-class/conf/ --inferservice_path=/home/work/image-class/conf/ --logger_path=/home/work/image-class/conf/ --resource_path=/home/work/image-class/conf/
# coding: utf-8
import sys
import cv2
import requests
import json
import base64
import numpy as np
import time
import threading
#分割服务的地址
#IMAGE_SEG_URL = 'http://yq01-gpu-151-23-00.epc:8010/ImageSegService/inference'
#IMAGE_SEG_URL = 'http://106.12.25.202:8010/ImageSegService/inference'
IMAGE_SEG_URL = 'http://180.76.118.53:8010/ImageSegService/inference'
# 请求预测服务
# input_img 要预测的图片列表
def get_item_json(input_img):
with open(input_img, mode="rb") as fp:
# 使用 http 协议请求服务时, 请使用 base64 编码发送图片
item_binary_b64 = str(base64.b64encode(fp.read()), 'utf-8')
item_size = len(item_binary_b64)
item_json = {
"image_length": item_size,
"image_binary": item_binary_b64
}
return item_json
def request_predictor_server(input_img_list, dir_name):
data = {"instances" : [get_item_json(dir_name + input_img) for input_img in input_img_list]}
response = requests.post(IMAGE_SEG_URL, data=json.dumps(data))
try:
response = json.loads(response.text)
prediction_list = response["prediction"]
mask_response_list = [mask_response["info"] for mask_response in prediction_list]
mask_raw_list = [json.loads(mask_response)["mask"] for mask_response in mask_response_list]
except Exception as err:
print ("Exception[%s], server_message[%s]" % (str(err), response.text))
return None
# 使用 json 协议回复的包也是 base64 编码过的
mask_binary_list = [base64.b64decode(mask_raw) for mask_raw in mask_raw_list]
m = [np.fromstring(mask_binary, np.uint8) for mask_binary in mask_binary_list]
return m
# 对预测结果进行可视化
# input_raw_mask 是server返回的预测结果
# output_img 是可视化结果存储路径
def visualization(mask_mat, output_img):
# ColorMap for visualization more clearly
color_map = [[128, 64, 128],
[244, 35, 231],
[69, 69, 69],
[102, 102, 156],
[190, 153, 153],
[153, 153, 153],
[250, 170, 29],
[219, 219, 0],
[106, 142, 35],
[152, 250, 152],
[69, 129, 180],
[219, 19, 60],
[255, 0, 0],
[0, 0, 142],
[0, 0, 69],
[0, 60, 100],
[0, 79, 100],
[0, 0, 230],
[119, 10, 32]]
    im = cv2.imdecode(mask_mat, 1)
    # im.shape为(行数, 列数, 通道数), 即(h, w, c)
    h, w, c = im.shape
    for i in range(0, h):
        for j in range(0, w):
            im[i, j] = color_map[im[i, j, 0]]
cv2.imwrite(output_img, im)
#benchmark test
def benchmark_test(batch_size, img_list):
start = time.time()
total_size = len(img_list)
for i in range(0, total_size, batch_size):
mask_mat_list = request_predictor_server(img_list[i : np.min([i + batch_size, total_size])], "images/")
# 将获得的mask matrix转换成可视化图像,并在当前目录下保存为图像文件
# 如果进行压测,可以把这句话注释掉
# for j in range(len(mask_mat_list)):
# visualization(mask_mat_list[j], img_list[j + i])
latency = time.time() - start
print("batch size = %d, total latency = %f s" % (batch_size, latency))
class ClientThread(threading.Thread):
def __init__(self, thread_id, batch_size):
threading.Thread.__init__(self)
self.__thread_id = thread_id
self.__batch_size = batch_size
def run(self):
self.request_image_seg_service(3)
def request_image_seg_service(self, imgs_num):
total_size = imgs_num
img_list = [str(i + 1) + ".jpg" for i in range(total_size)]
# batch_size_list = [2**i for i in range(0, 4)]
# 持续发送150个请求
batch_size_list = [self.__batch_size] * 150
i = 1
for batch_size in batch_size_list:
print("Epoch %d, thread %d" % (i, self.__thread_id))
i += 1
benchmark_test(batch_size, img_list)
def create_thread_pool(thread_num, batch_size):
return [ClientThread(i + 1, batch_size) for i in range(thread_num)]
def run_threads(thread_pool):
for thread in thread_pool:
thread.start()
for thread in thread_pool:
thread.join()
if __name__ == "__main__":
thread_pool = create_thread_pool(thread_num=2, batch_size=1)
run_threads(thread_pool)
EVAL_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (1536, 576) # (width, height), for unpadding
INF_RESIZE_VALUE: 1280 # for rangescaling
MAX_RESIZE_VALUE: 1024 # for rangescaling
MIN_RESIZE_VALUE: 1536 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 1
MEAN: [127.5, 127.5, 127.5]
STD: [127.5, 127.5, 127.5]
DATASET:
DATA_DIR: "./dataset/line/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: "./dataset/line/test_list.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
SAVE_DIR: "line_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "mobilenet"
TEST:
TEST_MODEL: "./test/models/line/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/line_v4/"
PRETRAINED_MODEL: "./models/deeplabv3p_mobilenetv2_init/"
RESUME: False
SNAPSHOT_EPOCH: 40
SOLVER:
LR: 0.01
LR_POLICY: "poly"
OPTIMIZER: "sgd"
SNAPSHOT: 10
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
DATA_DIR: "./dataset/cityscapes/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 19
TEST_FILE_LIST: "dataset/cityscapes/val.list"
TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
VAL_FILE_LIST: "dataset/cityscapes/val.list"
VIS_FILE_LIST: "dataset/cityscapes/vis.list"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
MODEL:
DEFAULT_NORM_TYPE: "gn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
ASPP_WITH_SEP_CONV: True
DECODER_USE_SEP_CONV: True
TEST:
TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/cityscape_v5/"
PRETRAINED_MODEL: "pretrain/deeplabv3plus_gn_init"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 700
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (513, 513) # (width, height), for unpadding
INF_RESIZE_VALUE: 513 # for rangescaling
MAX_RESIZE_VALUE: 400 # for rangescaling
MIN_RESIZE_VALUE: 513 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: True
ASPECT_RATIO: 0
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 45
MIN_AREA_RATIO: 0
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 24
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: u"./data/humanseg/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: u"data/humanseg/list/val.txt"
TRAIN_FILE_LIST: u"data/humanseg/list/train.txt"
VAL_FILE_LIST: u"data/humanseg/list/val.txt"
IGNORE_INDEX: 255
SEPARATOR: "|"
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
SAVE_DIR: "human_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: u"bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "xception_65"
TEST:
TEST_MODEL: "snapshots/humanseg/aic_v2/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/humanseg/aic_v2/"
PRETRAINED_MODEL: "pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.1
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "sgd"
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 10
#MEAN: [104.008, 116.669, 122.675]
#STD: [1.0, 1.0, 1.0]
MEAN: [127.5, 127.5, 127.5]
STD: [127.5, 127.5, 127.5]
DATASET:
DATA_DIR: "./data/COCO2014/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 21
TEST_FILE_LIST: "data/COCO2014/ImageSets/val.txt"
TRAIN_FILE_LIST: "data/COCO2014/ImageSets/train.txt"
VAL_FILE_LIST: "data/COCO2014/ImageSets/val.txt"
SEPARATOR: "|"
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "unet"
UNET:
UPSAMPLE_MODE: "bilinear"
TEST:
TEST_MODEL: "snapshots/coco_v1/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/coco_v1/"
PRETRAINED_MODEL: ""
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.025
WEIGHT_DECAY: 0.00004
NUM_EPOCHS: 50
LR_POLICY: "piecewise"
OPTIMIZER: "Adam"
DECAY_EPOCH: "20,35,45"
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (512, 512) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 1.25 # for stepscaling
MIN_SCALE_FACTOR: 0.75 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 6
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./dataset/pet/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 4 # including ignore
TEST_FILE_LIST: "./dataset/pet/test_list.txt"
TRAIN_FILE_LIST: "./dataset/pet/train_list.txt"
VAL_FILE_LIST: "./dataset/pet/val_list.txt"
VIS_FILE_LIST: "./dataset/pet/val_list.txt"
IGNORE_INDEX: 255
SEPARATOR: " "
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
MODEL:
MODEL_NAME: "unet"
DEFAULT_NORM_TYPE: "bn"
TEST:
TEST_MODEL: "./test/saved_model/unet_pet/final/"
TRAIN:
MODEL_SAVE_DIR: "./test/saved_models/unet_pet/"
PRETRAINED_MODEL: "./test/models/unet_coco/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
NUM_EPOCHS: 500
LR: 0.005
LR_POLICY: "poly"
OPTIMIZER: "adam"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from test_utils import download_file_and_uncompress, train, eval, vis, export_model
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
DATASET_PATH = os.path.join(LOCAL_PATH, "..", "dataset")
MODEL_PATH = os.path.join(LOCAL_PATH, "models")
def download_cityscapes_dataset(savepath, extrapath):
url = "https://paddleseg.bj.bcebos.com/dataset/cityscapes.tar"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
def download_deeplabv3p_xception65_cityscapes_model(savepath, extrapath):
url = "https://paddleseg.bj.bcebos.com/models/deeplabv3p_xception65_cityscapes.tgz"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
if __name__ == "__main__":
download_cityscapes_dataset(".", DATASET_PATH)
download_deeplabv3p_xception65_cityscapes_model(".", MODEL_PATH)
model_name = "deeplabv3p_xception65_cityscapes"
test_model = os.path.join(LOCAL_PATH, "models", model_name)
cfg = os.path.join(LOCAL_PATH, "configs", "{}.yaml".format(model_name))
freeze_save_dir = os.path.join(LOCAL_PATH, "inference_model", model_name)
vis_dir = os.path.join(LOCAL_PATH, "visual", model_name)
saved_model = os.path.join(LOCAL_PATH, "saved_model", model_name)
devices = ['0']
export_model(
flags=["--cfg", cfg],
options=[
"TEST.TEST_MODEL", test_model, "FREEZE.SAVE_DIR", freeze_save_dir
],
devices=devices)
# Final eval results should be #image=500 acc=0.9615 IoU=0.7804
eval(
flags=["--cfg", cfg, "--use_gpu"],
options=["TEST.TEST_MODEL", test_model],
devices=devices)
vis(flags=["--cfg", cfg, "--use_gpu", "--local_test", "--vis_dir", vis_dir],
options=["TEST.TEST_MODEL", test_model],
devices=devices)
train(
flags=["--cfg", cfg, "--use_gpu", "--log_steps", "10"],
options=[
"SOLVER.NUM_EPOCHS", "1", "TRAIN.PRETRAINED_MODEL", test_model,
"TRAIN.MODEL_SAVE_DIR", saved_model
],
devices=devices)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from test_utils import download_file_and_uncompress, train, eval, vis, export_model
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
DATASET_PATH = os.path.join(LOCAL_PATH, "..", "dataset")
MODEL_PATH = os.path.join(LOCAL_PATH, "models")
def download_pet_dataset(savepath, extrapath):
url = "https://paddleseg.bj.bcebos.com/dataset/mini_pet.zip"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
def download_unet_coco_model(savepath, extrapath):
url = "https://bj.bcebos.com/v1/paddleseg/models/unet_coco_init.tgz"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
if __name__ == "__main__":
download_pet_dataset(LOCAL_PATH, DATASET_PATH)
download_unet_coco_model(LOCAL_PATH, MODEL_PATH)
model_name = "unet_pet"
test_model = os.path.join(LOCAL_PATH, "models", "unet_coco_init")
cfg = os.path.join(LOCAL_PATH, "..", "configs",
"{}.yaml".format(model_name))
freeze_save_dir = os.path.join(LOCAL_PATH, "inference_model", model_name)
vis_dir = os.path.join(LOCAL_PATH, "visual", model_name)
saved_model = os.path.join(LOCAL_PATH, "saved_model", model_name)
devices = ['0']
train(
flags=["--cfg", cfg, "--use_gpu", "--log_steps", "10"],
options=[
"SOLVER.NUM_EPOCHS", "1", "TRAIN.PRETRAINED_MODEL", test_model,
"TRAIN.MODEL_SAVE_DIR", saved_model, "DATASET.TRAIN_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"train_list.txt"), "DATASET.VAL_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"val_list.txt"), "DATASET.TEST_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"test_list.txt"), "DATASET.DATA_DIR",
os.path.join(DATASET_PATH, "mini_pet"), "BATCH_SIZE", "1"
],
devices=devices)
eval(
flags=["--cfg", cfg, "--use_gpu"],
options=[
"TEST.TEST_MODEL",
os.path.join(saved_model, "final"), "DATASET.VAL_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list", "val_list.txt"),
"DATASET.DATA_DIR",
os.path.join(DATASET_PATH, "mini_pet")
],
devices=devices)
vis(flags=["--cfg", cfg, "--use_gpu", "--local_test", "--vis_dir", vis_dir],
options=[
"DATASET.TEST_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"test_list.txt"), "DATASET.DATA_DIR",
os.path.join(DATASET_PATH, "mini_pet"), "TEST.TEST_MODEL",
os.path.join(saved_model, "final")
],
devices=devices)
export_model(
flags=["--cfg", cfg],
options=[
"TEST.TEST_MODEL",
os.path.join(saved_model, "final"), "FREEZE.SAVE_DIR",
freeze_save_dir
],
devices=devices)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import platform
import shutil
import sys
import tarfile
import time
import zipfile

import requests
lasttime = time.time()
FLUSH_INTERVAL = 0.1
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
PDSEG_PATH = os.path.join(LOCAL_PATH, "..", "pdseg")
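# Shared helpers for the local smoke tests: download/unpack test assets and
# shell out to the pdseg command-line scripts (train/eval/vis/export_model)
# located under PDSEG_PATH.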
def get_platform():
return platform.platform()
def is_windows():
return get_platform().lower().startswith("windows")
def progress(msg, end=False):
    # Print an in-place progress message, throttled to one refresh per
    # FLUSH_INTERVAL seconds; end=True appends a newline and resets the timer.
    global lasttime
    if end:
        msg += "\n"
        lasttime = 0
    if time.time() - lasttime >= FLUSH_INTERVAL:
        sys.stdout.write("\r%s" % msg)
        lasttime = time.time()
        sys.stdout.flush()
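# Stream `url` to `savepath` in 4 KB chunks, drawing a 50-character progress bar
# when the server reports a content-length.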
def _download_file(url, savepath, print_progress):
r = requests.get(url, stream=True)
total_length = r.headers.get('content-length')
if total_length is None:
with open(savepath, 'wb') as f:
shutil.copyfileobj(r.raw, f)
else:
with open(savepath, 'wb') as f:
dl = 0
total_length = int(total_length)
starttime = time.time()
if print_progress:
print("Downloading %s" % os.path.basename(savepath))
for data in r.iter_content(chunk_size=4096):
dl += len(data)
f.write(data)
if print_progress:
done = int(50 * dl / total_length)
progress("[%-50s] %.2f%%" %
('=' * done, float(dl / total_length * 100)))
if print_progress:
progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
def _uncompress_file(filepath, extrapath, delete_file, print_progress):
if print_progress:
print("Uncompress %s" % os.path.basename(filepath))
if filepath.endswith("zip"):
handler = _uncompress_file_zip
else:
handler = _uncompress_file_tar
for total_num, index in handler(filepath, extrapath):
if print_progress:
done = int(50 * float(index) / total_num)
progress(
"[%-50s] %.2f%%" % ('=' * done, float(index / total_num * 100)))
if print_progress:
progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
if delete_file:
os.remove(filepath)
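# Generators that extract archive members one by one and yield
# (total_num, index) pairs so the caller can render a progress bar.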
def _uncompress_file_zip(filepath, extrapath):
files = zipfile.ZipFile(filepath, 'r')
filelist = files.namelist()
total_num = len(filelist)
for index, file in enumerate(filelist):
files.extract(file, extrapath)
yield total_num, index
files.close()
yield total_num, index
def _uncompress_file_tar(filepath, extrapath):
files = tarfile.open(filepath, "r:gz")
filelist = files.getnames()
total_num = len(filelist)
for index, file in enumerate(filelist):
files.extract(file, extrapath)
yield total_num, index
files.close()
yield total_num, index
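# Public helper: download an archive and unpack it. The download and extraction
# are skipped if the extracted directory already exists, unless cover=True
# forces a fresh copy.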
def download_file_and_uncompress(url,
savepath=None,
extrapath=None,
print_progress=True,
cover=False,
delete_file=True):
if savepath is None:
savepath = "."
if extrapath is None:
extrapath = "."
savename = url.split("/")[-1]
savepath = os.path.join(savepath, savename)
extraname = ".".join(savename.split(".")[:-1])
extraname = os.path.join(extrapath, extraname)
    if cover:
        if os.path.exists(savepath):
            # savepath points at the downloaded archive file, so remove it with
            # os.remove rather than shutil.rmtree (which only works on directories).
            os.remove(savepath)
        if os.path.exists(extraname):
            shutil.rmtree(extraname)
if not os.path.exists(extraname):
if not os.path.exists(savepath):
_download_file(url, savepath, print_progress)
_uncompress_file(savepath, extrapath, delete_file, print_progress)
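# Build and run one pdseg command-line call. The resulting command looks like
# (illustrative example):
#   export CUDA_VISIBLE_DEVICES=0 && python <PDSEG_PATH>/train.py --cfg xxx.yaml --use_gpu KEY VALUE ...
# On Windows `set` is used instead of `export`. `flags` are passed through
# verbatim and `options` are flattened KEY VALUE pairs.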
def _pdseg(command, flags, options, devices):
script = "{}{}{}.py".format(PDSEG_PATH, os.sep, command)
flags = " ".join(flags)
options = " ".join(options)
if is_windows():
set_cuda_command = "set CUDA_VISIBLE_DEVICES={}".format(
",".join(devices))
else:
set_cuda_command = "export CUDA_VISIBLE_DEVICES={}".format(
",".join(devices))
cmd = "{} && python {} {} {}".format(set_cuda_command, script, flags,
options)
print(cmd)
os.system(cmd)
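# Thin wrappers that invoke the corresponding pdseg command-line entry point.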
def train(flags, options, devices):
_pdseg("train", flags, options, devices)
def eval(flags, options, devices):
_pdseg("eval", flags, options, devices)
def vis(flags, options, devices):
_pdseg("vis", flags, options, devices)
def export_model(flags, options, devices):
_pdseg("export_model", flags, options, devices)