Commit fdfa761a authored by chenguowei01

Merge branch 'develop' of https://github.com/wuyefeilin/PaddleSeg into develop

language: python
python:
- '3.5'
- '3.6'
......
# PaddleSeg

[![Build Status](https://travis-ci.org/PaddlePaddle/PaddleSeg.svg?branch=master)](https://travis-ci.org/PaddlePaddle/PaddleSeg)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
[![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleSeg.svg)](https://github.com/PaddlePaddle/PaddleSeg/releases)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

## Introduction

PaddleSeg is an end-to-end image segmentation development kit built on [PaddlePaddle](https://www.paddlepaddle.org.cn), covering mainstream segmentation networks such as DeepLabv3+, U-Net, ICNet, PSPNet, HRNet, and Fast-SCNN. With a modular design and configuration-driven model composition, it helps developers complete the whole image segmentation pipeline from training to deployment more easily.
- [Features](#features)
- [Installation](#installation)
......
- [Changelog](#changelog)
- [Contributing](#contributing)
## Features

- **Rich data augmentation**

......

Industrial-grade deployment is fully supported on both the **server** and **mobile** sides. Relying on Paddle's high-performance inference engine and high-performance image processing implementations, developers can easily deploy and integrate high-performance segmentation models. With [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite), lightweight, high-performance human segmentation models can be deployed on mobile and embedded devices.

- **Industry practice cases**

PaddleSeg offers a rich set of industry practice cases, such as [human segmentation](./contrib/HumanSeg), [industrial meter segmentation](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib#%E5%B7%A5%E4%B8%9A%E8%A1%A8%E7%9B%98%E5%88%86%E5%89%B2), [remote sensing segmentation](./contrib/RemoteSensing), [human parsing](contrib/ACE2P), and [industrial quality inspection](https://aistudio.baidu.com/aistudio/projectdetail/184392), helping developers bring image segmentation technology to production more easily.
## Installation

### 1. Install PaddlePaddle

Version requirements:
* PaddlePaddle >= 1.7.0
* Python >= 3.5+

Image segmentation models are computationally expensive, so we recommend using PaddleSeg with the GPU version of PaddlePaddle.

```
......
cd PaddleSeg
pip install -r requirements.txt
```
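
After installation, you can sanity-check the environment (a quick check using PaddlePaddle 1.x's built-in self-test; adjust if your version differs):

```
import paddle.fluid as fluid

# Runs a small program to verify that the installation works (GPU/CPU).
fluid.install_check.run_check()
```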
## Tutorials

We provide a series of tutorials covering how to use PaddleSeg to train, evaluate, and deploy semantic segmentation models.

......

|Human segmentation|[Try it online](https://aistudio.baidu.com/aistudio/projectdetail/188833)|
|PaddleSeg vertical-domain models|[Try it online](https://aistudio.baidu.com/aistudio/projectdetail/226710)|
## FAQ

#### Q: When installing the dependencies from requirements.txt, some packages cannot be found?

......

A: Lower the batch size or use the Group Norm strategy. Note that when `DEFAULT_NORM_TYPE` is set to `bn` during training, the batch size must be >= 2 for Batch Norm to be computed stably; an example override follows below.
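
For example, using PaddleSeg's command-line config overrides (illustrative values; `BATCH_SIZE` and `MODEL.DEFAULT_NORM_TYPE` are standard config keys, but adjust the config path to your setup):

```
python pdseg/train.py --cfg xxx.yaml BATCH_SIZE 2 MODEL.DEFAULT_NORM_TYPE gn
```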
#### Q: Error: ModuleNotFoundError: No module named 'paddle.fluid.contrib.mixed_precision'

A: Please upgrade PaddlePaddle to version 1.5.2 or above.
## Communication and Feedback

* You are welcome to submit questions, reports, and suggestions via [Github Issues](https://github.com/PaddlePaddle/PaddleSeg/issues)
* WeChat official account: 飞桨PaddlePaddle
* QQ group: 703252161

<p align="center"><img width="200" height="200" src="https://user-images.githubusercontent.com/45189361/64117959-1969de80-cdc9-11e9-84f7-e1c2849a004c.jpeg"/>&#8194;&#8194;&#8194;&#8194;&#8194;<img width="200" height="200" margin="500" src="./docs/imgs/qq_group2.png"/></p>
<p align="center"> &#8194;&#8194;&#8194;WeChat official account&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;Official QQ group</p>
## Changelog

* 2020.05.12

  **`v0.5.0`**
  * Fully upgraded the [HumanSeg human segmentation models](./contrib/HumanSeg): added the ultra-lightweight HumanSeg-lite model for real-time human segmentation on mobile devices, and provided optical-flow-based video post-processing for smoother segmentation.
  * Added a [meteorological remote sensing segmentation solution](./contrib/RemoteSensing) supporting scenarios such as snow recognition and cloud detection.
  * Added [Lovasz Loss](docs/lovasz_loss.md) to address class imbalance in datasets.
  * Adopted VisualDL 2.0 as the training visualization tool.

* 2020.02.25

  **`v0.4.0`**
  * Added Fast-SCNN, a segmentation network suited to real-time scenarios that requires no pretrained model, with one Cityscapes-based [pretrained model](./docs/model_zoo.md).
  * Added the LaneNet lane detection network with one [pretrained model](https://github.com/PaddlePaddle/PaddleSeg/tree/release/v0.4.0/contrib/LaneNet#%E4%B8%83-%E5%8F%AF%E8%A7%86%E5%8C%96).
  * Added PaddleSlim-based segmentation model compression strategies ([quantization](./slim/quantization/README.md), [distillation](./slim/distillation/README.md), [pruning](./slim/prune/README.md), [NAS](./slim/nas/README.md)).
......
## Contributing

We highly appreciate contributions of code and suggestions for using PaddleSeg. If you can fix an issue or add a new feature, feel free to submit Pull Requests.
# Augmented Context Embedding with Edge Perceiving (ACE2P)

## Model Overview

Human parsing is a fine-grained semantic segmentation task that aims to identify the components of human images (e.g., body parts and clothing) at the pixel level. Augmented Context Embedding with Edge Perceiving (ACE2P) learns the human parsing task end to end by fusing low-level features, global context information, and edge details. A solution built on the single-person ACE2P network won first place in all three human parsing tracks at the CVPR 2019 third Look into Person (LIP) challenge.

## Model Architecture

![](imgs/net.jpg)
......
![](imgs/result.jpg)
![](ACE2P/imgs/result.jpg)
Human parsing is a fine-grained semantic segmentation task that aims to identify the components of human images (e.g., body parts and clothing) at the pixel level. This section uses the championship model, Augmented Context Embedding with Edge Perceiving (ACE2P), for prediction.
## Usage

### 1. Model download

Run the following command to download and extract the ACE2P inference model:
```
python download_ACE2P.py
```
Or download it manually from this [link](https://paddleseg.bj.bcebos.com/models/ACE2P.tgz) and extract it under contrib/ACE2P.

### 2. Data download

There are 10,000 test images in total. Download Testing_images.zip from [Baidu_Drive](https://pan.baidu.com/s/1nvqmZBN#list/path=%2Fsharelink2787269280-523292635003760%2FLIP%2FLIP&parentPath=%2Fsharelink2787269280-523292635003760) or from the official LIP dataset website, then extract it into the ./data directory.
### 3. Quick prediction

Prediction with GPU:
```
python -u infer.py --example ACE2P --use_gpu
```
Prediction with CPU:
```
python -u infer.py --example ACE2P
```
**NOTE:** Running this model requires about 2GB of GPU memory. Since there are many test images, prediction will take quite a while.
### 4. Sample results

Input image:

![](ACE2P/imgs/117676_2149260.jpg)

Prediction:

![](ACE2P/imgs/117676_2149260.png)
### Notes

1. For detailed configuration such as data and model paths, see the config.py files under ACE2P/HumanSeg/RoadLine.
2. The ACE2P model needs about 2GB of GPU memory reserved; if GPU memory is exceeded, you can lower FLAGS_fraction_of_gpu_memory_to_use, as shown below.
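
A minimal example (illustrative value; set the flag before Paddle is first used, the same way this repo sets `FLAGS_eager_delete_tensor_gb`):

```
import os

# Pre-allocate a smaller fraction of the total GPU memory for Paddle.
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.5'
```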
## Citation

**Paper**
......
# HumanSeg Human Segmentation Models

Built on PaddleSeg's core segmentation networks, this tutorial provides a full-pipeline guide for human segmentation scenarios, covering pretrained models, fine-tuning, video segmentation, prediction, and deployment. The newly released HumanSeg-lite is an ultra-lightweight human segmentation model that supports real-time segmentation on mobile devices.
## Requirements
* Python == 3.5/3.6/3.7
* PaddlePaddle >= 1.7.2
See the [PaddlePaddle quick install guide](https://www.paddlepaddle.org.cn/install/quick) for installing PaddlePaddle.

Install the Python package dependencies with the following command; make sure it has been run at least once on this branch:
```shell
$ pip install -r requirements.txt
```
## Pretrained Models

HumanSeg provides three pretrained models trained on large-scale human data to cover different use cases:

| Model | Checkpoint | Inference Model | Quant Inference Model | Notes |
| --- | --- | --- | ---| --- |
| HumanSeg-server | [humanseg_server_ckpt](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_ckpt.zip) | [humanseg_server_inference](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_inference.zip) | -- | High-accuracy model for complex-background human scenes on server-side GPU; architecture: Deeplabv3+/Xception65, input size (512, 512) |
| HumanSeg-mobile | [humanseg_mobile_ckpt](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_ckpt.zip) | [humanseg_mobile_inference](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_inference.zip) | [humanseg_mobile_quant](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_quant.zip) | Lightweight model for front-camera scenes on mobile devices or server-side CPU; architecture: HRNet_w18_small_v1, input size (192, 192) |
| HumanSeg-lite | [humanseg_lite_ckpt](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_ckpt.zip) | [humanseg_lite_inference](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_inference.zip) | [humanseg_lite_quant](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_quant.zip) | Ultra-lightweight model for phone selfies and real-time on-device segmentation; architecture: optimized ShuffleNetV2, input size (192, 192) |
Model performance:

| Model | Size | Latency |
| --- | --- | --- |
|humanseg_server_inference| 158M | - |
|humanseg_mobile_inference| 5.8M | 42.35ms |
|humanseg_mobile_quant| 1.6M | 24.93ms |
|humanseg_lite_inference| 541K | 17.26ms |
|humanseg_lite_quant| 187K | 11.89ms |

Latency was measured on a Xiaomi phone (CPU: Snapdragon 855, RAM: 6GB, image size: 192*192).
**NOTE:**
* Checkpoint contains the model weights and is used for fine-tuning.
* Inference Model and Quant Inference Model are deployment models, containing the `__model__` computation graph, the `__params__` model parameters, and the basic `model.yaml` model configuration.
* Inference Model is intended for server-side CPU and GPU deployment; Quant Inference Model is the quantized version for on-device deployment (e.g., mobile) via Paddle Lite. See the [Paddle Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/) for more deployment details.
Run the following script to download the HumanSeg pretrained models:
```bash
python pretrained_weights/download_pretrained_weights.py
```
## Download Test Data

We provide the **Supervisely Persons** human segmentation dataset released by [supervise.ly](https://supervise.ly/); a small random subset has been converted into a format PaddleSeg can load directly. Run the following command for a quick download; it also includes `video_test.mp4`, a human test video recorded with a phone's front camera.
```bash
python data/download_data.py
```
## Quick Start: Video Stream Human Segmentation

DIS (Dense Inverse Search) optical flow predictions are combined with the segmentation results to improve human segmentation on video streams; a post-processing sketch follows the commands below.
```bash
# Real-time segmentation using the computer's camera
python video_infer.py --model_dir pretrained_weights/humanseg_lite_inference
# Segment a human video file
python video_infer.py --model_dir pretrained_weights/humanseg_lite_inference --video_path data/video_test.mp4
```
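
The post-processing idea, roughly: estimate DIS optical flow between consecutive frames, warp the previous frame's score map onto the current frame, and blend the two maps. A minimal sketch with OpenCV (illustrative only; the repo's actual logic lives in `video_infer.py`, and `temporal_smooth` is a hypothetical helper):

```python
import cv2
import numpy as np

# DIS optical flow estimator (OpenCV >= 4.x).
dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)

def temporal_smooth(prev_gray, cur_gray, prev_score, cur_score, alpha=0.5):
    # Flow from the current frame back to the previous one,
    # so the previous score map can be backward-warped.
    flow = dis.calc(cur_gray, prev_gray, None)
    h, w = cur_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_score.astype(np.float32), map_x, map_y,
                       cv2.INTER_LINEAR)
    # Blend the warped previous result with the current prediction.
    return alpha * warped + (1 - alpha) * cur_score
```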
The video segmentation results are shown below:
<img src="https://paddleseg.bj.bcebos.com/humanseg/data/video_test.gif" width="20%" height="20%"><img src="https://paddleseg.bj.bcebos.com/humanseg/data/result.gif" width="20%" height="20%">
**NOTE**:
Processing the video takes a few minutes; please be patient.
## Training

Use the following command to fine-tune from a pretrained model. Make sure the chosen model architecture `model_type` matches the pretrained weights `pretrained_weights`.
```bash
python train.py --model_type HumanSegMobile \
--save_dir output/ \
--data_dir data/mini_supervisely \
--train_list data/mini_supervisely/train.txt \
--val_list data/mini_supervisely/val.txt \
--pretrained_weights pretrained_weights/humanseg_mobile_ckpt \
--batch_size 8 \
--learning_rate 0.001 \
--num_epochs 10 \
--image_shape 192 192
```
Parameter descriptions:
* `--model_type`: model type; one of HumanSegServer, HumanSegMobile, or HumanSegLite
* `--save_dir`: directory to save the model
* `--data_dir`: dataset root directory
* `--train_list`: path to the training list file
* `--val_list`: path to the validation list file
* `--pretrained_weights`: path to the pretrained weights
* `--batch_size`: batch size
* `--learning_rate`: initial learning rate
* `--num_epochs`: number of training epochs
* `--image_shape`: network input image size (w, h)
For more command-line options, run:
```bash
python train.py --help
```
**NOTE**
You can quickly try a different model by changing `--model_type` together with the matching `--pretrained_weights`.
## Evaluation

Run the following command to evaluate the model:
```bash
python val.py --model_dir output/best_model \
--data_dir data/mini_supervisely \
--val_list data/mini_supervisely/val.txt \
--image_shape 192 192
```
Parameter descriptions:
* `--model_dir`: model directory
* `--data_dir`: dataset root directory
* `--val_list`: path to the validation list file
* `--image_shape`: network input image size (w, h)
## 预测
使用下述命令进行预测
```bash
python infer.py --model_dir output/best_model \
--data_dir data/mini_supervisely \
--test_list data/mini_supervisely/test.txt \
--image_shape 192 192
```
Parameter descriptions:
* `--model_dir`: model directory
* `--data_dir`: dataset root directory
* `--test_list`: path to the test list file
* `--image_shape`: network input image size (w, h)
## Model Export
```bash
python export.py --model_dir output/best_model \
--save_dir output/export
```
Parameter descriptions:
* `--model_dir`: model directory
* `--save_dir`: directory to save the exported model
## Offline Quantization
```bash
python quant_offline.py --model_dir output/best_model \
--data_dir data/mini_supervisely \
--quant_list data/mini_supervisely/val.txt \
--save_dir output/quant_offline \
--image_shape 192 192
```
Parameter descriptions:
* `--model_dir`: directory of the model to quantize
* `--data_dir`: dataset root directory
* `--quant_list`: path to the list file of quantization data, usually the training or validation list
* `--save_dir`: directory to save the quantized model
* `--image_shape`: network input image size (w, h)
## Online Quantization

Perform quantization-aware training starting from the float-precision trained model.
```bash
python quant_online.py --model_type HumanSegMobile \
--save_dir output/quant_online \
--data_dir data/mini_supervisely \
--train_list data/mini_supervisely/train.txt \
--val_list data/mini_supervisely/val.txt \
--pretrained_weights output/best_model \
--batch_size 2 \
--learning_rate 0.001 \
--num_epochs 2 \
--image_shape 192 192
```
Parameter descriptions:
* `--model_type`: model type; one of HumanSegServer, HumanSegMobile, or HumanSegLite
* `--save_dir`: directory to save the model
* `--data_dir`: dataset root directory
* `--train_list`: path to the training list file
* `--val_list`: path to the validation list file
* `--pretrained_weights`: path to the pretrained weights
* `--batch_size`: batch size
* `--learning_rate`: initial learning rate
* `--num_epochs`: number of training epochs
* `--image_shape`: network input image size (w, h)
# -*- coding: utf-8 -*-
from utils.util import AttrDict, get_arguments, merge_cfg_from_args
import os
args = get_arguments()
cfg = AttrDict()
# Directory containing the images to predict
cfg.data_dir = os.path.join(args.example, "data", "test_images")
# List file with the names of the images to predict
cfg.data_list_file = os.path.join(args.example, "data", "test.txt")
# Path to load the model from
cfg.model_path = os.path.join(args.example, "model")
# Directory for saving prediction results
cfg.vis_dir = os.path.join(args.example, "result")
# Number of classes to predict
cfg.class_num = 2
# Mean subtracted from the image during preprocessing
cfg.MEAN = 104.008, 116.669, 122.675
# Standard deviation the image is divided by during preprocessing
cfg.STD = 1.0, 1.0, 1.0
# Input size for the images to predict
cfg.input_size = 513, 513
merge_cfg_from_args(args, cfg)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
TEST_PATH = os.path.join(LOCAL_PATH, "../../../", "test")
sys.path.append(TEST_PATH)
from test_utils import download_file_and_uncompress
def download_data(savepath, extrapath):
url = "https://paddleseg.bj.bcebos.com/humanseg/data/mini_supervisely.zip"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
url = "https://paddleseg.bj.bcebos.com/humanseg/data/video_test.zip"
download_file_and_uncompress(
url=url,
savepath=savepath,
extrapath=extrapath,
extraname='video_test.mp4')
if __name__ == "__main__":
download_data(LOCAL_PATH, LOCAL_PATH)
print("Data download finish!")
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .dataset import Dataset
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path as osp
from threading import Thread
import multiprocessing
import collections
import numpy as np
import six
import sys
import copy
import random
import platform
import chardet
import utils.logging as logging
class EndSignal():
pass
def is_pic(img_name):
valid_suffix = ['JPEG', 'jpeg', 'JPG', 'jpg', 'BMP', 'bmp', 'PNG', 'png']
suffix = img_name.split('.')[-1]
if suffix not in valid_suffix:
return False
return True
def is_valid(sample):
if sample is None:
return False
if isinstance(sample, tuple):
for s in sample:
if s is None:
return False
elif isinstance(s, np.ndarray) and s.size == 0:
return False
elif isinstance(s, collections.Sequence) and len(s) == 0:
return False
return True
def get_encoding(path):
    # Detect the file encoding so list files can be opened correctly.
    with open(path, 'rb') as f:
        data = f.read()
    return chardet.detect(data).get('encoding')
def multithread_reader(mapper,
reader,
num_workers=4,
buffer_size=1024,
batch_size=8,
drop_last=True):
from queue import Queue
end = EndSignal()
# define a worker to read samples from reader to in_queue
def read_worker(reader, in_queue):
for i in reader():
in_queue.put(i)
in_queue.put(end)
# define a worker to handle samples from in_queue by mapper
# and put mapped samples into out_queue
def handle_worker(in_queue, out_queue, mapper):
sample = in_queue.get()
while not isinstance(sample, EndSignal):
if len(sample) == 2:
r = mapper(sample[0], sample[1])
elif len(sample) == 3:
r = mapper(sample[0], sample[1], sample[2])
else:
raise Exception('The sample\'s length must be 2 or 3.')
if is_valid(r):
out_queue.put(r)
sample = in_queue.get()
in_queue.put(end)
out_queue.put(end)
def xreader():
in_queue = Queue(buffer_size)
out_queue = Queue(buffer_size)
# start a read worker in a thread
target = read_worker
t = Thread(target=target, args=(reader, in_queue))
t.daemon = True
t.start()
# start several handle_workers
target = handle_worker
args = (in_queue, out_queue, mapper)
workers = []
for i in range(num_workers):
worker = Thread(target=target, args=args)
worker.daemon = True
workers.append(worker)
for w in workers:
w.start()
batch_data = []
sample = out_queue.get()
while not isinstance(sample, EndSignal):
batch_data.append(sample)
if len(batch_data) == batch_size:
yield batch_data
batch_data = []
sample = out_queue.get()
finish = 1
while finish < num_workers:
sample = out_queue.get()
if isinstance(sample, EndSignal):
finish += 1
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
yield batch_data
batch_data = []
if not drop_last and len(batch_data) != 0:
yield batch_data
batch_data = []
return xreader
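
# Illustrative only (not repo code): a toy run of multithread_reader.
# The reader yields (image, label) stand-ins and the mapper plays the
# role of the transform step; batches come out as lists of samples.
def _demo_multithread_reader():
    def toy_reader():
        for i in range(10):
            yield (i, i * 2)

    def toy_mapper(x, y):
        return (x + 1, y + 1)

    batches = multithread_reader(
        toy_mapper, toy_reader, num_workers=2, batch_size=4, drop_last=False)
    for batch in batches():
        print(batch)  # up to 4 mapped samples per batch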
def multiprocess_reader(mapper,
reader,
num_workers=4,
buffer_size=1024,
batch_size=8,
drop_last=True):
from .shared_queue import SharedQueue as Queue
def _read_into_queue(samples, mapper, queue):
end = EndSignal()
try:
for sample in samples:
if sample is None:
raise ValueError("sample has None")
if len(sample) == 2:
result = mapper(sample[0], sample[1])
elif len(sample) == 3:
result = mapper(sample[0], sample[1], sample[2])
else:
raise Exception('The sample\'s length must be 2 or 3.')
if is_valid(result):
queue.put(result)
queue.put(end)
except:
queue.put("")
six.reraise(*sys.exc_info())
def queue_reader():
queue = Queue(buffer_size, memsize=3 * 1024**3)
total_samples = [[] for i in range(num_workers)]
for i, sample in enumerate(reader()):
index = i % num_workers
total_samples[index].append(sample)
for i in range(num_workers):
p = multiprocessing.Process(
target=_read_into_queue, args=(total_samples[i], mapper, queue))
p.start()
finish_num = 0
batch_data = list()
while finish_num < num_workers:
sample = queue.get()
if isinstance(sample, EndSignal):
finish_num += 1
elif sample == "":
raise ValueError("multiprocess reader raises an exception")
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
yield batch_data
batch_data = []
if len(batch_data) != 0 and not drop_last:
yield batch_data
batch_data = []
return queue_reader
class Dataset:
def __init__(self,
data_dir,
file_list,
label_list=None,
transforms=None,
num_workers='auto',
buffer_size=100,
parallel_method='thread',
shuffle=False):
if num_workers == 'auto':
import multiprocessing as mp
num_workers = mp.cpu_count() // 2 if mp.cpu_count() // 2 < 8 else 8
if transforms is None:
raise Exception("transform should be defined.")
self.transforms = transforms
self.num_workers = num_workers
self.buffer_size = buffer_size
self.parallel_method = parallel_method
self.shuffle = shuffle
self.file_list = list()
self.labels = list()
self._epoch = 0
if label_list is not None:
with open(label_list, encoding=get_encoding(label_list)) as f:
for line in f:
item = line.strip()
self.labels.append(item)
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
items = line.strip().split()
if not is_pic(items[0]):
continue
full_path_im = osp.join(data_dir, items[0])
full_path_label = osp.join(data_dir, items[1])
                if not osp.exists(full_path_im):
                    raise IOError(
                        'The image file {} does not exist!'.format(
                            full_path_im))
                if not osp.exists(full_path_label):
                    raise IOError('The label file {} does not exist!'.format(
                        full_path_label))
self.file_list.append([full_path_im, full_path_label])
self.num_samples = len(self.file_list)
logging.info("{} samples in file {}".format(
len(self.file_list), file_list))
def iterator(self):
self._epoch += 1
self._pos = 0
files = copy.deepcopy(self.file_list)
if self.shuffle:
random.shuffle(files)
files = files[:self.num_samples]
self.num_samples = len(files)
for f in files:
label_path = f[1]
sample = [f[0], None, label_path]
yield sample
def generator(self, batch_size=1, drop_last=True):
self.batch_size = batch_size
parallel_reader = multithread_reader
if self.parallel_method == "process":
if platform.platform().startswith("Windows"):
logging.debug(
"multiprocess_reader is not supported in Windows platform, force to use multithread_reader."
)
else:
parallel_reader = multiprocess_reader
return parallel_reader(
self.transforms,
self.iterator,
num_workers=self.num_workers,
buffer_size=self.buffer_size,
batch_size=batch_size,
drop_last=drop_last)
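
# Illustrative only (not repo code): minimal Dataset usage, assuming the
# transforms.Compose / Resize / Normalize pipeline used elsewhere in HumanSeg.
def _demo_dataset():
    import transforms as T
    dataset = Dataset(
        data_dir='data/mini_supervisely',
        file_list='data/mini_supervisely/train.txt',
        transforms=T.Compose([T.Resize((192, 192)),
                              T.Normalize()]),
        shuffle=True)
    # generator() wraps iterator() with a multithread/multiprocess reader
    # and yields lists of transformed samples of length batch_size.
    reader = dataset.generator(batch_size=8, drop_last=True)
    for batch in reader():
        break  # each batch is a list of transformed samples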
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
__all__ = ['SharedBuffer', 'SharedMemoryMgr', 'SharedQueue']
from .sharedmemory import SharedBuffer
from .sharedmemory import SharedMemoryMgr
from .sharedmemory import SharedMemoryError
from .queue import SharedQueue
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import sys
import six
if six.PY3:
import pickle
from io import BytesIO as StringIO
else:
import cPickle as pickle
from cStringIO import StringIO
import logging
import traceback
import multiprocessing as mp
from multiprocessing.queues import Queue
from .sharedmemory import SharedMemoryMgr
logger = logging.getLogger(__name__)
class SharedQueueError(ValueError):
""" SharedQueueError
"""
pass
class SharedQueue(Queue):
""" a Queue based on shared memory to communicate data between Process,
and it's interface is compatible with 'multiprocessing.queues.Queue'
"""
def __init__(self, maxsize=0, mem_mgr=None, memsize=None, pagesize=None):
""" init
"""
if six.PY3:
super(SharedQueue, self).__init__(maxsize, ctx=mp.get_context())
else:
super(SharedQueue, self).__init__(maxsize)
if mem_mgr is not None:
self._shared_mem = mem_mgr
else:
self._shared_mem = SharedMemoryMgr(
capacity=memsize, pagesize=pagesize)
def put(self, obj, **kwargs):
""" put an object to this queue
"""
obj = pickle.dumps(obj, -1)
buff = None
try:
buff = self._shared_mem.malloc(len(obj))
buff.put(obj)
super(SharedQueue, self).put(buff, **kwargs)
except Exception as e:
stack_info = traceback.format_exc()
            err_msg = 'failed to put an element to SharedQueue '\
                'with stack info[%s]' % (stack_info)
logger.warn(err_msg)
if buff is not None:
buff.free()
raise e
def get(self, **kwargs):
""" get an object from this queue
"""
buff = None
try:
buff = super(SharedQueue, self).get(**kwargs)
data = buff.get()
return pickle.load(StringIO(data))
except Exception as e:
stack_info = traceback.format_exc()
err_msg = 'failed to get element from SharedQueue '\
'with stack info[%s]' % (stack_info)
logger.warn(err_msg)
raise e
finally:
if buff is not None:
buff.free()
def release(self):
self._shared_mem.release()
self._shared_mem = None
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# utils for memory management which is allocated on sharedmemory,
# note that these structures may not be thread-safe
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import math
import struct
import sys
import six
if six.PY3:
import pickle
else:
import cPickle as pickle
import json
import uuid
import random
import numpy as np
import weakref
import logging
from multiprocessing import Lock
from multiprocessing import RawArray
logger = logging.getLogger(__name__)
class SharedMemoryError(ValueError):
""" SharedMemoryError
"""
pass
class SharedBufferError(SharedMemoryError):
""" SharedBufferError
"""
pass
class MemoryFullError(SharedMemoryError):
""" MemoryFullError
"""
def __init__(self, errmsg=''):
super(MemoryFullError, self).__init__()
self.errmsg = errmsg
def memcopy(dst, src, offset=0, length=None):
""" copy data from 'src' to 'dst' in bytes
"""
length = length if length is not None else len(src)
assert type(dst) == np.ndarray, 'invalid type for "dst" in memcopy'
if type(src) is not np.ndarray:
if type(src) is str and six.PY3:
src = src.encode()
src = np.frombuffer(src, dtype='uint8', count=len(src))
dst[:] = src[offset:offset + length]
class SharedBuffer(object):
""" Buffer allocated from SharedMemoryMgr, and it stores data on shared memory
note that:
every instance of this should be freed explicitely by calling 'self.free'
"""
def __init__(self, owner, capacity, pos, size=0, alloc_status=''):
""" Init
Args:
owner (str): manager to own this buffer
capacity (int): capacity in bytes for this buffer
pos (int): page position in shared memory
size (int): bytes already used
alloc_status (str): debug info about allocator when allocate this
"""
self._owner = owner
self._cap = capacity
self._pos = pos
self._size = size
self._alloc_status = alloc_status
assert self._pos >= 0 and self._cap > 0, \
"invalid params[%d:%d] to construct SharedBuffer" \
% (self._pos, self._cap)
def owner(self):
""" get owner
"""
return SharedMemoryMgr.get_mgr(self._owner)
def put(self, data, override=False):
""" put data to this buffer
Args:
data (str): data to be stored in this buffer
Returns:
None
Raises:
SharedMemoryError when not enough space in this buffer
"""
assert type(data) in [str, bytes], \
'invalid type[%s] for SharedBuffer::put' % (str(type(data)))
if self._size > 0 and not override:
            raise SharedBufferError('this buffer has already been set before')
if self.capacity() < len(data):
raise SharedBufferError('data[%d] is larger than size of buffer[%s]'\
% (len(data), str(self)))
self.owner().put_data(self, data)
self._size = len(data)
def get(self, offset=0, size=None, no_copy=True):
""" get the data stored this buffer
Args:
offset (int): position for the start point to 'get'
size (int): size to get
Returns:
data (np.ndarray('uint8')): user's data in numpy
which is passed in by 'put'
None: if no data stored in
"""
offset = offset if offset >= 0 else self._size + offset
if self._size <= 0:
return None
size = self._size if size is None else size
assert offset + size <= self._cap, 'invalid offset[%d] '\
'or size[%d] for capacity[%d]' % (offset, size, self._cap)
return self.owner().get_data(self, offset, size, no_copy=no_copy)
def size(self):
""" bytes of used memory
"""
return self._size
def resize(self, size):
""" resize the used memory to 'size', should not be greater than capacity
"""
assert size >= 0 and size <= self._cap, \
"invalid size[%d] for resize" % (size)
self._size = size
def capacity(self):
""" size of allocated memory
"""
return self._cap
def __str__(self):
""" human readable format
"""
return "SharedBuffer(owner:%s, pos:%d, size:%d, "\
"capacity:%d, alloc_status:[%s], pid:%d)" \
% (str(self._owner), self._pos, self._size, \
self._cap, self._alloc_status, os.getpid())
def free(self):
""" free this buffer to it's owner
"""
if self._owner is not None:
self.owner().free(self)
self._owner = None
self._cap = 0
self._pos = -1
self._size = 0
return True
else:
return False
class PageAllocator(object):
""" allocator used to malloc and free shared memory which
is split into pages
"""
s_allocator_header = 12
def __init__(self, base, total_pages, page_size):
""" init
"""
self._magic_num = 1234321000 + random.randint(100, 999)
self._base = base
self._total_pages = total_pages
self._page_size = page_size
header_pages = int(
math.ceil((total_pages + self.s_allocator_header) / page_size))
self._header_pages = header_pages
self._free_pages = total_pages - header_pages
self._header_size = self._header_pages * page_size
self._reset()
def _dump_alloc_info(self, fname):
hpages, tpages, pos, used = self.header()
start = self.s_allocator_header
end = start + self._page_size * hpages
alloc_flags = self._base[start:end].tostring()
info = {
'magic_num': self._magic_num,
'header_pages': hpages,
'total_pages': tpages,
'pos': pos,
'used': used
}
info['alloc_flags'] = alloc_flags
fname = fname + '.' + str(uuid.uuid4())[:6]
with open(fname, 'wb') as f:
f.write(pickle.dumps(info, -1))
logger.warn('dump alloc info to file[%s]' % (fname))
def _reset(self):
alloc_page_pos = self._header_pages
used_pages = self._header_pages
header_info = struct.pack(
str('III'), self._magic_num, alloc_page_pos, used_pages)
assert len(header_info) == self.s_allocator_header, \
'invalid size of header_info'
memcopy(self._base[0:self.s_allocator_header], header_info)
self.set_page_status(0, self._header_pages, '1')
self.set_page_status(self._header_pages, self._free_pages, '0')
def header(self):
""" get header info of this allocator
"""
header_str = self._base[0:self.s_allocator_header].tostring()
magic, pos, used = struct.unpack(str('III'), header_str)
assert magic == self._magic_num, \
'invalid header magic[%d] in shared memory' % (magic)
return self._header_pages, self._total_pages, pos, used
def empty(self):
""" are all allocatable pages available
"""
header_pages, pages, pos, used = self.header()
return header_pages == used
def full(self):
""" are all allocatable pages used
"""
header_pages, pages, pos, used = self.header()
return header_pages + used == pages
def __str__(self):
header_pages, pages, pos, used = self.header()
desc = '{page_info[magic:%d,total:%d,used:%d,header:%d,alloc_pos:%d,pagesize:%d]}' \
% (self._magic_num, pages, used, header_pages, pos, self._page_size)
return 'PageAllocator:%s' % (desc)
def set_alloc_info(self, alloc_pos, used_pages):
""" set allocating position to new value
"""
memcopy(self._base[4:12], struct.pack(str('II'), alloc_pos, used_pages))
def set_page_status(self, start, page_num, status):
""" set pages from 'start' to 'end' with new same status 'status'
"""
assert status in ['0', '1'], 'invalid status[%s] for page status '\
'in allocator[%s]' % (status, str(self))
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
memcopy(self._base[start:end], str(status * page_num))
def get_page_status(self, start, page_num, ret_flag=False):
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
status = self._base[start:end].tostring().decode()
if ret_flag:
return status
zero_num = status.count('0')
if zero_num == 0:
return (page_num, 1)
else:
return (zero_num, 0)
def malloc_page(self, page_num):
header_pages, pages, pos, used = self.header()
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
start_pos = pos
flags = ''
while True:
# maybe flags already has some '0' pages,
# so just check 'page_num - len(flags)' pages
flags = self.get_page_status(pos, page_num, ret_flag=True)
if flags.count('0') == page_num:
break
# not found enough pages, so shift to next few pages
free_pos = flags.rfind('1') + 1
pos += free_pos
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
flags = ''
# not found available pages after scan all pages
if pos <= start_pos and end >= start_pos:
logger.debug('not found available pages after scan all pages')
break
page_status = (flags.count('0'), 0)
if page_status != (page_num, 0):
free_pages = self._total_pages - used
if free_pages == 0:
err_msg = 'all pages have been used:%s' % (str(self))
else:
err_msg = 'not found available pages with page_status[%s] '\
'and %d free pages' % (str(page_status), free_pages)
err_msg = 'failed to malloc %d pages at pos[%d] for reason[%s] and allocator status[%s]' \
% (page_num, pos, err_msg, str(self))
raise MemoryFullError(err_msg)
self.set_page_status(pos, page_num, '1')
used += page_num
self.set_alloc_info(end, used)
return pos
def free_page(self, start, page_num):
""" free 'page_num' pages start from 'start'
"""
page_status = self.get_page_status(start, page_num)
assert page_status == (page_num, 1), \
'invalid status[%s] when free [%d, %d]' \
% (str(page_status), start, page_num)
self.set_page_status(start, page_num, '0')
_, _, pos, used = self.header()
used -= page_num
self.set_alloc_info(pos, used)
DEFAULT_SHARED_MEMORY_SIZE = 1024 * 1024 * 1024
class SharedMemoryMgr(object):
""" manage a continouse block of memory, provide
'malloc' to allocate new buffer, and 'free' to free buffer
"""
s_memory_mgrs = weakref.WeakValueDictionary()
s_mgr_num = 0
s_log_statis = False
@classmethod
def get_mgr(cls, id):
""" get a SharedMemoryMgr with size of 'capacity'
"""
assert id in cls.s_memory_mgrs, 'invalid id[%s] for memory managers' % (
id)
return cls.s_memory_mgrs[id]
def __init__(self, capacity=None, pagesize=None):
""" init
"""
logger.debug('create SharedMemoryMgr')
pagesize = 64 * 1024 if pagesize is None else pagesize
assert type(pagesize) is int, "invalid type of pagesize[%s]" \
% (str(pagesize))
capacity = DEFAULT_SHARED_MEMORY_SIZE if capacity is None else capacity
assert type(capacity) is int, "invalid type of capacity[%s]" \
% (str(capacity))
        assert capacity > 0, 'size of shared memory should be greater than 0'
self._released = False
self._cap = capacity
self._page_size = pagesize
assert self._cap % self._page_size == 0, \
"capacity[%d] and pagesize[%d] are not consistent" \
% (self._cap, self._page_size)
self._total_pages = self._cap // self._page_size
self._pid = os.getpid()
SharedMemoryMgr.s_mgr_num += 1
self._id = self._pid * 100 + SharedMemoryMgr.s_mgr_num
SharedMemoryMgr.s_memory_mgrs[self._id] = self
self._locker = Lock()
self._setup()
def _setup(self):
self._shared_mem = RawArray('c', self._cap)
self._base = np.frombuffer(
self._shared_mem, dtype='uint8', count=self._cap)
self._locker.acquire()
try:
self._allocator = PageAllocator(self._base, self._total_pages,
self._page_size)
finally:
self._locker.release()
def malloc(self, size, wait=True):
""" malloc a new SharedBuffer
Args:
size (int): buffer size to be malloc
wait (bool): whether to wait when no enough memory
Returns:
SharedBuffer
Raises:
SharedMemoryError when not found available memory
"""
page_num = int(math.ceil(size / self._page_size))
size = page_num * self._page_size
start = None
ct = 0
errmsg = ''
while True:
self._locker.acquire()
try:
start = self._allocator.malloc_page(page_num)
alloc_status = str(self._allocator)
except MemoryFullError as e:
start = None
errmsg = e.errmsg
if not wait:
raise e
finally:
self._locker.release()
if start is None:
time.sleep(0.1)
if ct % 100 == 0:
logger.warn('not enough space for reason[%s]' % (errmsg))
ct += 1
else:
break
return SharedBuffer(self._id, size, start, alloc_status=alloc_status)
def free(self, shared_buf):
""" free a SharedBuffer
Args:
shared_buf (SharedBuffer): buffer to be freed
Returns:
None
Raises:
SharedMemoryError when failed to release this buffer
"""
assert shared_buf._owner == self._id, "invalid shared_buf[%s] "\
"for it's not allocated from me[%s]" % (str(shared_buf), str(self))
cap = shared_buf.capacity()
start_page = shared_buf._pos
page_num = cap // self._page_size
#maybe we don't need this lock here
self._locker.acquire()
try:
self._allocator.free_page(start_page, page_num)
finally:
self._locker.release()
def put_data(self, shared_buf, data):
""" fill 'data' into 'shared_buf'
"""
assert len(data) <= shared_buf.capacity(), 'too large data[%d] '\
'for this buffer[%s]' % (len(data), str(shared_buf))
start = shared_buf._pos * self._page_size
end = start + len(data)
assert start >= 0 and end <= self._cap, "invalid start "\
"position[%d] when put data to buff:%s" % (start, str(shared_buf))
self._base[start:end] = np.frombuffer(data, 'uint8', len(data))
def get_data(self, shared_buf, offset, size, no_copy=True):
""" extract 'data' from 'shared_buf' in range [offset, offset + size)
"""
start = shared_buf._pos * self._page_size
start += offset
if no_copy:
return self._base[start:start + size]
else:
return self._base[start:start + size].tostring()
def __str__(self):
return 'SharedMemoryMgr:{id:%d, %s}' % (self._id, str(self._allocator))
def __del__(self):
if SharedMemoryMgr.s_log_statis:
logger.info('destroy [%s]' % (self))
if not self._released and not self._allocator.empty():
logger.debug(
'not empty when delete this SharedMemoryMgr[%s]' % (self))
else:
self._released = True
if self._id in SharedMemoryMgr.s_memory_mgrs:
del SharedMemoryMgr.s_memory_mgrs[self._id]
SharedMemoryMgr.s_mgr_num -= 1
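
# Illustrative only (not repo code): a SharedBuffer round trip.
def _demo_shared_memory():
    mgr = SharedMemoryMgr(capacity=64 * 1024 * 1024, pagesize=64 * 1024)
    buff = mgr.malloc(1024)           # rounded up to whole pages
    buff.put(b'hello shared memory')  # copy bytes into the shared pages
    data = buff.get(no_copy=False)    # read them back as bytes
    assert data == b'hello shared memory'
    buff.free()                       # return the pages to the allocator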
import models
import argparse
def parse_args():
parser = argparse.ArgumentParser(description='Export model')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for exporting',
type=str)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the export model',
type=str,
default='./output/export')
return parser.parse_args()
def export(args):
model = models.load_model(args.model_dir)
model.export_inference_model(args.save_dir)
if __name__ == '__main__':
args = parse_args()
export(args)
import argparse
import os
import os.path as osp

import cv2
import numpy as np
import tqdm

import utils
import models
import transforms


def parse_args():
    parser = argparse.ArgumentParser(
        description='HumanSeg inference and visualization')
    parser.add_argument(
        '--model_dir',
        dest='model_dir',
        help='Model path for inference',
        type=str)
    parser.add_argument(
        '--data_dir',
        dest='data_dir',
        help='The root directory of dataset',
        type=str)
    parser.add_argument(
        '--test_list',
        dest='test_list',
        help='Test list file of dataset',
        type=str)
    parser.add_argument(
        '--save_dir',
        dest='save_dir',
        help='The directory for saving the inference results',
        type=str,
        default='./output/result')
    parser.add_argument(
        "--image_shape",
        dest="image_shape",
        help="The image shape for net inputs.",
        nargs=2,
        default=[192, 192],
        type=int)
    return parser.parse_args()


def mkdir(path):
    sub_dir = osp.dirname(path)
    if not osp.exists(sub_dir):
        os.makedirs(sub_dir)


def infer(args):
    test_transforms = transforms.Compose(
        [transforms.Resize(args.image_shape),
         transforms.Normalize()])
    model = models.load_model(args.model_dir)
    added_saved_path = osp.join(args.save_dir, 'added')
    mat_saved_path = osp.join(args.save_dir, 'mat')
    scoremap_saved_path = osp.join(args.save_dir, 'scoremap')
    with open(args.test_list, 'r') as f:
        files = f.readlines()

    for file in tqdm.tqdm(files):
        file = file.strip()
        im_file = osp.join(args.data_dir, file)
        im = cv2.imread(im_file)
        result = model.predict(im, transforms=test_transforms)

        # save the blended (image + mask) visualization
        added_image = utils.visualize(im_file, result, weight=0.6)
        added_image_file = osp.join(added_saved_path, file)
        mkdir(added_image_file)
        cv2.imwrite(added_image_file, added_image)

        # save the score map of the foreground class
        score_map = result['score_map'][:, :, 1]
        score_map = (score_map * 255).astype(np.uint8)
        score_map_file = osp.join(scoremap_saved_path, file)
        mkdir(score_map_file)
        cv2.imwrite(score_map_file, score_map)

        # save the RGBA matting image (score map as the alpha channel)
        score_map = np.expand_dims(score_map, axis=-1)
        mat_image = np.concatenate([im, score_map], axis=2)
        mat_file = osp.join(mat_saved_path, file)
        ext = osp.splitext(mat_file)[-1]
        mat_file = mat_file.replace(ext, '.png')
        mkdir(mat_file)
        cv2.imwrite(mat_file, mat_image)


if __name__ == '__main__':
    args = parse_args()
    infer(args)
from .humanseg import HumanSegMobile
from .humanseg import HumanSegServer
from .humanseg import HumanSegLite
from .load_model import load_model
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import paddle.fluid as fluid
import os
from os import path as osp
import numpy as np
from collections import OrderedDict
import copy
import math
import time
import tqdm
import cv2
import yaml
import paddleslim as slim
import utils
import utils.logging as logging
from utils import seconds_to_hms
from utils import ConfusionMatrix
from utils import get_environ_info
from nets import DeepLabv3p, ShuffleSeg, HRNet
import transforms as T
def dict2str(dict_input):
out = ''
for k, v in dict_input.items():
try:
v = round(float(v), 6)
except:
pass
out = out + '{}={}, '.format(k, v)
return out.strip(', ')
class SegModel(object):
# DeepLab mobilenet
def __init__(self,
num_classes=2,
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255,
sync_bn=True):
self.init_params = locals()
        if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
                "dice loss and bce loss are only applicable to binary classification"
            )
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
self.sync_bn = sync_bn
self.labels = None
self.env_info = get_environ_info()
if self.env_info['place'] == 'cpu':
self.places = fluid.cpu_places()
else:
self.places = fluid.cuda_places()
self.exe = fluid.Executor(self.places[0])
self.train_prog = None
self.test_prog = None
self.parallel_train_prog = None
self.train_inputs = None
self.test_inputs = None
self.train_outputs = None
self.test_outputs = None
self.train_data_loader = None
self.eval_metrics = None
        # current model status
self.status = 'Normal'
    def _get_single_card_bs(self, batch_size):
        if batch_size % len(self.places) == 0:
            return int(batch_size // len(self.places))
        else:
            raise Exception("Please set a batch_size that is divisible by \
                the number of available cards({}) on {}".format(
                self.env_info['num'], self.env_info['place']))
def build_net(self, mode='train'):
"""应根据不同的情况进行构建"""
pass
def build_program(self):
# build training network
self.train_inputs, self.train_outputs = self.build_net(mode='train')
self.train_prog = fluid.default_main_program()
startup_prog = fluid.default_startup_program()
# build prediction network
self.test_prog = fluid.Program()
with fluid.program_guard(self.test_prog, startup_prog):
with fluid.unique_name.guard():
self.test_inputs, self.test_outputs = self.build_net(
mode='test')
self.test_prog = self.test_prog.clone(for_test=True)
def arrange_transform(self, transforms, mode='train'):
arrange_transform = T.ArrangeSegmenter
if type(transforms.transforms[-1]).__name__.startswith('Arrange'):
transforms.transforms[-1] = arrange_transform(mode=mode)
else:
transforms.transforms.append(arrange_transform(mode=mode))
def build_train_data_loader(self, dataset, batch_size):
# init data_loader
if self.train_data_loader is None:
self.train_data_loader = fluid.io.DataLoader.from_generator(
feed_list=list(self.train_inputs.values()),
capacity=64,
use_double_buffer=True,
iterable=True)
        batch_size_each_gpu = self._get_single_card_bs(batch_size)
self.train_data_loader.set_sample_list_generator(
dataset.generator(batch_size=batch_size_each_gpu),
places=self.places)
def net_initialize(self,
startup_prog=None,
pretrained_weights=None,
resume_weights=None):
if startup_prog is None:
startup_prog = fluid.default_startup_program()
self.exe.run(startup_prog)
if resume_weights is not None:
logging.info("Resume weights from {}".format(resume_weights))
if not osp.exists(resume_weights):
raise Exception("Path {} not exists.".format(resume_weights))
fluid.load(self.train_prog, osp.join(resume_weights, 'model'),
self.exe)
            # Check if the path ends with a path separator
if resume_weights[-1] == os.sep:
resume_weights = resume_weights[0:-1]
epoch_name = osp.basename(resume_weights)
            # If the checkpoint directory name ends with a digit, restore the epoch status
epoch = epoch_name.split('_')[-1]
if epoch.isdigit():
self.begin_epoch = int(epoch)
else:
raise ValueError("Resume model path is not valid!")
logging.info("Model checkpoint loaded successfully!")
elif pretrained_weights is not None:
logging.info(
"Load pretrain weights from {}.".format(pretrained_weights))
utils.load_pretrained_weights(self.exe, self.train_prog,
pretrained_weights)
def get_model_info(self):
        # collect the model information to be stored in the yml file
info = dict()
info['Model'] = self.__class__.__name__
if 'self' in self.init_params:
del self.init_params['self']
if '__class__' in self.init_params:
del self.init_params['__class__']
info['_init_params'] = self.init_params
info['_Attributes'] = dict()
info['_Attributes']['num_classes'] = self.num_classes
info['_Attributes']['labels'] = self.labels
try:
info['_Attributes']['eval_metric'] = dict()
for k, v in self.eval_metrics.items():
if isinstance(v, np.ndarray):
if v.size > 1:
v = [float(i) for i in v]
else:
v = float(v)
info['_Attributes']['eval_metric'][k] = v
except:
pass
if hasattr(self, 'test_transforms'):
if self.test_transforms is not None:
info['test_transforms'] = list()
for op in self.test_transforms.transforms:
name = op.__class__.__name__
attr = op.__dict__
info['test_transforms'].append({name: attr})
if hasattr(self, 'train_transforms'):
if self.train_transforms is not None:
info['train_transforms'] = list()
for op in self.train_transforms.transforms:
name = op.__class__.__name__
attr = op.__dict__
info['train_transforms'].append({name: attr})
if hasattr(self, 'train_init'):
if 'self' in self.train_init:
del self.train_init['self']
if 'train_dataset' in self.train_init:
del self.train_init['train_dataset']
if 'eval_dataset' in self.train_init:
del self.train_init['eval_dataset']
if 'optimizer' in self.train_init:
del self.train_init['optimizer']
info['train_init'] = self.train_init
return info
def save_model(self, save_dir):
if not osp.isdir(save_dir):
if osp.exists(save_dir):
os.remove(save_dir)
os.makedirs(save_dir)
model_info = self.get_model_info()
if self.status == 'Normal':
fluid.save(self.train_prog, osp.join(save_dir, 'model'))
elif self.status == 'Quant':
float_prog, _ = slim.quant.convert(
self.test_prog, self.exe.place, save_int8=True)
test_input_names = [
var.name for var in list(self.test_inputs.values())
]
test_outputs = list(self.test_outputs.values())
fluid.io.save_inference_model(
dirname=save_dir,
executor=self.exe,
params_filename='__params__',
feeded_var_names=test_input_names,
target_vars=test_outputs,
main_program=float_prog)
model_info['_ModelInputsOutputs'] = dict()
model_info['_ModelInputsOutputs']['test_inputs'] = [
[k, v.name] for k, v in self.test_inputs.items()
]
model_info['_ModelInputsOutputs']['test_outputs'] = [
[k, v.name] for k, v in self.test_outputs.items()
]
model_info['status'] = self.status
with open(
osp.join(save_dir, 'model.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(model_info, f)
# The flag of model for saving successfully
open(osp.join(save_dir, '.success'), 'w').close()
logging.info("Model saved in {}.".format(save_dir))
def export_inference_model(self, save_dir):
test_input_names = [var.name for var in list(self.test_inputs.values())]
test_outputs = list(self.test_outputs.values())
fluid.io.save_inference_model(
dirname=save_dir,
executor=self.exe,
params_filename='__params__',
feeded_var_names=test_input_names,
target_vars=test_outputs,
main_program=self.test_prog)
model_info = self.get_model_info()
model_info['status'] = 'Infer'
        # Save input and output description of the model
model_info['_ModelInputsOutputs'] = dict()
model_info['_ModelInputsOutputs']['test_inputs'] = [
[k, v.name] for k, v in self.test_inputs.items()
]
model_info['_ModelInputsOutputs']['test_outputs'] = [
[k, v.name] for k, v in self.test_outputs.items()
]
with open(
osp.join(save_dir, 'model.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(model_info, f)
# The flag of model for saving successfully
open(osp.join(save_dir, '.success'), 'w').close()
logging.info("Model for inference deploy saved in {}.".format(save_dir))
def export_quant_model(self,
dataset,
save_dir,
batch_size=1,
batch_nums=10,
cache_dir="./.temp"):
self.arrange_transform(transforms=dataset.transforms, mode='quant')
dataset.num_samples = batch_size * batch_nums
try:
from utils import HumanSegPostTrainingQuantization
except:
raise Exception(
"Model Quantization is not available, try to upgrade your paddlepaddle>=1.7.0"
)
is_use_cache_file = True
if cache_dir is None:
is_use_cache_file = False
post_training_quantization = HumanSegPostTrainingQuantization(
executor=self.exe,
dataset=dataset,
program=self.test_prog,
inputs=self.test_inputs,
outputs=self.test_outputs,
batch_size=batch_size,
batch_nums=batch_nums,
scope=None,
algo='KL',
quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
is_full_quantize=False,
is_use_cache_file=is_use_cache_file,
cache_dir=cache_dir)
post_training_quantization.quantize()
post_training_quantization.save_quantized_model(save_dir)
if cache_dir is not None:
            os.system('rm -r ' + cache_dir)
model_info = self.get_model_info()
model_info['status'] = 'Quant'
        # Save the input and output description of the model
model_info['_ModelInputsOutputs'] = dict()
model_info['_ModelInputsOutputs']['test_inputs'] = [
[k, v.name] for k, v in self.test_inputs.items()
]
model_info['_ModelInputsOutputs']['test_outputs'] = [
[k, v.name] for k, v in self.test_outputs.items()
]
with open(
osp.join(save_dir, 'model.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(model_info, f)
        # Flag file indicating the model was saved successfully
open(osp.join(save_dir, '.success'), 'w').close()
logging.info("Model for quant saved in {}.".format(save_dir))
def default_optimizer(self,
learning_rate,
num_epochs,
num_steps_each_epoch,
lr_decay_power=0.9,
regularization_coeff=4e-5):
decay_step = num_epochs * num_steps_each_epoch
lr_decay = fluid.layers.polynomial_decay(
learning_rate,
decay_step,
end_learning_rate=0,
power=lr_decay_power)
optimizer = fluid.optimizer.Momentum(
lr_decay,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=regularization_coeff))
return optimizer
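    # Illustrative numbers for the schedule above (assuming the fluid
    # polynomial_decay formula lr(t) = learning_rate * (1 - t / decay_step) ** power
    # with end_learning_rate=0): with learning_rate=0.01, num_epochs=100 and
    # num_steps_each_epoch=50, decay_step is 5000, so lr(0) = 0.01,
    # lr(2500) ~= 0.0054 and lr(5000) = 0, the end learning rate.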
def train(self,
num_epochs,
train_dataset,
train_batch_size=2,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrained_weights=None,
resume_weights=None,
optimizer=None,
learning_rate=0.01,
lr_decay_power=0.9,
regularization_coeff=4e-5,
use_vdl=False,
quant=False):
self.labels = train_dataset.labels
self.train_transforms = train_dataset.transforms
self.train_init = locals()
self.begin_epoch = 0
if optimizer is None:
num_steps_each_epoch = train_dataset.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate=learning_rate,
num_epochs=num_epochs,
num_steps_each_epoch=num_steps_each_epoch,
lr_decay_power=lr_decay_power,
regularization_coeff=regularization_coeff)
self.optimizer = optimizer
self.build_program()
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrained_weights=pretrained_weights,
resume_weights=resume_weights)
        # apply quantization-aware training
        if quant:
            # with for_test=False the return type is fluid.CompiledProgram
            # with for_test=True the return type is fluid.Program
self.train_prog = slim.quant.quant_aware(
self.train_prog, self.exe.place, for_test=False)
self.test_prog = slim.quant.quant_aware(
self.test_prog, self.exe.place, for_test=True)
# self.parallel_train_prog = self.train_prog.with_data_parallel(
# loss_name=self.train_outputs['loss'].name)
self.status = 'Quant'
if self.begin_epoch >= num_epochs:
raise ValueError(
("begin epoch[{}] is larger than num_epochs[{}]").format(
self.begin_epoch, num_epochs))
if not osp.isdir(save_dir):
if osp.exists(save_dir):
os.remove(save_dir)
os.makedirs(save_dir)
        # append the arrange op to the transforms
self.arrange_transform(
transforms=train_dataset.transforms, mode='train')
self.build_train_data_loader(
dataset=train_dataset, batch_size=train_batch_size)
if eval_dataset is not None:
self.eval_transforms = eval_dataset.transforms
self.test_transforms = copy.deepcopy(eval_dataset.transforms)
        lr = self.optimizer._learning_rate
        if isinstance(lr, fluid.framework.Variable):
            lr.persistable = True
            self.train_outputs['lr'] = lr
        # multi-card (data-parallel) training
if self.parallel_train_prog is None:
build_strategy = fluid.compiler.BuildStrategy()
if self.env_info['place'] != 'cpu' and len(self.places) > 1:
build_strategy.sync_batch_norm = self.sync_bn
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 1
if quant:
build_strategy.fuse_all_reduce_ops = False
build_strategy.sync_batch_norm = False
self.parallel_train_prog = self.train_prog.with_data_parallel(
loss_name=self.train_outputs['loss'].name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
else:
self.parallel_train_prog = fluid.CompiledProgram(
self.train_prog).with_data_parallel(
loss_name=self.train_outputs['loss'].name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
total_num_steps = math.floor(
train_dataset.num_samples / train_batch_size)
num_steps = 0
time_stat = list()
time_train_one_epoch = None
time_eval_one_epoch = None
total_num_steps_eval = 0
        # total number of evaluation rounds
total_eval_times = math.ceil(num_epochs / save_interval_epochs)
eval_batch_size = train_batch_size
if eval_dataset is not None:
total_num_steps_eval = math.ceil(
eval_dataset.num_samples / eval_batch_size)
if use_vdl:
from visualdl import LogWriter
vdl_logdir = osp.join(save_dir, 'vdl_log')
log_writer = LogWriter(vdl_logdir)
best_miou = -1.0
best_model_epoch = 1
for i in range(self.begin_epoch, num_epochs):
records = list()
step_start_time = time.time()
epoch_start_time = time.time()
for step, data in enumerate(self.train_data_loader()):
outputs = self.exe.run(
self.parallel_train_prog,
feed=data,
fetch_list=list(self.train_outputs.values()))
outputs_avg = np.mean(np.array(outputs), axis=1)
records.append(outputs_avg)
                # estimate the time needed to complete the training
                current_time = time.time()
                step_cost_time = current_time - step_start_time
                step_start_time = current_time
if len(time_stat) < 20:
time_stat.append(step_cost_time)
else:
time_stat[num_steps % 20] = step_cost_time
num_steps += 1
if num_steps % log_interval_steps == 0:
step_metrics = OrderedDict(
zip(list(self.train_outputs.keys()), outputs_avg))
if use_vdl:
for k, v in step_metrics.items():
log_writer.add_scalar(
step=num_steps,
tag='train/{}'.format(k),
value=v)
                    # estimate the remaining time
avg_step_time = np.mean(time_stat)
if time_train_one_epoch is not None:
eta = (num_epochs - i - 1) * time_train_one_epoch + (
total_num_steps - step - 1) * avg_step_time
else:
eta = ((num_epochs - i) * total_num_steps - step -
1) * avg_step_time
if time_eval_one_epoch is not None:
eval_eta = (total_eval_times - i // save_interval_epochs
) * time_eval_one_epoch
else:
eval_eta = (total_eval_times - i // save_interval_epochs
) * total_num_steps_eval * avg_step_time
eta_str = seconds_to_hms(eta + eval_eta)
logging.info(
"[TRAIN] Epoch={}/{}, Step={}/{}, {}, time_each_step={}s, eta={}"
.format(i + 1, num_epochs, step + 1, total_num_steps,
dict2str(step_metrics), round(avg_step_time, 2),
eta_str))
train_metrics = OrderedDict(
zip(list(self.train_outputs.keys()), np.mean(records, axis=0)))
        logging.info('[TRAIN] Epoch {} finished, {}.'.format(
i + 1, dict2str(train_metrics)))
time_train_one_epoch = time.time() - epoch_start_time
eval_epoch_start_time = time.time()
if (i + 1) % save_interval_epochs == 0 or i == num_epochs - 1:
current_save_dir = osp.join(save_dir, "epoch_{}".format(i + 1))
if not osp.isdir(current_save_dir):
os.makedirs(current_save_dir)
if eval_dataset is not None:
self.eval_metrics = self.evaluate(
eval_dataset=eval_dataset,
batch_size=eval_batch_size,
epoch_id=i + 1)
                    # save the best model so far
current_miou = self.eval_metrics['miou']
if current_miou > best_miou:
best_miou = current_miou
best_model_epoch = i + 1
best_model_dir = osp.join(save_dir, "best_model")
self.save_model(save_dir=best_model_dir)
if use_vdl:
for k, v in self.eval_metrics.items():
if isinstance(v, list):
continue
if isinstance(v, np.ndarray):
if v.size > 1:
continue
log_writer.add_scalar(
step=num_steps,
tag='evaluate/{}'.format(k),
value=v)
self.save_model(save_dir=current_save_dir)
time_eval_one_epoch = time.time() - eval_epoch_start_time
if eval_dataset is not None:
logging.info(
'Current evaluated best model in eval_dataset is epoch_{}, miou={}'
.format(best_model_epoch, best_miou))
def evaluate(self, eval_dataset, batch_size=1, epoch_id=None):
"""评估。
Args:
eval_dataset (paddlex.datasets): 评估数据读取器。
batch_size (int): 评估时的batch大小。默认1。
epoch_id (int): 当前评估模型所在的训练轮数。
return_details (bool): 是否返回详细信息。默认False。
Returns:
dict: 当return_details为False时,返回dict。包含关键字:'miou'、'category_iou'、'macc'、
'category_acc'和'kappa',分别表示平均iou、各类别iou、平均准确率、各类别准确率和kappa系数。
tuple (metrics, eval_details):当return_details为True时,增加返回dict (eval_details),
包含关键字:'confusion_matrix',表示评估的混淆矩阵。
"""
self.arrange_transform(transforms=eval_dataset.transforms, mode='train')
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
conf_mat = ConfusionMatrix(self.num_classes, streaming=True)
data_generator = eval_dataset.generator(
batch_size=batch_size, drop_last=False)
if not hasattr(self, 'parallel_test_prog'):
self.parallel_test_prog = fluid.CompiledProgram(
self.test_prog).with_data_parallel(
share_vars_from=self.parallel_train_prog)
logging.info(
"Start to evaluating(total_samples={}, total_steps={})...".format(
eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(
enumerate(data_generator()), total=total_steps):
images = np.array([d[0] for d in data])
labels = np.array([d[1] for d in data])
num_samples = images.shape[0]
if num_samples < batch_size:
num_pad_samples = batch_size - num_samples
pad_images = np.tile(images[0:1], (num_pad_samples, 1, 1, 1))
images = np.concatenate([images, pad_images])
feed_data = {'image': images}
outputs = self.exe.run(
self.parallel_test_prog,
feed=feed_data,
fetch_list=list(self.test_outputs.values()),
return_numpy=True)
pred = outputs[0]
if num_samples < batch_size:
pred = pred[0:num_samples]
mask = labels != self.ignore_index
conf_mat.calculate(pred=pred, label=labels, ignore=mask)
_, iou = conf_mat.mean_iou()
logging.debug("[EVAL] Epoch={}, Step={}/{}, iou={}".format(
epoch_id, step + 1, total_steps, iou))
category_iou, miou = conf_mat.mean_iou()
category_acc, macc = conf_mat.accuracy()
metrics = OrderedDict(
zip(['miou', 'category_iou', 'macc', 'category_acc', 'kappa'],
[miou, category_iou, macc, category_acc,
conf_mat.kappa()]))
        logging.info('[EVAL] Finished, Epoch={}, {}.'.format(
epoch_id, dict2str(metrics)))
return metrics
def predict(self, im_file, transforms=None):
"""预测。
Args:
img_file(str|np.ndarray): 预测图像。
transforms(paddlex.cv.transforms): 数据预处理操作。
Returns:
dict: 包含关键字'label_map'和'score_map', 'label_map'存储预测结果灰度图,
像素值表示对应的类别,'score_map'存储各类别的概率,shape=(h, w, num_classes)
"""
if isinstance(im_file, str):
if not osp.exists(im_file):
                raise ValueError(
                    'The image file does not exist: {}'.format(im_file))
        if transforms is None and not hasattr(self, 'test_transforms'):
            raise Exception("transforms must be defined, but got None.")
if transforms is not None:
self.arrange_transform(transforms=transforms, mode='test')
im, im_info = transforms(im_file)
else:
self.arrange_transform(transforms=self.test_transforms, mode='test')
im, im_info = self.test_transforms(im_file)
im = np.expand_dims(im, axis=0)
result = self.exe.run(
self.test_prog,
feed={'image': im},
fetch_list=list(self.test_outputs.values()))
pred = result[0]
logit = result[1]
logit = np.squeeze(logit)
logit = np.transpose(logit, (1, 2, 0))
pred = np.squeeze(pred).astype('uint8')
keys = list(im_info.keys())
for k in keys[::-1]:
if k == 'shape_before_resize':
h, w = im_info[k][0], im_info[k][1]
                pred = cv2.resize(pred, (w, h), interpolation=cv2.INTER_NEAREST)
                logit = cv2.resize(logit, (w, h), interpolation=cv2.INTER_LINEAR)
elif k == 'shape_before_padding':
h, w = im_info[k][0], im_info[k][1]
pred = pred[0:h, 0:w]
logit = logit[0:h, 0:w, :]
return {'label_map': pred, 'score_map': logit}
class HumanSegLite(SegModel):
    # DeepLabv3+-style model with a ShuffleNet backbone (ShuffleSeg)
def build_net(self, mode='train'):
"""应根据不同的情况进行构建"""
model = ShuffleSeg(
self.num_classes,
mode=mode,
use_bce_loss=self.use_bce_loss,
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
if mode == 'train':
self.optimizer.minimize(model_out)
outputs['loss'] = model_out
else:
outputs['pred'] = model_out[0]
outputs['logit'] = model_out[1]
return inputs, outputs
class HumanSegServer(SegModel):
    # DeepLabv3+ with an Xception backbone
def __init__(self,
num_classes=2,
backbone='Xception65',
output_stride=16,
aspp_with_sep_conv=True,
decoder_use_sep_conv=True,
encoder_with_aspp=True,
enable_decoder=True,
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255,
sync_bn=True):
super().__init__(
num_classes=num_classes,
use_bce_loss=use_bce_loss,
use_dice_loss=use_dice_loss,
class_weight=class_weight,
ignore_index=ignore_index,
sync_bn=sync_bn)
self.init_params = locals()
self.output_stride = output_stride
if backbone not in ['Xception65', 'Xception41']:
raise ValueError("backbone: {} is set wrong. it should be one of "
"('Xception65', 'Xception41')".format(backbone))
self.backbone = backbone
self.aspp_with_sep_conv = aspp_with_sep_conv
self.decoder_use_sep_conv = decoder_use_sep_conv
self.encoder_with_aspp = encoder_with_aspp
self.enable_decoder = enable_decoder
self.sync_bn = sync_bn
def build_net(self, mode='train'):
model = DeepLabv3p(
self.num_classes,
mode=mode,
backbone=self.backbone,
output_stride=self.output_stride,
aspp_with_sep_conv=self.aspp_with_sep_conv,
decoder_use_sep_conv=self.decoder_use_sep_conv,
encoder_with_aspp=self.encoder_with_aspp,
enable_decoder=self.enable_decoder,
use_bce_loss=self.use_bce_loss,
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
if mode == 'train':
self.optimizer.minimize(model_out)
outputs['loss'] = model_out
else:
outputs['pred'] = model_out[0]
outputs['logit'] = model_out[1]
return inputs, outputs
class HumanSegMobile(SegModel):
def __init__(self,
num_classes=2,
stage1_num_modules=1,
stage1_num_blocks=[1],
stage1_num_channels=[32],
stage2_num_modules=1,
stage2_num_blocks=[2, 2],
stage2_num_channels=[16, 32],
stage3_num_modules=1,
stage3_num_blocks=[2, 2, 2],
stage3_num_channels=[16, 32, 64],
stage4_num_modules=1,
stage4_num_blocks=[2, 2, 2, 2],
stage4_num_channels=[16, 32, 64, 128],
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255,
sync_bn=True):
super().__init__(
num_classes=num_classes,
use_bce_loss=use_bce_loss,
use_dice_loss=use_dice_loss,
class_weight=class_weight,
ignore_index=ignore_index,
sync_bn=sync_bn)
self.init_params = locals()
self.stage1_num_modules = stage1_num_modules
self.stage1_num_blocks = stage1_num_blocks
self.stage1_num_channels = stage1_num_channels
self.stage2_num_modules = stage2_num_modules
self.stage2_num_blocks = stage2_num_blocks
self.stage2_num_channels = stage2_num_channels
self.stage3_num_modules = stage3_num_modules
self.stage3_num_blocks = stage3_num_blocks
self.stage3_num_channels = stage3_num_channels
self.stage4_num_modules = stage4_num_modules
self.stage4_num_blocks = stage4_num_blocks
self.stage4_num_channels = stage4_num_channels
def build_net(self, mode='train'):
"""应根据不同的情况进行构建"""
model = HRNet(
self.num_classes,
mode=mode,
stage1_num_modules=self.stage1_num_modules,
stage1_num_blocks=self.stage1_num_blocks,
stage1_num_channels=self.stage1_num_channels,
stage2_num_modules=self.stage2_num_modules,
stage2_num_blocks=self.stage2_num_blocks,
stage2_num_channels=self.stage2_num_channels,
stage3_num_modules=self.stage3_num_modules,
stage3_num_blocks=self.stage3_num_blocks,
stage3_num_channels=self.stage3_num_channels,
stage4_num_modules=self.stage4_num_modules,
stage4_num_blocks=self.stage4_num_blocks,
stage4_num_channels=self.stage4_num_channels,
use_bce_loss=self.use_bce_loss,
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
if mode == 'train':
self.optimizer.minimize(model_out)
outputs['loss'] = model_out
else:
outputs['pred'] = model_out[0]
outputs['logit'] = model_out[1]
return inputs, outputs
def train(self,
num_epochs,
train_dataset,
train_batch_size=2,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrained_weights=None,
resume_weights=None,
optimizer=None,
learning_rate=0.01,
lr_decay_power=0.9,
regularization_coeff=5e-4,
use_vdl=False,
quant=False):
super().train(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
pretrained_weights=pretrained_weights,
resume_weights=resume_weights,
optimizer=optimizer,
learning_rate=learning_rate,
lr_decay_power=lr_decay_power,
regularization_coeff=regularization_coeff,
use_vdl=use_vdl,
quant=quant)
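# A minimal training sketch for the models above. It assumes the surrounding
# HumanSeg package layout (a `transforms` module with Compose/Resize/Normalize
# ops and a `datasets.Dataset` reader) and a dataset on disk; the paths and
# dataset arguments below are placeholders, not a definitive recipe.
if __name__ == '__main__':
    import transforms as T
    from datasets import Dataset

    train_transforms = T.Compose([T.Resize((192, 192)), T.Normalize()])
    train_dataset = Dataset(
        data_dir='data/mini_supervisely',
        file_list='data/mini_supervisely/train_list.txt',
        transforms=train_transforms,
        shuffle=True)
    model = HumanSegMobile(num_classes=2)
    model.train(
        num_epochs=2,
        train_dataset=train_dataset,
        train_batch_size=8,
        save_dir='output')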
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import yaml
import os.path as osp
import six
import copy
from collections import OrderedDict
import paddle.fluid as fluid
import utils.logging as logging
import models
def load_model(model_dir):
if not osp.exists(osp.join(model_dir, "model.yml")):
raise Exception("There's not model.yml in {}".format(model_dir))
with open(osp.join(model_dir, "model.yml")) as f:
info = yaml.load(f.read(), Loader=yaml.Loader)
status = info['status']
if not hasattr(models, info['Model']):
raise Exception("There's no attribute {} in models".format(
info['Model']))
model = getattr(models, info['Model'])(**info['_init_params'])
if status == "Normal":
startup_prog = fluid.Program()
model.test_prog = fluid.Program()
with fluid.program_guard(model.test_prog, startup_prog):
with fluid.unique_name.guard():
model.test_inputs, model.test_outputs = model.build_net(
mode='test')
model.test_prog = model.test_prog.clone(for_test=True)
model.exe.run(startup_prog)
import pickle
with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
load_dict = pickle.load(f)
fluid.io.set_program_state(model.test_prog, load_dict)
elif status in ['Infer', 'Quant']:
[prog, input_names, outputs] = fluid.io.load_inference_model(
model_dir, model.exe, params_filename='__params__')
model.test_prog = prog
test_outputs_info = info['_ModelInputsOutputs']['test_outputs']
model.test_inputs = OrderedDict()
model.test_outputs = OrderedDict()
for name in input_names:
model.test_inputs[name] = model.test_prog.global_block().var(name)
for i, out in enumerate(outputs):
var_desc = test_outputs_info[i]
model.test_outputs[var_desc[0]] = out
if 'test_transforms' in info:
model.test_transforms = build_transforms(info['test_transforms'])
model.eval_transforms = copy.deepcopy(model.test_transforms)
if '_Attributes' in info:
for k, v in info['_Attributes'].items():
if k in model.__dict__:
model.__dict__[k] = v
logging.info("Model[{}] loaded.".format(info['Model']))
return model
def build_transforms(transforms_info):
import transforms as T
transforms = list()
for op_info in transforms_info:
op_name = list(op_info.keys())[0]
op_attr = op_info[op_name]
if not hasattr(T, op_name):
raise Exception(
"There's no operator named '{}' in transforms".format(op_name))
transforms.append(getattr(T, op_name)(**op_attr))
eval_transforms = T.Compose(transforms)
return eval_transforms
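# A minimal inference sketch for load_model above (assumptions:
# 'output/best_model' is a directory produced by SegModel.save_model or
# export_inference_model, and 'data/human.jpg' exists on disk).
if __name__ == '__main__':
    import cv2
    model = load_model('output/best_model')
    result = model.predict('data/human.jpg')
    # label_map holds per-pixel class ids, score_map the class probabilities
    mask = (result['label_map'] * 255).astype('uint8')
    cv2.imwrite('human_mask.png', mask)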
from .backbone import mobilenet_v2
from .backbone import xception
from .deeplabv3p import DeepLabv3p
from .shufflenet_slim import ShuffleSeg
from .hrnet import HRNet
from .mobilenet_v2 import MobileNetV2
from .xception import Xception
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
class MobileNetV2:
def __init__(self,
num_classes=None,
scale=1.0,
output_stride=None,
end_points=None,
decode_points=None):
self.scale = scale
self.num_classes = num_classes
self.output_stride = output_stride
self.end_points = end_points
self.decode_points = decode_points
self.bottleneck_params_list = [(1, 16, 1, 1), (6, 24, 2, 2),
(6, 32, 3, 2), (6, 64, 4, 2),
(6, 96, 3, 1), (6, 160, 3, 2),
(6, 320, 1, 1)]
self.modify_bottle_params(output_stride)
def __call__(self, input):
scale = self.scale
decode_ends = dict()
        def check_points(count, points):
            if points is None:
                return False
            if isinstance(points, list):
                return count in points
            return count == points
# conv1
input = self.conv_bn_layer(
input,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
if_act=True,
name='conv1_1')
layer_count = 1
if check_points(layer_count, self.decode_points):
decode_ends[layer_count] = input
if check_points(layer_count, self.end_points):
return input, decode_ends
# bottleneck sequences
i = 1
in_c = int(32 * scale)
for layer_setting in self.bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
input, depthwise_output = self.invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name='conv' + str(i))
in_c = int(c * scale)
layer_count += n
if check_points(layer_count, self.decode_points):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, self.end_points):
return input, decode_ends
# last_conv
output = self.conv_bn_layer(
input=input,
num_filters=int(1280 * scale) if scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding=0,
if_act=True,
name='conv9')
if self.num_classes is not None:
output = fluid.layers.pool2d(
input=output, pool_type='avg', global_pooling=True)
output = fluid.layers.fc(
input=output,
size=self.num_classes,
param_attr=ParamAttr(name='fc10_weights'),
bias_attr=ParamAttr(name='fc10_offset'))
return output
def modify_bottle_params(self, output_stride=None):
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must to be even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s)
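    # Worked example for modify_bottle_params (illustrative): the default
    # per-stage strides (1, 2, 2, 2, 1, 2, 1) give cumulative strides
    # 2, 4, 8, 16, 16, 32, 32 after the stride-2 stem conv. With
    # output_stride=16 the only setting that crosses 16, (6, 160, 3, 2), is
    # rewritten to (6, 160, 3, 1), so the feature map is never downsampled
    # past 1/16 of the input.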
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def shortcut(self, input, data_residual):
return fluid.layers.elementwise_add(input, data_residual)
def inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
padding,
expansion_factor,
name=None):
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = self.conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name=name + '_expand')
bottleneck_conv = self.conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
if_act=True,
name=name + '_dwise',
use_cudnn=False)
depthwise_output = bottleneck_conv
linear_out = self.conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=False,
name=name + '_linear')
if ifshortcut:
out = self.shortcut(input=input, data_residual=linear_out)
return out, depthwise_output
else:
return linear_out, depthwise_output
def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
first_block, depthwise_output = self.inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self.inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
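# A minimal graph-construction sketch for the backbone above (assuming the
# PaddlePaddle 1.x static-graph API; the input shape is illustrative).
if __name__ == '__main__':
    image = fluid.data(name='image', shape=[None, 3, 192, 192], dtype='float32')
    backbone = MobileNetV2(
        scale=1.0, output_stride=16, end_points=18, decode_points=4)
    # feature is the 1/16-resolution output; decode_ends[4] is the early
    # shortcut later consumed by the DeepLabv3+ decoder
    feature, decode_ends = backbone(image)
    print(feature.shape, decode_ends[4].shape)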
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle.fluid as fluid
from nets.libs import scope, name_scope
from nets.libs import bn, bn_relu, relu
from nets.libs import conv
from nets.libs import separate_conv
__all__ = ['xception_65', 'xception_41', 'xception_71']
def check_data(data, number):
    if isinstance(data, int):
        return [data] * number
    assert len(data) == number
    return data
def check_stride(s, os):
    return s <= os
def check_points(count, points):
    if points is None:
        return False
    if isinstance(points, list):
        return count in points
    return count == points
class Xception():
def __init__(self,
num_classes=None,
layers=65,
output_stride=32,
end_points=None,
decode_points=None):
self.backbone = 'xception_' + str(layers)
self.num_classes = num_classes
self.output_stride = output_stride
self.end_points = end_points
self.decode_points = decode_points
self.bottleneck_params = self.gen_bottleneck_params(self.backbone)
    def __call__(self, input):
self.stride = 2
self.block_point = 0
self.short_cuts = dict()
with scope(self.backbone):
# Entry flow
data = self.entry_flow(input)
if check_points(self.block_point, self.end_points):
return data, self.short_cuts
# Middle flow
data = self.middle_flow(data)
if check_points(self.block_point, self.end_points):
return data, self.short_cuts
# Exit flow
data = self.exit_flow(data)
if check_points(self.block_point, self.end_points):
return data, self.short_cuts
if self.num_classes is not None:
data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
data = fluid.layers.dropout(data, 0.5)
stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
with scope("logit"):
out = fluid.layers.fc(
input=data,
size=self.num_classes,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
name='weights',
initializer=fluid.initializer.Uniform(-stdv, stdv)),
bias_attr=fluid.param_attr.ParamAttr(name='bias'))
return out
else:
return data
def gen_bottleneck_params(self, backbone='xception_65'):
if backbone == 'xception_65':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_41':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (8, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_71':
bottleneck_params = {
"entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
else:
raise Exception(
"xception backbont only support xception_41/xception_65/xception_71"
)
return bottleneck_params
def entry_flow(self, data):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
with scope("entry_flow"):
with scope("conv1"):
data = bn_relu(
conv(
data, 32, 3, stride=2, padding=1,
param_attr=param_attr),
eps=1e-3)
with scope("conv2"):
data = bn_relu(
conv(
data, 64, 3, stride=1, padding=1,
param_attr=param_attr),
eps=1e-3)
# get entry flow params
block_num = self.bottleneck_params["entry_flow"][0]
strides = self.bottleneck_params["entry_flow"][1]
chns = self.bottleneck_params["entry_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
        # parameters controlling the flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("entry_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def middle_flow(self, data):
block_num = self.bottleneck_params["middle_flow"][0]
strides = self.bottleneck_params["middle_flow"][1]
chns = self.bottleneck_params["middle_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
        # parameters controlling the flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("middle_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
                    data, short_cuts = self.xception_block(
                        data, chns[i], [1, 1, stride], skip_conv=False)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def exit_flow(self, data):
block_num = self.bottleneck_params["exit_flow"][0]
strides = self.bottleneck_params["exit_flow"][1]
chns = self.bottleneck_params["exit_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
assert (block_num == 2)
        # parameters controlling the flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("exit_flow"):
with scope('block1'):
block_point += 1
stride = strides[0] if check_stride(s * strides[0],
output_stride) else 1
data, short_cuts = self.xception_block(data, chns[0],
[1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
with scope('block2'):
block_point += 1
stride = strides[1] if check_stride(s * strides[1],
output_stride) else 1
data, short_cuts = self.xception_block(
data,
chns[1], [1, 1, stride],
dilation=2,
has_skip=False,
activation_fn_in_separable_conv=True)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def xception_block(self,
input,
channels,
strides=1,
filters=3,
dilation=1,
skip_conv=True,
has_skip=True,
activation_fn_in_separable_conv=False):
repeat_number = 3
channels = check_data(channels, repeat_number)
filters = check_data(filters, repeat_number)
strides = check_data(strides, repeat_number)
data = input
results = []
for i in range(repeat_number):
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
eps=1e-3)
else:
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
act=relu,
eps=1e-3)
results.append(data)
if not has_skip:
return data, results
if skip_conv:
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.09))
with scope('shortcut'):
skip = bn(
conv(
input,
channels[-1],
1,
strides[-1],
groups=1,
padding=0,
param_attr=param_attr),
eps=1e-3)
else:
skip = input
return data + skip, results
def xception_65(num_classes=None):
model = Xception(num_classes, 65)
return model
def xception_41(num_classes=None):
model = Xception(num_classes, 41)
return model
def xception_71(num_classes=None):
model = Xception(num_classes, 71)
return model
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import paddle.fluid as fluid
from .libs import scope, name_scope
from .libs import bn_relu, relu
from .libs import conv
from .libs import separate_conv
from .libs import sigmoid_to_softmax
from .seg_modules import softmax_with_loss
from .seg_modules import dice_loss
from .seg_modules import bce_loss
from .backbone import MobileNetV2
from .backbone import Xception
class DeepLabv3p(object):
"""实现DeepLabv3+模型
`"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation"
<https://arxiv.org/abs/1802.02611>`
Args:
num_classes (int): 类别数。
backbone (str): DeepLabv3+的backbone网络,实现特征图的计算,取值范围为['Xception65', 'Xception41',
'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
'MobileNetV2_x2.0']。默认'MobileNetV2_x1.0'。
mode (str): 网络运行模式,根据mode构建网络的输入和返回。
当mode为'train'时,输入为image(-1, 3, -1, -1)和label (-1, 1, -1, -1) 返回loss。
当mode为'train'时,输入为image (-1, 3, -1, -1)和label (-1, 1, -1, -1),返回loss,
pred (与网络输入label 相同大小的预测结果,值代表相应的类别),label,mask(非忽略值的mask,
与label相同大小,bool类型)。
当mode为'test'时,输入为image(-1, 3, -1, -1)返回pred (-1, 1, -1, -1)和
logit (-1, num_classes, -1, -1) 通道维上代表每一类的概率值。
output_stride (int): backbone 输出特征图相对于输入的下采样倍数,一般取值为8或16。
aspp_with_sep_conv (bool): 在asspp模块是否采用separable convolutions。
decoder_use_sep_conv (bool): decoder模块是否采用separable convolutions。
encoder_with_aspp (bool): 是否在encoder阶段采用aspp模块。
enable_decoder (bool): 是否使用decoder模块。
use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。
use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。
class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
即平时使用的交叉熵损失函数。
ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。
Raises:
ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
ValueError: class_weight为list, 但长度不等于num_class。
class_weight为str, 但class_weight.low()不等于dynamic。
TypeError: class_weight不为None时,其类型不是list或str。
"""
def __init__(self,
num_classes,
backbone='MobileNetV2_x1.0',
mode='train',
output_stride=16,
aspp_with_sep_conv=True,
decoder_use_sep_conv=True,
encoder_with_aspp=True,
enable_decoder=True,
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
        # dice_loss and bce_loss are only applicable to binary segmentation
        if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
                "dice loss and bce loss are only applicable to binary classification"
            )
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
                    'Expected class_weight to be a list or string but received {}'.
                    format(type(class_weight)))
self.num_classes = num_classes
self.backbone = backbone
self.mode = mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
self.output_stride = output_stride
self.aspp_with_sep_conv = aspp_with_sep_conv
self.decoder_use_sep_conv = decoder_use_sep_conv
self.encoder_with_aspp = encoder_with_aspp
self.enable_decoder = enable_decoder
def _get_backbone(self, backbone):
def mobilenetv2(backbone):
            # backbone: MobileNetV2 structure configuration
            # output_stride: downsampling factor
            # end_points: number of MobileNetV2 blocks
            # decode_points: index of the block whose output is branched off as the decoder input
if '0.25' in backbone:
scale = 0.25
elif '0.5' in backbone:
scale = 0.5
elif '1.0' in backbone:
scale = 1.0
elif '1.5' in backbone:
scale = 1.5
elif '2.0' in backbone:
scale = 2.0
end_points = 18
decode_points = 4
return MobileNetV2(
scale=scale,
output_stride=self.output_stride,
end_points=end_points,
decode_points=decode_points)
def xception(backbone):
            # decode_points: index of the block whose output is branched off as the decoder input
            # end_points: number of Xception blocks
if '65' in backbone:
decode_points = 2
end_points = 21
layers = 65
if '41' in backbone:
decode_points = 2
end_points = 13
layers = 41
if '71' in backbone:
decode_points = 3
end_points = 23
layers = 71
return Xception(
layers=layers,
output_stride=self.output_stride,
end_points=end_points,
decode_points=decode_points)
if 'Xception' in backbone:
return xception(backbone)
elif 'MobileNetV2' in backbone:
return mobilenetv2(backbone)
def _encoder(self, input):
        # Encoder configuration using the ASPP architecture: image pooling +
        # 1x1 conv + three parallel dilated convolutions at different rates,
        # concatenated and followed by a 1x1 conv.
        # aspp_with_sep_conv: True by default; use depthwise separable
        # convolutions, otherwise ordinary convolutions.
        # output_stride: downsampling factor, 8 or 16; determines aspp_ratios.
        # aspp_ratios: dilation rates of the ASPP dilated convolutions.
if self.output_stride == 16:
aspp_ratios = [6, 12, 18]
elif self.output_stride == 8:
aspp_ratios = [12, 24, 36]
else:
raise Exception("DeepLabv3p only support stride 8 or 16")
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('encoder'):
channel = 256
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean(
input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
input_shape = fluid.layers.shape(input)
image_avg = fluid.layers.resize_bilinear(
image_avg, input_shape[2:])
with scope("aspp0"):
aspp0 = bn_relu(
conv(
input,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
with scope("aspp1"):
if self.aspp_with_sep_conv:
aspp1 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
else:
aspp1 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[0],
padding=aspp_ratios[0],
param_attr=param_attr))
with scope("aspp2"):
if self.aspp_with_sep_conv:
aspp2 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
else:
aspp2 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[1],
padding=aspp_ratios[1],
param_attr=param_attr))
with scope("aspp3"):
if self.aspp_with_sep_conv:
aspp3 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
else:
aspp3 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[2],
padding=aspp_ratios[2],
param_attr=param_attr))
with scope("concat"):
data = fluid.layers.concat(
[image_avg, aspp0, aspp1, aspp2, aspp3], axis=1)
data = bn_relu(
conv(
data,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
data = fluid.layers.dropout(data, 0.9)
return data
def _decoder(self, encode_data, decode_shortcut):
        # Decoder configuration.
        # encode_data: output of the encoder.
        # decode_shortcut: branch taken from the backbone, resized and then
        # concatenated with encode_data.
        # decoder_use_sep_conv: True by default; after the concat, two
        # separable convolutions are applied, otherwise ordinary convolutions.
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('decoder'):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut,
48,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut_shape[2:])
encode_data = fluid.layers.concat(
[encode_data, decode_shortcut], axis=1)
if self.decoder_use_sep_conv:
with scope("separable_conv1"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
else:
with scope("decoder_conv1"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
with scope("decoder_conv2"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
return encode_data
def _get_loss(self, logit, label, mask):
avg_loss = 0
if not (self.use_dice_loss or self.use_bce_loss):
avg_loss += softmax_with_loss(
logit,
label,
mask,
num_classes=self.num_classes,
weight=self.class_weight,
ignore_index=self.ignore_index)
else:
if self.use_dice_loss:
avg_loss += dice_loss(logit, label, mask)
if self.use_bce_loss:
avg_loss += bce_loss(
logit, label, mask, ignore_index=self.ignore_index)
return avg_loss
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
elif self.mode == 'eval':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
return inputs
def build_net(self, inputs):
        # for binary segmentation, when dice_loss or bce_loss is selected,
        # the final logit has a single output channel
if self.use_dice_loss or self.use_bce_loss:
self.num_classes = 1
image = inputs['image']
backbone_net = self._get_backbone(self.backbone)
data, decode_shortcuts = backbone_net(image)
decode_shortcut = decode_shortcuts[backbone_net.decode_points]
        # encoder / decoder setup
if self.encoder_with_aspp:
data = self._encoder(data)
if self.enable_decoder:
data = self._decoder(data, decode_shortcut)
        # set the output of the last conv layer according to the number of
        # classes, then resize to the original image size
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
with fluid.name_scope('last_conv'):
logit = conv(
data,
self.num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
image_shape = fluid.layers.shape(image)
logit = fluid.layers.resize_bilinear(logit, image_shape[2:])
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if self.mode == 'train':
label = inputs['label']
mask = label != self.ignore_index
return self._get_loss(logit, label, mask)
else:
if self.num_classes == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = fluid.layers.softmax(logit, axis=1)
return pred, logit
return logit
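# A minimal graph-construction sketch for DeepLabv3p (assuming the PaddlePaddle
# 1.x static-graph API): build the network in 'test' mode and inspect its outputs.
if __name__ == '__main__':
    main_prog, startup_prog = fluid.Program(), fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        model = DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', mode='test')
        inputs = model.generate_inputs()
        # pred: (-1, h, w, 1) per-pixel class ids; logit: (-1, 2, h, w) probabilities
        pred, logit = model.build_net(inputs)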
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
from .seg_modules import softmax_with_loss
from .seg_modules import dice_loss
from .seg_modules import bce_loss
from .libs import sigmoid_to_softmax
class HRNet(object):
def __init__(self,
num_classes,
mode='train',
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[18, 36],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[18, 36, 72],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[18, 36, 72, 144],
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
        # dice_loss and bce_loss are only applicable to binary segmentation
        if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
                "dice loss and bce loss are only applicable to binary classification"
            )
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
                    'Expected class_weight to be a list or string but received {}'.
                    format(type(class_weight)))
self.num_classes = num_classes
self.mode = mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
self.stage1_num_modules = stage1_num_modules
self.stage1_num_blocks = stage1_num_blocks
self.stage1_num_channels = stage1_num_channels
self.stage2_num_modules = stage2_num_modules
self.stage2_num_blocks = stage2_num_blocks
self.stage2_num_channels = stage2_num_channels
self.stage3_num_modules = stage3_num_modules
self.stage3_num_blocks = stage3_num_blocks
self.stage3_num_channels = stage3_num_channels
self.stage4_num_modules = stage4_num_modules
self.stage4_num_blocks = stage4_num_blocks
self.stage4_num_channels = stage4_num_channels
def build_net(self, inputs):
image = inputs['image']
logit = self._high_resolution_net(image, self.num_classes)
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if self.mode == 'train':
label = inputs['label']
mask = label != self.ignore_index
return self._get_loss(logit, label, mask)
else:
if self.num_classes == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = fluid.layers.softmax(logit, axis=1)
return pred, logit
return logit
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
elif self.mode == 'eval':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
return inputs
def _get_loss(self, logit, label, mask):
avg_loss = 0
if not (self.use_dice_loss or self.use_bce_loss):
avg_loss += softmax_with_loss(
logit,
label,
mask,
num_classes=self.num_classes,
weight=self.class_weight,
ignore_index=self.ignore_index)
else:
if self.use_dice_loss:
avg_loss += dice_loss(logit, label, mask)
if self.use_bce_loss:
avg_loss += bce_loss(
logit, label, mask, ignore_index=self.ignore_index)
return avg_loss
def _conv_bn_layer(self,
input,
filter_size,
num_filters,
stride=1,
padding=1,
num_groups=1,
if_act=True,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(
name=bn_name + "_scale",
initializer=fluid.initializer.Constant(1.0)),
bias_attr=ParamAttr(
name=bn_name + "_offset",
initializer=fluid.initializer.Constant(0.0)),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
bn = fluid.layers.relu(bn)
return bn
def _basic_block(self,
input,
num_filters,
stride=1,
downsample=False,
name=None):
residual = input
conv = self._conv_bn_layer(
input=input,
filter_size=3,
num_filters=num_filters,
stride=stride,
name=name + '_conv1')
conv = self._conv_bn_layer(
input=conv,
filter_size=3,
num_filters=num_filters,
if_act=False,
name=name + '_conv2')
if downsample:
residual = self._conv_bn_layer(
input=input,
filter_size=1,
num_filters=num_filters,
if_act=False,
name=name + '_downsample')
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def _bottleneck_block(self,
input,
num_filters,
stride=1,
downsample=False,
name=None):
residual = input
conv = self._conv_bn_layer(
input=input,
filter_size=1,
num_filters=num_filters,
name=name + '_conv1')
conv = self._conv_bn_layer(
input=conv,
filter_size=3,
num_filters=num_filters,
stride=stride,
name=name + '_conv2')
conv = self._conv_bn_layer(
input=conv,
filter_size=1,
num_filters=num_filters * 4,
if_act=False,
name=name + '_conv3')
if downsample:
residual = self._conv_bn_layer(
input=input,
filter_size=1,
num_filters=num_filters * 4,
if_act=False,
name=name + '_downsample')
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def _fuse_layers(self, x, channels, multi_scale_output=True, name=None):
out = []
for i in range(len(channels) if multi_scale_output else 1):
residual = x[i]
shape = fluid.layers.shape(residual)[-2:]
for j in range(len(channels)):
if j > i:
y = self._conv_bn_layer(
x[j],
filter_size=1,
num_filters=channels[i],
if_act=False,
name=name + '_layer_' + str(i + 1) + '_' + str(j + 1))
y = fluid.layers.resize_bilinear(input=y, out_shape=shape)
residual = fluid.layers.elementwise_add(
x=residual, y=y, act=None)
elif j < i:
y = x[j]
for k in range(i - j):
if k == i - j - 1:
y = self._conv_bn_layer(
y,
filter_size=3,
num_filters=channels[i],
stride=2,
if_act=False,
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1) + '_' + str(k + 1))
else:
y = self._conv_bn_layer(
y,
filter_size=3,
num_filters=channels[j],
stride=2,
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1) + '_' + str(k + 1))
residual = fluid.layers.elementwise_add(
x=residual, y=y, act=None)
residual = fluid.layers.relu(residual)
out.append(residual)
return out
def _branches(self, x, block_num, channels, name=None):
out = []
for i in range(len(channels)):
residual = x[i]
for j in range(block_num[i]):
residual = self._basic_block(
residual,
channels[i],
name=name + '_branch_layer_' + str(i + 1) + '_' +
str(j + 1))
out.append(residual)
return out
def _high_resolution_module(self,
x,
blocks,
channels,
multi_scale_output=True,
name=None):
residual = self._branches(x, blocks, channels, name=name)
out = self._fuse_layers(
residual,
channels,
multi_scale_output=multi_scale_output,
name=name)
return out
def _transition_layer(self, x, in_channels, out_channels, name=None):
num_in = len(in_channels)
num_out = len(out_channels)
out = []
for i in range(num_out):
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self._conv_bn_layer(
x[i],
filter_size=3,
num_filters=out_channels[i],
name=name + '_layer_' + str(i + 1))
out.append(residual)
else:
out.append(x[i])
else:
residual = self._conv_bn_layer(
x[-1],
filter_size=3,
num_filters=out_channels[i],
stride=2,
name=name + '_layer_' + str(i + 1))
out.append(residual)
return out
def _stage(self,
x,
num_modules,
num_blocks,
num_channels,
multi_scale_output=True,
name=None):
out = x
for i in range(num_modules):
            if i == num_modules - 1 and not multi_scale_output:
out = self._high_resolution_module(
out,
num_blocks,
num_channels,
multi_scale_output=False,
name=name + '_' + str(i + 1))
else:
out = self._high_resolution_module(
out, num_blocks, num_channels, name=name + '_' + str(i + 1))
return out
def _layer1(self, input, num_modules, num_blocks, num_channels, name=None):
        # num_modules defaults to 1; the official implementation uses [1] here
        # -- TODO: check whether extra handling is needed to align with it.
conv = input
for i in range(num_blocks[0]):
conv = self._bottleneck_block(
conv,
num_filters=num_channels[0],
downsample=True if i == 0 else False,
name=name + '_' + str(i + 1))
return conv
def _high_resolution_net(self, input, num_classes):
x = self._conv_bn_layer(
input=input,
filter_size=3,
num_filters=self.stage1_num_channels[0],
stride=2,
if_act=True,
name='layer1_1')
x = self._conv_bn_layer(
input=x,
filter_size=3,
num_filters=self.stage1_num_channels[0],
stride=2,
if_act=True,
name='layer1_2')
la1 = self._layer1(
x,
self.stage1_num_modules,
self.stage1_num_blocks,
self.stage1_num_channels,
name='layer2')
tr1 = self._transition_layer([la1],
self.stage1_num_channels,
self.stage2_num_channels,
name='tr1')
st2 = self._stage(
tr1,
self.stage2_num_modules,
self.stage2_num_blocks,
self.stage2_num_channels,
name='st2')
tr2 = self._transition_layer(
st2, self.stage2_num_channels, self.stage3_num_channels, name='tr2')
st3 = self._stage(
tr2,
self.stage3_num_modules,
self.stage3_num_blocks,
self.stage3_num_channels,
name='st3')
tr3 = self._transition_layer(
st3, self.stage3_num_channels, self.stage4_num_channels, name='tr3')
st4 = self._stage(
tr3,
self.stage4_num_modules,
self.stage4_num_blocks,
self.stage4_num_channels,
name='st4')
# upsample
shape = fluid.layers.shape(st4[0])[-2:]
st4[1] = fluid.layers.resize_bilinear(st4[1], out_shape=shape)
st4[2] = fluid.layers.resize_bilinear(st4[2], out_shape=shape)
st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=shape)
out = fluid.layers.concat(st4, axis=1)
last_channels = sum(self.stage4_num_channels)
out = self._conv_bn_layer(
input=out,
filter_size=1,
num_filters=last_channels,
stride=1,
if_act=True,
name='conv-2')
out = fluid.layers.conv2d(
input=out,
num_filters=num_classes,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name='conv-1_weights'),
bias_attr=False)
input_shape = fluid.layers.shape(input)[-2:]
out = fluid.layers.resize_bilinear(out, input_shape)
return out
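# A minimal graph-construction sketch for HRNet (assuming the PaddlePaddle 1.x
# static-graph API). The default stage channels (18, 36, 72, 144) correspond to
# the HRNet-W18 width setting.
if __name__ == '__main__':
    main_prog, startup_prog = fluid.Program(), fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        model = HRNet(num_classes=2, mode='test')
        inputs = model.generate_inputs()
        pred, logit = model.build_net(inputs)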
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import contextlib
bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
name_scope = ""
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
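# Illustrative use of the scope helper above: nested scopes concatenate into a
# parameter-name prefix, e.g.
#     with scope('encoder'):
#         with scope('aspp0'):
#             ...  # here name_scope == 'encoder/aspp0/'
# so conv(...) below picks up parameter names like 'encoder/aspp0/weights'.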
def max_pool(input, kernel, stride, padding):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='max',
pool_stride=stride,
pool_padding=padding)
return data
def avg_pool(input, kernel, stride, padding=0):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='avg',
pool_stride=stride,
pool_padding=padding)
return data
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape
if C % G != 0:
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
break
    assert C % G == 0, "the number of channels must be divisible by the number of groups"
x = fluid.layers.group_norm(
input,
groups=G,
param_attr=param_attr,
bias_attr=bias_attr,
name=name_scope + 'group_norm')
return x
def bn(*args,
norm_type='bn',
eps=1e-5,
bn_momentum=0.99,
group_norm=32,
**kargs):
if norm_type == 'bn':
with scope('BatchNorm'):
return fluid.layers.batch_norm(
*args,
epsilon=eps,
momentum=bn_momentum,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
elif norm_type == 'gn':
with scope('GroupNorm'):
return group_norm(
args[0],
group_norm,
eps=eps,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer))
else:
raise Exception("Unsupport norm type:" + norm_type)
def bn_relu(data, norm_type='bn', eps=1e-5):
return fluid.layers.relu(bn(data, norm_type=norm_type, eps=eps))
def relu(data):
return fluid.layers.relu(data)
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = fluid.ParamAttr(
name=name_scope + 'biases',
regularizer=None,
initializer=fluid.initializer.ConstantInitializer(value=0.0))
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d(*args, **kargs)
def deconv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d_transpose(*args, **kargs)
def separate_conv(input,
channel,
stride,
filter,
dilation=1,
act=None,
eps=1e-5):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter // 2) * dilation,
dilation=dilation,
use_cudnn=False,
param_attr=param_attr)
input = bn(input, eps=eps)
if act: input = act(input)
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
input = bn(input, eps=eps)
if act: input = act(input)
return input
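# Note on the factorization above (illustrative numbers, not from the
# source): a depthwise KxK conv followed by a 1x1 pointwise conv replaces a
# dense KxK conv, cutting weights from C_in*C_out*K*K down to
# C_in*K*K + C_in*C_out. For C_in = C_out = 256 and K = 3 that is
# 589,824 vs 67,840 parameters, at a small cost in accuracy.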
def conv_bn_layer(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=fluid.ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def sigmoid_to_softmax(input):
"""
one channel to two channel
"""
logit = fluid.layers.sigmoid(input)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=1)
return logit
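# Illustrative numpy equivalent of sigmoid_to_softmax (a sketch for clarity,
# not used by the library); it assumes input of shape [N, 1, H, W] and shows
# why a single binary logit can be consumed by two-class softmax-style code
# downstream.
def _sigmoid_to_softmax_np_demo(x):
    import numpy as np
    p = 1.0 / (1.0 + np.exp(-x))  # foreground probability
    # channel 0: background probability, channel 1: foreground probability
    return np.concatenate([1.0 - p, p], axis=1)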
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
import numpy as np
def softmax_with_loss(logit,
label,
ignore_mask=None,
num_classes=2,
weight=None,
ignore_index=255):
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
if weight is None:
loss, probs = fluid.layers.softmax_with_cross_entropy(
logit, label, ignore_index=ignore_index, return_softmax=True)
else:
label_one_hot = fluid.one_hot(input=label, depth=num_classes)
if isinstance(weight, list):
assert len(
weight
) == num_classes, "weight length must equal num of classes"
weight = fluid.layers.assign(np.array([weight], dtype='float32'))
elif isinstance(weight, str):
assert weight.lower(
) == 'dynamic', 'if weight is string, must be dynamic!'
tmp = []
total_num = fluid.layers.cast(
fluid.layers.shape(label)[0], 'float32')
for i in range(num_classes):
cls_pixel_num = fluid.layers.reduce_sum(label_one_hot[:, i])
ratio = total_num / (cls_pixel_num + 1)
tmp.append(ratio)
weight = fluid.layers.concat(tmp)
weight = weight / fluid.layers.reduce_sum(weight) * num_classes
elif isinstance(weight, fluid.layers.Variable):
pass
else:
raise ValueError(
'Expect weight is a list, string or Variable, but receive {}'.
format(type(weight)))
weight = fluid.layers.reshape(weight, [1, num_classes])
weighted_label_one_hot = fluid.layers.elementwise_mul(
label_one_hot, weight)
probs = fluid.layers.softmax(logit)
loss = fluid.layers.cross_entropy(
probs,
weighted_label_one_hot,
soft_label=True,
ignore_index=ignore_index)
weighted_label_one_hot.stop_gradient = True
loss = loss * ignore_mask
avg_loss = fluid.layers.mean(loss) / (
fluid.layers.mean(ignore_mask) + 0.00001)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return avg_loss
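# Sketch of the 'dynamic' class weighting used above, in plain numpy for
# illustration only (it assumes labels are already clipped to
# [0, num_classes - 1], as the fluid code does before this point). Rare
# classes receive proportionally larger weights, and the weights are
# rescaled so they sum to num_classes, mirroring the fluid ops above.
def _dynamic_class_weights_demo(labels, num_classes):
    import numpy as np
    total = float(labels.size)
    counts = np.bincount(labels.reshape(-1), minlength=num_classes)
    ratio = total / (counts + 1.0)  # inverse-frequency ratio, as above
    return ratio / ratio.sum() * num_classes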
# TODO: decide how to apply ignore_index and ignore_mask here
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception(
"dice loss is only applicable to one channel classfication")
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
label = fluid.layers.transpose(label, [0, 2, 3, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit = logit * ignore_mask
label = label * ignore_mask
reduce_dim = list(range(1, len(logit.shape)))
inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
dice_denominator = fluid.layers.reduce_sum(
logit, dim=reduce_dim) + fluid.layers.reduce_sum(
label, dim=reduce_dim)
dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return fluid.layers.reduce_mean(dice_score)
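# For reference, dice_loss above computes 1 - 2*|P.G| / (|P| + |G|) per
# sample, where P is the sigmoid prediction and G the binary label (ignored
# pixels are zeroed out first). A minimal numpy sketch of the same
# definition on flat arrays, for illustration only:
def _dice_loss_np_demo(probs, labels, eps=1e-5):
    import numpy as np
    inter = np.sum(probs * labels)
    denom = np.sum(probs) + np.sum(labels)
    return 1.0 - 2.0 * inter / (denom + eps)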
def bce_loss(logit, label, ignore_mask=None, ignore_index=255):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("bce loss is only applicable to binary classfication")
label = fluid.layers.cast(label, 'float32')
loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=logit, label=label, ignore_index=ignore_index,
        normalize=True)  # normalize the loss by the number of non-ignored pixels
loss = fluid.layers.reduce_sum(loss)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return loss
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
from .libs import sigmoid_to_softmax
from .seg_modules import softmax_with_loss
from .seg_modules import dice_loss
from .seg_modules import bce_loss
class ShuffleSeg(object):
def __init__(self,
num_classes,
mode='train',
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
        # dice_loss and bce_loss are only applicable to binary segmentation
if num_classes > 2 and (use_bce_loss or use_dice_loss):
raise ValueError(
"dice loss and bce loss is only applicable to binary classfication"
)
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.mode = mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
def _get_loss(self, logit, label, mask):
avg_loss = 0
if not (self.use_dice_loss or self.use_bce_loss):
avg_loss += softmax_with_loss(
logit,
label,
mask,
num_classes=self.num_classes,
weight=self.class_weight,
ignore_index=self.ignore_index)
else:
if self.use_dice_loss:
avg_loss += dice_loss(logit, label, mask)
if self.use_bce_loss:
avg_loss += bce_loss(
logit, label, mask, ignore_index=self.ignore_index)
return avg_loss
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
elif self.mode == 'eval':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
return inputs
def build_net(self, inputs, class_dim=2):
if self.use_dice_loss or self.use_bce_loss:
self.num_classes = 1
image = inputs['image']
## Encoder
conv1 = self.conv_bn(image, 3, 36, 2, 1)
shortcut = self.conv_bn(
input=conv1, filter_size=1, num_filters=18, stride=1, padding=0)
pool = fluid.layers.pool2d(
input=conv1,
pool_size=3,
pool_type='max',
pool_stride=2,
pool_padding=1)
# Block 1
conv = self.sfnetv2module(pool, stride=2, num_filters=72)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
# Block 2
conv = self.sfnetv2module(conv, stride=2)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
conv = self.sfnetv2module(conv, stride=1)
        ## Decoder
conv = self.depthwise_separable(conv, 3, 64, 1)
shortcut_shape = fluid.layers.shape(shortcut)[2:]
conv_b = fluid.layers.resize_bilinear(conv, shortcut_shape)
concat = fluid.layers.concat([shortcut, conv_b], axis=1)
decode_conv = self.depthwise_separable(concat, 3, 64, 1)
logit = self.output_layer(decode_conv, class_dim)
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if self.mode == 'train':
label = inputs['label']
mask = label != self.ignore_index
return self._get_loss(logit, label, mask)
else:
if self.num_classes == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = fluid.layers.softmax(logit, axis=1)
return pred, logit
return logit
def conv_bn(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
parameter_attr = ParamAttr(learning_rate=1, initializer=MSRA())
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def depthwise_separable(self, input, filter_size, num_filters, stride):
num_filters1 = int(input.shape[1])
num_groups = num_filters1
depthwise_conv = self.conv_bn(
input=input,
filter_size=filter_size,
num_filters=int(num_filters1),
stride=stride,
padding=int(filter_size / 2),
num_groups=num_groups,
use_cudnn=False,
act=None)
pointwise_conv = self.conv_bn(
input=depthwise_conv,
filter_size=1,
num_filters=num_filters,
stride=1,
padding=0)
return pointwise_conv
def sfnetv2module(self, input, stride, num_filters=None):
if stride == 1:
shortcut, branch = fluid.layers.split(
input, num_or_sections=2, dim=1)
if num_filters is None:
in_channels = int(branch.shape[1])
else:
in_channels = int(num_filters / 2)
else:
branch = input
if num_filters is None:
in_channels = int(branch.shape[1])
else:
in_channels = int(num_filters / 2)
shortcut = self.depthwise_separable(input, 3, in_channels, stride)
branch_1x1 = self.conv_bn(
input=branch,
filter_size=1,
num_filters=int(in_channels),
stride=1,
padding=0)
branch_dw1x1 = self.depthwise_separable(branch_1x1, 3, in_channels,
stride)
output = fluid.layers.concat(input=[shortcut, branch_dw1x1], axis=1)
# channel shuffle
# b, c, h, w = output.shape
shape = fluid.layers.shape(output)
c = output.shape[1]
b, h, w = shape[0], shape[2], shape[3]
output = fluid.layers.reshape(x=output, shape=[b, 2, in_channels, h, w])
output = fluid.layers.transpose(x=output, perm=[0, 2, 1, 3, 4])
output = fluid.layers.reshape(x=output, shape=[b, c, h, w])
return output
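    # Channel shuffle, for reference: viewing [N, C, H, W] as
    # [N, 2, C//2, H, W], swapping the two group axes and flattening back
    # interleaves the channels of the two branches (the ShuffleNet v2
    # trick). A numpy sketch of the same permutation, for illustration:
    #
    #   def channel_shuffle_np(x, groups=2):
    #       n, c, h, w = x.shape
    #       x = x.reshape(n, groups, c // groups, h, w)
    #       x = x.transpose(0, 2, 1, 3, 4)
    #       return x.reshape(n, c, h, w)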
def output_layer(self, input, out_dim):
param_attr = fluid.param_attr.ParamAttr(
learning_rate=1.,
regularizer=fluid.regularizer.L2Decay(0.),
initializer=fluid.initializer.Xavier())
# deconv
output = fluid.layers.conv2d_transpose(
input=input,
num_filters=out_dim,
filter_size=2,
padding=0,
stride=2,
bias_attr=True,
param_attr=param_attr,
act=None)
return output
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
TEST_PATH = os.path.join(LOCAL_PATH, "../../../", "test")
sys.path.append(TEST_PATH)
from test_utils import download_file_and_uncompress
model_urls = {
"humanseg_server_ckpt":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_ckpt.zip",
"humanseg_server_inference":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_inference.zip",
"humanseg_mobile_ckpt":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_ckpt.zip",
"humanseg_mobile_inference":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_inference.zip",
"humanseg_mobile_quant":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_quant.zip",
"humanseg_lite_ckpt":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_ckpt.zip",
"humanseg_lite_inference":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_inference.zip",
"humanseg_lite_quant":
"https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_quant.zip",
}
if __name__ == "__main__":
for model_name, url in model_urls.items():
download_file_and_uncompress(
url=url,
savepath=LOCAL_PATH,
extrapath=LOCAL_PATH,
extraname=model_name)
print("Pretrained Model download success!")
import argparse
from datasets.dataset import Dataset
import transforms
import models
def parse_args():
    parser = argparse.ArgumentParser(description='HumanSeg offline quantization')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for quant',
type=str,
default='output/best_model')
parser.add_argument(
'--batch_size',
dest='batch_size',
help='Mini batch size',
type=int,
default=1)
parser.add_argument(
'--batch_nums',
dest='batch_nums',
help='Batch number for quant',
type=int,
default=10)
parser.add_argument(
'--data_dir',
dest='data_dir',
help='the root directory of dataset',
type=str)
parser.add_argument(
'--quant_list',
dest='quant_list',
help=
        'Image file list for model quantization, it can be val.txt or train.txt',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the quant model',
type=str,
default='./output/quant_offline')
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def quantize(args):
eval_transforms = transforms.Compose(
[transforms.Resize(args.image_shape),
transforms.Normalize()])
eval_dataset = Dataset(
data_dir=args.data_dir,
file_list=args.quant_list,
transforms=eval_transforms,
num_workers='auto',
buffer_size=100,
parallel_method='thread',
shuffle=False)
model = models.load_model(args.model_dir)
model.export_quant_model(
dataset=eval_dataset,
save_dir=args.save_dir,
batch_size=args.batch_size,
batch_nums=args.batch_nums)
if __name__ == '__main__':
args = parse_args()
    quantize(args)
import argparse
from datasets.dataset import Dataset
from models import HumanSegMobile, HumanSegLite, HumanSegServer
import transforms
MODEL_TYPE = ['HumanSegMobile', 'HumanSegLite', 'HumanSegServer']
def parse_args():
    parser = argparse.ArgumentParser(description='HumanSeg quantization-aware training')
parser.add_argument(
'--model_type',
dest='model_type',
help=
"Model type for traing, which is one of ('HumanSegMobile', 'HumanSegLite', 'HumanSegServer')",
type=str,
default='HumanSegMobile')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='The root directory of dataset',
type=str)
parser.add_argument(
'--train_list',
dest='train_list',
help='Train list file of dataset',
type=str)
parser.add_argument(
'--val_list',
dest='val_list',
help='Val list file of dataset',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the model snapshot',
type=str,
default='./output/quant_train')
parser.add_argument(
'--num_classes',
dest='num_classes',
help='Number of classes',
type=int,
default=2)
parser.add_argument(
'--num_epochs',
dest='num_epochs',
help='Number epochs for training',
type=int,
default=2)
parser.add_argument(
'--batch_size',
dest='batch_size',
help='Mini batch size',
type=int,
default=128)
parser.add_argument(
'--learning_rate',
dest='learning_rate',
help='Learning rate',
type=float,
default=0.001)
parser.add_argument(
'--pretrained_weights',
dest='pretrained_weights',
        help='The path of pretrained weights',
type=str,
default=None)
parser.add_argument(
'--save_interval_epochs',
dest='save_interval_epochs',
help='The interval epochs for save a model snapshot',
type=int,
default=1)
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def train(args):
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.Resize(args.image_shape),
transforms.Normalize()
])
eval_transforms = transforms.Compose(
[transforms.Resize(args.image_shape),
transforms.Normalize()])
train_dataset = Dataset(
data_dir=args.data_dir,
file_list=args.train_list,
transforms=train_transforms,
num_workers='auto',
buffer_size=100,
parallel_method='thread',
shuffle=True)
eval_dataset = None
if args.val_list is not None:
eval_dataset = Dataset(
data_dir=args.data_dir,
file_list=args.val_list,
transforms=eval_transforms,
num_workers='auto',
buffer_size=100,
parallel_method='thread',
shuffle=False)
if args.model_type == 'HumanSegMobile':
model = HumanSegMobile(num_classes=2)
elif args.model_type == 'HumanSegLite':
model = HumanSegLite(num_classes=2)
elif args.model_type == 'HumanSegServer':
model = HumanSegServer(num_classes=2)
else:
raise ValueError(
"--model_type: {} is set wrong, it shold be one of ('HumanSegMobile', "
"'HumanSegLite', 'HumanSegServer')".format(args.model_type))
model.train(
num_epochs=args.num_epochs,
train_dataset=train_dataset,
train_batch_size=args.batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=args.save_interval_epochs,
save_dir=args.save_dir,
pretrained_weights=args.pretrained_weights,
learning_rate=args.learning_rate,
quant=True)
if __name__ == '__main__':
args = parse_args()
train(args)
visualdl == 2.0.0-alpha.1
paddleslim
import argparse
from datasets.dataset import Dataset
from models import HumanSegMobile, HumanSegLite, HumanSegServer
import transforms
MODEL_TYPE = ['HumanSegMobile', 'HumanSegLite', 'HumanSegServer']
def parse_args():
parser = argparse.ArgumentParser(description='HumanSeg training')
parser.add_argument(
'--model_type',
dest='model_type',
help=
"Model type for traing, which is one of ('HumanSegMobile', 'HumanSegLite', 'HumanSegServer')",
type=str,
default='HumanSegMobile')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='The root directory of dataset',
type=str)
parser.add_argument(
'--train_list',
dest='train_list',
help='Train list file of dataset',
type=str)
parser.add_argument(
'--val_list',
dest='val_list',
help='Val list file of dataset',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the model snapshot',
type=str,
default='./output')
parser.add_argument(
'--num_classes',
dest='num_classes',
help='Number of classes',
type=int,
default=2)
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
parser.add_argument(
'--num_epochs',
dest='num_epochs',
help='Number epochs for training',
type=int,
default=100)
parser.add_argument(
'--batch_size',
dest='batch_size',
help='Mini batch size',
type=int,
default=128)
parser.add_argument(
'--learning_rate',
dest='learning_rate',
help='Learning rate',
type=float,
default=0.01)
parser.add_argument(
'--pretrained_weights',
dest='pretrained_weights',
        help='The path of pretrained weights',
type=str,
default=None)
parser.add_argument(
'--resume_weights',
dest='resume_weights',
        help='The path of weights to resume training from',
type=str,
default=None)
parser.add_argument(
'--use_vdl',
dest='use_vdl',
help='Whether to use visualdl',
type=bool,
default=True)
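    # Caveat (editorial note): argparse's type=bool turns any non-empty
    # string into True, so "--use_vdl False" still enables VisualDL; pass an
    # empty string to disable it, or prefer action='store_true' flags.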
parser.add_argument(
'--save_interval_epochs',
dest='save_interval_epochs',
help='The interval epochs for save a model snapshot',
type=int,
default=5)
return parser.parse_args()
def train(args):
train_transforms = transforms.Compose([
transforms.Resize(args.image_shape),
transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose(
[transforms.Resize(args.image_shape),
transforms.Normalize()])
train_dataset = Dataset(
data_dir=args.data_dir,
file_list=args.train_list,
transforms=train_transforms,
num_workers='auto',
buffer_size=100,
parallel_method='thread',
shuffle=True)
eval_dataset = None
if args.val_list is not None:
eval_dataset = Dataset(
data_dir=args.data_dir,
file_list=args.val_list,
transforms=eval_transforms,
num_workers='auto',
buffer_size=100,
parallel_method='thread',
shuffle=False)
if args.model_type == 'HumanSegMobile':
model = HumanSegMobile(num_classes=2)
elif args.model_type == 'HumanSegLite':
model = HumanSegLite(num_classes=2)
elif args.model_type == 'HumanSegServer':
model = HumanSegServer(num_classes=2)
else:
raise ValueError(
"--model_type: {} is set wrong, it shold be one of ('HumanSegMobile', "
"'HumanSegLite', 'HumanSegServer')".format(args.model_type))
model.train(
num_epochs=args.num_epochs,
train_dataset=train_dataset,
train_batch_size=args.batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=args.save_interval_epochs,
save_dir=args.save_dir,
pretrained_weights=args.pretrained_weights,
resume_weights=args.resume_weights,
learning_rate=args.learning_rate,
use_vdl=args.use_vdl)
if __name__ == '__main__':
args = parse_args()
train(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .transforms import *
from . import functional
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import numpy as np
from PIL import Image, ImageEnhance
def normalize(im, mean, std):
im = im.astype(np.float32, copy=False) / 255.0
im -= mean
im /= std
return im
def permute(im):
im = np.transpose(im, (2, 0, 1))
return im
def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
if isinstance(target_size, list) or isinstance(target_size, tuple):
w = target_size[0]
h = target_size[1]
else:
w = target_size
h = target_size
im = cv2.resize(im, (w, h), interpolation=interp)
return im
def resize_long(im, long_size=224, interpolation=cv2.INTER_LINEAR):
value = max(im.shape[0], im.shape[1])
scale = float(long_size) / float(value)
resized_width = int(round(im.shape[1] * scale))
resized_height = int(round(im.shape[0] * scale))
im = cv2.resize(
im, (resized_width, resized_height), interpolation=interpolation)
return im
def horizontal_flip(im):
if len(im.shape) == 3:
im = im[:, ::-1, :]
elif len(im.shape) == 2:
im = im[:, ::-1]
return im
def vertical_flip(im):
if len(im.shape) == 3:
im = im[::-1, :, :]
elif len(im.shape) == 2:
im = im[::-1, :]
return im
def brightness(im, brightness_lower, brightness_upper):
brightness_delta = np.random.uniform(brightness_lower, brightness_upper)
im = ImageEnhance.Brightness(im).enhance(brightness_delta)
return im
def contrast(im, contrast_lower, contrast_upper):
contrast_delta = np.random.uniform(contrast_lower, contrast_upper)
im = ImageEnhance.Contrast(im).enhance(contrast_delta)
return im
def saturation(im, saturation_lower, saturation_upper):
saturation_delta = np.random.uniform(saturation_lower, saturation_upper)
im = ImageEnhance.Color(im).enhance(saturation_delta)
return im
def hue(im, hue_lower, hue_upper):
hue_delta = np.random.uniform(hue_lower, hue_upper)
im = np.array(im.convert('HSV'))
im[:, :, 0] = im[:, :, 0] + hue_delta
im = Image.fromarray(im, mode='HSV').convert('RGB')
return im
def rotate(im, rotate_lower, rotate_upper):
rotate_delta = np.random.uniform(rotate_lower, rotate_upper)
im = im.rotate(int(rotate_delta))
return im
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .functional import *
import random
import numpy as np
from PIL import Image
import cv2
from collections import OrderedDict
class Compose:
"""根据数据预处理/增强算子对输入数据进行操作。
所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
Args:
transforms (list): 数据预处理/增强算子。
to_rgb (bool): 是否转化为rgb通道格式
Raises:
TypeError: transforms不是list对象
ValueError: transforms元素个数小于1。
"""
def __init__(self, transforms, to_rgb=False):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
if len(transforms) < 1:
            raise ValueError(
                'The length of transforms must be at least 1!')
self.transforms = transforms
self.to_rgb = to_rgb
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (str/np.ndarray): 图像路径/图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息,dict中的字段如下:
- shape_before_resize (tuple): 图像resize之前的大小(h, w)。
- shape_before_padding (tuple): 图像padding之前的大小(h, w)。
label (str/np.ndarray): 标注图像路径/标注图像np.ndarray数据。
Returns:
tuple: 根据网络所需字段所组成的tuple;字段由transforms中的最后一个数据预处理操作决定。
"""
if im_info is None:
im_info = dict()
        if isinstance(im, str):
            im_path = im
            im = cv2.imread(im)
            if im is None:
                raise ValueError(
                    "Can't read the image file {}!".format(im_path))
            im = im.astype('float32')
        if isinstance(label, str):
            label = np.asarray(Image.open(label))
if self.to_rgb:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
for op in self.transforms:
outputs = op(im, im_info, label)
im = outputs[0]
if len(outputs) >= 2:
im_info = outputs[1]
if len(outputs) == 3:
label = outputs[2]
return outputs
class RandomHorizontalFlip:
"""以一定的概率对图像进行水平翻转。当存在标注图像时,则同步进行翻转。
Args:
prob (float): 随机水平翻转的概率。默认值为0.5。
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if random.random() < self.prob:
im = horizontal_flip(im)
if label is not None:
label = horizontal_flip(label)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomVerticalFlip:
"""以一定的概率对图像进行垂直翻转。当存在标注图像时,则同步进行翻转。
Args:
prob (float): 随机垂直翻转的概率。默认值为0.1。
"""
def __init__(self, prob=0.1):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if random.random() < self.prob:
im = vertical_flip(im)
if label is not None:
label = vertical_flip(label)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class Resize:
"""调整图像大小(resize)。
- 当目标大小(target_size)类型为int时,根据插值方式,
将图像resize为[target_size, target_size]。
- 当目标大小(target_size)类型为list或tuple时,根据插值方式,
将图像resize为target_size。
注意:当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize。
Args:
target_size (int/list/tuple): 短边目标长度。默认为608。
interp (str): resize的插值方式,与opencv的插值方式对应,取值范围为
['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"LINEAR"。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 插值方式不在['NEAREST', 'LINEAR', 'CUBIC',
'AREA', 'LANCZOS4', 'RANDOM']中。
"""
# The interpolation mode
interp_dict = {
'NEAREST': cv2.INTER_NEAREST,
'LINEAR': cv2.INTER_LINEAR,
'CUBIC': cv2.INTER_CUBIC,
'AREA': cv2.INTER_AREA,
'LANCZOS4': cv2.INTER_LANCZOS4
}
def __init__(self, target_size=512, interp='LINEAR'):
self.interp = interp
if not (interp == "RANDOM" or interp in self.interp_dict):
raise ValueError("interp should be one of {}".format(
self.interp_dict.keys()))
if isinstance(target_size, list) or isinstance(target_size, tuple):
if len(target_size) != 2:
raise TypeError(
                    'when target_size is a list or tuple, it should contain 2 elements, but got {}'
.format(target_size))
elif not isinstance(target_size, int):
raise TypeError(
"Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(target_size)))
self.target_size = target_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info跟新字段为:
-shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
if im_info is None:
im_info = OrderedDict()
im_info['shape_before_resize'] = im.shape[:2]
if not isinstance(im, np.ndarray):
raise TypeError("Resize: image type is not numpy.")
if len(im.shape) != 3:
raise ValueError('Resize: image is not 3-dimensional.')
if self.interp == "RANDOM":
interp = random.choice(list(self.interp_dict.keys()))
else:
interp = self.interp
im = resize(im, self.target_size, self.interp_dict[interp])
if label is not None:
label = resize(label, self.target_size, cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeByLong:
"""对图像长边resize到固定值,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
Args:
long_size (int): resize后图像的长边大小。
"""
def __init__(self, long_size):
self.long_size = long_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info新增字段为:
-shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
"""
if im_info is None:
im_info = OrderedDict()
im_info['shape_before_resize'] = im.shape[:2]
im = resize_long(im, self.long_size)
if label is not None:
label = resize_long(label, self.long_size, cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeRangeScaling:
"""对图像长边随机resize到指定范围内,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
Args:
min_value (int): 图像长边resize后的最小值。默认值400。
max_value (int): 图像长边resize后的最大值。默认值600。
Raises:
ValueError: min_value大于max_value
"""
def __init__(self, min_value=400, max_value=600):
if min_value > max_value:
raise ValueError('min_value must be less than max_value, '
'but they are {} and {}.'.format(
min_value, max_value))
self.min_value = min_value
self.max_value = max_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_value == self.max_value:
random_size = self.max_value
else:
random_size = int(
np.random.uniform(self.min_value, self.max_value) + 0.5)
im = resize_long(im, random_size, cv2.INTER_LINEAR)
if label is not None:
label = resize_long(label, random_size, cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeStepScaling:
"""对图像按照某一个比例resize,这个比例以scale_step_size为步长
在[min_scale_factor, max_scale_factor]随机变动。当存在标注图像时,则同步进行处理。
Args:
min_scale_factor(float), resize最小尺度。默认值0.75。
max_scale_factor (float), resize最大尺度。默认值1.25。
scale_step_size (float), resize尺度范围间隔。默认值0.25。
Raises:
ValueError: min_scale_factor大于max_scale_factor
"""
def __init__(self,
min_scale_factor=0.75,
max_scale_factor=1.25,
scale_step_size=0.25):
if min_scale_factor > max_scale_factor:
raise ValueError(
'min_scale_factor must be less than max_scale_factor, '
'but they are {} and {}.'.format(min_scale_factor,
max_scale_factor))
self.min_scale_factor = min_scale_factor
self.max_scale_factor = max_scale_factor
self.scale_step_size = scale_step_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_scale_factor == self.max_scale_factor:
scale_factor = self.min_scale_factor
elif self.scale_step_size == 0:
scale_factor = np.random.uniform(self.min_scale_factor,
self.max_scale_factor)
else:
num_steps = int((self.max_scale_factor - self.min_scale_factor) /
self.scale_step_size + 1)
scale_factors = np.linspace(self.min_scale_factor,
self.max_scale_factor,
num_steps).tolist()
np.random.shuffle(scale_factors)
scale_factor = scale_factors[0]
w = int(round(scale_factor * im.shape[1]))
h = int(round(scale_factor * im.shape[0]))
im = resize(im, (w, h), cv2.INTER_LINEAR)
if label is not None:
label = resize(label, (w, h), cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class Normalize:
"""对图像进行标准化。
1.尺度缩放到 [0,1]。
2.对图像进行减均值除以标准差操作。
Args:
mean (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。
std (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。
Raises:
ValueError: mean或std不是list对象。std包含0。
"""
def __init__(self, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]):
self.mean = mean
self.std = std
if not (isinstance(self.mean, list) and isinstance(self.std, list)):
raise ValueError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
im = normalize(im, mean, std)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class Padding:
"""对图像或标注图像进行padding,padding方向为右和下。
根据提供的值对图像或标注图像进行padding操作。
Args:
target_size (int|list|tuple): padding后图像的大小。
im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
label_padding_value (int): 标注图像padding的值。默认值为255。
Raises:
TypeError: target_size不是int|list|tuple。
ValueError: target_size为list|tuple时元素个数不等于2。
"""
def __init__(self,
target_size,
im_padding_value=[127.5, 127.5, 127.5],
label_padding_value=255):
if isinstance(target_size, list) or isinstance(target_size, tuple):
if len(target_size) != 2:
raise ValueError(
                    'when target_size is a list or tuple, it should contain 2 elements, but got {}'
.format(target_size))
elif not isinstance(target_size, int):
raise TypeError(
"Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(target_size)))
self.target_size = target_size
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info新增字段为:
-shape_before_padding (tuple): 保存padding之前图像的形状(h, w)。
Raises:
ValueError: 输入图像im或label的形状大于目标值
"""
if im_info is None:
im_info = OrderedDict()
im_info['shape_before_padding'] = im.shape[:2]
im_height, im_width = im.shape[0], im.shape[1]
if isinstance(self.target_size, int):
target_height = self.target_size
target_width = self.target_size
else:
target_height = self.target_size[1]
target_width = self.target_size[0]
pad_height = target_height - im_height
pad_width = target_width - im_width
if pad_height < 0 or pad_width < 0:
raise ValueError(
                'the image size ({}, {}) must not be larger than target_size ({}, {})'
.format(im_width, im_height, target_width, target_height))
else:
im = cv2.copyMakeBorder(
im,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.im_padding_value)
if label is not None:
label = cv2.copyMakeBorder(
label,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.label_padding_value)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomPaddingCrop:
"""对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作。
Args:
crop_size (int|list|tuple): 裁剪图像大小。默认为512。
im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
label_padding_value (int): 标注图像padding的值。默认值为255。
Raises:
TypeError: crop_size不是int/list/tuple。
ValueError: target_size为list/tuple时元素个数不等于2。
"""
def __init__(self,
crop_size=512,
im_padding_value=[127.5, 127.5, 127.5],
label_padding_value=255):
if isinstance(crop_size, list) or isinstance(crop_size, tuple):
if len(crop_size) != 2:
raise ValueError(
                    'when crop_size is a list or tuple, it should contain 2 elements, but got {}'
.format(crop_size))
elif not isinstance(crop_size, int):
raise TypeError(
"Type of crop_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(crop_size)))
self.crop_size = crop_size
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if isinstance(self.crop_size, int):
crop_width = self.crop_size
crop_height = self.crop_size
else:
crop_width = self.crop_size[0]
crop_height = self.crop_size[1]
img_height = im.shape[0]
img_width = im.shape[1]
if img_height == crop_height and img_width == crop_width:
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
else:
pad_height = max(crop_height - img_height, 0)
pad_width = max(crop_width - img_width, 0)
if (pad_height > 0 or pad_width > 0):
im = cv2.copyMakeBorder(
im,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.im_padding_value)
if label is not None:
label = cv2.copyMakeBorder(
label,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.label_padding_value)
img_height = im.shape[0]
img_width = im.shape[1]
if crop_height > 0 and crop_width > 0:
h_off = np.random.randint(img_height - crop_height + 1)
w_off = np.random.randint(img_width - crop_width + 1)
im = im[h_off:(crop_height + h_off), w_off:(
w_off + crop_width), :]
if label is not None:
label = label[h_off:(crop_height + h_off), w_off:(
w_off + crop_width)]
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomBlur:
"""以一定的概率对图像进行高斯模糊。
Args:
prob (float): 图像模糊概率。默认为0.1。
"""
def __init__(self, prob=0.1):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.prob <= 0:
n = 0
elif self.prob >= 1:
n = 1
else:
n = int(1.0 / self.prob)
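        # The blur fires when randint(0, n) == 0, i.e. with probability 1/n,
        # which approximates self.prob; the kernel size drawn below is forced
        # into the odd set {3, 5, 7, 9}, as GaussianBlur requires.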
if n > 0:
if np.random.randint(0, n) == 0:
radius = np.random.randint(3, 10)
if radius % 2 != 1:
radius = radius + 1
if radius > 9:
radius = 9
im = cv2.GaussianBlur(im, (radius, radius), 0, 0)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomRotation:
"""对图像进行随机旋转。
在不超过最大旋转角度的情况下,图像进行随机旋转,当存在标注图像时,同步进行,
并对旋转后的图像和标注图像进行相应的padding。
Args:
max_rotation (float): 最大旋转角度。默认为15度。
im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
label_padding_value (int): 标注图像padding的值。默认为255。
"""
def __init__(self,
max_rotation=15,
im_padding_value=[127.5, 127.5, 127.5],
label_padding_value=255):
self.max_rotation = max_rotation
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.max_rotation > 0:
(h, w) = im.shape[:2]
do_rotation = np.random.uniform(-self.max_rotation,
self.max_rotation)
pc = (w // 2, h // 2)
r = cv2.getRotationMatrix2D(pc, do_rotation, 1.0)
cos = np.abs(r[0, 0])
sin = np.abs(r[0, 1])
nw = int((h * sin) + (w * cos))
nh = int((h * cos) + (w * sin))
(cx, cy) = pc
r[0, 2] += (nw / 2) - cx
r[1, 2] += (nh / 2) - cy
dsize = (nw, nh)
im = cv2.warpAffine(
im,
r,
dsize=dsize,
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_CONSTANT,
borderValue=self.im_padding_value)
            if label is not None:
                label = cv2.warpAffine(
                    label,
                    r,
                    dsize=dsize,
                    flags=cv2.INTER_NEAREST,
                    borderMode=cv2.BORDER_CONSTANT,
                    borderValue=self.label_padding_value)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomScaleAspect:
"""裁剪并resize回原始尺寸的图像和标注图像。
按照一定的面积比和宽高比对图像进行裁剪,并reszie回原始图像的图像,当存在标注图时,同步进行。
Args:
min_scale (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。
aspect_ratio (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。
"""
def __init__(self, min_scale=0.5, aspect_ratio=0.33):
self.min_scale = min_scale
self.aspect_ratio = aspect_ratio
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_scale != 0 and self.aspect_ratio != 0:
img_height = im.shape[0]
img_width = im.shape[1]
for i in range(0, 10):
area = img_height * img_width
target_area = area * np.random.uniform(self.min_scale, 1.0)
aspectRatio = np.random.uniform(self.aspect_ratio,
1.0 / self.aspect_ratio)
dw = int(np.sqrt(target_area * 1.0 * aspectRatio))
dh = int(np.sqrt(target_area * 1.0 / aspectRatio))
if (np.random.randint(10) < 5):
tmp = dw
dw = dh
dh = tmp
                if (dh < img_height and dw < img_width):
                    h1 = np.random.randint(0, img_height - dh)
                    w1 = np.random.randint(0, img_width - dw)
                    im = im[h1:(h1 + dh), w1:(w1 + dw), :]
                    im = cv2.resize(
                        im, (img_width, img_height),
                        interpolation=cv2.INTER_LINEAR)
                    if label is not None:
                        label = label[h1:(h1 + dh), w1:(w1 + dw)]
                        label = cv2.resize(
                            label, (img_width, img_height),
                            interpolation=cv2.INTER_NEAREST)
                    break
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomDistort:
"""对图像进行随机失真。
1. 对变换的操作顺序进行随机化操作。
2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。
Args:
brightness_range (float): 明亮度因子的范围。默认为0.5。
brightness_prob (float): 随机调整明亮度的概率。默认为0.5。
contrast_range (float): 对比度因子的范围。默认为0.5。
contrast_prob (float): 随机调整对比度的概率。默认为0.5。
saturation_range (float): 饱和度因子的范围。默认为0.5。
saturation_prob (float): 随机调整饱和度的概率。默认为0.5。
hue_range (int): 色调因子的范围。默认为18。
hue_prob (float): 随机调整色调的概率。默认为0.5。
"""
def __init__(self,
brightness_range=0.5,
brightness_prob=0.5,
contrast_range=0.5,
contrast_prob=0.5,
saturation_range=0.5,
saturation_prob=0.5,
hue_range=18,
hue_prob=0.5):
self.brightness_range = brightness_range
self.brightness_prob = brightness_prob
self.contrast_range = contrast_range
self.contrast_prob = contrast_prob
self.saturation_range = saturation_range
self.saturation_prob = saturation_prob
self.hue_range = hue_range
self.hue_prob = hue_prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
brightness_lower = 1 - self.brightness_range
brightness_upper = 1 + self.brightness_range
contrast_lower = 1 - self.contrast_range
contrast_upper = 1 + self.contrast_range
saturation_lower = 1 - self.saturation_range
saturation_upper = 1 + self.saturation_range
hue_lower = -self.hue_range
hue_upper = self.hue_range
ops = [brightness, contrast, saturation, hue]
random.shuffle(ops)
params_dict = {
'brightness': {
'brightness_lower': brightness_lower,
'brightness_upper': brightness_upper
},
'contrast': {
'contrast_lower': contrast_lower,
'contrast_upper': contrast_upper
},
'saturation': {
'saturation_lower': saturation_lower,
'saturation_upper': saturation_upper
},
'hue': {
'hue_lower': hue_lower,
'hue_upper': hue_upper
}
}
prob_dict = {
'brightness': self.brightness_prob,
'contrast': self.contrast_prob,
'saturation': self.saturation_prob,
'hue': self.hue_prob
}
im = im.astype('uint8')
im = Image.fromarray(im)
        for op in ops:
            params = params_dict[op.__name__]
            prob = prob_dict[op.__name__]
            params['im'] = im
            if np.random.uniform(0, 1) < prob:
                im = op(**params)
im = np.asarray(im).astype('float32')
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ArrangeSegmenter:
"""获取训练/验证/预测所需的信息。
Args:
mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
Raises:
ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内
"""
def __init__(self, mode):
if mode not in ['train', 'eval', 'test', 'quant']:
raise ValueError(
"mode should be defined as one of ['train', 'eval', 'test', 'quant']!"
)
self.mode = mode
def __call__(self, im, im_info, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当mode为'train'或'eval'时,返回的tuple为(im, label),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当mode为'test'时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;当mode为
'quant'时,返回的tuple为(im,),为图像np.ndarray数据。
"""
im = permute(im)
if self.mode == 'train' or self.mode == 'eval':
label = label[np.newaxis, :, :]
return (im, label)
elif self.mode == 'test':
return (im, im_info)
else:
return (im, )
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import logging
from . import humanseg_postprocess
from .metrics import ConfusionMatrix
from .utils import *
from .post_quantization import HumanSegPostTrainingQuantization
import numpy as np
import cv2
import os
def get_round(data):
    """Round to the nearest integer, with halves rounded away from zero."""
    offset = 0.5 if data >= 0 else -0.5
    return int(data + offset)
def human_seg_tracking(pre_gray, cur_gray, prev_cfd, dl_weights, disflow):
"""计算光流跟踪匹配点和光流图
输入参数:
pre_gray: 上一帧灰度图
cur_gray: 当前帧灰度图
prev_cfd: 上一帧光流图
dl_weights: 融合权重图
disflow: 光流数据结构
返回值:
is_track: 光流点跟踪二值图,即是否具有光流点匹配
track_cfd: 光流跟踪图
"""
check_thres = 8
h, w = pre_gray.shape[:2]
track_cfd = np.zeros_like(prev_cfd)
is_track = np.zeros_like(pre_gray)
flow_fw = disflow.calc(pre_gray, cur_gray, None)
flow_bw = disflow.calc(cur_gray, pre_gray, None)
for r in range(h):
for c in range(w):
fxy_fw = flow_fw[r, c]
dx_fw = get_round(fxy_fw[0])
cur_x = dx_fw + c
dy_fw = get_round(fxy_fw[1])
cur_y = dy_fw + r
if cur_x < 0 or cur_x >= w or cur_y < 0 or cur_y >= h:
continue
fxy_bw = flow_bw[cur_y, cur_x]
dx_bw = get_round(fxy_bw[0])
dy_bw = get_round(fxy_bw[1])
if ((dy_fw + dy_bw) * (dy_fw + dy_bw) +
(dx_fw + dx_bw) * (dx_fw + dx_bw)) >= check_thres:
continue
if abs(dy_fw) <= 0 and abs(dx_fw) <= 0 and abs(dy_bw) <= 0 and abs(
dx_bw) <= 0:
dl_weights[cur_y, cur_x] = 0.05
is_track[cur_y, cur_x] = 1
track_cfd[cur_y, cur_x] = prev_cfd[r, c]
return track_cfd, is_track, dl_weights
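# Note: the loop above is a forward-backward consistency check. A pixel
# (r, c) counts as tracked only when following the forward flow to
# (cur_y, cur_x) and then the backward flow leads approximately back, i.e.
# (dy_fw + dy_bw)^2 + (dx_fw + dx_bw)^2 < check_thres. Pixels with zero
# motion get a small fusion weight (0.05), so the confidence carried over
# from the previous frame dominates in static regions.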
def human_seg_track_fuse(track_cfd, dl_cfd, dl_weights, is_track):
"""光流追踪图和人像分割结构融合
输入参数:
track_cfd: 光流追踪图
dl_cfd: 当前帧分割结果
dl_weights: 融合权重图
is_track: 光流点匹配二值图
返回
cur_cfd: 光流跟踪图和人像分割结果融合图
"""
fusion_cfd = dl_cfd.copy()
idxs = np.where(is_track > 0)
for i in range(len(idxs[0])):
x, y = idxs[0][i], idxs[1][i]
dl_score = dl_cfd[x, y]
track_score = track_cfd[x, y]
fusion_cfd[x, y] = dl_weights[x, y] * dl_score + (
1 - dl_weights[x, y]) * track_score
if dl_score > 0.9 or dl_score < 0.1:
if dl_weights[x, y] < 0.1:
fusion_cfd[x, y] = 0.3 * dl_score + 0.7 * track_score
else:
fusion_cfd[x, y] = 0.4 * dl_score + 0.6 * track_score
else:
fusion_cfd[x, y] = dl_weights[x, y] * dl_score + (
1 - dl_weights[x, y]) * track_score
return fusion_cfd
def postprocess(cur_gray, scoremap, prev_gray, pre_cfd, disflow, is_init):
"""光流优化
Args:
cur_gray : 当前帧灰度图
pre_gray : 前一帧灰度图
pre_cfd :前一帧融合结果
scoremap : 当前帧分割结果
difflow : 光流
is_init : 是否第一帧
Returns:
fusion_cfd : 光流追踪图和预测结果融合图
"""
height, width = scoremap.shape[0], scoremap.shape[1]
disflow = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
h, w = scoremap.shape
cur_cfd = scoremap.copy()
if is_init:
is_init = False
if h <= 64 or w <= 64:
disflow.setFinestScale(1)
elif h <= 160 or w <= 160:
disflow.setFinestScale(2)
else:
disflow.setFinestScale(3)
fusion_cfd = cur_cfd
else:
        weights = np.ones((h, w), np.float32) * 0.3
track_cfd, is_track, weights = human_seg_tracking(
prev_gray, cur_gray, pre_cfd, weights, disflow)
fusion_cfd = human_seg_track_fuse(track_cfd, cur_cfd, weights, is_track)
fusion_cfd = cv2.GaussianBlur(fusion_cfd, (3, 3), 0)
return fusion_cfd
def threshold_mask(img, thresh_bg, thresh_fg):
dst = (img / 255.0 - thresh_bg) / (thresh_fg - thresh_bg)
dst[np.where(dst > 1)] = 1
dst[np.where(dst < 0)] = 0
return dst.astype(np.float32)
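# threshold_mask linearly rescales an 8-bit score map: values at or below
# thresh_bg*255 map to 0, values at or above thresh_fg*255 map to 1, and
# values in between interpolate linearly. Worked example (illustrative):
# with thresh_bg=0.3 and thresh_fg=0.8, a pixel of 140 (~0.549 after /255)
# maps to (0.549 - 0.3) / (0.8 - 0.3) ≈ 0.498.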
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import sys
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}
log_level = 2
def log(level=2, message=""):
current_time = time.time()
time_array = time.localtime(current_time)
current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array)
if log_level >= level:
print("{} [{}]\t{}".format(current_time, levels[level],
message).encode("utf-8").decode("latin1"))
sys.stdout.flush()
def debug(message=""):
log(level=3, message=message)
def info(message=""):
log(level=2, message=message)
def warning(message=""):
log(level=1, message=message)
def error(message=""):
log(level=0, message=message)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
from scipy.sparse import csr_matrix
class ConfusionMatrix(object):
"""
Confusion Matrix for segmentation evaluation
"""
def __init__(self, num_classes=2, streaming=False):
self.confusion_matrix = np.zeros([num_classes, num_classes],
dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
def calculate(self, pred, label, ignore=None):
        # If not in streaming mode, clear the matrix every time calculate() is called
if not self.streaming:
self.zero_matrix()
label = np.transpose(label, (0, 2, 3, 1))
ignore = np.transpose(ignore, (0, 2, 3, 1))
mask = np.array(ignore) == 1
label = np.asarray(label)[mask]
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
        # Accumulate ([row=label, col=pred], 1) into the sparse matrix
spm = csr_matrix((one, (label, pred)),
shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
dtype='int64')
def mean_iou(self):
iou_list = []
avg_iou = 0
        # TODO: use numpy's axis-wise sum to simplify the loops below
vji = np.zeros(self.num_classes, dtype=int)
vij = np.zeros(self.num_classes, dtype=int)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
for c in range(self.num_classes):
total = vji[c] + vij[c] - self.confusion_matrix[c][c]
if total == 0:
iou = 0
else:
iou = float(self.confusion_matrix[c][c]) / total
avg_iou += iou
iou_list.append(iou)
avg_iou = float(avg_iou) / float(self.num_classes)
return np.array(iou_list), avg_iou
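    # For reference, the numpy simplification the TODO above alludes to
    # (a sketch, equivalent to the loops in mean_iou):
    #
    #   cm = self.confusion_matrix
    #   inter = np.diag(cm)
    #   union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    #   iou = inter / np.maximum(union, 1)  # guards classes that never occur
    #   miou = iou.mean()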
def accuracy(self):
total = self.confusion_matrix.sum()
total_right = 0
for c in range(self.num_classes):
total_right += self.confusion_matrix[c][c]
if total == 0:
avg_acc = 0
else:
avg_acc = float(total_right) / total
vij = np.zeros(self.num_classes, dtype=int)
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
acc_list = []
for c in range(self.num_classes):
if vij[c] == 0:
acc = 0
else:
acc = self.confusion_matrix[c][c] / float(vij[c])
acc_list.append(acc)
return np.array(acc_list), avg_acc
def kappa(self):
vji = np.zeros(self.num_classes)
vij = np.zeros(self.num_classes)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
total = self.confusion_matrix.sum()
# avoid spillovers
# TODO: is it reasonable to hard code 10000.0?
total = float(total) / 10000.0
vji = vji / 10000.0
vij = vij / 10000.0
tp = 0
tc = 0
for c in range(self.num_classes):
tp += vji[c] * vij[c]
tc += self.confusion_matrix[c][c]
tc = tc / 10000.0
pe = tp / (total * total)
po = tc / total
kappa = (po - pe) / (1 - pe)
return kappa
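# Usage sketch (illustrative, not part of the original module): one 2x2
# "image" with two classes. label/ignore are NCHW as calculate() expects;
# pred is passed in the NHWC layout that the ignore mask is applied to.
if __name__ == '__main__':
    cm = ConfusionMatrix(num_classes=2, streaming=True)
    label = np.array([[[[0, 1], [1, 0]]]])
    ignore = np.ones_like(label)  # 1 marks a valid pixel
    pred = np.array([[[[0, 1], [0, 0]]]]).transpose((0, 2, 3, 1))
    cm.calculate(pred, label, ignore)
    iou_list, miou = cm.mean_iou()
    acc_list, avg_acc = cm.accuracy()
    print("mIoU={:.4f} acc={:.4f} kappa={:.4f}".format(miou, avg_acc,
                                                       cm.kappa()))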
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## Created by: RainbowSecret
## Microsoft Research
## yuyua@microsoft.com
## Copyright (c) 2018
##
## This source code is licensed under the MIT-style license found in the
## LICENSE file in the root directory of this source tree
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import cv2
def get_palette(num_cls):
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
n = num_cls
palette = [0] * (n * 3)
for j in range(0, n):
lab = j
palette[j * 3 + 0] = 0
palette[j * 3 + 1] = 0
palette[j * 3 + 2] = 0
i = 0
while lab:
palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
i += 1
lab >>= 3
return palette
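# Usage sketch (illustrative): attach the palette to a 'P'-mode image with
# Pillow. Pillow and the synthetic label map are assumptions here -- neither
# is part of this module.
if __name__ == '__main__':
    from PIL import Image
    parsing = np.zeros((64, 64), dtype=np.uint8)  # hypothetical label map
    parsing[16:48, 16:48] = 3
    vis = Image.fromarray(parsing, mode='P')
    vis.putpalette(get_palette(20))  # flat [R, G, B, ...] list, 3 * num_cls long
    vis.save('parsing_vis.png')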
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.fluid.contrib.slim.quantization.quantization_pass import QuantizationTransformPass
from paddle.fluid.contrib.slim.quantization.quantization_pass import AddQuantDequantPass
from paddle.fluid.contrib.slim.quantization.quantization_pass import _op_real_in_out_name
from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
import paddle.fluid as fluid
import os
import utils.logging as logging
class HumanSegPostTrainingQuantization(PostTrainingQuantization):
def __init__(self,
executor,
dataset,
program,
inputs,
outputs,
batch_size=10,
batch_nums=None,
scope=None,
algo="KL",
quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
is_full_quantize=False,
is_use_cache_file=False,
cache_dir="./temp_post_training"):
        '''
        The class utilizes the post training quantization method to quantize
        the fp32 model. It uses calibration data to calculate the scale factors
        of quantized variables, and inserts fake quant/dequant ops to obtain
        the quantized model.
        Args:
            executor(fluid.Executor): The executor to load, run and save the
                quantized model.
            dataset(Python Iterator): The data reader.
            program(fluid.Program): The paddle program that holds the model
                parameters.
            inputs(dict): The inputs of the program.
            outputs(dict): The outputs of the program.
            batch_size(int, optional): The batch size of DataLoader. Default is 10.
            batch_nums(int, optional): If batch_nums is not None, the number of
                calibration samples is batch_size*batch_nums. If batch_nums is
                None, use all data provided by sample_generator as calibration
                data.
            scope(fluid.Scope, optional): The scope of the program, used to load
                and save variables. If scope=None, get the scope by
                global_scope().
            algo(str, optional): If algo='KL', use the KL-divergence method to
                get a more precise scale factor. If algo='direct', use the
                abs_max method to get the scale factor. Default is 'KL'.
            quantizable_op_type(list[str], optional): The types of ops that
                will be quantized. Default is ["conv2d", "depthwise_conv2d",
                "mul"].
            is_full_quantize(bool, optional): If set to True, apply
                quantization to all supported quantizable op types. If set to
                False, only apply quantization to the op types given in
                quantizable_op_type.
            is_use_cache_file(bool, optional): If set to False, all temp data
                is kept in memory. If set to True, temp data is saved to disk.
                When the fp32 model is complex or the amount of calibration
                data is large, is_use_cache_file should be set to True. Default
                is False.
            cache_dir(str, optional): When is_use_cache_file is True, set
                cache_dir as the directory for saving temp data. Default is
                ./temp_post_training.
        Returns:
            None
        '''
self._executor = executor
self._dataset = dataset
self._batch_size = batch_size
self._batch_nums = batch_nums
        self._scope = fluid.global_scope() if scope is None else scope
self._algo = algo
self._is_use_cache_file = is_use_cache_file
self._cache_dir = cache_dir
if self._is_use_cache_file and not os.path.exists(self._cache_dir):
os.mkdir(self._cache_dir)
supported_quantizable_op_type = \
QuantizationTransformPass._supported_quantizable_op_type + \
AddQuantDequantPass._supported_quantizable_op_type
if is_full_quantize:
self._quantizable_op_type = supported_quantizable_op_type
else:
self._quantizable_op_type = quantizable_op_type
for op_type in self._quantizable_op_type:
assert op_type in supported_quantizable_op_type + \
AddQuantDequantPass._activation_type, \
op_type + " is not supported for quantization."
self._place = self._executor.place
self._program = program
self._feed_list = list(inputs.values())
self._fetch_list = list(outputs.values())
self._data_loader = None
self._op_real_in_out_name = _op_real_in_out_name
self._bit_length = 8
self._quantized_weight_var_name = set()
self._quantized_act_var_name = set()
self._sampling_data = {}
self._quantized_var_scale_factor = {}
def quantize(self):
'''
Quantize the fp32 model. Use calibrate data to calculate the scale factor of
quantized variables, and inserts fake quant/dequant op to obtain the
quantized model.
Args:
None
Returns:
the program of quantized model.
'''
self._preprocess()
batch_id = 0
for data in self._data_loader():
self._executor.run(
program=self._program,
feed=data,
fetch_list=self._fetch_list,
return_numpy=False)
self._sample_data(batch_id)
if batch_id % 5 == 0:
logging.info("run batch: {}".format(batch_id))
batch_id += 1
if self._batch_nums and batch_id >= self._batch_nums:
break
logging.info("all run batch: ".format(batch_id))
logging.info("calculate scale factor ...")
self._calculate_scale_factor()
logging.info("update the program ...")
self._update_program()
self._save_output_scale()
return self._program
def save_quantized_model(self, save_model_path):
'''
Save the quantized model to the disk.
Args:
save_model_path(str): The path to save the quantized model
Returns:
None
'''
feed_vars_names = [var.name for var in self._feed_list]
fluid.io.save_inference_model(
dirname=save_model_path,
feeded_var_names=feed_vars_names,
target_vars=self._fetch_list,
executor=self._executor,
params_filename='__params__',
main_program=self._program)
def _preprocess(self):
'''
Load model and set data loader, collect the variable names for sampling,
and set activation variables to be persistable.
'''
feed_vars = [fluid.framework._get_var(var.name, self._program) \
for var in self._feed_list]
self._data_loader = fluid.io.DataLoader.from_generator(
feed_list=feed_vars, capacity=3 * self._batch_size, iterable=True)
self._data_loader.set_sample_list_generator(
self._dataset.generator(self._batch_size, drop_last=True),
places=self._place)
# collect the variable names for sampling
persistable_var_names = []
for var in self._program.list_vars():
if var.persistable:
persistable_var_names.append(var.name)
for op in self._program.global_block().ops:
op_type = op.type
if op_type in self._quantizable_op_type:
if op_type in ("conv2d", "depthwise_conv2d"):
self._quantized_act_var_name.add(op.input("Input")[0])
self._quantized_weight_var_name.add(op.input("Filter")[0])
self._quantized_act_var_name.add(op.output("Output")[0])
elif op_type == "mul":
if self._is_input_all_not_persistable(
op, persistable_var_names):
op._set_attr("skip_quant", True)
                        logging.warning(
                            "Skipped quantizing a mul op because none of its input variables is persistable"
                        )
else:
self._quantized_act_var_name.add(op.input("X")[0])
self._quantized_weight_var_name.add(op.input("Y")[0])
self._quantized_act_var_name.add(op.output("Out")[0])
else:
# process other quantizable op type, the input must all not persistable
if self._is_input_all_not_persistable(
op, persistable_var_names):
input_output_name_list = self._op_real_in_out_name[
op_type]
for input_name in input_output_name_list[0]:
for var_name in op.input(input_name):
self._quantized_act_var_name.add(var_name)
for output_name in input_output_name_list[1]:
for var_name in op.output(output_name):
self._quantized_act_var_name.add(var_name)
# set activation variables to be persistable, so can obtain
# the tensor data in sample_data
for var in self._program.list_vars():
if var.name in self._quantized_act_var_name:
var.persistable = True
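# Usage sketch (illustrative): every name below -- exe, test_prog, inputs,
# outputs, calib_dataset -- is a placeholder for objects created by the
# surrounding export code (an Executor, an inference Program, feed/fetch
# dicts, and a Dataset exposing generator(batch_size, drop_last)).
#
#     quantizer = HumanSegPostTrainingQuantization(
#         executor=exe,
#         dataset=calib_dataset,
#         program=test_prog,
#         inputs=inputs,
#         outputs=outputs,
#         batch_size=10,
#         batch_nums=10,  # 10 * 10 = 100 calibration samples
#         algo="KL")
#     quantizer.quantize()
#     quantizer.save_quantized_model('./quant_model')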
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import os
def get_arguments():
parser = argparse.ArgumentParser()
parser.add_argument("--use_gpu",
action="store_true",
help="Use gpu or cpu to test.")
parser.add_argument('--example',
type=str,
help='RoadLine, HumanSeg or ACE2P')
return parser.parse_args()
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
def merge_cfg_from_args(args, cfg):
    """Merge config keys and values from args into the global config."""
    for k, v in vars(args).items():
        try:
            value = eval(v)
        except Exception:
            value = v
        if value is not None:
            cfg[k] = value
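# Usage sketch (illustrative): AttrDict supports both attribute- and key-style
# access, and merge_cfg_from_args copies parsed CLI options into it.
if __name__ == '__main__':
    cfg = AttrDict()
    cfg.example = 'HumanSeg'              # attribute-style write ...
    assert cfg['example'] == cfg.example  # ... readable as a dict key too
    args = get_arguments()                # e.g. --example ACE2P --use_gpu
    merge_cfg_from_args(args, cfg)
    print(cfg.example, cfg.use_gpu)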
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import time
import os
import os.path as osp
import numpy as np
import six
import yaml
import math
import cv2
from . import logging
def seconds_to_hms(seconds):
h = math.floor(seconds / 3600)
m = math.floor((seconds - h * 3600) / 60)
s = int(seconds - h * 3600 - m * 60)
hms_str = "{}:{}:{}".format(h, m, s)
return hms_str
def setting_environ_flags():
if 'FLAGS_eager_delete_tensor_gb' not in os.environ:
os.environ['FLAGS_eager_delete_tensor_gb'] = '0.0'
if 'FLAGS_allocator_strategy' not in os.environ:
os.environ['FLAGS_allocator_strategy'] = 'auto_growth'
if "CUDA_VISIBLE_DEVICES" in os.environ:
if os.environ["CUDA_VISIBLE_DEVICES"].count("-1") > 0:
os.environ["CUDA_VISIBLE_DEVICES"] = ""
def get_environ_info():
setting_environ_flags()
import paddle.fluid as fluid
info = dict()
info['place'] = 'cpu'
info['num'] = int(os.environ.get('CPU_NUM', 1))
if os.environ.get('CUDA_VISIBLE_DEVICES', None) != "":
if hasattr(fluid.core, 'get_cuda_device_count'):
gpu_num = 0
            try:
                gpu_num = fluid.core.get_cuda_device_count()
            except Exception:
                os.environ['CUDA_VISIBLE_DEVICES'] = ''
if gpu_num > 0:
info['place'] = 'cuda'
info['num'] = fluid.core.get_cuda_device_count()
return info
def parse_param_file(param_file, return_shape=True):
from paddle.fluid.proto.framework_pb2 import VarType
f = open(param_file, 'rb')
version = np.fromstring(f.read(4), dtype='int32')
lod_level = np.fromstring(f.read(8), dtype='int64')
for i in range(int(lod_level)):
_size = np.fromstring(f.read(8), dtype='int64')
_ = f.read(_size)
version = np.fromstring(f.read(4), dtype='int32')
tensor_desc = VarType.TensorDesc()
tensor_desc_size = np.fromstring(f.read(4), dtype='int32')
tensor_desc.ParseFromString(f.read(int(tensor_desc_size)))
tensor_shape = tuple(tensor_desc.dims)
if return_shape:
f.close()
return tuple(tensor_desc.dims)
if tensor_desc.data_type != 5:
raise Exception(
"Unexpected data type while parse {}".format(param_file))
data_size = 4
for i in range(len(tensor_shape)):
data_size *= tensor_shape[i]
weight = np.fromstring(f.read(data_size), dtype='float32')
f.close()
return np.reshape(weight, tensor_shape)
def fuse_bn_weights(exe, main_prog, weights_dir):
import paddle.fluid as fluid
logging.info("Try to fuse weights of batch_norm...")
bn_vars = list()
for block in main_prog.blocks:
ops = list(block.ops)
for op in ops:
if op.type == 'affine_channel':
scale_name = op.input('Scale')[0]
bias_name = op.input('Bias')[0]
prefix = scale_name[:-5]
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
if not osp.exists(osp.join(
weights_dir, mean_name)) or not osp.exists(
osp.join(weights_dir, variance_name)):
                    logging.info(
                        "No batch_norm weights found to fuse; skipping fuse_bn."
                    )
return
bias = block.var(bias_name)
pretrained_shape = parse_param_file(
osp.join(weights_dir, bias_name))
actual_shape = tuple(bias.shape)
if pretrained_shape != actual_shape:
continue
bn_vars.append(
[scale_name, bias_name, mean_name, variance_name])
eps = 1e-5
for names in bn_vars:
scale_name, bias_name, mean_name, variance_name = names
scale = parse_param_file(
osp.join(weights_dir, scale_name), return_shape=False)
bias = parse_param_file(
osp.join(weights_dir, bias_name), return_shape=False)
mean = parse_param_file(
osp.join(weights_dir, mean_name), return_shape=False)
variance = parse_param_file(
osp.join(weights_dir, variance_name), return_shape=False)
bn_std = np.sqrt(np.add(variance, eps))
new_scale = np.float32(np.divide(scale, bn_std))
new_bias = bias - mean * new_scale
scale_tensor = fluid.global_scope().find_var(scale_name).get_tensor()
bias_tensor = fluid.global_scope().find_var(bias_name).get_tensor()
scale_tensor.set(new_scale, exe.place)
bias_tensor.set(new_bias, exe.place)
if len(bn_vars) == 0:
        logging.info(
            "No batch_norm weights found to fuse; skipping fuse_bn.")
else:
logging.info("There's {} batch_norm ops been fused.".format(
len(bn_vars)))
def load_pdparams(exe, main_prog, model_dir):
import paddle.fluid as fluid
from paddle.fluid.proto.framework_pb2 import VarType
from paddle.fluid.framework import Program
vars_to_load = list()
import pickle
with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
params_dict = pickle.load(f) if six.PY2 else pickle.load(
f, encoding='latin1')
unused_vars = list()
for var in main_prog.list_vars():
if not isinstance(var, fluid.framework.Parameter):
continue
if var.name not in params_dict:
raise Exception("{} is not in saved model".format(var.name))
if var.shape != params_dict[var.name].shape:
unused_vars.append(var.name)
logging.warning(
"[SKIP] Shape of pretrained weight {} doesn't match.(Pretrained: {}, Actual: {})"
.format(var.name, params_dict[var.name].shape, var.shape))
continue
vars_to_load.append(var)
logging.debug("Weight {} will be load".format(var.name))
for var_name in unused_vars:
del params_dict[var_name]
fluid.io.set_program_state(main_prog, params_dict)
if len(vars_to_load) == 0:
        logging.warning(
            "No pretrained weights were loaded; please check your pretrained model!"
        )
else:
logging.info("There are {} varaibles in {} are loaded.".format(
len(vars_to_load), model_dir))
def load_pretrained_weights(exe, main_prog, weights_dir, fuse_bn=False):
if not osp.exists(weights_dir):
raise Exception("Path {} not exists.".format(weights_dir))
if osp.exists(osp.join(weights_dir, "model.pdparams")):
return load_pdparams(exe, main_prog, weights_dir)
import paddle.fluid as fluid
vars_to_load = list()
for var in main_prog.list_vars():
if not isinstance(var, fluid.framework.Parameter):
continue
if not osp.exists(osp.join(weights_dir, var.name)):
logging.debug("[SKIP] Pretrained weight {}/{} doesn't exist".format(
weights_dir, var.name))
continue
pretrained_shape = parse_param_file(osp.join(weights_dir, var.name))
actual_shape = tuple(var.shape)
if pretrained_shape != actual_shape:
logging.warning(
"[SKIP] Shape of pretrained weight {}/{} doesn't match.(Pretrained: {}, Actual: {})"
.format(weights_dir, var.name, pretrained_shape, actual_shape))
continue
vars_to_load.append(var)
logging.debug("Weight {} will be load".format(var.name))
fluid.io.load_vars(
executor=exe,
dirname=weights_dir,
main_program=main_prog,
vars=vars_to_load)
if len(vars_to_load) == 0:
        logging.warning(
            "No pretrained weights were loaded; please check your pretrained model!"
        )
else:
logging.info("There are {} varaibles in {} are loaded.".format(
len(vars_to_load), weights_dir))
if fuse_bn:
fuse_bn_weights(exe, main_prog, weights_dir)
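# Usage sketch (illustrative): build a trivial program, initialize it, then
# load matching weights from a directory ('./pretrained' is a placeholder path).
if __name__ == '__main__':
    import paddle.fluid as fluid
    exe = fluid.Executor(fluid.CPUPlace())
    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        x = fluid.data(name='x', shape=[None, 3, 192, 192], dtype='float32')
        y = fluid.layers.conv2d(x, num_filters=8, filter_size=3)
    exe.run(startup_prog)
    load_pretrained_weights(exe, main_prog, './pretrained', fuse_bn=False)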
def visualize(image, result, save_dir=None, weight=0.6):
"""
Convert segment result to color image, and save added image.
Args:
image: the path of origin image
result: the predict result of image
save_dir: the directory for saving visual image
weight: the image weight of visual image, and the result weight is (1 - weight)
"""
label_map = result['label_map']
color_map = get_color_map_list(256)
color_map = np.array(color_map).astype("uint8")
# Use OpenCV LUT for color mapping
c1 = cv2.LUT(label_map, color_map[:, 0])
c2 = cv2.LUT(label_map, color_map[:, 1])
c3 = cv2.LUT(label_map, color_map[:, 2])
pseudo_img = np.dstack((c1, c2, c3))
im = cv2.imread(image)
vis_result = cv2.addWeighted(im, weight, pseudo_img, 1 - weight, 0)
if save_dir is not None:
if not os.path.exists(save_dir):
os.makedirs(save_dir)
image_name = os.path.split(image)[-1]
out_path = os.path.join(save_dir, image_name)
cv2.imwrite(out_path, vis_result)
else:
return vis_result
def get_color_map_list(num_classes):
""" Returns the color map for visualizing the segmentation mask,
which can support arbitrary number of classes.
Args:
num_classes: Number of classes
Returns:
The color map
"""
num_classes += 1
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
color_map = color_map[1:]
return color_map
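# Usage sketch (illustrative): 'test.jpg' is a placeholder input image and the
# label map is synthetic; with save_dir given, the blended image is written to
# ./vis instead of being returned.
if __name__ == '__main__':
    label_map = np.zeros((240, 320), dtype=np.uint8)  # hypothetical prediction
    label_map[60:180, 80:240] = 1                     # one foreground region
    visualize('test.jpg', {'label_map': label_map}, save_dir='./vis')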
import argparse
from datasets.dataset import Dataset
import transforms
import models
def parse_args():
parser = argparse.ArgumentParser(description='HumanSeg training')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for evaluating',
type=str,
default='output/best_model')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='The root directory of dataset',
type=str)
parser.add_argument(
'--val_list',
dest='val_list',
help='Val list file of dataset',
type=str,
default=None)
parser.add_argument(
'--batch_size',
dest='batch_size',
help='Mini batch size',
type=int,
default=128)
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def evaluate(args):
eval_transforms = transforms.Compose(
[transforms.Resize(args.image_shape),
transforms.Normalize()])
eval_dataset = Dataset(
data_dir=args.data_dir,
file_list=args.val_list,
transforms=eval_transforms,
num_workers='auto',
buffer_size=100,
parallel_method='thread',
shuffle=False)
model = models.load_model(args.model_dir)
model.evaluate(eval_dataset, args.batch_size)
if __name__ == '__main__':
args = parse_args()
evaluate(args)
import argparse
import os
import os.path as osp
import cv2
import numpy as np
from utils.humanseg_postprocess import postprocess, threshold_mask
import models
import transforms
def parse_args():
parser = argparse.ArgumentParser(description='HumanSeg inference for video')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for inference',
type=str)
parser.add_argument(
'--video_path',
dest='video_path',
        help=
        'Video path for inference; the camera will be used if the path does not exist',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the inference results',
type=str,
default='./output')
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def predict(img, model, test_transforms):
model.arrange_transform(transforms=test_transforms, mode='test')
img, im_info = test_transforms(img)
img = np.expand_dims(img, axis=0)
result = model.exe.run(
model.test_prog,
feed={'image': img},
fetch_list=list(model.test_outputs.values()))
score_map = result[1]
score_map = np.squeeze(score_map, axis=0)
score_map = np.transpose(score_map, (1, 2, 0))
return score_map, im_info
def recover(img, im_info):
keys = list(im_info.keys())
for k in keys[::-1]:
if k == 'shape_before_resize':
h, w = im_info[k][0], im_info[k][1]
            img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)
elif k == 'shape_before_padding':
h, w = im_info[k][0], im_info[k][1]
img = img[0:h, 0:w]
return img
def video_infer(args):
resize_h = args.image_shape[1]
resize_w = args.image_shape[0]
test_transforms = transforms.Compose(
[transforms.Resize((resize_w, resize_h)),
transforms.Normalize()])
model = models.load_model(args.model_dir)
if not args.video_path:
cap = cv2.VideoCapture(0)
else:
cap = cv2.VideoCapture(args.video_path)
    if not cap.isOpened():
        raise IOError("Error opening video stream or file: check whether "
                      "--video_path exists ({}) or whether the camera "
                      "is working".format(args.video_path))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
disflow = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
prev_gray = np.zeros((resize_h, resize_w), np.uint8)
prev_cfd = np.zeros((resize_h, resize_w), np.float32)
is_init = True
fps = cap.get(cv2.CAP_PROP_FPS)
if args.video_path:
        # Writer for saving the prediction result as a video
if not osp.exists(args.save_dir):
os.makedirs(args.save_dir)
out = cv2.VideoWriter(
osp.join(args.save_dir, 'result.avi'),
cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, (width, height))
        # Start fetching video frames
while cap.isOpened():
ret, frame = cap.read()
if ret:
score_map, im_info = predict(frame, model, test_transforms)
cur_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
scoremap = 255 * score_map[:, :, 1]
optflow_map = postprocess(cur_gray, scoremap, prev_gray, prev_cfd, \
disflow, is_init)
prev_gray = cur_gray.copy()
prev_cfd = optflow_map.copy()
is_init = False
optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
optflow_map = threshold_mask(
optflow_map, thresh_bg=0.2, thresh_fg=0.8)
img_mat = np.repeat(optflow_map[:, :, np.newaxis], 3, axis=2)
img_mat = recover(img_mat, im_info)
bg_im = np.ones_like(img_mat) * 255
comb = (img_mat * frame + (1 - img_mat) * bg_im).astype(
np.uint8)
out.write(comb)
else:
break
cap.release()
out.release()
else:
while cap.isOpened():
ret, frame = cap.read()
if ret:
score_map, im_info = predict(frame, model, test_transforms)
cur_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
scoremap = 255 * score_map[:, :, 1]
optflow_map = postprocess(cur_gray, scoremap, prev_gray, prev_cfd, \
disflow, is_init)
prev_gray = cur_gray.copy()
prev_cfd = optflow_map.copy()
is_init = False
# optflow_map = optflow_map/255.0
optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
optflow_map = threshold_mask(
optflow_map, thresh_bg=0.2, thresh_fg=0.8)
img_mat = np.repeat(optflow_map[:, :, np.newaxis], 3, axis=2)
img_mat = recover(img_mat, im_info)
bg_im = np.ones_like(img_mat) * 255
comb = (img_mat * frame + (1 - img_mat) * bg_im).astype(
np.uint8)
cv2.imshow('HumanSegmentation', comb)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
break
cap.release()
if __name__ == "__main__":
args = parse_args()
video_infer(args)
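# A NumPy sketch of what the imported threshold_mask plausibly does, judging
# only from how it is called above (a soft mask in [0, 255] goes in, a
# confidence map in [0, 1] comes out); the authoritative implementation lives
# in utils.humanseg_postprocess.
def threshold_mask_sketch(img, thresh_bg, thresh_fg):
    # Linearly rescale [thresh_bg, thresh_fg] onto [0, 1] and clip the rest.
    dst = (img / 255.0 - thresh_bg) / (thresh_fg - thresh_bg)
    return np.clip(dst, 0, 1).astype(np.float32)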
pre-commit
yapf == 0.26.0
flake8
pyyaml >= 5.1
Pillow
numpy
six
opencv-python
tqdm
requests
sklearn
visualdl == 2.0.0-alpha.2
......
        help='debug mode, display detail information of training',
        action='store_true')
    parser.add_argument(
        '--use_vdl',
        dest='use_vdl',
        help='whether to record the data during training to VisualDL',
        action='store_true')
    parser.add_argument(
        '--vdl_log_dir',
        dest='vdl_log_dir',
        help='VisualDL logging directory',
        default=None,
        type=str)
    parser.add_argument(
......
    fetch_list.extend([pred.name, grts.name, masks.name])
    # cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
    if args.use_vdl:
        if not args.vdl_log_dir:
            print_info("Please specify the log directory by --vdl_log_dir.")
            exit(1)
        from visualdl import LogWriter
        log_writer = LogWriter(args.vdl_log_dir)
    # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
    # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
    step = 0
    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
        all_step += 1
......
                avg_acc += np.mean(out_acc)
                avg_fp += np.mean(out_fp)
                avg_fn += np.mean(out_fn)
                step += 1
                if step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
                    avg_loss /= args.log_steps
                    avg_seg_loss /= args.log_steps
                    avg_emb_loss /= args.log_steps
......
                    speed = args.log_steps / timer.elapsed_time()
                    print((
                        "epoch={} step={} lr={:.5f} loss={:.4f} seg_loss={:.4f} emb_loss={:.4f} accuracy={:.4} fp={:.4} fn={:.4} step/sec={:.3f} | ETA {}"
                    ).format(epoch, step, lr[0], avg_loss, avg_seg_loss,
                             avg_emb_loss, avg_acc, avg_fp, avg_fn, speed,
                             calculate_eta(all_step - step, speed)))
                    if args.use_vdl:
                        log_writer.add_scalar('Train/loss', avg_loss, step)
                        log_writer.add_scalar('Train/lr', lr[0], step)
                        log_writer.add_scalar('Train/speed', speed, step)
                    sys.stdout.flush()
                    avg_loss = 0.0
                    avg_seg_loss = 0.0
......
                        ckpt_dir=ckpt_dir,
                        use_gpu=args.use_gpu,
                        use_mpio=args.use_mpio)
                    if args.use_vdl:
                        log_writer.add_scalar('Evaluate/accuracy', accuracy,
                                              step)
                        log_writer.add_scalar('Evaluate/fp', fp, step)
                        log_writer.add_scalar('Evaluate/fn', fn, step)
                # Use VisualDL to visualize results
                if args.use_vdl and cfg.DATASET.VIS_FILE_LIST is not None:
                    visualize(
                        cfg=cfg,
                        use_gpu=args.use_gpu,
......
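# Stand-alone sketch of the VisualDL logging pattern used above; the log
# directory name is arbitrary and curves can be viewed with
# `visualdl --logdir ./vdl_log`.
from visualdl import LogWriter

log_writer = LogWriter('./vdl_log')
for step in range(1, 101):
    loss = 1.0 / step  # stand-in for the real training loss
    if step % 10 == 0:
        log_writer.add_scalar('Train/loss', loss, step)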
......
cfg.DATASET.TEST_FILE_LIST = './dataset/cityscapes/test.list'
# Number of test images
cfg.DATASET.TEST_TOTAL_IMAGES = 500
# Dataset visualized with VisualDL
cfg.DATASET.VIS_FILE_LIST = None
# Number of classes (background included)
cfg.DATASET.NUM_CLASSES = 19
......
# PaddleSeg Industrial Practice

Featured segmentation solutions built on the latest PaddleSeg:

- [Human segmentation](./HumanSeg)
- [Human parsing](./ACE2P)
- [Lane segmentation](./LaneNet)
- [Industrial meter segmentation](#industrial-meter-segmentation)
- [AI Studio online tutorials](#ai-studio-online-tutorials)

## Human Segmentation (HumanSeg)

The HumanSeg series has been fully upgraded. It provides three models for different scenarios, including `HumanSeg-lite` for real-time segmentation on mobile devices, plus optical-flow-based post-processing that makes human segmentation much smoother in video. See [HumanSeg](./HumanSeg) for details.

## Human Parsing

Human parsing is a fine-grained semantic segmentation task that aims to identify the components of a human image (e.g., body parts and clothing) at the pixel level. ACE2P (Augmented Context Embedding with Edge Perceiving) learns the human parsing task end to end by fusing low-level features, global context, and edge details. A solution built on the single-person ACE2P network won first place in all three human parsing tracks of the 3rd LIP challenge at CVPR 2019.

#### ACE2P architecture

![](./ACE2P/imgs/net.jpg)

PaddleSeg provides the pretrained ACE2P models that won the challenge; see [ACE2P](./ACE2P) for details.

## Lane Segmentation (LaneNet)

PaddleSeg provides a lane segmentation model based on LaneNet; see [LaneNet](./LaneNet) for details.

![](https://pic2.zhimg.com/80/v2-8015f4b256791d4456fbc2739efc106d_1440w.jpg)

## Industrial Meter Segmentation

**Note:** all commands in this section are executed from the `PaddleSeg` directory.

### 1. Model

U-Net

### 2. Data preparation

......

```
python ./contrib/MechanicalIndustryMeter/download_mini_mechanical_industry_meter.py
```

### 3. Download the pretrained model

......

![](MechanicalIndustryMeter/imgs/1560143028.5_IMG_3091.png)

## AI Studio Online Tutorials

PaddleSeg provides hands-on tutorials on the AI Studio platform:

......

|Industrial quality inspection|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/184392)|
|Human segmentation|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/188833)|
|Featured vertical-domain models|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/226710)|
# Real-Time Human Segmentation Deployment

This solution is built on the open-source PaddlePaddle human segmentation model, with extensive optical-flow tracking optimizations for video. It offers a complete real-time human segmentation pipeline for video streams, along with high-performance `Python` and `C++` deployment options to fit different scenarios.

## Model download

The following model files are available; pick the one that matches your scenario:

|Model file | Description |
|---|---|
|[shv75_deeplab_0303_quant](https://paddleseg.bj.bcebos.com/deploy/models/shv75_0303_quant.zip) | Small model, suitable for lightweight computing environments |
|[shv75_deeplab_0303](https://paddleseg.bj.bcebos.com/deploy/models/shv75_deeplab_0303.zip)| Small model, suitable for lightweight computing environments |
|[deeplabv3_xception_humanseg](https://paddleseg.bj.bcebos.com/deploy/models/deeplabv3_xception_humanseg.zip) | Server-side GPU environments |

**Note: unzip the download to a suitable path; that path is later passed as an inference argument to load the model.**

## Deployment

- [Python deployment](./python)
- [C++ deployment](./cpp)

## Preview
<figure class="half">
<img src="https://paddleseg.bj.bcebos.com/deploy/data/input.gif">
<img src="https://paddleseg.bj.bcebos.com/deploy/data/output.gif">
</figure>
cmake_minimum_required(VERSION 3.0)
project(PaddleMaskDetector CXX C)
option(WITH_MKL        "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${PADDLE_DIR}/")
include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
include_directories("${PADDLE_DIR}/third_party/install/glog/include")
include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/include")
include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/include")
include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
endif()
include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
include_directories("${PADDLE_DIR}/third_party/boost")
include_directories("${PADDLE_DIR}/third_party/eigen3")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
endif()
link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${OPENCV_DIR}/build/include")
include_directories("${OPENCV_DIR}/opencv/build/include")
link_directories("${OPENCV_DIR}/build/x64/vc14/lib")
else ()
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
include_directories(${OpenCV_INCLUDE_DIRS})
endif ()
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O2 -fopenmp -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
# TODO let users define cuda lib path
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
if (NOT WIN32)
if (USE_TENSORRT AND WITH_GPU)
include_directories("${PADDLE_DIR}/third_party/install/tensorrt/include")
link_directories("${PADDLE_DIR}/third_party/install/tensorrt/lib")
endif()
endif(NOT WIN32)
if (NOT WIN32)
set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
if(EXISTS ${NGRAPH_PATH})
include(GNUInstallDirs)
include_directories("${NGRAPH_PATH}/include")
link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_MKL)
include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
execute_process(COMMAND cp -r ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib)
endif ()
set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if (WIN32)
if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_STATIC_LIB)
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if (NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
)
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
opencv_world346 glog gflags_static libprotobuf zlibstatic xxhash)
set(DEPS ${DEPS} libcmt shlwapi)
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread")
set(DEPS ${DEPS} ${EXTERNAL_LIB} ${OpenCV_LIBS})
endif()
add_executable(main main.cc humanseg.cc humanseg_postprocess.cc)
target_link_libraries(main ${DEPS})
if (WIN32)
add_custom_command(TARGET main POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
)
endif()
{
"configurations": [
{
"name": "x64-Release",
"generator": "Ninja",
"configurationType": "RelWithDebInfo",
"inheritEnvironments": [ "msvc_x64_x64" ],
"buildRoot": "${projectDir}\\out\\build\\${name}",
"installRoot": "${projectDir}\\out\\install\\${name}",
"cmakeCommandArgs": "",
"buildCommandArgs": "-v",
"ctestCommandArgs": "",
"variables": [
{
"name": "CUDA_LIB",
"value": "D:/projects/packages/cuda10_0/lib64",
"type": "PATH"
},
{
"name": "CUDNN_LIB",
"value": "D:/projects/packages/cuda10_0/lib64",
"type": "PATH"
},
{
"name": "OPENCV_DIR",
"value": "D:/projects/packages/opencv3_4_6",
"type": "PATH"
},
{
"name": "PADDLE_DIR",
"value": "D:/projects/packages/fluid_inference1_6_1",
"type": "PATH"
},
{
"name": "CMAKE_BUILD_TYPE",
"value": "Release",
"type": "STRING"
}
]
}
]
}
# Real-Time Video Segmentation: C++ Deployment

This document describes how to deploy the real-time image segmentation model on `Windows` and `Linux` using `C++`.

## Building the C++ deployment

### 1. Download the model

Download here: [model download link](https://paddleseg.bj.bcebos.com/deploy/models/humanseg_paddleseg_int8.zip)

The model directory is passed as an input argument at inference time; unzip it to a suitable location.

### 2. Build

This project can be built and deployed on both Windows and Linux; see the platform-specific guides:

- [Linux build](./docs/linux_build.md)
- [Windows build with Visual Studio 2019](./docs/windows_build.md)
# Real-Time Human Segmentation: Linux C++ Deployment

## 1. System and software dependencies

### 1.1 OS and hardware requirements

- Ubuntu 14.04 or 16.04 (other platforms untested)
- GCC 4.8.5 ~ 4.9.2
- A CPU with Intel MKL-DNN support
- NOTE: to run on an Nvidia GPU, install CUDA 9.0 / 10.0 + cuDNN 7.3+ yourself (CUDA 9.1 / 10.1 are not supported)

### 1.2 Download the PaddlePaddle C++ inference library

The PaddlePaddle C++ inference library comes in CPU and GPU builds; the GPU builds support `CUDA 10.0` and `CUDA 9.0`.

Download links for each build:

| Version | Link |
| ---- | ---- |
| CPU+MKL | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.6.3-cpu-avx-mkl/fluid_inference.tgz) |
| CUDA9.0+MKL | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.6.3-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz) |
| CUDA10.0+MKL | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.6.3-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz) |

For more prebuilt versions, see the [C++ inference library download list](https://paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/deploy/inference/build_and_install_lib_cn.html).

After downloading and extracting, the `fluid_inference` directory contains:

```
fluid_inference
├── paddle # core paddle libraries and headers
|
├── third_party # third-party dependencies and headers
|
└── version.txt # version and build info
```

**Note:** place the extracted directory in a suitable path; **this path is used as a build dependency** later.

## 2. Build and run

### 2.1 Configure the build script

Open `linux_build.sh`; it contains:

```shell
# whether to use the GPU
WITH_GPU=OFF
# path to the Paddle inference library
PADDLE_DIR=/PATH/TO/fluid_inference/
# CUDA library path, only needed when WITH_GPU=ON
CUDA_LIB=/PATH/TO/CUDA_LIB64/
# cuDNN library path, only needed when WITH_GPU=ON and CUDA_LIB is valid
CUDNN_LIB=/PATH/TO/CUDNN_LIB64/
# OpenCV library path; no need to change
OPENCV_DIR=/PATH/TO/opencv3gcc4.8/
cd build
cmake .. \
    -DWITH_GPU=${WITH_GPU} \
    -DPADDLE_DIR=${PADDLE_DIR} \
    -DCUDA_LIB=${CUDA_LIB} \
    -DCUDNN_LIB=${CUDNN_LIB} \
    -DOPENCV_DIR=${OPENCV_DIR} \
    -DWITH_STATIC_LIB=OFF
make -j4
```

Adjust the parameters above for your environment, then run the script to build:

```shell
sh linux_build.sh
```

### 2.2 Run and visualize

The executable takes **2** arguments: the first is the path of the exported `inference_model`, the second is the path of the video to run prediction on.

Example:

```shell
./build/main ./models /PATH/TO/TEST_VIDEO
```

Download the [test video](https://paddleseg.bj.bcebos.com/deploy/data/test.avi).

The prediction result is saved to the video file `result.avi`.
# Real-Time Human Segmentation: Windows C++ Deployment

## 1. System and software dependencies

### 1.1 Basic dependencies

- Windows 10 / Windows Server 2016+ (other platforms untested)
- Visual Studio 2019 (Community or Professional)
- CUDA 9.0 / 10.0 + cuDNN 7.3+ (CUDA 9.1 / 10.1 are not supported)

### 1.2 Download OpenCV and set environment variables

- Download OpenCV 3.4.6 for Windows from the official site: [download](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
- Run the downloaded executable and extract OpenCV to a suitable directory, e.g. `D:\projects\opencv`
- Add the OpenCV dynamic libraries to the system PATH
    - This PC (My Computer) -> Properties -> Advanced system settings -> Environment Variables
    - Find Path among the system variables (create it if missing) and double-click to edit
    - Click New, add the opencv path, e.g. D:\projects\opencv\build\x64\vc14\bin, and save

**Note:** the `OpenCV` directory is used later as a build configuration item, so place it in a suitable location.

### 1.3 Download the PaddlePaddle C++ inference library

The `PaddlePaddle` **C++ inference library** comes in `CPU` and `GPU` builds; the `GPU` builds support `CUDA 9.0` and `CUDA 10.0`.

Commonly used versions:

| Version | Link |
| ---- | ---- |
| CPU+MKL | [fluid_inference_install_dir.zip](https://paddle-wheel.bj.bcebos.com/1.6.3/win-infer/mkl/cpu/fluid_inference_install_dir.zip) |
| CUDA9.0+MKL | [fluid_inference_install_dir.zip](https://paddle-wheel.bj.bcebos.com/1.6.3/win-infer/mkl/post97/fluid_inference_install_dir.zip) |
| CUDA10.0+MKL | [fluid_inference_install_dir.zip](https://paddle-wheel.bj.bcebos.com/1.6.3/win-infer/mkl/post107/fluid_inference_install_dir.zip) |

For more builds for other platforms, [click here](https://paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/deploy/inference/windows_cpp_inference.html) and pick the version that suits you.

After downloading and extracting, the `fluid_inference_install_dir` directory contains:

```
fluid_inference_install_dir
├── paddle # core paddle libraries and headers
|
├── third_party # third-party dependencies and headers
|
└── version.txt # version and build info
```

**Note:** the path of `fluid_inference_install_dir` is used in the build settings below, so place it in a suitable location.

## 2. Build with Visual Studio 2019

- 2.1 Open Visual Studio 2019 Community and click `Continue without code`, as shown:

![step2.1](https://paddleseg.bj.bcebos.com/inference/vs2019_step1.png)

- 2.2 Click `File` -> `Open` -> `CMake`, as shown:

![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step2.png)

- 2.3 Open the `CMakeList.txt` file in the project root, as shown:

![step2.3](https://paddleseg.bj.bcebos.com/deploy/docs/vs2019_step2.3.png)

- 2.4 Click `Project` -> `CMake settings for PaddleMaskDetector`

![step2.4](https://paddleseg.bj.bcebos.com/deploy/docs/vs2019_step2.4.png)

- 2.5 Browse and set the three build dependencies `OPENCV_DIR`, `CUDA_LIB` and `PADDLE_DIR`, then click `Save and generate CMake cache to load variables`

![step2.5](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)

- 2.6 Click `Build` -> `Build All` to build the project

![step2.6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)

## 3. Run

After a successful build, the executable is produced in the project subdirectory `out\build\x64-Release`. Run it as follows:

- Open `cmd` and change into that directory
- Run the following command, passing the model path and a test video

```shell
main.exe ./models/ ./data/test.avi
```

The first argument is the path of the human segmentation inference model, the second is the video to predict.

Download the [test video](https://paddleseg.bj.bcebos.com/deploy/data/test.avi).

After running, the prediction result is saved to the file `result.avi`.
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
# include "humanseg.h"
# include "humanseg_postprocess.h"
// Normalize the image by (pix - mean) * scale
void NormalizeImage(
const std::vector<float> &mean,
const std::vector<float> &scale,
cv::Mat& im, // NOLINT
float* input_buffer) {
int height = im.rows;
int width = im.cols;
int stride = width * height;
for (int h = 0; h < height; h++) {
for (int w = 0; w < width; w++) {
int base = h * width + w;
input_buffer[base + 0 * stride] =
(im.at<cv::Vec3f>(h, w)[0] - mean[0]) * scale[0];
input_buffer[base + 1 * stride] =
(im.at<cv::Vec3f>(h, w)[1] - mean[1]) * scale[1];
input_buffer[base + 2 * stride] =
(im.at<cv::Vec3f>(h, w)[2] - mean[2]) * scale[2];
}
}
}
// Load Model and return model predictor
void LoadModel(
const std::string& model_dir,
bool use_gpu,
std::unique_ptr<paddle::PaddlePredictor>* predictor) {
// Config the model info
paddle::AnalysisConfig config;
auto prog_file = model_dir + "/__model__";
auto params_file = model_dir + "/__params__";
config.SetModel(prog_file, params_file);
if (use_gpu) {
config.EnableUseGpu(100, 0);
} else {
config.DisableGpu();
}
config.SwitchUseFeedFetchOps(false);
config.SwitchSpecifyInputNames(true);
// Memory optimization
config.EnableMemoryOptim();
*predictor = std::move(CreatePaddlePredictor(config));
}
void HumanSeg::Preprocess(const cv::Mat& image_mat) {
// Clone the image : keep the original mat for postprocess
cv::Mat im = image_mat.clone();
auto eval_wh = cv::Size(eval_size_[0], eval_size_[1]);
cv::resize(im, im, eval_wh, 0.f, 0.f, cv::INTER_LINEAR);
im.convertTo(im, CV_32FC3, 1.0);
int rc = im.channels();
int rh = im.rows;
int rw = im.cols;
input_shape_ = {1, rc, rh, rw};
input_data_.resize(1 * rc * rh * rw);
  float* buffer = input_data_.data();
  NormalizeImage(mean_, scale_, im, buffer);
}
cv::Mat HumanSeg::Postprocess(const cv::Mat& im) {
int h = input_shape_[2];
int w = input_shape_[3];
scoremap_data_.resize(3 * h * w * sizeof(float));
float* base = output_data_.data() + h * w;
for (int i = 0; i < h * w; ++i) {
scoremap_data_[i] = uchar(base[i] * 255);
}
cv::Mat im_scoremap = cv::Mat(h, w, CV_8UC1);
im_scoremap.data = scoremap_data_.data();
cv::resize(im_scoremap, im_scoremap, cv::Size(im.cols, im.rows));
im_scoremap.convertTo(im_scoremap, CV_32FC1, 1 / 255.0);
float* pblob = reinterpret_cast<float*>(im_scoremap.data);
int out_buff_capacity = 10 * im.cols * im.rows * sizeof(float);
segout_data_.resize(out_buff_capacity);
unsigned char* seg_result = segout_data_.data();
MergeProcess(im.data, pblob, im.rows, im.cols, seg_result);
cv::Mat seg_mat(im.rows, im.cols, CV_8UC1, seg_result);
cv::resize(seg_mat, seg_mat, cv::Size(im.cols, im.rows));
cv::GaussianBlur(seg_mat, seg_mat, cv::Size(5, 5), 0, 0);
float fg_threshold = 0.8;
float bg_threshold = 0.4;
cv::Mat show_seg_mat;
seg_mat.convertTo(seg_mat, CV_32FC1, 1 / 255.0);
ThresholdMask(seg_mat, fg_threshold, bg_threshold, show_seg_mat);
auto out_im = MergeSegMat(show_seg_mat, im);
return out_im;
}
cv::Mat HumanSeg::Predict(const cv::Mat& im) {
// Preprocess image
Preprocess(im);
// Prepare input tensor
auto input_names = predictor_->GetInputNames();
auto in_tensor = predictor_->GetInputTensor(input_names[0]);
in_tensor->Reshape(input_shape_);
in_tensor->copy_from_cpu(input_data_.data());
// Run predictor
predictor_->ZeroCopyRun();
// Get output tensor
auto output_names = predictor_->GetOutputNames();
auto out_tensor = predictor_->GetOutputTensor(output_names[0]);
auto output_shape = out_tensor->shape();
// Calculate output length
int output_size = 1;
for (int j = 0; j < output_shape.size(); ++j) {
output_size *= output_shape[j];
}
output_data_.resize(output_size);
out_tensor->copy_to_cpu(output_data_.data());
// Postprocessing result
return Postprocess(im);
}
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <string>
#include <vector>
#include <memory>
#include <utility>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/optflow.hpp>
#include "paddle_inference_api.h" // NOLINT
// Load Paddle Inference Model
void LoadModel(
const std::string& model_dir,
bool use_gpu,
std::unique_ptr<paddle::PaddlePredictor>* predictor);
class HumanSeg {
public:
explicit HumanSeg(const std::string& model_dir,
const std::vector<float>& mean,
const std::vector<float>& scale,
const std::vector<int>& eval_size,
bool use_gpu = false) :
mean_(mean),
scale_(scale),
eval_size_(eval_size) {
LoadModel(model_dir, use_gpu, &predictor_);
}
// Run predictor
cv::Mat Predict(const cv::Mat& im);
private:
// Preprocess image and copy data to input buffer
void Preprocess(const cv::Mat& im);
// Postprocess result
cv::Mat Postprocess(const cv::Mat& im);
std::unique_ptr<paddle::PaddlePredictor> predictor_;
std::vector<float> input_data_;
std::vector<int> input_shape_;
std::vector<float> output_data_;
std::vector<uchar> scoremap_data_;
std::vector<uchar> segout_data_;
std::vector<float> mean_;
std::vector<float> scale_;
std::vector<int> eval_size_;
};
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <iostream>
#include <string>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/optflow.hpp>
#include "humanseg_postprocess.h" // NOLINT
int HumanSegTrackFuse(const cv::Mat &track_fg_cfd,
const cv::Mat &dl_fg_cfd,
const cv::Mat &dl_weights,
const cv::Mat &is_track,
const float cfd_diff_thres,
const int patch_size,
cv::Mat cur_fg_cfd) {
float *cur_fg_cfd_ptr = reinterpret_cast<float*>(cur_fg_cfd.data);
float *dl_fg_cfd_ptr = reinterpret_cast<float*>(dl_fg_cfd.data);
float *track_fg_cfd_ptr = reinterpret_cast<float*>(track_fg_cfd.data);
float *dl_weights_ptr = reinterpret_cast<float*>(dl_weights.data);
uchar *is_track_ptr = reinterpret_cast<uchar*>(is_track.data);
int y_offset = 0;
int ptr_offset = 0;
int h = track_fg_cfd.rows;
int w = track_fg_cfd.cols;
float dl_fg_score = 0.0;
float track_fg_score = 0.0;
for (int y = 0; y < h; ++y) {
for (int x = 0; x < w; ++x) {
dl_fg_score = dl_fg_cfd_ptr[ptr_offset];
if (is_track_ptr[ptr_offset] > 0) {
track_fg_score = track_fg_cfd_ptr[ptr_offset];
if (dl_fg_score > 0.9 || dl_fg_score < 0.1) {
if (dl_weights_ptr[ptr_offset] <= 0.10) {
cur_fg_cfd_ptr[ptr_offset] = dl_fg_score * 0.3
+ track_fg_score * 0.7;
} else {
cur_fg_cfd_ptr[ptr_offset] = dl_fg_score * 0.4
+ track_fg_score * 0.6;
}
} else {
cur_fg_cfd_ptr[ptr_offset] = dl_fg_score * dl_weights_ptr[ptr_offset]
+ track_fg_score * (1 - dl_weights_ptr[ptr_offset]);
}
} else {
cur_fg_cfd_ptr[ptr_offset] = dl_fg_score;
}
++ptr_offset;
}
y_offset += w;
ptr_offset = y_offset;
}
return 0;
}
int HumanSegTracking(const cv::Mat &prev_gray,
const cv::Mat &cur_gray,
const cv::Mat &prev_fg_cfd,
int patch_size,
cv::Mat track_fg_cfd,
cv::Mat is_track,
cv::Mat dl_weights,
cv::Ptr<cv::optflow::DISOpticalFlow> disflow) {
cv::Mat flow_fw;
disflow->calc(prev_gray, cur_gray, flow_fw);
cv::Mat flow_bw;
disflow->calc(cur_gray, prev_gray, flow_bw);
float double_check_thres = 8;
cv::Point2f fxy_fw;
int dy_fw = 0;
int dx_fw = 0;
cv::Point2f fxy_bw;
int dy_bw = 0;
int dx_bw = 0;
float *prev_fg_cfd_ptr = reinterpret_cast<float*>(prev_fg_cfd.data);
float *track_fg_cfd_ptr = reinterpret_cast<float*>(track_fg_cfd.data);
float *dl_weights_ptr = reinterpret_cast<float*>(dl_weights.data);
uchar *is_track_ptr = reinterpret_cast<uchar*>(is_track.data);
int prev_y_offset = 0;
  int prev_ptr_offset = -1;  // 配合循环内的前置自增,首行偏移从0开始
int cur_ptr_offset = 0;
float *flow_fw_ptr = reinterpret_cast<float*>(flow_fw.data);
float roundy_fw = 0.0;
float roundx_fw = 0.0;
float roundy_bw = 0.0;
float roundx_bw = 0.0;
int h = prev_fg_cfd.rows;
int w = prev_fg_cfd.cols;
for (int r = 0; r < h; ++r) {
for (int c = 0; c < w; ++c) {
++prev_ptr_offset;
fxy_fw = flow_fw.ptr<cv::Point2f>(r)[c];
roundy_fw = fxy_fw.y >= 0 ? 0.5 : -0.5;
roundx_fw = fxy_fw.x >= 0 ? 0.5 : -0.5;
dy_fw = static_cast<int>(fxy_fw.y + roundy_fw);
dx_fw = static_cast<int>(fxy_fw.x + roundx_fw);
int cur_x = c + dx_fw;
int cur_y = r + dy_fw;
      if (cur_x < 0
          || cur_x >= w
          || cur_y < 0
          || cur_y >= h) {
continue;
}
fxy_bw = flow_bw.ptr<cv::Point2f>(cur_y)[cur_x];
roundy_bw = fxy_bw.y >= 0 ? 0.5 : -0.5;
roundx_bw = fxy_bw.x >= 0 ? 0.5 : -0.5;
dy_bw = static_cast<int>(fxy_bw.y + roundy_bw);
dx_bw = static_cast<int>(fxy_bw.x + roundx_bw);
auto total = (dy_fw + dy_bw) * (dy_fw + dy_bw)
+ (dx_fw + dx_bw) * (dx_fw + dx_bw);
if (total >= double_check_thres) {
continue;
}
cur_ptr_offset = cur_y * w + cur_x;
if (abs(dy_fw) <= 0
&& abs(dx_fw) <= 0
&& abs(dy_bw) <= 0
&& abs(dx_bw) <= 0) {
dl_weights_ptr[cur_ptr_offset] = 0.05;
}
is_track_ptr[cur_ptr_offset] = 1;
track_fg_cfd_ptr[cur_ptr_offset] = prev_fg_cfd_ptr[prev_ptr_offset];
}
prev_y_offset += w;
prev_ptr_offset = prev_y_offset - 1;
}
return 0;
}
int MergeProcess(const uchar *im_buff,
const float *scoremap_buff,
const int height,
const int width,
uchar *result_buff) {
  // 以下状态需要在多次调用之间保留(光流跟踪依赖上一帧数据),因此声明为static
  static cv::Mat prev_fg_cfd;
  static cv::Mat cur_fg_cfd;
  static cv::Mat cur_fg_mask;
  static cv::Mat track_fg_cfd;
  static cv::Mat prev_gray;
  static cv::Mat cur_gray;
  static cv::Mat bgr_temp;
  static cv::Mat is_track;
  static cv::Mat static_roi;
  static cv::Mat weights;
  static cv::Ptr<cv::optflow::DISOpticalFlow> disflow =
      cv::optflow::createOptFlow_DIS(
          cv::optflow::DISOpticalFlow::PRESET_ULTRAFAST);
  static bool is_init = false;
  const float *cfd_ptr = scoremap_buff;
  if (!is_init) {
    is_init = true;
cur_fg_cfd = cv::Mat(height, width, CV_32FC1, cv::Scalar::all(0));
memcpy(cur_fg_cfd.data, cfd_ptr, height * width * sizeof(float));
cur_fg_mask = cv::Mat(height, width, CV_8UC1, cv::Scalar::all(0));
if (height <= 64 || width <= 64) {
disflow->setFinestScale(1);
} else if (height <= 160 || width <= 160) {
disflow->setFinestScale(2);
} else {
disflow->setFinestScale(3);
}
is_track = cv::Mat(height, width, CV_8UC1, cv::Scalar::all(0));
static_roi = cv::Mat(height, width, CV_8UC1, cv::Scalar::all(0));
track_fg_cfd = cv::Mat(height, width, CV_32FC1, cv::Scalar::all(0));
bgr_temp = cv::Mat(height, width, CV_8UC3);
memcpy(bgr_temp.data, im_buff, height * width * 3 * sizeof(uchar));
cv::cvtColor(bgr_temp, cur_gray, cv::COLOR_BGR2GRAY);
weights = cv::Mat(height, width, CV_32FC1, cv::Scalar::all(0.30));
} else {
memcpy(cur_fg_cfd.data, cfd_ptr, height * width * sizeof(float));
memcpy(bgr_temp.data, im_buff, height * width * 3 * sizeof(uchar));
cv::cvtColor(bgr_temp, cur_gray, cv::COLOR_BGR2GRAY);
memset(is_track.data, 0, height * width * sizeof(uchar));
memset(static_roi.data, 0, height * width * sizeof(uchar));
weights = cv::Mat(height, width, CV_32FC1, cv::Scalar::all(0.30));
HumanSegTracking(prev_gray,
cur_gray,
prev_fg_cfd,
0,
track_fg_cfd,
is_track,
weights,
disflow);
HumanSegTrackFuse(track_fg_cfd,
cur_fg_cfd,
weights,
is_track,
1.1,
0,
cur_fg_cfd);
}
int ksize = 3;
cv::GaussianBlur(cur_fg_cfd, cur_fg_cfd, cv::Size(ksize, ksize), 0, 0);
prev_fg_cfd = cur_fg_cfd.clone();
prev_gray = cur_gray.clone();
cur_fg_cfd.convertTo(cur_fg_mask, CV_8UC1, 255);
memcpy(result_buff, cur_fg_mask.data, height * width);
return 0;
}
cv::Mat MergeSegMat(const cv::Mat& seg_mat,
const cv::Mat& ori_frame) {
cv::Mat return_frame;
  return_frame = ori_frame.clone();  // 拷贝原始帧用于叠加可视化结果
for (int i = 0; i < ori_frame.rows; i++) {
for (int j = 0; j < ori_frame.cols; j++) {
float score = seg_mat.at<uchar>(i, j) / 255.0;
if (score > 0.1) {
return_frame.at<cv::Vec3b>(i, j)[2] = static_cast<int>((1 - score) * 255
+ score*return_frame.at<cv::Vec3b>(i, j)[2]);
return_frame.at<cv::Vec3b>(i, j)[1] = static_cast<int>((1 - score) * 255
+ score*return_frame.at<cv::Vec3b>(i, j)[1]);
return_frame.at<cv::Vec3b>(i, j)[0] = static_cast<int>((1 - score) * 255
+ score*return_frame.at<cv::Vec3b>(i, j)[0]);
} else {
return_frame.at<cv::Vec3b>(i, j) = {255, 255, 255};
}
}
}
return return_frame;
}
int ThresholdMask(const cv::Mat &fg_cfd,
const float fg_thres,
const float bg_thres,
cv::Mat& fg_mask) {
if (fg_cfd.type() != CV_32FC1) {
printf("ThresholdMask: type is not CV_32FC1.\n");
return -1;
}
if (!(fg_mask.type() == CV_8UC1
&& fg_mask.rows == fg_cfd.rows
&& fg_mask.cols == fg_cfd.cols)) {
fg_mask = cv::Mat(fg_cfd.rows, fg_cfd.cols, CV_8UC1, cv::Scalar::all(0));
}
for (int r = 0; r < fg_cfd.rows; ++r) {
for (int c = 0; c < fg_cfd.cols; ++c) {
float score = fg_cfd.at<float>(r, c);
if (score < bg_thres) {
fg_mask.at<uchar>(r, c) = 0;
} else if (score > fg_thres) {
fg_mask.at<uchar>(r, c) = 255;
} else {
fg_mask.at<uchar>(r, c) = static_cast<int>(
(score-bg_thres) / (fg_thres - bg_thres) * 255);
}
}
}
return 0;
}
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <opencv2/core/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/optflow.hpp>
int ThresholdMask(const cv::Mat &fg_cfd,
const float fg_thres,
const float bg_thres,
cv::Mat& fg_mask);
cv::Mat MergeSegMat(const cv::Mat& seg_mat,
const cv::Mat& ori_frame);
int MergeProcess(const uchar *im_buff,
const float *im_scoremap_buff,
const int height,
const int width,
uchar *result_buff);
OPENCV_URL=https://paddleseg.bj.bcebos.com/deploy/deps/opencv346.tar.bz2
if [ ! -d "./deps/opencv346" ]; then
mkdir -p deps
cd deps
wget -c ${OPENCV_URL}
tar xvfj opencv346.tar.bz2
rm -rf opencv346.tar.bz2
cd ..
fi
WITH_GPU=OFF
PADDLE_DIR=/root/projects/deps/fluid_inference/
CUDA_LIB=/usr/local/cuda-10.0/lib64/
CUDNN_LIB=/usr/local/cuda-10.0/lib64/
OPENCV_DIR=$(pwd)/deps/opencv346/
echo ${OPENCV_DIR}
rm -rf build
mkdir -p build
cd build
cmake .. \
-DWITH_GPU=${WITH_GPU} \
-DPADDLE_DIR=${PADDLE_DIR} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DOPENCV_DIR=${OPENCV_DIR} \
-DWITH_STATIC_LIB=OFF
make clean
make -j12
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <iostream>
#include <string>
#include "humanseg.h" // NOLINT
#include "humanseg_postprocess.h" // NOLINT
// Do predicting on a video file
int VideoPredict(const std::string& video_path, HumanSeg& seg)
{
cv::VideoCapture capture;
capture.open(video_path.c_str());
if (!capture.isOpened()) {
printf("can not open video : %s\n", video_path.c_str());
return -1;
}
int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
cv::VideoWriter video_out;
std::string video_out_path = "result.avi";
video_out.open(video_out_path.c_str(),
CV_FOURCC('M', 'J', 'P', 'G'),
30.0,
cv::Size(video_width, video_height),
true);
if (!video_out.isOpened()) {
printf("create video writer failed!\n");
return -1;
}
cv::Mat frame;
while (capture.read(frame)) {
if (frame.empty()) {
break;
}
cv::Mat out_im = seg.Predict(frame);
video_out.write(out_im);
}
capture.release();
video_out.release();
return 0;
}
// Do predicting on a image file
int ImagePredict(const std::string& image_path, HumanSeg& seg)
{
  cv::Mat img = cv::imread(image_path, cv::IMREAD_COLOR);
  cv::Mat out_im = seg.Predict(img);
  cv::imwrite("result.jpeg", out_im);
return 0;
}
int main(int argc, char* argv[]) {
  if (argc < 3 || argc > 4) {
    std::cout << "Usage:"
              << " ./humanseg ./models/ ./data/test.avi [use_gpu(0/1)]"
              << std::endl;
    return -1;
  }
  bool use_gpu = (argc == 4) ? (std::stoi(argv[3]) != 0) : false;
auto model_dir = std::string(argv[1]);
auto input_path = std::string(argv[2]);
// Init Model
std::vector<float> means = {104.008, 116.669, 122.675};
std::vector<float> scale = {1.000, 1.000, 1.000};
std::vector<int> eval_sz = {192, 192};
HumanSeg seg(model_dir, means, scale, eval_sz, use_gpu);
// Call ImagePredict while input_path is a image file path
// The output will be saved as result.jpeg
// ImagePredict(input_path, seg);
// Call VideoPredict while input_path is a video file path
// The output will be saved as result.avi
VideoPredict(input_path, seg);
return 0;
}
# 实时人像分割Python预测部署方案
本方案基于Python实现,最小化依赖,并将模型加载、数据预处理、预测、光流处理等后处理全部封装在文件`infer.py`中,用户可以直接使用,也可以集成到自己的项目中。
## 前置依赖
- Windows(7,8,10) / Linux (Ubuntu 16.04) or MacOS 10.1+
- Paddle 1.6.1+
- Python 3.0+
注意:
1. 仅测试过Paddle1.6 和 1.7, 其它版本不支持
2. MacOS上不支持GPU预测
3. Python2上未测试
其它未涉及情形,只要能正常安装`Paddle`和`OpenCV`,通常都能正常使用。
## 安装依赖
### 1. 安装paddle
PaddlePaddle的安装, 请按照[官网指引](https://paddlepaddle.org.cn/install/quick)安装合适自己的版本。
### 2. 安装其它依赖
执行如下命令
```shell
pip install -r requirements.txt
```
## 运行
1. 输入图片进行分割
```
python infer.py --model_dir /PATH/TO/INFERENCE/MODEL --img_path /PATH/TO/INPUT/IMAGE
```
预测结果会保存为`result.jpeg`
2. 输入视频进行分割
```shell
python infer.py --model_dir /PATH/TO/INFERENCE/MODEL --video_path /PATH/TO/INPUT/VIDEO
```
预测结果会保存在`result.avi`
3. 使用摄像头视频流
```shell
python infer.py --model_dir /PATH/TO/INFERENCE/MODEL --use_camera 1
```
预测结果会通过可视化窗口实时显示。
**注意:**
`GPU`默认关闭, 如果要使用`GPU`进行加速,则先运行
```
export CUDA_VISIBLE_DEVICES=0
```
然后在前面的预测命令中增加参数`--use_gpu 1`即可。
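## 集成示例
除命令行方式外,`infer.py`中的`HumanSeg`类也可以直接集成到自己的项目中。下面是一个最小化的调用示意(模型路径为占位符,均值、方差、输入尺寸等参数取自本方案的默认配置,请按实际情况调整):
```python
import cv2
from infer import HumanSeg

# 与 infer.py 中 main() 的默认配置保持一致
mean = [104.008, 116.669, 122.675]
scale = [1.0, 1.0, 1.0]
eval_size = (192, 192)

seg = HumanSeg('/PATH/TO/INFERENCE/MODEL', mean, scale, eval_size, use_gpu=False)
frame = cv2.imread('test.jpg')        # 读取一张BGR图片
result = seg.run_predict(frame)       # 返回可视化的预测结果图
cv2.imwrite('result.jpeg', result)
```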
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""实时人像分割Python预测部署"""
import os
import argparse
import numpy as np
import cv2
import paddle.fluid as fluid
def human_seg_tracking(pre_gray, cur_gray, prev_cfd, dl_weights, disflow):
"""计算光流跟踪匹配点和光流图
输入参数:
pre_gray: 上一帧灰度图
cur_gray: 当前帧灰度图
prev_cfd: 上一帧光流图
dl_weights: 融合权重图
disflow: 光流数据结构
返回值:
is_track: 光流点跟踪二值图,即是否具有光流点匹配
track_cfd: 光流跟踪图
"""
check_thres = 8
hgt, wdh = pre_gray.shape[:2]
track_cfd = np.zeros_like(prev_cfd)
is_track = np.zeros_like(pre_gray)
# 计算前向光流
flow_fw = disflow.calc(pre_gray, cur_gray, None)
# 计算后向光流
flow_bw = disflow.calc(cur_gray, pre_gray, None)
    get_round = lambda data: int(data + 0.5) if data >= 0 else int(data - 0.5)
for row in range(hgt):
for col in range(wdh):
# 计算光流处理后对应点坐标
# (row, col) -> (cur_x, cur_y)
fxy_fw = flow_fw[row, col]
dx_fw = get_round(fxy_fw[0])
cur_x = dx_fw + col
dy_fw = get_round(fxy_fw[1])
cur_y = dy_fw + row
if cur_x < 0 or cur_x >= wdh or cur_y < 0 or cur_y >= hgt:
continue
fxy_bw = flow_bw[cur_y, cur_x]
dx_bw = get_round(fxy_bw[0])
dy_bw = get_round(fxy_bw[1])
# 光流移动小于阈值
lmt = ((dy_fw + dy_bw) * (dy_fw + dy_bw) + (dx_fw + dx_bw) * (dx_fw + dx_bw))
if lmt >= check_thres:
continue
# 静止点降权
if abs(dy_fw) <= 0 and abs(dx_fw) <= 0 and abs(dy_bw) <= 0 and abs(dx_bw) <= 0:
dl_weights[cur_y, cur_x] = 0.05
is_track[cur_y, cur_x] = 1
track_cfd[cur_y, cur_x] = prev_cfd[row, col]
return track_cfd, is_track, dl_weights
def human_seg_track_fuse(track_cfd, dl_cfd, dl_weights, is_track):
"""光流追踪图和人像分割结构融合
输入参数:
track_cfd: 光流追踪图
dl_cfd: 当前帧分割结果
dl_weights: 融合权重图
is_track: 光流点匹配二值图
返回值:
cur_cfd: 光流跟踪图和人像分割结果融合图
"""
cur_cfd = dl_cfd.copy()
idxs = np.where(is_track > 0)
    for i in range(len(idxs[0])):
        x, y = idxs[0][i], idxs[1][i]
        dl_score = dl_cfd[x, y]
        track_score = track_cfd[x, y]
        if dl_score > 0.9 or dl_score < 0.1:
            if dl_weights[x, y] < 0.1:
                cur_cfd[x, y] = 0.3 * dl_score + 0.7 * track_score
            else:
                cur_cfd[x, y] = 0.4 * dl_score + 0.6 * track_score
        else:
            cur_cfd[x, y] = dl_weights[x, y] * dl_score + (1 - dl_weights[x, y]) * track_score
return cur_cfd
def threshold_mask(img, thresh_bg, thresh_fg):
"""设置背景和前景阈值mask
输入参数:
img : 原始图像, np.uint8 类型.
thresh_bg : 背景阈值百分比,低于该值置为0.
thresh_fg : 前景阈值百分比,超过该值置为1.
返回值:
dst : 原始图像设置完前景背景阈值mask结果, np.float32 类型.
"""
dst = (img / 255.0 - thresh_bg) / (thresh_fg - thresh_bg)
dst[np.where(dst > 1)] = 1
dst[np.where(dst < 0)] = 0
return dst.astype(np.float32)
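# threshold_mask使用示意(假设scoremap为uint8灰度得分图,取值0~255):
#   mask = threshold_mask(scoremap, thresh_bg=0.2, thresh_fg=0.8)
#   归一化后得分低于0.2的像素置0,高于0.8的置1,中间区间线性过渡到[0, 1]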
def optflow_handle(cur_gray, scoremap, is_init):
"""光流优化
Args:
cur_gray : 当前帧灰度图
scoremap : 当前帧分割结果
is_init : 是否第一帧
Returns:
dst : 光流追踪图和预测结果融合图, 类型为 np.float32
"""
    height, width = scoremap.shape[0], scoremap.shape[1]
disflow = cv2.DISOpticalFlow_create(
cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
prev_gray = np.zeros((height, width), np.uint8)
prev_cfd = np.zeros((height, width), np.float32)
cur_cfd = scoremap.copy()
if is_init:
is_init = False
if height <= 64 or width <= 64:
disflow.setFinestScale(1)
elif height <= 160 or width <= 160:
disflow.setFinestScale(2)
else:
disflow.setFinestScale(3)
fusion_cfd = cur_cfd
else:
        weights = np.ones((height, width), np.float32) * 0.3
track_cfd, is_track, weights = human_seg_tracking(
prev_gray, cur_gray, prev_cfd, weights, disflow)
fusion_cfd = human_seg_track_fuse(track_cfd, cur_cfd, weights, is_track)
fusion_cfd = cv2.GaussianBlur(fusion_cfd, (3, 3), 0)
return fusion_cfd
class HumanSeg:
"""人像分割类
封装了人像分割模型的加载,数据预处理,预测,后处理等
"""
def __init__(self, model_dir, mean, scale, eval_size, use_gpu=False):
self.mean = np.array(mean).reshape((3, 1, 1))
self.scale = np.array(scale).reshape((3, 1, 1))
self.eval_size = eval_size
self.load_model(model_dir, use_gpu)
def load_model(self, model_dir, use_gpu):
"""加载模型并创建predictor
Args:
model_dir: 预测模型路径, 包含 `__model__` 和 `__params__`
use_gpu: 是否使用GPU加速
"""
prog_file = os.path.join(model_dir, '__model__')
params_file = os.path.join(model_dir, '__params__')
config = fluid.core.AnalysisConfig(prog_file, params_file)
if use_gpu:
config.enable_use_gpu(100, 0)
config.switch_ir_optim(True)
else:
config.disable_gpu()
config.disable_glog_info()
config.switch_specify_input_names(True)
config.enable_memory_optim()
self.predictor = fluid.core.create_paddle_predictor(config)
def preprocess(self, image):
"""图像预处理
hwc_rgb 转换为 chw_bgr,并进行归一化
输入参数:
image: 原始图像
返回值:
经过预处理后的图片结果
"""
img_mat = cv2.resize(
image, self.eval_size, interpolation=cv2.INTER_LINEAR)
# HWC -> CHW
img_mat = img_mat.swapaxes(1, 2)
img_mat = img_mat.swapaxes(0, 1)
# Convert to float
img_mat = img_mat[:, :, :].astype('float32')
# img_mat = (img_mat - mean) * scale
img_mat = img_mat - self.mean
img_mat = img_mat * self.scale
img_mat = img_mat[np.newaxis, :, :, :]
return img_mat
def postprocess(self, image, output_data):
"""对预测结果进行后处理
Args:
image: 原始图,opencv 图片对象
output_data: Paddle预测结果原始数据
Returns:
原图和预测结果融合并做了光流优化的结果图
"""
scoremap = output_data[0, 1, :, :]
scoremap = (scoremap * 255).astype(np.uint8)
ori_h, ori_w = image.shape[0], image.shape[1]
evl_h, evl_w = self.eval_size[0], self.eval_size[1]
# 光流处理
cur_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cur_gray = cv2.resize(cur_gray, (evl_w, evl_h))
optflow_map = optflow_handle(cur_gray, scoremap, False)
optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
optflow_map = threshold_mask(optflow_map, thresh_bg=0.2, thresh_fg=0.8)
optflow_map = cv2.resize(optflow_map, (ori_w, ori_h))
optflow_map = np.repeat(optflow_map[:, :, np.newaxis], 3, axis=2)
bg_im = np.ones_like(optflow_map) * 255
comb = (optflow_map * image + (1 - optflow_map) * bg_im).astype(np.uint8)
return comb
def run_predict(self, image):
"""运行预测并返回可视化结果图
输入参数:
image: 需要预测的原始图, opencv图片对象
返回值:
可视化的预测结果图
"""
im_mat = self.preprocess(image)
im_tensor = fluid.core.PaddleTensor(im_mat.copy().astype('float32'))
output_data = self.predictor.run([im_tensor])[0]
output_data = output_data.as_ndarray()
return self.postprocess(image, output_data)
def predict_image(seg, image_path):
"""对图片文件进行分割
结果保存到`result.jpeg`文件中
"""
img_mat = cv2.imread(image_path)
img_mat = seg.run_predict(img_mat)
cv2.imwrite('result.jpeg', img_mat)
def predict_video(seg, video_path):
"""对视频文件进行分割
结果保存到`result.avi`文件中
"""
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print("Error opening video stream or file")
return
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
# 用于保存预测结果视频
out = cv2.VideoWriter('result.avi',
cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps,
(width, height))
# 开始获取视频帧
while cap.isOpened():
ret, frame = cap.read()
if ret:
img_mat = seg.run_predict(frame)
out.write(img_mat)
else:
break
cap.release()
out.release()
def predict_camera(seg):
"""从摄像头获取视频流进行预测
视频分割结果实时显示到可视化窗口中
"""
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Error opening video stream or file")
return
# Start capturing from video
while cap.isOpened():
ret, frame = cap.read()
if ret:
img_mat = seg.run_predict(frame)
cv2.imshow('HumanSegmentation', img_mat)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
break
    cap.release()
    cv2.destroyAllWindows()
def main(args):
"""预测程序入口
完成模型加载, 对视频、摄像头、图片文件等预测过程
"""
model_dir = args.model_dir
use_gpu = args.use_gpu
# 加载模型
mean = [104.008, 116.669, 122.675]
scale = [1.0, 1.0, 1.0]
eval_size = (192, 192)
seg = HumanSeg(model_dir, mean, scale, eval_size, use_gpu)
if args.use_camera:
# 开启摄像头
predict_camera(seg)
elif args.video_path:
# 使用视频文件作为输入
predict_video(seg, args.video_path)
elif args.img_path:
# 使用图片文件作为输入
predict_image(seg, args.img_path)
def parse_args():
"""解析命令行参数
"""
parser = argparse.ArgumentParser('Realtime Human Segmentation')
parser.add_argument('--model_dir',
type=str,
default='',
help='path of human segmentation model')
parser.add_argument('--img_path',
type=str,
default='',
help='path of input image')
parser.add_argument('--video_path',
type=str,
default='',
help='path of input video')
    parser.add_argument('--use_camera',
                        type=int,
                        default=0,
                        help='whether to read input video stream from camera (0/1)')
    parser.add_argument('--use_gpu',
                        type=int,
                        default=0,
                        help='whether to enable gpu (0/1)')
return parser.parse_args()
if __name__ == "__main__":
args = parse_args()
main(args)
opencv-python==4.1.2.30
opencv-contrib-python==4.2.0.32
# 遥感分割(RemoteSensing)
遥感影像分割是图像分割领域中的重要应用场景,广泛应用于土地测绘、环境监测、城市建设等领域。遥感影像分割的目标多种多样,有诸如积雪、农作物、道路、建筑、水源等地物目标,也有例如云层的空中目标。
PaddleSeg提供了针对遥感专题的语义分割库RemoteSensing,涵盖图像预处理、数据增强、模型训练、预测流程,帮助用户利用深度学习技术解决遥感影像分割问题。
## 特点
针对遥感数据多通道、分布范围大、分布不均的特点,我们支持多通道训练预测,内置一系列多通道预处理和数据增强的策略,可结合实际业务场景进行定制组合,提升模型泛化能力和鲁棒性。
**Note:** 所有命令需要在`PaddleSeg/contrib/RemoteSensing/`目录下执行。
## 前置依赖
- Paddle 1.7.1+
由于图像分割模型计算开销大,推荐在GPU版本的PaddlePaddle下使用。
PaddlePaddle的安装, 请按照[官网指引](https://paddlepaddle.org.cn/install/quick)安装合适自己的版本。
- Python 3.5+
- 其他依赖安装
通过以下命令安装python包依赖,请确保至少执行过一次以下命令:
```
cd RemoteSensing
pip install -r requirements.txt
```
## 目录结构说明
```
RemoteSensing # 根目录
|-- dataset # 数据集
|-- docs # 文档
|-- models # 模型类定义模块
|-- nets # 组网模块
|-- readers # 数据读取模块
|-- tools # 工具集
|-- transforms # 数据增强模块
|-- utils # 公用模块
|-- train_demo.py # 训练demo脚本
|-- predict_demo.py # 预测demo脚本
|-- README.md # 使用手册
```
## 数据协议
数据集包含原图、标注图及相应的文件列表文件。
参考数据文件结构如下:
```
./dataset/ # 数据集根目录
|--images # 原图目录
| |--xxx1.npy
| |--...
| └--...
|
|--annotations # 标注图目录
| |--xxx1.png
| |--...
| └--...
|
|--train_list.txt # 训练文件列表文件
|
|--val_list.txt # 验证文件列表文件
|
└--labels.txt # 标签列表
```
其中,相应的文件名可根据需要自行定义。
遥感领域图像格式多种多样,不同传感器产生的数据格式可能不同。为方便数据加载,本分割库统一采用numpy存储格式`npy`作为原图格式,采用`png`无损压缩格式作为标注图片格式。
原图的前两维是图像的尺寸,第3维是图像的通道数。
标注图像为单通道图像,像素值即为对应的类别,像素标注类别需要从0开始递增,
例如0,1,2,3表示有4种类别,标注类别最多为256类。其中可以指定特定的像素值用于表示该值的像素不参与训练和评估(默认为255)。
`train_list.txt`和`val_list.txt`文本以空格为分隔符分为两列,第一列为图像文件相对于dataset的相对路径,第二列为标注图像文件相对于dataset的相对路径。如下所示:
```
images/xxx1.npy annotations/xxx1.png
images/xxx2.npy annotations/xxx2.png
...
```
具体要求和如何生成文件列表可参考[文件列表规范](../../docs/data_prepare.md#文件列表)
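如果希望脚本化地生成文件列表,下面给出一个简单示意(假设原图与标注图同名,且分别位于`images/`与`annotations/`目录下):
```python
import os
import os.path as osp

data_dir = './dataset'
with open(osp.join(data_dir, 'train_list.txt'), 'w') as f:
    for name in sorted(os.listdir(osp.join(data_dir, 'images'))):
        if not name.endswith('.npy'):
            continue
        label_name = name[:-len('.npy')] + '.png'
        f.write('images/{} annotations/{}\n'.format(name, label_name))
```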
`labels.txt`: 每一行为一个单独的类别,相应的行号即为类别对应的id(行号从0开始),如下所示:
```
labelA
labelB
...
```
## 快速上手
本章节在一个小数据集上展示了如何通过RemoteSensing进行训练预测。
### 1. 准备数据集
为了快速体验,我们准备了一个小型demo数据集,已位于`RemoteSensing/dataset/demo/`目录下。
对于您自己的数据集,您需要按照上述的数据协议进行格式转换,可分别使用numpy和PIL库保存遥感数据和标注图片。其中numpy api示例如下:
```python
import numpy as np
# 保存遥感数据
# img类型:numpy.ndarray
np.save(save_path, img)
```
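标注图片则可使用PIL保存为`png`格式,示意如下(假设`label`为单通道、取值从0开始的`numpy.ndarray`):
```python
import numpy as np
from PIL import Image

# 保存标注图片
# label类型:numpy.ndarray,单通道,像素值为类别id
Image.fromarray(label.astype(np.uint8)).save(save_path)
```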
### 2. 训练代码开发
通过如下`train_demo.py`代码进行训练。
> 导入RemoteSensing api
```python
import transforms.transforms as T
from readers.reader import Reader
from models import UNet
```
> 定义训练和验证时的数据处理和增强流程, 在`train_transforms`中加入了`RandomVerticalFlip`,`RandomHorizontalFlip`等数据增强方式。
```python
train_transforms = T.Compose([
T.RandomVerticalFlip(0.5),
T.RandomHorizontalFlip(0.5),
T.ResizeStepScaling(0.5, 2.0, 0.25),
T.RandomPaddingCrop(256),
T.Normalize(mean=[0.5] * channel, std=[0.5] * channel),
])
eval_transforms = T.Compose([
T.Normalize(mean=[0.5] * channel, std=[0.5] * channel),
])
```
> 定义数据读取器
```python
import os
import os.path as osp
train_list = osp.join(data_dir, 'train.txt')
val_list = osp.join(data_dir, 'val.txt')
label_list = osp.join(data_dir, 'labels.txt')
train_reader = Reader(
data_dir=data_dir,
file_list=train_list,
label_list=label_list,
transforms=train_transforms,
num_workers=8,
buffer_size=16,
shuffle=True,
parallel_method='thread')
eval_reader = Reader(
data_dir=data_dir,
file_list=val_list,
label_list=label_list,
transforms=eval_transforms,
num_workers=8,
buffer_size=16,
shuffle=False,
parallel_method='thread')
```
> 模型构建
```python
model = UNet(
num_classes=2, input_channel=channel, use_bce_loss=True, use_dice_loss=True)
```
> 模型训练,并开启边训边评估
```python
model.train(
num_epochs=num_epochs,
train_reader=train_reader,
train_batch_size=train_batch_size,
eval_reader=eval_reader,
save_interval_epochs=5,
log_interval_steps=10,
save_dir=save_dir,
pretrain_weights=None,
optimizer=None,
learning_rate=lr,
use_vdl=True
)
```
### 3. 模型训练
> 设置GPU卡号
```shell script
export CUDA_VISIBLE_DEVICES=0
```
> 在RemoteSensing目录下运行`train_demo.py`即可开始训练。
```shell script
python train_demo.py --data_dir dataset/demo/ --save_dir saved_model/unet/ --channel 3 --num_epochs 20
```
### 4. 模型预测代码开发
通过如下`predict_demo.py`代码进行预测。
> 导入RemoteSensing api
```python
from models import load_model
```
> 加载训练过程中最好的模型,设置预测结果保存路径。
```python
import os
import os.path as osp
model = load_model(osp.join(save_dir, 'best_model'))
pred_dir = osp.join(save_dir, 'pred')
if not osp.exists(pred_dir):
os.mkdir(pred_dir)
```
> 使用模型对验证集进行测试,并保存预测结果。
```python
import numpy as np
from PIL import Image as Image
val_list = osp.join(data_dir, 'val.txt')
color_map = [0, 0, 0, 255, 255, 255]
with open(val_list) as f:
lines = f.readlines()
for line in lines:
img_path = line.split(' ')[0]
print('Predicting {}'.format(img_path))
img_path_ = osp.join(data_dir, img_path)
pred = model.predict(img_path_)
# 以伪彩色png图片保存预测结果
        pred_name = osp.basename(img_path).replace('.npy', '.png')
pred_path = osp.join(pred_dir, pred_name)
pred_mask = Image.fromarray(pred.astype(np.uint8), mode='P')
pred_mask.putpalette(color_map)
pred_mask.save(pred_path)
```
### 5. 模型预测
> 设置GPU卡号
```shell script
export CUDA_VISIBLE_DEVICES=0
```
> 在RemoteSensing目录下运行`predict_demo.py`即可开始预测。
```shell script
python predict_demo.py --data_dir dataset/demo/ --load_model_dir saved_model/unet/best_model/
```
## API说明
您可以使用`RemoteSensing`目录下提供的api构建自己的分割代码。
- [数据处理-transforms](docs/transforms.md)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import utils
import nets
import models
import transforms
import readers
from utils.utils import get_environ_info
env_info = get_environ_info()
log_level = 2
__background__
cloud
images/1001.npy annotations/1001.png
images/1002.npy annotations/1002.png
images/1005.npy annotations/1005.png
images/0.npy annotations/0.png
images/1003.npy annotations/1003.png
images/1000.npy annotations/1000.png
images/1004.npy annotations/1004.png
images/100.npy annotations/100.png
images/1.npy annotations/1.png
images/10.npy annotations/10.png
# transforms.transforms
对用于分割任务的数据进行操作。可以利用[Compose](#compose)类将图像预处理/增强操作进行组合。
## Compose类
```python
transforms.transforms.Compose(transforms)
```
根据数据预处理/数据增强列表对输入数据进行操作。
### 参数
* **transforms** (list): 数据预处理/数据增强列表。
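下面给出一个将本页若干操作进行组合的使用示意(具体数值仅为示例):
```python
import transforms.transforms as T

# 训练时的数据预处理/增强流程
train_transforms = T.Compose([
    T.RandomHorizontalFlip(prob=0.5),
    T.ResizeStepScaling(0.75, 1.25, 0.25),
    T.RandomPaddingCrop(crop_size=512),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```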
## RandomHorizontalFlip类
```python
transforms.transforms.RandomHorizontalFlip(prob=0.5)
```
以一定的概率对图像进行水平翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机水平翻转的概率。默认值为0.5。
## RandomVerticalFlip类
```python
transforms.transforms.RandomVerticalFlip(prob=0.1)
```
以一定的概率对图像进行垂直翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机垂直翻转的概率。默认值为0.1。
## Resize类
```python
transforms.transforms.Resize(target_size, interp='LINEAR')
```
调整图像大小(resize)。
- 当目标大小(target_size)类型为int时,根据插值方式,
将图像resize为[target_size, target_size]。
- 当目标大小(target_size)类型为list或tuple时,根据插值方式,
将图像resize为target_size, target_size的输入应为[w, h]或(w, h)。
### 参数
* **target_size** (int|list|tuple): 目标大小
* **interp** (str): resize的插值方式,与opencv的插值方式对应,
可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。
## ResizeByLong类
```python
transforms.transforms.ResizeByLong(long_size)
```
对图像长边resize到固定值,短边按比例进行缩放。
### 参数
* **long_size** (int): resize后图像的长边大小。
## ResizeRangeScaling类
```python
transforms.transforms.ResizeRangeScaling(min_value=400, max_value=600)
```
对图像长边随机resize到指定范围内,短边按比例进行缩放,模型训练时的数据增强操作。
### 参数
* **min_value** (int): 图像长边resize后的最小值。默认值400。
* **max_value** (int): 图像长边resize后的最大值。默认值600。
## ResizeStepScaling类
```python
transforms.transforms.ResizeStepScaling(min_scale_factor=0.75, max_scale_factor=1.25, scale_step_size=0.25)
```
对图像按照某一个比例resize,这个比例以scale_step_size为步长,在[min_scale_factor, max_scale_factor]随机变动,模型训练时的数据增强操作。
### 参数
* **min_scale_factor** (float): resize最小尺度。默认值0.75。
* **max_scale_factor** (float): resize最大尺度。默认值1.25。
* **scale_step_size** (float): resize尺度范围间隔。默认值0.25。
## Clip类
```python
transforms.transforms.Clip(min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0])
```
对图像上超出一定范围的数据进行裁剪。
### 参数
* **min_val** (list): 裁剪的下限,小于min_val的数值均设为min_val。默认值[0, 0, 0]。
* **max_val** (list): 裁剪的上限,大于max_val的数值均设为max_val。默认值[255.0, 255.0, 255.0]。
## Normalize类
```python
transforms.transforms.Normalize(min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0], mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
```
对图像进行标准化。
1.图像像素归一化到区间 [0.0, 1.0]。
2.对图像进行减均值除以标准差操作。
### 参数
* **min_val** (list): 图像数据集的最小值。默认值[0, 0, 0]。
* **max_val** (list): 图像数据集的最大值。默认值[255.0, 255.0, 255.0]。
* **mean** (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。
* **std** (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。
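标准化的计算过程示意如下(以单个像素和默认参数为例):
```python
import numpy as np

min_val, max_val = 0.0, 255.0
mean, std = 0.5, 0.5

pixel = np.float32(127.5)
normalized = (pixel - min_val) / (max_val - min_val)  # 归一化到[0.0, 1.0],结果为0.5
standardized = (normalized - mean) / std              # 减均值除以标准差,结果为0.0
```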
## Padding类
```python
transforms.transforms.Padding(target_size, im_padding_value=127.5, label_padding_value=255)
```
对图像或标注图像进行padding,padding方向为右和下。根据提供的值对图像或标注图像进行padding操作。
### 参数
* **target_size** (int|list|tuple): padding后图像的大小。
* **im_padding_value** (list): 图像padding的值。默认为127.5。
* **label_padding_value** (int): 标注图像padding的值。默认值为255(仅在训练时需要设定该参数)。
## RandomPaddingCrop类
```python
transforms.transforms.RandomPaddingCrop(crop_size=512, im_padding_value=127.5, label_padding_value=255)
```
对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作,模型训练时的数据增强操作。
### 参数
* **crop_size**(int|list|tuple): 裁剪图像大小。默认为512。
* **im_padding_value** (list): 图像padding的值。默认为127.5。
* **label_padding_value** (int): 标注图像padding的值。默认值为255。
## RandomBlur类
```python
transforms.transforms.RandomBlur(prob=0.1)
```
以一定的概率对图像进行高斯模糊,模型训练时的数据增强操作。
### 参数
* **prob** (float): 图像模糊概率。默认为0.1。
## RandomScaleAspect类
```python
transforms.transforms.RandomScaleAspect(min_scale=0.5, aspect_ratio=0.33)
```
裁剪并resize回原始尺寸的图像和标注图像,模型训练时的数据增强操作。
按照一定的面积比和宽高比对图像进行裁剪,并resize回原始图像大小,当存在标注图时,同步进行相同操作。
### 参数
* **min_scale** (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。
* **aspect_ratio** (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。
from .load_model import *
from .unet import *
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import paddle.fluid as fluid
import os
import numpy as np
import time
import math
import yaml
import copy
import json
import utils.logging as logging
from collections import OrderedDict
from os import path as osp
from utils.pretrain_weights import get_pretrain_weights
import transforms.transforms as T
import utils
import __init__
def dict2str(dict_input):
out = ''
for k, v in dict_input.items():
try:
v = round(float(v), 6)
except:
pass
out = out + '{}={}, '.format(k, v)
return out.strip(', ')
class BaseAPI:
def __init__(self):
        # 现有的CV模型都有这个属性,而这个属性也需要在eval时用到
self.num_classes = None
self.labels = None
if __init__.env_info['place'] == 'cpu':
self.places = fluid.cpu_places()
else:
self.places = fluid.cuda_places()
self.exe = fluid.Executor(self.places[0])
self.train_prog = None
self.test_prog = None
self.parallel_train_prog = None
self.train_inputs = None
self.test_inputs = None
self.train_outputs = None
self.test_outputs = None
self.train_data_loader = None
self.eval_metrics = None
# 若模型是从inference model加载进来的,无法调用训练接口进行训练
self.trainable = True
# 是否使用多卡间同步BatchNorm均值和方差
self.sync_bn = False
# 当前模型状态
self.status = 'Normal'
def _get_single_card_bs(self, batch_size):
if batch_size % len(self.places) == 0:
return int(batch_size // len(self.places))
else:
raise Exception("Please support correct batch_size, \
which can be divided by available cards({}) in {}".
format(__init__.env_info['num'],
__init__.env_info['place']))
def build_program(self):
# 构建训练网络
self.train_inputs, self.train_outputs = self.build_net(mode='train')
self.train_prog = fluid.default_main_program()
startup_prog = fluid.default_startup_program()
# 构建预测网络
self.test_prog = fluid.Program()
with fluid.program_guard(self.test_prog, startup_prog):
with fluid.unique_name.guard():
self.test_inputs, self.test_outputs = self.build_net(
mode='test')
self.test_prog = self.test_prog.clone(for_test=True)
def arrange_transforms(self, transforms, mode='train'):
# 给transforms添加arrange操作
if transforms.transforms[-1].__class__.__name__.startswith('Arrange'):
transforms.transforms[-1] = T.ArrangeSegmenter(mode=mode)
else:
transforms.transforms.append(T.ArrangeSegmenter(mode=mode))
def build_train_data_loader(self, reader, batch_size):
# 初始化data_loader
if self.train_data_loader is None:
self.train_data_loader = fluid.io.DataLoader.from_generator(
feed_list=list(self.train_inputs.values()),
capacity=64,
use_double_buffer=True,
iterable=True)
batch_size_each_gpu = self._get_single_card_bs(batch_size)
        generator = reader.generator(
            batch_size=batch_size_each_gpu, drop_last=True)
        self.train_data_loader.set_sample_list_generator(
            generator, places=self.places)
def net_initialize(self,
startup_prog=None,
pretrain_weights=None,
fuse_bn=False,
save_dir='.',
sensitivities_file=None,
eval_metric_loss=0.05):
if hasattr(self, 'backbone'):
backbone = self.backbone
else:
backbone = self.__class__.__name__
pretrain_weights = get_pretrain_weights(pretrain_weights, backbone,
save_dir)
if startup_prog is None:
startup_prog = fluid.default_startup_program()
self.exe.run(startup_prog)
if pretrain_weights is not None:
logging.info(
"Load pretrain weights from {}.".format(pretrain_weights))
utils.utils.load_pretrain_weights(self.exe, self.train_prog,
pretrain_weights, fuse_bn)
# 进行裁剪
if sensitivities_file is not None:
from .slim.prune_config import get_sensitivities
sensitivities_file = get_sensitivities(sensitivities_file, self,
save_dir)
from .slim.prune import get_params_ratios, prune_program
prune_params_ratios = get_params_ratios(
sensitivities_file, eval_metric_loss=eval_metric_loss)
prune_program(self, prune_params_ratios)
self.status = 'Prune'
def get_model_info(self):
info = dict()
info['Model'] = self.__class__.__name__
info['_Attributes'] = {}
if 'self' in self.init_params:
del self.init_params['self']
if '__class__' in self.init_params:
del self.init_params['__class__']
info['_init_params'] = self.init_params
info['_Attributes']['num_classes'] = self.num_classes
info['_Attributes']['labels'] = self.labels
try:
primary_metric_key = list(self.eval_metrics.keys())[0]
primary_metric_value = float(self.eval_metrics[primary_metric_key])
info['_Attributes']['eval_metrics'] = {
primary_metric_key: primary_metric_value
}
except:
pass
if hasattr(self, 'test_transforms'):
if self.test_transforms is not None:
info['Transforms'] = list()
for op in self.test_transforms.transforms:
name = op.__class__.__name__
attr = op.__dict__
info['Transforms'].append({name: attr})
return info
def save_model(self, save_dir):
if not osp.isdir(save_dir):
if osp.exists(save_dir):
os.remove(save_dir)
os.makedirs(save_dir)
fluid.save(self.train_prog, osp.join(save_dir, 'model'))
model_info = self.get_model_info()
model_info['status'] = self.status
with open(
osp.join(save_dir, 'model.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(model_info, f)
# 评估结果保存
if hasattr(self, 'eval_details'):
with open(osp.join(save_dir, 'eval_details.json'), 'w') as f:
json.dump(self.eval_details, f)
if self.status == 'Prune':
# 保存裁剪的shape
shapes = {}
for block in self.train_prog.blocks:
for param in block.all_parameters():
pd_var = fluid.global_scope().find_var(param.name)
pd_param = pd_var.get_tensor()
shapes[param.name] = np.array(pd_param).shape
with open(
osp.join(save_dir, 'prune.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(shapes, f)
# 模型保存成功的标志
open(osp.join(save_dir, '.success'), 'w').close()
logging.info("Model saved in {}.".format(save_dir))
def train_loop(self,
num_epochs,
train_reader,
train_batch_size,
eval_reader=None,
eval_best_metric=None,
save_interval_epochs=1,
log_interval_steps=10,
save_dir='output',
use_vdl=False):
if not osp.isdir(save_dir):
if osp.exists(save_dir):
os.remove(save_dir)
os.makedirs(save_dir)
if use_vdl:
from visualdl import LogWriter
vdl_logdir = osp.join(save_dir, 'vdl_log')
# 给transform添加arrange操作
self.arrange_transforms(
transforms=train_reader.transforms, mode='train')
# 构建train_data_loader
self.build_train_data_loader(
reader=train_reader, batch_size=train_batch_size)
if eval_reader is not None:
self.eval_transforms = eval_reader.transforms
self.test_transforms = copy.deepcopy(eval_reader.transforms)
# 获取实时变化的learning rate
lr = self.optimizer._learning_rate
if isinstance(lr, fluid.framework.Variable):
self.train_outputs['lr'] = lr
# 在多卡上跑训练
if self.parallel_train_prog is None:
build_strategy = fluid.compiler.BuildStrategy()
build_strategy.fuse_all_optimizer_ops = False
if __init__.env_info['place'] != 'cpu' and len(self.places) > 1:
build_strategy.sync_batch_norm = self.sync_bn
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 1
self.parallel_train_prog = fluid.CompiledProgram(
self.train_prog).with_data_parallel(
loss_name=self.train_outputs['loss'].name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
total_num_steps = math.floor(
train_reader.num_samples / train_batch_size)
num_steps = 0
time_stat = list()
if use_vdl:
# VisualDL component
log_writer = LogWriter(vdl_logdir)
best_accuracy = -1.0
best_model_epoch = 1
for i in range(num_epochs):
records = list()
step_start_time = time.time()
for step, data in enumerate(self.train_data_loader()):
outputs = self.exe.run(
self.parallel_train_prog,
feed=data,
fetch_list=list(self.train_outputs.values()))
outputs_avg = np.mean(np.array(outputs), axis=1)
records.append(outputs_avg)
# 训练完成剩余时间预估
current_time = time.time()
step_cost_time = current_time - step_start_time
step_start_time = current_time
if len(time_stat) < 20:
time_stat.append(step_cost_time)
else:
time_stat[num_steps % 20] = step_cost_time
eta = ((num_epochs - i) * total_num_steps - step -
1) * np.mean(time_stat)
eta_h = math.floor(eta / 3600)
eta_m = math.floor((eta - eta_h * 3600) / 60)
eta_s = int(eta - eta_h * 3600 - eta_m * 60)
eta_str = "{}:{}:{}".format(eta_h, eta_m, eta_s)
# 每间隔log_interval_steps,输出loss信息
num_steps += 1
if num_steps % log_interval_steps == 0:
step_metrics = OrderedDict(
zip(list(self.train_outputs.keys()), outputs_avg))
if use_vdl:
for k, v in step_metrics.items():
log_writer.add_scalar(
tag="Training: {}".format(k),
value=v,
step=num_steps)
logging.info(
"[TRAIN] Epoch={}/{}, Step={}/{}, {}, eta={}".format(
i + 1, num_epochs, step + 1, total_num_steps,
dict2str(step_metrics), eta_str))
train_metrics = OrderedDict(
zip(list(self.train_outputs.keys()), np.mean(records, axis=0)))
logging.info('[TRAIN] Epoch {} finished, {} .'.format(
i + 1, dict2str(train_metrics)))
# 每间隔save_interval_epochs, 在验证集上评估和对模型进行保存
if (i + 1) % save_interval_epochs == 0 or i == num_epochs - 1:
current_save_dir = osp.join(save_dir, "epoch_{}".format(i + 1))
if not osp.isdir(current_save_dir):
os.makedirs(current_save_dir)
if eval_reader is not None:
                    # 目前仅支持单卡评估;验证数据batch大小与训练数据batch大小保持一致
eval_batch_size = train_batch_size
self.eval_metrics, self.eval_details = self.evaluate(
eval_reader=eval_reader,
batch_size=eval_batch_size,
verbose=True,
epoch_id=i + 1,
return_details=True)
logging.info('[EVAL] Finished, Epoch={}, {} .'.format(
i + 1, dict2str(self.eval_metrics)))
# 保存最优模型
current_metric = self.eval_metrics[eval_best_metric]
if current_metric > best_accuracy:
best_accuracy = current_metric
best_model_epoch = i + 1
best_model_dir = osp.join(save_dir, "best_model")
self.save_model(save_dir=best_model_dir)
if use_vdl:
for k, v in self.eval_metrics.items():
if isinstance(v, list):
continue
if isinstance(v, np.ndarray):
if v.size > 1:
continue
log_writer.add_scalar(
tag="Evaluation: {}".format(k),
step=i + 1,
value=v)
self.save_model(save_dir=current_save_dir)
logging.info(
'Current evaluated best model in eval_reader is epoch_{}, {}={}'
.format(best_model_epoch, eval_best_metric, best_accuracy))
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import yaml
import os.path as osp
import six
import copy
from collections import OrderedDict
import paddle.fluid as fluid
from paddle.fluid.framework import Parameter
from utils import logging
import models
def load_model(model_dir):
if not osp.exists(osp.join(model_dir, "model.yml")):
raise Exception("There's not model.yml in {}".format(model_dir))
with open(osp.join(model_dir, "model.yml")) as f:
info = yaml.load(f.read(), Loader=yaml.Loader)
status = info['status']
if not hasattr(models, info['Model']):
raise Exception("There's no attribute {} in models".format(
info['Model']))
model = getattr(models, info['Model'])(**info['_init_params'])
if status == "Normal" or \
status == "Prune":
startup_prog = fluid.Program()
model.test_prog = fluid.Program()
with fluid.program_guard(model.test_prog, startup_prog):
with fluid.unique_name.guard():
model.test_inputs, model.test_outputs = model.build_net(
mode='test')
model.test_prog = model.test_prog.clone(for_test=True)
model.exe.run(startup_prog)
if status == "Prune":
from .slim.prune import update_program
model.test_prog = update_program(model.test_prog, model_dir,
model.places[0])
import pickle
with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
load_dict = pickle.load(f)
fluid.io.set_program_state(model.test_prog, load_dict)
elif status == "Infer" or \
status == "Quant":
[prog, input_names, outputs] = fluid.io.load_inference_model(
model_dir, model.exe, params_filename='__params__')
model.test_prog = prog
test_outputs_info = info['_ModelInputsOutputs']['test_outputs']
model.test_inputs = OrderedDict()
model.test_outputs = OrderedDict()
for name in input_names:
model.test_inputs[name] = model.test_prog.global_block().var(name)
for i, out in enumerate(outputs):
var_desc = test_outputs_info[i]
model.test_outputs[var_desc[0]] = out
if 'Transforms' in info:
model.test_transforms = build_transforms(info['Transforms'])
model.eval_transforms = copy.deepcopy(model.test_transforms)
if '_Attributes' in info:
for k, v in info['_Attributes'].items():
if k in model.__dict__:
model.__dict__[k] = v
logging.info("Model[{}] loaded.".format(info['Model']))
return model
def build_transforms(transforms_info):
from transforms import transforms as T
transforms = list()
for op_info in transforms_info:
op_name = list(op_info.keys())[0]
op_attr = op_info[op_name]
if not hasattr(T, op_name):
raise Exception(
"There's no operator named '{}' in transforms".format(op_name))
transforms.append(getattr(T, op_name)(**op_attr))
eval_transforms = T.Compose(transforms)
return eval_transforms
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import os.path as osp
import numpy as np
import math
import cv2
import paddle.fluid as fluid
import utils.logging as logging
from collections import OrderedDict
from .base import BaseAPI
from utils.metrics import ConfusionMatrix
import nets
class UNet(BaseAPI):
"""实现UNet网络的构建并进行训练、评估、预测和模型导出。
Args:
num_classes (int): 类别数。
        upsample_mode (str): UNet decode时采用的上采样方式,取值为'bilinear'时利用双线性插值进行上采样,
            当输入其他选项时则利用反卷积进行上采样,默认为'bilinear'。
        use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
        use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
            当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
        class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
            num_classes。当class_weight为str时,class_weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
            自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重为1,
            即平时使用的交叉熵损失函数。
ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。默认255。
Raises:
        ValueError: use_bce_loss或use_dice_loss为真且num_classes > 2。
        ValueError: class_weight为list, 但长度不等于num_classes。
            class_weight为str, 但class_weight.lower()不等于dynamic。
TypeError: class_weight不为None时,其类型不是list或str。
"""
def __init__(self,
num_classes=2,
upsample_mode='bilinear',
input_channel=3,
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
self.init_params = locals()
super(UNet, self).__init__()
        # dice_loss或bce_loss只适用于两类分割
        if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
                "dice loss and bce loss are only applicable to binary classification"
            )
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.upsample_mode = upsample_mode
self.input_channel = input_channel
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
self.labels = None
# 若模型是从inference model加载进来的,无法调用训练接口进行训练
self.trainable = True
def build_net(self, mode='train'):
model = nets.UNet(
self.num_classes,
mode=mode,
upsample_mode=self.upsample_mode,
input_channel=self.input_channel,
use_bce_loss=self.use_bce_loss,
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
if mode == 'train':
self.optimizer.minimize(model_out)
outputs['loss'] = model_out
elif mode == 'eval':
outputs['loss'] = model_out[0]
outputs['pred'] = model_out[1]
outputs['label'] = model_out[2]
outputs['mask'] = model_out[3]
else:
outputs['pred'] = model_out[0]
outputs['logit'] = model_out[1]
return inputs, outputs
def default_optimizer(self,
learning_rate,
num_epochs,
num_steps_each_epoch,
lr_decay_power=0.9):
decay_step = num_epochs * num_steps_each_epoch
lr_decay = fluid.layers.polynomial_decay(
learning_rate,
decay_step,
end_learning_rate=0,
power=lr_decay_power)
optimizer = fluid.optimizer.Momentum(
lr_decay,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=4e-05))
return optimizer
def train(self,
num_epochs,
train_reader,
train_batch_size=2,
eval_reader=None,
eval_best_metric='kappa',
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrain_weights='COCO',
optimizer=None,
learning_rate=0.01,
lr_decay_power=0.9,
use_vdl=False,
sensitivities_file=None,
eval_metric_loss=0.05):
"""训练。
Args:
num_epochs (int): 训练迭代轮数。
train_reader (readers): 训练数据读取器。
train_batch_size (int): 训练数据batch大小。同时作为验证数据batch大小。默认2。
eval_reader (readers): 边训边评估的评估数据读取器。
eval_best_metric (str): 边训边评估保存最好模型的指标。默认为'kappa'。
save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
save_dir (str): 模型保存路径。默认'output'。
pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'COCO',
则自动下载在COCO图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为'COCO'。
            optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认的优化器:使用
fluid.optimizer.Momentum优化方法,polynomial的学习率衰减策略。
learning_rate (float): 默认优化器的初始学习率。默认0.01。
lr_decay_power (float): 默认优化器学习率多项式衰减系数。默认0.9。
use_vdl (bool): 是否使用VisualDL进行可视化。默认False。
sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
Raises:
ValueError: 模型从inference model进行加载。
"""
if not self.trainable:
raise ValueError(
"Model is not trainable since it was loaded from a inference model."
)
self.labels = train_reader.labels
if optimizer is None:
num_steps_each_epoch = train_reader.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate=learning_rate,
num_epochs=num_epochs,
num_steps_each_epoch=num_steps_each_epoch,
lr_decay_power=lr_decay_power)
self.optimizer = optimizer
# 构建训练、验证、预测网络
self.build_program()
# 初始化网络权重
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrain_weights=pretrain_weights,
save_dir=save_dir,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss)
# 训练
self.train_loop(
num_epochs=num_epochs,
train_reader=train_reader,
train_batch_size=train_batch_size,
eval_reader=eval_reader,
eval_best_metric=eval_best_metric,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
use_vdl=use_vdl)
def evaluate(self,
eval_reader,
batch_size=1,
verbose=True,
epoch_id=None,
return_details=False):
"""评估。
Args:
eval_reader (readers): 评估数据读取器。
batch_size (int): 评估时的batch大小。默认1。
verbose (bool): 是否打印日志。默认True。
epoch_id (int): 当前评估模型所在的训练轮数。
return_details (bool): 是否返回详细信息。默认False。
Returns:
dict: 当return_details为False时,返回dict。包含关键字:'miou'、'category_iou'、'macc'、
'category_acc'和'kappa',分别表示平均iou、各类别iou、平均准确率、各类别准确率和kappa系数。
tuple (metrics, eval_details):当return_details为True时,增加返回dict (eval_details),
包含关键字:'confusion_matrix',表示评估的混淆矩阵。
"""
self.arrange_transforms(transforms=eval_reader.transforms, mode='eval')
total_steps = math.ceil(eval_reader.num_samples * 1.0 / batch_size)
conf_mat = ConfusionMatrix(self.num_classes, streaming=True)
data_generator = eval_reader.generator(
batch_size=batch_size, drop_last=False)
if not hasattr(self, 'parallel_test_prog'):
self.parallel_test_prog = fluid.CompiledProgram(
self.test_prog).with_data_parallel(
share_vars_from=self.parallel_train_prog)
batch_size_each_gpu = self._get_single_card_bs(batch_size)
for step, data in enumerate(data_generator()):
images = np.array([d[0] for d in data])
images = images.astype(np.float32)
labels = np.array([d[1] for d in data])
num_samples = images.shape[0]
if num_samples < batch_size:
num_pad_samples = batch_size - num_samples
pad_images = np.tile(images[0:1], (num_pad_samples, 1, 1, 1))
images = np.concatenate([images, pad_images])
feed_data = {'image': images}
outputs = self.exe.run(
self.parallel_test_prog,
feed=feed_data,
fetch_list=list(self.test_outputs.values()),
return_numpy=True)
pred = outputs[0]
if num_samples < batch_size:
pred = pred[0:num_samples]
mask = labels != self.ignore_index
conf_mat.calculate(pred=pred, label=labels, ignore=mask)
_, iou = conf_mat.mean_iou()
if verbose:
logging.info("[EVAL] Epoch={}, Step={}/{}, iou={}".format(
epoch_id, step + 1, total_steps, iou))
category_iou, miou = conf_mat.mean_iou()
category_acc, macc = conf_mat.accuracy()
metrics = OrderedDict(
zip(['miou', 'category_iou', 'macc', 'category_acc', 'kappa'],
[miou, category_iou, macc, category_acc,
conf_mat.kappa()]))
if return_details:
eval_details = {
'confusion_matrix': conf_mat.confusion_matrix.tolist()
}
return metrics, eval_details
return metrics
def predict(self, im_file, transforms=None):
"""预测。
Args:
            im_file (str): 预测图像路径。
transforms(transforms): 数据预处理操作。
Returns:
np.ndarray: 预测结果灰度图。
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
if transforms is not None:
self.arrange_transforms(transforms=transforms, mode='test')
im, im_info = transforms(im_file)
else:
self.arrange_transforms(
transforms=self.test_transforms, mode='test')
im, im_info = self.test_transforms(im_file)
im = im.astype(np.float32)
im = np.expand_dims(im, axis=0)
result = self.exe.run(
self.test_prog,
feed={'image': im},
fetch_list=list(self.test_outputs.values()))
pred = result[0]
pred = np.squeeze(pred).astype(np.uint8)
keys = list(im_info.keys())
for k in keys[::-1]:
if k == 'shape_before_resize':
h, w = im_info[k][0], im_info[k][1]
pred = cv2.resize(pred, (w, h), cv2.INTER_NEAREST)
elif k == 'shape_before_padding':
h, w = im_info[k][0], im_info[k][1]
pred = pred[0:h, 0:w]
return pred
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import contextlib
bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
name_scope = ""
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
def max_pool(input, kernel, stride, padding):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='max',
pool_stride=stride,
pool_padding=padding)
return data
def avg_pool(input, kernel, stride, padding=0):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='avg',
pool_stride=stride,
pool_padding=padding)
return data
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape
if C % G != 0:
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
break
    assert C % G == 0, "channel number must be divisible by group number"
x = fluid.layers.group_norm(
input,
groups=G,
param_attr=param_attr,
bias_attr=bias_attr,
name=name_scope + 'group_norm')
return x
def bn(*args,
norm_type='bn',
eps=1e-5,
bn_momentum=0.99,
group_norm=32,
**kargs):
if norm_type == 'bn':
with scope('BatchNorm'):
return fluid.layers.batch_norm(
*args,
epsilon=eps,
momentum=bn_momentum,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
elif norm_type == 'gn':
with scope('GroupNorm'):
return group_norm(
args[0],
group_norm,
eps=eps,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer))
else:
raise Exception("Unsupport norm type:" + norm_type)
def bn_relu(data, norm_type='bn', eps=1e-5):
return fluid.layers.relu(bn(data, norm_type=norm_type, eps=eps))
def relu(data):
return fluid.layers.relu(data)
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = fluid.ParamAttr(
name=name_scope + 'biases',
regularizer=None,
initializer=fluid.initializer.ConstantInitializer(value=0.0))
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d(*args, **kargs)
def deconv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d_transpose(*args, **kargs)
def separate_conv(input,
channel,
stride,
filter,
dilation=1,
act=None,
eps=1e-5):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter // 2) * dilation,
dilation=dilation,
use_cudnn=False,
param_attr=param_attr)
input = bn(input, eps=eps)
if act: input = act(input)
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
input = bn(input, eps=eps)
if act: input = act(input)
return input
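# separate_conv above implements a depthwise-separable convolution: a depthwise
# stage (groups equal to the number of input channels) followed by a 1x1
# pointwise stage, each followed by normalization and an optional activation.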
def conv_bn_layer(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=fluid.ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def sigmoid_to_softmax(input):
"""
    Convert a single-channel sigmoid output into a two-channel softmax-like output.
"""
logit = fluid.layers.sigmoid(input)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=1)
return logit
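# Shape sketch: for a binary-segmentation logit of shape (N, 1, H, W),
#   probs = sigmoid_to_softmax(logit)  # -> (N, 2, H, W)
# channel 0 holds 1 - sigmoid(logit) (background) and channel 1 holds
# sigmoid(logit) (foreground), so argmax over the channel axis gives class ids.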
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
import numpy as np
def softmax_with_loss(logit,
label,
ignore_mask=None,
num_classes=2,
weight=None,
ignore_index=255):
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
if weight is None:
loss, probs = fluid.layers.softmax_with_cross_entropy(
logit, label, ignore_index=ignore_index, return_softmax=True)
else:
label_one_hot = fluid.one_hot(input=label, depth=num_classes)
if isinstance(weight, list):
assert len(
weight
) == num_classes, "weight length must equal num of classes"
weight = fluid.layers.assign(np.array([weight], dtype='float32'))
elif isinstance(weight, str):
assert weight.lower(
) == 'dynamic', 'if weight is string, must be dynamic!'
tmp = []
total_num = fluid.layers.cast(
fluid.layers.shape(label)[0], 'float32')
for i in range(num_classes):
cls_pixel_num = fluid.layers.reduce_sum(label_one_hot[:, i])
ratio = total_num / (cls_pixel_num + 1)
tmp.append(ratio)
weight = fluid.layers.concat(tmp)
weight = weight / fluid.layers.reduce_sum(weight) * num_classes
elif isinstance(weight, fluid.layers.Variable):
pass
else:
raise ValueError(
'Expect weight is a list, string or Variable, but receive {}'.
format(type(weight)))
weight = fluid.layers.reshape(weight, [1, num_classes])
weighted_label_one_hot = fluid.layers.elementwise_mul(
label_one_hot, weight)
probs = fluid.layers.softmax(logit)
loss = fluid.layers.cross_entropy(
probs,
weighted_label_one_hot,
soft_label=True,
ignore_index=ignore_index)
weighted_label_one_hot.stop_gradient = True
loss = loss * ignore_mask
avg_loss = fluid.layers.mean(loss) / (
fluid.layers.mean(ignore_mask) + 0.00001)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return avg_loss
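# Note on weight='dynamic' above: each class weight starts as
# total_pixels / (class_pixels + 1), and the weights are then rescaled to sum
# to num_classes, keeping the loss magnitude comparable to the unweighted case.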
# TODO: decide how to apply ignore_index and ignore_mask here.
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception(
"dice loss is only applicable to one channel classfication")
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
label = fluid.layers.transpose(label, [0, 2, 3, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit = logit * ignore_mask
label = label * ignore_mask
reduce_dim = list(range(1, len(logit.shape)))
inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
dice_denominator = fluid.layers.reduce_sum(
logit, dim=reduce_dim) + fluid.layers.reduce_sum(
label, dim=reduce_dim)
dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return fluid.layers.reduce_mean(dice_score)
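# For reference, the value returned above is the batch mean of
#   1 - 2 * sum(P * Y) / (sum(P) + sum(Y) + epsilon)
# with P = sigmoid(logit) and Y the label, both zeroed outside ignore_mask.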
def bce_loss(logit, label, ignore_mask=None, ignore_index=255):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("bce loss is only applicable to binary classfication")
label = fluid.layers.cast(label, 'float32')
loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=logit, label=label, ignore_index=ignore_index,
        normalize=True)  # normalize the loss by the number of valid (non-ignored) pixels
loss = fluid.layers.reduce_sum(loss)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return loss
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import paddle.fluid as fluid
from .libs import scope, name_scope
from .libs import bn, bn_relu, relu
from .libs import conv, max_pool, deconv
from .libs import sigmoid_to_softmax
from .loss import softmax_with_loss
from .loss import dice_loss
from .loss import bce_loss
class UNet(object):
"""实现Unet模型
`"U-Net: Convolutional Networks for Biomedical Image Segmentation"
<https://arxiv.org/abs/1505.04597>`
Args:
num_classes (int): 类别数
mode (str): 网络运行模式,根据mode构建网络的输入和返回。
当mode为'train'时,输入为image(-1, 3, -1, -1)和label (-1, 1, -1, -1) 返回loss。
当mode为'train'时,输入为image (-1, 3, -1, -1)和label (-1, 1, -1, -1),返回loss,
pred (与网络输入label 相同大小的预测结果,值代表相应的类别),label,mask(非忽略值的mask,
与label相同大小,bool类型)。
当mode为'test'时,输入为image(-1, 3, -1, -1)返回pred (-1, 1, -1, -1)和
logit (-1, num_classes, -1, -1) 通道维上代表每一类的概率值。
upsample_mode (str): UNet decode时采用的上采样方式,取值为'bilinear'时利用双线行差值进行上菜样,
当输入其他选项时则利用反卷积进行上菜样,默认为'bilinear'。
use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。
use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。
class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
即平时使用的交叉熵损失函数。
ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。
Raises:
ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
ValueError: class_weight为list, 但长度不等于num_class。
class_weight为str, 但class_weight.low()不等于dynamic。
TypeError: class_weight不为None时,其类型不是list或str。
"""
def __init__(self,
num_classes,
mode='train',
upsample_mode='bilinear',
input_channel=3,
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
        # dice_loss and bce_loss are only applicable to binary segmentation
if num_classes > 2 and (use_bce_loss or use_dice_loss):
raise Exception(
"dice loss and bce loss is only applicable to binary classfication"
)
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.mode = mode
self.upsample_mode = upsample_mode
self.input_channel = input_channel
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
def _double_conv(self, data, out_ch):
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope("conv0"):
data = bn_relu(
conv(
data, out_ch, 3, stride=1, padding=1,
param_attr=param_attr))
with scope("conv1"):
data = bn_relu(
conv(
data, out_ch, 3, stride=1, padding=1,
param_attr=param_attr))
return data
def _down(self, data, out_ch):
        # downsampling: max_pool followed by two convolutions
with scope("down"):
data = max_pool(data, 2, 2, 0)
data = self._double_conv(data, out_ch)
return data
def _up(self, data, short_cut, out_ch):
        # upsampling: upsample data (resize or deconv), then concat with short_cut
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.XavierInitializer(),
)
with scope("up"):
if self.upsample_mode == 'bilinear':
short_cut_shape = fluid.layers.shape(short_cut)
data = fluid.layers.resize_bilinear(data, short_cut_shape[2:])
else:
data = deconv(
data,
out_ch // 2,
filter_size=2,
stride=2,
padding=0,
param_attr=param_attr)
data = fluid.layers.concat([data, short_cut], axis=1)
data = self._double_conv(data, out_ch)
return data
def _encode(self, data):
        # encoder
short_cuts = []
with scope("encode"):
with scope("block1"):
data = self._double_conv(data, 64)
short_cuts.append(data)
with scope("block2"):
data = self._down(data, 128)
short_cuts.append(data)
with scope("block3"):
data = self._down(data, 256)
short_cuts.append(data)
with scope("block4"):
data = self._down(data, 512)
short_cuts.append(data)
with scope("block5"):
data = self._down(data, 512)
return data, short_cuts
def _decode(self, data, short_cuts):
        # decoder, symmetric to the encoder
with scope("decode"):
with scope("decode1"):
data = self._up(data, short_cuts[3], 256)
with scope("decode2"):
data = self._up(data, short_cuts[2], 128)
with scope("decode3"):
data = self._up(data, short_cuts[1], 64)
with scope("decode4"):
data = self._up(data, short_cuts[0], 64)
return data
def _get_logit(self, data, num_classes):
        # final conv layer whose output channels equal the number of classes
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope("logit"):
data = conv(
data,
num_classes,
3,
stride=1,
padding=1,
param_attr=param_attr)
return data
def _get_loss(self, logit, label, mask):
avg_loss = 0
if not (self.use_dice_loss or self.use_bce_loss):
avg_loss += softmax_with_loss(
logit,
label,
mask,
num_classes=self.num_classes,
weight=self.class_weight,
ignore_index=self.ignore_index)
else:
if self.use_dice_loss:
avg_loss += dice_loss(logit, label, mask)
if self.use_bce_loss:
avg_loss += bce_loss(
logit, label, mask, ignore_index=self.ignore_index)
return avg_loss
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32',
shape=[None, self.input_channel, None, None],
name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
elif self.mode == 'eval':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
return inputs
def build_net(self, inputs):
        # for binary segmentation with dice_loss or bce_loss, the final logit has a single channel
if self.use_dice_loss or self.use_bce_loss:
self.num_classes = 1
image = inputs['image']
encode_data, short_cuts = self._encode(image)
decode_data = self._decode(encode_data, short_cuts)
logit = self._get_logit(decode_data, self.num_classes)
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if self.mode == 'train':
label = inputs['label']
mask = label != self.ignore_index
return self._get_loss(logit, label, mask)
elif self.mode == 'eval':
label = inputs['label']
mask = label != self.ignore_index
loss = self._get_loss(logit, label, mask)
return loss, pred, label, mask
else:
if self.num_classes == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = fluid.layers.softmax(logit, axis=1)
return pred, logit
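# Minimal construction sketch (illustrative assumption, not part of this file):
#   net = UNet(num_classes=2, mode='train')
#   inputs = net.generate_inputs()   # OrderedDict holding 'image' and 'label'
#   loss = net.build_net(inputs)     # scalar training loss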
import os
import os.path as osp
import numpy as np
from PIL import Image as Image
import argparse
from models import load_model
def parse_args():
parser = argparse.ArgumentParser(description='RemoteSensing predict')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='dataset directory',
default=None,
type=str)
parser.add_argument(
'--load_model_dir',
dest='load_model_dir',
help='model load directory',
default=None,
type=str)
return parser.parse_args()
args = parse_args()
data_dir = args.data_dir
load_model_dir = args.load_model_dir
# predict
model = load_model(load_model_dir)
pred_dir = osp.join(load_model_dir, 'predict')
if not osp.exists(pred_dir):
os.mkdir(pred_dir)
val_list = osp.join(data_dir, 'val.txt')
color_map = [0, 0, 0, 255, 255, 255]
with open(val_list) as f:
lines = f.readlines()
for line in lines:
img_path = line.split(' ')[0]
print('Predicting {}'.format(img_path))
img_path_ = osp.join(data_dir, img_path)
pred = model.predict(img_path_)
        # save the prediction as a pseudo-color PNG image
        pred_name = osp.splitext(osp.basename(img_path))[0] + '.png'
pred_path = osp.join(pred_dir, pred_name)
pred_mask = Image.fromarray(pred.astype(np.uint8), mode='P')
pred_mask.putpalette(color_map)
pred_mask.save(pred_path)
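# Example invocation (assuming this script is saved as predict.py; paths are
# placeholders):
#   python predict.py --data_dir dataset/remote_sensing --load_model_dir saved_model/best_model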
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .reader import Reader
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from threading import Thread
import multiprocessing
import collections
import numpy as np
import six
import sys
import copy
import random
import platform
import chardet
from utils import logging
class EndSignal():
pass
def is_pic(img_name):
valid_suffix = ['JPEG', 'jpeg', 'JPG', 'jpg', 'BMP', 'bmp', 'PNG', 'png']
suffix = img_name.split('.')[-1]
if suffix not in valid_suffix:
return False
return True
def is_valid(sample):
if sample is None:
return False
if isinstance(sample, tuple):
for s in sample:
if s is None:
return False
elif isinstance(s, np.ndarray) and s.size == 0:
return False
            elif isinstance(s, collections.abc.Sequence) and len(s) == 0:
return False
return True
def get_encoding(path):
    # read the raw bytes and let chardet guess the file encoding
    with open(path, 'rb') as f:
        data = f.read()
    return chardet.detect(data).get('encoding')
def multithread_reader(mapper,
reader,
num_workers=4,
buffer_size=1024,
batch_size=8,
drop_last=True):
from queue import Queue
end = EndSignal()
# define a worker to read samples from reader to in_queue
def read_worker(reader, in_queue):
for i in reader():
in_queue.put(i)
in_queue.put(end)
# define a worker to handle samples from in_queue by mapper
# and put mapped samples into out_queue
def handle_worker(in_queue, out_queue, mapper):
sample = in_queue.get()
while not isinstance(sample, EndSignal):
if len(sample) == 2:
r = mapper(sample[0], sample[1])
elif len(sample) == 3:
r = mapper(sample[0], sample[1], sample[2])
else:
raise Exception('The sample\'s length must be 2 or 3.')
if is_valid(r):
out_queue.put(r)
sample = in_queue.get()
in_queue.put(end)
out_queue.put(end)
def xreader():
in_queue = Queue(buffer_size)
out_queue = Queue(buffer_size)
# start a read worker in a thread
target = read_worker
t = Thread(target=target, args=(reader, in_queue))
t.daemon = True
t.start()
# start several handle_workers
target = handle_worker
args = (in_queue, out_queue, mapper)
workers = []
for i in range(num_workers):
worker = Thread(target=target, args=args)
worker.daemon = True
workers.append(worker)
for w in workers:
w.start()
batch_data = []
sample = out_queue.get()
while not isinstance(sample, EndSignal):
batch_data.append(sample)
if len(batch_data) == batch_size:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
sample = out_queue.get()
finish = 1
while finish < num_workers:
sample = out_queue.get()
if isinstance(sample, EndSignal):
finish += 1
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
if not drop_last and len(batch_data) != 0:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
return xreader
def multiprocess_reader(mapper,
reader,
num_workers=4,
buffer_size=1024,
batch_size=8,
drop_last=True):
from .shared_queue import SharedQueue as Queue
def _read_into_queue(samples, mapper, queue):
end = EndSignal()
try:
for sample in samples:
if sample is None:
raise ValueError("sample has None")
if len(sample) == 2:
result = mapper(sample[0], sample[1])
elif len(sample) == 3:
result = mapper(sample[0], sample[1], sample[2])
else:
raise Exception('The sample\'s length must be 2 or 3.')
if is_valid(result):
queue.put(result)
queue.put(end)
except:
queue.put("")
six.reraise(*sys.exc_info())
def queue_reader():
queue = Queue(buffer_size, memsize=3 * 1024**3)
total_samples = [[] for i in range(num_workers)]
for i, sample in enumerate(reader()):
index = i % num_workers
total_samples[index].append(sample)
for i in range(num_workers):
p = multiprocessing.Process(
target=_read_into_queue, args=(total_samples[i], mapper, queue))
p.start()
finish_num = 0
batch_data = list()
while finish_num < num_workers:
sample = queue.get()
if isinstance(sample, EndSignal):
finish_num += 1
elif sample == "":
raise ValueError("multiprocess reader raises an exception")
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
if len(batch_data) != 0 and not drop_last:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
return queue_reader
def GenerateMiniBatch(batch_data):
if len(batch_data) == 1:
return batch_data
width = [data[0].shape[2] for data in batch_data]
height = [data[0].shape[1] for data in batch_data]
if len(set(width)) == 1 and len(set(height)) == 1:
return batch_data
max_shape = np.array([data[0].shape for data in batch_data]).max(axis=0)
padding_batch = []
for data in batch_data:
im_c, im_h, im_w = data[0].shape[:]
padding_im = np.zeros((im_c, max_shape[1], max_shape[2]),
dtype=np.float32)
padding_im[:, :im_h, :im_w] = data[0]
padding_batch.append((padding_im, ) + data[1:])
return padding_batch
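# Padding sketch: GenerateMiniBatch zero-pads every image in the batch on the
# bottom/right up to the largest (H, W) present, e.g. samples of shape
# (3, 200, 300) and (3, 256, 256) both come out as (3, 256, 300).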
class BaseReader:
def __init__(self,
transforms=None,
num_workers=4,
buffer_size=100,
parallel_method='thread',
shuffle=False):
if transforms is None:
raise Exception("transform should be defined.")
self.transforms = transforms
self.num_workers = num_workers
self.buffer_size = buffer_size
self.parallel_method = parallel_method
self.shuffle = shuffle
def generator(self, batch_size=1, drop_last=True):
self.batch_size = batch_size
parallel_reader = multithread_reader
if self.parallel_method == "process":
if platform.platform().startswith("Windows"):
logging.debug(
"multiprocess_reader is not supported in Windows platform, force to use multithread_reader."
)
else:
parallel_reader = multiprocess_reader
return parallel_reader(
self.transforms,
self.iterator,
num_workers=self.num_workers,
buffer_size=self.buffer_size,
batch_size=batch_size,
drop_last=drop_last)
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import os.path as osp
import random
from utils import logging
from .base import BaseReader
from .base import get_encoding
from collections import OrderedDict
from .base import is_pic
class Reader(BaseReader):
"""读取语分分割任务数据集,并对样本进行相应的处理。
Args:
data_dir (str): 数据集所在的目录路径。
file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。
label_list (str): 描述数据集包含的类别信息文件路径。
transforms (list): 数据集中每个样本的预处理/增强算子。
num_workers (int): 数据集中样本在预处理过程中的线程或进程数。默认为4。
buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
线程和'process'进程两种方式。默认为'thread'。
shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
"""
def __init__(self,
data_dir,
file_list,
label_list,
transforms=None,
num_workers=4,
buffer_size=100,
parallel_method='thread',
shuffle=False):
super(Reader, self).__init__(
transforms=transforms,
num_workers=num_workers,
buffer_size=buffer_size,
parallel_method=parallel_method,
shuffle=shuffle)
self.file_list = OrderedDict()
self.labels = list()
self._epoch = 0
with open(label_list, encoding=get_encoding(label_list)) as f:
for line in f:
item = line.strip()
self.labels.append(item)
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
items = line.strip().split()
full_path_im = osp.join(data_dir, items[0])
full_path_label = osp.join(data_dir, items[1])
                if not osp.exists(full_path_im):
                    raise IOError(
                        'The image file {} does not exist!'.format(full_path_im))
                if not osp.exists(full_path_label):
                    raise IOError('The label file {} does not exist!'.format(
                        full_path_label))
self.file_list[full_path_im] = full_path_label
self.num_samples = len(self.file_list)
logging.info("{} samples in file {}".format(
len(self.file_list), file_list))
def iterator(self):
self._epoch += 1
self._pos = 0
files = list(self.file_list.keys())
if self.shuffle:
random.shuffle(files)
files = files[:self.num_samples]
self.num_samples = len(files)
for f in files:
label_path = self.file_list[f]
sample = [f, None, label_path]
yield sample
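# Each sample yielded by iterator() is [image_path, None, label_path]; the
# middle slot is the im_info dict that the transforms pipeline fills in during
# preprocessing.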
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import glob
import os.path
import argparse
import warnings
def parse_args():
parser = argparse.ArgumentParser(
description=
'A tool for dividing dataset and generating file lists by file directory structure.'
)
parser.add_argument('dataset_root', help='dataset root directory', type=str)
parser.add_argument(
'--separator',
dest='separator',
help='file list separator',
default=" ",
type=str)
parser.add_argument(
'--folder',
help='the folder names of images and labels',
type=str,
nargs=2,
default=['images', 'annotations'])
parser.add_argument(
'--second_folder',
help=
'the second-level folder names of train set, validation set, test set',
type=str,
nargs='*',
default=['train', 'val', 'test'])
parser.add_argument(
'--format',
help='data format of images and labels, default npy, png.',
type=str,
nargs=2,
default=['npy', 'png'])
parser.add_argument(
'--label_class',
help='label class names',
type=str,
nargs='*',
default=['__background__', '__foreground__'])
parser.add_argument(
'--postfix',
help='postfix of images or labels',
type=str,
nargs=2,
default=['', ''])
return parser.parse_args()
def get_files(image_or_label, dataset_split, args):
dataset_root = args.dataset_root
postfix = args.postfix
format = args.format
folder = args.folder
pattern = '*%s.%s' % (postfix[image_or_label], format[image_or_label])
search_files = os.path.join(dataset_root, folder[image_or_label],
dataset_split, pattern)
search_files2 = os.path.join(dataset_root, folder[image_or_label],
dataset_split, "*", pattern) # 包含子目录
search_files3 = os.path.join(dataset_root, folder[image_or_label],
dataset_split, "*", "*", pattern) # 包含三级目录
filenames = glob.glob(search_files)
filenames2 = glob.glob(search_files2)
filenames3 = glob.glob(search_files3)
filenames = filenames + filenames2 + filenames3
return sorted(filenames)
def generate_list(args):
dataset_root = args.dataset_root
separator = args.separator
file_list = os.path.join(dataset_root, 'labels.txt')
with open(file_list, "w") as f:
for label_class in args.label_class:
f.write(label_class + '\n')
for dataset_split in args.second_folder:
print("Creating {}.txt...".format(dataset_split))
image_files = get_files(0, dataset_split, args)
label_files = get_files(1, dataset_split, args)
if not image_files:
img_dir = os.path.join(dataset_root, args.folder[0], dataset_split)
warnings.warn("No images in {} !!!".format(img_dir))
num_images = len(image_files)
if not label_files:
label_dir = os.path.join(dataset_root, args.folder[1],
dataset_split)
warnings.warn("No labels in {} !!!".format(label_dir))
num_label = len(label_files)
if num_images != num_label and num_label > 0:
raise Exception(
"Number of images = {} number of labels = {} \n"
"Either number of images is equal to number of labels, "
"or number of labels is equal to 0.\n"
"Please check your dataset!".format(num_images, num_label))
file_list = os.path.join(dataset_root, dataset_split + '.txt')
with open(file_list, "w") as f:
for item in range(num_images):
left = image_files[item].replace(dataset_root, '')
if left[0] == os.path.sep:
left = left.lstrip(os.path.sep)
try:
right = label_files[item].replace(dataset_root, '')
if right[0] == os.path.sep:
right = right.lstrip(os.path.sep)
line = left + separator + right + '\n'
except:
line = left + '\n'
f.write(line)
print(line)
if __name__ == '__main__':
args = parse_args()
generate_list(args)
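# Example invocation (assuming this script is saved as create_dataset_list.py;
# the dataset layout is a placeholder):
#   python create_dataset_list.py ./dataset --folder images annotations --format npy png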
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import glob
import os.path
import argparse
import warnings
import numpy as np
def parse_args():
parser = argparse.ArgumentParser(
description=
'A tool for proportionally randomizing dataset to produce file lists.')
parser.add_argument('dataset_root', help='the dataset root path', type=str)
parser.add_argument('images', help='the directory name of images', type=str)
parser.add_argument('labels', help='the directory name of labels', type=str)
parser.add_argument(
        '--split', help='proportions of the train/val/test split', nargs=3, type=float, default=[0.7, 0.3, 0])
parser.add_argument(
'--label_class',
help='label class names',
type=str,
nargs='*',
default=['__background__', '__foreground__'])
parser.add_argument(
'--separator',
dest='separator',
help='file list separator',
default=" ",
type=str)
parser.add_argument(
'--format',
help='data format of images and labels, e.g. jpg, npy or png.',
type=str,
nargs=2,
default=['npy', 'png'])
parser.add_argument(
'--postfix',
help='postfix of images or labels',
type=str,
nargs=2,
default=['', ''])
return parser.parse_args()
def get_files(path, format, postfix):
pattern = '*%s.%s' % (postfix, format)
search_files = os.path.join(path, pattern)
    search_files2 = os.path.join(path, "*", pattern)  # also match one level of subdirectories
    search_files3 = os.path.join(path, "*", "*", pattern)  # also match two levels of subdirectories
filenames = glob.glob(search_files)
filenames2 = glob.glob(search_files2)
filenames3 = glob.glob(search_files3)
filenames = filenames + filenames2 + filenames3
return sorted(filenames)
def generate_list(args):
separator = args.separator
dataset_root = args.dataset_root
if sum(args.split) != 1.0:
raise ValueError("划分比例之和必须为1")
file_list = os.path.join(dataset_root, 'labels.txt')
with open(file_list, "w") as f:
for label_class in args.label_class:
f.write(label_class + '\n')
image_dir = os.path.join(dataset_root, args.images)
label_dir = os.path.join(dataset_root, args.labels)
image_files = get_files(image_dir, args.format[0], args.postfix[0])
label_files = get_files(label_dir, args.format[1], args.postfix[1])
if not image_files:
warnings.warn("No files in {}".format(image_dir))
num_images = len(image_files)
if not label_files:
warnings.warn("No files in {}".format(label_dir))
num_label = len(label_files)
if num_images != num_label and num_label > 0:
raise Exception("Number of images = {} number of labels = {} \n"
"Either number of images is equal to number of labels, "
"or number of labels is equal to 0.\n"
"Please check your dataset!".format(
num_images, num_label))
image_files = np.array(image_files)
label_files = np.array(label_files)
state = np.random.get_state()
np.random.shuffle(image_files)
np.random.set_state(state)
np.random.shuffle(label_files)
start = 0
num_split = len(args.split)
dataset_name = ['train', 'val', 'test']
for i in range(num_split):
dataset_split = dataset_name[i]
print("Creating {}.txt...".format(dataset_split))
if args.split[i] > 1.0 or args.split[i] < 0:
raise ValueError(
"{} dataset percentage should be 0~1.".format(dataset_split))
file_list = os.path.join(dataset_root, dataset_split + '.txt')
with open(file_list, "w") as f:
num = round(args.split[i] * num_images)
end = start + num
if i == num_split - 1:
end = num_images
for item in range(start, end):
left = image_files[item].replace(dataset_root, '')
if left[0] == os.path.sep:
left = left.lstrip(os.path.sep)
try:
right = label_files[item].replace(dataset_root, '')
if right[0] == os.path.sep:
right = right.lstrip(os.path.sep)
line = left + separator + right + '\n'
except:
line = left + '\n'
f.write(line)
print(line)
start = end
if __name__ == '__main__':
args = parse_args()
generate_list(args)
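# Example invocation (assuming this script is saved as split_dataset_list.py;
# paths are placeholders):
#   python split_dataset_list.py ./dataset images annotations --split 0.7 0.2 0.1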
import os.path as osp
import argparse
import transforms.transforms as T
from readers.reader import Reader
from models import UNet
def parse_args():
parser = argparse.ArgumentParser(description='RemoteSensing training')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='dataset directory',
default=None,
type=str)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='model save directory',
default=None,
type=str)
parser.add_argument(
'--channel',
dest='channel',
help='number of data channel',
default=3,
type=int)
parser.add_argument(
'--num_epochs',
dest='num_epochs',
        help='number of training epochs',
default=100,
type=int)
parser.add_argument(
'--train_batch_size',
dest='train_batch_size',
help='training batch size',
default=4,
type=int)
parser.add_argument(
'--lr', dest='lr', help='learning rate', default=0.01, type=float)
return parser.parse_args()
args = parse_args()
data_dir = args.data_dir
save_dir = args.save_dir
channel = args.channel
num_epochs = args.num_epochs
train_batch_size = args.train_batch_size
lr = args.lr
# transforms for training and validation
train_transforms = T.Compose([
T.RandomVerticalFlip(0.5),
T.RandomHorizontalFlip(0.5),
T.ResizeStepScaling(0.5, 2.0, 0.25),
T.RandomPaddingCrop(256),
T.Normalize(mean=[0.5] * channel, std=[0.5] * channel),
])
eval_transforms = T.Compose([
T.Normalize(mean=[0.5] * channel, std=[0.5] * channel),
])
train_list = osp.join(data_dir, 'train.txt')
val_list = osp.join(data_dir, 'val.txt')
label_list = osp.join(data_dir, 'labels.txt')
# data readers for training and evaluation
train_reader = Reader(
data_dir=data_dir,
file_list=train_list,
label_list=label_list,
transforms=train_transforms,
num_workers=8,
buffer_size=16,
shuffle=True,
parallel_method='thread')
eval_reader = Reader(
data_dir=data_dir,
file_list=val_list,
label_list=label_list,
transforms=eval_transforms,
num_workers=8,
buffer_size=16,
shuffle=False,
parallel_method='thread')
model = UNet(
num_classes=2, input_channel=channel, use_bce_loss=True, use_dice_loss=True)
model.train(
num_epochs=num_epochs,
train_reader=train_reader,
train_batch_size=train_batch_size,
eval_reader=eval_reader,
save_interval_epochs=5,
log_interval_steps=10,
save_dir=save_dir,
pretrain_weights=None,
optimizer=None,
learning_rate=lr,
use_vdl=True)
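# Example invocation (assuming this script is saved as train_demo.py; paths and
# channel count are placeholders):
#   python train_demo.py --data_dir dataset/remote_sensing --save_dir saved_model --channel 10 --num_epochs 20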
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import transforms
from . import ops
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import math
import numpy as np
from PIL import Image, ImageEnhance
def normalize(im, min_value, max_value, mean, std):
# Rescaling (min-max normalization)
range_value = [max_value[i] - min_value[i] for i in range(len(max_value))]
im = (im.astype(np.float32, copy=False) - min_value) / range_value
# Standardization (Z-score Normalization)
im -= mean
im /= std
return im
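# Equivalently: im = ((im - min_value) / (max_value - min_value) - mean) / std,
# applied per channel.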
def permute(im, to_bgr=False):
im = np.swapaxes(im, 1, 2)
im = np.swapaxes(im, 1, 0)
if to_bgr:
im = im[[2, 1, 0], :, :]
return im
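# permute converts HWC to CHW: an input of shape (H, W, C) comes back as
# (C, H, W); with to_bgr=True the channel order is reversed as well.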
def _resize(im, shape):
return cv2.resize(im, shape)
def resize_short(im, short_size=224):
percent = float(short_size) / min(im.shape[0], im.shape[1])
resized_width = int(round(im.shape[1] * percent))
resized_height = int(round(im.shape[0] * percent))
im = _resize(im, shape=(resized_width, resized_height))
return im
def resize_long(im, long_size=224, interpolation=cv2.INTER_LINEAR):
value = max(im.shape[0], im.shape[1])
scale = float(long_size) / float(value)
im = cv2.resize(im, (0, 0), fx=scale, fy=scale, interpolation=interpolation)
return im
def random_crop(im,
crop_size=224,
lower_scale=0.08,
lower_ratio=3. / 4,
upper_ratio=4. / 3):
scale = [lower_scale, 1.0]
ratio = [lower_ratio, upper_ratio]
aspect_ratio = math.sqrt(np.random.uniform(*ratio))
w = 1. * aspect_ratio
h = 1. / aspect_ratio
bound = min((float(im.shape[0]) / im.shape[1]) / (h**2),
(float(im.shape[1]) / im.shape[0]) / (w**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
target_area = im.shape[0] * im.shape[1] * np.random.uniform(
scale_min, scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
i = np.random.randint(0, im.shape[0] - h + 1)
j = np.random.randint(0, im.shape[1] - w + 1)
im = im[i:i + h, j:j + w, :]
im = _resize(im, shape=(crop_size, crop_size))
return im
def center_crop(im, crop_size=224):
height, width = im.shape[:2]
w_start = (width - crop_size) // 2
h_start = (height - crop_size) // 2
w_end = w_start + crop_size
h_end = h_start + crop_size
im = im[h_start:h_end, w_start:w_end, :]
return im
def horizontal_flip(im):
if len(im.shape) == 3:
im = im[:, ::-1, :]
elif len(im.shape) == 2:
im = im[:, ::-1]
return im
def vertical_flip(im):
if len(im.shape) == 3:
im = im[::-1, :, :]
elif len(im.shape) == 2:
im = im[::-1, :]
return im
def bgr2rgb(im):
return im[:, :, ::-1]
def brightness(im, brightness_lower, brightness_upper):
brightness_delta = np.random.uniform(brightness_lower, brightness_upper)
im = ImageEnhance.Brightness(im).enhance(brightness_delta)
return im
def contrast(im, contrast_lower, contrast_upper):
contrast_delta = np.random.uniform(contrast_lower, contrast_upper)
im = ImageEnhance.Contrast(im).enhance(contrast_delta)
return im
def saturation(im, saturation_lower, saturation_upper):
saturation_delta = np.random.uniform(saturation_lower, saturation_upper)
im = ImageEnhance.Color(im).enhance(saturation_delta)
return im
def hue(im, hue_lower, hue_upper):
hue_delta = np.random.uniform(hue_lower, hue_upper)
im = np.array(im.convert('HSV'))
im[:, :, 0] = im[:, :, 0] + hue_delta
im = Image.fromarray(im, mode='HSV').convert('RGB')
return im
def rotate(im, rotate_lower, rotate_upper):
rotate_delta = np.random.uniform(rotate_lower, rotate_upper)
im = im.rotate(int(rotate_delta))
return im
def resize_padding(im, max_side_len=2400):
'''
resize image to a size multiple of 32 which is required by the network
    :param im: the input image
:param max_side_len: limit of max image size to avoid out of memory in gpu
:return: the resized image and the resize ratio
'''
h, w, _ = im.shape
resize_w = w
resize_h = h
# limit the max side
if max(resize_h, resize_w) > max_side_len:
ratio = float(
max_side_len) / resize_h if resize_h > resize_w else float(
max_side_len) / resize_w
else:
ratio = 1.
resize_h = int(resize_h * ratio)
resize_w = int(resize_w * ratio)
resize_h = resize_h if resize_h % 32 == 0 else (resize_h // 32 - 1) * 32
resize_w = resize_w if resize_w % 32 == 0 else (resize_w // 32 - 1) * 32
resize_h = max(32, resize_h)
resize_w = max(32, resize_w)
im = cv2.resize(im, (int(resize_w), int(resize_h)))
#im = cv2.resize(im, (512, 512))
ratio_h = resize_h / float(h)
ratio_w = resize_w / float(w)
_ratio = np.array([ratio_h, ratio_w]).reshape(-1, 2)
return im, _ratio
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .ops import *
import random
import os.path as osp
import numpy as np
from PIL import Image
import cv2
from collections import OrderedDict
class Compose:
"""根据数据预处理/增强算子对输入数据进行操作。
    所有操作的输入图像形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
Args:
transforms (list): 数据预处理/增强算子。
Raises:
TypeError: transforms不是list对象
ValueError: transforms元素个数小于1。
"""
def __init__(self, transforms):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
if len(transforms) < 1:
raise ValueError('The length of transforms ' + \
'must be equal or larger than 1!')
self.transforms = transforms
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (str/np.ndarray): 图像路径/图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息,dict中的字段如下:
- shape_before_resize (tuple): 图像resize之前的大小(h, w)。
- shape_before_padding (tuple): 图像padding之前的大小(h, w)。
label (str/np.ndarray): 标注图像路径/标注图像np.ndarray数据。
Returns:
tuple: 根据网络所需字段所组成的tuple;字段由transforms中的最后一个数据预处理操作决定。
"""
        if im_info is None:
            im_info = dict()
        if isinstance(im, str):
            # the remote-sensing images are stored as .npy files
            try:
                im = np.load(im)
            except Exception:
                raise ValueError("Can't read the image file {}!".format(im))
if label is not None:
label = np.asarray(Image.open(label))
for op in self.transforms:
outputs = op(im, im_info, label)
im = outputs[0]
if len(outputs) >= 2:
im_info = outputs[1]
if len(outputs) == 3:
label = outputs[2]
return outputs
class RandomHorizontalFlip:
"""以一定的概率对图像进行水平翻转。当存在标注图像时,则同步进行翻转。
Args:
prob (float): 随机水平翻转的概率。默认值为0.5。
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if random.random() < self.prob:
im = horizontal_flip(im)
if label is not None:
label = horizontal_flip(label)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomVerticalFlip:
"""以一定的概率对图像进行垂直翻转。当存在标注图像时,则同步进行翻转。
Args:
prob (float): 随机垂直翻转的概率。默认值为0.1。
"""
def __init__(self, prob=0.1):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if random.random() < self.prob:
im = vertical_flip(im)
if label is not None:
label = vertical_flip(label)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class Resize:
"""调整图像大小(resize),当存在标注图像时,则同步进行处理。
- 当目标大小(target_size)类型为int时,根据插值方式,
将图像resize为[target_size, target_size]。
- 当目标大小(target_size)类型为list或tuple时,根据插值方式,
将图像resize为target_size, target_size的输入应为[w, h]或(w, h)。
Args:
target_size (int/list/tuple): 目标大小
interp (str): resize的插值方式,与opencv的插值方式对应,
可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。
Raises:
TypeError: target_size不是int/list/tuple。
ValueError: target_size为list/tuple时元素个数不等于2。
AssertionError: interp的取值不在['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4']之内
"""
# The interpolation mode
interp_dict = {
'NEAREST': cv2.INTER_NEAREST,
'LINEAR': cv2.INTER_LINEAR,
'CUBIC': cv2.INTER_CUBIC,
'AREA': cv2.INTER_AREA,
'LANCZOS4': cv2.INTER_LANCZOS4
}
def __init__(self, target_size, interp='LINEAR'):
self.interp = interp
assert interp in self.interp_dict, "interp should be one of {}".format(
self.interp_dict.keys())
if isinstance(target_size, list) or isinstance(target_size, tuple):
if len(target_size) != 2:
raise ValueError(
'when target is list or tuple, it should include 2 elements, but it is {}'
.format(target_size))
elif not isinstance(target_size, int):
raise TypeError(
"Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(target_size)))
self.target_size = target_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
                其中,im_info更新字段为:
-shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
Raises:
ZeroDivisionError: im的短边为0。
TypeError: im不是np.ndarray数据。
            ValueError: im不是3维np.ndarray。
"""
        if not isinstance(im, np.ndarray):
            raise TypeError("ResizeImage: image type is not np.ndarray.")
        if len(im.shape) != 3:
            raise ValueError('ResizeImage: image is not 3-dimensional.')
        if im_info is None:
            im_info = OrderedDict()
        im_info['shape_before_resize'] = im.shape[:2]
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
if float(im_size_min) == 0:
raise ZeroDivisionError('ResizeImage: min size of image is 0')
if isinstance(self.target_size, int):
resize_w = self.target_size
resize_h = self.target_size
else:
resize_w = self.target_size[0]
resize_h = self.target_size[1]
im_scale_x = float(resize_w) / float(im_shape[1])
im_scale_y = float(resize_h) / float(im_shape[0])
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp_dict[self.interp])
if label is not None:
label = cv2.resize(
label,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp_dict['NEAREST'])
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeByLong:
"""对图像长边resize到固定值,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
Args:
long_size (int): resize后图像的长边大小。
"""
def __init__(self, long_size):
self.long_size = long_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info新增字段为:
-shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
"""
if im_info is None:
im_info = OrderedDict()
im_info['shape_before_resize'] = im.shape[:2]
im = resize_long(im, self.long_size)
if label is not None:
label = resize_long(label, self.long_size, cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeRangeScaling:
"""对图像长边随机resize到指定范围内,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
Args:
min_value (int): 图像长边resize后的最小值。默认值400。
max_value (int): 图像长边resize后的最大值。默认值600。
Raises:
ValueError: min_value大于max_value
"""
def __init__(self, min_value=400, max_value=600):
if min_value > max_value:
raise ValueError('min_value must be less than max_value, '
'but they are {} and {}.'.format(
min_value, max_value))
self.min_value = min_value
self.max_value = max_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_value == self.max_value:
random_size = self.max_value
else:
random_size = int(
np.random.uniform(self.min_value, self.max_value) + 0.5)
value = max(im.shape[0], im.shape[1])
scale = float(random_size) / float(value)
im = cv2.resize(
im, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
if label is not None:
label = cv2.resize(
label, (0, 0),
fx=scale,
fy=scale,
interpolation=cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeStepScaling:
"""对图像按照某一个比例resize,这个比例以scale_step_size为步长
在[min_scale_factor, max_scale_factor]随机变动。当存在标注图像时,则同步进行处理。
Args:
        min_scale_factor (float): resize最小尺度。默认值0.75。
        max_scale_factor (float): resize最大尺度。默认值1.25。
        scale_step_size (float): resize尺度范围间隔。默认值0.25。
Raises:
ValueError: min_scale_factor大于max_scale_factor
"""
def __init__(self,
min_scale_factor=0.75,
max_scale_factor=1.25,
scale_step_size=0.25):
if min_scale_factor > max_scale_factor:
raise ValueError(
'min_scale_factor must be less than max_scale_factor, '
'but they are {} and {}.'.format(min_scale_factor,
max_scale_factor))
self.min_scale_factor = min_scale_factor
self.max_scale_factor = max_scale_factor
self.scale_step_size = scale_step_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_scale_factor == self.max_scale_factor:
scale_factor = self.min_scale_factor
elif self.scale_step_size == 0:
scale_factor = np.random.uniform(self.min_scale_factor,
self.max_scale_factor)
else:
num_steps = int((self.max_scale_factor - self.min_scale_factor) /
self.scale_step_size + 1)
scale_factors = np.linspace(self.min_scale_factor,
self.max_scale_factor,
num_steps).tolist()
np.random.shuffle(scale_factors)
scale_factor = scale_factors[0]
im = cv2.resize(
im, (0, 0),
fx=scale_factor,
fy=scale_factor,
interpolation=cv2.INTER_LINEAR)
if label is not None:
label = cv2.resize(
label, (0, 0),
fx=scale_factor,
fy=scale_factor,
interpolation=cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
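# Step-scaling sketch: with the defaults (0.75, 1.25, step 0.25) the candidate
# factors are np.linspace(0.75, 1.25, 3) -> [0.75, 1.0, 1.25], and one factor
# is drawn uniformly per sample.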
class Clip:
"""
对图像上超出一定范围的数据进行裁剪。
Args:
min_val (list): 裁剪的下限,小于min_val的数值均设为min_val. 默认值[0, 0, 0].
max_val (list): 裁剪的上限,大于max_val的数值均设为max_val. 默认值[255.0, 255.0, 255.0]
"""
def __init__(self, min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0]):
self.min_val = min_val
self.max_val = max_val
def __call__(self, im, im_info=None, label=None):
if isinstance(self.min_val, list) and isinstance(self.max_val, list):
for k in range(im.shape[2]):
np.clip(
im[:, :, k],
self.min_val[k],
self.max_val[k],
out=im[:, :, k])
else:
raise TypeError('min_val and max_val must be list')
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class Normalize:
"""对图像进行标准化。
1.图像像素归一化到区间 [0.0, 1.0]。
2.对图像进行减均值除以标准差操作。
Args:
min_val (list): 图像数据集的最小值。默认值[0, 0, 0].
max_val (list): 图像数据集的最大值。默认值[255.0, 255.0, 255.0]
mean (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5].
std (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5].
Raises:
ValueError: mean或std不是list对象。std包含0。
"""
def __init__(self,
min_val=[0, 0, 0],
max_val=[255.0, 255.0, 255.0],
mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5]):
self.min_val = min_val
self.max_val = max_val
self.mean = mean
self.std = std
if not (isinstance(self.mean, list) and isinstance(self.std, list)):
raise ValueError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
im = normalize(im, self.min_val, self.max_val, mean, std)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
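# With the default min/max/mean/std this maps 8-bit pixels into [-1, 1]:
#   im = ((im - 0) / 255.0 - 0.5) / 0.5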
class Padding:
"""对图像或标注图像进行padding,padding方向为右和下。
根据提供的值对图像或标注图像进行padding操作。
Args:
target_size (int/list/tuple): padding后图像的大小。
im_padding_value (list): 图像padding的值。默认为127.5。
label_padding_value (int): 标注图像padding的值。默认值为255。
Raises:
TypeError: target_size不是int/list/tuple。
ValueError: target_size为list/tuple时元素个数不等于2。
"""
def __init__(self,
target_size,
im_padding_value=127.5,
label_padding_value=255):
if isinstance(target_size, list) or isinstance(target_size, tuple):
if len(target_size) != 2:
raise ValueError(
'when target is list or tuple, it should include 2 elements, but it is {}'
.format(target_size))
elif not isinstance(target_size, int):
raise TypeError(
"Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(target_size)))
self.target_size = target_size
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info新增字段为:
-shape_before_padding (tuple): 保存padding之前图像的形状(h, w)。
Raises:
ValueError: 输入图像im或label的形状大于目标值
"""
if im_info is None:
im_info = OrderedDict()
im_info['shape_before_padding'] = im.shape[:2]
im_height, im_width = im.shape[0], im.shape[1]
if isinstance(self.target_size, int):
target_height = self.target_size
target_width = self.target_size
else:
target_height = self.target_size[1]
target_width = self.target_size[0]
pad_height = target_height - im_height
pad_width = target_width - im_width
        if pad_height < 0 or pad_width < 0:
            raise ValueError(
                'the size of the image should be no larger than target_size, '
                'but the image size is ({}, {}) while target_size is ({}, {})'
                .format(im_width, im_height, target_width, target_height))
else:
im = np.pad(
im,
pad_width=((0, pad_height), (0, pad_width), (0, 0)),
mode='constant',
constant_values=(self.im_padding_value, self.im_padding_value))
if label is not None:
label = np.pad(
label,
pad_width=((0, pad_height), (0, pad_width)),
mode='constant',
constant_values=(self.label_padding_value,
self.label_padding_value))
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomPaddingCrop:
"""对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作。
Args:
crop_size(int or list or tuple): 裁剪图像大小。默认为512。
im_padding_value (list): 图像padding的值。默认为127.5
label_padding_value (int): 标注图像padding的值。默认值为255。
Raises:
TypeError: crop_size不是int/list/tuple。
        ValueError: crop_size为list/tuple时元素个数不等于2。
"""
def __init__(self,
crop_size=512,
im_padding_value=127.5,
label_padding_value=255):
if isinstance(crop_size, list) or isinstance(crop_size, tuple):
if len(crop_size) != 2:
raise ValueError(
'when crop_size is list or tuple, it should include 2 elements, but it is {}'
.format(crop_size))
elif not isinstance(crop_size, int):
raise TypeError(
"Type of crop_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(crop_size)))
self.crop_size = crop_size
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if isinstance(self.crop_size, int):
crop_width = self.crop_size
crop_height = self.crop_size
else:
crop_width = self.crop_size[0]
crop_height = self.crop_size[1]
img_height = im.shape[0]
img_width = im.shape[1]
if img_height == crop_height and img_width == crop_width:
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
else:
pad_height = max(crop_height - img_height, 0)
pad_width = max(crop_width - img_width, 0)
if (pad_height > 0 or pad_width > 0):
im = np.pad(
im,
pad_width=((0, pad_height), (0, pad_width), (0, 0)),
mode='constant',
constant_values=(self.im_padding_value,
self.im_padding_value))
if label is not None:
label = np.pad(
label,
pad_width=((0, pad_height), (0, pad_width)),
mode='constant',
constant_values=(self.label_padding_value,
self.label_padding_value))
img_height = im.shape[0]
img_width = im.shape[1]
if crop_height > 0 and crop_width > 0:
h_off = np.random.randint(img_height - crop_height + 1)
w_off = np.random.randint(img_width - crop_width + 1)
im = im[h_off:(crop_height + h_off), w_off:(
w_off + crop_width), :]
if label is not None:
label = label[h_off:(crop_height + h_off), w_off:(
w_off + crop_width)]
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
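A minimal usage sketch on synthetic data: a 300x400 image is first padded up to 512x512 and then cropped, so the output shape is deterministic even though the crop offset is random.
```python
import numpy as np

crop = RandomPaddingCrop(crop_size=512)
im = np.random.rand(300, 400, 3).astype('float32')
label = np.zeros((300, 400), dtype='uint8')
im_out, im_info, label_out = crop(im, {}, label)
assert im_out.shape == (512, 512, 3)
assert label_out.shape == (512, 512)
```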
class RandomBlur:
"""以一定的概率对图像进行高斯模糊。
Args:
prob (float): 图像模糊概率。默认为0.1。
"""
def __init__(self, prob=0.1):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.prob <= 0:
n = 0
elif self.prob >= 1:
n = 1
else:
n = int(1.0 / self.prob)
if n > 0:
if np.random.randint(0, n) == 0:
radius = np.random.randint(3, 10)
if radius % 2 != 1:
radius = radius + 1
if radius > 9:
radius = 9
im = cv2.GaussianBlur(im, (radius, radius), 0, 0)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
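The trigger logic above approximates `prob` by drawing a uniform integer in [0, n) with n = int(1/prob) and blurring when the draw is 0. A quick sanity check of that approximation (a sketch):
```python
import numpy as np

prob = 0.1
n = int(1.0 / prob)  # 10
trials = 100000
hits = np.count_nonzero(np.random.randint(0, n, size=trials) == 0)
print(hits / float(trials))  # ~0.1, matching the configured prob
```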
class RandomScaleAspect:
"""裁剪并resize回原始尺寸的图像和标注图像。
按照一定的面积比和宽高比对图像进行裁剪,并reszie回原始图像的图像,当存在标注图时,同步进行。
Args:
min_scale (float):裁取图像占原始图像的面积比,0-1,默认0返回原图。默认为0.5。
aspect_ratio (float): 裁取图像的宽高比范围,非负,默认0返回原图。默认为0.33。
"""
def __init__(self, min_scale=0.5, aspect_ratio=0.33):
self.min_scale = min_scale
self.aspect_ratio = aspect_ratio
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_scale != 0 and self.aspect_ratio != 0:
img_height = im.shape[0]
img_width = im.shape[1]
for i in range(0, 10):
area = img_height * img_width
target_area = area * np.random.uniform(self.min_scale, 1.0)
aspectRatio = np.random.uniform(self.aspect_ratio,
1.0 / self.aspect_ratio)
dw = int(np.sqrt(target_area * 1.0 * aspectRatio))
dh = int(np.sqrt(target_area * 1.0 / aspectRatio))
if (np.random.randint(10) < 5):
tmp = dw
dw = dh
dh = tmp
if (dh < img_height and dw < img_width):
h1 = np.random.randint(0, img_height - dh)
w1 = np.random.randint(0, img_width - dw)
im = im[h1:(h1 + dh), w1:(w1 + dw), :]
label = label[h1:(h1 + dh), w1:(w1 + dw)]
im = cv2.resize(
im, (img_width, img_height),
interpolation=cv2.INTER_LINEAR)
label = cv2.resize(
label, (img_width, img_height),
interpolation=cv2.INTER_NEAREST)
break
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
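A minimal usage sketch: because the random crop is resized back to (w, h), the output always keeps the input resolution.
```python
import numpy as np

aug = RandomScaleAspect(min_scale=0.5, aspect_ratio=0.33)
im = np.random.rand(256, 256, 3).astype('float32')
label = np.zeros((256, 256), dtype='uint8')
im_out, _, label_out = aug(im, {}, label)
assert im_out.shape == (256, 256, 3)
assert label_out.shape == (256, 256)
```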
class ArrangeSegmenter:
"""获取训练/验证/预测所需的信息。
Args:
mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
Raises:
ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内
"""
def __init__(self, mode):
if mode not in ['train', 'eval', 'test', 'quant']:
raise ValueError(
"mode should be defined as one of ['train', 'eval', 'test', 'quant']!"
)
self.mode = mode
def __call__(self, im, im_info, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当mode为'train'或'eval'时,返回的tuple为(im, label),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当mode为'test'时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;当mode为
'quant'时,返回的tuple为(im,),为图像np.ndarray数据。
"""
im = permute(im, False)
if self.mode == 'train' or self.mode == 'eval':
label = label[np.newaxis, :, :]
return (im, label)
elif self.mode == 'test':
return (im, im_info)
else:
return (im, )
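A minimal sketch of the mode-dependent output (assuming `permute` transposes HWC to CHW as used above):
```python
import numpy as np

arrange = ArrangeSegmenter(mode='train')
im = np.random.rand(64, 64, 3).astype('float32')
label = np.zeros((64, 64), dtype='uint8')
im_out, label_out = arrange(im, {}, label)
assert label_out.shape == (1, 64, 64)  # leading channel axis for the loss
```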
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import logging
from . import utils
from .metrics import ConfusionMatrix
from .utils import *
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import sys
import __init__  # the package __init__ holds the global log_level
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}
def log(level=2, message=""):
current_time = time.time()
time_array = time.localtime(current_time)
current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array)
if __init__.log_level >= level:
print("{} [{}]\t{}".format(current_time, levels[level],
message).encode("utf-8").decode("latin1"))
sys.stdout.flush()
def debug(message=""):
log(level=3, message=message)
def info(message=""):
log(level=2, message=message)
def warning(message=""):
log(level=1, message=message)
def error(message=""):
log(level=0, message=message)
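A minimal usage sketch (assuming the package `__init__` defaults `log_level` to 2, i.e. INFO): only messages at or below the configured level are printed.
```python
info("start loading dataset ...")        # printed when log_level >= 2
warning("learning rate is very large")   # printed when log_level >= 1
debug("batch shape: (4, 3, 512, 512)")   # suppressed unless log_level >= 3
```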
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
from scipy.sparse import csr_matrix
class ConfusionMatrix(object):
"""
Confusion Matrix for segmentation evaluation
"""
def __init__(self, num_classes=2, streaming=False):
self.confusion_matrix = np.zeros([num_classes, num_classes],
dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
def calculate(self, pred, label, ignore=None):
# If not in streaming mode, clear the matrix every time calculate() is called
if not self.streaming:
self.zero_matrix()
label = np.transpose(label, (0, 2, 3, 1))
ignore = np.transpose(ignore, (0, 2, 3, 1))
mask = np.array(ignore) == 1
label = np.asarray(label)[mask]
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
# Accumulate ([row=label, col=pred], 1) into a sparse matrix
spm = csr_matrix((one, (label, pred)),
shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
dtype='int64')
def mean_iou(self):
iou_list = []
avg_iou = 0
# TODO: use the numpy sum-axis API to simplify this
vji = np.zeros(self.num_classes, dtype=int)
vij = np.zeros(self.num_classes, dtype=int)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
for c in range(self.num_classes):
total = vji[c] + vij[c] - self.confusion_matrix[c][c]
if total == 0:
iou = 0
else:
iou = float(self.confusion_matrix[c][c]) / total
avg_iou += iou
iou_list.append(iou)
avg_iou = float(avg_iou) / float(self.num_classes)
return np.array(iou_list), avg_iou
def accuracy(self):
total = self.confusion_matrix.sum()
total_right = 0
for c in range(self.num_classes):
total_right += self.confusion_matrix[c][c]
if total == 0:
avg_acc = 0
else:
avg_acc = float(total_right) / total
vij = np.zeros(self.num_classes, dtype=int)
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
acc_list = []
for c in range(self.num_classes):
if vij[c] == 0:
acc = 0
else:
acc = self.confusion_matrix[c][c] / float(vij[c])
acc_list.append(acc)
return np.array(acc_list), avg_acc
def kappa(self):
vji = np.zeros(self.num_classes)
vij = np.zeros(self.num_classes)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
total = self.confusion_matrix.sum()
# scale down to avoid overflow
# TODO: is it reasonable to hard code 10000.0?
total = float(total) / 10000.0
vji = vji / 10000.0
vij = vij / 10000.0
tp = 0
tc = 0
for c in range(self.num_classes):
tp += vji[c] * vij[c]
tc += self.confusion_matrix[c][c]
tc = tc / 10000.0
pe = tp / (total * total)
po = tc / total
kappa = (po - pe) / (1 - pe)
return kappa
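A tiny worked example of the matrix above (a sketch; inputs follow the NCHW layout that `calculate` transposes internally): two classes, four pixels, one mistake.
```python
import numpy as np

cm = ConfusionMatrix(num_classes=2, streaming=True)
pred = np.array([0, 1, 1, 1]).reshape(1, 1, 2, 2)
label = np.array([0, 0, 1, 1]).reshape(1, 1, 2, 2)
ignore = np.ones_like(label)      # keep every pixel
cm.calculate(pred, label, ignore)
iou_list, miou = cm.mean_iou()    # IoU = [0.5, 2/3], mIoU ~= 0.583
acc_list, acc = cm.accuracy()     # overall accuracy = 3/4
kappa = cm.kappa()                # 0.5 for this toy case
```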
import os.path as osp
def get_pretrain_weights(flag, backbone, save_dir):
if flag is None:
return None
elif osp.isdir(flag):
return flag
else:
raise Exception(
"pretrain_weights need to be defined as directory path.")
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import time
import os
import os.path as osp
import numpy as np
import six
import yaml
import math
from . import logging
def seconds_to_hms(seconds):
h = math.floor(seconds / 3600)
m = math.floor((seconds - h * 3600) / 60)
s = int(seconds - h * 3600 - m * 60)
hms_str = "{}:{}:{}".format(h, m, s)
return hms_str
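For reference, a quick check of the helper above (note the fields are not zero-padded):
```python
assert seconds_to_hms(3725) == "1:2:5"  # 1 h, 2 min, 5 s
```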
def setting_environ_flags():
if 'FLAGS_eager_delete_tensor_gb' not in os.environ:
os.environ['FLAGS_eager_delete_tensor_gb'] = '0.0'
if 'FLAGS_allocator_strategy' not in os.environ:
os.environ['FLAGS_allocator_strategy'] = 'auto_growth'
if "CUDA_VISIBLE_DEVICES" in os.environ:
if os.environ["CUDA_VISIBLE_DEVICES"].count("-1") > 0:
os.environ["CUDA_VISIBLE_DEVICES"] = ""
def get_environ_info():
setting_environ_flags()
import paddle.fluid as fluid
info = dict()
info['place'] = 'cpu'
info['num'] = int(os.environ.get('CPU_NUM', 1))
if os.environ.get('CUDA_VISIBLE_DEVICES', None) != "":
if hasattr(fluid.core, 'get_cuda_device_count'):
gpu_num = 0
try:
    gpu_num = fluid.core.get_cuda_device_count()
except Exception:
    os.environ['CUDA_VISIBLE_DEVICES'] = ''
if gpu_num > 0:
info['place'] = 'cuda'
info['num'] = fluid.core.get_cuda_device_count()
return info
def parse_param_file(param_file, return_shape=True):
from paddle.fluid.proto.framework_pb2 import VarType
f = open(param_file, 'rb')
version = np.fromstring(f.read(4), dtype='int32')
lod_level = np.fromstring(f.read(8), dtype='int64')
for i in range(int(lod_level)):
_size = np.fromstring(f.read(8), dtype='int64')
_ = f.read(_size)
version = np.fromstring(f.read(4), dtype='int32')
tensor_desc = VarType.TensorDesc()
tensor_desc_size = np.fromstring(f.read(4), dtype='int32')
tensor_desc.ParseFromString(f.read(int(tensor_desc_size)))
tensor_shape = tuple(tensor_desc.dims)
if return_shape:
f.close()
return tuple(tensor_desc.dims)
if tensor_desc.data_type != 5:  # 5 == VarType.FP32
raise Exception(
"Unexpected data type while parse {}".format(param_file))
data_size = 4
for i in range(len(tensor_shape)):
data_size *= tensor_shape[i]
weight = np.fromstring(f.read(data_size), dtype='float32')
f.close()
return np.reshape(weight, tensor_shape)
def fuse_bn_weights(exe, main_prog, weights_dir):
import paddle.fluid as fluid
logging.info("Try to fuse weights of batch_norm...")
bn_vars = list()
for block in main_prog.blocks:
ops = list(block.ops)
for op in ops:
if op.type == 'affine_channel':
scale_name = op.input('Scale')[0]
bias_name = op.input('Bias')[0]
prefix = scale_name[:-5]
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
if not osp.exists(osp.join(
weights_dir, mean_name)) or not osp.exists(
osp.join(weights_dir, variance_name)):
logging.info(
"There's no batch_norm weight found to fuse, skip fuse_bn."
)
return
bias = block.var(bias_name)
pretrained_shape = parse_param_file(
osp.join(weights_dir, bias_name))
actual_shape = tuple(bias.shape)
if pretrained_shape != actual_shape:
continue
bn_vars.append(
[scale_name, bias_name, mean_name, variance_name])
eps = 1e-5
for names in bn_vars:
scale_name, bias_name, mean_name, variance_name = names
scale = parse_param_file(
osp.join(weights_dir, scale_name), return_shape=False)
bias = parse_param_file(
osp.join(weights_dir, bias_name), return_shape=False)
mean = parse_param_file(
osp.join(weights_dir, mean_name), return_shape=False)
variance = parse_param_file(
osp.join(weights_dir, variance_name), return_shape=False)
bn_std = np.sqrt(np.add(variance, eps))
new_scale = np.float32(np.divide(scale, bn_std))
new_bias = bias - mean * new_scale
scale_tensor = fluid.global_scope().find_var(scale_name).get_tensor()
bias_tensor = fluid.global_scope().find_var(bias_name).get_tensor()
scale_tensor.set(new_scale, exe.place)
bias_tensor.set(new_bias, exe.place)
if len(bn_vars) == 0:
logging.info(
"There's no batch_norm weight found to fuse, skip fuse_bn.")
else:
logging.info("There's {} batch_norm ops been fused.".format(
len(bn_vars)))
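The fusion above folds the batch-norm statistics into a plain scale and bias; a numeric sanity check of that identity (a sketch with made-up values):
```python
import numpy as np

x, scale, bias, mean, var, eps = 2.0, 1.5, 0.1, 0.4, 0.25, 1e-5
bn_std = np.sqrt(var + eps)
new_scale = scale / bn_std
new_bias = bias - mean * new_scale
# scale * (x - mean) / bn_std + bias == new_scale * x + new_bias
assert np.isclose(scale * (x - mean) / bn_std + bias,
                  new_scale * x + new_bias)
```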
def load_pdparams(exe, main_prog, model_dir):
import paddle.fluid as fluid
from paddle.fluid.proto.framework_pb2 import VarType
from paddle.fluid.framework import Program
vars_to_load = list()
import pickle
with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
params_dict = pickle.load(f) if six.PY2 else pickle.load(
f, encoding='latin1')
unused_vars = list()
for var in main_prog.list_vars():
if not isinstance(var, fluid.framework.Parameter):
continue
if var.name not in params_dict:
raise Exception("{} is not in saved model".format(var.name))
if var.shape != params_dict[var.name].shape:
unused_vars.append(var.name)
logging.warning(
    "[SKIP] Shape of pretrained weight {} doesn't match. (Pretrained: {}, Actual: {})"
    .format(var.name, params_dict[var.name].shape, var.shape))
continue
vars_to_load.append(var)
logging.debug("Weight {} will be load".format(var.name))
for var_name in unused_vars:
del params_dict[var_name]
fluid.io.set_program_state(main_prog, params_dict)
if len(vars_to_load) == 0:
    logging.warning(
        "No pretrained weights were loaded; you may want to check your pretrained model!"
    )
else:
    logging.info("{} variables were loaded from {}.".format(
        len(vars_to_load), model_dir))
def load_pretrain_weights(exe, main_prog, weights_dir, fuse_bn=False):
if not osp.exists(weights_dir):
raise Exception("Path {} not exists.".format(weights_dir))
if osp.exists(osp.join(weights_dir, "model.pdparams")):
return load_pdparams(exe, main_prog, weights_dir)
import paddle.fluid as fluid
vars_to_load = list()
for var in main_prog.list_vars():
if not isinstance(var, fluid.framework.Parameter):
continue
if not osp.exists(osp.join(weights_dir, var.name)):
logging.debug("[SKIP] Pretrained weight {}/{} doesn't exist".format(
weights_dir, var.name))
continue
pretrained_shape = parse_param_file(osp.join(weights_dir, var.name))
actual_shape = tuple(var.shape)
if pretrained_shape != actual_shape:
logging.warning(
    "[SKIP] Shape of pretrained weight {}/{} doesn't match. (Pretrained: {}, Actual: {})"
    .format(weights_dir, var.name, pretrained_shape, actual_shape))
continue
vars_to_load.append(var)
logging.debug("Weight {} will be load".format(var.name))
fluid.io.load_vars(
executor=exe,
dirname=weights_dir,
main_program=main_prog,
vars=vars_to_load)
if len(vars_to_load) == 0:
    logging.warning(
        "No pretrained weights were loaded; you may want to check your pretrained model!"
    )
else:
    logging.info("{} variables were loaded from {}.".format(
        len(vars_to_load), weights_dir))
if fuse_bn:
fuse_bn_weights(exe, main_prog, weights_dir)
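A minimal usage sketch (the checkpoint path is illustrative): build the program, run startup, then load the pretrained weights before training.
```python
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
load_pretrain_weights(
    exe,
    fluid.default_main_program(),
    './pretrained_model/deeplabv3p_xception65_bn_cityscapes',  # illustrative
    fuse_bn=False)
```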
@@ -27,10 +27,10 @@ python pdseg/train.py BATCH_SIZE 1 --cfg configs/unet_optic.yaml
|--cfg|Path to the config file|ALL|None||
|--use_gpu|Whether to train on GPU|train/eval/vis|False||
|--use_mpio|Whether to use multiprocess I/O|train/eval|False|Enabling this consumes some extra CPU memory but speeds up training.</br> **NOTE:** not supported on Windows; when training on custom data for the first time, keep it off, since enabling it can make data-loading errors invisible. |
|--use_vdl|Whether to log training data to VisualDL|train|False||
|--log_steps|Interval (in steps) for printing training logs|train|10||
|--debug|Whether to print debug information|train|False|Metrics such as IoU involve computing the confusion matrix, which slows down training|
|--vdl_log_dir|VisualDL log directory|train|None||
|--do_eval|Whether to evaluate the model whenever it is saved|train|False||
|--vis_dir|Directory for saving visualization images|vis|"visual"||
@@ -80,7 +80,7 @@ DATASET:
    VAL_FILE_LIST: './dataset/cityscapes/val.list'
    # Test file list
    TEST_FILE_LIST: './dataset/cityscapes/test.list'
    # Dataset visualized by VisualDL
    VIS_FILE_LIST: None
    # Number of classes (background included)
    NUM_CLASSES: 19
......
@@ -62,7 +62,7 @@ The DATASET group holds all dataset-related configuration
## `VIS_FILE_LIST`
Visualization file list. When `pdseg/train.py` is run with the --use_vdl switch on, the images in this list are visualized every time the model is saved.
The file list consists of multiple lines, each formatted as
```
......
@@ -49,8 +49,8 @@ export CUDA_VISIBLE_DEVICES=0
python pdseg/train.py --cfg configs/unet_optic.yaml \
                      --use_gpu \
                      --do_eval \
                      --use_vdl \
                      --vdl_log_dir train_log \
                      BATCH_SIZE 4 \
                      SOLVER.LR 0.001
@@ -70,22 +70,22 @@ export CUDA_VISIBLE_DEVICES=0,1,2
## 5. Visualizing the training process
To visualize the training process, launch the training script `train.py` with both the `--do_eval` and `--use_vdl` switches on and set the log directory via `--vdl_log_dir`; you can then watch the train-while-evaluate results in VisualDL.
```shell
visualdl --logdir train_log --host {$HOST_IP} --port {$PORT}
```
NOTE:
1. In the example above, $HOST\_IP is the machine's IP address; replace it with the real IP, and replace $PORT with a reachable port.
2. When the data volume is large, the front end loads slowly; please be patient.
After launching the VisualDL command, we can inspect the training data in a browser.
The `SCALAR` tab shows the trends of the training loss, iou and acc.
![](./imgs/visualdl_scalar.png)
The `IMAGE` tab shows sample images.
![](./imgs/visualdl_image.png)
## 6. Model evaluation
After training finishes, we can evaluate the model with eval.py. Since the number of training epochs was set to 10 with a save interval of 5, two periodically saved models plus the final model are produced, three in total. We pick the last saved model for evaluation:
......
@@ -20,7 +20,11 @@ import importlib
from utils.config import cfg
def softmax_with_loss(logit,
                      label,
                      ignore_mask=None,
                      num_classes=2,
                      weight=None):
    ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
    label = fluid.layers.elementwise_min(
        label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
@@ -36,14 +40,18 @@ def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2, weight=None
            ignore_index=cfg.DATASET.IGNORE_INDEX,
            return_softmax=True)
    else:
        label_one_hot = fluid.one_hot(input=label, depth=num_classes)
        if isinstance(weight, list):
            assert len(
                weight
            ) == num_classes, "weight length must equal num of classes"
            weight = fluid.layers.assign(np.array([weight], dtype='float32'))
        elif isinstance(weight, str):
            assert weight.lower(
            ) == 'dynamic', 'if weight is string, must be dynamic!'
            tmp = []
            total_num = fluid.layers.cast(
                fluid.layers.shape(label)[0], 'float32')
            for i in range(num_classes):
                cls_pixel_num = fluid.layers.reduce_sum(label_one_hot[:, i])
                ratio = total_num / (cls_pixel_num + 1)
@@ -53,9 +61,12 @@ def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2, weight=None
        elif isinstance(weight, fluid.layers.Variable):
            pass
        else:
            raise ValueError(
                'Expect weight is a list, string or Variable, but receive {}'.
                format(type(weight)))
        weight = fluid.layers.reshape(weight, [1, num_classes])
        weighted_label_one_hot = fluid.layers.elementwise_mul(
            label_one_hot, weight)
        probs = fluid.layers.softmax(logit)
        loss = fluid.layers.cross_entropy(
            probs,
@@ -75,10 +86,11 @@ def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2, weight=None
# to change: how to apply the ignore index and ignore mask
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
    if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
        raise Exception(
            "dice loss is only applicable to one-channel classification")
    ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
    label = fluid.layers.transpose(label, [0, 2, 3, 1])
    label = fluid.layers.cast(label, 'int64')
    ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
    logit = fluid.layers.sigmoid(logit)
@@ -88,7 +100,7 @@ def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
    inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
    dice_denominator = fluid.layers.reduce_sum(
        logit, dim=reduce_dim) + fluid.layers.reduce_sum(
            label, dim=reduce_dim)
    dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
    label.stop_gradient = True
    ignore_mask.stop_gradient = True
@@ -103,26 +115,31 @@ def bce_loss(logit, label, ignore_mask=None):
        x=logit,
        label=label,
        ignore_index=cfg.DATASET.IGNORE_INDEX,
        normalize=True)  # or False
    loss = fluid.layers.reduce_sum(loss)
    label.stop_gradient = True
    ignore_mask.stop_gradient = True
    return loss
def multi_softmax_with_loss(logits,
                            label,
                            ignore_mask=None,
                            num_classes=2,
                            weight=None):
    if isinstance(logits, tuple):
        avg_loss = 0
        for i, logit in enumerate(logits):
            if label.shape[2] != logit.shape[2] or label.shape[
                    3] != logit.shape[3]:
                label = fluid.layers.resize_nearest(label, logit.shape[2:])
            logit_mask = (label.astype('int32') !=
                          cfg.DATASET.IGNORE_INDEX).astype('int32')
            loss = softmax_with_loss(logit, label, logit_mask, num_classes)
            avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
    else:
        avg_loss = softmax_with_loss(
            logits, label, ignore_mask, num_classes, weight=weight)
    return avg_loss
......
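For reference, the `'dynamic'` branch visible in the hunk above weights each class by the inverse of its pixel count; a pure-numpy sketch of the ratio it computes:
```python
import numpy as np

total_num = 100.0
cls_pixel_num = np.array([90.0, 10.0])   # pixels per class in the batch
ratio = total_num / (cls_pixel_num + 1)  # ~[1.10, 9.09]: rare class weighs more
```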
@@ -17,10 +17,8 @@ from gray2pseudo_color import get_color_map_list
def parse_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('input_dir', help='input annotated directory')
    return parser.parse_args()
@@ -62,8 +60,7 @@ def main(args):
        print('Generating dataset from:', label_file)
        with open(label_file) as f:
            base = osp.splitext(osp.basename(label_file))[0]
            out_png_file = osp.join(output_dir, base + '.png')
            data = json.load(f)
@@ -77,14 +74,20 @@ def main(args):
                # convert jingling format to labelme format
                points = []
                for i in range(1, int(len(polygon) / 2) + 1):
                    points.append(
                        [polygon['x' + str(i)], polygon['y' + str(i)]])
                shape = {
                    'label': name,
                    'points': points,
                    'shape_type': 'polygon'
                }
                data_shapes.append(shape)
            if 'size' not in data:
                continue
            data_size = data['size']
            img_shape = (data_size['height'], data_size['width'],
                         data_size['depth'])
            lbl = labelme.utils.shapes_to_label(
                img_shape=img_shape,
@@ -102,8 +105,7 @@ def main(args):
            else:
                raise ValueError(
                    '[%s] Cannot save the pixel-wise class label as PNG. '
                    'Please consider using the .npy format.' % out_png_file)
if __name__ == '__main__':
......
@@ -17,10 +17,8 @@ from gray2pseudo_color import get_color_map_list
def parse_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('input_dir', help='input annotated directory')
    return parser.parse_args()
@@ -61,8 +59,7 @@ def main(args):
        print('Generating dataset from:', label_file)
        with open(label_file) as f:
            base = osp.splitext(osp.basename(label_file))[0]
            out_png_file = osp.join(output_dir, base + '.png')
            data = json.load(f)
@@ -85,8 +82,7 @@ def main(args):
            else:
                raise ValueError(
                    '[%s] Cannot save the pixel-wise class label as PNG. '
                    'Please consider using the .npy format.' % out_png_file)
if __name__ == '__main__':
......
@@ -77,14 +77,14 @@ def parse_args():
        help='debug mode, display detail information of training',
        action='store_true')
    parser.add_argument(
        '--use_vdl',
        dest='use_vdl',
        help='whether to record the data during training to VisualDL',
        action='store_true')
    parser.add_argument(
        '--vdl_log_dir',
        dest='vdl_log_dir',
        help='VisualDL logging directory',
        default=None,
        type=str)
    parser.add_argument(
@@ -354,17 +354,17 @@ def train(cfg):
        fetch_list.extend([pred.name, grts.name, masks.name])
        cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
    if args.use_vdl:
        if not args.vdl_log_dir:
            print_info("Please specify the log directory by --vdl_log_dir.")
            exit(1)
        from visualdl import LogWriter
        log_writer = LogWriter(args.vdl_log_dir)
    # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
    # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
    step = 0
    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
        all_step += 1
@@ -398,9 +398,9 @@ def train(cfg):
                    return_numpy=True)
                cm.calculate(pred, grts, masks)
                avg_loss += np.mean(np.array(loss))
                step += 1
                if step % args.log_steps == 0:
                    speed = args.log_steps / timer.elapsed_time()
                    avg_loss /= args.log_steps
                    category_acc, mean_acc = cm.accuracy()
@@ -408,22 +408,22 @@ def train(cfg):
                    print_info((
                        "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
                    ).format(epoch, step, lr[0], avg_loss, mean_acc,
                             mean_iou, speed,
                             calculate_eta(all_step - step, speed)))
                    print_info("Category IoU: ", category_iou)
                    print_info("Category Acc: ", category_acc)
                    if args.use_vdl:
                        log_writer.add_scalar('Train/mean_iou', mean_iou, step)
                        log_writer.add_scalar('Train/mean_acc', mean_acc, step)
                        log_writer.add_scalar('Train/loss', avg_loss, step)
                        log_writer.add_scalar('Train/lr', lr[0], step)
                        log_writer.add_scalar('Train/step/sec', speed, step)
                    sys.stdout.flush()
                    avg_loss = 0.0
                    cm.zero_matrix()
@@ -435,30 +435,30 @@ def train(cfg):
                    fetch_list=fetch_list,
                    return_numpy=True)
                avg_loss += np.mean(np.array(loss))
                step += 1
                if step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
                    avg_loss /= args.log_steps
                    speed = args.log_steps / timer.elapsed_time()
                    print((
                        "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
                    ).format(epoch, step, lr[0], avg_loss, speed,
                             calculate_eta(all_step - step, speed)))
                    if args.use_vdl:
                        log_writer.add_scalar('Train/loss', avg_loss, step)
                        log_writer.add_scalar('Train/lr', lr[0], step)
                        log_writer.add_scalar('Train/speed', speed, step)
                    sys.stdout.flush()
                    avg_loss = 0.0
                    timer.restart()
                # NOTE : used for benchmark, profiler tools
                if args.is_profiler and epoch == 1 and step == args.log_steps:
                    profiler.start_profiler("All")
                elif args.is_profiler and epoch == 1 and step == args.log_steps + 5:
                    profiler.stop_profiler("total", args.profiler_path)
                    return
@@ -479,11 +479,11 @@ def train(cfg):
                ckpt_dir=ckpt_dir,
                use_gpu=args.use_gpu,
                use_mpio=args.use_mpio)
            if args.use_vdl:
                log_writer.add_scalar('Evaluate/mean_iou', mean_iou, step)
                log_writer.add_scalar('Evaluate/mean_acc', mean_acc, step)
            if mean_iou > best_mIoU:
                best_mIoU = mean_iou
@@ -493,8 +493,8 @@ def train(cfg):
                    os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
                    mean_iou))
        # Use VisualDL to visualize results
        if args.use_vdl and cfg.DATASET.VIS_FILE_LIST is not None:
            visualize(
                cfg=cfg,
                use_gpu=args.use_gpu,
......
@@ -56,7 +56,7 @@ cfg.DATASET.VAL_TOTAL_IMAGES = 500
cfg.DATASET.TEST_FILE_LIST = './dataset/cityscapes/test.list'
# Number of test images
cfg.DATASET.TEST_TOTAL_IMAGES = 500
# Dataset visualized by VisualDL
cfg.DATASET.VIS_FILE_LIST = None
# Number of classes (background included)
cfg.DATASET.NUM_CLASSES = 19
......
@@ -162,18 +162,17 @@ def visualize(cfg,
            img_cnt += 1
            print("#{} visualize image path: {}".format(img_cnt, vis_fn))
            # Use VisualDL to visualize image
            if log_writer is not None:
                # Calculate epoch from the ckpt_dir folder name
                epoch = int(os.path.split(ckpt_dir)[-1])
                print("VisualDL visualization epoch", epoch)
                pred_mask_np = np.array(pred_mask.convert("RGB"))
                log_writer.add_image(
                    "Predict/{}".format(img_name),
                    pred_mask_np,
                    epoch)
                # Original image
                # BGR->RGB
                img = cv2.imread(
@@ -181,8 +180,7 @@ def visualize(cfg,
                log_writer.add_image(
                    "Images/{}".format(img_name),
                    img,
                    epoch)
                # add ground truth (label) images
                grt = grts[i]
                if grt is not None:
@@ -194,8 +192,7 @@ def visualize(cfg,
                log_writer.add_image(
                    "Label/{}".format(img_name),
                    grt,
                    epoch)
                # If in local_test mode, only visualize 5 images just for testing
                # procedure
......
@@ -2,11 +2,4 @@ pre-commit
yapf == 0.26.0
flake8
pyyaml >= 5.1
visualdl == 2.0.0-alpha.2
@@ -87,14 +87,14 @@ def parse_args():
        help='debug mode, display detail information of training',
        action='store_true')
    parser.add_argument(
        '--use_vdl',
        dest='use_vdl',
        help='whether to record the data during training to VisualDL',
        action='store_true')
    parser.add_argument(
        '--vdl_log_dir',
        dest='vdl_log_dir',
        help='VisualDL logging directory',
        default=None,
        type=str)
    parser.add_argument(
@@ -409,17 +409,17 @@ def train(cfg):
        fetch_list.extend([pred.name, grts.name, masks.name])
        cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
    if args.use_vdl:
        if not args.vdl_log_dir:
            print_info("Please specify the log directory by --vdl_log_dir.")
            exit(1)
        from visualdl import LogWriter
        log_writer = LogWriter(args.vdl_log_dir)
    # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
    # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
    step = 0
    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
        all_step += 1
@@ -455,9 +455,9 @@ def train(cfg):
                    return_numpy=True)
                cm.calculate(pred, grts, masks)
                avg_loss += np.mean(np.array(loss))
                step += 1
                if step % args.log_steps == 0:
                    speed = args.log_steps / timer.elapsed_time()
                    avg_loss /= args.log_steps
                    category_acc, mean_acc = cm.accuracy()
@@ -465,22 +465,22 @@ def train(cfg):
                    print_info((
                        "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
                    ).format(epoch, step, lr[0], avg_loss, mean_acc,
                             mean_iou, speed,
                             calculate_eta(all_step - step, speed)))
                    print_info("Category IoU: ", category_iou)
                    print_info("Category Acc: ", category_acc)
                    if args.use_vdl:
                        log_writer.add_scalar('Train/mean_iou', mean_iou, step)
                        log_writer.add_scalar('Train/mean_acc', mean_acc, step)
                        log_writer.add_scalar('Train/loss', avg_loss, step)
                        log_writer.add_scalar('Train/lr', lr[0], step)
                        log_writer.add_scalar('Train/step/sec', speed, step)
                    sys.stdout.flush()
                    avg_loss = 0.0
                    cm.zero_matrix()
@@ -494,25 +494,25 @@ def train(cfg):
                avg_loss += np.mean(np.array(loss))
                avg_t_loss += np.mean(np.array(t_loss))
                avg_d_loss += np.mean(np.array(d_loss))
                step += 1
                if step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
                    avg_loss /= args.log_steps
                    avg_t_loss /= args.log_steps
                    avg_d_loss /= args.log_steps
                    speed = args.log_steps / timer.elapsed_time()
                    print((
                        "epoch={} step={} lr={:.5f} loss={:.4f} teacher loss={:.4f} distill loss={:.4f} step/sec={:.3f} | ETA {}"
                    ).format(epoch, step, lr[0], avg_loss,
                             avg_t_loss, avg_d_loss, speed,
                             calculate_eta(all_step - step, speed)))
                    if args.use_vdl:
                        log_writer.add_scalar('Train/loss', avg_loss, step)
                        log_writer.add_scalar('Train/lr', lr[0], step)
                        log_writer.add_scalar('Train/speed', speed, step)
                    sys.stdout.flush()
                    avg_loss = 0.0
                    avg_t_loss = 0.0
@@ -536,11 +536,11 @@ def train(cfg):
                ckpt_dir=ckpt_dir,
                use_gpu=args.use_gpu,
                use_mpio=args.use_mpio)
            if args.use_vdl:
                log_writer.add_scalar('Evaluate/mean_iou', mean_iou, step)
                log_writer.add_scalar('Evaluate/mean_acc', mean_acc, step)
            if mean_iou > best_mIoU:
                best_mIoU = mean_iou
@@ -550,8 +550,8 @@ def train(cfg):
                    os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
                    mean_iou))
        # Use VisualDL to visualize results
        if args.use_vdl and cfg.DATASET.VIS_FILE_LIST is not None:
            visualize(
                cfg=cfg,
                use_gpu=args.use_gpu,
......
@@ -87,14 +87,14 @@ def parse_args():
        help='debug mode, display detail information of training',
        action='store_true')
    parser.add_argument(
        '--use_vdl',
        dest='use_vdl',
        help='whether to record the data during training to VisualDL',
        action='store_true')
    parser.add_argument(
        '--vdl_log_dir',
        dest='vdl_log_dir',
        help='VisualDL logging directory',
        default=None,
        type=str)
    parser.add_argument(
......
@@ -83,14 +83,14 @@ def parse_args():
        help='debug mode, display detail information of training',
        action='store_true')
    parser.add_argument(
        '--use_vdl',
        dest='use_vdl',
        help='whether to record the data during training to VisualDL',
        action='store_true')
    parser.add_argument(
        '--vdl_log_dir',
        dest='vdl_log_dir',
        help='VisualDL logging directory',
        default=None,
        type=str)
    parser.add_argument(
@@ -335,13 +335,13 @@ def train(cfg):
        fetch_list.extend([pred.name, grts.name, masks.name])
        cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
    if args.use_vdl:
        if not args.vdl_log_dir:
            print_info("Please specify the log directory by --vdl_log_dir.")
            exit(1)
        from visualdl import LogWriter
        log_writer = LogWriter(args.vdl_log_dir)
    pruner = Pruner()
    train_prog = pruner.prune(
@@ -357,7 +357,7 @@ def train(cfg):
        exec_strategy=exec_strategy,
        build_strategy=build_strategy)
    step = 0
    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
        all_step += 1
@@ -389,9 +389,9 @@ def train(cfg):
                    return_numpy=True)
                cm.calculate(pred, grts, masks)
                avg_loss += np.mean(np.array(loss))
                step += 1
                if step % args.log_steps == 0:
                    speed = args.log_steps / timer.elapsed_time()
                    avg_loss /= args.log_steps
                    category_acc, mean_acc = cm.accuracy()
@@ -399,22 +399,22 @@ def train(cfg):
                    print_info((
                        "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
                    ).format(epoch, step, lr[0], avg_loss, mean_acc,
                             mean_iou, speed,
                             calculate_eta(all_step - step, speed)))
                    print_info("Category IoU: ", category_iou)
                    print_info("Category Acc: ", category_acc)
                    if args.use_vdl:
                        log_writer.add_scalar('Train/mean_iou', mean_iou, step)
                        log_writer.add_scalar('Train/mean_acc', mean_acc, step)
                        log_writer.add_scalar('Train/loss', avg_loss, step)
                        log_writer.add_scalar('Train/lr', lr[0], step)
                        log_writer.add_scalar('Train/step/sec', speed, step)
                    sys.stdout.flush()
                    avg_loss = 0.0
                    cm.zero_matrix()
@@ -426,22 +426,22 @@ def train(cfg):
                    fetch_list=fetch_list,
                    return_numpy=True)
                avg_loss += np.mean(np.array(loss))
                step += 1
                if step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
                    avg_loss /= args.log_steps
                    speed = args.log_steps / timer.elapsed_time()
                    print((
                        "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
                    ).format(epoch, step, lr[0], avg_loss, speed,
                             calculate_eta(all_step - step, speed)))
                    if args.use_vdl:
                        log_writer.add_scalar('Train/loss', avg_loss, step)
                        log_writer.add_scalar('Train/lr', lr[0], step)
                        log_writer.add_scalar('Train/speed', speed, step)
                    sys.stdout.flush()
                    avg_loss = 0.0
                    timer.restart()
@@ -463,14 +463,14 @@ def train(cfg):
                ckpt_dir=ckpt_dir,
                use_gpu=args.use_gpu,
                use_mpio=args.use_mpio)
            if args.use_vdl:
                log_writer.add_scalar('Evaluate/mean_iou', mean_iou, step)
                log_writer.add_scalar('Evaluate/mean_acc', mean_acc, step)
        # Use VisualDL to visualize results
        if args.use_vdl and cfg.DATASET.VIS_FILE_LIST is not None:
            visualize(
                cfg=cfg,
                use_gpu=args.use_gpu,
......