Commit b953f871 authored by Channingss

merge paddle/develop

@@ -3,4 +3,4 @@ name: 4. PaddleX GUI usage questions
about: Paddle GUI client usage questions
---
PaddleX GUI: https://www.paddlepaddle.org.cn/paddle/paddleX (please keep this line in the issue body)
<p align="center"> <p align="center">
<img src="./docs/gui/images/paddlex.png" width="360" height ="55" alt="PaddleX" align="middle" /> <img src="./docs/gui/images/paddlex.png" width="360" height ="55" alt="PaddleX" align="middle" />
</p> </p>
<p align= "center"> PaddleX -- 飞桨全流程开发套件,以低代码的形式支持开发者快速实现产业实际项目落地 </p>
[![License](https://img.shields.io/badge/license-Apache%202-red.svg)](LICENSE) [![License](https://img.shields.io/badge/license-Apache%202-red.svg)](LICENSE)
[![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleX.svg)](https://github.com/PaddlePaddle/PaddleX/releases) [![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleX.svg)](https://github.com/PaddlePaddle/PaddleX/releases)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg) ![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg) ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
![QQGroup](https://img.shields.io/badge/QQ_Group-1045148026-52B6EF?style=social&logo=tencent-qq&logoColor=000&logoWidth=20) ![QQGroup](https://img.shields.io/badge/QQ_Group-1045148026-52B6EF?style=social&logo=tencent-qq&logoColor=000&logoWidth=20)
PaddleX integrates PaddlePaddle's intelligent-vision capabilities for **image classification**, **object detection**, **semantic segmentation**, and **instance segmentation**, connects the whole deep-learning workflow end to end from **data preparation** through **model training and optimization** to **multi-platform deployment**, and provides a **unified task API** together with a **graphical development UI demo**. Developers no longer need to install separate kits and can complete the full PaddlePaddle workflow in **low-code** form.

**PaddleX** has been validated in real application scenarios across more than ten industries, including **quality inspection**, **security**, **facility inspection**, **remote sensing**, **retail**, and **healthcare**; it distills this hands-on industrial experience and **provides rich case-study tutorials** to help developers bring their own projects to production.
## Installation

**PaddleX provides two development modes to meet different user needs:**

1. **Python development mode:**

   Through concise and easy-to-understand Python APIs, PaddleX gives developers the smoothest deep-learning development experience while balancing functional completeness, development flexibility, and ease of integration.<br>

   **Prerequisites**
   > - paddlepaddle >= 1.8.0
   > - python >= 3.6
   > - cython
   > - pycocotools

@@ -59,12 +40,29 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple

For detailed installation instructions, see [PaddleX Installation](https://paddlex.readthedocs.io/zh_CN/develop/install.html).

2. **Paddle GUI mode:**

   A no-code visual client implemented on top of the Paddle API; it lets developers quickly validate industrial projects and serves as a reference for building their own deep-learning software and applications.

- Visit the [PaddleX official website](https://www.paddlepaddle.org.cn/paddle/paddlex) to request and download the one-click portable PaddleX GUI installer.
- See the [PaddleX GUI tutorial](./docs/gui/how_to_use.md) for details on using the PaddleX GUI.
## Product Modules

- **Data preparation**: compatible with common dataset protocols such as ImageNet, VOC, and COCO, and seamlessly integrated with Labelme, 精灵标注助手 (Colabeler), and the [EasyData intelligent data service platform](https://ai.baidu.com/easydata/), helping developers finish data preparation faster.
- **Data preprocessing and augmentation**: provides Transforms, a minimal set of image preprocessing and augmentation operators, adapts the imgaug augmentation library, and supports **hundreds of augmentation strategies**, letting developers quickly mitigate the shortage of training samples.
- **Model training**: integrates the [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), and [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) vision development kits and provides a large collection of carefully selected, industry-proven, high-quality pretrained models, so developers reach industrial-grade model quality faster (a minimal end-to-end sketch follows this list).
- **Model tuning**: a built-in model interpretability module and the [VisualDL](https://github.com/PaddlePaddle/VisualDL) visual analysis tool give developers a more intuitive view of a model's feature-extraction regions and of how parameters change during training, so models can be optimized quickly.
- **Multi-platform secure deployment**: the built-in [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) model compression tool and **encrypted model deployment module** connect seamlessly with the native Paddle Inference library and the high-performance edge inference engine [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite), enabling fast, high-performance, secure deployment across platforms.
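The modules above compose into one short training script. Below is a minimal sketch, assuming the PaddleX 1.x Python API documented in this repository (paddlex.cls.transforms, paddlex.datasets.ImageNet, paddlex.cls.MobileNetV2); the dataset directory and list files are hypothetical placeholders.

```python
import paddlex as pdx
from paddlex.cls import transforms

# Data preprocessing and augmentation (Transforms module)
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224),
    transforms.Normalize()])

# Data preparation: ImageNet-style file lists (hypothetical paths)
train_dataset = pdx.datasets.ImageNet(
    data_dir='dataset', file_list='dataset/train_list.txt',
    label_list='dataset/labels.txt', transforms=train_transforms)
eval_dataset = pdx.datasets.ImageNet(
    data_dir='dataset', file_list='dataset/val_list.txt',
    label_list='dataset/labels.txt', transforms=eval_transforms)

# Model training; use_vdl=True writes VisualDL logs for model tuning
model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
model.train(num_epochs=10,
            train_dataset=train_dataset,
            train_batch_size=32,
            eval_dataset=eval_dataset,
            save_dir='output/mobilenetv2',
            use_vdl=True)
```

The trained model can then be exported and served through the multi-platform deployment path described above.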
## Full Documentation and API Reference

@@ -74,7 +72,7 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple
- [PaddleX model training tutorials](https://paddlex.readthedocs.io/zh_CN/develop/train/index.html)
- [PaddleX API reference](https://paddlex.readthedocs.io/zh_CN/develop/apis/index.html)

### Online Project Examples

To help developers master the PaddleX API faster, we have built a series of complete example tutorials; through the AIStudio one-stop development platform you can quickly run PaddleX projects online.

@@ -83,15 +81,36 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple
- [PaddleX Quick Start: Faster-RCNN AI insect recognition](https://aistudio.baidu.com/aistudio/projectdetail/439888)
- [PaddleX Quick Start: DeepLabv3+ optic disc segmentation](https://aistudio.baidu.com/aistudio/projectdetail/440197)
## End-to-End Industrial Application Cases

(continuously updated)

* Industrial inspection:
  * [Industrial meter reading](https://paddlex.readthedocs.io/zh_CN/develop/examples/meter_reader.html)
* Industrial quality control:
  * Battery separator defect detection (coming soon)
* [Portrait segmentation](https://paddlex.readthedocs.io/zh_CN/develop/examples/human_segmentation.html)
## [FAQ](./docs/gui/faq.md)

## Communication and Feedback

- Project website: https://www.paddlepaddle.org.cn/paddle/paddlex
- PaddleX user QQ group: 957286141 (scan the QR code below with mobile QQ to join quickly)

![](./docs/gui/images/QR2.jpg)

## Changelog

> [Release history and changes](https://paddlex.readthedocs.io/zh_CN/develop/change_log.html)

- 2020.07.13 v1.1.0
@@ -99,6 +118,8 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple
- 2020.05.20 v1.0.0
- 2020.05.17 v0.1.8

## Contributing

We warmly welcome code contributions and usage suggestions for PaddleX. If you can fix an issue or add a new feature, feel free to submit a pull request.
@@ -17,7 +17,6 @@ SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(ENCRYPTION_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
if (NOT WIN32)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
@@ -51,7 +50,9 @@ endmacro()
if (WITH_ENCRYPTION)
if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
add_definitions( -DWITH_ENCRYPTION=${WITH_ENCRYPTION})
endif()
endif()
if (WITH_MKL)
@@ -62,8 +63,10 @@ if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
@@ -111,10 +114,17 @@ if (WIN32)
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
unset(OpenCV_DIR CACHE)
else ()
if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64") # x86_64 aarch64
set(OpenCV_INCLUDE_DIRS "/usr/include/opencv4")
file(GLOB OpenCV_LIBS /usr/lib/aarch64-linux-gnu/libopencv_*${CMAKE_SHARED_LIBRARY_SUFFIX})
message("OpenCV libs: ${OpenCV_LIBS}")
else()
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
endif()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
endif ()
include_directories(${OpenCV_INCLUDE_DIRS})
if (WIN32)
@@ -260,9 +270,11 @@ endif()
if(WITH_ENCRYPTION)
if(NOT WIN32)
if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
include_directories("${ENCRYPTION_DIR}/include")
link_directories("${ENCRYPTION_DIR}/lib")
set(DEPS ${DEPS} ${ENCRYPTION_DIR}/lib/libpmodel-decrypt${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
else()
include_directories("${ENCRYPTION_DIR}/include")
link_directories("${ENCRYPTION_DIR}/lib")
@@ -276,6 +288,7 @@ if (NOT WIN32)
endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
add_library(paddlex_inference SHARED src/visualize.cpp src/transforms.cpp src/paddlex.cpp)
ADD_DEPENDENCIES(paddlex_inference ext-yaml-cpp)
target_link_libraries(paddlex_inference ${DEPS})
@@ -292,6 +305,19 @@ add_executable(segmenter demo/segmenter.cpp src/transforms.cpp src/paddlex.cpp s
ADD_DEPENDENCIES(segmenter ext-yaml-cpp)
target_link_libraries(segmenter ${DEPS})
add_executable(video_classifier demo/video_classifier.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
ADD_DEPENDENCIES(video_classifier ext-yaml-cpp)
target_link_libraries(video_classifier ${DEPS})
add_executable(video_detector demo/video_detector.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
ADD_DEPENDENCIES(video_detector ext-yaml-cpp)
target_link_libraries(video_detector ${DEPS})
add_executable(video_segmenter demo/video_segmenter.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
ADD_DEPENDENCIES(video_segmenter ext-yaml-cpp)
target_link_libraries(video_segmenter ${DEPS})
if (WIN32 AND WITH_MKL)
add_custom_command(TARGET classifier POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
@@ -313,7 +339,27 @@ if (WIN32 AND WITH_MKL)
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
)
add_custom_command(TARGET video_classifier POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
)
add_custom_command(TARGET video_detector POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
)
add_custom_command(TARGET video_segmenter POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
)
# for encryption
if (EXISTS "${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll")
@@ -329,6 +375,18 @@ if (WIN32 AND WITH_MKL)
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
)
add_custom_command(TARGET video_classifier POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
)
add_custom_command(TARGET video_detector POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
)
add_custom_command(TARGET video_segmenter POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
)
endif()
endif()
......
@@ -37,7 +37,6 @@ DEFINE_int32(batch_size, 1, "Batch size of infering");
DEFINE_int32(thread_num,
omp_get_num_procs(),
"Number of preprocessing threads");
int main(int argc, char** argv) {
// Parsing command-line
@@ -52,16 +51,15 @@ int main(int argc, char** argv) {
return -1;
}
// Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
FLAGS_key);
// Predict
int imgs = 1;
if (FLAGS_image_list != "") {
std::ifstream inf(FLAGS_image_list);
@@ -69,7 +67,7 @@ int main(int argc, char** argv) {
std::cerr << "Fail to open file " << FLAGS_image_list << std::endl;
return -1;
}
// Mini-batch predict
std::string image_path;
std::vector<std::string> image_paths;
while (getline(inf, image_path)) {
@@ -77,7 +75,7 @@ int main(int argc, char** argv) {
}
imgs = image_paths.size();
for (int i = 0; i < image_paths.size(); i += FLAGS_batch_size) {
// Read image
int im_vec_size =
std::min(static_cast<int>(image_paths.size()), i + FLAGS_batch_size);
std::vector<cv::Mat> im_vec(im_vec_size - i);
......
@@ -43,10 +43,9 @@ DEFINE_double(threshold,
DEFINE_int32(thread_num,
omp_get_num_procs(),
"Number of preprocessing threads");
int main(int argc, char** argv) {
// Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
@@ -57,17 +56,16 @@ int main(int argc, char** argv) {
std::cerr << "--image or --image_list need to be defined" << std::endl;
return -1;
}
// Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
FLAGS_key);
int imgs = 1;
std::string save_dir = "output";
// Predict
if (FLAGS_image_list != "") {
std::ifstream inf(FLAGS_image_list);
if (!inf) {
@@ -92,7 +90,7 @@ int main(int argc, char** argv) {
im_vec[j - i] = std::move(cv::imread(image_paths[j], 1));
}
model.predict(im_vec, &results, thread_num);
// Output predicted bounding boxes
for (int j = 0; j < im_vec_size - i; ++j) {
for (int k = 0; k < results[j].boxes.size(); ++k) {
std::cout << "image file: " << image_paths[i + j] << ", ";
@@ -106,7 +104,7 @@ int main(int argc, char** argv) {
<< results[j].boxes[k].coordinate[3] << ")" << std::endl;
}
}
// Visualize results
for (int j = 0; j < im_vec_size - i; ++j) {
cv::Mat vis_img = PaddleX::Visualize(
im_vec[j], results[j], model.labels, FLAGS_threshold);
@@ -120,7 +118,7 @@ int main(int argc, char** argv) {
PaddleX::DetResult result;
cv::Mat im = cv::imread(FLAGS_image, 1);
model.predict(im, &result);
// Output predicted bounding boxes
for (int i = 0; i < result.boxes.size(); ++i) {
std::cout << "image file: " << FLAGS_image << std::endl;
std::cout << ", predict label: " << result.boxes[i].category
@@ -132,7 +130,7 @@ int main(int argc, char** argv) {
<< result.boxes[i].coordinate[3] << ")" << std::endl;
}
// Visualize results
cv::Mat vis_img =
PaddleX::Visualize(im, result, model.labels, FLAGS_threshold);
std::string save_path =
......
@@ -39,10 +39,9 @@ DEFINE_int32(batch_size, 1, "Batch size of infering");
DEFINE_int32(thread_num,
omp_get_num_procs(),
"Number of preprocessing threads");
int main(int argc, char** argv) {
// Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
@@ -54,16 +53,15 @@ int main(int argc, char** argv) {
return -1;
}
// Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
FLAGS_key);
int imgs = 1;
// Predict
if (FLAGS_image_list != "") {
std::ifstream inf(FLAGS_image_list);
if (!inf) {
@@ -88,7 +86,7 @@ int main(int argc, char** argv) {
im_vec[j - i] = std::move(cv::imread(image_paths[j], 1));
}
model.predict(im_vec, &results, thread_num);
// Visualize results
for (int j = 0; j < im_vec_size - i; ++j) {
cv::Mat vis_img =
PaddleX::Visualize(im_vec[j], results[j], model.labels);
@@ -102,7 +100,7 @@ int main(int argc, char** argv) {
PaddleX::SegResult result;
cv::Mat im = cv::imread(FLAGS_image, 1);
model.predict(im, &result);
// Visualize results
cv::Mat vis_img = PaddleX::Visualize(im, result, model.labels);
std::string save_path =
PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_image);
......
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <glog/logging.h>
#include <omp.h>
#include <algorithm>
#include <chrono> // NOLINT
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <utility>
#include "include/paddlex/paddlex.h"
#include "include/paddlex/visualize.h"
#if defined(__arm__) || defined(__aarch64__)
#include <opencv2/videoio/legacy/constants_c.h>
#endif
using namespace std::chrono; // NOLINT
DEFINE_string(model_dir, "", "Path of inference model");
DEFINE_bool(use_gpu, false, "Infering with GPU or CPU");
DEFINE_bool(use_trt, false, "Infering with TensorRT");
DEFINE_int32(gpu_id, 0, "GPU card id");
DEFINE_string(key, "", "key of encryption");
DEFINE_bool(use_camera, false, "Infering with Camera");
DEFINE_int32(camera_id, 0, "Camera id");
DEFINE_string(video_path, "", "Path of input video");
DEFINE_bool(show_result, false, "show the result of each frame with a window");
DEFINE_bool(save_result, true, "save the result of each frame to a video");
DEFINE_string(save_dir, "output", "Path to save visualized image");
int main(int argc, char** argv) {
// Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
std::cerr << "--model_dir need to be defined" << std::endl;
return -1;
}
if (FLAGS_video_path == "" & FLAGS_use_camera == false) {
std::cerr << "--video_path or --use_camera need to be defined" << std::endl;
return -1;
}
// Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
FLAGS_key);
// Open video
cv::VideoCapture capture;
if (FLAGS_use_camera) {
capture.open(FLAGS_camera_id);
if (!capture.isOpened()) {
std::cout << "Can not open the camera "
<< FLAGS_camera_id << "."
<< std::endl;
return -1;
}
} else {
capture.open(FLAGS_video_path);
if (!capture.isOpened()) {
std::cout << "Can not open the video "
<< FLAGS_video_path << "."
<< std::endl;
return -1;
}
}
// Create a VideoWriter
cv::VideoWriter video_out;
std::string video_out_path;
if (FLAGS_save_result) {
// Get video information: resolution, fps
int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
int video_fourcc;
if (FLAGS_use_camera) {
video_fourcc = 828601953;  // 828601953 decodes to FourCC "avc1" (H.264)
} else {
video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
}
if (FLAGS_use_camera) {
time_t now = time(0);
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir,
std::to_string(now) + ".mp4");
} else {
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
}
video_out.open(video_out_path.c_str(),
video_fourcc,
video_fps,
cv::Size(video_width, video_height),
true);
if (!video_out.isOpened()) {
std::cout << "Create video writer failed!" << std::endl;
return -1;
}
}
PaddleX::ClsResult result;
cv::Mat frame;
int key;
while (capture.read(frame)) {
if (FLAGS_show_result || FLAGS_use_camera) {
key = cv::waitKey(1);
// When `ESC` is pressed, exit the program; the result video is saved
if (key == 27) {
break;
}
} else if (frame.empty()) {
break;
}
// Begin to predict
model.predict(frame, &result);
// Visualize results
cv::Mat vis_img = frame.clone();
auto colormap = PaddleX::GenerateColorMap(model.labels.size());
int c1 = colormap[3 * result.category_id + 0];
int c2 = colormap[3 * result.category_id + 1];
int c3 = colormap[3 * result.category_id + 2];
cv::Scalar text_color = cv::Scalar(c1, c2, c3);
std::string text = result.category;
text += std::to_string(static_cast<int>(result.score * 100)) + "%";
int font_face = cv::FONT_HERSHEY_SIMPLEX;
double font_scale = 0.5f;
float thickness = 0.5;
cv::Size text_size =
cv::getTextSize(text, font_face, font_scale, thickness, nullptr);
cv::Point origin;
origin.x = frame.cols / 2;
origin.y = frame.rows / 2;
cv::Rect text_back = cv::Rect(origin.x,
origin.y - text_size.height,
text_size.width,
text_size.height);
cv::rectangle(vis_img, text_back, text_color, -1);
cv::putText(vis_img,
text,
origin,
font_face,
font_scale,
cv::Scalar(255, 255, 255),
thickness);
if (FLAGS_show_result || FLAGS_use_camera) {
cv::imshow("video_classifier", vis_img);
}
if (FLAGS_save_result) {
video_out.write(vis_img);
}
std::cout << "Predict label: " << result.category
<< ", label_id:" << result.category_id
<< ", score: " << result.score << std::endl;
}
capture.release();
if (FLAGS_save_result) {
video_out.release();
std::cout << "Visualized output saved as " << video_out_path << std::endl;
}
if (FLAGS_show_result || FLAGS_use_camera) {
cv::destroyAllWindows();
}
return 0;
}
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <glog/logging.h>
#include <omp.h>
#include <algorithm>
#include <chrono> // NOLINT
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <utility>
#include "include/paddlex/paddlex.h"
#include "include/paddlex/visualize.h"
#if defined(__arm__) || defined(__aarch64__)
#include <opencv2/videoio/legacy/constants_c.h>
#endif
using namespace std::chrono; // NOLINT
DEFINE_string(model_dir, "", "Path of inference model");
DEFINE_bool(use_gpu, false, "Infering with GPU or CPU");
DEFINE_bool(use_trt, false, "Infering with TensorRT");
DEFINE_int32(gpu_id, 0, "GPU card id");
DEFINE_bool(use_camera, false, "Infering with Camera");
DEFINE_int32(camera_id, 0, "Camera id");
DEFINE_string(video_path, "", "Path of input video");
DEFINE_bool(show_result, false, "show the result of each frame with a window");
DEFINE_bool(save_result, true, "save the result of each frame to a video");
DEFINE_string(key, "", "key of encryption");
DEFINE_string(save_dir, "output", "Path to save visualized image");
DEFINE_double(threshold,
0.5,
"The minimum scores of target boxes which are shown");
int main(int argc, char** argv) {
// Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
std::cerr << "--model_dir need to be defined" << std::endl;
return -1;
}
if (FLAGS_video_path == "" & FLAGS_use_camera == false) {
std::cerr << "--video_path or --use_camera need to be defined" << std::endl;
return -1;
}
// Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
FLAGS_key);
// Open video
cv::VideoCapture capture;
if (FLAGS_use_camera) {
capture.open(FLAGS_camera_id);
if (!capture.isOpened()) {
std::cout << "Can not open the camera "
<< FLAGS_camera_id << "."
<< std::endl;
return -1;
}
} else {
capture.open(FLAGS_video_path);
if (!capture.isOpened()) {
std::cout << "Can not open the video "
<< FLAGS_video_path << "."
<< std::endl;
return -1;
}
}
// Create a VideoWriter
cv::VideoWriter video_out;
std::string video_out_path;
if (FLAGS_save_result) {
// Get video information: resolution, fps
int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
int video_fourcc;
if (FLAGS_use_camera) {
video_fourcc = 828601953;  // 828601953 decodes to FourCC "avc1" (H.264)
} else {
video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
}
if (FLAGS_use_camera) {
time_t now = time(0);
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir,
std::to_string(now) + ".mp4");
} else {
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
}
video_out.open(video_out_path.c_str(),
video_fourcc,
video_fps,
cv::Size(video_width, video_height),
true);
if (!video_out.isOpened()) {
std::cout << "Create video writer failed!" << std::endl;
return -1;
}
}
PaddleX::DetResult result;
cv::Mat frame;
int key;
while (capture.read(frame)) {
if (FLAGS_show_result || FLAGS_use_camera) {
key = cv::waitKey(1);
// When `ESC` is pressed, exit the program; the result video is saved
if (key == 27) {
break;
}
} else if (frame.empty()) {
break;
}
// Begin to predict
model.predict(frame, &result);
// Visualize results
cv::Mat vis_img =
PaddleX::Visualize(frame, result, model.labels, FLAGS_threshold);
if (FLAGS_show_result || FLAGS_use_camera) {
cv::imshow("video_detector", vis_img);
}
if (FLAGS_save_result) {
video_out.write(vis_img);
}
result.clear();
}
capture.release();
if (FLAGS_save_result) {
std::cout << "Visualized output saved as " << video_out_path << std::endl;
video_out.release();
}
if (FLAGS_show_result || FLAGS_use_camera) {
cv::destroyAllWindows();
}
return 0;
}
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <glog/logging.h>
#include <omp.h>
#include <algorithm>
#include <chrono> // NOLINT
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <utility>
#include <ctime>
#include "include/paddlex/paddlex.h"
#include "include/paddlex/visualize.h"
#if defined(__arm__) || defined(__aarch64__)
#include <opencv2/videoio/legacy/constants_c.h>
#endif
using namespace std::chrono; // NOLINT
DEFINE_string(model_dir, "", "Path of inference model");
DEFINE_bool(use_gpu, false, "Infering with GPU or CPU");
DEFINE_bool(use_trt, false, "Infering with TensorRT");
DEFINE_int32(gpu_id, 0, "GPU card id");
DEFINE_string(key, "", "key of encryption");
DEFINE_bool(use_camera, false, "Infering with Camera");
DEFINE_int32(camera_id, 0, "Camera id");
DEFINE_string(video_path, "", "Path of input video");
DEFINE_bool(show_result, false, "show the result of each frame with a window");
DEFINE_bool(save_result, true, "save the result of each frame to a video");
DEFINE_string(save_dir, "output", "Path to save visualized image");
int main(int argc, char** argv) {
// Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
std::cerr << "--model_dir need to be defined" << std::endl;
return -1;
}
if (FLAGS_video_path == "" & FLAGS_use_camera == false) {
std::cerr << "--video_path or --use_camera need to be defined" << std::endl;
return -1;
}
// Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
FLAGS_key);
// Open video
cv::VideoCapture capture;
if (FLAGS_use_camera) {
capture.open(FLAGS_camera_id);
if (!capture.isOpened()) {
std::cout << "Can not open the camera "
<< FLAGS_camera_id << "."
<< std::endl;
return -1;
}
} else {
capture.open(FLAGS_video_path);
if (!capture.isOpened()) {
std::cout << "Can not open the video "
<< FLAGS_video_path << "."
<< std::endl;
return -1;
}
}
// Create a VideoWriter
cv::VideoWriter video_out;
std::string video_out_path;
if (FLAGS_save_result) {
// Get video information: resolution, fps
int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
int video_fourcc;
if (FLAGS_use_camera) {
video_fourcc = 828601953;  // 828601953 decodes to FourCC "avc1" (H.264)
} else {
video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
}
if (FLAGS_use_camera) {
time_t now = time(0);
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir,
std::to_string(now) + ".mp4");
} else {
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
}
video_out.open(video_out_path.c_str(),
video_fourcc,
video_fps,
cv::Size(video_width, video_height),
true);
if (!video_out.isOpened()) {
std::cout << "Create video writer failed!" << std::endl;
return -1;
}
}
PaddleX::SegResult result;
cv::Mat frame;
int key;
while (capture.read(frame)) {
if (FLAGS_show_result || FLAGS_use_camera) {
key = cv::waitKey(1);
// When `ESC` is pressed, exit the program; the result video is saved
if (key == 27) {
break;
}
} else if (frame.empty()) {
break;
}
// Begin to predict
model.predict(frame, &result);
// Visualize results
cv::Mat vis_img = PaddleX::Visualize(frame, result, model.labels);
if (FLAGS_show_result || FLAGS_use_camera) {
cv::imshow("video_segmenter", vis_img);
}
if (FLAGS_save_result) {
video_out.write(vis_img);
}
result.clear();
}
capture.release();
if (FLAGS_save_result) {
video_out.release();
std::cout << "Visualized output saved as " << video_out_path << std::endl;
}
if (FLAGS_show_result || FLAGS_use_camera) {
cv::destroyAllWindows();
}
return 0;
}
@@ -23,9 +23,9 @@
#else  // Linux/Unix
#include <dirent.h>
// #include <sys/io.h>
#if defined(__arm__) || defined(__aarch64__)  // for arm
#include <aarch64-linux-gnu/sys/stat.h>
#include <aarch64-linux-gnu/sys/types.h>
#else
#include <sys/stat.h>
#include <sys/types.h>
......
@@ -7,12 +7,12 @@ if [ ! -d "./paddlex-encryption" ]; then
fi
# download pre-compiled opencv lib
OPENCV_URL=https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2
if [ ! -d "./deps/opencv3.4.6gcc4.8ffmpeg/" ]; then
mkdir -p deps
cd deps
wget -c ${OPENCV_URL}
tar xvfj opencv3.4.6gcc4.8ffmpeg.tar.gz2
rm -rf opencv3.4.6gcc4.8ffmpeg.tar.gz2
cd ..
fi
@@ -24,7 +24,7 @@ ENCRYPTION_DIR=$(pwd)/paddlex-encryption
# OpenCV path; no change needed when using the bundled pre-compiled version
sh $(pwd)/scripts/bootstrap.sh  # download the pre-compiled OpenCV
OPENCV_DIR=$(pwd)/deps/opencv3.4.6gcc4.8ffmpeg/
# no changes needed below
rm -rf build
@@ -42,4 +42,4 @@ cmake .. \
-DCUDNN_LIB=${CUDNN_LIB} \
-DENCRYPTION_DIR=${ENCRYPTION_DIR} \
-DOPENCV_DIR=${OPENCV_DIR}
make -j16
# download pre-compiled opencv lib
OPENCV_URL=https://bj.bcebos.com/paddlex/deploy/tools/opencv3_aarch.tgz
if [ ! -d "./deps/opencv3" ]; then
mkdir -p deps
cd deps
wget -c ${OPENCV_URL}
tar xvfz opencv3_aarch.tgz
rm -rf opencv3_aarch.tgz
cd ..
fi
@@ -14,14 +14,7 @@ WITH_STATIC_LIB=OFF
# CUDA lib path
CUDA_LIB=/usr/local/cuda/lib64
# cuDNN lib path
CUDNN_LIB=/usr/lib/aarch64-linux-gnu
# no changes needed below
rm -rf build
@@ -31,12 +24,9 @@ cmake .. \
-DWITH_GPU=${WITH_GPU} \
-DWITH_MKL=${WITH_MKL} \
-DWITH_TENSORRT=${WITH_TENSORRT} \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DPADDLE_DIR=${PADDLE_DIR} \
-DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB}
make
@@ -65,7 +65,11 @@ void Model::create_predictor(const std::string& model_dir,
config.SwitchUseFeedFetchOps(false);
config.SwitchSpecifyInputNames(true);
// enable graph optimization
#if defined(__arm__) || defined(__aarch64__)
config.SwitchIrOptim(false);
#else
config.SwitchIrOptim(use_ir_optim);
#endif
// enable memory optimization
config.EnableMemoryOptim();
if (use_trt) {
......
@@ -15,6 +15,7 @@
#include <iostream>
#include <string>
#include <vector>
#include <math.h>
#include "include/paddlex/transforms.h"
@@ -60,8 +61,8 @@ bool ResizeByShort::Run(cv::Mat* im, ImageBlob* data) {
data->reshape_order_.push_back("resize");
float scale = GenerateScale(*im);
int width = static_cast<int>(round(scale * im->cols));
int height = static_cast<int>(round(scale * im->rows));
cv::resize(*im, *im, cv::Size(width, height), 0, 0, cv::INTER_LINEAR);
data->new_im_size_[0] = im->rows;
......
@@ -23,6 +23,7 @@ import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
@@ -101,6 +102,15 @@ public class Transforms {
if (info.containsKey("coarsest_stride")) {
padding.coarsest_stride = (int) info.get("coarsest_stride");
}
if (info.containsKey("im_padding_value")) {
List<Double> im_padding_value = (List<Double>) info.get("im_padding_value");
if (im_padding_value.size() != 3) {
Log.e(TAG, "len of im_padding_value in padding must == 3.");
}
// copy the three per-channel padding values into the padding op
for (int k = 0; k < im_padding_value.size(); k++) {
padding.paddding_value[k] = im_padding_value.get(k);
}
}
if (info.containsKey("target_size")) {
if (info.get("target_size") instanceof Integer) {
padding.width = (int) info.get("target_size");
@@ -124,7 +134,7 @@ public class Transforms {
if(transformsMode.equalsIgnoreCase("RGB")){
Imgproc.cvtColor(inputMat, inputMat, Imgproc.COLOR_BGR2RGB);
}else if(!transformsMode.equalsIgnoreCase("BGR")){
Log.e(TAG, "transformsMode only support RGB or BGR.");
}
inputMat.convertTo(inputMat, CvType.CV_32FC(3));
@@ -136,16 +146,15 @@ public class Transforms {
int h = inputMat.height();
int c = inputMat.channels();
imageBlob.setImageData(new float[w * h * c]);
Mat singleChannelMat = new Mat(h, w, CvType.CV_32FC(1));
float[] singleChannelImageData = new float[w * h];
for (int i = 0; i < c; i++) {
Core.extractChannel(inputMat, singleChannelMat, i);
singleChannelMat.get(0, 0, singleChannelImageData);
System.arraycopy(singleChannelImageData, 0, imageBlob.getImageData(), i * w * h, w * h);
}
return imageBlob;
}
@@ -248,6 +257,7 @@ public class Transforms {
private double width;
private double height;
private double coarsest_stride;
private double[] paddding_value = {0.0, 0.0, 0.0};
public Mat run(Mat inputMat, ImageBlob imageBlob) {
int origin_w = inputMat.width();
@@ -264,7 +274,7 @@
}
imageBlob.setNewImageSize(inputMat.height(),2);
imageBlob.setNewImageSize(inputMat.width(),3);
Core.copyMakeBorder(inputMat, inputMat, 0, (int)padding_h, 0, (int)padding_w, Core.BORDER_CONSTANT, new Scalar(paddding_value));
return inputMat;
}
}
......
@@ -31,8 +31,11 @@ import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
@@ -118,13 +121,13 @@ public class Visualize {
public Mat draw(SegResult result, Mat visualizeMat, ImageBlob imageBlob, int cutoutClass) {
int new_h = (int)imageBlob.getNewImageSize()[2];
int new_w = (int)imageBlob.getNewImageSize()[3];
Mat mask = new Mat(new_h, new_w, CvType.CV_32FC(1));
float[] scoreData = new float[new_h*new_w];
System.arraycopy(result.getMask().getScoreData(), cutoutClass*new_h*new_w, scoreData, 0, new_h*new_w);
mask.put(0, 0, scoreData);
Core.multiply(mask, new Scalar(255), mask);
mask.convertTo(mask, CvType.CV_8UC(1));
ListIterator<Map.Entry<String, int[]>> reverseReshapeInfo = new ArrayList<Map.Entry<String, int[]>>(imageBlob.getReshapeInfo().entrySet()).listIterator(imageBlob.getReshapeInfo().size());
while (reverseReshapeInfo.hasPrevious()) {
Map.Entry<String, int[]> entry = reverseReshapeInfo.previous();
@@ -135,10 +138,7 @@ public class Visualize {
Size sz = new Size(entry.getValue()[0], entry.getValue()[1]);
Imgproc.resize(mask, mask, sz,0,0,Imgproc.INTER_LINEAR);
}
}
Mat dst = new Mat();
List<Mat> listMat = Arrays.asList(visualizeMat, mask);
Core.merge(listMat, dst);
......
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
......
# PaddleX Documentation

All PaddleX user documentation lives under this directory. The docs are organized with Read the Docs; you can browse the [online documentation](https://paddlex.readthedocs.io/zh_CN/develop/index.html) directly.

## Building the Docs

Build the documentation in this directory with the following steps
......
@@ -5,7 +5,7 @@
```
paddlex.datasets.ImageNet(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=8, parallel_method='process', shuffle=False)
```
Reads an ImageNet-format classification dataset and applies the corresponding preprocessing to each sample. The ImageNet dataset format is described in [Dataset Format](../data/format/classification.md).

Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/image_classification/mobilenetv2.py)

@@ -26,7 +26,7 @@ paddlex.datasets.ImageNet(data_dir, file_list, label_list, transforms=None, num_
```
paddlex.datasets.VOCDetection(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='process', shuffle=False)
```
> Reads a PascalVOC-format detection dataset and applies the corresponding preprocessing to each sample. The PascalVOC dataset format is described in [Dataset Format](../data/format/detection.md).

> Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_darknet53.py)

@@ -47,7 +47,7 @@ paddlex.datasets.VOCDetection(data_dir, file_list, label_list, transforms=None,
```
paddlex.datasets.CocoDetection(data_dir, ann_file, transforms=None, num_workers='auto', buffer_size=100, parallel_method='process', shuffle=False)
```
> Reads an MSCOCO-format detection dataset and applies the corresponding preprocessing; datasets in this format can also be used to train instance segmentation models. The MSCOCO format is described in [Dataset Format](../data/format/instance_segmentation.md).

> Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/instance_segmentation/mask_rcnn_r50_fpn.py)

@@ -67,7 +67,7 @@ paddlex.datasets.CocoDetection(data_dir, ann_file, transforms=None, num_workers=
```
paddlex.datasets.SegDataset(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='process', shuffle=False)
```
> Reads a semantic segmentation dataset and applies the corresponding preprocessing to each sample. The dataset format is described in [Dataset Format](../data/format/segmentation.md).

> Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/unet.py)
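For illustration, here is a hedged sketch of constructing the detection and segmentation datasets documented above; the directory layout and list files are hypothetical, and the transforms follow the paddlex.det / paddlex.seg transform APIs used by the linked example scripts.

```python
import paddlex as pdx
from paddlex.det import transforms as det_transforms
from paddlex.seg import transforms as seg_transforms

# PascalVOC-format detection dataset (hypothetical paths)
det_train_transforms = det_transforms.Compose([
    det_transforms.RandomHorizontalFlip(),
    det_transforms.Normalize(),
    det_transforms.ResizeByShort(short_size=800, max_size=1333),
    det_transforms.Padding(coarsest_stride=32)])
voc_dataset = pdx.datasets.VOCDetection(
    data_dir='insect_det',
    file_list='insect_det/train_list.txt',
    label_list='insect_det/labels.txt',
    transforms=det_train_transforms)

# Semantic segmentation dataset (hypothetical paths)
seg_train_transforms = seg_transforms.Compose([
    seg_transforms.Resize(target_size=512),
    seg_transforms.RandomHorizontalFlip(),
    seg_transforms.Normalize()])
seg_dataset = pdx.datasets.SegDataset(
    data_dir='optic_disc_seg',
    file_list='optic_disc_seg/train_list.txt',
    label_list='optic_disc_seg/labels.txt',
    transforms=seg_train_transforms)
```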
...
...@@ -7,7 +7,7 @@
图像分类、目标检测、实例分割、语义分割统一的预测器,实现高性能预测。
```
paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, mkl_thread_num=4, use_trt=False, use_glog=False, memory_optimize=True)
```
**参数**
...@@ -16,6 +16,7 @@ paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, use_
> * **use_gpu** (bool): 是否使用GPU进行预测。
> * **gpu_id** (int): 使用的GPU序列号。
> * **use_mkl** (bool): 是否使用mkldnn加速库。
> * **mkl_thread_num** (int): 使用mkldnn加速库时的线程数,默认为4。
> * **use_trt** (bool): 是否使用TensorRT预测引擎。
> * **use_glog** (bool): 是否打印中间日志。
> * **memory_optimize** (bool): 是否优化内存使用。
...@@ -40,7 +41,7 @@ predict(image, topk=1)
> **参数**
>
> > * **image** (str|np.ndarray): 待预测的图片路径或numpy数组(HWC排列,BGR格式)。
> > * **topk** (int): 图像分类时使用的参数,表示预测前topk个可能的分类。
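结合上述参数,一个最小的单图预测示意如下(模型目录与图片路径均为假设值):
```python
import paddlex as pdx

# 加载导出的inference模型(目录为假设值)
predictor = pdx.deploy.Predictor('./inference_model', use_gpu=True, gpu_id=0)
# topk仅对图像分类模型生效,检测/分割模型可忽略该参数
result = predictor.predict(image='test.jpg', topk=5)
```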
### batch_predict 接口
...
# Object Detection
## paddlex.det.PPYOLO
```python
paddlex.det.PPYOLO(num_classes=80, backbone='ResNet50_vd_ssld', with_dcn_v2=True, anchors=None, anchor_masks=None, use_coord_conv=True, use_iou_aware=True, use_spp=True, use_drop_block=True, scale_x_y=1.05, ignore_threshold=0.7, label_smooth=False, use_iou_loss=True, use_matrix_nms=True, nms_score_threshold=0.01, nms_topk=1000, nms_keep_topk=100, nms_iou_threshold=0.45, train_random_shapes=[320, 352, 384, 416, 448, 480, 512, 544, 576, 608])
```
> 构建PPYOLO检测器。**注意:在PPYOLO中,num_classes不需要包含背景类,如目标包括human、dog两种,则num_classes设为2即可,这里与FasterRCNN/MaskRCNN有差别。**
> **参数**
>
> > - **num_classes** (int): 类别数。默认为80。
> > - **backbone** (str): PPYOLO的backbone网络,取值范围为['ResNet50_vd_ssld']。默认为'ResNet50_vd_ssld'。
> > - **with_dcn_v2** (bool): Backbone是否使用DCNv2结构。默认为True。
> > - **anchors** (list|tuple): anchor框的宽度和高度,为None时表示使用默认值[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]。
> > - **anchor_masks** (list|tuple): 在计算PPYOLO损失时,使用anchor的mask索引,为None时表示使用默认值[[6, 7, 8], [3, 4, 5], [0, 1, 2]]。
> > - **use_coord_conv** (bool): 是否使用CoordConv。默认值为True。
> > - **use_iou_aware** (bool): 是否使用IoU Aware分支。默认值为True。
> > - **use_spp** (bool): 是否使用Spatial Pyramid Pooling结构。默认值为True。
> > - **use_drop_block** (bool): 是否使用Drop Block。默认值为True。
> > - **scale_x_y** (float): 调整中心点位置时的系数因子。默认值为1.05。
> > - **use_iou_loss** (bool): 是否使用IoU loss。默认值为True。
> > - **use_matrix_nms** (bool): 是否使用Matrix NMS。默认值为True。
> > - **ignore_threshold** (float): 在计算PPYOLO损失时,IoU大于`ignore_threshold`的预测框的置信度被忽略。默认为0.7。
> > - **nms_score_threshold** (float): 检测框的置信度得分阈值,置信度得分低于阈值的框应该被忽略。默认为0.01。
> > - **nms_topk** (int): 进行NMS时,根据置信度保留的最大检测框数。默认为1000。
> > - **nms_keep_topk** (int): 进行NMS后,每个图像要保留的总检测框数。默认为100。
> > - **nms_iou_threshold** (float): 进行NMS时,用于剔除检测框IOU的阈值。默认为0.45。
> > - **label_smooth** (bool): 是否使用label smooth。默认值为False。
> > - **train_random_shapes** (list|tuple): 训练时从列表中随机选择图像大小。默认值为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
### train
```python
train(self, num_epochs, train_dataset, train_batch_size=8, eval_dataset=None, save_interval_epochs=20, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=1.0/8000, warmup_steps=1000, warmup_start_lr=0.0, lr_decay_epochs=[213, 240], lr_decay_gamma=0.1, metric=None, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05, early_stop=False, early_stop_patience=5, resume_checkpoint=None, use_ema=True, ema_decay=0.9998)
```
> PPYOLO模型的训练接口,函数内置了`piecewise`学习率衰减策略和`momentum`优化器。参数说明之后附有一个最小的训练脚本示意。
> **参数**
>
> > - **num_epochs** (int): 训练迭代轮数。
> > - **train_dataset** (paddlex.datasets): 训练数据读取器。
> > - **train_batch_size** (int): 训练数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与显卡数量之商为验证数据batch大小。默认值为8。
> > - **eval_dataset** (paddlex.datasets): 验证数据读取器。
> > - **save_interval_epochs** (int): 模型保存间隔(单位:迭代轮数)。默认为20。
> > - **log_interval_steps** (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
> > - **save_dir** (str): 模型保存路径。默认值为'output'。
> > - **pretrain_weights** (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',则自动下载在ImageNet图片数据上预训练的模型权重;若为字符串'COCO',则自动下载在COCO数据集上预训练的模型权重;若为None,则不使用预训练模型。默认为'IMAGENET'。
> > - **optimizer** (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
> > - **learning_rate** (float): 默认优化器的学习率。默认为1.0/8000。
> > - **warmup_steps** (int): 默认优化器进行warmup过程的步数。默认为1000。
> > - **warmup_start_lr** (float): 默认优化器warmup的起始学习率。默认为0.0。
> > - **lr_decay_epochs** (list): 默认优化器的学习率衰减轮数。默认为[213, 240]。
> > - **lr_decay_gamma** (float): 默认优化器的学习率衰减率。默认为0.1。
> > - **metric** (str): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认值为None。
> > - **use_vdl** (bool): 是否使用VisualDL进行可视化。默认值为False。
> > - **sensitivities_file** (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',则自动下载在PascalVOC数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
> > - **eval_metric_loss** (float): 可容忍的精度损失。默认为0.05。
> > - **early_stop** (bool): 是否使用提前终止训练策略。默认值为False。
> > - **early_stop_patience** (int): 当使用提前终止训练策略时,如果验证集精度在`early_stop_patience`个epoch内连续下降或持平,则终止训练。默认值为5。
> > - **resume_checkpoint** (str): 恢复训练时指定上次训练保存的模型路径。若为None,则不会恢复训练。默认值为None。
> > - **use_ema** (bool): 是否使用指数衰减计算参数的滑动平均值。默认值为True。
> > - **ema_decay** (float): 指数衰减率。默认值为0.9998。
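以下为一个最小的训练脚本示意(数据集路径、transforms组合与学习率等均为示意用的假设值,完整示例请以官方tutorials为准):
```python
import paddlex as pdx
from paddlex.det import transforms

# 训练/验证阶段的预处理与数据增强(示意组合)
train_transforms = transforms.Compose([
    transforms.MixupImage(mixup_epoch=250),
    transforms.RandomDistort(),
    transforms.RandomExpand(),
    transforms.RandomCrop(),
    transforms.Resize(target_size=608, interp='RANDOM'),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.Resize(target_size=608, interp='CUBIC'),
    transforms.Normalize()
])

# 数据集路径为假设值
train_dataset = pdx.datasets.VOCDetection(
    data_dir='insect_det',
    file_list='insect_det/train_list.txt',
    label_list='insect_det/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir='insect_det',
    file_list='insect_det/val_list.txt',
    label_list='insect_det/labels.txt',
    transforms=eval_transforms)

# num_classes不包含背景类
model = pdx.det.PPYOLO(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=270,
    train_dataset=train_dataset,
    train_batch_size=8,
    eval_dataset=eval_dataset,
    learning_rate=0.000125,
    lr_decay_epochs=[213, 240],
    save_interval_epochs=20,
    save_dir='output/ppyolo',
    use_vdl=True)
```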
### evaluate
```python
evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
```
> PPYOLO模型的评估接口,模型评估后会返回在验证集上的指标`box_map`(metric指定为'VOC'时)或`box_mmap`(metric指定为'COCO'时)。
> **参数**
>
> > - **eval_dataset** (paddlex.datasets): 验证数据读取器。
> > - **batch_size** (int): 验证数据批大小。默认为1。
> > - **epoch_id** (int): 当前评估模型所在的训练轮数。
> > - **metric** (str): 评估方式,取值范围为['COCO', 'VOC']。默认为None,此时根据用户传入的Dataset自动选择:如为VOCDetection,则`metric`为'VOC';如为COCODetection,则`metric`为'COCO';如为EasyData类型数据集,同样使用'VOC'。
> > - **return_details** (bool): 是否返回详细信息。默认值为False。
> >
> **返回值**
>
> > - **tuple** (metrics, eval_details) | **dict** (metrics): 当`return_details`为True时,返回(metrics, eval_details);当`return_details`为False时,仅返回metrics。metrics为dict,包含关键字'bbox_mmap'或'bbox_map',分别表示各IoU阈值下平均准确率的均值(mmAP)与平均准确率(mAP)。eval_details为dict,包含关键字'bbox'和'gt':'bbox'对应预测结果列表,每个预测结果由图像id、预测框类别id、预测框坐标、预测框得分组成;'gt'为真实标注框的相关信息。
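接上文训练示意,一个简单的评估调用如下(`model`与`eval_dataset`为假设已构建好的对象):
```python
# 使用VOC格式数据集时,metrics中包含'bbox_map';COCO格式则为'bbox_mmap'
metrics = model.evaluate(eval_dataset, batch_size=1)
print(metrics)

# 需要预测框与标注框等详细信息时,可同时取回eval_details
metrics, eval_details = model.evaluate(eval_dataset, batch_size=1, return_details=True)
```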
### predict
```python
predict(self, img_file, transforms=None)
```
> PPYOLO模型预测接口。需要注意的是,只有在训练过程中定义了eval_dataset,模型在保存时才会将预测时的图像处理流程保存在`YOLOv3.test_transforms`和`YOLOv3.eval_transforms`中。如未在训练时定义eval_dataset,则在调用`predict`接口时,用户需要重新定义`test_transforms`并传入`predict`接口。返回值说明后附有一个最小调用示意。
> **参数**
>
> > - **img_file** (str|np.ndarray): 预测图像路径或numpy数组(HWC排列,BGR格式)。
> > - **transforms** (paddlex.det.transforms): 数据预处理操作。
>
> **返回值**
>
> > - **list**: 预测结果列表,列表中每个元素均为一个dict,key包括'bbox', 'category', 'category_id', 'score',分别表示每个预测目标的框坐标信息、类别、类别id、置信度,其中框坐标信息为[xmin, ymin, w, h],即左上角x, y坐标和框的宽和高。
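以下为一个加载已训练模型进行预测并可视化结果的最小示意(模型与图片路径均为假设值):
```python
import paddlex as pdx

# 加载训练过程中保存的模型(路径为假设值)
model = pdx.load_model('output/ppyolo/best_model')
result = model.predict('test.jpg')
# 将得分高于阈值的预测框绘制到原图上并保存至save_dir
pdx.det.visualize('test.jpg', result, threshold=0.5, save_dir='./output')
```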
### batch_predict
```python
batch_predict(self, img_file_list, transforms=None, thread_num=2)
```
> PPYOLO模型批量预测接口。需要注意的是,只有在训练过程中定义了eval_dataset,模型在保存时才会将预测时的图像处理流程保存在`YOLOv3.test_transforms`和`YOLOv3.eval_transforms`中。如未在训练时定义eval_dataset,则在调用`batch_predict`接口时,用户需要重新定义`test_transforms`并传入`batch_predict`接口。返回值说明后附有一个调用示意。
> **参数**
>
> > - **img_file_list** (list|tuple): 对列表(或元组)中的图像同时进行预测,列表中的元素是预测图像路径或numpy数组(HWC排列,BGR格式)。
> > - **transforms** (paddlex.det.transforms): 数据预处理操作。
> > - **thread_num** (int): 并发执行各图像预处理时的线程数。
>
> **返回值**
>
> > - **list**: 每个元素都为列表,表示各图像的预测结果。在各图像的预测结果列表中,每个元素均为一个dict,key包括'bbox', 'category', 'category_id', 'score',分别表示每个预测目标的框坐标信息、类别、类别id、置信度,其中框坐标信息为[xmin, ymin, w, h],即左上角x, y坐标和框的宽和高。
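批量预测的调用方式与单图预测类似(图片路径为假设值):
```python
import paddlex as pdx

model = pdx.load_model('output/ppyolo/best_model')
image_list = ['test1.jpg', 'test2.jpg']
# 返回的results与image_list中的图像一一对应
results = model.batch_predict(image_list, thread_num=2)
```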
## paddlex.det.YOLOv3
...@@ -21,7 +145,7 @@ paddlex.det.YOLOv3(num_classes=80, backbone='MobileNetV1', anchors=None, anchor_
> > - **nms_score_threshold** (float): 检测框的置信度得分阈值,置信度得分低于阈值的框应该被忽略。默认为0.01。
> > - **nms_topk** (int): 进行NMS时,根据置信度保留的最大检测框数。默认为1000。
> > - **nms_keep_topk** (int): 进行NMS后,每个图像要保留的总检测框数。默认为100。
> > - **nms_iou_threshold** (float): 进行NMS时,用于剔除检测框IoU的阈值。默认为0.45。
> > - **label_smooth** (bool): 是否使用label smooth。默认值为False。
> > - **train_random_shapes** (list|tuple): 训练时从列表中随机选择图像大小。默认值为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
...
...@@ -101,4 +101,4 @@ batch_predict(self, img_file_list, transforms=None, thread_num=2)
>
> **返回值**
> >
> > - **list**: 每个元素都为列表,表示各图像的预测结果。在各图像的预测结果列表中,每个元素均为一个dict,包含关键字:'bbox', 'mask', 'category', 'category_id', 'score',分别表示每个预测目标的框坐标信息、Mask信息、类别、类别id、置信度。其中框坐标信息为[xmin, ymin, w, h],即左上角x, y坐标和框的宽和高。Mask信息为原图大小的二值图,1表示像素点属于预测类别,0表示像素点是背景。
...@@ -3,7 +3,7 @@
## paddlex.seg.DeepLabv3p
```python
paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, pooling_crop_size=None)
```
...@@ -12,7 +12,7 @@ paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride
> **参数**
> > - **num_classes** (int): 类别数。
> > - **backbone** (str): DeepLabv3+的backbone网络,实现特征图的计算,取值范围为['Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld'],默认值为'MobileNetV2_x1.0'。
> > - **output_stride** (int): backbone输出特征图相对于输入的下采样倍数,一般取值为8或16。默认16。
> > - **aspp_with_sep_conv** (bool): aspp模块是否采用separable convolutions。默认True。
> > - **decoder_use_sep_conv** (bool): decoder模块是否采用separable convolutions。默认True。
...@@ -22,6 +22,7 @@ paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride
> > - **use_dice_loss** (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用,当`use_bce_loss`和`use_dice_loss`都为False时,使用交叉熵损失函数。默认False。
> > - **class_weight** (list/str): 交叉熵损失函数各类损失的权重。当`class_weight`为list的时候,长度应为`num_classes`。当`class_weight`为str时,weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重为1,即平时使用的交叉熵损失函数。
> > - **ignore_index** (int): label上忽略的值,label为`ignore_index`的像素不参与损失函数的计算。默认255。
> > - **pooling_crop_size** (list|tuple): 当backbone为`MobileNetV3_large_x1_0_ssld`时,需设置为训练过程中模型输入大小,格式为[W, H]。例如模型输入大小为[512, 512]时,`pooling_crop_size`应设置为[512, 512]。该参数在encoder模块中获取图像平均值时被用到:若为None,则直接求平均值;若为模型输入大小,则使用`avg_pool`算子得到平均值。默认值None。
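以使用`MobileNetV3_large_x1_0_ssld`作为backbone为例,构建模型的最小示意如下(其中模型输入大小[512, 512]为假设值):
```python
import paddlex as pdx

# 当backbone为MobileNetV3_large_x1_0_ssld时,pooling_crop_size需与训练时的模型输入大小一致
model = pdx.seg.DeepLabv3p(
    num_classes=2,
    backbone='MobileNetV3_large_x1_0_ssld',
    pooling_crop_size=[512, 512])
```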
### train
...@@ -69,7 +70,7 @@ evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False):
> **返回值**
> >
> > - **dict**: 当`return_details`为False时,返回dict。包含关键字:'miou'、'category_iou'、'macc'、'category_acc'和'kappa',分别表示平均IoU、各类别IoU、平均准确率、各类别准确率和kappa系数。
> > - **tuple** (metrics, eval_details):当`return_details`为True时,增加返回dict (eval_details),包含关键字:'confusion_matrix',表示评估的混淆矩阵。
...
...@@ -26,16 +26,16 @@ paddlex.slim.cal_params_sensitivities(model, save_file, eval_dataset, batch_size
```
paddlex.slim.export_quant_model(model, test_dataset, batch_size=2, batch_num=10, save_dir='./quant_model', cache_dir='./temp')
```
导出量化模型,该接口实现了Post Quantization量化方式,需要传入测试数据集,并设定`batch_size`和`batch_num`。量化过程中会以数量为`batch_size` * `batch_num`的样本数据的计算结果为统计信息完成模型的量化。
**参数**
* **model** (paddlex.cls.models/paddlex.det.models/paddlex.seg.models): paddlex加载的模型。
* **test_dataset** (paddlex.dataset): 测试数据集。
* **batch_size** (int): 进行前向计算时的批数据大小。
* **batch_num** (int): 进行前向计算时的批数据数量。
* **save_dir** (str): 量化后模型的保存目录。
* **cache_dir** (str): 量化过程中的统计数据临时存储目录。
**使用示例**
...
...@@ -10,11 +10,11 @@ PaddleX对于图像分类、目标检测、实例分割和语义分割内置了
| :------- | :------------|
| 图像分类 | [RandomCrop](cls_transforms.html#randomcrop)、[RandomHorizontalFlip](cls_transforms.html#randomhorizontalflip)、[RandomVerticalFlip](cls_transforms.html#randomverticalflip)、<br> [RandomRotate](cls_transforms.html#randomrotate)、[RandomDistort](cls_transforms.html#randomdistort) |
|目标检测<br>实例分割| [RandomHorizontalFlip](det_transforms.html#randomhorizontalflip)、[RandomDistort](det_transforms.html#randomdistort)、[RandomCrop](det_transforms.html#randomcrop)、<br> [MixupImage](det_transforms.html#mixupimage)(仅支持YOLOv3模型)、[RandomExpand](det_transforms.html#randomexpand) |
|语义分割 | [RandomHorizontalFlip](seg_transforms.html#randomhorizontalflip)、[RandomVerticalFlip](seg_transforms.html#randomverticalflip)、[ResizeRangeScaling](seg_transforms.html#resizerangescaling)、<br> [ResizeStepScaling](seg_transforms.html#resizestepscaling)、[RandomPaddingCrop](seg_transforms.html#randompaddingcrop)、[RandomBlur](seg_transforms.html#randomblur)、<br> [RandomRotate](seg_transforms.html#randomrotate)、[RandomScaleAspect](seg_transforms.html#randomscaleaspect)、[RandomDistort](seg_transforms.html#randomdistort) |
## imgaug增强库的支持
PaddleX目前已适配imgaug图像增强库,用户可以直接在PaddleX构造`transforms`时,调用imgaug的方法,如下示例:
```
import paddlex as pdx
from paddlex.cls import transforms
...
...@@ -16,7 +16,7 @@ paddlex.seg.transforms.Compose(transforms)
```python
paddlex.seg.transforms.RandomHorizontalFlip(prob=0.5)
```
以一定的概率对图像进行水平翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机水平翻转的概率。默认值为0.5。
...@@ -25,7 +25,7 @@ paddlex.seg.transforms.RandomHorizontalFlip(prob=0.5)
```python
paddlex.seg.transforms.RandomVerticalFlip(prob=0.1)
```
以一定的概率对图像进行垂直翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机垂直翻转的概率。默认值为0.1。
...@@ -59,7 +59,7 @@ paddlex.seg.transforms.ResizeByLong(long_size)
```python
paddlex.seg.transforms.ResizeRangeScaling(min_value=400, max_value=600)
```
对图像长边随机resize到指定范围内,短边按比例进行缩放,模型训练时的数据增强操作。
### 参数
* **min_value** (int): 图像长边resize后的最小值。默认值400。
* **max_value** (int): 图像长边resize后的最大值。默认值600。
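上述数据增强操作通常通过`Compose`串联后传入数据读取器,如下为一个训练阶段的组合示意(各参数均为示意值):
```python
from paddlex.seg import transforms

# 翻转 + 长边尺度扰动 + 随机裁剪,最后做归一化
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(prob=0.5),
    transforms.ResizeRangeScaling(min_value=400, max_value=600),
    transforms.RandomPaddingCrop(crop_size=512),
    transforms.Normalize()
])
```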
...@@ -124,7 +124,7 @@ paddlex.seg.transforms.RandomBlur(prob=0.1)
```python
paddlex.seg.transforms.RandomRotate(rotate_range=15, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
```
对图像进行随机旋转,模型训练时的数据增强操作。
在旋转区间[-rotate_range, rotate_range]内,对图像进行随机旋转,当存在标注图像时,同步进行,并对旋转后的图像和标注图像进行相应的padding。
...@@ -138,7 +138,7 @@ paddlex.seg.transforms.RandomRotate(rotate_range=15, im_padding_value=[127.5, 12
```python
paddlex.seg.transforms.RandomScaleAspect(min_scale=0.5, aspect_ratio=0.33)
```
裁剪并resize回原始尺寸的图像和标注图像,模型训练时的数据增强操作。
按照一定的面积比和宽高比对图像进行裁剪,并resize回原始图像大小,当存在标注图时,同步进行。
### 参数
...
...@@ -131,7 +131,7 @@ paddlex.transforms.visualize(dataset,
对数据预处理/增强中间结果进行可视化。
可使用VisualDL查看中间结果:
1. VisualDL启动方式: visualdl --logdir vdl_output --port 8001
2. 浏览器打开 http://0.0.0.0:8001 即可,其中0.0.0.0为本机访问,如为远程服务,改成相应机器IP
### 参数
...
...@@ -45,6 +45,7 @@
|[FasterRCNN-ResNet101-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_fpn_1x.tar)| 244.2MB | 119.788 | 38.7 |
|[FasterRCNN-ResNet101_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_2x.tar) |244.3MB | 156.097 | 40.5 |
|[FasterRCNN-HRNet_W18-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) |115.5MB | 81.592 | 36 |
|[PPYOLO](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | 329.1MB | - |45.9 |
|[YOLOv3-DarkNet53](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar)|249.2MB | 42.672 | 38.9 |
|[YOLOv3-MobileNetV1](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar) |99.2MB | 15.442 | 29.3 |
|[YOLOv3-MobileNetV3_large](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v3.pdparams)|100.7MB | 143.322 | 31.6 |
...@@ -80,6 +81,7 @@
| 模型 | 模型大小 | 预测时间(毫秒) | mIoU(%) |
|:-------|:-----------|:-------------|:----------|
| [DeepLabv3_MobileNetV3_large_x1_0_ssld](https://paddleseg.bj.bcebos.com/models/deeplabv3p_mobilenetv3_large_cityscapes.tar.gz) | 9.3MB | - | 73.28 |
| [DeepLabv3_MobileNetv2_x1.0](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz) | 14.7MB | - | 69.8 |
| [DeepLabv3_Xception65](https://paddleseg.bj.bcebos.com/models/xception65_bn_cityscapes.tgz) | 329.3MB | - | 79.3 |
| [HRNet_W18](https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz) | 77.3MB | - | 79.36 |
...
...@@ -25,12 +25,12 @@ MyDataset/ # 图像分类数据集根目录
**为了用于训练,我们需要在`MyDataset`目录下准备`train_list.txt`, `val_list.txt`和`labels.txt`三个文件**,分别用于表示训练集列表,验证集列表和类别标签列表。[点击下载图像分类示例数据集](https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz)
> 注:也可使用PaddleX自带工具,对数据集进行随机划分,**在数据集按照上面格式组织后**,使用如下命令即可快速完成数据集随机划分,其中val_value表示验证集的比例,test_value表示测试集的比例(可以为0),剩余的比例用于训练集。
> ```
> paddlex --split_dataset --format ImageNet --dataset_dir MyDataset --val_value 0.2 --test_value 0.1
> ```
**labels.txt**
...
...@@ -22,12 +22,10 @@ MyDataset/ # 目标检测数据集根目录
**为了用于训练,我们需要在`MyDataset`目录下准备`train_list.txt`, `val_list.txt`和`labels.txt`三个文件**,分别用于表示训练集列表,验证集列表和类别标签列表。[点击下载目标检测示例数据集](https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz)
> 注:也可使用PaddleX自带工具,对数据集进行随机划分,**在数据集按照上面格式组织后**,使用如下命令即可快速完成数据集随机划分,其中val_value表示验证集的比例,test_value表示测试集的比例(可以为0),剩余的比例用于训练集。
> ```
> paddlex --split_dataset --format VOC --dataset_dir MyDataset --val_value 0.2 --test_value 0.1
> ```
**labels.txt**
...
...@@ -18,14 +18,12 @@ MyDataset/ # 实例分割数据集根目录
在PaddleX中,为了区分训练集和验证集,在`MyDataset`同级目录,使用不同的json表示数据的划分,例如`train.json`和`val.json`。[点击下载实例分割示例数据集](https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz)
> 注:也可使用PaddleX自带工具,对数据集进行随机划分,**在数据集按照上面格式组织后**,使用如下命令即可快速完成数据集随机划分,其中val_value表示验证集的比例,test_value表示测试集的比例(可以为0),剩余的比例用于训练集。
> ```
> paddlex --split_dataset --format COCO --dataset_dir MyDataset --val_value 0.2 --test_value 0.1
> ```
MSCOCO数据的标注文件采用json格式,用户可使用Labelme、精灵标注助手或EasyData等标注工具进行标注,参见[数据标注工具](../annotation.md)
## PaddleX加载数据集
示例代码如下:
...
...@@ -23,12 +23,10 @@ MyDataset/ # 语义分割数据集根目录
**为了用于训练,我们需要在`MyDataset`目录下准备`train_list.txt`, `val_list.txt`和`labels.txt`三个文件**,分别用于表示训练集列表,验证集列表和类别标签列表。[点击下载语义分割示例数据集](https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz)
> 注:也可使用PaddleX自带工具,对数据集进行随机划分,**在数据集按照上面格式组织后**,使用如下命令即可快速完成数据集随机划分,其中val_value表示验证集的比例,test_value表示测试集的比例(可以为0),剩余的比例用于训练集。
> ```
> paddlex --split_dataset --format Seg --dataset_dir MyDataset --val_value 0.2 --test_value 0.1
> ```
**labels.txt**
...
# 轻量级服务化部署
## 简介
借助`PaddleHub-Serving`,可以将`PaddleX``Inference Model`进行快速部署,以提供在线预测的能力。
关于`PaddleHub-Serving`的更多信息,可参照[PaddleHub-Serving](https://github.com/PaddlePaddle/PaddleHub/blob/develop/docs/tutorial/serving.md)
**注意:使用此方式部署,需确保自己Python环境中PaddleHub的版本高于1.8.0,可在命令终端输入`pip show paddlehub`确认版本信息。**
下面,我们按照步骤,实现将一个图像分类模型[MobileNetV3_small_ssld](https://bj.bcebos.com/paddlex/models/mobilenetv3_small_ssld_imagenet.tar.gz)转换成`PaddleHub`的预训练模型,并利用`PaddleHub-Serving`实现一键部署。
## 模型部署
### 1 部署模型准备
部署模型的格式均为目录下包含`__model__``__params__``model.yml`三个文件,如若不然,则参照[部署模型导出文档](./export_model.md)进行导出。
### 2 模型转换
首先,我们将`PaddleX``Inference Model`转换成`PaddleHub`的预训练模型,使用命令`hub convert`即可一键转换,对此命令的说明如下:
```shell
$ hub convert --model_dir XXXX \
--module_name XXXX \
--module_version XXXX \
--output_dir XXXX
```
**参数**
|参数|用途|
|-|-|
|--model_dir/-m|`PaddleX Inference Model`所在的目录|
|--module_name/-n|生成预训练模型的名称|
|--module_version/-v|生成预训练模型的版本,默认为`1.0.0`|
|--output_dir/-o|生成预训练模型的存放位置,默认为`{module_name}_{timestamp}`|
因此,我们仅需要一行命令即可完成预训练模型的转换。
```shell
hub convert --model_dir mobilenetv3_small_ssld_imagenet_hub --module_name mobilenetv3_small_ssld_imagenet_hub
```
转换成功后会打印提示信息,如下:
```shell
$ The converted module is stored in `MobileNetV3_small_ssld_hub_1596077881.868501`.
```
等待生成成功的提示后,我们就在输出目录中得到了一个`PaddleHub`预训练模型。
### 3 模型安装
在模型转换一步中,我们得到了一个`.tar.gz`格式的预训练模型压缩包,在进行部署之前需要先安装到本机,使用命令`hub install`即可一键安装,对此命令的说明如下:
```shell
$ hub install ${MODULE}
```
其中${MODULE}为要安装的预训练模型文件路径。
因此,我们使用`hub install`命令安装:
```shell
hub install MobileNetV3_small_ssld_hub_1596077881.868501/mobilenetv3_small_ssld_imagenet_hub.tar.gz
```
安装成功后会打印提示信息,如下:
```shell
$ Successfully installed mobilenetv3_small_ssld_imagenet_hub
```
### 4 模型部署
下面,我们只需要使用`hub serving`命令即可完成模型的一键部署,对此命令的说明如下:
```shell
$ hub serving start --modules/-m [Module1==Version1, Module2==Version2, ...] \
                    --port/-p XXXX \
                    --config/-c XXXX
```
**参数**
|参数|用途|
|-|-|
|--modules/-m|PaddleHub Serving预安装模型,以多个Module==Version键值对的形式列出<br>*`当不指定Version时,默认选择最新版本`*|
|--port/-p|服务端口,默认为8866|
|--config/-c|使用配置文件配置模型|
因此,我们仅需要一行命令即可完成模型的部署,如下:
```shell
$ hub serving start -m mobilenetv3_small_ssld_imagenet_hub
```
等待模型加载后,此预训练模型就已经部署在机器上了。
我们还可以使用配置文件对部署的模型进行更多配置,配置文件格式如下:
```json
{
"modules_info": {
"mobilenetv3_small_ssld_imagenet_hub": {
"init_args": {
"version": "1.0.0"
},
"predict_args": {
"batch_size": 1,
"use_gpu": false
}
}
},
"port": 8866
}
```
|参数|用途|
|-|-|
|modules_info|PaddleHub Serving预安装模型,以字典列表形式列出,key为模型名称。其中:<br>`init_args`为模型加载时输入的参数,等同于`paddlehub.Module(**init_args)`<br>`predict_args`为模型预测时输入的参数,以`mobilenetv3_small_ssld_imagenet_hub`为例,等同于`mobilenetv3_small_ssld_imagenet_hub.batch_predict(**predict_args)`|
|port|服务端口,默认为8866|
### 5 测试
在第二步模型安装的同时,会生成一个客户端请求示例,存放在模型安装目录,默认为`${HUB_HOME}/.paddlehub/modules`,对于此例,我们可以在`~/.paddlehub/modules/mobilenetv3_small_ssld_imagenet_hub`找到此客户端示例`serving_client_demo.py`,代码如下:
```python
# coding: utf8
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
if __name__ == '__main__':
# 获取图片的base64编码格式
img1 = cv2_to_base64(cv2.imread("IMAGE_PATH1"))
img2 = cv2_to_base64(cv2.imread("IMAGE_PATH2"))
data = {'images': [img1, img2]}
# 指定content-type
headers = {"Content-type": "application/json"}
# 发送HTTP请求
url = "http://127.0.0.1:8866/predict/mobilenetv3_small_ssld_imagenet_hub"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
使用的测试图片如下:
![](../train/images/test.jpg)
将代码中的`IMAGE_PATH1`、`IMAGE_PATH2`改成想要进行预测的图片路径后,在命令行执行:
```shell
python ~/.paddlehub/modules/mobilenetv3_small_ssld_imagenet_hub/serving_client_demo.py
```
即可收到预测结果,如下:
```shell
[[{'category': 'envelope', 'category_id': 549, 'score': 0.2141510397195816}]]
```
到此,我们就完成了`PaddleX`模型的一键部署。
...@@ -7,6 +7,7 @@
:caption: 文档目录:

export_model.md
hub_serving.md
server/index
nvidia-jetson.md
paddlelite/index
# Nvidia Jetson开发板
## 说明
本文档在基于Nvidia Jetpack 4.4的`Linux`平台上使用`GCC 7.4`测试过,如需使用不同G++版本,则需要重新编译Paddle预测库,请参考: [NVIDIA Jetson嵌入式硬件预测库源码编译](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12)
## 前置条件
* G++ 7.4
* CUDA 10.0 / CUDNN 8 (仅在使用GPU版本的预测库时需要)
* CMake 3.0+
请确保系统已经安装好上述基本软件,**下面所有示例以工作目录 `/root/projects/`演示**。
...@@ -57,13 +57,6 @@ CUDA_LIB=/usr/local/cuda/lib64
# CUDNN 的 lib 路径
CUDNN_LIB=/usr/local/cuda/lib64
# 以下无需改动
rm -rf build
mkdir -p build
...@@ -77,18 +70,13 @@ cmake .. \
-DPADDLE_DIR=${PADDLE_DIR} \
-DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB}
make
```
**注意:** linux环境下编译会自动下载YAML,如果编译环境无法访问外网,可手动下载:
- [yaml-cpp.zip](https://bj.bcebos.com/paddlex/deploy/deps/yaml-cpp.zip)
yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://bj.bcebos.com/paddlex/deploy/deps/yaml-cpp.zip` 中的网址,改为下载文件的路径。
修改脚本设置好主要参数后,执行`build`脚本:
...@@ -100,7 +88,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
**在加载模型前,请检查你的模型目录中文件应该包括`model.yml`、`__model__`和`__params__`三个文件。如若不满足这个条件,请参考[模型导出为Inference文档](export_model.md)将模型导出为部署格式。**
* 编译成功后,图片预测demo的可执行程序分别为`build/demo/detector`,`build/demo/classifier`,`build/demo/segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
...@@ -111,10 +99,26 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
| use_trt | 是否使用 TensorRT 预测, 支持值为0或1(默认值为0) |
| gpu_id | GPU 设备ID, 默认值为0 |
| save_dir | 保存可视化结果的路径, 默认值为"output",**classifier无该参数** |
| batch_size | 预测的批量大小,默认为1 |
| thread_num | 预测的线程数,默认为cpu处理器个数 |
| use_ir_optim | 是否使用图优化策略,支持值为0或1(默认值为1,图像分割默认值为0)|
* 编译成功后,视频预测demo的可执行程序分别为`build/demo/video_detector``build/demo/video_classifier``build/demo/video_segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
| model_dir | 导出的预测模型所在路径 |
| use_camera | 是否使用摄像头预测,支持值为0或1(默认值为0) |
| camera_id | 摄像头设备ID,默认值为0 |
| video_path | 视频文件的路径 |
| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0) |
| use_trt | 是否使用 TensorRT 预测, 支持值为0或1(默认值为0) |
| gpu_id | GPU 设备ID, 默认值为0 |
| show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果(因加入了延迟处理,故显示结果不能反映真实的帧率),支持值为0或1(默认值为0) |
| save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
| save_dir | 保存可视化结果的路径, 默认值为"output" |
**注意:若系统无GUI,则不要将show_result设置为1。当使用摄像头预测时,按`ESC`键可关闭摄像头并退出预测程序。**
## 样例
...@@ -143,3 +147,21 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
```
./build/demo/detector --model_dir=/root/projects/inference_model --image_list=/root/projects/images_list.txt --use_gpu=1 --save_dir=output --batch_size=2 --thread_num=2
```
图片文件`可视化预测结果`会保存在`save_dir`参数设置的目录下。
**样例三:**
使用摄像头预测:
```shell
./build/demo/video_detector --model_dir=/root/projects/inference_model --use_camera=1 --use_gpu=1 --save_dir=output --save_result=1
```
当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。
**样例四:**
对视频文件进行预测:
```shell
./build/demo/video_detector --model_dir=/root/projects/inference_model --video_path=/path/to/video_file --use_gpu=1 --save_dir=output --show_result=1 --save_result=1
```
当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。如果系统有GUI,可通过将`show_result`设置为1,在屏幕上实时观看可视化预测结果。
...@@ -49,7 +49,7 @@ PaddleX提供了两种方式:
### 语义分割
实验背景:使用UNet模型,数据集为视盘分割示例数据,剪裁训练代码见[tutorials/compress/segmentation](https://github.com/PaddlePaddle/PaddleX/tree/develop/tutorials/compress/segmentation)
| 模型 | 剪裁情况 | 模型大小 | mIoU(%) |GPU预测速度 | CPU预测速度 |
| :-----| :--------| :-------- | :---------- |:---------- | :---------|
|UNet | 无剪裁(原模型)| 77M | 91.22 |33.28ms |9523.55ms |
|UNet | 方案一(eval_metric_loss=0.10) |26M | 90.37 |21.04ms |3936.20ms |
...
...@@ -6,7 +6,7 @@
定点量化使用更少的比特数(如8-bit、3-bit、2-bit等)表示神经网络的权重和激活值,从而加速模型推理速度。PaddleX提供了训练后量化技术,其原理可参见[训练后量化原理](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html#id14),该量化使用KL散度确定量化比例因子,将FP32模型转成INT8模型,且不需要重新训练,可以快速得到量化模型。
## 使用PaddleX量化模型
PaddleX提供了`export_quant_model`接口,让用户以接口的形式对训练后的模型进行量化。点击查看[量化接口使用文档](../../../apis/slim.md)
## 量化性能对比
模型量化后的性能对比指标请查阅[PaddleSlim模型库](https://paddlepaddle.github.io/PaddleSlim/model_zoo.html)
...@@ -116,7 +116,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
**在加载模型前,请检查你的模型目录中文件应该包括`model.yml`、`__model__`和`__params__`三个文件。如若不满足这个条件,请参考[模型导出为Inference文档](../../export_model.md)将模型导出为部署格式。**
* 编译成功后,图片预测demo的可执行程序分别为`build/demo/detector`,`build/demo/classifier`,`build/demo/segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
...@@ -130,7 +130,24 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
| batch_size | 预测的批量大小,默认为1 |
| thread_num | 预测的线程数,默认为cpu处理器个数 |
| use_ir_optim | 是否使用图优化策略,支持值为0或1(默认值为1,图像分割默认值为0)|
* 编译成功后,视频预测demo的可执行程序分别为`build/demo/video_detector``build/demo/video_classifier``build/demo/video_segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
| model_dir | 导出的预测模型所在路径 |
| use_camera | 是否使用摄像头预测,支持值为0或1(默认值为0) |
| camera_id | 摄像头设备ID,默认值为0 |
| video_path | 视频文件的路径 |
| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0) |
| use_trt | 是否使用 TensorRT 预测, 支持值为0或1(默认值为0) |
| gpu_id | GPU 设备ID, 默认值为0 |
| show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果(因加入了延迟处理,故显示结果不能反映真实的帧率),支持值为0或1(默认值为0) |
| save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
| save_dir | 保存可视化结果的路径, 默认值为"output"|
| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
**注意:若系统无GUI,则不要将show_result设置为1。当使用摄像头预测时,按`ESC`键可关闭摄像头并退出预测程序。**
## 样例
...@@ -138,7 +155,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
> 关于预测速度的说明:加载模型后前几张图片的预测速度会较慢,这是因为运行启动时涉及到内存显存初始化等步骤,通常在预测20-30张图片后模型的预测速度达到稳定。
**样例一:**
不使用`GPU`测试图片 `/root/projects/images/xiaoduxiong.jpeg`
...@@ -148,7 +165,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
图片文件`可视化预测结果`会保存在`save_dir`参数设置的目录下。
**样例二:**
使用`GPU`预测多个图片`/root/projects/image_list.txt`,image_list.txt内容的格式如下:
```
...@@ -161,3 +178,21 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
./build/demo/detector --model_dir=/root/projects/inference_model --image_list=/root/projects/images_list.txt --use_gpu=1 --save_dir=output --batch_size=2 --thread_num=2
```
图片文件`可视化预测结果`会保存在`save_dir`参数设置的目录下。
**样例三:**
使用摄像头预测:
```shell
./build/demo/video_detector --model_dir=/root/projects/inference_model --use_camera=1 --use_gpu=1 --save_dir=output --save_result=1
```
当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。
**样例四:**
对视频文件进行预测:
```shell
./build/demo/video_detector --model_dir=/root/projects/inference_model --video_path=/path/to/video_file --use_gpu=1 --save_dir=output --show_result=1 --save_result=1
```
当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。如果系统有GUI,可通过将`show_result`设置为1,在屏幕上实时观看可视化预测结果。
...@@ -101,7 +101,7 @@ D:
cd D:\projects\PaddleX\deploy\cpp\out\build\x64-Release
```
* 编译成功后,图片预测demo的入口程序为`paddlex_inference\detector.exe`,`paddlex_inference\classifier.exe`,`paddlex_inference\segmenter.exe`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
...@@ -114,7 +114,24 @@ cd D:\projects\PaddleX\deploy\cpp\out\build\x64-Release
| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
| batch_size | 预测的批量大小,默认为1 |
| thread_num | 预测的线程数,默认为cpu处理器个数 |
| use_ir_optim | 是否使用图优化策略,支持值为0或1(默认值为1,图像分割默认值为0)|
* 编译成功后,视频预测demo的入口程序为`paddlex_inference\video_detector.exe`,`paddlex_inference\video_classifier.exe`,`paddlex_inference\video_segmenter.exe`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
| model_dir | 导出的预测模型所在路径 |
| use_camera | 是否使用摄像头预测,支持值为0或1(默认值为0) |
| camera_id | 摄像头设备ID,默认值为0 |
| video_path | 视频文件的路径 |
| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0) |
| gpu_id | GPU 设备ID, 默认值为0 |
| show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果(因加入了延迟处理,故显示结果不能反映真实的帧率),支持值为0或1(默认值为0) |
| save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
| save_dir | 保存可视化结果的路径, 默认值为"output" |
| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
**注意:若系统无GUI,则不要将show_result设置为1。当使用摄像头预测时,按`ESC`键可关闭摄像头并退出预测程序。**
## 样例
...@@ -157,3 +174,18 @@ D:\images\xiaoduxiongn.jpeg
```
`--key`传入加密工具输出的密钥,例如`kLAl1qOs5uRbFt0/RrIDTZW2+tOf5bzvUIaHGF8lJ1c=`, 图片文件可视化预测结果会保存在`save_dir`参数设置的目录下。
### 样例四:(使用未加密的模型开启摄像头预测)
```shell
.\paddlex_inference\video_detector.exe --model_dir=D:\projects\inference_model --use_camera=1 --use_gpu=1 --save_dir=output
```
当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。
### 样例五:(使用未加密的模型对视频文件做预测)
```shell
.\paddlex_inference\video_detector.exe --model_dir=D:\projects\inference_model --video_path=D:\projects\video_test.mp4 --use_gpu=1 --show_result=1 --save_dir=output
```
当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。如果系统有GUI,可通过将`show_result`设置为1,在屏幕上实时观看可视化预测结果。
...@@ -51,7 +51,7 @@ paddlex-encryption
|
├── lib # libpmodel-encrypt.so和libpmodel-decrypt.so动态库
|
└── tool # paddle_encrypt_tool
```
Windows加密工具包含内容为:
...@@ -61,7 +61,7 @@ paddlex-encryption
|
├── lib # pmodel-encrypt.dll和pmodel-decrypt.dll动态库 pmodel-encrypt.lib和pmodel-decrypt.lib静态库
|
└── tool # paddle_encrypt_tool.exe 模型加密工具
```
### 1.3 加密PaddleX模型
...@@ -71,13 +71,13 @@ paddlex-encryption
Linux平台:
```
# 假设模型在/root/projects下
./paddlex-encryption/tool/paddle_encrypt_tool -model_dir /root/projects/paddlex_inference_model -save_dir /root/projects/paddlex_encrypted_model
```
Windows平台:
```
# 假设模型在D:/projects下
.\paddlex-encryption\tool\paddle_encrypt_tool.exe -model_dir D:\projects\paddlex_inference_model -save_dir D:\projects\paddlex_encrypted_model
```
`-model_dir`用于指定inference模型路径(参考[导出inference模型](../export_model.md)将模型导出为inference格式模型),可使用[导出小度熊识别模型](../export_model.md)中导出的`inference_model`。加密完成后,加密过的模型会保存至指定的`-save_dir`下,包含`__model__.encrypted`、`__params__.encrypted`和`model.yml`三个文件,同时生成密钥信息,命令输出如下图所示,密钥为`kLAl1qOs5uRbFt0/RrIDTZW2+tOf5bzvUIaHGF8lJ1c=`
...
...@@ -27,7 +27,26 @@ import paddlex as pdx
predictor = pdx.deploy.Predictor('./inference_model')
image_list = ['xiaoduxiong_test_image/JPEGImages/WeChatIMG110.jpeg',
              'xiaoduxiong_test_image/JPEGImages/WeChatIMG111.jpeg']
result = predictor.batch_predict(image_list=image_list)
```
* 视频流预测
```
import cv2
import paddlex as pdx
predictor = pdx.deploy.Predictor('./inference_model')
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        result = predictor.predict(frame)
        vis_img = pdx.det.visualize(frame, result, threshold=0.6, save_dir=None)
        cv2.imshow('Xiaoduxiong', vis_img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        break
cap.release()
``` ```
> 关于预测速度的说明:加载模型后前几张图片的预测速度会较慢,这是因为运行启动时涉及到内存显存初始化等步骤,通常在预测20-30张图片后模型的预测速度达到稳定。
...
# 人像分割模型
本教程基于PaddleX核心分割模型实现人像分割,开放预训练模型和测试数据、支持视频流人像分割、提供模型Fine-tune到Paddle Lite移动端及Nvidia Jetson嵌入式设备部署的全流程应用指南。
## 预训练模型和测试数据
#### 预训练模型
本案例开放了两个在大规模人像数据集上训练好的模型,以满足服务器端场景和移动端场景的需求。使用这些模型可以快速体验视频流人像分割,也可以部署到移动端或嵌入式设备进行实时人像分割,还可以用于完成模型Fine-tuning。
| 模型类型 | Checkpoint Parameter | Inference Model | Quant Inference Model | 备注 |
| --- | --- | --- | ---| --- |
...@@ -243,15 +243,17 @@ python quant_offline.py --model_dir output/best_model \
* `--save_dir`: 量化模型保存路径
* `--image_shape`: 网络输入图像大小(w, h)
## 推理部署
### Paddle Lite移动端部署
本案例将人像分割模型在移动端进行部署,部署流程展示如下,通用的移动端部署流程参见[Paddle Lite移动端部署](../../docs/deploy/paddlelite/android.md)
#### 1. 将PaddleX模型导出为inference模型
本案例使用humanseg_mobile_quant预训练模型,该模型已经是inference模型,不需要再执行模型导出步骤。如果不使用预训练模型,则执行上一章节`模型训练`中的`模型导出`将自己训练的模型导出为inference格式。
#### 2. 将inference模型优化为Paddle Lite模型
下载并解压 [模型优化工具opt](https://bj.bcebos.com/paddlex/deploy/lite/model_optimize_tool_11cbd50e.tar.gz),进入模型优化工具opt所在路径后,执行以下命令:
...@@ -273,16 +275,16 @@ python quant_offline.py --model_dir output/best_model \
更详细的使用方法和参数含义请参考: [使用opt转化模型](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html)
#### 3. 移动端预测
PaddleX提供了基于PaddleX Android SDK的安卓demo,可供用户体验图像分类、目标检测、实例分割和语义分割,该demo位于`PaddleX/deploy/lite/android/demo`,用户将模型、配置文件和测试图片拷贝至该demo下进行预测。
##### 3.1 前置依赖
* Android Studio 3.4
* Android手机或开发板
##### 3.2 拷贝模型、配置文件和测试图片
* 将Lite模型(.nb文件)拷贝到`PaddleX/deploy/lite/android/demo/app/src/main/assets/model/`目录下, 根据.nb文件的名字,修改文件`PaddleX/deploy/lite/android/demo/app/src/main/res/values/strings.xml`中的`MODEL_PATH_DEFAULT`;
...@@ -290,7 +292,7 @@ PaddleX提供了基于PaddleX Android SDK的安卓demo,可供用户体验图
* 将测试图片拷贝到`PaddleX/deploy/lite/android/demo/app/src/main/assets/images/`目录下,根据图片文件的名字,修改文件`PaddleX/deploy/lite/android/demo/app/src/main/res/values/strings.xml`中的`IMAGE_PATH_DEFAULT`;
##### 3.3 导入工程并运行
* 打开Android Studio,在"Welcome to Android Studio"窗口点击"Open an existing Android Studio project",在弹出的路径选择窗口中进入`PaddleX/deploy/lite/android/demo`目录,然后点击右下角的"Open"按钮,导入工程;
...@@ -303,3 +305,58 @@ PaddleX提供了基于PaddleX Android SDK的安卓demo,可供用户体验图
测试图片及其分割结果如下所示:
![](./images/beauty.png)
### Nvidia Jetson嵌入式设备部署
#### c++部署
step 1. 下载PaddleX源码
```
git clone https://github.com/PaddlePaddle/PaddleX
```
step 2. 将`PaddleX/examples/human_segmentation/deploy/cpp`下的`human_segmenter.cpp``CMakeList.txt`拷贝至`PaddleX/deploy/cpp`目录下,拷贝之前可以将`PaddleX/deploy/cpp`下原本的`CMakeList.txt`做好备份。
step 3. 按照[Nvidia Jetson开发板部署](../deploy/nvidia-jetson.md)中的Step2至Step3完成C++预测代码的编译。
step 4. 编译成功后,可执行程序为`build/human_segmenter`,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
| model_dir | 人像分割模型路径 |
| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
| gpu_id | GPU 设备ID, 默认值为0 |
| use_camera | 是否使用摄像头采集图片,支持值为0或1(默认值为0) |
| camera_id | 摄像头设备ID,默认值为0 |
| video_path | 视频文件的路径 |
| show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果,支持值为0或1(默认值为0) |
| save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
| image | 待预测的图片路径 |
| save_dir | 保存可视化结果的路径, 默认值为"output"|
step 5. 推理预测
用于部署推理的模型应为inference格式,本案例使用humanseg_server_inference预训练模型,该模型已经是inference模型,不需要再执行模型导出步骤。如果不使用预训练模型,则执行第2章节`模型训练`中的`模型导出`将自己训练的模型导出为inference格式。
* 使用未加密的模型对单张图片做预测
待测试图片位于本案例提供的测试数据中,可以替换成自己的图片。
```shell
./build/human_segmenter --model_dir=/path/to/humanseg_server_inference --image=/path/to/data/mini_supervisely/Images/pexels-photo-63776.png --use_gpu=1 --save_dir=output
```
* 使用未加密的模型开启摄像头做预测
```shell
./build/human_segmenter --model_dir=/path/to/humanseg_server_inference --use_camera=1 --save_result=1 --use_gpu=1 --save_dir=output
```
* 使用未加密的模型对视频文件做预测
待测试视频文件位于本案例提供的测试数据中,可以替换成自己的视频文件。
```shell
./build/human_segmenter --model_dir=/path/to/humanseg_server_inference --video_path=/path/to/data/mini_supervisely/video_test.mp4 --save_result=1 --use_gpu=1 --save_dir=output
```
...@@ -46,13 +46,13 @@
#### 测试表计读数
step 1. 下载PaddleX源码:
```
git clone https://github.com/PaddlePaddle/PaddleX
```
step 2. 预测执行文件位于`PaddleX/examples/meter_reader/`,进入该目录:
```
cd PaddleX/examples/meter_reader/
```
...@@ -76,7 +76,7 @@ cd PaddleX/examples/meter_reader/
| use_erode | 是否使用图像腐蚀对分割预测图进行细分,默认为False |
| erode_kernel | 图像腐蚀操作时的卷积核大小,默认值为4 |
step 3. 预测
若要使用GPU,则指定GPU卡号(以0号卡为例):
...@@ -112,17 +112,17 @@ python3 reader_infer.py --detector_dir /path/to/det_inference_model --segmenter_
#### c++部署
step 1. 下载PaddleX源码:
```
git clone https://github.com/PaddlePaddle/PaddleX
```
step 2. 将`PaddleX\examples\meter_reader\deploy\cpp`下的`meter_reader`文件夹和`CMakeList.txt`拷贝至`PaddleX\deploy\cpp`目录下,拷贝之前可以将`PaddleX\deploy\cpp`下原本的`CMakeList.txt`做好备份。
step 3. 按照[Windows平台部署](../deploy/server/cpp/windows.md)中的Step2至Step4完成C++预测代码的编译。
step 4. 编译成功后,可执行文件在`out\build\x64-Release`目录下,打开`cmd`,并切换到该目录:
```
cd PaddleX\deploy\cpp\out\build\x64-Release
```
...@@ -139,8 +139,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
| gpu_id | GPU 设备ID, 默认值为0 |
| save_dir | 保存可视化结果的路径, 默认值为"output"|
| seg_batch_size | 分割的批量大小,默认为2 |
| thread_num | 分割预测的线程数,默认为cpu处理器个数 |
| use_camera | 是否使用摄像头采集图片,支持值为0或1(默认值为0) |
...@@ -149,7 +147,7 @@ git clone https://github.com/PaddlePaddle/PaddleX
| erode_kernel | 图像腐蚀操作时的卷积核大小,默认值为4 |
| score_threshold | 检测模型输出结果中,预测得分低于该阈值的框将被滤除,默认值为0.5|
step 5. 推理预测:
用于部署推理的模型应为inference格式,本案例提供的预训练模型均为inference格式,如若是重新训练的模型,需参考[部署模型导出](../deploy/export_model.md)将模型导出为inference格式。
...@@ -160,6 +158,13 @@ git clone https://github.com/PaddlePaddle/PaddleX
```
* 使用未加密的模型对图像列表做预测
图像列表image_list.txt内容的格式如下,因绝对路径不同,暂未提供该文件,用户可根据实际情况自行生成:
```
\path\to\images\1.jpg
\path\to\images\2.jpg
...
\path\to\images\n.jpg
```
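
A minimal Python sketch for generating such a list, assuming your images sit in one folder (the folder path below is a hypothetical placeholder):

```python
# Write every image under a folder into image_list.txt, one path per line
import os

image_dir = r"D:\meter_test\images"  # hypothetical folder; replace with your own
with open("image_list.txt", "w") as f:
    for name in sorted(os.listdir(image_dir)):
        if name.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
            f.write(os.path.join(image_dir, name) + "\n")
```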
```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --image_list=\path\to\meter_test\image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
```
...@@ -171,29 +176,29 @@ git clone https://github.com/PaddlePaddle/PaddleX
* Use the unencrypted model to predict with a camera

```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```

* Use an encrypted model to predict on a single image

  If you have not encrypted the model yet, encrypt it as described in [Encrypt PaddleX models](../deploy/server/encryption.html#paddlex). For example, suppose the encrypted detection model is under `\path\to\encrypted_det_inference_model` with key `yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`, and the encrypted segmentation model is under `\path\to\encrypted_seg_inference_model` with key `DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`

```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\encrypted_det_inference_model --seg_model_dir=\path\to\encrypted_seg_inference_model --image=\path\to\test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
```
### Secure deployment on Jetson embedded devices running Linux

#### C++ deployment

step 1. Download the PaddleX source code:

```
git clone https://github.com/PaddlePaddle/PaddleX
```

step 2. Copy the `meter_reader` folder and `CMakeList.txt` from `PaddleX/examples/meter_reader/deploy/cpp` into the `PaddleX/deploy/cpp` directory; back up the original `CMakeList.txt` under `PaddleX/deploy/cpp` before copying.

step 3. Follow Step 2 to Step 3 of [Nvidia Jetson deployment](../deploy/nvidia-jetson.md) to compile the C++ prediction code.

step 4. After a successful build, the executable is `build/meter_reader/meter_reader`; its main command-line parameters are described below:
| Parameter | Description |
| ---- | ---- |
...@@ -204,8 +209,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
| use_gpu | Whether to use GPU for prediction; supported values are 0 and 1. Default: 0 |
| gpu_id | GPU device ID. Default: 0 |
| save_dir | Path for saving visualized results. Default: "output" |
| seg_batch_size | Batch size for segmentation. Default: 2 |
| thread_num | Number of threads for segmentation prediction. Default: the number of CPU processors |
| use_camera | Whether to capture images from a camera; supported values are 0 and 1. Default: 0 |
...@@ -214,7 +217,7 @@ git clone https://github.com/PaddlePaddle/PaddleX
| erode_kernel | Kernel size of the image erosion operation. Default: 4 |
| score_threshold | Boxes in the detection output whose score is below this threshold are filtered out. Default: 0.5 |
5. 推理预测: step 5. 推理预测:
用于部署推理的模型应为inference格式,本案例提供的预训练模型均为inference格式,如若是重新训练的模型,需参考[部署模型导出](../deploy/export_model.md)将模型导出为inference格式。 用于部署推理的模型应为inference格式,本案例提供的预训练模型均为inference格式,如若是重新训练的模型,需参考[部署模型导出](../deploy/export_model.md)将模型导出为inference格式。
...@@ -225,7 +228,13 @@ git clone https://github.com/PaddlePaddle/PaddleX ...@@ -225,7 +228,13 @@ git clone https://github.com/PaddlePaddle/PaddleX
``` ```
* 使用未加密的模型对图像列表做预测 * 使用未加密的模型对图像列表做预测
图像列表image_list.txt内容的格式如下,因绝对路径不同,暂未提供该文件,用户可根据实际情况自行生成:
```
/path/to/images/1.jpg
/path/to/images/2.jpg
...
/path/to/images/n.jpg
```
```shell
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --image_list=/path/to/image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
```
...@@ -236,15 +245,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
* Use the unencrypted model to predict with a camera

```shell
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```
## Model training
......
...@@ -42,6 +42,7 @@ PaddleX针对图像分类、目标检测、实例分割和语义分割4种视觉
| YOLOv3-MobileNetV3_large | Suitable for mobile scenarios that require fast prediction | 100.7MB | 143.322 | - | - | 31.6 |
| YOLOv3-MobileNetV1 | Relatively low accuracy; suitable for server-side scenarios that require fast prediction | 99.2MB | 15.422 | - | - | 29.3 |
| YOLOv3-DarkNet53 | Good balance of prediction speed and accuracy; suitable for most server-side scenarios | 249.2MB | 42.672 | - | - | 38.9 |
| PPYOLO | Better prediction speed and accuracy than YOLOv3-DarkNet53; suitable for most server-side scenarios | 329.1MB | - | - | - | 45.9 |
| FasterRCNN-ResNet50-FPN | Classic two-stage detector; prediction is relatively slow; suitable for server-side scenarios that prioritize accuracy | 167.7MB | 83.189 | - | - | 37.2 |
| FasterRCNN-HRNet_W18-FPN | Suitable for server-side scenarios that are sensitive to image resolution and need finer detail prediction | 115.5MB | 81.592 | - | - | 36 |
| FasterRCNN-ResNet101_vd-FPN | Very high accuracy with longer prediction time; strong on large data volumes; suitable for server-side scenarios | 244.3MB | 156.097 | - | - | 40.5 |
...@@ -74,11 +75,12 @@ PaddleX目前提供了实例分割MaskRCNN模型,支持5种不同的backbone
> GPU prediction speeds in the table were measured with the PaddlePaddle Python prediction API (test GPU: Nvidia Tesla P40).
> CPU prediction speeds in the table (test CPU model not yet specified).
> Snapdragon 855 prediction speeds were measured on a phone with a Snapdragon 855 processor.
> For the speed tests the model input size was 1024 x 2048; mIoU was evaluated on the Cityscapes dataset.

| Model | Characteristics | Model size | GPU prediction speed | CPU (x86) prediction speed (ms) | Snapdragon 855 (ARM) prediction speed (ms) | mIoU |
| :---- | :------- | :---------- | :---------- | :----- | :----- |:--- |
| DeepLabv3p-MobileNetV2_x1.0 | Lightweight model for mobile scenarios | - | - | - | - | 69.8% |
| DeepLabv3-MobileNetV3_large_x1_0_ssld | Lightweight model for mobile scenarios | - | - | - | - | 73.28% |
| HRNet_W18_Small_v1 | Lightweight and fast; suitable for mobile scenarios | - | - | - | - | - |
| FastSCNN | Lightweight and fast; suitable for mobile or server-side scenarios that require fast prediction | - | - | - | - | 69.64 |
| HRNet_W18 | High-accuracy model; suitable for server-side scenarios that are sensitive to image resolution and need finer detail prediction | - | - | - | - | 79.36 |
......
...@@ -33,4 +33,4 @@
**If you have any questions or suggestions, feel free to open an issue, or join the official PaddleX QQ group (1045148026) to report your problems and needs directly**

![](./images/QR.jpg)
...@@ -29,4 +29,4 @@ python mobilenetv3_small_ssld.py
- [**Important**] Want to tune training parameters for your own machine and data? First learn what the PaddleX training parameters do. [——>>Link](../appendix/parameters.md)
- [**Useful**] No machine resources? Train models online with the free GPU resources on AIStudio. [——>>Link](https://aistudio.baidu.com/aistudio/projectdetail/450925)
- [**More**] For more image classification models, see the [PaddleX model zoo](../appendix/model_zoo.md) and the [API documentation](../apis/models/classification.md)
...@@ -13,3 +13,4 @@ PaddleX集成了PaddleClas、PaddleDetection和PaddleSeg三大CV工具套件中
instance_segmentation.md
semantic_segmentation.md
prediction.md
visualdl.md
...@@ -27,4 +27,4 @@ python mask_rcnn_r50_fpn.py
- [**Important**] Want to tune training parameters for your own machine and data? First learn what the PaddleX training parameters do. [——>>Link](../appendix/parameters.md)
- [**Useful**] No machine resources? Train models online with the free GPU resources on AIStudio. [——>>Link](https://aistudio.baidu.com/aistudio/projectdetail/450925)
- [**More**] For more instance segmentation models, see the [PaddleX model zoo](../appendix/model_zoo.md) and the [API documentation](../apis/models/instance_segmentation.md)
...@@ -13,6 +13,7 @@ PaddleX目前提供了FasterRCNN和YOLOv3两种检测结构,多种backbone模
| [YOLOv3-MobileNetV1](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_mobilenetv1.py) | 29.3% | 99.2MB | 15.442ms | - | Small model, fast prediction; suitable for low-performance or mobile devices |
| [YOLOv3-MobileNetV3](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_mobilenetv3.py) | 31.6% | 100.7MB | 143.322ms | - | Small model with a prediction-speed advantage on mobile devices |
| [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_darknet53.py) | 38.9% | 249.2MB | 42.672ms | - | Larger model, fast prediction; suitable for server-side deployment |
| [PPYOLO](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/ppyolo.py) | 45.9% | 329.1MB | - | - | Larger model, faster prediction than YOLOv3-DarkNet53; suitable for server-side deployment |
| [FasterRCNN-ResNet50-FPN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/faster_rcnn_r50_fpn.py) | 37.2% | 167.7MB | 197.715ms | - | High accuracy; suitable for server-side deployment |
| [FasterRCNN-ResNet18-FPN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/faster_rcnn_r18_fpn.py) | 32.6% | 173.2MB | - | - | High accuracy; suitable for server-side deployment |
| [FasterRCNN-HRNet-FPN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/faster_rcnn_hrnet_fpn.py) | 36.0% | 115.5MB | 81.592ms | - | High accuracy, fast prediction; suitable for server-side deployment |
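
The new PPYOLO entry trains through the same PaddleX API as the other detectors in this table. A minimal sketch; the dataset paths, transforms, and hyperparameters below are illustrative assumptions, not the linked tutorial's exact settings:

```python
# Hypothetical PPYOLO training run with the PaddleX detection API
import paddlex as pdx
from paddlex.det import transforms

# Standard YOLO-style augmentation pipeline
train_transforms = transforms.Compose([
    transforms.MixupImage(mixup_epoch=250),
    transforms.RandomDistort(),
    transforms.RandomExpand(),
    transforms.RandomCrop(),
    transforms.Resize(target_size=608, interp='RANDOM'),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize(),
])

# Assumed VOC-format dataset layout
train_dataset = pdx.datasets.VOCDetection(
    data_dir='my_dataset',
    file_list='my_dataset/train_list.txt',
    label_list='my_dataset/labels.txt',
    transforms=train_transforms,
    shuffle=True)

model = pdx.det.PPYOLO(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=270,  # illustrative schedule; tune for your data
    train_dataset=train_dataset,
    train_batch_size=8,
    learning_rate=0.000125,
    save_dir='output/ppyolo')
```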
...@@ -31,4 +32,4 @@ python yolov3_mobilenetv1.py
- [**Important**] Want to tune training parameters for your own machine and data? First learn what the PaddleX training parameters do. [——>>Link](../appendix/parameters.md)
- [**Useful**] No machine resources? Train models online with the free GPU resources on AIStudio. [——>>Link](https://aistudio.baidu.com/aistudio/projectdetail/450925)
- [**More**] For more object detection models, see the [PaddleX model zoo](../appendix/model_zoo.md) and the [API documentation](../apis/models/detection.md)
...@@ -4,15 +4,16 @@
PaddleX currently provides four semantic segmentation architectures, DeepLabv3p, UNet, HRNet and FastSCNN, with multiple backbone models, to meet the needs of different scenarios and performance targets.

- **mIoU**: model accuracy evaluated on the Cityscapes dataset
- **Prediction speed**: prediction time for a single image (excluding pre- and post-processing)
- "-" means the metric has not been updated yet

| Model (click for code) | mIoU | Model size | GPU prediction speed | ARM prediction speed | Notes |
| :---------------- | :------- | :------- | :--------- | :--------- | :----- |
| [DeepLabv3p-MobileNetV2-x0.25](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2_x0.25.py) | - | 2.9MB | - | - | Small model, fast prediction; suitable for low-performance or mobile devices |
| [DeepLabv3p-MobileNetV2-x1.0](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2.py) | 69.8% | 11MB | - | - | Small model, fast prediction; suitable for low-performance or mobile devices |
| [DeepLabv3_MobileNetV3_large_x1_0_ssld](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv3_large_ssld.py) | 73.28% | 9.3MB | - | - | Small model, fast prediction, relatively high accuracy; suitable for low-performance or mobile devices |
| [DeepLabv3p-Xception65](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_xception65.py) | 79.3% | 158MB | - | - | Large model, high accuracy; suitable for server-side deployment |
| [UNet](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/unet.py) | - | 52MB | - | - | Larger model, high accuracy; suitable for server-side deployment |
| [HRNet](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/hrnet.py) | 79.4% | 37MB | - | - | Smaller model, high accuracy; suitable for server-side deployment |
| [FastSCNN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/fast_scnn.py) | - | 4.5MB | - | - | Small model, fast prediction; suitable for low-performance or mobile devices |
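
The newly added SSLD-distilled MobileNetV3 segmenter is constructed through the regular DeepLabv3p interface. A minimal sketch; the backbone string is assumed from the linked tutorial's filename:

```python
# Hypothetical construction of the MobileNetV3-large SSLD segmenter
import paddlex as pdx

model = pdx.seg.DeepLabv3p(
    num_classes=2,  # illustrative class count
    backbone='MobileNetV3_large_x1_0_ssld')
```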
...@@ -30,4 +31,4 @@ python deeplabv3p_mobilenetv2_x0.25.py
- [**Important**] Want to tune training parameters for your own machine and data? First learn what the PaddleX training parameters do. [——>>Link](../appendix/parameters.md)
- [**Useful**] No machine resources? Train models online with the free GPU resources on AIStudio. [——>>Link](https://aistudio.baidu.com/aistudio/projectdetail/450925)
- [**More**] For more semantic segmentation models, see the [PaddleX model zoo](../appendix/model_zoo.md) and the [API documentation](../apis/models/semantic_segmentation.md)
# Visualizing training metrics with VisualDL

While training a model with PaddleX, all training and evaluation metrics are printed to standard output. They can also be visualized with VisualDL: simply set the `use_vdl` parameter to `True` when calling the `train` function, as in the code below.
```
model = paddlex.cls.ResNet50(num_classes=1000)
model.train(num_epochs=120, train_dataset=train_dataset,
train_batch_size=32, eval_dataset=eval_dataset,
log_interval_steps=10, save_interval_epochs=10,
save_dir='./output', use_vdl=True)
```
During training, a `vdl_log` directory is created under `save_dir`. Start VisualDL by running the following command in a terminal.
```
visualdl --logdir=output/vdl_log --port=8008
```
Open `http://0.0.0.0:8008` in a browser to watch each metric change dynamically as training iterates (0.0.0.0 is the IP of the server where VisualDL was started; on the local machine, 0.0.0.0 works as-is).

The screenshots below show VisualDL visualizations from training a classification model.

> Trend of the `Loss` and the corresponding `Top-1 accuracy` at each training step:
![](../images/vdl1.jpg)
> Trend of the `learning rate lr` and the corresponding `Top-5 accuracy` at each training step:
![](../images/vdl2.jpg)
> `Top-1 accuracy` and `Top-5 accuracy` on the validation dataset each time the model is saved:
![](../images/vdl3.jpg)
cmake_minimum_required(VERSION 3.0)
project(PaddleX CXX C)
option(WITH_MKL "Compile human_segmenter with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile human_segmenter with GPU/CPU, default use CPU." ON)
if (NOT WIN32)
option(WITH_STATIC_LIB "Compile human_segmenter with static/shared library, default use static." OFF)
else()
option(WITH_STATIC_LIB "Compile human_segmenter with static/shared library, default use static." ON)
endif()
option(WITH_TENSORRT "Compile human_segmenter with TensorRT." OFF)
option(WITH_ENCRYPTION "Compile human_segmenter with encryption tool." OFF)
SET(TENSORRT_DIR "" CACHE PATH "Location of libraries")
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(ENCRYPTION_DIR"" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
if (NOT WIN32)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
else()
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/paddlex_inference)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/paddlex_inference)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/paddlex_inference)
endif()
if (NOT WIN32)
SET(YAML_BUILD_TYPE ON CACHE BOOL "yaml build shared library.")
else()
SET(YAML_BUILD_TYPE OFF CACHE BOOL "yaml build shared library.")
endif()
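# yaml-cpp is fetched and built via cmake/yaml-cpp.cmake; its headers and
# library directory are wired in below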
include(cmake/yaml-cpp.cmake)
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/ext-yaml-cpp/include")
link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_ENCRYPTION)
add_definitions( -DWITH_ENCRYPTION=${WITH_ENCRYPTION})
endif()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${PADDLE_DIR}/")
include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
include_directories("${PADDLE_DIR}/third_party/install/glog/include")
include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/include")
include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/include")
include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
endif()
# zlib does not exist in 1.8.1
if (EXISTS "${PADDLE_DIR}/third_party/install/zlib/include")
include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
endif()
include_directories("${PADDLE_DIR}/third_party/boost")
include_directories("${PADDLE_DIR}/third_party/eigen3")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
endif()
if (EXISTS "${PADDLE_DIR}/third_party/install/zlib/lib")
link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
endif()
link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/fluid/inference")
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
unset(OpenCV_DIR CACHE)
else ()
if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64") # x86_64 aarch64
set(OpenCV_INCLUDE_DIRS "/usr/include/opencv4")
file(GLOB OpenCV_LIBS /usr/lib/aarch64-linux-gnu/libopencv_*${CMAKE_SHARED_LIBRARY_SUFFIX})
message("OpenCV libs: ${OpenCV_LIBS}")
else()
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
endif()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
endif ()
include_directories(${OpenCV_INCLUDE_DIRS})
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
find_package(OpenMP REQUIRED)
if (OPENMP_FOUND)
message("OPENMP FOUND")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} ${OpenMP_C_FLAGS}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} ${OpenMP_CXX_FLAGS}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} ${OpenMP_CXX_FLAGS}")
endif()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O2 -fopenmp -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn/")
endif()
endif(NOT WIN32)
endif()
if (NOT WIN32)
if (WITH_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_DIR}/include")
link_directories("${TENSORRT_DIR}/lib")
endif()
endif(NOT WIN32)
if (NOT WIN32)
set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
if(EXISTS ${NGRAPH_PATH})
include(GNUInstallDirs)
include_directories("${NGRAPH_PATH}/include")
link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_MKL)
include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
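        # copy the MKLML runtime next to the system libraries so the built
        # binaries can locate it at run time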
execute_process(COMMAND cp -r ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib)
endif ()
set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if (WIN32)
if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_STATIC_LIB)
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
if (NOT WIN32)
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_DIR}/paddle/lib/paddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash yaml-cpp
)
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash libyaml-cppmt)
if (EXISTS "${PADDLE_DIR}/third_party/install/zlib/lib")
set(DEPS ${DEPS} zlibstatic)
endif()
set(DEPS ${DEPS} libcmt shlwapi)
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
if (EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (WITH_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_ENCRYPTION)
if(NOT WIN32)
include_directories("${ENCRYPTION_DIR}/include")
link_directories("${ENCRYPTION_DIR}/lib")
set(DEPS ${DEPS} ${ENCRYPTION_DIR}/lib/libpmodel-decrypt${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
include_directories("${ENCRYPTION_DIR}/include")
link_directories("${ENCRYPTION_DIR}/lib")
set(DEPS ${DEPS} ${ENCRYPTION_DIR}/lib/pmodel-decrypt${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread")
set(DEPS ${DEPS} ${EXTERNAL_LIB})
endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
add_library(paddlex_inference SHARED src/visualize.cpp src/transforms.cpp src/paddlex.cpp)
ADD_DEPENDENCIES(paddlex_inference ext-yaml-cpp)
target_link_libraries(paddlex_inference ${DEPS})
add_executable(human_segmenter human_segmenter.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
ADD_DEPENDENCIES(human_segmenter ext-yaml-cpp)
target_link_libraries(human_segmenter ${DEPS})
if (WIN32 AND WITH_MKL)
add_custom_command(TARGET human_segmenter POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
)
# for encryption
if (EXISTS "${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll")
add_custom_command(TARGET human_segmenter POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
)
endif()
endif()
file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/visualize.h"
DESTINATION "${CMAKE_BINARY_DIR}/include/" )
file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/config_parser.h"
DESTINATION "${CMAKE_BINARY_DIR}/include/" )
file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/transforms.h"
DESTINATION "${CMAKE_BINARY_DIR}/include/" )
file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/results.h"
DESTINATION "${CMAKE_BINARY_DIR}/include/" )
file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/paddlex.h"
DESTINATION "${CMAKE_BINARY_DIR}/include/" )
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <glog/logging.h>
#include <omp.h>
#include <algorithm>
#include <chrono> // NOLINT
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <utility>
#include <ctime>
#include "include/paddlex/paddlex.h"
#include "include/paddlex/visualize.h"
#if defined(__arm__) || defined(__aarch64__)
#include <opencv2/videoio/legacy/constants_c.h>
#endif
using namespace std::chrono; // NOLINT
DEFINE_string(model_dir, "", "Path of inference model");
DEFINE_bool(use_gpu, false, "Infering with GPU or CPU");
DEFINE_bool(use_trt, false, "Infering with TensorRT");
DEFINE_int32(gpu_id, 0, "GPU card id");
DEFINE_string(key, "", "key of encryption");
DEFINE_string(image, "", "Path of test image file");
DEFINE_bool(use_camera, false, "Infering with Camera");
DEFINE_int32(camera_id, 0, "Camera id");
DEFINE_string(video_path, "", "Path of input video");
DEFINE_bool(show_result, false, "show the result of each frame with a window");
DEFINE_bool(save_result, true, "save the result of each frame to a video");
DEFINE_string(save_dir, "output", "Path to save visualized image");
int main(int argc, char** argv) {
// Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
std::cerr << "--model_dir need to be defined" << std::endl;
return -1;
}
if (FLAGS_image == "" & FLAGS_video_path == ""
& FLAGS_use_camera == false) {
std::cerr << "--image or --video_path or --use_camera need to be defined"
<< std::endl;
return -1;
}
// Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
FLAGS_key);
if (FLAGS_use_camera || FLAGS_video_path != "") {
// Open video
cv::VideoCapture capture;
if (FLAGS_use_camera) {
capture.open(FLAGS_camera_id);
if (!capture.isOpened()) {
std::cout << "Can not open the camera "
<< FLAGS_camera_id << "."
<< std::endl;
return -1;
}
} else {
capture.open(FLAGS_video_path);
if (!capture.isOpened()) {
std::cout << "Can not open the video "
<< FLAGS_video_path << "."
<< std::endl;
return -1;
}
}
// Create a VideoWriter
cv::VideoWriter video_out;
std::string video_out_path;
if (FLAGS_save_result) {
// Get video information: resolution, fps
int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
int video_height =
static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
int video_fourcc;
if (FLAGS_use_camera) {
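      // 828601953 is the FOURCC 'avc1' (H.264); a live camera stream has no
      // source FOURCC to reuse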
video_fourcc = 828601953;
} else {
video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
}
if (FLAGS_use_camera) {
time_t now = time(0);
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir,
std::to_string(now) + ".mp4");
} else {
video_out_path =
PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
}
video_out.open(video_out_path.c_str(),
video_fourcc,
video_fps,
cv::Size(video_width, video_height),
true);
if (!video_out.isOpened()) {
std::cout << "Create video writer failed!" << std::endl;
return -1;
}
}
PaddleX::SegResult result;
cv::Mat frame;
int key;
while (capture.read(frame)) {
if (FLAGS_show_result || FLAGS_use_camera) {
key = cv::waitKey(1);
// When pressing `ESC`, then exit program and result video is saved
if (key == 27) {
break;
}
} else if (frame.empty()) {
break;
}
// Begin to predict
model.predict(frame, &result);
// Visualize results
std::vector<uint8_t> label_map(result.label_map.data.begin(),
result.label_map.data.end());
cv::Mat mask(result.label_map.shape[0],
result.label_map.shape[1],
CV_8UC1,
label_map.data());
int rows = result.label_map.shape[0];
int cols = result.label_map.shape[1];
cv::Mat vis_img = frame.clone();
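      // Paint background pixels (category 0) white, leaving the segmented
      // person unchanged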
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
int category_id = static_cast<int>(mask.at<uchar>(i, j));
if (category_id == 0) {
vis_img.at<cv::Vec3b>(i, j)[0] = 255;
vis_img.at<cv::Vec3b>(i, j)[1] = 255;
vis_img.at<cv::Vec3b>(i, j)[2] = 255;
}
}
}
if (FLAGS_show_result || FLAGS_use_camera) {
cv::imshow("human_seg", vis_img);
}
if (FLAGS_save_result) {
video_out.write(vis_img);
}
result.clear();
}
capture.release();
if (FLAGS_save_result) {
video_out.release();
std::cout << "Visualized output saved as " << video_out_path << std::endl;
}
if (FLAGS_show_result || FLAGS_use_camera) {
cv::destroyAllWindows();
}
} else {
PaddleX::SegResult result;
cv::Mat im = cv::imread(FLAGS_image, 1);
model.predict(im, &result);
// Visualize results
std::vector<uint8_t> label_map(result.label_map.data.begin(),
result.label_map.data.end());
cv::Mat mask(result.label_map.shape[0],
result.label_map.shape[1],
CV_8UC1,
label_map.data());
int rows = result.label_map.shape[0];
int cols = result.label_map.shape[1];
cv::Mat vis_img = im.clone();
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
int category_id = static_cast<int>(mask.at<uchar>(i, j));
if (category_id == 0) {
vis_img.at<cv::Vec3b>(i, j)[0] = 255;
vis_img.at<cv::Vec3b>(i, j)[1] = 255;
vis_img.at<cv::Vec3b>(i, j)[2] = 255;
}
}
}
std::string save_path =
PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_image);
cv::imwrite(save_path, vis_img);
result.clear();
std::cout << "Visualized output saved as " << save_path << std::endl;
}
return 0;
}
...@@ -148,8 +148,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
| use_gpu | Whether to use GPU for prediction; supported values are 0 and 1. Default: 0 |
| gpu_id | GPU device ID. Default: 0 |
| save_dir | Path for saving visualized results. Default: "output" |
| seg_batch_size | Batch size for segmentation. Default: 2 |
| thread_num | Number of threads for segmentation prediction. Default: the number of CPU processors |
| use_camera | Whether to capture images from a camera; supported values are 0 and 1. Default: 0 |
...@@ -163,13 +161,20 @@ git clone https://github.com/PaddlePaddle/PaddleX
The model used for deployment must be in inference format. The pretrained models provided with this case are already in inference format; if you retrained a model, export it as described in [Export inference model](https://paddlex.readthedocs.io/zh_CN/latest/tutorials/deploy/deploy_server/deploy_python.html#inference).

* Use the unencrypted model to predict on a single image

```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --image=\path\to\meter_test\20190822_168.jpg --use_gpu=1 --use_erode=1 --save_dir=output
```

* Use the unencrypted model to predict on an image list

  The image list file image_list.txt has the following format. Because absolute paths differ between machines, the file is not provided; generate it yourself (see the generation sketch earlier in this document):
```
\path\to\images\1.jpg
\path\to\images\2.jpg
...
\path\to\images\n.jpg
```
```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --image_list=\path\to\meter_test\image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
```
...@@ -180,12 +185,12 @@ git clone https://github.com/PaddlePaddle/PaddleX
* Use the unencrypted model to predict with a camera

```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```

* Use an encrypted model to predict on a single image

  If you have not encrypted the model yet, encrypt it as described in [Encrypt PaddleX models](../../docs/deploy/server/encryption.md#13-加密paddlex模型). For example, suppose the encrypted detection model is under `\path\to\encrypted_det_inference_model` with key `yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`, and the encrypted segmentation model is under `\path\to\encrypted_seg_inference_model` with key `DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`

```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\encrypted_det_inference_model --seg_model_dir=\path\to\encrypted_seg_inference_model --image=\path\to\test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
```
### Secure deployment on Jetson embedded devices running Linux
...@@ -213,8 +218,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
| use_gpu | Whether to use GPU for prediction; supported values are 0 and 1. Default: 0 |
| gpu_id | GPU device ID. Default: 0 |
| save_dir | Path for saving visualized results. Default: "output" |
| seg_batch_size | Batch size for segmentation. Default: 2 |
| thread_num | Number of threads for segmentation prediction. Default: the number of CPU processors |
| use_camera | Whether to capture images from a camera; supported values are 0 and 1. Default: 0 |
...@@ -234,6 +237,13 @@ git clone https://github.com/PaddlePaddle/PaddleX
```
* Use the unencrypted model to predict on an image list

  The image list file image_list.txt has the following format. Because absolute paths differ between machines, the file is not provided; generate it yourself as needed:
```
/path/to/images/1.jpg
/path/to/images/2.jpg
...
/path/to/images/n.jpg
```
```shell
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --image_list=/path/to/image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
```
...@@ -245,15 +255,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
* Use the unencrypted model to predict with a camera

```shell
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```
## <h2 id="5">模型训练</h2> ## <h2 id="5">模型训练</h2>
......
...@@ -51,7 +51,8 @@ DEFINE_string(seg_key, "", "Segmenter model key of encryption");
DEFINE_string(image, "", "Path of test image file");
DEFINE_string(image_list, "", "Path of test image list file");
DEFINE_string(save_dir, "output", "Path to save visualized image");
DEFINE_double(score_threshold, 0.5,
    "Detected bbox whose score is lower than this threshold is filtered");

void predict(const cv::Mat &input_image, PaddleX::Model *det_model,
             PaddleX::Model *seg_model, const std::string save_dir,
...@@ -207,7 +208,7 @@ int main(int argc, char **argv) {
    return -1;
  }

  // Load model
  PaddleX::Model det_model;
  det_model.Init(FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_use_trt,
                 FLAGS_gpu_id, FLAGS_det_key);
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -15,6 +15,7 @@
from six import text_type as _text_type
import argparse
import sys
import os.path as osp
import paddlex.utils.logging as logging
...@@ -91,6 +92,33 @@ def arg_parser():
        "-fs",
        default=None,
        help="export inference model with fixed input shape:[w,h]")
    parser.add_argument(
        "--split_dataset",
        "-sd",
        action="store_true",
        default=False,
        help="split the dataset into train/val(/test) subsets by the given ratios")
    parser.add_argument(
        "--format",
        "-f",
        default=None,
        help="define dataset format (ImageNet/COCO/VOC/Seg)")
    parser.add_argument(
        "--dataset_dir",
        "-dd",
        type=_text_type,
        default=None,
        help="define the path of the dataset to be split")
    parser.add_argument(
        "--val_value",
        "-vv",
        default=None,
        help="define the proportion of the validation dataset (e.g. 0.2)")
    parser.add_argument(
        "--test_value",
        "-tv",
        default=None,
        help="define the proportion of the test dataset (e.g. 0.1)")
    return parser
...@@ -159,6 +187,30 @@ def main():
        pdx.tools.convert.dataset_conversion(args.source, args.to, args.pics,
                                             args.annotations, args.save_dir)
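
    # Example invocation of the dataset-split feature handled below
    # (values are illustrative):
    #   paddlex --split_dataset --format VOC --dataset_dir MyDataset \
    #           --val_value 0.2 --test_value 0.1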
    if args.split_dataset:
        assert args.dataset_dir is not None, "--dataset_dir should be defined while splitting dataset"
        assert args.format is not None, "--format should be defined while splitting dataset"
        assert args.val_value is not None, "--val_value should be defined while splitting dataset"

        dataset_dir = args.dataset_dir
        dataset_format = args.format.lower()
        val_value = float(args.val_value)
        test_value = float(args.test_value
                           if args.test_value is not None else 0)
        save_dir = dataset_dir

        if dataset_format not in ["coco", "imagenet", "voc", "seg"]:
            logging.error(
                "The dataset format is incorrectly defined (supported: COCO/ImageNet/VOC/Seg)."
            )
        if not osp.exists(dataset_dir):
            logging.error("The path of the dataset to be split does not exist.")
        if val_value <= 0 or val_value >= 1 or test_value < 0 or test_value >= 1 or val_value + test_value >= 1:
            logging.error(
                "Invalid split values: require 0 < val_value < 1, 0 <= test_value < 1, and val_value + test_value < 1."
            )

        pdx.tools.split.dataset_split(dataset_dir, dataset_format, val_value,
                                      test_value, save_dir)
if __name__ == "__main__":
    main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -26,6 +26,7 @@ ResNet50 = models.ResNet50
DarkNet53 = models.DarkNet53
# detection
YOLOv3 = models.YOLOv3
PPYOLO = models.PPYOLO
#EAST = models.EAST
FasterRCNN = models.FasterRCNN
MaskRCNN = models.MaskRCNN
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -15,6 +15,8 @@
from __future__ import absolute_import
import copy
import os.path as osp
import six
import sys
import random
import numpy as np
import paddlex.utils.logging as logging
...@@ -48,6 +50,12 @@ class CocoDetection(VOCDetection):
                 shuffle=False):
        from pycocotools.coco import COCO
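        # Import shapely eagerly so a missing optional dependency fails here
        # with a clear traceback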
        try:
            import shapely.ops
            from shapely.geometry import Polygon, MultiPolygon, GeometryCollection
        except:
            six.reraise(*sys.exc_info())
        super(VOCDetection, self).__init__(
            transforms=transforms,
            num_workers=num_workers,
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -115,7 +115,7 @@ def multithread_reader(mapper,
        while not isinstance(sample, EndSignal):
            batch_data.append(sample)
            if len(batch_data) == batch_size:
                batch_data = generate_minibatch(batch_data, mapper=mapper)
                yield batch_data
                batch_data = []
            sample = out_queue.get()
...@@ -127,11 +127,11 @@ def multithread_reader(mapper,
            else:
                batch_data.append(sample)
                if len(batch_data) == batch_size:
                    batch_data = generate_minibatch(batch_data, mapper=mapper)
                    yield batch_data
                    batch_data = []
        if not drop_last and len(batch_data) != 0:
            batch_data = generate_minibatch(batch_data, mapper=mapper)
            yield batch_data
            batch_data = []
...@@ -188,18 +188,21 @@ def multiprocess_reader(mapper,
            else:
                batch_data.append(sample)
                if len(batch_data) == batch_size:
                    batch_data = generate_minibatch(batch_data, mapper=mapper)
                    yield batch_data
                    batch_data = []
        if len(batch_data) != 0 and not drop_last:
            batch_data = generate_minibatch(batch_data, mapper=mapper)
            yield batch_data
            batch_data = []

    return queue_reader
def generate_minibatch(batch_data, label_padding_value=255, mapper=None):
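    # Batch-level transforms (if any) operate on the whole mini-batch at once,
    # e.g. to bring every sample in the batch to a common shape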
    if mapper is not None and mapper.batch_transforms is not None:
        for op in mapper.batch_transforms:
            batch_data = op(batch_data)
    # if batch_size is 1, do not pad the image
    if len(batch_data) == 1:
        return batch_data
...@@ -218,14 +221,13 @@ def generate_minibatch(batch_data, label_padding_value=255):
        padding_im = np.zeros(
            (im_c, max_shape[1], max_shape[2]), dtype=np.float32)
        padding_im[:, :im_h, :im_w] = data[0]
        if len(data) > 2:
            # padding the image, label and insert 'padding' into `im_info`
            # of segmentation during evaluating phase.
            if len(data[1]) == 0 or 'padding' not in [
                    data[1][i][0] for i in range(len(data[1]))
            ]:
                data[1].append(('padding', [im_h, im_w]))
            padding_batch.append((padding_im, data[1], data[2]))
        elif len(data) > 1:
            if isinstance(data[1], np.ndarray) and len(data[1].shape) > 1:
                # padding the image and label of segmentation during the training
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -25,6 +25,7 @@ from .voc import VOCDetection ...@@ -25,6 +25,7 @@ from .voc import VOCDetection
from .dataset import is_pic from .dataset import is_pic
from .dataset import get_encoding from .dataset import get_encoding
class EasyDataDet(VOCDetection):
    """Reads a detection dataset in EasyDataDet format and applies the corresponding processing to each sample.

@@ -41,7 +42,7 @@ class EasyDataDet(VOCDetection):
            'thread' (threads) or 'process' (processes). Defaults to 'process' (on Windows and Mac,
            'thread' is enforced and this argument is ignored).
        shuffle (bool): whether to shuffle the samples in the dataset. Defaults to False.
    """

    def __init__(self,
                 data_dir,
                 file_list,
@@ -60,12 +61,12 @@ class EasyDataDet(VOCDetection):
        self.file_list = list()
        self.labels = list()
        self._epoch = 0
        annotations = {}
        annotations['images'] = []
        annotations['categories'] = []
        annotations['annotations'] = []
        cname2cid = {}
        label_id = 1
        with open(label_list, encoding=get_encoding(label_list)) as fr:
@@ -80,7 +81,7 @@ class EasyDataDet(VOCDetection):
                'id': v,
                'name': k
            })
        from pycocotools.mask import decode
        ct = 0
        ann_ct = 0
@@ -95,8 +96,8 @@ class EasyDataDet(VOCDetection):
                if not osp.isfile(json_file):
                    continue
                if not osp.exists(img_file):
                    raise IOError('The image file {} does not exist!'.format(
                        img_file))
                with open(json_file, mode='r', \
                        encoding=get_encoding(json_file)) as j:
                    json_info = json.load(j)
@@ -127,21 +128,15 @@ class EasyDataDet(VOCDetection):
                        mask = decode(mask_dict)
                        gt_poly[i] = self.mask2polygon(mask)
                    annotations['annotations'].append({
                        'iscrowd': 0,
                        'image_id': int(im_id[0]),
                        'bbox': [x1, y1, x2 - x1 + 1, y2 - y1 + 1],
                        'area': float((x2 - x1 + 1) * (y2 - y1 + 1)),
                        'segmentation': [[x1, y1, x1, y2, x2, y2, x2, y1]]
                        if gt_poly[i] is None else gt_poly[i],
                        'category_id': cname2cid[cname],
                        'id': ann_ct,
                        'difficult': 0
                    })
                    ann_ct += 1
                im_info = {
@@ -162,14 +157,10 @@ class EasyDataDet(VOCDetection):
                    self.file_list.append([img_file, voc_rec])
                    ct += 1
                    annotations['images'].append({
                        'height': im_h,
                        'width': im_w,
                        'id': int(im_id[0]),
                        'file_name': osp.split(img_file)[1]
                    })

        if not len(self.file_list) > 0:
@@ -181,13 +172,13 @@ class EasyDataDet(VOCDetection):
        self.coco_gt = COCO()
        self.coco_gt.dataset = annotations
        self.coco_gt.createIndex()
    def mask2polygon(self, mask):
        contours, hierarchy = cv2.findContours(
            mask.astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        segmentation = []
        for contour in contours:
            contour_list = contour.flatten().tolist()
            if len(contour_list) > 4:
                segmentation.append(contour_list)
        return segmentation
\ No newline at end of file
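As a quick illustration of mask2polygon's contract, here is a standalone copy you can run (assuming OpenCV 4's two-value findContours return; OpenCV 3 returns three values):

import cv2
import numpy as np

def mask2polygon(mask):
    """Convert a binary mask to COCO-style polygon lists (illustrative copy)."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    segmentation = []
    for contour in contours:
        contour_list = contour.flatten().tolist()
        # a polygon needs at least 3 points (6 coordinates); > 4 filters degenerate contours
        if len(contour_list) > 4:
            segmentation.append(contour_list)
    return segmentation

mask = np.zeros((10, 10), np.uint8)
mask[2:8, 3:7] = 1  # a filled rectangle
print(mask2polygon(mask))  # one polygon tracing the rectangle outline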
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -25,6 +25,7 @@ from .dataset import Dataset
from .dataset import get_encoding
from .dataset import is_pic
class EasyDataSeg(Dataset):
    """Reads an EasyDataSeg semantic segmentation dataset and applies the corresponding processing to each sample.

@@ -67,7 +68,7 @@ class EasyDataSeg(Dataset):
                cname2cid[line.strip()] = label_id
                label_id += 1
                self.labels.append(line.strip())

        with open(file_list, encoding=get_encoding(file_list)) as f:
            for line in f:
                img_file, json_file = [osp.join(data_dir, x) \
@@ -79,8 +80,8 @@ class EasyDataSeg(Dataset):
                if not osp.isfile(json_file):
                    continue
                if not osp.exists(img_file):
                    raise IOError('The image file {} does not exist!'.format(
                        img_file))
                with open(json_file, mode='r', \
                        encoding=get_encoding(json_file)) as j:
                    json_info = json.load(j)
@@ -97,7 +98,8 @@ class EasyDataSeg(Dataset):
                    mask_dict['counts'] = obj['mask'].encode()
                    mask = decode(mask_dict)
                    mask *= cid
                    conflict_index = np.where(((lable_npy > 0) &
                                               (mask == cid)) == True)
                    mask[conflict_index] = 0
                    lable_npy += mask
                self.file_list.append([img_file, lable_npy])
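The conflict handling above keeps the first class written to a pixel: where the accumulated label is already non-zero, the incoming mask is zeroed before being added. A minimal numpy sketch of that rule (variable names are illustrative):

import numpy as np

label = np.zeros((4, 4), np.int64)

for cid, box in [(1, (slice(0, 3), slice(0, 3))), (2, (slice(1, 4), slice(1, 4)))]:
    mask = np.zeros_like(label)
    mask[box] = cid
    # zero out pixels that some earlier class already claimed
    mask[(label > 0) & (mask == cid)] = 0
    label += mask

print(label)  # the overlap of the two boxes stays class 1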
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -278,8 +278,8 @@ class PageAllocator(object):
    def set_alloc_info(self, alloc_pos, used_pages):
        """ set the allocating position to a new value
        """
        memcopy(self._base[4:12],
                struct.pack(str('II'), alloc_pos, used_pages))
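The alloc-info header is just two unsigned 32-bit integers written back-to-back at byte offset 4. A self-contained sketch of the same struct.pack round-trip (buffer handling is simplified here; memcopy and the shared-memory base are not reproduced):

import struct

header = bytearray(12)

def set_alloc_info(buf, alloc_pos, used_pages):
    # two unsigned 32-bit ints occupy bytes 4..12 of the header
    buf[4:12] = struct.pack('II', alloc_pos, used_pages)

def get_alloc_info(buf):
    return struct.unpack('II', bytes(buf[4:12]))

set_alloc_info(header, alloc_pos=7, used_pages=42)
print(get_alloc_info(header))  # (7, 42)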
    def set_page_status(self, start, page_num, status):
        """ set pages from 'start' through 'start + page_num' to the same new status 'status'

@@ -525,8 +525,8 @@ class SharedMemoryMgr(object):
        logger.info('destroy [%s]' % (self))

        if not self._released and not self._allocator.empty():
            logger.debug('not empty when delete this SharedMemoryMgr[%s]' %
                         (self))
        else:
            self._released = True
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -38,6 +38,7 @@ from .classifier import HRNet_W18
from .classifier import AlexNet
from .base import BaseAPI
from .yolo_v3 import YOLOv3
from .ppyolo import PPYOLO
from .faster_rcnn import FasterRCNN
from .mask_rcnn import MaskRCNN
from .unet import UNet
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -246,8 +246,8 @@ class BaseAPI:
            logging.info(
                "Load pretrain weights from {}.".format(pretrain_weights),
                use_color=True)
        paddlex.utils.utils.load_pretrain_weights(
            self.exe, self.train_prog, pretrain_weights, fuse_bn)
        # Perform pruning
        if sensitivities_file is not None:
            import paddleslim
@@ -351,7 +351,9 @@ class BaseAPI:
        logging.info("Model saved in {}.".format(save_dir))

    def export_inference_model(self, save_dir):
        test_input_names = [
            var.name for var in list(self.test_inputs.values())
        ]
        test_outputs = list(self.test_outputs.values())
        with fluid.scope_guard(self.scope):
            if self.__class__.__name__ == 'MaskRCNN':
@@ -389,7 +391,8 @@ class BaseAPI:
        # Marker file indicating the model was saved successfully
        open(osp.join(save_dir, '.success'), 'w').close()
        logging.info("Model for inference deploy saved in {}.".format(
            save_dir))
    def train_loop(self,
                   num_epochs,
@@ -516,11 +519,13 @@ class BaseAPI:
                        eta = ((num_epochs - i) * total_num_steps - step - 1
                               ) * avg_step_time
                        if time_eval_one_epoch is not None:
                            eval_eta = (
                                total_eval_times - i // save_interval_epochs
                            ) * time_eval_one_epoch
                        else:
                            eval_eta = (
                                total_eval_times - i // save_interval_epochs
                            ) * total_num_steps_eval * avg_step_time
                        eta_str = seconds_to_hms(eta + eval_eta)
                        logging.info(
@@ -543,6 +548,8 @@ class BaseAPI:
                current_save_dir = osp.join(save_dir, "epoch_{}".format(i + 1))
                if not osp.isdir(current_save_dir):
                    os.makedirs(current_save_dir)
                if getattr(self, 'use_ema', False):
                    self.exe.run(self.ema.apply_program)
                if eval_dataset is not None and eval_dataset.num_samples > 0:
                    self.eval_metrics, self.eval_details = self.evaluate(
                        eval_dataset=eval_dataset,
@@ -569,6 +576,8 @@ class BaseAPI:
                            log_writer.add_scalar(
                                "Metrics/Eval(Epoch): {}".format(k), v, i + 1)
                self.save_model(save_dir=current_save_dir)
                if getattr(self, 'use_ema', False):
                    self.exe.run(self.ema.restore_program)
                time_eval_one_epoch = time.time() - eval_epoch_start_time
                eval_epoch_start_time = time.time()
                if best_model_epoch > 0:
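The apply/restore pair added above is the standard exponential-moving-average evaluation pattern: swap the EMA weights in before evaluating and saving, then swap the raw training weights back. A framework-free sketch of the same idea (plain Python, names illustrative):

class EMA:
    """Minimal exponential moving average over a dict of parameters."""

    def __init__(self, params, decay=0.9998):
        self.decay = decay
        self.shadow = dict(params)   # EMA copy of the weights
        self.backup = {}

    def update(self, params):
        for k, v in params.items():
            self.shadow[k] = self.decay * self.shadow[k] + (1 - self.decay) * v

    def apply(self, params):
        self.backup = dict(params)   # stash raw weights
        params.update(self.shadow)   # evaluate/save with EMA weights

    def restore(self, params):
        params.update(self.backup)   # resume training with raw weights

params = {'w': 1.0}
ema = EMA(params)
params['w'] = 2.0
ema.update(params)
ema.apply(params)
print(params['w'])   # ~1.0002, the smoothed value
ema.restore(params)  # back to 2.0 for the next training step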
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -37,7 +37,7 @@ class DeepLabv3p(BaseAPI):
        num_classes (int): number of classes.
        backbone (str): backbone network of DeepLabv3+, used to compute the feature maps. One of ['Xception65', 'Xception41',
            'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
            'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld']. Defaults to 'MobileNetV2_x1.0'.
        output_stride (int): downsampling factor of the backbone's output feature map relative to the input, usually 8 or 16. Defaults to 16.
        aspp_with_sep_conv (bool): whether the ASPP module uses separable convolutions. Defaults to True.
        decoder_use_sep_conv (bool): whether the decoder module uses separable convolutions. Defaults to True.
@@ -51,10 +51,13 @@ class DeepLabv3p(BaseAPI):
            computes the weights automatically, with each class weighted as class_ratio * num_classes. When class_weight
            takes the default value None, every class has weight 1, i.e. the usual cross-entropy loss.
        ignore_index (int): value ignored in the label; pixels whose label equals ignore_index do not contribute to the loss. Defaults to 255.
        pooling_crop_size (list): when the backbone is MobileNetV3_large_x1_0_ssld, this must be set to the model input size used
            during training, in [W, H] format. It is used when taking the image average in the encoder module: if None, a plain
            mean is computed; if set to the model input size, a 'pool' op is used to obtain the average. Defaults to None.

    Raises:
        ValueError: use_bce_loss or use_dice_loss is True while num_classes > 2.
        ValueError: backbone is not one of ['Xception65', 'Xception41', 'MobileNetV2_x0.25',
            'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld'].
        ValueError: class_weight is a list whose length does not equal num_classes,
            or class_weight is a str but class_weight.lower() is not 'dynamic'.
        TypeError: class_weight is neither None, a list, nor a str.
@@ -71,7 +74,8 @@ class DeepLabv3p(BaseAPI):
                 use_bce_loss=False,
                 use_dice_loss=False,
                 class_weight=None,
                 ignore_index=255,
                 pooling_crop_size=None):
        self.init_params = locals()
        super(DeepLabv3p, self).__init__('segmenter')
        # dice_loss and bce_loss are only applicable to binary segmentation
@@ -85,12 +89,12 @@ class DeepLabv3p(BaseAPI):
        if backbone not in [
                'Xception65', 'Xception41', 'MobileNetV2_x0.25',
                'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
                'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld'
        ]:
            raise ValueError(
                "backbone: {} is set wrong. it should be one of "
                "('Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5',"
                " 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld')".
                format(backbone))

        if class_weight is not None:
@@ -121,6 +125,30 @@ class DeepLabv3p(BaseAPI):
        self.labels = None
        self.sync_bn = True
        self.fixed_input_shape = None
        self.pooling_stride = [1, 1]
        self.pooling_crop_size = pooling_crop_size
        self.aspp_with_se = False
        self.se_use_qsigmoid = False
        self.aspp_convs_filters = 256
        self.aspp_with_concat_projection = True
        self.add_image_level_feature = True
        self.use_sum_merge = False
        self.conv_filters = 256
        self.output_is_logits = False
        self.backbone_lr_mult_list = None
        if 'MobileNetV3' in backbone:
            self.output_stride = 32
            self.pooling_stride = (4, 5)
            self.aspp_with_se = True
            self.se_use_qsigmoid = True
            self.aspp_convs_filters = 128
            self.aspp_with_concat_projection = False
            self.add_image_level_feature = False
            self.use_sum_merge = True
            self.output_is_logits = True
            if self.output_is_logits:
                self.conv_filters = self.num_classes
            self.backbone_lr_mult_list = [0.15, 0.35, 0.65, 0.85, 1]
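Taken together, these defaults mean the MobileNetV3 variant is selected purely by the backbone string. A plausible construction under the API added in this commit (the crop size and class count are illustrative):

import paddlex as pdx

# Hypothetical usage: per the docstring above, the MobileNetV3 backbone needs
# pooling_crop_size set to the training input size, in [W, H] order.
model = pdx.seg.DeepLabv3p(
    num_classes=2,
    backbone='MobileNetV3_large_x1_0_ssld',
    pooling_crop_size=[512, 512])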
    def _get_backbone(self, backbone):
        def mobilenetv2(backbone):
@@ -167,10 +195,22 @@ class DeepLabv3p(BaseAPI):
                end_points=end_points,
                decode_points=decode_points)
        def mobilenetv3(backbone):
            scale = 1.0
            lr_mult_list = self.backbone_lr_mult_list
            return paddlex.cv.nets.MobileNetV3(
                scale=scale,
                model_name='large',
                output_stride=self.output_stride,
                lr_mult_list=lr_mult_list,
                for_seg=True)

        if 'Xception' in backbone:
            return xception(backbone)
        elif 'MobileNetV2' in backbone:
            return mobilenetv2(backbone)
        elif 'MobileNetV3' in backbone:
            return mobilenetv3(backbone)
    def build_net(self, mode='train'):
        model = paddlex.cv.nets.segmentation.DeepLabv3p(
@@ -186,7 +226,17 @@ class DeepLabv3p(BaseAPI):
            use_dice_loss=self.use_dice_loss,
            class_weight=self.class_weight,
            ignore_index=self.ignore_index,
            fixed_input_shape=self.fixed_input_shape,
            pooling_stride=self.pooling_stride,
            pooling_crop_size=self.pooling_crop_size,
            aspp_with_se=self.aspp_with_se,
            se_use_qsigmoid=self.se_use_qsigmoid,
            aspp_convs_filters=self.aspp_convs_filters,
            aspp_with_concat_projection=self.aspp_with_concat_projection,
            add_image_level_feature=self.add_image_level_feature,
            use_sum_merge=self.use_sum_merge,
            conv_filters=self.conv_filters,
            output_is_logits=self.output_is_logits)
        inputs = model.generate_inputs()
        model_out = model.build_net(inputs)
        outputs = OrderedDict()
@@ -360,18 +410,16 @@ class DeepLabv3p(BaseAPI):
            pred = pred[0:num_samples]

            for i in range(num_samples):
                one_pred = np.squeeze(pred[i]).astype('uint8')
                one_label = labels[i]
                for info in im_info[i][::-1]:
                    if info[0] == 'resize':
                        w, h = info[1][1], info[1][0]
                        one_pred = cv2.resize(
                            one_pred, (w, h), interpolation=cv2.INTER_NEAREST)
                    elif info[0] == 'padding':
                        w, h = info[1][1], info[1][0]
                        one_pred = one_pred[0:h, 0:w]
                one_pred = one_pred.astype('int64')
                one_pred = one_pred[np.newaxis, :, :, np.newaxis]
                one_label = one_label[np.newaxis, np.newaxis, :, :]
@@ -429,9 +477,6 @@ class DeepLabv3p(BaseAPI):
                w, h = info[1][1], info[1][0]
                pred = pred[0:h, 0:w]
                logit = logit[0:h, 0:w, :]
            pred_list.append(pred)
            logit_list.append(logit)
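Both loops walk im_info in reverse to undo preprocessing in LIFO order: the last transform applied is the first one reversed. A standalone sketch of that bookkeeping, assuming the same ('resize', [h, w]) / ('padding', [h, w]) record format, where each record stores the shape before that transform ran:

import cv2
import numpy as np

def undo_preprocessing(pred, im_info):
    """Reverse resize/padding records, newest first (illustrative)."""
    for op, (h, w) in reversed(im_info):
        if op == 'padding':
            pred = pred[0:h, 0:w]          # crop away the padded border
        elif op == 'resize':
            pred = cv2.resize(pred, (w, h), interpolation=cv2.INTER_NEAREST)
    return pred

# original 375x360 image -> resized to 500x480 -> padded to 512x512
pred = np.zeros((512, 512), np.uint8)
im_info = [('resize', [375, 360]), ('padding', [500, 480])]
print(undo_preprocessing(pred, im_info).shape)  # (375, 360)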
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -103,8 +103,8 @@ def load_model(model_dir, fixed_input_shape=None):
            model.model_type, info['Transforms'], info['BatchTransforms'])
        model.eval_transforms = copy.deepcopy(model.test_transforms)
    else:
        model.test_transforms = build_transforms(
            model.model_type, info['Transforms'], to_rgb)
        model.eval_transforms = copy.deepcopy(model.test_transforms)

    if '_Attributes' in info:
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -280,8 +280,9 @@ class MaskRCNN(FasterRCNN):
        total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
        results = list()
        logging.info(
            "Start evaluating (total_samples={}, total_steps={})...".format(
                eval_dataset.num_samples, total_steps))
        for step, data in tqdm.tqdm(
                enumerate(data_generator()), total=total_steps):
            images = np.array([d[0] for d in data]).astype('float32')
@@ -325,7 +326,8 @@ class MaskRCNN(FasterRCNN):
                    zip(['bbox_map', 'segm_map'],
                        [ap_stats[0][1], ap_stats[1][1]]))
            else:
                metrics = OrderedDict(
                    zip(['bbox_map', 'segm_map'], [0.0, 0.0]))
        elif metric == 'COCO':
            if isinstance(ap_stats[0], np.ndarray) and isinstance(ap_stats[1],
                                                                  np.ndarray):
@@ -429,8 +431,8 @@ class MaskRCNN(FasterRCNN):
        if transforms is None:
            transforms = self.test_transforms
        im, im_resize_info, im_shape = FasterRCNN._preprocess(
            img_file_list, transforms, self.model_type,
            self.__class__.__name__, thread_num)

        with fluid.scope_guard(self.scope):
            result = self.exe.run(self.test_prog,
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import math
import tqdm
import os.path as osp
import numpy as np
from multiprocessing.pool import ThreadPool
import paddle.fluid as fluid
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.optimizer import ExponentialMovingAverage
import paddlex.utils.logging as logging
import paddlex
import copy
from paddlex.cv.transforms import arrange_transforms
from paddlex.cv.datasets import generate_minibatch
from .base import BaseAPI
from collections import OrderedDict
from .utils.detection_eval import eval_results, bbox2out
class PPYOLO(BaseAPI):
    """Builds a PPYOLO detector and implements its training, evaluation, prediction and model export.

    Args:
        num_classes (int): number of classes. Defaults to 80.
        backbone (str): backbone network of PPYOLO, one of ['ResNet50_vd_ssld']. Defaults to 'ResNet50_vd_ssld'.
        with_dcn_v2 (bool): whether the backbone uses DCNv2 blocks. Defaults to True.
        anchors (list|tuple): widths and heights of the anchor boxes; None means the default
            [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
             [59, 119], [116, 90], [156, 198], [373, 326]].
        anchor_masks (list|tuple): anchor mask indices used when computing the PPYOLO loss; None means the default
            [[6, 7, 8], [3, 4, 5], [0, 1, 2]].
        use_coord_conv (bool): whether to use CoordConv. Defaults to True.
        use_iou_aware (bool): whether to use the IoU Aware branch. Defaults to True.
        use_spp (bool): whether to use a Spatial Pyramid Pooling block. Defaults to True.
        use_drop_block (bool): whether to use DropBlock. Defaults to True.
        scale_x_y (float): scaling factor applied when adjusting box center positions. Defaults to 1.05.
        use_iou_loss (bool): whether to use the IoU loss. Defaults to True.
        use_matrix_nms (bool): whether to use Matrix NMS. Defaults to True.
        ignore_threshold (float): when computing the PPYOLO loss, the confidence of predicted boxes whose IoU
            exceeds `ignore_threshold` is ignored. Defaults to 0.7.
        nms_score_threshold (float): confidence threshold for detection boxes; boxes scoring below it are discarded. Defaults to 0.01.
        nms_topk (int): maximum number of boxes kept by confidence before NMS. Defaults to 1000.
        nms_keep_topk (int): total number of boxes kept per image after NMS. Defaults to 100.
        nms_iou_threshold (float): IoU threshold used to suppress boxes during NMS. Defaults to 0.45.
        label_smooth (bool): whether to use label smoothing. Defaults to False.
        train_random_shapes (list|tuple): image sizes sampled at random during training.
            Defaults to [320, 352, 384, 416, 448, 480, 512, 544, 576, 608].
    """
    def __init__(
            self,
            num_classes=80,
            backbone='ResNet50_vd_ssld',
            with_dcn_v2=True,
            # YOLO Head
            anchors=None,
            anchor_masks=None,
            use_coord_conv=True,
            use_iou_aware=True,
            use_spp=True,
            use_drop_block=True,
            scale_x_y=1.05,
            # PPYOLO Loss
            ignore_threshold=0.7,
            label_smooth=False,
            use_iou_loss=True,
            # NMS
            use_matrix_nms=True,
            nms_score_threshold=0.01,
            nms_topk=1000,
            nms_keep_topk=100,
            nms_iou_threshold=0.45,
            train_random_shapes=[
                320, 352, 384, 416, 448, 480, 512, 544, 576, 608
            ]):
        self.init_params = locals()
        super(PPYOLO, self).__init__('detector')
        backbones = ['ResNet50_vd_ssld']
        assert backbone in backbones, "backbone should be one of {}".format(
            backbones)
        self.backbone = backbone
        self.num_classes = num_classes
        self.anchors = anchors
        self.anchor_masks = anchor_masks
        if anchors is None:
            self.anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                            [59, 119], [116, 90], [156, 198], [373, 326]]
        if anchor_masks is None:
            self.anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
        self.ignore_threshold = ignore_threshold
        self.nms_score_threshold = nms_score_threshold
        self.nms_topk = nms_topk
        self.nms_keep_topk = nms_keep_topk
        self.nms_iou_threshold = nms_iou_threshold
        self.label_smooth = label_smooth
        self.sync_bn = True
        self.train_random_shapes = train_random_shapes
        self.fixed_input_shape = None
        self.use_fine_grained_loss = False
        if use_coord_conv or use_iou_aware or use_spp or use_drop_block or use_iou_loss:
            self.use_fine_grained_loss = True
        self.use_coord_conv = use_coord_conv
        self.use_iou_aware = use_iou_aware
        self.use_spp = use_spp
        self.use_drop_block = use_drop_block
        self.use_iou_loss = use_iou_loss
        self.scale_x_y = scale_x_y
        self.max_height = 608
        self.max_width = 608
        self.use_matrix_nms = use_matrix_nms
        self.use_ema = False
        self.with_dcn_v2 = with_dcn_v2
    def _get_backbone(self, backbone_name):
        if backbone_name.startswith('ResNet50_vd'):
            backbone = paddlex.cv.nets.ResNet(
                norm_type='sync_bn',
                layers=50,
                freeze_norm=False,
                norm_decay=0.,
                feature_maps=[3, 4, 5],
                freeze_at=0,
                variant='d',
                dcn_v2_stages=[5] if self.with_dcn_v2 else [])
        return backbone
    def build_net(self, mode='train'):
        model = paddlex.cv.nets.detection.YOLOv3(
            backbone=self._get_backbone(self.backbone),
            num_classes=self.num_classes,
            mode=mode,
            anchors=self.anchors,
            anchor_masks=self.anchor_masks,
            ignore_threshold=self.ignore_threshold,
            label_smooth=self.label_smooth,
            nms_score_threshold=self.nms_score_threshold,
            nms_topk=self.nms_topk,
            nms_keep_topk=self.nms_keep_topk,
            nms_iou_threshold=self.nms_iou_threshold,
            fixed_input_shape=self.fixed_input_shape,
            coord_conv=self.use_coord_conv,
            iou_aware=self.use_iou_aware,
            scale_x_y=self.scale_x_y,
            spp=self.use_spp,
            drop_block=self.use_drop_block,
            use_matrix_nms=self.use_matrix_nms,
            use_fine_grained_loss=self.use_fine_grained_loss,
            use_iou_loss=self.use_iou_loss,
            batch_size=self.batch_size_per_gpu
            if hasattr(self, 'batch_size_per_gpu') else 8)
        if mode == 'train' and (self.use_iou_loss or self.use_iou_aware):
            model.max_height = self.max_height
            model.max_width = self.max_width
        inputs = model.generate_inputs()
        model_out = model.build_net(inputs)
        outputs = OrderedDict([('bbox', model_out)])
        if mode == 'train':
            self.optimizer.minimize(model_out)
            outputs = OrderedDict([('loss', model_out)])
            if self.use_ema:
                global_steps = _decay_step_counter()
                self.ema = ExponentialMovingAverage(
                    self.ema_decay, thres_steps=global_steps)
                self.ema.update()
        return inputs, outputs
    def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
                          lr_decay_epochs, lr_decay_gamma,
                          num_steps_each_epoch):
        if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
            logging.error(
                "In function train(), parameters should satisfy: warmup_steps <= lr_decay_epochs[0]*num_samples_in_train_dataset",
                exit=False)
            logging.error(
                "See this doc for more information: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/appendix/parameters.md#notice",
                exit=False)
            logging.error(
                "warmup_steps should be less than {}, or lr_decay_epochs[0] should be greater than {}; please modify 'lr_decay_epochs' or 'warmup_steps' in the train function".
                format(lr_decay_epochs[0] * num_steps_each_epoch, warmup_steps
                       // num_steps_each_epoch))
        boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
        values = [(lr_decay_gamma**i) * learning_rate
                  for i in range(len(lr_decay_epochs) + 1)]
        lr_decay = fluid.layers.piecewise_decay(
            boundaries=boundaries, values=values)
        lr_warmup = fluid.layers.linear_lr_warmup(
            learning_rate=lr_decay,
            warmup_steps=warmup_steps,
            start_lr=warmup_start_lr,
            end_lr=learning_rate)
        optimizer = fluid.optimizer.Momentum(
            learning_rate=lr_warmup,
            momentum=0.9,
            regularization=fluid.regularizer.L2DecayRegularizer(5e-04))
        return optimizer
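Concretely, with the training defaults below (learning_rate=1/8000, lr_decay_epochs=[213, 240], lr_decay_gamma=0.1) the schedule is a linear warmup followed by a two-step piecewise decay. A framework-free sketch of the resulting learning rate per step, assuming for illustration 1000 steps per epoch:

def lr_at_step(step, base_lr=1.0 / 8000, warmup_steps=1000, warmup_start_lr=0.0,
               lr_decay_epochs=(213, 240), lr_decay_gamma=0.1,
               num_steps_each_epoch=1000):
    if step < warmup_steps:
        # linear warmup from warmup_start_lr up to base_lr
        frac = step / warmup_steps
        return warmup_start_lr + (base_lr - warmup_start_lr) * frac
    boundaries = [e * num_steps_each_epoch for e in lr_decay_epochs]
    lr = base_lr
    for b in boundaries:
        if step >= b:
            lr *= lr_decay_gamma  # multiply by gamma at each boundary passed
    return lr

print(lr_at_step(500))      # mid-warmup: 0.0000625
print(lr_at_step(100000))   # base LR: 0.000125
print(lr_at_step(220000))   # after the first decay: 0.0000125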
    def train(self,
              num_epochs,
              train_dataset,
              train_batch_size=8,
              eval_dataset=None,
              save_interval_epochs=20,
              log_interval_steps=2,
              save_dir='output',
              pretrain_weights='IMAGENET',
              optimizer=None,
              learning_rate=1.0 / 8000,
              warmup_steps=1000,
              warmup_start_lr=0.0,
              lr_decay_epochs=[213, 240],
              lr_decay_gamma=0.1,
              metric=None,
              use_vdl=False,
              sensitivities_file=None,
              eval_metric_loss=0.05,
              early_stop=False,
              early_stop_patience=5,
              resume_checkpoint=None,
              use_ema=True,
              ema_decay=0.9998):
        """Train the model.

        Args:
            num_epochs (int): number of training epochs.
            train_dataset (paddlex.datasets): training dataset reader.
            train_batch_size (int): batch size of the training data. Detection currently evaluates on a single
                card only; the evaluation batch size is the training batch size divided by the number of cards.
                Defaults to 8.
            eval_dataset (paddlex.datasets): validation dataset reader.
            save_interval_epochs (int): model saving interval, in epochs. Defaults to 20.
            log_interval_steps (int): training log interval, in steps. Defaults to 2.
            save_dir (str): directory to save the model. Defaults to 'output'.
            pretrain_weights (str): if a path, load the pretrained model from that path; if 'IMAGENET',
                automatically download weights pretrained on ImageNet; if 'COCO', automatically download
                weights pretrained on COCO; if None, do not use pretrained weights. Defaults to 'IMAGENET'.
            optimizer (paddle.fluid.optimizer): optimizer. When None, the default optimizer is used:
                fluid.layers.piecewise_decay decay policy with fluid.optimizer.Momentum.
            learning_rate (float): learning rate of the default optimizer. Defaults to 1.0/8000.
            warmup_steps (int): number of warmup steps for the default optimizer. Defaults to 1000.
            warmup_start_lr (int): starting learning rate of the default optimizer's warmup. Defaults to 0.0.
            lr_decay_epochs (list): epochs at which the default optimizer decays the learning rate. Defaults to [213, 240].
            lr_decay_gamma (float): learning-rate decay factor of the default optimizer. Defaults to 0.1.
            metric (bool): evaluation method used during training, one of ['COCO', 'VOC']. Defaults to None.
            use_vdl (bool): whether to use VisualDL for visualization. Defaults to False.
            sensitivities_file (str): if a path, load the sensitivity information there for pruning; if 'DEFAULT',
                automatically download sensitivity information obtained on ImageNet; if None, do not prune.
                Defaults to None.
            eval_metric_loss (float): tolerable accuracy loss. Defaults to 0.05.
            early_stop (bool): whether to use the early-stopping policy. Defaults to False.
            early_stop_patience (int): with early stopping enabled, training stops if the validation accuracy
                keeps dropping or stays flat for `early_stop_patience` consecutive epochs. Defaults to 5.
            resume_checkpoint (str): path of a previously saved model to resume training from; None means
                no resuming. Defaults to None.
            use_ema (bool): whether to keep an exponential moving average of the parameters. Defaults to True.
            ema_decay (float): exponential decay rate of the moving average. Defaults to 0.9998.

        Raises:
            ValueError: the evaluation metric is not in the allowed list.
            ValueError: the model was loaded from an inference model.
        """
        if not self.trainable:
            raise ValueError("Model is not trainable from load_model method.")
        if metric is None:
            if isinstance(train_dataset, paddlex.datasets.CocoDetection):
                metric = 'COCO'
            elif isinstance(train_dataset, paddlex.datasets.VOCDetection) or \
                    isinstance(train_dataset, paddlex.datasets.EasyDataDet):
                metric = 'VOC'
            else:
                raise ValueError(
                    "train_dataset should be datasets.VOCDetection or datasets.COCODetection or datasets.EasyDataDet."
                )
        assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
        self.metric = metric

        self.labels = train_dataset.labels
        # Build the training network
        if optimizer is None:
            # Build the default optimization policy
            num_steps_each_epoch = train_dataset.num_samples // train_batch_size
            optimizer = self.default_optimizer(
                learning_rate=learning_rate,
                warmup_steps=warmup_steps,
                warmup_start_lr=warmup_start_lr,
                lr_decay_epochs=lr_decay_epochs,
                lr_decay_gamma=lr_decay_gamma,
                num_steps_each_epoch=num_steps_each_epoch)
        self.optimizer = optimizer
        self.use_ema = use_ema
        self.ema_decay = ema_decay

        self.batch_size_per_gpu = int(train_batch_size /
                                      paddlex.env_info['num'])
        if self.use_fine_grained_loss:
            for transform in train_dataset.transforms.transforms:
                if isinstance(transform, paddlex.det.transforms.Resize):
                    self.max_height = transform.target_size
                    self.max_width = transform.target_size
                    break

        if train_dataset.transforms.batch_transforms is None:
            train_dataset.transforms.batch_transforms = list()
        define_random_shape = False
        for bt in train_dataset.transforms.batch_transforms:
            if isinstance(bt, paddlex.det.transforms.BatchRandomShape):
                define_random_shape = True
        if not define_random_shape:
            if isinstance(self.train_random_shapes,
                          (list, tuple)) and len(self.train_random_shapes) > 0:
                train_dataset.transforms.batch_transforms.append(
                    paddlex.det.transforms.BatchRandomShape(
                        random_shapes=self.train_random_shapes))
                if self.use_fine_grained_loss:
                    self.max_height = max(self.max_height,
                                          max(self.train_random_shapes))
                    self.max_width = max(self.max_width,
                                         max(self.train_random_shapes))
        if self.use_fine_grained_loss:
            define_generate_target = False
            for bt in train_dataset.transforms.batch_transforms:
                if isinstance(bt, paddlex.det.transforms.GenerateYoloTarget):
                    define_generate_target = True
            if not define_generate_target:
                train_dataset.transforms.batch_transforms.append(
                    paddlex.det.transforms.GenerateYoloTarget(
                        anchors=self.anchors,
                        anchor_masks=self.anchor_masks,
                        num_classes=self.num_classes,
                        downsample_ratios=[32, 16, 8]))

        # Build the training, validation and prediction networks
        self.build_program()
        # Initialize the network weights
        self.net_initialize(
            startup_prog=fluid.default_startup_program(),
            pretrain_weights=pretrain_weights,
            save_dir=save_dir,
            sensitivities_file=sensitivities_file,
            eval_metric_loss=eval_metric_loss,
            resume_checkpoint=resume_checkpoint)
        # Start training
        self.train_loop(
            num_epochs=num_epochs,
            train_dataset=train_dataset,
            train_batch_size=train_batch_size,
            eval_dataset=eval_dataset,
            save_interval_epochs=save_interval_epochs,
            log_interval_steps=log_interval_steps,
            save_dir=save_dir,
            use_vdl=use_vdl,
            early_stop=early_stop,
            early_stop_patience=early_stop_patience)
    def evaluate(self,
                 eval_dataset,
                 batch_size=1,
                 epoch_id=None,
                 metric=None,
                 return_details=False):
        """Evaluate the model.

        Args:
            eval_dataset (paddlex.datasets): validation dataset reader.
            batch_size (int): batch size of the validation data. Defaults to 1.
            epoch_id (int): training epoch the evaluated model comes from.
            metric (bool): evaluation method used during training, one of ['COCO', 'VOC']. Defaults to None,
                in which case it is chosen from the dataset passed in: 'VOC' for VOCDetection,
                'COCO' for COCODetection.
            return_details (bool): whether to return detailed information.

        Returns:
            tuple (metrics, eval_details) | dict (metrics): (metrics, eval_details) when return_details is True,
                otherwise metrics. metrics is a dict with the key 'bbox_mmap' or 'bbox_map': the mean of the
                mean average precisions over the IoU thresholds (mmAP), or the mean average precision (mAP),
                respectively. eval_details is a dict with the key 'bbox', a list of prediction results, each
                consisting of the image id and the predicted box's category id, coordinates and score, and the
                key 'gt', the ground-truth annotations.
        """
        arrange_transforms(
            model_type=self.model_type,
            class_name=self.__class__.__name__,
            transforms=eval_dataset.transforms,
            mode='eval')
        if metric is None:
            if hasattr(self, 'metric') and self.metric is not None:
                metric = self.metric
            else:
                if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
                    metric = 'COCO'
                elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
                    metric = 'VOC'
                else:
                    raise Exception(
                        "eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
                    )
        assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"

        total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
        results = list()
        data_generator = eval_dataset.generator(
            batch_size=batch_size, drop_last=False)
        logging.info(
            "Start evaluating (total_samples={}, total_steps={})...".format(
                eval_dataset.num_samples, total_steps))
        for step, data in tqdm.tqdm(
                enumerate(data_generator()), total=total_steps):
            images = np.array([d[0] for d in data])
            im_sizes = np.array([d[1] for d in data])
            feed_data = {'image': images, 'im_size': im_sizes}
            with fluid.scope_guard(self.scope):
                outputs = self.exe.run(
                    self.test_prog,
                    feed=[feed_data],
                    fetch_list=list(self.test_outputs.values()),
                    return_numpy=False)
            res = {
                'bbox': (np.array(outputs[0]),
                         outputs[0].recursive_sequence_lengths())
            }
            res_id = [np.array([d[2]]) for d in data]
            res['im_id'] = (res_id, [])
            if metric == 'VOC':
                res_gt_box = [d[3].reshape(-1, 4) for d in data]
                res_gt_label = [d[4].reshape(-1, 1) for d in data]
                res_is_difficult = [d[5].reshape(-1, 1) for d in data]
                res_id = [np.array([d[2]]) for d in data]
                res['gt_box'] = (res_gt_box, [])
                res['gt_label'] = (res_gt_label, [])
                res['is_difficult'] = (res_is_difficult, [])
            results.append(res)
            logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
                epoch_id, step + 1, total_steps))
        box_ap_stats, eval_details = eval_results(
            results, metric, eval_dataset.coco_gt, with_background=False)
        evaluate_metrics = OrderedDict(
            zip(['bbox_mmap'
                 if metric == 'COCO' else 'bbox_map'], box_ap_stats))
        if return_details:
            return evaluate_metrics, eval_details
        return evaluate_metrics
    @staticmethod
    def _preprocess(images, transforms, model_type, class_name, thread_num=1):
        arrange_transforms(
            model_type=model_type,
            class_name=class_name,
            transforms=transforms,
            mode='test')
        pool = ThreadPool(thread_num)
        batch_data = pool.map(transforms, images)
        pool.close()
        pool.join()
        padding_batch = generate_minibatch(batch_data)
        im = np.array(
            [data[0] for data in padding_batch],
            dtype=padding_batch[0][0].dtype)
        im_size = np.array([data[1] for data in padding_batch], dtype=np.int32)
        return im, im_size
    @staticmethod
    def _postprocess(res, batch_size, num_classes, labels):
        clsid2catid = dict({i: i for i in range(num_classes)})
        xywh_results = bbox2out([res], clsid2catid)
        preds = [[] for i in range(batch_size)]
        for xywh_res in xywh_results:
            image_id = xywh_res['image_id']
            del xywh_res['image_id']
            xywh_res['category'] = labels[xywh_res['category_id']]
            preds[image_id].append(xywh_res)
        return preds
    def predict(self, img_file, transforms=None):
        """Predict on a single image.

        Args:
            img_file (str|np.ndarray): path of the image to predict on, or a decoded BGR array
                of shape (H, W, C) and dtype float32.
            transforms (paddlex.det.transforms): data preprocessing operations.

        Returns:
            list: list of predictions, each consisting of the predicted box's category label,
                category name, coordinates (in [xmin, ymin, w, h] format) and score.
        """
        if transforms is None and not hasattr(self, 'test_transforms'):
            raise Exception("transforms need to be defined, now is None.")
        if isinstance(img_file, (str, np.ndarray)):
            images = [img_file]
        else:
            raise Exception("img_file must be str/np.ndarray")
        if transforms is None:
            transforms = self.test_transforms
        im, im_size = PPYOLO._preprocess(images, transforms, self.model_type,
                                         self.__class__.__name__)

        with fluid.scope_guard(self.scope):
            result = self.exe.run(self.test_prog,
                                  feed={'image': im,
                                        'im_size': im_size},
                                  fetch_list=list(self.test_outputs.values()),
                                  return_numpy=False,
                                  use_program_cache=True)

        res = {
            k: (np.array(v), v.recursive_sequence_lengths())
            for k, v in zip(list(self.test_outputs.keys()), result)
        }
        res['im_id'] = (np.array(
            [[i] for i in range(len(images))]).astype('int32'), [[]])
        preds = PPYOLO._postprocess(res,
                                    len(images), self.num_classes, self.labels)
        return preds[0]
    def batch_predict(self, img_file_list, transforms=None, thread_num=2):
        """Predict on a batch of images.

        Args:
            img_file_list (list|tuple): predicts on all images in the list (or tuple) at once; each element
                may be an image path or a decoded BGR array of shape (H, W, C) and dtype float32.
            transforms (paddlex.det.transforms): data preprocessing operations.
            thread_num (int): number of threads used to preprocess the images concurrently.

        Returns:
            list: one list per image, each holding that image's predictions, which consist of the predicted
                box's category label, category name, coordinates (in [xmin, ymin, w, h] format) and score.
        """
        if transforms is None and not hasattr(self, 'test_transforms'):
            raise Exception("transforms need to be defined, now is None.")
        if not isinstance(img_file_list, (list, tuple)):
            raise Exception("im_file must be list/tuple")
        if transforms is None:
            transforms = self.test_transforms
        im, im_size = PPYOLO._preprocess(img_file_list, transforms,
                                         self.model_type,
                                         self.__class__.__name__, thread_num)

        with fluid.scope_guard(self.scope):
            result = self.exe.run(self.test_prog,
                                  feed={'image': im,
                                        'im_size': im_size},
                                  fetch_list=list(self.test_outputs.values()),
                                  return_numpy=False,
                                  use_program_cache=True)

        res = {
            k: (np.array(v), v.recursive_sequence_lengths())
            for k, v in zip(list(self.test_outputs.keys()), result)
        }
        res['im_id'] = (np.array(
            [[i] for i in range(len(img_file_list))]).astype('int32'), [[]])
        preds = PPYOLO._postprocess(res,
                                    len(img_file_list), self.num_classes,
                                    self.labels)
        return preds
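A short usage sketch of the two prediction entry points. The model directory and image paths are illustrative; pdx.load_model and pdx.det.visualize are PaddleX's documented helpers for restoring a trained model and drawing its results:

import paddlex as pdx

model = pdx.load_model('output/ppyolo/best_model')

# single image: one list of dicts with category, bbox [xmin, ymin, w, h] and score
result = model.predict('street.jpg')

# several images at once, preprocessed with 2 threads
batch_results = model.batch_predict(['a.jpg', 'b.jpg'], thread_num=2)

pdx.det.visualize('street.jpg', result, threshold=0.3, save_dir='./vis')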
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -80,7 +80,9 @@ class PaddleXPostTrainingQuantization(PostTrainingQuantization):
        self._support_activation_quantize_type = [
            'range_abs_max', 'moving_average_abs_max', 'abs_max'
        ]
        self._support_weight_quantize_type = [
            'abs_max', 'channel_wise_abs_max'
        ]
        self._support_algo_type = ['KL', 'abs_max', 'min_max']
        self._support_quantize_op_type = \
            list(set(QuantizationTransformPass._supported_quantizable_op_type +
@@ -240,8 +242,8 @@ class PaddleXPostTrainingQuantization(PostTrainingQuantization):
                    '[Calculate weight] Weight_id={}/{}, time_each_weight={} s.'.
                    format(
                        str(ct),
                        str(len(self._quantized_weight_var_name)),
                        str(end - start)))
                ct += 1
        ct = 1
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -288,8 +288,8 @@ def get_params_ratios(sensitivities_file, eval_metric_loss=0.05):
    if not osp.exists(sensitivities_file):
        raise Exception('The sensitivities file does not exist!')
    sensitivities = paddleslim.prune.load_sensitivities(sensitivities_file)
    params_ratios = paddleslim.prune.get_ratios_by_loss(sensitivities,
                                                        eval_metric_loss)
    return params_ratios
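To make the pruning flow concrete, a hedged sketch of how these ratios are typically consumed. The function names follow PaddleSlim's documented prune API; the sensitivities file path is illustrative:

import paddleslim

# Pick, per prunable parameter, the largest prune ratio whose sensitivity
# analysis predicts no more than 5% metric loss.
sensitivities = paddleslim.prune.load_sensitivities('./model.sensitivities')
params_ratios = paddleslim.prune.get_ratios_by_loss(sensitivities, 0.05)
for param_name, ratio in params_ratios.items():
    print(param_name, ratio)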
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -47,8 +47,7 @@ def visualize(model, sensitivities_file, save_dir='./'):
        y.append(loss_thresh)
    plt.plot(x, y, color='green', linewidth=0.5, marker='o', markersize=3)
    my_x_ticks = np.arange(
        min(np.array(x)) - 0.01, max(np.array(x)) + 0.01, 0.05)
    my_y_ticks = np.arange(0.05, 1, 0.05)
    plt.xticks(my_x_ticks, rotation=15, fontsize=8)
    plt.yticks(my_y_ticks, fontsize=8)
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
......

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -158,8 +158,8 @@ def loadRes(coco_obj, anns):
        for id, ann in enumerate(anns):
            ann['id'] = id + 1
    elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
        res.dataset['categories'] = copy.deepcopy(coco_obj.dataset[
            'categories'])
        for id, ann in enumerate(anns):
            bb = ann['bbox']
            x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]]
@@ -169,8 +169,8 @@ def loadRes(coco_obj, anns):
            ann['id'] = id + 1
            ann['iscrowd'] = 0
    elif 'segmentation' in anns[0]:
        res.dataset['categories'] = copy.deepcopy(coco_obj.dataset[
            'categories'])
        for id, ann in enumerate(anns):
            # now only support compressed RLE format as segmentation results
            ann['area'] = maskUtils.area(ann['segmentation'])
@@ -179,8 +179,8 @@ def loadRes(coco_obj, anns):
            ann['id'] = id + 1
            ann['iscrowd'] = 0
    elif 'keypoints' in anns[0]:
        res.dataset['categories'] = copy.deepcopy(coco_obj.dataset[
            'categories'])
        for id, ann in enumerate(anns):
            s = ann['keypoints']
            x = s[0::3]
@@ -375,8 +375,8 @@ def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
            expand_bbox = expand_boxes(bbox, scale)
            expand_bbox = expand_bbox.astype(np.int32)
            padded_mask = np.zeros(
                (resolution + 2, resolution + 2), dtype=np.float32)

            for j in range(num):
                xmin, ymin, xmax, ymax = expand_bbox[j].tolist()
@@ -404,7 +404,8 @@ def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
                im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
                    x0 - xmin):(x1 - xmin)]
                segm = mask_util.encode(
                    np.array(
                        im_mask[:, :, np.newaxis], order='F'))[0]
                catid = clsid2catid[clsid]
                segm['counts'] = segm['counts'].decode('utf8')
                coco_res = {
...@@ -571,8 +572,8 @@ def prune_zero_padding(gt_box, gt_label, difficult=None): ...@@ -571,8 +572,8 @@ def prune_zero_padding(gt_box, gt_label, difficult=None):
gt_box[i, 2] == 0 and gt_box[i, 3] == 0: gt_box[i, 2] == 0 and gt_box[i, 3] == 0:
break break
valid_cnt += 1 valid_cnt += 1
return (gt_box[:valid_cnt], gt_label[:valid_cnt], return (gt_box[:valid_cnt], gt_label[:valid_cnt], difficult[:valid_cnt]
difficult[:valid_cnt] if difficult is not None else None) if difficult is not None else None)
def bbox_area(bbox, is_bbox_normalized): def bbox_area(bbox, is_bbox_normalized):
...@@ -694,8 +695,9 @@ class DetectionMAP(object): ...@@ -694,8 +695,9 @@ class DetectionMAP(object):
""" """
mAP = 0. mAP = 0.
valid_cnt = 0 valid_cnt = 0
for id, (score_pos, count) in enumerate( for id, (
zip(self.class_score_poss, self.class_gt_counts)): score_pos, count
) in enumerate(zip(self.class_score_poss, self.class_gt_counts)):
if count == 0: continue if count == 0: continue
if len(score_pos) == 0: if len(score_pos) == 0:
valid_cnt += 1 valid_cnt += 1
......
@@ -116,10 +116,14 @@ coco_pretrain = {
    'DeepLabv3p_MobileNetV2_x1.0_COCO':
    'https://bj.bcebos.com/v1/paddleseg/deeplab_mobilenet_x1_0_coco.tgz',
    'DeepLabv3p_Xception65_COCO':
-    'https://paddleseg.bj.bcebos.com/models/xception65_coco.tgz'
+    'https://paddleseg.bj.bcebos.com/models/xception65_coco.tgz',
+    'PPYOLO_ResNet50_vd_ssld_COCO':
+    'https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams'
}

cityscapes_pretrain = {
+    'DeepLabv3p_MobileNetV3_large_x1_0_ssld_CITYSCAPES':
+    'https://paddleseg.bj.bcebos.com/models/deeplabv3p_mobilenetv3_large_cityscapes.tar.gz',
    'DeepLabv3p_MobileNetV2_x1.0_CITYSCAPES':
    'https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz',
    'DeepLabv3p_Xception65_CITYSCAPES':
@@ -142,7 +146,8 @@ def get_pretrain_weights(flag, class_name, backbone, save_dir):
    if flag == 'COCO':
        if class_name == 'DeepLabv3p' and backbone in [
                'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5',
-                'MobileNetV2_x1.5', 'MobileNetV2_x2.0'
+                'MobileNetV2_x1.5', 'MobileNetV2_x2.0',
+                'MobileNetV3_large_x1_0_ssld'
        ]:
            model_name = '{}_{}'.format(class_name, backbone)
            logging.warning(warning_info.format(model_name, flag, 'IMAGENET'))
@@ -226,7 +231,9 @@ def get_pretrain_weights(flag, class_name, backbone, save_dir):
        new_save_dir = save_dir
        if hasattr(paddlex, 'pretrain_dir'):
            new_save_dir = paddlex.pretrain_dir
-        if class_name in ['YOLOv3', 'FasterRCNN', 'MaskRCNN', 'DeepLabv3p']:
+        if class_name in [
+                'YOLOv3', 'FasterRCNN', 'MaskRCNN', 'DeepLabv3p', 'PPYOLO'
+        ]:
            backbone = '{}_{}'.format(class_name, backbone)
        backbone = "{}_{}".format(backbone, flag)
        if flag == 'COCO':
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -24,8 +24,8 @@ class ConfusionMatrix(object):
    """

    def __init__(self, num_classes=2, streaming=False):
-        self.confusion_matrix = np.zeros([num_classes, num_classes],
-                                         dtype='int64')
+        self.confusion_matrix = np.zeros(
+            [num_classes, num_classes], dtype='int64')
        self.num_classes = num_classes
        self.streaming = streaming
@@ -42,15 +42,15 @@ class ConfusionMatrix(object):
        pred = np.asarray(pred)[mask]
        one = np.ones_like(pred)
        # Accumulate ([row=label, col=pred], 1) into sparse matrix
-        spm = csr_matrix((one, (label, pred)),
-                         shape=(self.num_classes, self.num_classes))
+        spm = csr_matrix(
+            (one, (label, pred)), shape=(self.num_classes, self.num_classes))
        spm = spm.todense()
        self.confusion_matrix += spm

    def zero_matrix(self):
        """ Clear confusion matrix """
-        self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
-                                         dtype='int64')
+        self.confusion_matrix = np.zeros(
+            [self.num_classes, self.num_classes], dtype='int64')

    def mean_iou(self):
        iou_list = []
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -404,8 +404,9 @@ def draw_pr_curve(eval_details_file=None,
            plt.plot(x, sr_array, color=color, label=nm, linewidth=1)
        plt.legend(loc="lower left", fontsize=5)
        plt.savefig(
-            os.path.join(save_dir,
-                         "./{}_pr_curve(iou-{}).png".format(style, iou_thresh)),
+            os.path.join(
+                save_dir,
+                "./{}_pr_curve(iou-{}).png".format(style, iou_thresh)),
            dpi=800)
        plt.close()
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -15,21 +15,11 @@
from __future__ import absolute_import
import math
import tqdm
-import os.path as osp
-import numpy as np
-from multiprocessing.pool import ThreadPool
-import paddle.fluid as fluid
-import paddlex.utils.logging as logging
import paddlex
-import copy
-from paddlex.cv.transforms import arrange_transforms
-from paddlex.cv.datasets import generate_minibatch
-from .base import BaseAPI
-from collections import OrderedDict
-from .utils.detection_eval import eval_results, bbox2out
+from .ppyolo import PPYOLO

-class YOLOv3(BaseAPI):
+class YOLOv3(PPYOLO):
    """Build YOLOv3 and implement its training, evaluation, prediction and model export.

    Args:
@@ -45,7 +35,7 @@ class YOLOv3(BaseAPI):
        nms_score_threshold (float): Confidence score threshold for detection boxes; boxes scoring below it are ignored. Defaults to 0.01.
        nms_topk (int): Maximum number of detection boxes kept by confidence score when running NMS. Defaults to 1000.
        nms_keep_topk (int): Total number of detection boxes kept per image after NMS. Defaults to 100.
-        nms_iou_threshold (float): IOU threshold used to filter out detection boxes during NMS. Defaults to 0.45.
+        nms_iou_threshold (float): IoU threshold used to filter out detection boxes during NMS. Defaults to 0.45.
        label_smooth (bool): Whether to use label smoothing. Defaults to False.
        train_random_shapes (list|tuple): Image sizes randomly sampled from this list during training. Defaults to [320, 352, 384, 416, 448, 480, 512, 544, 576, 608].
    """
@@ -65,12 +55,12 @@ class YOLOv3(BaseAPI):
                     320, 352, 384, 416, 448, 480, 512, 544, 576, 608
                 ]):
        self.init_params = locals()
-        super(YOLOv3, self).__init__('detector')
        backbones = [
            'DarkNet53', 'ResNet34', 'MobileNetV1', 'MobileNetV3_large'
        ]
        assert backbone in backbones, "backbone should be one of {}".format(
            backbones)
+        super(PPYOLO, self).__init__('detector')
        self.backbone = backbone
        self.num_classes = num_classes
        self.anchors = anchors
@@ -84,6 +74,16 @@ class YOLOv3(BaseAPI):
        self.sync_bn = True
        self.train_random_shapes = train_random_shapes
        self.fixed_input_shape = None
self.use_fine_grained_loss = False
self.use_coord_conv = False
self.use_iou_aware = False
self.use_spp = False
self.use_drop_block = False
self.use_iou_loss = False
self.scale_x_y = 1.
self.use_matrix_nms = False
self.use_ema = False
self.with_dcn_v2 = False
    def _get_backbone(self, backbone_name):
        if backbone_name == 'DarkNet53':
@@ -104,59 +104,6 @@ class YOLOv3(BaseAPI):
                norm_type='sync_bn', model_name=model_name)
        return backbone
def build_net(self, mode='train'):
model = paddlex.cv.nets.detection.YOLOv3(
backbone=self._get_backbone(self.backbone),
num_classes=self.num_classes,
mode=mode,
anchors=self.anchors,
anchor_masks=self.anchor_masks,
ignore_threshold=self.ignore_threshold,
label_smooth=self.label_smooth,
nms_score_threshold=self.nms_score_threshold,
nms_topk=self.nms_topk,
nms_keep_topk=self.nms_keep_topk,
nms_iou_threshold=self.nms_iou_threshold,
train_random_shapes=self.train_random_shapes,
fixed_input_shape=self.fixed_input_shape)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict([('bbox', model_out)])
if mode == 'train':
self.optimizer.minimize(model_out)
outputs = OrderedDict([('loss', model_out)])
return inputs, outputs
def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
lr_decay_epochs, lr_decay_gamma,
num_steps_each_epoch):
if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
logging.error(
"In function train(), parameters should satisfy: warmup_steps <= lr_decay_epochs[0] * num_steps_each_epoch",
exit=False)
logging.error(
"See this doc for more information: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/appendix/parameters.md#notice",
exit=False)
logging.error(
"warmup_steps should be less than {} or lr_decay_epochs[0] should be greater than {}; please modify 'lr_decay_epochs' or 'warmup_steps' in the train function".
format(lr_decay_epochs[0] * num_steps_each_epoch, warmup_steps
// num_steps_each_epoch))
boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
values = [(lr_decay_gamma**i) * learning_rate
for i in range(len(lr_decay_epochs) + 1)]
lr_decay = fluid.layers.piecewise_decay(
boundaries=boundaries, values=values)
lr_warmup = fluid.layers.linear_lr_warmup(
learning_rate=lr_decay,
warmup_steps=warmup_steps,
start_lr=warmup_start_lr,
end_lr=learning_rate)
optimizer = fluid.optimizer.Momentum(
learning_rate=lr_warmup,
momentum=0.9,
regularization=fluid.regularizer.L2DecayRegularizer(5e-04))
return optimizer
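# A hedged sketch of the schedule assembled above (illustrative numbers, not
# from the file): with lr_decay_epochs=[8, 11], lr_decay_gamma=0.1 and 500
# steps per epoch, piecewise_decay gets boundaries [4000, 5500] and values
# [lr, 0.1 * lr, 0.01 * lr], and linear_lr_warmup then ramps the rate from
# warmup_start_lr up to the base learning_rate over the first warmup_steps
# steps before the piecewise schedule takes over.
#
#     >>> lr_decay_epochs, gamma, steps = [8, 11], 0.1, 500
#     >>> [b * steps for b in lr_decay_epochs]
#     [4000, 5500]
#     >>> [round(gamma ** i, 4) for i in range(len(lr_decay_epochs) + 1)]
#     [1.0, 0.1, 0.01]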
    def train(self,
              num_epochs,
              train_dataset,
@@ -214,259 +161,11 @@ class YOLOv3(BaseAPI):
            ValueError: The evaluation metric type is not in the supported list.
            ValueError: The model was loaded from an inference model.
        """
+        return super(YOLOv3, self).train(
+            num_epochs, train_dataset, train_batch_size, eval_dataset,
+            save_interval_epochs, log_interval_steps, save_dir,
+            pretrain_weights, optimizer, learning_rate, warmup_steps,
+            warmup_start_lr, lr_decay_epochs, lr_decay_gamma, metric, use_vdl,
+            sensitivities_file, eval_metric_loss, early_stop,
+            early_stop_patience, resume_checkpoint, False)
if not self.trainable:
raise ValueError("Model is not trainable from load_model method.")
if metric is None:
if isinstance(train_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
elif isinstance(train_dataset, paddlex.datasets.VOCDetection) or \
isinstance(train_dataset, paddlex.datasets.EasyDataDet):
metric = 'VOC'
else:
raise ValueError(
"train_dataset should be datasets.VOCDetection or datasets.COCODetection or datasets.EasyDataDet."
)
assert metric in ['COCO', 'VOC'], "Metric only supports 'VOC' or 'COCO'"
self.metric = metric
self.labels = train_dataset.labels
# Build the training network
if optimizer is None:
# Build the default optimization strategy
num_steps_each_epoch = train_dataset.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate=learning_rate,
warmup_steps=warmup_steps,
warmup_start_lr=warmup_start_lr,
lr_decay_epochs=lr_decay_epochs,
lr_decay_gamma=lr_decay_gamma,
num_steps_each_epoch=num_steps_each_epoch)
self.optimizer = optimizer
# Build the training, validation and prediction networks
self.build_program()
# Initialize network weights
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrain_weights=pretrain_weights,
save_dir=save_dir,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss,
resume_checkpoint=resume_checkpoint)
# Train
self.train_loop(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
use_vdl=use_vdl,
early_stop=early_stop,
early_stop_patience=early_stop_patience)
def evaluate(self,
eval_dataset,
batch_size=1,
epoch_id=None,
metric=None,
return_details=False):
"""评估。
Args:
eval_dataset (paddlex.datasets): 验证数据读取器。
batch_size (int): 验证数据批大小。默认为1。
epoch_id (int): 当前评估模型所在的训练轮数。
metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
如为COCODetection,则metric为'COCO'。
return_details (bool): 是否返回详细信息。
Returns:
tuple (metrics, eval_details) | dict (metrics): 当return_details为True时,返回(metrics, eval_details),
当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'或者’bbox_map‘,
分别表示平均准确率平均值在各个IoU阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。
eval_details为dict,包含关键字:'bbox',对应元素预测结果列表,每个预测结果由图像id、
预测框类别id、预测框坐标、预测框得分;’gt‘:真实标注框相关信息。
"""
arrange_transforms(
model_type=self.model_type,
class_name=self.__class__.__name__,
transforms=eval_dataset.transforms,
mode='eval')
if metric is None:
if hasattr(self, 'metric') and self.metric is not None:
metric = self.metric
else:
if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
metric = 'VOC'
else:
raise Exception(
"eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
)
assert metric in ['COCO', 'VOC'], "Metric only supports 'VOC' or 'COCO'"
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
results = list()
data_generator = eval_dataset.generator(
batch_size=batch_size, drop_last=False)
logging.info(
"Start to evaluating(total_samples={}, total_steps={})...".format(
eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(
enumerate(data_generator()), total=total_steps):
images = np.array([d[0] for d in data])
im_sizes = np.array([d[1] for d in data])
feed_data = {'image': images, 'im_size': im_sizes}
with fluid.scope_guard(self.scope):
outputs = self.exe.run(
self.test_prog,
feed=[feed_data],
fetch_list=list(self.test_outputs.values()),
return_numpy=False)
res = {
'bbox': (np.array(outputs[0]),
outputs[0].recursive_sequence_lengths())
}
res_id = [np.array([d[2]]) for d in data]
res['im_id'] = (res_id, [])
if metric == 'VOC':
res_gt_box = [d[3].reshape(-1, 4) for d in data]
res_gt_label = [d[4].reshape(-1, 1) for d in data]
res_is_difficult = [d[5].reshape(-1, 1) for d in data]
res_id = [np.array([d[2]]) for d in data]
res['gt_box'] = (res_gt_box, [])
res['gt_label'] = (res_gt_label, [])
res['is_difficult'] = (res_is_difficult, [])
results.append(res)
logging.debug("[EVAL] Epoch={}, Step={}/{}".format(epoch_id, step +
1, total_steps))
box_ap_stats, eval_details = eval_results(
results, metric, eval_dataset.coco_gt, with_background=False)
evaluate_metrics = OrderedDict(
zip(['bbox_mmap'
if metric == 'COCO' else 'bbox_map'], box_ap_stats))
if return_details:
return evaluate_metrics, eval_details
return evaluate_metrics
@staticmethod
def _preprocess(images, transforms, model_type, class_name, thread_num=1):
arrange_transforms(
model_type=model_type,
class_name=class_name,
transforms=transforms,
mode='test')
pool = ThreadPool(thread_num)
batch_data = pool.map(transforms, images)
pool.close()
pool.join()
padding_batch = generate_minibatch(batch_data)
im = np.array(
[data[0] for data in padding_batch],
dtype=padding_batch[0][0].dtype)
im_size = np.array([data[1] for data in padding_batch], dtype=np.int32)
return im, im_size
@staticmethod
def _postprocess(res, batch_size, num_classes, labels):
clsid2catid = dict({i: i for i in range(num_classes)})
xywh_results = bbox2out([res], clsid2catid)
preds = [[] for i in range(batch_size)]
for xywh_res in xywh_results:
image_id = xywh_res['image_id']
del xywh_res['image_id']
xywh_res['category'] = labels[xywh_res['category_id']]
preds[image_id].append(xywh_res)
return preds
def predict(self, img_file, transforms=None):
"""预测。
Args:
img_file (str|np.ndarray): 预测图像路径,或者是解码后的排列格式为(H, W, C)且类型为float32且为BGR格式的数组。
transforms (paddlex.det.transforms): 数据预处理操作。
Returns:
list: 预测结果列表,每个预测结果由预测框类别标签、
预测框类别名称、预测框坐标(坐标格式为[xmin, ymin, w, h])、
预测框得分组成。
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
if isinstance(img_file, (str, np.ndarray)):
images = [img_file]
else:
raise Exception("img_file must be str/np.ndarray")
if transforms is None:
transforms = self.test_transforms
im, im_size = YOLOv3._preprocess(images, transforms, self.model_type,
self.__class__.__name__)
with fluid.scope_guard(self.scope):
result = self.exe.run(self.test_prog,
feed={'image': im,
'im_size': im_size},
fetch_list=list(self.test_outputs.values()),
return_numpy=False,
use_program_cache=True)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(list(self.test_outputs.keys()), result)
}
res['im_id'] = (np.array(
[[i] for i in range(len(images))]).astype('int32'), [[]])
preds = YOLOv3._postprocess(res,
len(images), self.num_classes, self.labels)
return preds[0]
def batch_predict(self, img_file_list, transforms=None, thread_num=2):
"""预测。
Args:
img_file_list (list|tuple): 对列表(或元组)中的图像同时进行预测,列表中的元素可以是图像路径,也可以是解码后的排列格式为(H,W,C)
且类型为float32且为BGR格式的数组。
transforms (paddlex.det.transforms): 数据预处理操作。
thread_num (int): 并发执行各图像预处理时的线程数。
Returns:
list: 每个元素都为列表,表示各图像的预测结果。在各图像的预测结果列表中,每个预测结果由预测框类别标签、
预测框类别名称、预测框坐标(坐标格式为[xmin, ymin, w, h])、
预测框得分组成。
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
if not isinstance(img_file_list, (list, tuple)):
raise Exception("im_file must be list/tuple")
if transforms is None:
transforms = self.test_transforms
im, im_size = YOLOv3._preprocess(img_file_list, transforms,
self.model_type,
self.__class__.__name__, thread_num)
with fluid.scope_guard(self.scope):
result = self.exe.run(self.test_prog,
feed={'image': im,
'im_size': im_size},
fetch_list=list(self.test_outputs.values()),
return_numpy=False,
use_program_cache=True)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(list(self.test_outputs.keys()), result)
}
res['im_id'] = (np.array(
[[i] for i in range(len(img_file_list))]).astype('int32'), [[]])
preds = YOLOv3._postprocess(res,
len(img_file_list), self.num_classes,
self.labels)
return preds
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
def _split_ioup(output, an_num, num_classes):
"""
Split the new output feature map into the original output and the
predicted IoU along the channel dimension
"""
ioup = fluid.layers.slice(output, axes=[1], starts=[0], ends=[an_num])
ioup = fluid.layers.sigmoid(ioup)
oriout = fluid.layers.slice(
output, axes=[1], starts=[an_num], ends=[an_num * (num_classes + 6)])
return (ioup, oriout)
def _de_sigmoid(x, eps=1e-7):
x = fluid.layers.clip(x, eps, 1 / eps)
one = fluid.layers.fill_constant(
shape=[1, 1, 1, 1], dtype=x.dtype, value=1.)
x = fluid.layers.clip((one / x - 1.0), eps, 1 / eps)
x = -fluid.layers.log(x)
return x
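# A minimal numeric check (added sketch, not part of the original file):
# _de_sigmoid is the inverse of the sigmoid, x -> -log(1/x - 1), so it maps a
# probability back to its logit and undoes the sigmoid applied in _split_ioup.
#
#     >>> import math
#     >>> def sigmoid(z):
#     ...     return 1. / (1. + math.exp(-z))
#     >>> round(-math.log(1. / sigmoid(2.0) - 1.0), 6)
#     2.0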
def _postprocess_output(ioup, output, an_num, num_classes, iou_aware_factor):
"""
post process output objectness score
"""
tensors = []
stride = output.shape[1] // an_num
for m in range(an_num):
tensors.append(
fluid.layers.slice(
output,
axes=[1],
starts=[stride * m + 0],
ends=[stride * m + 4]))
obj = fluid.layers.slice(
output, axes=[1], starts=[stride * m + 4], ends=[stride * m + 5])
obj = fluid.layers.sigmoid(obj)
ip = fluid.layers.slice(ioup, axes=[1], starts=[m], ends=[m + 1])
new_obj = fluid.layers.pow(obj, (
1 - iou_aware_factor)) * fluid.layers.pow(ip, iou_aware_factor)
new_obj = _de_sigmoid(new_obj)
tensors.append(new_obj)
tensors.append(
fluid.layers.slice(
output,
axes=[1],
starts=[stride * m + 5],
ends=[stride * m + 5 + num_classes]))
output = fluid.layers.concat(tensors, axis=1)
return output
def get_iou_aware_score(output, an_num, num_classes, iou_aware_factor):
ioup, output = _split_ioup(output, an_num, num_classes)
output = _postprocess_output(ioup, output, an_num, num_classes,
iou_aware_factor)
return output
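# A hedged sketch of the fusion rule implemented by _postprocess_output (the
# numbers are illustrative, not from the file): the final objectness blends the
# raw objectness obj with the predicted IoU ip as
# obj ** (1 - iou_aware_factor) * ip ** iou_aware_factor, before _de_sigmoid
# maps it back to logit space for the downstream ops.
#
#     >>> obj, ip, factor = 0.9, 0.5, 0.4
#     >>> round(obj ** (1 - factor) * ip ** factor, 4)
#     0.7114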
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import yolo_loss
from . import iou_aware_loss
from . import iou_loss
from .yolo_loss import *
from .iou_aware_loss import *
from .iou_loss import *
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import NumpyArrayInitializer
from paddle import fluid
from .iou_loss import IouLoss
class IouAwareLoss(IouLoss):
"""
iou aware loss, see https://arxiv.org/abs/1912.05992
Args:
loss_weight (float): iou aware loss weight, default is 1.0
max_height (int): max height of input to support random shape input
max_width (int): max width of input to support random shape input
"""
def __init__(self, loss_weight=1.0, max_height=608, max_width=608):
super(IouAwareLoss, self).__init__(
loss_weight=loss_weight,
max_height=max_height,
max_width=max_width)
def __call__(self,
ioup,
x,
y,
w,
h,
tx,
ty,
tw,
th,
anchors,
downsample_ratio,
batch_size,
scale_x_y,
eps=1.e-10):
'''
Args:
ioup ([Variables]): the predicted iou
x | y | w | h ([Variables]): the output of yolov3 for encoded x|y|w|h
tx |ty |tw |th ([Variables]): the target of yolov3 for encoded x|y|w|h
anchors ([float]): list of anchors for current output layer
downsample_ratio (float): the downsample ratio for current output layer
batch_size (int): training batch size
eps (float): a small value to prevent the denominator from equaling zero
'''
pred = self._bbox_transform(x, y, w, h, anchors, downsample_ratio,
batch_size, False, scale_x_y, eps)
gt = self._bbox_transform(tx, ty, tw, th, anchors, downsample_ratio,
batch_size, True, scale_x_y, eps)
iouk = self._iou(pred, gt, ioup, eps)
iouk.stop_gradient = True
loss_iou_aware = fluid.layers.cross_entropy(
ioup, iouk, soft_label=True)
loss_iou_aware = loss_iou_aware * self._loss_weight
return loss_iou_aware
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import NumpyArrayInitializer
from paddle import fluid
class IouLoss(object):
"""
iou loss, see https://arxiv.org/abs/1908.03851
loss = 1.0 - iou * iou
Args:
loss_weight (float): iou loss weight, default is 2.5
max_height (int): max height of input to support random shape input
max_width (int): max width of input to support random shape input
ciou_term (bool): whether to add ciou_term
loss_square (bool): whether to square the iou term
"""
def __init__(self,
loss_weight=2.5,
max_height=608,
max_width=608,
ciou_term=False,
loss_square=True):
self._loss_weight = loss_weight
self._MAX_HI = max_height
self._MAX_WI = max_width
self.ciou_term = ciou_term
self.loss_square = loss_square
def __call__(self,
x,
y,
w,
h,
tx,
ty,
tw,
th,
anchors,
downsample_ratio,
batch_size,
scale_x_y=1.,
ioup=None,
eps=1.e-10):
'''
Args:
x | y | w | h ([Variables]): the output of yolov3 for encoded x|y|w|h
tx |ty |tw |th ([Variables]): the target of yolov3 for encoded x|y|w|h
anchors ([float]): list of anchors for current output layer
downsample_ratio (float): the downsample ratio for current output layer
batch_size (int): training batch size
eps (float): a small value to prevent the denominator from equaling zero
'''
pred = self._bbox_transform(x, y, w, h, anchors, downsample_ratio,
batch_size, False, scale_x_y, eps)
gt = self._bbox_transform(tx, ty, tw, th, anchors, downsample_ratio,
batch_size, True, scale_x_y, eps)
iouk = self._iou(pred, gt, ioup, eps)
if self.loss_square:
loss_iou = 1. - iouk * iouk
else:
loss_iou = 1. - iouk
loss_iou = loss_iou * self._loss_weight
return loss_iou
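# A small numeric illustration (added comment, values are made up): with the
# default loss_square=True and loss_weight=2.5, a prediction whose IoU with its
# target is 0.8 contributes
#
#     >>> iou, loss_weight = 0.8, 2.5
#     >>> round((1. - iou * iou) * loss_weight, 4)
#     0.9
#
# and the contribution vanishes as the IoU approaches 1.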
def _iou(self, pred, gt, ioup=None, eps=1.e-10):
x1, y1, x2, y2 = pred
x1g, y1g, x2g, y2g = gt
x2 = fluid.layers.elementwise_max(x1, x2)
y2 = fluid.layers.elementwise_max(y1, y2)
xkis1 = fluid.layers.elementwise_max(x1, x1g)
ykis1 = fluid.layers.elementwise_max(y1, y1g)
xkis2 = fluid.layers.elementwise_min(x2, x2g)
ykis2 = fluid.layers.elementwise_min(y2, y2g)
intsctk = (xkis2 - xkis1) * (ykis2 - ykis1)
intsctk = intsctk * fluid.layers.greater_than(
xkis2, xkis1) * fluid.layers.greater_than(ykis2, ykis1)
unionk = (x2 - x1) * (y2 - y1) + (x2g - x1g) * (y2g - y1g
) - intsctk + eps
iouk = intsctk / unionk
if self.ciou_term:
ciou = self.get_ciou_term(pred, gt, iouk, eps)
iouk = iouk - ciou
return iouk
def get_ciou_term(self, pred, gt, iouk, eps):
x1, y1, x2, y2 = pred
x1g, y1g, x2g, y2g = gt
cx = (x1 + x2) / 2
cy = (y1 + y2) / 2
w = (x2 - x1) + fluid.layers.cast((x2 - x1) == 0, 'float32')
h = (y2 - y1) + fluid.layers.cast((y2 - y1) == 0, 'float32')
cxg = (x1g + x2g) / 2
cyg = (y1g + y2g) / 2
wg = x2g - x1g
hg = y2g - y1g
# A or B
xc1 = fluid.layers.elementwise_min(x1, x1g)
yc1 = fluid.layers.elementwise_min(y1, y1g)
xc2 = fluid.layers.elementwise_max(x2, x2g)
yc2 = fluid.layers.elementwise_max(y2, y2g)
# DIOU term
dist_intersection = (cx - cxg) * (cx - cxg) + (cy - cyg) * (cy - cyg)
dist_union = (xc2 - xc1) * (xc2 - xc1) + (yc2 - yc1) * (yc2 - yc1)
diou_term = (dist_intersection + eps) / (dist_union + eps)
# CIOU term
ciou_term = 0
ar_gt = wg / hg
ar_pred = w / h
arctan = fluid.layers.atan(ar_gt) - fluid.layers.atan(ar_pred)
ar_loss = 4. / np.pi / np.pi * arctan * arctan
alpha = ar_loss / (1 - iouk + ar_loss + eps)
alpha.stop_gradient = True
ciou_term = alpha * ar_loss
return diou_term + ciou_term
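# For reference (summary comment, not in the original source), the terms built
# above follow the DIoU/CIoU formulation of https://arxiv.org/abs/1911.08287:
#     diou_term = squared center distance / squared diagonal of the enclosing box
#     ciou_term = alpha * (4 / pi^2) * (atan(w_gt / h_gt) - atan(w / h))^2
# and _iou() subtracts their sum from the plain IoU.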
def _bbox_transform(self, dcx, dcy, dw, dh, anchors, downsample_ratio,
batch_size, is_gt, scale_x_y, eps):
grid_x = int(self._MAX_WI / downsample_ratio)
grid_y = int(self._MAX_HI / downsample_ratio)
an_num = len(anchors) // 2
shape_fmp = fluid.layers.shape(dcx)
shape_fmp.stop_gradient = True
# generate the grid_w x grid_h center of feature map
idx_i = np.array([[i for i in range(grid_x)]])
idx_j = np.array([[j for j in range(grid_y)]]).transpose()
gi_np = np.repeat(idx_i, grid_y, axis=0)
gi_np = np.reshape(gi_np, newshape=[1, 1, grid_y, grid_x])
gi_np = np.tile(gi_np, reps=[batch_size, an_num, 1, 1])
gj_np = np.repeat(idx_j, grid_x, axis=1)
gj_np = np.reshape(gj_np, newshape=[1, 1, grid_y, grid_x])
gj_np = np.tile(gj_np, reps=[batch_size, an_num, 1, 1])
gi_max = self._create_tensor_from_numpy(gi_np.astype(np.float32))
gi = fluid.layers.crop(x=gi_max, shape=dcx)
gi.stop_gradient = True
gj_max = self._create_tensor_from_numpy(gj_np.astype(np.float32))
gj = fluid.layers.crop(x=gj_max, shape=dcx)
gj.stop_gradient = True
grid_x_act = fluid.layers.cast(shape_fmp[3], dtype="float32")
grid_x_act.stop_gradient = True
grid_y_act = fluid.layers.cast(shape_fmp[2], dtype="float32")
grid_y_act.stop_gradient = True
if is_gt:
cx = fluid.layers.elementwise_add(dcx, gi) / grid_x_act
cx.stop_gradient = True
cy = fluid.layers.elementwise_add(dcy, gj) / grid_y_act
cy.stop_gradient = True
else:
dcx_sig = fluid.layers.sigmoid(dcx)
dcy_sig = fluid.layers.sigmoid(dcy)
if (abs(scale_x_y - 1.0) > eps):
dcx_sig = scale_x_y * dcx_sig - 0.5 * (scale_x_y - 1)
dcy_sig = scale_x_y * dcy_sig - 0.5 * (scale_x_y - 1)
cx = fluid.layers.elementwise_add(dcx_sig, gi) / grid_x_act
cy = fluid.layers.elementwise_add(dcy_sig, gj) / grid_y_act
anchor_w_ = [anchors[i] for i in range(0, len(anchors)) if i % 2 == 0]
anchor_w_np = np.array(anchor_w_)
anchor_w_np = np.reshape(anchor_w_np, newshape=[1, an_num, 1, 1])
anchor_w_np = np.tile(
anchor_w_np, reps=[batch_size, 1, grid_y, grid_x])
anchor_w_max = self._create_tensor_from_numpy(
anchor_w_np.astype(np.float32))
anchor_w = fluid.layers.crop(x=anchor_w_max, shape=dcx)
anchor_w.stop_gradient = True
anchor_h_ = [anchors[i] for i in range(0, len(anchors)) if i % 2 == 1]
anchor_h_np = np.array(anchor_h_)
anchor_h_np = np.reshape(anchor_h_np, newshape=[1, an_num, 1, 1])
anchor_h_np = np.tile(
anchor_h_np, reps=[batch_size, 1, grid_y, grid_x])
anchor_h_max = self._create_tensor_from_numpy(
anchor_h_np.astype(np.float32))
anchor_h = fluid.layers.crop(x=anchor_h_max, shape=dcx)
anchor_h.stop_gradient = True
# e^tw e^th
exp_dw = fluid.layers.exp(dw)
exp_dh = fluid.layers.exp(dh)
pw = fluid.layers.elementwise_mul(exp_dw, anchor_w) / \
(grid_x_act * downsample_ratio)
ph = fluid.layers.elementwise_mul(exp_dh, anchor_h) / \
(grid_y_act * downsample_ratio)
if is_gt:
exp_dw.stop_gradient = True
exp_dh.stop_gradient = True
pw.stop_gradient = True
ph.stop_gradient = True
x1 = cx - 0.5 * pw
y1 = cy - 0.5 * ph
x2 = cx + 0.5 * pw
y2 = cy + 0.5 * ph
if is_gt:
x1.stop_gradient = True
y1.stop_gradient = True
x2.stop_gradient = True
y2.stop_gradient = True
return x1, y1, x2, y2
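# A hedged NumPy sketch of the decode rule implemented above (standalone and
# illustrative; the variable names are not from the file): a center offset dcx
# predicted at grid column gi decodes to a normalized center
# cx = (sigmoid(dcx) + gi) / grid_w, and a width logit dw decodes to
# pw = exp(dw) * anchor_w / (grid_w * downsample_ratio).
#
#     >>> import numpy as np
#     >>> dcx, gi, grid_w = 0.0, 6, 13
#     >>> round(float(1. / (1. + np.exp(-dcx)) + gi) / grid_w, 4)
#     0.5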
def _create_tensor_from_numpy(self, numpy_array):
paddle_array = fluid.layers.create_parameter(
attr=ParamAttr(),
shape=numpy_array.shape,
dtype=numpy_array.dtype,
default_initializer=NumpyArrayInitializer(numpy_array))
paddle_array.stop_gradient = True
return paddle_array
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
class YOLOv3Loss(object):
"""
Combined loss for YOLOv3 network
Args:
batch_size (int): training batch size
ignore_thresh (float): threshold to ignore confidence loss
label_smooth (bool): whether to use label smoothing
use_fine_grained_loss (bool): whether to use the fine-grained YOLOv3 loss
instead of fluid.layers.yolov3_loss
"""
def __init__(self,
batch_size=8,
ignore_thresh=0.7,
label_smooth=True,
use_fine_grained_loss=False,
iou_loss=None,
iou_aware_loss=None,
downsample=[32, 16, 8],
scale_x_y=1.,
match_score=False):
self._batch_size = batch_size
self._ignore_thresh = ignore_thresh
self._label_smooth = label_smooth
self._use_fine_grained_loss = use_fine_grained_loss
self._iou_loss = iou_loss
self._iou_aware_loss = iou_aware_loss
self.downsample = downsample
self.scale_x_y = scale_x_y
self.match_score = match_score
def __call__(self, outputs, gt_box, gt_label, gt_score, targets, anchors,
anchor_masks, mask_anchors, num_classes, prefix_name):
if self._use_fine_grained_loss:
return self._get_fine_grained_loss(
outputs, targets, gt_box, self._batch_size, num_classes,
mask_anchors, self._ignore_thresh)
else:
losses = []
for i, output in enumerate(outputs):
scale_x_y = self.scale_x_y if not isinstance(
self.scale_x_y, Sequence) else self.scale_x_y[i]
anchor_mask = anchor_masks[i]
loss = fluid.layers.yolov3_loss(
x=output,
gt_box=gt_box,
gt_label=gt_label,
gt_score=gt_score,
anchors=anchors,
anchor_mask=anchor_mask,
class_num=num_classes,
ignore_thresh=self._ignore_thresh,
downsample_ratio=self.downsample[i],
use_label_smooth=self._label_smooth,
scale_x_y=scale_x_y,
name=prefix_name + "yolo_loss" + str(i))
losses.append(fluid.layers.reduce_mean(loss))
return {'loss': sum(losses)}
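# A minimal usage sketch (hypothetical values, not from the file): the combined
# loss object is constructed once and then called on the head outputs, e.g.
#
#     loss_fn = YOLOv3Loss(batch_size=8, ignore_thresh=0.7, label_smooth=False)
#
# With use_fine_grained_loss=False, as in this branch, it simply averages
# fluid.layers.yolov3_loss over the output scales and returns {'loss': ...}.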
def _get_fine_grained_loss(self,
outputs,
targets,
gt_box,
batch_size,
num_classes,
mask_anchors,
ignore_thresh,
eps=1.e-10):
"""
Calculate fine grained YOLOv3 loss
Args:
outputs ([Variables]): List of Variables, output of backbone stages
targets ([Variables]): List of Variables, the targets for yolo
loss calculation.
gt_box (Variable): The ground-truth bounding boxes.
batch_size (int): The training batch size
num_classes (int): class num of dataset
mask_anchors ([[float]]): list of anchors in each output layer
ignore_thresh (float): if a predicted bbox overlaps any gt_box with
IoU greater than ignore_thresh, its objectness
loss will be ignored.
Returns:
Type: dict
xy_loss (Variable): YOLOv3 (x, y) coordinates loss
wh_loss (Variable): YOLOv3 (w, h) coordinates loss
obj_loss (Variable): YOLOv3 objectness score loss
cls_loss (Variable): YOLOv3 classification loss
"""
assert len(outputs) == len(targets), \
"YOLOv3 output layer number not equal target number"
loss_xys, loss_whs, loss_objs, loss_clss = [], [], [], []
if self._iou_loss is not None:
loss_ious = []
if self._iou_aware_loss is not None:
loss_iou_awares = []
for i, (output, target,
anchors) in enumerate(zip(outputs, targets, mask_anchors)):
downsample = self.downsample[i]
an_num = len(anchors) // 2
if self._iou_aware_loss is not None:
ioup, output = self._split_ioup(output, an_num, num_classes)
x, y, w, h, obj, cls = self._split_output(output, an_num,
num_classes)
tx, ty, tw, th, tscale, tobj, tcls = self._split_target(target)
tscale_tobj = tscale * tobj
scale_x_y = self.scale_x_y if not isinstance(
self.scale_x_y, Sequence) else self.scale_x_y[i]
if (abs(scale_x_y - 1.0) < eps):
loss_x = fluid.layers.sigmoid_cross_entropy_with_logits(
x, tx) * tscale_tobj
loss_x = fluid.layers.reduce_sum(loss_x, dim=[1, 2, 3])
loss_y = fluid.layers.sigmoid_cross_entropy_with_logits(
y, ty) * tscale_tobj
loss_y = fluid.layers.reduce_sum(loss_y, dim=[1, 2, 3])
else:
dx = scale_x_y * fluid.layers.sigmoid(x) - 0.5 * (scale_x_y -
1.0)
dy = scale_x_y * fluid.layers.sigmoid(y) - 0.5 * (scale_x_y -
1.0)
loss_x = fluid.layers.abs(dx - tx) * tscale_tobj
loss_x = fluid.layers.reduce_sum(loss_x, dim=[1, 2, 3])
loss_y = fluid.layers.abs(dy - ty) * tscale_tobj
loss_y = fluid.layers.reduce_sum(loss_y, dim=[1, 2, 3])
# NOTE: we refined loss function of (w, h) as L1Loss
loss_w = fluid.layers.abs(w - tw) * tscale_tobj
loss_w = fluid.layers.reduce_sum(loss_w, dim=[1, 2, 3])
loss_h = fluid.layers.abs(h - th) * tscale_tobj
loss_h = fluid.layers.reduce_sum(loss_h, dim=[1, 2, 3])
if self._iou_loss is not None:
loss_iou = self._iou_loss(x, y, w, h, tx, ty, tw, th, anchors,
downsample, self._batch_size,
scale_x_y)
loss_iou = loss_iou * tscale_tobj
loss_iou = fluid.layers.reduce_sum(loss_iou, dim=[1, 2, 3])
loss_ious.append(fluid.layers.reduce_mean(loss_iou))
if self._iou_aware_loss is not None:
loss_iou_aware = self._iou_aware_loss(
ioup, x, y, w, h, tx, ty, tw, th, anchors, downsample,
self._batch_size, scale_x_y)
loss_iou_aware = loss_iou_aware * tobj
loss_iou_aware = fluid.layers.reduce_sum(
loss_iou_aware, dim=[1, 2, 3])
loss_iou_awares.append(
fluid.layers.reduce_mean(loss_iou_aware))
loss_obj_pos, loss_obj_neg = self._calc_obj_loss(
output, obj, tobj, gt_box, self._batch_size, anchors,
num_classes, downsample, self._ignore_thresh, scale_x_y)
loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(cls,
tcls)
loss_cls = fluid.layers.elementwise_mul(loss_cls, tobj, axis=0)
loss_cls = fluid.layers.reduce_sum(loss_cls, dim=[1, 2, 3, 4])
loss_xys.append(fluid.layers.reduce_mean(loss_x + loss_y))
loss_whs.append(fluid.layers.reduce_mean(loss_w + loss_h))
loss_objs.append(
fluid.layers.reduce_mean(loss_obj_pos + loss_obj_neg))
loss_clss.append(fluid.layers.reduce_mean(loss_cls))
losses_all = {
"loss_xy": fluid.layers.sum(loss_xys),
"loss_wh": fluid.layers.sum(loss_whs),
"loss_obj": fluid.layers.sum(loss_objs),
"loss_cls": fluid.layers.sum(loss_clss),
}
if self._iou_loss is not None:
losses_all["loss_iou"] = fluid.layers.sum(loss_ious)
if self._iou_aware_loss is not None:
losses_all["loss_iou_aware"] = fluid.layers.sum(loss_iou_awares)
return losses_all
def _split_ioup(self, output, an_num, num_classes):
"""
Split the output feature map into the original output and the
predicted IoU along the channel dimension
"""
ioup = fluid.layers.slice(output, axes=[1], starts=[0], ends=[an_num])
ioup = fluid.layers.sigmoid(ioup)
oriout = fluid.layers.slice(
output,
axes=[1],
starts=[an_num],
ends=[an_num * (num_classes + 6)])
return (ioup, oriout)
def _split_output(self, output, an_num, num_classes):
"""
Split the output feature map into x, y, w, h, objectness and
classification along the channel dimension
"""
x = fluid.layers.strided_slice(
output,
axes=[1],
starts=[0],
ends=[output.shape[1]],
strides=[5 + num_classes])
y = fluid.layers.strided_slice(
output,
axes=[1],
starts=[1],
ends=[output.shape[1]],
strides=[5 + num_classes])
w = fluid.layers.strided_slice(
output,
axes=[1],
starts=[2],
ends=[output.shape[1]],
strides=[5 + num_classes])
h = fluid.layers.strided_slice(
output,
axes=[1],
starts=[3],
ends=[output.shape[1]],
strides=[5 + num_classes])
obj = fluid.layers.strided_slice(
output,
axes=[1],
starts=[4],
ends=[output.shape[1]],
strides=[5 + num_classes])
clss = []
stride = output.shape[1] // an_num
for m in range(an_num):
clss.append(
fluid.layers.slice(
output,
axes=[1],
starts=[stride * m + 5],
ends=[stride * m + 5 + num_classes]))
cls = fluid.layers.transpose(
fluid.layers.stack(
clss, axis=1), perm=[0, 1, 3, 4, 2])
return (x, y, w, h, obj, cls)
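# Channel layout note with a small worked example (added comment): each anchor
# owns a contiguous block of 5 + num_classes channels laid out as
# [x, y, w, h, obj, cls_0, ..., cls_{C-1}], so a strided slice with stride
# 5 + num_classes picks the same field for every anchor. With num_classes=80
# and 3 anchors the map has 3 * 85 = 255 channels and the x channels sit at
# indices 0, 85 and 170.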
def _split_target(self, target):
"""
split the target into x, y, w, h, objectness and classification
along dimension 2;
the target is in shape [N, an_num, 6 + class_num, H, W]
"""
tx = target[:, :, 0, :, :]
ty = target[:, :, 1, :, :]
tw = target[:, :, 2, :, :]
th = target[:, :, 3, :, :]
tscale = target[:, :, 4, :, :]
tobj = target[:, :, 5, :, :]
tcls = fluid.layers.transpose(
target[:, :, 6:, :, :], perm=[0, 1, 3, 4, 2])
tcls.stop_gradient = True
return (tx, ty, tw, th, tscale, tobj, tcls)
def _calc_obj_loss(self, output, obj, tobj, gt_box, batch_size, anchors,
num_classes, downsample, ignore_thresh, scale_x_y):
# If a predicted bbox overlaps any gt bbox with IoU over ignore_thresh,
# its objectness loss will be ignored; the process is as follows:
# 1. get pred bbox, which is the same as in YOLOv3 infer mode; use yolo_box here
# NOTE: img_size is set to 1.0 to get normalized pred bboxes
bbox, prob = fluid.layers.yolo_box(
x=output,
img_size=fluid.layers.ones(
shape=[batch_size, 2], dtype="int32"),
anchors=anchors,
class_num=num_classes,
conf_thresh=0.,
downsample_ratio=downsample,
clip_bbox=False,
scale_x_y=scale_x_y)
# 2. split pred bbox and gt bbox by sample, calculate IoU between pred bbox
# and gt bbox in each sample
if batch_size > 1:
preds = fluid.layers.split(bbox, batch_size, dim=0)
gts = fluid.layers.split(gt_box, batch_size, dim=0)
else:
preds = [bbox]
gts = [gt_box]
probs = [prob]
ious = []
for pred, gt in zip(preds, gts):
def box_xywh2xyxy(box):
x = box[:, 0]
y = box[:, 1]
w = box[:, 2]
h = box[:, 3]
return fluid.layers.stack(
[
x - w / 2.,
y - h / 2.,
x + w / 2.,
y + h / 2.,
], axis=1)
pred = fluid.layers.squeeze(pred, axes=[0])
gt = box_xywh2xyxy(fluid.layers.squeeze(gt, axes=[0]))
ious.append(fluid.layers.iou_similarity(pred, gt))
iou = fluid.layers.stack(ious, axis=0)
# 3. Get iou_mask by IoU between gt bbox and prediction bbox,
# Get obj_mask by tobj(holds gt_score), calculate objectness loss
max_iou = fluid.layers.reduce_max(iou, dim=-1)
iou_mask = fluid.layers.cast(max_iou <= ignore_thresh, dtype="float32")
if self.match_score:
max_prob = fluid.layers.reduce_max(prob, dim=-1)
iou_mask = iou_mask * fluid.layers.cast(
max_prob <= 0.25, dtype="float32")
output_shape = fluid.layers.shape(output)
an_num = len(anchors) // 2
iou_mask = fluid.layers.reshape(iou_mask, (-1, an_num, output_shape[2],
output_shape[3]))
iou_mask.stop_gradient = True
# NOTE: tobj holds gt_score, obj_mask holds object existence mask
obj_mask = fluid.layers.cast(tobj > 0., dtype="float32")
obj_mask.stop_gradient = True
# For positive objectness grids, objectness loss should be calculated
# For negative objectness grids, objectness loss is calculated only where iou_mask == 1.0
loss_obj = fluid.layers.sigmoid_cross_entropy_with_logits(obj,
obj_mask)
loss_obj_pos = fluid.layers.reduce_sum(loss_obj * tobj, dim=[1, 2, 3])
loss_obj_neg = fluid.layers.reduce_sum(
loss_obj * (1.0 - obj_mask) * iou_mask, dim=[1, 2, 3])
return loss_obj_pos, loss_obj_neg
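# In short (added summary, illustrative numbers): for a grid cell holding an
# object, loss_obj is weighted by tobj (the gt score); an empty cell is counted
# only when iou_mask == 1.0, i.e. when none of its predictions overlaps a
# ground-truth box with IoU above ignore_thresh.
#
#     >>> max_iou, ignore_thresh = 0.82, 0.7
#     >>> 1.0 if max_iou <= ignore_thresh else 0.0
#     0.0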
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
from numbers import Integral
import math
import six
import paddle
from paddle import fluid
def DropBlock(input, block_size, keep_prob, is_test):
if is_test:
return input
def CalculateGamma(input, block_size, keep_prob):
input_shape = fluid.layers.shape(input)
feat_shape_tmp = fluid.layers.slice(input_shape, [0], [3], [4])
feat_shape_tmp = fluid.layers.cast(feat_shape_tmp, dtype="float32")
feat_shape_t = fluid.layers.reshape(feat_shape_tmp, [1, 1, 1, 1])
feat_area = fluid.layers.pow(feat_shape_t, factor=2)
block_shape_t = fluid.layers.fill_constant(
shape=[1, 1, 1, 1], value=block_size, dtype='float32')
block_area = fluid.layers.pow(block_shape_t, factor=2)
useful_shape_t = feat_shape_t - block_shape_t + 1
useful_area = fluid.layers.pow(useful_shape_t, factor=2)
upper_t = feat_area * (1 - keep_prob)
bottom_t = block_area * useful_area
output = upper_t / bottom_t
return output
gamma = CalculateGamma(input, block_size=block_size, keep_prob=keep_prob)
input_shape = fluid.layers.shape(input)
p = fluid.layers.expand_as(gamma, input)
input_shape_tmp = fluid.layers.cast(input_shape, dtype="int64")
random_matrix = fluid.layers.uniform_random(
input_shape_tmp, dtype='float32', min=0.0, max=1.0)
one_zero_m = fluid.layers.less_than(random_matrix, p)
one_zero_m.stop_gradient = True
one_zero_m = fluid.layers.cast(one_zero_m, dtype="float32")
mask_flag = fluid.layers.pool2d(
one_zero_m,
pool_size=block_size,
pool_type='max',
pool_stride=1,
pool_padding=block_size // 2)
mask = 1.0 - mask_flag
elem_numel = fluid.layers.reduce_prod(input_shape)
elem_numel_m = fluid.layers.cast(elem_numel, dtype="float32")
elem_numel_m.stop_gradient = True
elem_sum = fluid.layers.reduce_sum(mask)
elem_sum_m = fluid.layers.cast(elem_sum, dtype="float32")
elem_sum_m.stop_gradient = True
output = input * mask * elem_numel_m / elem_sum_m
return output
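# A hedged sketch of the gamma computed by CalculateGamma above (standalone,
# illustrative sizes): gamma is the per-position Bernoulli rate that makes the
# expected dropped fraction equal 1 - keep_prob once every sampled seed grows
# into a block_size x block_size square (https://arxiv.org/abs/1810.12890).
#
#     >>> feat_size, block_size, keep_prob = 20, 3, 0.9
#     >>> useful = feat_size - block_size + 1
#     >>> gamma = (feat_size ** 2 * (1 - keep_prob)) / (block_size ** 2 * useful ** 2)
#     >>> round(gamma, 6)
#     0.013717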
class MultiClassNMS(object):
def __init__(self,
score_threshold=.05,
nms_top_k=-1,
keep_top_k=100,
nms_threshold=.5,
normalized=False,
nms_eta=1.0,
background_label=0):
super(MultiClassNMS, self).__init__()
self.score_threshold = score_threshold
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
self.nms_threshold = nms_threshold
self.normalized = normalized
self.nms_eta = nms_eta
self.background_label = background_label
def __call__(self, bboxes, scores):
return fluid.layers.multiclass_nms(
bboxes=bboxes,
scores=scores,
score_threshold=self.score_threshold,
nms_top_k=self.nms_top_k,
keep_top_k=self.keep_top_k,
normalized=self.normalized,
nms_threshold=self.nms_threshold,
nms_eta=self.nms_eta,
background_label=self.background_label)
class MatrixNMS(object):
def __init__(self,
score_threshold=.05,
post_threshold=.05,
nms_top_k=-1,
keep_top_k=100,
use_gaussian=False,
gaussian_sigma=2.,
normalized=False,
background_label=0):
super(MatrixNMS, self).__init__()
self.score_threshold = score_threshold
self.post_threshold = post_threshold
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
self.normalized = normalized
self.use_gaussian = use_gaussian
self.gaussian_sigma = gaussian_sigma
self.background_label = background_label
def __call__(self, bboxes, scores):
return paddle.fluid.layers.matrix_nms(
bboxes=bboxes,
scores=scores,
score_threshold=self.score_threshold,
post_threshold=self.post_threshold,
nms_top_k=self.nms_top_k,
keep_top_k=self.keep_top_k,
normalized=self.normalized,
use_gaussian=self.use_gaussian,
gaussian_sigma=self.gaussian_sigma,
background_label=self.background_label)
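# A minimal usage sketch (hypothetical tensors): both wrappers are drop-in
# callables over boxes and per-class scores, e.g.
#
#     nms = MatrixNMS(score_threshold=0.01, post_threshold=0.01, keep_top_k=100)
#     pred = nms(bboxes=bboxes, scores=scores)
#
# Matrix NMS (https://arxiv.org/abs/2003.10152) decays the scores of
# overlapping boxes in parallel rather than suppressing them greedily, which is
# why it takes a post_threshold instead of an nms_threshold.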
class MultiClassSoftNMS(object):
def __init__(
self,
score_threshold=0.01,
keep_top_k=300,
softnms_sigma=0.5,
normalized=False,
background_label=0, ):
super(MultiClassSoftNMS, self).__init__()
self.score_threshold = score_threshold
self.keep_top_k = keep_top_k
self.softnms_sigma = softnms_sigma
self.normalized = normalized
self.background_label = background_label
def __call__(self, bboxes, scores):
def create_tmp_var(program, name, dtype, shape, lod_level):
return program.current_block().create_var(
name=name, dtype=dtype, shape=shape, lod_level=lod_level)
def _soft_nms_for_cls(dets, sigma, thres):
"""soft_nms_for_cls"""
dets_final = []
while len(dets) > 0:
maxpos = np.argmax(dets[:, 0])
dets_final.append(dets[maxpos].copy())
ts, tx1, ty1, tx2, ty2 = dets[maxpos]
scores = dets[:, 0]
# force remove bbox at maxpos
scores[maxpos] = -1
x1 = dets[:, 1]
y1 = dets[:, 2]
x2 = dets[:, 3]
y2 = dets[:, 4]
eta = 0 if self.normalized else 1
areas = (x2 - x1 + eta) * (y2 - y1 + eta)
xx1 = np.maximum(tx1, x1)
yy1 = np.maximum(ty1, y1)
xx2 = np.minimum(tx2, x2)
yy2 = np.minimum(ty2, y2)
w = np.maximum(0.0, xx2 - xx1 + eta)
h = np.maximum(0.0, yy2 - yy1 + eta)
inter = w * h
ovr = inter / (areas + areas[maxpos] - inter)
weight = np.exp(-(ovr * ovr) / sigma)
scores = scores * weight
idx_keep = np.where(scores >= thres)
dets[:, 0] = scores
dets = dets[idx_keep]
dets_final = np.array(dets_final).reshape(-1, 5)
return dets_final
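# A small numeric illustration of the Gaussian decay used above (added comment,
# made-up numbers): a candidate with IoU ovr against the current top-scoring box
# keeps weight exp(-ovr^2 / sigma), so with sigma = 0.5 and ovr = 0.6 a score of
# 0.9 decays to
#
#     >>> import math
#     >>> round(0.9 * math.exp(-(0.6 * 0.6) / 0.5), 4)
#     0.4381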
def _soft_nms(bboxes, scores):
class_nums = scores.shape[-1]
softnms_thres = self.score_threshold
softnms_sigma = self.softnms_sigma
keep_top_k = self.keep_top_k
cls_boxes = [[] for _ in range(class_nums)]
cls_ids = [[] for _ in range(class_nums)]
start_idx = 1 if self.background_label == 0 else 0
for j in range(start_idx, class_nums):
inds = np.where(scores[:, j] >= softnms_thres)[0]
scores_j = scores[inds, j]
rois_j = bboxes[inds, j, :] if len(
bboxes.shape) > 2 else bboxes[inds, :]
dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype(
np.float32, copy=False)
cls_rank = np.argsort(-dets_j[:, 0])
dets_j = dets_j[cls_rank]
cls_boxes[j] = _soft_nms_for_cls(
dets_j, sigma=softnms_sigma, thres=softnms_thres)
cls_ids[j] = np.array([j] * cls_boxes[j].shape[0]).reshape(-1,
1)
cls_boxes = np.vstack(cls_boxes[start_idx:])
cls_ids = np.vstack(cls_ids[start_idx:])
pred_result = np.hstack([cls_ids, cls_boxes])
# Limit to max_per_image detections **over all classes**
image_scores = cls_boxes[:, 0]
if len(image_scores) > keep_top_k:
image_thresh = np.sort(image_scores)[-keep_top_k]
keep = np.where(cls_boxes[:, 0] >= image_thresh)[0]
pred_result = pred_result[keep, :]
return pred_result
def _batch_softnms(bboxes, scores):
batch_offsets = bboxes.lod()
bboxes = np.array(bboxes)
scores = np.array(scores)
out_offsets = [0]
pred_res = []
if len(batch_offsets) > 0:
batch_offset = batch_offsets[0]
for i in range(len(batch_offset) - 1):
s, e = batch_offset[i], batch_offset[i + 1]
pred = _soft_nms(bboxes[s:e], scores[s:e])
out_offsets.append(pred.shape[0] + out_offsets[-1])
pred_res.append(pred)
else:
assert len(bboxes.shape) == 3
assert len(scores.shape) == 3
for i in range(bboxes.shape[0]):
pred = _soft_nms(bboxes[i], scores[i])
out_offsets.append(pred.shape[0] + out_offsets[-1])
pred_res.append(pred)
res = fluid.LoDTensor()
res.set_lod([out_offsets])
if len(pred_res) == 0:
pred_res = np.array([[1]], dtype=np.float32)
res.set(np.vstack(pred_res).astype(np.float32), fluid.CPUPlace())
return res
pred_result = create_tmp_var(
fluid.default_main_program(),
name='softnms_pred_result',
dtype='float32',
shape=[-1, 6],
lod_level=1)
fluid.layers.py_func(
func=_batch_softnms, x=[bboxes, scores], out=pred_result)
return pred_result
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. # copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -16,25 +16,50 @@ from paddle import fluid ...@@ -16,25 +16,50 @@ from paddle import fluid
from paddle.fluid.param_attr import ParamAttr from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay from paddle.fluid.regularizer import L2Decay
from collections import OrderedDict from collections import OrderedDict
from .ops import MultiClassNMS, MultiClassSoftNMS, MatrixNMS
from .ops import DropBlock
from .loss.yolo_loss import YOLOv3Loss
from .loss.iou_loss import IouLoss
from .loss.iou_aware_loss import IouAwareLoss
from .iou_aware import get_iou_aware_score
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
class YOLOv3:
    def __init__(
            self,
            backbone,
            mode='train',
            # YOLOv3Head
            num_classes=80,
            anchors=None,
            anchor_masks=None,
            coord_conv=False,
            iou_aware=False,
            iou_aware_factor=0.4,
            scale_x_y=1.,
            spp=False,
            drop_block=False,
            use_matrix_nms=False,
            # YOLOv3Loss
            batch_size=8,
            ignore_threshold=0.7,
            label_smooth=False,
            use_fine_grained_loss=False,
            use_iou_loss=False,
            iou_loss_weight=2.5,
            iou_aware_loss_weight=1.0,
            max_height=608,
            max_width=608,
            # NMS
            nms_score_threshold=0.01,
            nms_topk=1000,
            nms_keep_topk=100,
            nms_iou_threshold=0.45,
            fixed_input_shape=None):
        if anchors is None:
            anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                       [59, 119], [116, 90], [156, 198], [373, 326]]
...@@ -46,56 +71,114 @@ class YOLOv3:
        self.mode = mode
        self.num_classes = num_classes
        self.backbone = backbone
        self.norm_decay = 0.0
        self.prefix_name = ''
        self.use_fine_grained_loss = use_fine_grained_loss
        self.fixed_input_shape = fixed_input_shape
self.coord_conv = coord_conv
self.iou_aware = iou_aware
self.iou_aware_factor = iou_aware_factor
self.scale_x_y = scale_x_y
self.use_spp = spp
self.drop_block = drop_block
        if use_matrix_nms:
self.nms = MatrixNMS(
background_label=-1,
keep_top_k=nms_keep_topk,
normalized=False,
score_threshold=nms_score_threshold,
post_threshold=0.01)
else:
self.nms = MultiClassNMS(
background_label=-1,
keep_top_k=nms_keep_topk,
nms_threshold=nms_iou_threshold,
nms_top_k=nms_topk,
normalized=False,
score_threshold=nms_score_threshold)
self.iou_loss = None
self.iou_aware_loss = None
if use_iou_loss:
self.iou_loss = IouLoss(
loss_weight=iou_loss_weight,
max_height=max_height,
max_width=max_width)
if iou_aware:
self.iou_aware_loss = IouAwareLoss(
loss_weight=iou_aware_loss_weight,
max_height=max_height,
max_width=max_width)
self.yolo_loss = YOLOv3Loss(
batch_size=batch_size,
ignore_thresh=ignore_threshold,
scale_x_y=scale_x_y,
label_smooth=label_smooth,
use_fine_grained_loss=self.use_fine_grained_loss,
iou_loss=self.iou_loss,
iou_aware_loss=self.iou_aware_loss)
self.conv_block_num = 2
self.block_size = 3
self.keep_prob = 0.9
self.downsample = [32, 16, 8]
self.clip_bbox = True
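The constructor above wires the PP-YOLO-style tricks together. A hedged construction sketch, not an exact snippet from this repository (the `backbone` object is a placeholder for whatever backbone this class is used with):

# Hypothetical usage sketch:
model = YOLOv3(
    backbone=backbone,       # any backbone object accepted by this class
    num_classes=80,
    coord_conv=True,         # prepend normalized x/y coordinate channels
    iou_aware=True,          # one extra IoU logit per anchor (num_classes + 6)
    spp=True,                # SPP module in the first detection block
    drop_block=True,         # DropBlock regularization in detection blocks
    use_matrix_nms=True,     # MatrixNMS instead of MultiClassNMS
    use_fine_grained_loss=True)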
def _head(self, input, is_train=True):
        outputs = []
        # get last out_layer_num blocks in reverse order
        out_layer_num = len(self.anchor_masks)
        blocks = input[-1:-out_layer_num - 1:-1]
        route = None
        for i, block in enumerate(blocks):
            if i > 0:  # perform concat in first 2 detection_block
                block = fluid.layers.concat(input=[route, block], axis=1)
            route, tip = self._detection_block(
                block,
                channel=64 * (2**out_layer_num) // (2**i),
                is_first=i == 0,
                is_test=(not is_train),
                conv_block_num=self.conv_block_num,
                name=self.prefix_name + "yolo_block.{}".format(i))

            # out channel number = mask_num * (5 + class_num)
            if self.iou_aware:
                num_filters = len(self.anchor_masks[i]) * (
                    self.num_classes + 6)
            else:
                num_filters = len(self.anchor_masks[i]) * (
                    self.num_classes + 5)
            with fluid.name_scope('yolo_output'):
                block_out = fluid.layers.conv2d(
                    input=tip,
                    num_filters=num_filters,
                    filter_size=1,
                    stride=1,
                    padding=0,
                    act=None,
                    param_attr=ParamAttr(
                        name=self.prefix_name +
                        "yolo_output.{}.conv.weights".format(i)),
                    bias_attr=ParamAttr(
                        regularizer=L2Decay(0.),
                        name=self.prefix_name +
                        "yolo_output.{}.conv.bias".format(i)))
                outputs.append(block_out)

            if i < len(blocks) - 1:
                # do not perform upsample in the last detection_block
                route = self._conv_bn(
                    input=route,
                    ch_out=256 // (2**i),
                    filter_size=1,
                    stride=1,
                    padding=0,
                    is_test=(not is_train),
                    name=self.prefix_name + "yolo_transition.{}".format(i))
                # upsample
                route = self._upsample(route)

        return outputs
    def _parse_anchors(self, anchors):
...@@ -116,6 +199,54 @@ class YOLOv3:
                assert mask < anchor_num, "anchor mask index overflow"
                self.mask_anchors[-1].extend(anchors[mask])
def _create_tensor_from_numpy(self, numpy_array):
paddle_array = fluid.layers.create_global_var(
shape=numpy_array.shape, value=0., dtype=numpy_array.dtype)
fluid.layers.assign(numpy_array, paddle_array)
return paddle_array
def _add_coord(self, input, is_test=True):
if not self.coord_conv:
return input
        # NOTE: this branch is used when exporting the model for TensorRT
        # inference; only batch_size=1 is supported because the input shape
        # must be fixed, so the coordinate tensors are created from
        # fixed-shape numpy arrays
if is_test and input.shape[2] > 0 and input.shape[3] > 0:
batch_size = 1
grid_x = int(input.shape[3])
grid_y = int(input.shape[2])
idx_i = np.array(
[[i / (grid_x - 1) * 2.0 - 1 for i in range(grid_x)]],
dtype='float32')
gi_np = np.repeat(idx_i, grid_y, axis=0)
gi_np = np.reshape(gi_np, newshape=[1, 1, grid_y, grid_x])
gi_np = np.tile(gi_np, reps=[batch_size, 1, 1, 1])
x_range = self._create_tensor_from_numpy(gi_np.astype(np.float32))
x_range.stop_gradient = True
y_range = self._create_tensor_from_numpy(
gi_np.transpose([0, 1, 3, 2]).astype(np.float32))
y_range.stop_gradient = True
        # NOTE: in training mode, H and W vary because of random-shape
        # resizing, so add_coord is implemented with the shape as a Variable
else:
input_shape = fluid.layers.shape(input)
b = input_shape[0]
h = input_shape[2]
w = input_shape[3]
x_range = fluid.layers.range(0, w, 1, 'float32') / ((w - 1.) / 2.)
x_range = x_range - 1.
x_range = fluid.layers.unsqueeze(x_range, [0, 1, 2])
x_range = fluid.layers.expand(x_range, [b, 1, h, 1])
x_range.stop_gradient = True
y_range = fluid.layers.transpose(x_range, [0, 1, 3, 2])
y_range.stop_gradient = True
return fluid.layers.concat([input, x_range, y_range], axis=1)
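The fixed-shape branch above materializes two extra channels whose values ramp linearly from -1 to 1 across each axis. A minimal NumPy sketch of those CoordConv channels (function and variable names here are illustrative only):

import numpy as np

def coord_channels(grid_y, grid_x, batch_size=1):
    # x positions scaled to [-1, 1] across the width, tiled over rows
    xs = np.linspace(-1.0, 1.0, grid_x, dtype='float32')
    x_range = np.tile(xs.reshape(1, 1, 1, grid_x), [batch_size, 1, grid_y, 1])
    # y positions scaled to [-1, 1] down the height, tiled over columns
    ys = np.linspace(-1.0, 1.0, grid_y, dtype='float32')
    y_range = np.tile(ys.reshape(1, 1, grid_y, 1), [batch_size, 1, 1, grid_x])
    return x_range, y_range  # both (batch, 1, grid_y, grid_x)

gx, gy = coord_channels(4, 4)
assert gx[0, 0, 0, 0] == -1.0 and gx[0, 0, 0, -1] == 1.0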
    def _conv_bn(self,
                 input,
                 ch_out,
...@@ -151,18 +282,52 @@ class YOLOv3:
        out = fluid.layers.leaky_relu(x=out, alpha=0.1)
        return out
def _spp_module(self, input, is_test=True, name=""):
output1 = input
output2 = fluid.layers.pool2d(
input=output1,
pool_size=5,
pool_stride=1,
pool_padding=2,
ceil_mode=False,
pool_type='max')
output3 = fluid.layers.pool2d(
input=output1,
pool_size=9,
pool_stride=1,
pool_padding=4,
ceil_mode=False,
pool_type='max')
output4 = fluid.layers.pool2d(
input=output1,
pool_size=13,
pool_stride=1,
pool_padding=6,
ceil_mode=False,
pool_type='max')
output = fluid.layers.concat(
input=[output1, output2, output3, output4], axis=1)
return output
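Each max pool in `_spp_module` uses stride 1 with padding (k - 1) / 2, so the spatial size is preserved and the concat only grows the channel count from C to 4C. A quick arithmetic check:

def pool_out(size, kernel, stride=1, pad=None):
    pad = (kernel - 1) // 2 if pad is None else pad
    return (size + 2 * pad - kernel) // stride + 1

# kernels 5, 9, 13 with paddings 2, 4, 6 all map a 19x19 map back to 19x19
assert all(pool_out(19, k) == 19 for k in (5, 9, 13))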
    def _upsample(self, input, scale=2, name=None):
        out = fluid.layers.resize_nearest(
            input=input, scale=float(scale), name=name, align_corners=False)
        return out
    def _detection_block(self,
                         input,
                         channel,
                         conv_block_num=2,
                         is_first=False,
                         is_test=True,
                         name=None):
        assert channel % 2 == 0, \
            "channel {} cannot be divided by 2 in detection block {}" \
            .format(channel, name)
        is_test = False if self.mode == 'train' else True

        conv = input
        for j in range(conv_block_num):
            conv = self._add_coord(conv, is_test=is_test)
            conv = self._conv_bn(
                conv,
                channel,
...@@ -170,7 +335,17 @@ class YOLOv3:
                stride=1,
                padding=0,
                is_test=is_test,
                name='{}.{}.0'.format(name, j))
if self.use_spp and is_first and j == 1:
conv = self._spp_module(conv, is_test=is_test, name="spp")
conv = self._conv_bn(
conv,
512,
filter_size=1,
stride=1,
padding=0,
is_test=is_test,
name='{}.{}.spp.conv'.format(name, j))
            conv = self._conv_bn(
                conv,
                channel * 2,
...@@ -178,7 +353,21 @@ class YOLOv3:
                stride=1,
                padding=1,
                is_test=is_test,
                name='{}.{}.1'.format(name, j))
if self.drop_block and j == 0 and not is_first:
conv = DropBlock(
conv,
block_size=self.block_size,
keep_prob=self.keep_prob,
is_test=is_test)
if self.drop_block and is_first:
conv = DropBlock(
conv,
block_size=self.block_size,
keep_prob=self.keep_prob,
is_test=is_test)
conv = self._add_coord(conv, is_test=is_test)
        route = self._conv_bn(
            conv,
            channel,
...@@ -187,8 +376,9 @@ class YOLOv3:
            padding=0,
            is_test=is_test,
            name='{}.2'.format(name))
        new_route = self._add_coord(route, is_test=is_test)
        tip = self._conv_bn(
            new_route,
            channel * 2,
            filter_size=3,
            stride=1,
...@@ -197,54 +387,44 @@ class YOLOv3:
            name='{}.tip'.format(name))
        return route, tip
    def _get_loss(self, inputs, gt_box, gt_label, gt_score, targets):
        loss = self.yolo_loss(inputs, gt_box, gt_label, gt_score, targets,
                              self.anchors, self.anchor_masks,
                              self.mask_anchors, self.num_classes,
                              self.prefix_name)
        total_loss = fluid.layers.sum(list(loss.values()))
        return total_loss
    def _get_prediction(self, inputs, im_size):
        boxes = []
        scores = []
        for i, input in enumerate(inputs):
            if self.iou_aware:
                input = get_iou_aware_score(input,
                                            len(self.anchor_masks[i]),
                                            self.num_classes,
                                            self.iou_aware_factor)
            scale_x_y = self.scale_x_y if not isinstance(
                self.scale_x_y, Sequence) else self.scale_x_y[i]
            box, score = fluid.layers.yolo_box(
                x=input,
                img_size=im_size,
                anchors=self.mask_anchors[i],
                class_num=self.num_classes,
                conf_thresh=self.nms.score_threshold,
                downsample_ratio=self.downsample[i],
                name=self.prefix_name + 'yolo_box' + str(i),
                clip_bbox=self.clip_bbox,
                scale_x_y=self.scale_x_y)
            boxes.append(box)
            scores.append(fluid.layers.transpose(score, perm=[0, 2, 1]))
        yolo_boxes = fluid.layers.concat(boxes, axis=1)
        yolo_scores = fluid.layers.concat(scores, axis=2)
        if type(self.nms) is MultiClassSoftNMS:
            yolo_scores = fluid.layers.transpose(yolo_scores, perm=[0, 2, 1])
        pred = self.nms(bboxes=yolo_boxes, scores=yolo_scores)
        return pred
    def generate_inputs(self):
...@@ -267,6 +447,25 @@ class YOLOv3:
                dtype='float32', shape=[None, None], name='gt_score')
            inputs['im_size'] = fluid.data(
                dtype='int32', shape=[None, 2], name='im_size')
if self.use_fine_grained_loss:
downsample = 32
for i, mask in enumerate(self.anchor_masks):
if self.fixed_input_shape is not None:
target_shape = [
self.fixed_input_shape[1] // downsample,
self.fixed_input_shape[0] // downsample
]
else:
target_shape = [None, None]
inputs['target{}'.format(i)] = fluid.data(
dtype='float32',
lod_level=0,
shape=[
None, len(mask), 6 + self.num_classes,
target_shape[0], target_shape[1]
],
name='target{}'.format(i))
downsample //= 2
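            # Worked example of the shapes declared above, assuming three
            # anchor masks per level, fixed_input_shape=[608, 608] and
            # num_classes=80 (so 6 + 80 = 86 channels per anchor):
            #   target0: [None, 3, 86, 19, 19]   (608 // 32 = 19)
            #   target1: [None, 3, 86, 38, 38]   (608 // 16 = 38)
            #   target2: [None, 3, 86, 76, 76]   (608 // 8  = 76)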
        elif self.mode == 'eval':
            inputs['im_size'] = fluid.data(
                dtype='int32', shape=[None, 2], name='im_size')
...@@ -285,28 +484,12 @@ class YOLOv3:
    def build_net(self, inputs):
        image = inputs['image']
        feats = self.backbone(image)
        if isinstance(feats, OrderedDict):
            feat_names = list(feats.keys())
            feats = [feats[name] for name in feat_names]
        head_outputs = self._head(feats, self.mode == 'train')
        if self.mode == 'train':
            gt_box = inputs['gt_box']
            gt_label = inputs['gt_label']
...@@ -320,8 +503,15 @@ class YOLOv3:
            whwh = fluid.layers.cast(whwh, dtype='float32')
            whwh.stop_gradient = True
            normalized_box = fluid.layers.elementwise_div(gt_box, whwh)
targets = []
if self.use_fine_grained_loss:
for i, mask in enumerate(self.anchor_masks):
k = 'target{}'.format(i)
if k in inputs:
targets.append(inputs[k])
            return self._get_loss(head_outputs, normalized_box, gt_label,
                                  gt_score, targets)
        else:
            im_size = inputs['im_size']
            return self._get_prediction(head_outputs, im_size)
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -30,10 +30,10 @@ class MobileNetV2:
        self.output_stride = output_stride
        self.end_points = end_points
        self.decode_points = decode_points
        self.bottleneck_params_list = [
            (1, 16, 1, 1), (6, 24, 2, 2), (6, 32, 3, 2), (6, 64, 4, 2),
            (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1)
        ]
        self.modify_bottle_params(output_stride)
    def __call__(self, input):
...@@ -104,11 +104,10 @@ class MobileNetV2:
        output = fluid.layers.pool2d(
            input=output, pool_type='avg', global_pooling=True)
        output = fluid.layers.fc(input=output,
                                 size=self.num_classes,
                                 param_attr=ParamAttr(name='fc10_weights'),
                                 bias_attr=ParamAttr(name='fc10_offset'))
        return output
    def modify_bottle_params(self, output_stride=None):
...@@ -239,4 +238,4 @@ class MobileNetV2:
                padding=1,
                expansion_factor=t,
                name=name + '_' + str(i + 1))
        return last_residual_block, depthwise_output
...@@ -42,7 +42,9 @@ class MobileNetV3():
                 extra_block_filters=[[256, 512], [128, 256], [128, 256],
                                      [64, 128]],
                 num_classes=None,
                 lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0],
                 for_seg=False,
                 output_stride=None):
        assert len(lr_mult_list) == 5, \
            "lr_mult_list length in MobileNetV3 must be 5 but got {}!!".format(
                len(lr_mult_list))
...@@ -57,48 +59,112 @@ class MobileNetV3():
        self.num_classes = num_classes
        self.lr_mult_list = lr_mult_list
        self.curr_stage = 0
        self.for_seg = for_seg
        self.decode_point = None

        if self.for_seg:
            if model_name == "large":
                self.cfg = [
                    # k, exp, c, se, nl, s,
                    [3, 16, 16, False, 'relu', 1],
                    [3, 64, 24, False, 'relu', 2],
                    [3, 72, 24, False, 'relu', 1],
                    [5, 72, 40, True, 'relu', 2],
                    [5, 120, 40, True, 'relu', 1],
                    [5, 120, 40, True, 'relu', 1],
                    [3, 240, 80, False, 'hard_swish', 2],
                    [3, 200, 80, False, 'hard_swish', 1],
                    [3, 184, 80, False, 'hard_swish', 1],
                    [3, 184, 80, False, 'hard_swish', 1],
                    [3, 480, 112, True, 'hard_swish', 1],
                    [3, 672, 112, True, 'hard_swish', 1],
                    # The number of channels in the last 4 stages is reduced by a
                    # factor of 2 compared to the standard implementation.
                    [5, 336, 80, True, 'hard_swish', 2],
                    [5, 480, 80, True, 'hard_swish', 1],
                    [5, 480, 80, True, 'hard_swish', 1],
                ]
                self.cls_ch_squeeze = 480
                self.cls_ch_expand = 1280
                self.lr_interval = 3
            elif model_name == "small":
                self.cfg = [
                    # k, exp, c, se, nl, s,
                    [3, 16, 16, True, 'relu', 2],
                    [3, 72, 24, False, 'relu', 2],
                    [3, 88, 24, False, 'relu', 1],
                    [5, 96, 40, True, 'hard_swish', 2],
                    [5, 240, 40, True, 'hard_swish', 1],
                    [5, 240, 40, True, 'hard_swish', 1],
                    [5, 120, 48, True, 'hard_swish', 1],
                    [5, 144, 48, True, 'hard_swish', 1],
                    # The number of channels in the last 4 stages is reduced by a
                    # factor of 2 compared to the standard implementation.
                    [5, 144, 48, True, 'hard_swish', 2],
                    [5, 288, 48, True, 'hard_swish', 1],
                    [5, 288, 48, True, 'hard_swish', 1],
                ]
            else:
                raise NotImplementedError
        else:
            if model_name == "large":
self.cfg = [
# kernel_size, expand, channel, se_block, act_mode, stride
[3, 16, 16, False, 'relu', 1],
[3, 64, 24, False, 'relu', 2],
[3, 72, 24, False, 'relu', 1],
[5, 72, 40, True, 'relu', 2],
[5, 120, 40, True, 'relu', 1],
[5, 120, 40, True, 'relu', 1],
[3, 240, 80, False, 'hard_swish', 2],
[3, 200, 80, False, 'hard_swish', 1],
[3, 184, 80, False, 'hard_swish', 1],
[3, 184, 80, False, 'hard_swish', 1],
[3, 480, 112, True, 'hard_swish', 1],
[3, 672, 112, True, 'hard_swish', 1],
[5, 672, 160, True, 'hard_swish', 2],
[5, 960, 160, True, 'hard_swish', 1],
[5, 960, 160, True, 'hard_swish', 1],
]
self.cls_ch_squeeze = 960
self.cls_ch_expand = 1280
self.lr_interval = 3
elif model_name == "small":
self.cfg = [
# kernel_size, expand, channel, se_block, act_mode, stride
[3, 16, 16, True, 'relu', 2],
[3, 72, 24, False, 'relu', 2],
[3, 88, 24, False, 'relu', 1],
[5, 96, 40, True, 'hard_swish', 2],
[5, 240, 40, True, 'hard_swish', 1],
[5, 240, 40, True, 'hard_swish', 1],
[5, 120, 48, True, 'hard_swish', 1],
[5, 144, 48, True, 'hard_swish', 1],
[5, 288, 96, True, 'hard_swish', 2],
[5, 576, 96, True, 'hard_swish', 1],
[5, 576, 96, True, 'hard_swish', 1],
]
self.cls_ch_squeeze = 576
self.cls_ch_expand = 1280
self.lr_interval = 2
else:
raise NotImplementedError
if self.for_seg:
self.modify_bottle_params(output_stride)
def modify_bottle_params(self, output_stride=None):
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must to be even number")
if output_stride is None:
return
else:
stride = 2
for i, _cfg in enumerate(self.cfg):
stride = stride * _cfg[-1]
if stride > output_stride:
s = 1
self.cfg[i][-1] = s
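A quick trace of `modify_bottle_params` with `output_stride=16` over the stride column of the `for_seg` "large" config above; once the cumulative stride (starting at 2 for the stem) exceeds the requested output stride, every later stride-2 block is clamped to stride 1:

strides = [1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1]  # last column of cfg
running, output_stride = 2, 16
for i, s in enumerate(strides):
    running *= s
    if running > output_stride:
        strides[i] = 1   # the stride-2 block at index 12 becomes stride 1
print(strides)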
    def _conv_bn_layer(self,
                       input,
...@@ -153,6 +219,14 @@ class MobileNetV3():
            bn = fluid.layers.relu6(bn)
        return bn
def make_divisible(self, v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
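`make_divisible` rounds a (possibly fractional) scaled channel count to the nearest multiple of `divisor` and refuses to fall below 90% of the request, which keeps channel counts hardware friendly at fractional width multipliers. A standalone mirror for a quick check:

def make_divisible(v, divisor=8, min_value=None):
    # standalone copy of the method above, for illustration
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

assert make_divisible(40 * 0.35) == 16  # 14 rounds to the next multiple of 8
assert make_divisible(96 * 0.75) == 72  # 72 is already a multiple of 8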
    def _hard_swish(self, x):
        return x * fluid.layers.relu6(x + 3) / 6.
...@@ -220,6 +294,9 @@ class MobileNetV3():
                use_cudnn=False,
                name=name + '_depthwise')
if self.curr_stage == 5:
self.decode_point = conv1
        if use_se:
            conv1 = self._se_block(
                input=conv1, num_out_filter=num_mid_filter, name=name + '_se')
...@@ -282,7 +359,7 @@ class MobileNetV3():
        conv = self._conv_bn_layer(
            input,
            filter_size=3,
            num_filters=self.make_divisible(inplanes * scale),
            stride=2,
            padding=1,
            num_groups=1,
...@@ -290,6 +367,7 @@ class MobileNetV3():
            act='hard_swish',
            name='conv1')
        i = 0
inplanes = self.make_divisible(inplanes * scale)
        for layer_cfg in cfg:
            self.block_stride *= layer_cfg[5]
            if layer_cfg[5] == 2:
...@@ -297,19 +375,32 @@ class MobileNetV3():
            conv = self._residual_unit(
                input=conv,
                num_in_filter=inplanes,
                num_mid_filter=self.make_divisible(scale * layer_cfg[1]),
                num_out_filter=self.make_divisible(scale * layer_cfg[2]),
                act=layer_cfg[4],
                stride=layer_cfg[5],
                filter_size=layer_cfg[0],
                use_se=layer_cfg[3],
                name='conv' + str(i + 2))
            inplanes = self.make_divisible(scale * layer_cfg[2])
            i += 1
            self.curr_stage = i
        blocks.append(conv)
if self.for_seg:
conv = self._conv_bn_layer(
input=conv,
filter_size=1,
num_filters=self.make_divisible(scale * self.cls_ch_squeeze),
stride=1,
padding=0,
num_groups=1,
if_act=True,
act='hard_swish',
name='conv_last')
return conv, self.decode_point
        if self.num_classes:
            conv = self._conv_bn_layer(
                input=conv,
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -21,7 +21,7 @@ from collections import OrderedDict
import paddle.fluid as fluid
from .model_utils.libs import scope, name_scope
from .model_utils.libs import bn, bn_relu, relu, qsigmoid
from .model_utils.libs import conv, max_pool, deconv
from .model_utils.libs import separate_conv
from .model_utils.libs import sigmoid_to_softmax
...@@ -82,7 +82,17 @@ class DeepLabv3p(object):
                 use_dice_loss=False,
                 class_weight=None,
                 ignore_index=255,
                 fixed_input_shape=None,
pooling_stride=[1, 1],
pooling_crop_size=None,
aspp_with_se=False,
se_use_qsigmoid=False,
aspp_convs_filters=256,
aspp_with_concat_projection=True,
add_image_level_feature=True,
use_sum_merge=False,
conv_filters=256,
output_is_logits=False):
        # dice_loss and bce_loss only apply to binary segmentation
        if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
...@@ -117,6 +127,17 @@ class DeepLabv3p(object):
        self.encoder_with_aspp = encoder_with_aspp
        self.enable_decoder = enable_decoder
        self.fixed_input_shape = fixed_input_shape
self.output_is_logits = output_is_logits
self.aspp_convs_filters = aspp_convs_filters
self.output_stride = output_stride
self.pooling_crop_size = pooling_crop_size
self.pooling_stride = pooling_stride
self.se_use_qsigmoid = se_use_qsigmoid
self.aspp_with_concat_projection = aspp_with_concat_projection
self.add_image_level_feature = add_image_level_feature
self.aspp_with_se = aspp_with_se
self.use_sum_merge = use_sum_merge
self.conv_filters = conv_filters
    def _encoder(self, input):
        # Encoder configuration using the ASPP architecture: global pooling +
        # 1x1 conv + three dilated convolutions at different rates in
        # parallel, followed by a 1x1 conv after the concat
...@@ -129,19 +150,36 @@ class DeepLabv3p(object):
        elif self.output_stride == 8:
            aspp_ratios = [12, 24, 36]
        else:
            aspp_ratios = []

        param_attr = fluid.ParamAttr(
            name=name_scope + 'weights',
            regularizer=None,
            initializer=fluid.initializer.TruncatedNormal(
                loc=0.0, scale=0.06))
        concat_logits = []
        with scope('encoder'):
            channel = self.aspp_convs_filters
            with scope("image_pool"):
                if self.pooling_crop_size is None:
                    image_avg = fluid.layers.reduce_mean(
                        input, [2, 3], keep_dim=True)
                else:
                    pool_w = int((self.pooling_crop_size[0] - 1.0) /
                                 self.output_stride + 1.0)
                    pool_h = int((self.pooling_crop_size[1] - 1.0) /
                                 self.output_stride + 1.0)
                    image_avg = fluid.layers.pool2d(
                        input,
                        pool_size=(pool_h, pool_w),
                        pool_stride=self.pooling_stride,
                        pool_type='avg',
                        pool_padding='VALID')
                act = qsigmoid if self.se_use_qsigmoid else bn_relu
                image_avg = act(
                    conv(
                        image_avg,
                        channel,
...@@ -151,8 +189,10 @@ class DeepLabv3p(object):
                        padding=0,
                        param_attr=param_attr))
                input_shape = fluid.layers.shape(input)
                image_avg = fluid.layers.resize_bilinear(image_avg,
                                                         input_shape[2:])
                if self.add_image_level_feature:
                    concat_logits.append(image_avg)
with scope("aspp0"): with scope("aspp0"):
aspp0 = bn_relu( aspp0 = bn_relu(
...@@ -164,77 +204,160 @@ class DeepLabv3p(object): ...@@ -164,77 +204,160 @@ class DeepLabv3p(object):
groups=1, groups=1,
padding=0, padding=0,
param_attr=param_attr)) param_attr=param_attr))
with scope("aspp1"): concat_logits.append(aspp0)
            if aspp_ratios:
                with scope("aspp1"):
                    if self.aspp_with_sep_conv:
                        aspp1 = separate_conv(
                            input,
                            channel,
                            1,
                            3,
                            dilation=aspp_ratios[0],
                            act=relu)
                    else:
                        aspp1 = bn_relu(
                            conv(
                                input,
                                channel,
                                stride=1,
                                filter_size=3,
                                dilation=aspp_ratios[0],
                                padding=aspp_ratios[0],
                                param_attr=param_attr))
                    concat_logits.append(aspp1)
                with scope("aspp2"):
                    if self.aspp_with_sep_conv:
                        aspp2 = separate_conv(
                            input,
                            channel,
                            1,
                            3,
                            dilation=aspp_ratios[1],
                            act=relu)
                    else:
                        aspp2 = bn_relu(
                            conv(
                                input,
                                channel,
                                stride=1,
                                filter_size=3,
                                dilation=aspp_ratios[1],
                                padding=aspp_ratios[1],
                                param_attr=param_attr))
                    concat_logits.append(aspp2)
                with scope("aspp3"):
                    if self.aspp_with_sep_conv:
                        aspp3 = separate_conv(
                            input,
                            channel,
                            1,
                            3,
                            dilation=aspp_ratios[2],
                            act=relu)
                    else:
                        aspp3 = bn_relu(
                            conv(
                                input,
                                channel,
                                stride=1,
                                filter_size=3,
                                dilation=aspp_ratios[2],
                                padding=aspp_ratios[2],
                                param_attr=param_attr))
                    concat_logits.append(aspp3)

            with scope("concat"):
                data = fluid.layers.concat(concat_logits, axis=1)
                if self.aspp_with_concat_projection:
                    data = bn_relu(
                        conv(
                            data,
                            channel,
                            1,
                            1,
                            groups=1,
                            padding=0,
                            param_attr=param_attr))
                    data = fluid.layers.dropout(data, 0.9)
            if self.aspp_with_se:
                data = data * image_avg
            return data
def _decoder_with_sum_merge(self, encode_data, decode_shortcut,
param_attr):
decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
encode_data = fluid.layers.resize_bilinear(encode_data,
decode_shortcut_shape[2:])
encode_data = conv(
encode_data,
self.conv_filters,
1,
1,
groups=1,
padding=0,
param_attr=param_attr)
with scope('merge'):
decode_shortcut = conv(
decode_shortcut,
self.conv_filters,
1,
1,
groups=1,
padding=0,
param_attr=param_attr)
return encode_data + decode_shortcut
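The sum-merge decoder adds the two branches elementwise, which only works because both are first projected to the same `conv_filters` channel count and brought to the same spatial size; the concat decoder below instead stacks them along the channel axis and lets the following convolutions fuse them. Schematically, under assumed shapes:

# encode_data:     (N, C_e, H/16, W/16) -> resize  -> (N, C_e, H/4, W/4)
#                                       -> 1x1 conv -> (N, conv_filters, H/4, W/4)
# decode_shortcut: (N, C_s, H/4, W/4)   -> 1x1 conv -> (N, conv_filters, H/4, W/4)
# output: elementwise sum, shape (N, conv_filters, H/4, W/4)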
def _decoder_with_concat(self, encode_data, decode_shortcut, param_attr):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut,
48,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut_shape[2:])
encode_data = fluid.layers.concat(
[encode_data, decode_shortcut], axis=1)
if self.decoder_use_sep_conv:
with scope("separable_conv1"):
encode_data = separate_conv(
encode_data, self.conv_filters, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = separate_conv(
encode_data, self.conv_filters, 1, 3, dilation=1, act=relu)
        else:
            with scope("decoder_conv1"):
                encode_data = bn_relu(
                    conv(
                        encode_data,
                        self.conv_filters,
                        stride=1,
                        filter_size=3,
                        dilation=1,
                        padding=1,
                        param_attr=param_attr))
            with scope("decoder_conv2"):
                encode_data = bn_relu(
                    conv(
                        encode_data,
                        self.conv_filters,
                        stride=1,
                        filter_size=3,
                        dilation=1,
                        padding=1,
                        param_attr=param_attr))
        return encode_data
    def _decoder(self, encode_data, decode_shortcut):
        # Decoder configuration
...@@ -246,54 +369,14 @@ class DeepLabv3p(object):
            regularizer=None,
            initializer=fluid.initializer.TruncatedNormal(
                loc=0.0, scale=0.06))
        with scope('decoder'):
            if self.use_sum_merge:
                return self._decoder_with_sum_merge(
                    encode_data, decode_shortcut, param_attr)

            return self._decoder_with_concat(encode_data, decode_shortcut,
                                             param_attr)
    def _get_loss(self, logit, label, mask):
        avg_loss = 0
...@@ -337,8 +420,11 @@ class DeepLabv3p(object):
            self.num_classes = 1
        image = inputs['image']
        if 'MobileNetV3' in self.backbone.__class__.__name__:
            data, decode_shortcut = self.backbone(image)
        else:
            data, decode_shortcuts = self.backbone(image)
            decode_shortcut = decode_shortcuts[self.backbone.decode_points]
        # Encoder and decoder setup
        if self.encoder_with_aspp:
...@@ -353,19 +439,22 @@ class DeepLabv3p(object):
                regularization_coeff=0.0),
            initializer=fluid.initializer.TruncatedNormal(
                loc=0.0, scale=0.01))
        if not self.output_is_logits:
            with scope('logit'):
                with fluid.name_scope('last_conv'):
                    logit = conv(
                        data,
                        self.num_classes,
                        1,
                        stride=1,
                        padding=0,
                        bias_attr=True,
                        param_attr=param_attr)
        else:
            logit = data
        image_shape = fluid.layers.shape(image)
        logit = fluid.layers.resize_bilinear(logit, image_shape[2:])
        if self.num_classes == 1:
            out = sigmoid_to_softmax(logit)
...
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -77,12 +77,9 @@ class HRNet(object):
        st4 = self.backbone(image)
        # upsample
        shape = fluid.layers.shape(st4[0])[-2:]
        st4[1] = fluid.layers.resize_bilinear(st4[1], out_shape=shape)
        st4[2] = fluid.layers.resize_bilinear(st4[2], out_shape=shape)
        st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=shape)
        out = fluid.layers.concat(st4, axis=1)
        last_channels = sum(self.backbone.channels[str(self.backbone.width)][
...@@ -107,8 +104,7 @@ class HRNet(object):
            bias_attr=False)
        input_shape = fluid.layers.shape(image)[-2:]
        logit = fluid.layers.resize_bilinear(out, input_shape)
        if self.num_classes == 1:
            out = sigmoid_to_softmax(logit)
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -112,6 +112,10 @@ def bn_relu(data, norm_type='bn', eps=1e-5):
    return fluid.layers.relu(bn(data, norm_type=norm_type, eps=eps))
def qsigmoid(data):
return fluid.layers.relu6(data + 3) * 0.16667
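`qsigmoid` is a "hard" sigmoid, relu6(x + 3) / 6 with 1/6 approximated by the constant 0.16667; it is piecewise linear (0 below -3, 1 above 3), which quantizes better than the smooth sigmoid. A NumPy sketch for comparison:

import numpy as np

def qsigmoid_np(x):
    return np.clip(x + 3.0, 0.0, 6.0) * 0.16667  # relu6(x + 3) / 6

x = np.array([-6.0, -3.0, 0.0, 3.0, 6.0])
print(qsigmoid_np(x))            # [0. 0. 0.5 1. 1.] (approximately)
print(1.0 / (1.0 + np.exp(-x)))  # smooth sigmoid, for reference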
def relu(data):
    return fluid.layers.relu(data)
...@@ -148,7 +152,8 @@ def separate_conv(input,
        name=name_scope + 'weights',
        regularizer=fluid.regularizer.L2DecayRegularizer(
            regularization_coeff=0.0),
        initializer=fluid.initializer.TruncatedNormal(
            loc=0.0, scale=0.33))
    with scope('depthwise'):
        input = conv(
            input,
...@@ -166,7 +171,8 @@ def separate_conv(input,
    param_attr = fluid.ParamAttr(
        name=name_scope + 'weights',
        regularizer=None,
        initializer=fluid.initializer.TruncatedNormal(
            loc=0.0, scale=0.06))
    with scope('pointwise'):
        input = conv(
            input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -24,8 +24,9 @@ def softmax_with_loss(logit,
                      ignore_index=255):
    ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
    label = fluid.layers.elementwise_min(
        label,
        fluid.layers.assign(np.array(
            [num_classes - 1], dtype=np.int32)))
    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
    logit = fluid.layers.reshape(logit, [-1, num_classes])
    label = fluid.layers.reshape(label, [-1, 1])
...@@ -60,8 +61,8 @@ def softmax_with_loss(logit,
            'Expect weight is a list, string or Variable, but receive {}'.
            format(type(weight)))
        weight = fluid.layers.reshape(weight, [1, num_classes])
        weighted_label_one_hot = fluid.layers.elementwise_mul(label_one_hot,
                                                              weight)
    probs = fluid.layers.softmax(logit)
    loss = fluid.layers.cross_entropy(
        probs,
...
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -144,8 +144,7 @@ class UNet(object):
with scope("up"): with scope("up"):
if self.upsample_mode == 'bilinear': if self.upsample_mode == 'bilinear':
short_cut_shape = fluid.layers.shape(short_cut) short_cut_shape = fluid.layers.shape(short_cut)
data = fluid.layers.resize_bilinear( data = fluid.layers.resize_bilinear(data, short_cut_shape[2:])
data, short_cut_shape[2:], align_corners=False)
else: else:
data = deconv( data = deconv(
data, data,
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -96,11 +96,12 @@ class ShuffleNetV2():
            pool_stride=1,
            pool_padding=0,
            pool_type='avg')
        output = fluid.layers.fc(input=output,
                                 size=self.num_classes,
                                 param_attr=ParamAttr(
                                     initializer=MSRA(),
                                     name='fc6_weights'),
                                 bias_attr=ParamAttr(name='fc6_offset'))
        return output
    def conv_bn_layer(self,
...@@ -122,7 +123,8 @@ class ShuffleNetV2():
            groups=num_groups,
            act=None,
            use_cudnn=use_cudnn,
            param_attr=ParamAttr(
                initializer=MSRA(), name=name + '_weights'),
            bias_attr=False)
        out = int((input.shape[2] - 1) / float(stride) + 1)
        bn_name = name + '_bn'
...
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -69,8 +69,7 @@ class Xception():
    def __call__(
            self,
            input, ):
        self.stride = 2
        self.block_point = 0
        self.short_cuts = dict()
...@@ -140,7 +139,8 @@ class Xception():
        param_attr = fluid.ParamAttr(
            name=name_scope + 'weights',
            regularizer=None,
            initializer=fluid.initializer.TruncatedNormal(
                loc=0.0, scale=0.09))
with scope("entry_flow"): with scope("entry_flow"):
with scope("conv1"): with scope("conv1"):
data = bn_relu( data = bn_relu(
...@@ -178,10 +178,10 @@ class Xception(): ...@@ -178,10 +178,10 @@ class Xception():
            for i in range(block_num):
                block_point = block_point + 1
                with scope("block" + str(i + 1)):
                    stride = strides[i] if check_stride(s * strides[i],
                                                        output_stride) else 1
                    data, short_cuts = self.xception_block(data, chns[i],
                                                           [1, 1, stride])
                    s = s * stride
                    if check_points(block_point, self.decode_points):
                        self.short_cuts[block_point] = short_cuts[1]
...@@ -205,8 +205,8 @@ class Xception():
            for i in range(block_num):
                block_point = block_point + 1
                with scope("block" + str(i + 1)):
                    stride = strides[i] if check_stride(s * strides[i],
                                                        output_stride) else 1
                    data, short_cuts = self.xception_block(
                        data, chns[i], [1, 1, strides[i]], skip_conv=False)
                    s = s * stride
...@@ -302,16 +302,15 @@ class Xception():
                initializer=fluid.initializer.TruncatedNormal(
                    loc=0.0, scale=0.09))
            with scope('shortcut'):
                skip = bn(conv(
                    input,
                    channels[-1],
                    1,
                    strides[-1],
                    groups=1,
                    padding=0,
                    param_attr=param_attr),
                          eps=1e-3)
        else:
            skip = input
        return data + skip, results
...@@ -329,4 +328,4 @@ def xception_41(num_classes=None):

def xception_71(num_classes=None):
    model = Xception(num_classes, 71)
    return model
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -91,7 +91,10 @@ def arrange_transforms(model_type, class_name, transforms, mode='train'):
    elif model_type == 'segmenter':
        arrange_transform = seg_transforms.ArrangeSegmenter
    elif model_type == 'detector':
        if class_name == "PPYOLO":
            arrange_name = 'ArrangeYOLOv3'
        else:
            arrange_name = 'Arrange{}'.format(class_name)
        arrange_transform = getattr(det_transforms, arrange_name)
    else:
        raise Exception("Unrecognized model type: {}".format(self.model_type))
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -170,7 +170,8 @@ def expand_segms(segms, x, y, height, width, ratio):
                               0).astype(mask.dtype)
        expanded_mask[y:y + height, x:x + width] = mask
        rle = mask_util.encode(
            np.array(
                expanded_mask, order='F', dtype=np.uint8))
        return rle
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -46,7 +46,7 @@ class Compose(ClsTransform):
            raise ValueError('The length of transforms ' + \
                             'must be equal or larger than 1!')
        self.transforms = transforms
        self.batch_transforms = None
        # Check the ops in transforms; currently PaddleX-defined ops and
        # imgaug ops are supported
        for op in self.transforms:
            if not isinstance(op, ClsTransform):
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -55,6 +55,7 @@ class Compose(DetTransform):
            raise ValueError('The length of transforms ' + \
                             'must be equal or larger than 1!')
        self.transforms = transforms
        self.batch_transforms = None
        self.use_mixup = False
        for t in self.transforms:
            if type(t).__name__ == 'MixupImage':
...@@ -1385,3 +1386,187 @@ class ComposedYOLOv3Transforms(Compose):
                mean=mean, std=std)
        ]
        super(ComposedYOLOv3Transforms, self).__init__(transforms)
class BatchRandomShape(DetTransform):
"""调整图像大小(resize)。
对batch数据中的每张图像全部resize到random_shapes中任意一个大小。
注意:当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize。
Args:
random_shapes (list): resize大小选择列表。
默认为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
interp (str): resize的插值方式,与opencv的插值方式对应,取值范围为
['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"RANDOM"。
Raises:
ValueError: 插值方式不在['NEAREST', 'LINEAR', 'CUBIC',
'AREA', 'LANCZOS4', 'RANDOM']中。
"""
# The interpolation mode
interp_dict = {
'NEAREST': cv2.INTER_NEAREST,
'LINEAR': cv2.INTER_LINEAR,
'CUBIC': cv2.INTER_CUBIC,
'AREA': cv2.INTER_AREA,
'LANCZOS4': cv2.INTER_LANCZOS4
}
def __init__(
self,
random_shapes=[320, 352, 384, 416, 448, 480, 512, 544, 576, 608],
interp='RANDOM'):
if not (interp == "RANDOM" or interp in self.interp_dict):
raise ValueError("interp should be one of {}".format(
self.interp_dict.keys()))
self.random_shapes = random_shapes
self.interp = interp
def __call__(self, batch_data):
"""
Args:
batch_data (list): 由与图像相关的各种信息组成的batch数据。
Returns:
list: 由与图像相关的各种信息组成的batch数据。
"""
shape = np.random.choice(self.random_shapes)
if self.interp == "RANDOM":
interp = random.choice(list(self.interp_dict.keys()))
else:
interp = self.interp
for data_id, data in enumerate(batch_data):
data_list = list(data)
im = data_list[0]
im = np.swapaxes(im, 1, 0)
im = np.swapaxes(im, 1, 2)
im = resize(im, shape, self.interp_dict[interp])
im = np.swapaxes(im, 1, 2)
im = np.swapaxes(im, 1, 0)
data_list[0] = im
batch_data[data_id] = tuple(data_list)
return batch_data
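A minimal usage sketch for BatchRandomShape follows (not part of the diff; it assumes the module-level resize helper and the numpy/cv2/random imports of det_transforms are available, and uses a synthetic CHW batch):

import numpy as np

# Two fake CHW float32 samples; real samples carry more fields per tuple.
batch = [(np.zeros((3, 416, 416), dtype='float32'), ) for _ in range(2)]
batch_resize = BatchRandomShape(random_shapes=[320, 608], interp='LINEAR')
batch = batch_resize(batch)
# Every image in the batch now shares one randomly chosen size,
# e.g. (3, 320, 320) or (3, 608, 608) for all samples.
print(batch[0][0].shape)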
class GenerateYoloTarget(object):
"""生成YOLOv3的ground truth(真实标注框)在不同特征层的位置转换信息。
该transform只在YOLOv3计算细粒度loss时使用。
Args:
anchors (list|tuple): anchor框的宽度和高度。
anchor_masks (list|tuple): 在计算损失时,使用anchor的mask索引。
num_classes (int): 类别数。默认为80。
iou_thresh (float): iou阈值,当anchor和真实标注框的iou大于该阈值时,计入target。默认为1.0。
"""
def __init__(self,
anchors,
anchor_masks,
downsample_ratios,
num_classes=80,
iou_thresh=1.):
super(GenerateYoloTarget, self).__init__()
self.anchors = anchors
self.anchor_masks = anchor_masks
self.downsample_ratios = downsample_ratios
self.num_classes = num_classes
self.iou_thresh = iou_thresh
def __call__(self, batch_data):
"""
Args:
batch_data (list): 由与图像相关的各种信息组成的batch数据。
Returns:
list: 由与图像相关的各种信息组成的batch数据。
其中,每个数据新添加的字段为:
- target0 (np.ndarray): YOLOv3的ground truth在特征层0的位置转换信息,
形状为(特征层0的anchor数量, 6+类别数, 特征层0的h, 特征层0的w)。
- target1 (np.ndarray): YOLOv3的ground truth在特征层1的位置转换信息,
形状为(特征层1的anchor数量, 6+类别数, 特征层1的h, 特征层1的w)。
- ...
-targetn (np.ndarray): YOLOv3的ground truth在特征层n的位置转换信息,
形状为(特征层n的anchor数量, 6+类别数, 特征层n的h, 特征层n的w)。
n的是大小由anchor_masks的长度决定。
"""
im = batch_data[0][0]
h = im.shape[1]
w = im.shape[2]
an_hw = np.array(self.anchors) / np.array([[w, h]])
for data_id, data in enumerate(batch_data):
gt_bbox = data[1]
gt_class = data[2]
gt_score = data[3]
im_shape = data[4]
origin_h = float(im_shape[0])
origin_w = float(im_shape[1])
data_list = list(data)
for i, (
mask, downsample_ratio
) in enumerate(zip(self.anchor_masks, self.downsample_ratios)):
grid_h = int(h / downsample_ratio)
grid_w = int(w / downsample_ratio)
target = np.zeros(
(len(mask), 6 + self.num_classes, grid_h, grid_w),
dtype=np.float32)
for b in range(gt_bbox.shape[0]):
gx = gt_bbox[b, 0] / float(origin_w)
gy = gt_bbox[b, 1] / float(origin_h)
gw = gt_bbox[b, 2] / float(origin_w)
gh = gt_bbox[b, 3] / float(origin_h)
cls = gt_class[b]
score = gt_score[b]
if gw <= 0. or gh <= 0. or score <= 0.:
continue
# find best match anchor index
best_iou = 0.
best_idx = -1
for an_idx in range(an_hw.shape[0]):
iou = jaccard_overlap(
[0., 0., gw, gh],
[0., 0., an_hw[an_idx, 0], an_hw[an_idx, 1]])
if iou > best_iou:
best_iou = iou
best_idx = an_idx
gi = int(gx * grid_w)
gj = int(gy * grid_h)
                    # the gt box should be regressed at this layer if the
                    # best-matching anchor index is in this layer's anchor mask
if best_idx in mask:
best_n = mask.index(best_idx)
# x, y, w, h, scale
target[best_n, 0, gj, gi] = gx * grid_w - gi
target[best_n, 1, gj, gi] = gy * grid_h - gj
target[best_n, 2, gj, gi] = np.log(
gw * w / self.anchors[best_idx][0])
target[best_n, 3, gj, gi] = np.log(
gh * h / self.anchors[best_idx][1])
target[best_n, 4, gj, gi] = 2.0 - gw * gh
# objectness record gt_score
target[best_n, 5, gj, gi] = score
# classification
target[best_n, 6 + cls, gj, gi] = 1.
# For non-matched anchors, calculate the target if the iou
# between anchor and gt is larger than iou_thresh
if self.iou_thresh < 1:
for idx, mask_i in enumerate(mask):
if mask_i == best_idx: continue
iou = jaccard_overlap(
[0., 0., gw, gh],
[0., 0., an_hw[mask_i, 0], an_hw[mask_i, 1]])
if iou > self.iou_thresh:
# x, y, w, h, scale
target[idx, 0, gj, gi] = gx * grid_w - gi
target[idx, 1, gj, gi] = gy * grid_h - gj
target[idx, 2, gj, gi] = np.log(
gw * w / self.anchors[mask_i][0])
target[idx, 3, gj, gi] = np.log(
gh * h / self.anchors[mask_i][1])
target[idx, 4, gj, gi] = 2.0 - gw * gh
# objectness record gt_score
target[idx, 5, gj, gi] = score
# classification
target[idx, 6 + cls, gj, gi] = 1.
data_list.append(target)
batch_data[data_id] = tuple(data_list)
return batch_data
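To make the target shapes concrete, here is an illustrative configuration (the standard COCO YOLOv3 anchors; an assumption, not taken from the diff) and the shapes GenerateYoloTarget would append for a 608x608 input:

anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119],
           [116, 90], [156, 198], [373, 326]]
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
downsample_ratios = [32, 16, 8]
gen_target = GenerateYoloTarget(anchors, anchor_masks, downsample_ratios,
                                num_classes=80, iou_thresh=1.)
# For a 608x608 image: target0 -> (3, 86, 19, 19), target1 -> (3, 86, 38, 38),
# target2 -> (3, 86, 76, 76), since 6 + 80 = 86 and 608 / 32 = 19, etc.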
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -69,8 +69,8 @@ def random_crop(im,
                 (float(im.shape[1]) / im.shape[0]) / (w**2))
    scale_max = min(scale[1], bound)
    scale_min = min(scale[0], bound)
-    target_area = im.shape[0] * im.shape[1] * np.random.uniform(
-        scale_min, scale_max)
+    target_area = im.shape[0] * im.shape[1] * np.random.uniform(scale_min,
+                                                                scale_max)
    target_size = math.sqrt(target_area)
    w = int(target_size * w)
    h = int(target_size * h)
@@ -146,6 +146,7 @@ def brightness(im, brightness_lower, brightness_upper):
    im += delta
    return im
+
def rotate(im, rotate_lower, rotate_upper):
    rotate_delta = np.random.uniform(rotate_lower, rotate_upper)
    im = im.rotate(int(rotate_delta))
......
# coding: utf8
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -49,6 +49,7 @@ class Compose(SegTransform):
            raise ValueError('The length of transforms ' + \
                             'must be equal or larger than 1!')
        self.transforms = transforms
+        self.batch_transforms = None
        self.to_rgb = False
        # Check the ops in transforms; currently PaddleX-defined ops and imgaug ops are supported
        for op in self.transforms:
@@ -72,8 +73,6 @@ class Compose(SegTransform):
            tuple: fields required by the network, determined by the last
                data preprocessing op in transforms.
        """
-        if im_info is None:
-            im_info = list()
        if isinstance(im, np.ndarray):
            if len(im.shape) != 3:
                raise Exception(
@@ -85,6 +84,8 @@ class Compose(SegTransform):
            except:
                raise ValueError('Can\'t read the image file {}!'.format(im))
        im = im.astype('float32')
+        if im_info is None:
+            im_info = [('origin_shape', im.shape[0:2])]
        if self.to_rgb:
            im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        if label is not None:
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
@@ -48,181 +48,192 @@ def _draw_rectangle_and_cname(img, xmin, ymin, xmax, ymax, cname, color):
                thickness=line_width)
    return img
def cls_compose(im, label=None, transforms=None, vdl_writer=None, step=0):
    """
    Args:
        im (str/np.ndarray): image path, or image data as np.ndarray.
        label (int): class index of the image.
        vdl_writer (visualdl.LogWriter): VisualDL writer that the logs are
            saved to. If None, no logs are saved. Defaults to None.
        step (int): preprocessing round; effective when vdl_writer is not
            None. Defaults to 0.
    Returns:
        tuple: fields required by the network, determined by the last
            data preprocessing op in transforms.
    """
    if isinstance(im, np.ndarray):
        if len(im.shape) != 3:
            raise Exception(
                "im should be 3-dimensions, but now is {}-dimensions".format(
                    len(im.shape)))
    else:
        try:
            im = cv2.imread(im).astype('float32')
        except:
            raise TypeError('Can\'t read the image file {}!'.format(im))
    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    if vdl_writer is not None:
        vdl_writer.add_image(
            tag='0. OriginalImage/' + str(step), img=im, step=0)
    op_id = 1
    for op in transforms:
        if isinstance(op, ClsTransform):
            if vdl_writer is not None and hasattr(op, 'prob'):
                op.prob = 1.0
            outputs = op(im, label)
            im = outputs[0]
            if len(outputs) == 2:
                label = outputs[1]
            if isinstance(op, pdx.cv.transforms.cls_transforms.Normalize):
                continue
        else:
            import imgaug.augmenters as iaa
            if isinstance(op, iaa.Augmenter):
                im = execute_imgaug(op, im)
            outputs = (im, )
            if label is not None:
                outputs = (im, label)
        if vdl_writer is not None:
            tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
            vdl_writer.add_image(tag=tag, img=im, step=0)
        op_id += 1


def det_compose(im,
                im_info=None,
                label_info=None,
                transforms=None,
                vdl_writer=None,
                step=0,
                labels=[],
                catid2color=None):
    def decode_image(im_file, im_info, label_info):
        if im_info is None:
            im_info = dict()
        if isinstance(im_file, np.ndarray):
            if len(im_file.shape) != 3:
                raise Exception(
                    "im should be 3-dimensions, but now is {}-dimensions".
                    format(len(im_file.shape)))
            im = im_file
        else:
            try:
                im = cv2.imread(im_file).astype('float32')
            except:
                raise TypeError('Can\'t read the image file {}!'.format(
                    im_file))
        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        # make default im_info with [h, w, 1]
        im_info['im_resize_info'] = np.array(
            [im.shape[0], im.shape[1], 1.], dtype=np.float32)
        im_info['image_shape'] = np.array([im.shape[0],
                                           im.shape[1]]).astype('int32')
        use_mixup = False
        for t in transforms:
            if type(t).__name__ == 'MixupImage':
                use_mixup = True
        if not use_mixup:
            if 'mixup' in im_info:
                del im_info['mixup']
        # decode mixup image
        if 'mixup' in im_info:
            im_info['mixup'] = \
                decode_image(im_info['mixup'][0],
                             im_info['mixup'][1],
                             im_info['mixup'][2])
        if label_info is None:
            return (im, im_info)
        else:
            return (im, im_info, label_info)

    outputs = decode_image(im, im_info, label_info)
    im = outputs[0]
    im_info = outputs[1]
    if len(outputs) == 3:
        label_info = outputs[2]
    if vdl_writer is not None:
        vdl_writer.add_image(
            tag='0. OriginalImage/' + str(step), img=im, step=0)
    op_id = 1
    bboxes = label_info['gt_bbox']
    transforms = [None] + transforms
    for op in transforms:
        if im is None:
            return None
        if isinstance(op, DetTransform) or op is None:
            if vdl_writer is not None and hasattr(op, 'prob'):
                op.prob = 1.0
            if op is not None:
                outputs = op(im, im_info, label_info)
            else:
                outputs = (im, im_info, label_info)
            im = outputs[0]
            vdl_im = im
            if vdl_writer is not None:
                if isinstance(op,
                              pdx.cv.transforms.det_transforms.ResizeByShort):
                    scale = outputs[1]['im_resize_info'][2]
                    bboxes = bboxes * scale
                elif isinstance(op, pdx.cv.transforms.det_transforms.Resize):
                    h = outputs[1]['image_shape'][0]
                    w = outputs[1]['image_shape'][1]
                    target_size = op.target_size
                    if isinstance(target_size, int):
                        h_scale = float(target_size) / h
                        w_scale = float(target_size) / w
                    else:
                        h_scale = float(target_size[0]) / h
                        w_scale = float(target_size[1]) / w
                    bboxes[:, 0] = bboxes[:, 0] * w_scale
                    bboxes[:, 1] = bboxes[:, 1] * h_scale
                    bboxes[:, 2] = bboxes[:, 2] * w_scale
                    bboxes[:, 3] = bboxes[:, 3] * h_scale
                else:
                    bboxes = outputs[2]['gt_bbox']
                if not isinstance(op, (
                        pdx.cv.transforms.det_transforms.RandomHorizontalFlip,
                        pdx.cv.transforms.det_transforms.Padding)):
                    for i in range(bboxes.shape[0]):
                        bbox = bboxes[i]
                        cname = labels[outputs[2]['gt_class'][i][0] - 1]
                        vdl_im = _draw_rectangle_and_cname(
                            vdl_im,
                            int(bbox[0]),
                            int(bbox[1]),
                            int(bbox[2]),
                            int(bbox[3]), cname,
                            catid2color[outputs[2]['gt_class'][i][0] - 1])
            if isinstance(op, pdx.cv.transforms.det_transforms.Normalize):
                continue
        else:
            im = execute_imgaug(op, im)
            if label_info is not None:
                outputs = (im, im_info, label_info)
            else:
                outputs = (im, im_info)
            vdl_im = im
        if vdl_writer is not None:
            tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
            if op is None:
                tag = str(op_id) + '. OriginalImageWithGTBox/' + str(step)
            vdl_writer.add_image(tag=tag, img=vdl_im, step=0)
        op_id += 1


def seg_compose(im,
                im_info=None,
                label=None,
                transforms=None,
                vdl_writer=None,
                step=0):
    if im_info is None:
        im_info = list()
    if isinstance(im, np.ndarray):
        if len(im.shape) != 3:
            raise Exception(
                "im should be 3-dimensions, but now is {}-dimensions".format(
                    len(im.shape)))
    else:
        try:
            im = cv2.imread(im).astype('float32')
@@ -233,9 +244,8 @@ def seg_compose(im, im_info=None, label=None, transforms=None, vdl_writer=None,
    if not isinstance(label, np.ndarray):
        label = np.asarray(Image.open(label))
    if vdl_writer is not None:
        vdl_writer.add_image(
            tag='0. OriginalImage' + '/' + str(step), img=im, step=0)
    op_id = 1
    for op in transforms:
        if isinstance(op, SegTransform):
@@ -254,19 +264,18 @@ def seg_compose(im, im_info=None, label=None, transforms=None, vdl_writer=None,
        else:
            outputs = (im, im_info)
        if vdl_writer is not None:
            tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
            vdl_writer.add_image(tag=tag, img=im, step=0)
        op_id += 1


def visualize(dataset, img_count=3, save_dir='vdl_output'):
    '''Visualize the intermediate results of data preprocessing/augmentation.
    The intermediate results can be inspected with VisualDL:
    1. Start VisualDL: visualdl --logdir vdl_output --port 8001
    2. Open https://0.0.0.0:8001 in a browser, where 0.0.0.0 means local
       access; for a remote service, replace it with the machine's IP.
    Args:
        dataset (paddlex.datasets): dataset reader.
        img_count (int): number of images to run preprocessing/augmentation
            on. Defaults to 3.
......
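A hedged usage sketch for the visualize helper above (the dataset construction and all paths are illustrative placeholders, not taken from the diff):

import paddlex as pdx
from paddlex.det import transforms

# Minimal detection pipeline; any paddlex.datasets reader should work here.
train_transforms = transforms.Compose(
    [transforms.Resize(target_size=608), transforms.Normalize()])
train_dataset = pdx.datasets.VOCDetection(
    data_dir='MyDataset',
    file_list='MyDataset/train_list.txt',
    label_list='MyDataset/labels.txt',
    transforms=train_transforms)
visualize(train_dataset, img_count=3, save_dir='vdl_output')
# Then inspect with: visualdl --logdir vdl_output --port 8001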
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -19,7 +19,9 @@ import yaml
import paddlex
import paddle.fluid as fluid
from paddlex.cv.transforms import build_transforms
-from paddlex.cv.models import BaseClassifier, YOLOv3, FasterRCNN, MaskRCNN, DeepLabv3p
+from paddlex.cv.models import BaseClassifier
+from paddlex.cv.models import PPYOLO, FasterRCNN, MaskRCNN
+from paddlex.cv.models import DeepLabv3p

class Predictor:
@@ -28,6 +30,7 @@ class Predictor:
                 use_gpu=True,
                 gpu_id=0,
                 use_mkl=False,
+                 mkl_thread_num=4,
                 use_trt=False,
                 use_glog=False,
                 memory_optimize=True):
@@ -38,6 +41,7 @@ class Predictor:
            use_gpu: whether to use the GPU; defaults to True
            gpu_id: id of the GPU to use; defaults to 0
            use_mkl: whether to use the MKL-DNN library (CPU only); defaults to False
+            mkl_thread_num: number of MKL-DNN compute threads; defaults to 4
            use_trt: whether to use TensorRT; defaults to False
            use_glog: whether to enable glog logging; defaults to False
            memory_optimize: whether to enable memory optimization; defaults to True
@@ -72,13 +76,15 @@ class Predictor:
            to_rgb = False
        self.transforms = build_transforms(self.model_type,
                                           self.info['Transforms'], to_rgb)
-        self.predictor = self.create_predictor(
-            use_gpu, gpu_id, use_mkl, use_trt, use_glog, memory_optimize)
+        self.predictor = self.create_predictor(use_gpu, gpu_id, use_mkl,
+                                               mkl_thread_num, use_trt,
+                                               use_glog, memory_optimize)

    def create_predictor(self,
                         use_gpu=True,
                         gpu_id=0,
                         use_mkl=False,
+                         mkl_thread_num=4,
                         use_trt=False,
                         use_glog=False,
                         memory_optimize=True):
@@ -93,6 +99,7 @@ class Predictor:
            config.disable_gpu()
        if use_mkl:
            config.enable_mkldnn()
+            config.set_cpu_math_library_num_threads(mkl_thread_num)
        if use_glog:
            config.enable_glog_info()
        else:
@@ -124,8 +131,8 @@ class Predictor:
                thread_num=thread_num)
            res['image'] = im
        elif self.model_type == "detector":
-            if self.model_name == "YOLOv3":
-                im, im_size = YOLOv3._preprocess(
+            if self.model_name in ["PPYOLO", "YOLOv3"]:
+                im, im_size = PPYOLO._preprocess(
                    image,
                    self.transforms,
                    self.model_type,
@@ -185,8 +192,8 @@ class Predictor:
            res = {'bbox': (results[0][0], offset_to_lengths(results[0][1])), }
            res['im_id'] = (np.array(
                [[i] for i in range(batch_size)]).astype('int32'), [[]])
-            if self.model_name == "YOLOv3":
-                preds = YOLOv3._postprocess(res, batch_size, self.num_classes,
+            if self.model_name in ["PPYOLO", "YOLOv3"]:
+                preds = PPYOLO._postprocess(res, batch_size, self.num_classes,
                                            self.labels)
            elif self.model_name == "FasterRCNN":
                preds = FasterRCNN._postprocess(res, batch_size,
......
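A short sketch of the extended deployment API (a hedged example; the model directory and image path are placeholders, and mkl_thread_num is the new knob added in the hunk above):

import paddlex as pdx

# CPU deployment with MKL-DNN, using the new thread-count parameter.
predictor = pdx.deploy.Predictor(
    model_dir='./inference_model',
    use_gpu=False,
    use_mkl=True,
    mkl_thread_num=8)
result = predictor.predict(image='test.jpg')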
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -17,6 +17,7 @@ from . import cv
FasterRCNN = cv.models.FasterRCNN
YOLOv3 = cv.models.YOLOv3
+PPYOLO = cv.models.PPYOLO
MaskRCNN = cv.models.MaskRCNN
transforms = cv.transforms.det_transforms
visualize = cv.models.utils.visualize.visualize_detection
......
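With the alias added above, PPYOLO becomes reachable from the det namespace. A minimal sketch (the constructor argument is illustrative; PPYOLO is assumed to follow the YOLOv3-style interface):

import paddlex as pdx

model = pdx.det.PPYOLO(num_classes=20)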
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -14,6 +14,7 @@
import os
+
def _find_classes(dir):
    # Faster and available in Python 3.5 and above
    classes = [d.name for d in os.scandir(dir) if d.is_dir()]
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -138,8 +138,10 @@ class ReaderConfig(object):
    ...
    """
    def __init__(self, dataset_dir, is_test):
-        image_paths, labels, self.num_classes = self.get_dataset_info(dataset_dir, is_test)
+        image_paths, labels, self.num_classes = self.get_dataset_info(
+            dataset_dir, is_test)
+
        random_per = np.random.permutation(range(len(image_paths)))
        self.image_paths = image_paths[random_per]
        self.labels = labels[random_per]
@@ -147,7 +149,8 @@ class ReaderConfig(object):
    def get_reader(self):
        def reader():
-            IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp')
+            IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
+                              '.tif', '.tiff', '.webp')
            target_size = 256
            crop_size = 224
@@ -171,7 +174,8 @@ class ReaderConfig(object):
        return reader
    def get_dataset_info(self, dataset_dir, is_test=False):
-        IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp')
+        IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
+                          '.tif', '.tiff', '.webp')
        # read
        if is_test:
@@ -199,7 +203,8 @@ class ReaderConfig(object):
def create_reader(list_image_path, list_label=None, is_test=False):
    def reader():
-        IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp')
+        IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
+                          '.tif', '.tiff', '.webp')
        target_size = 256
        crop_size = 224
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
@@ -14,4 +14,5 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-from .convert import *
\ No newline at end of file
+from .convert import *
+from .split import *
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path as osp
import random
import json
from pycocotools.coco import COCO
from .utils import MyEncoder
import paddlex.utils.logging as logging
def split_coco_dataset(dataset_dir, val_percent, test_percent, save_dir):
if not osp.exists(osp.join(dataset_dir, "annotations.json")):
logging.error("\'annotations.json\' is not found in {}!".format(
dataset_dir))
annotation_file = osp.join(dataset_dir, "annotations.json")
coco = COCO(annotation_file)
img_ids = coco.getImgIds()
cat_ids = coco.getCatIds()
anno_ids = coco.getAnnIds()
val_num = int(len(img_ids) * val_percent)
test_num = int(len(img_ids) * test_percent)
train_num = len(img_ids) - val_num - test_num
random.shuffle(img_ids)
train_files_ids = img_ids[:train_num]
val_files_ids = img_ids[train_num:train_num + val_num]
test_files_ids = img_ids[train_num + val_num:]
for img_id_list in [train_files_ids, val_files_ids, test_files_ids]:
img_anno_ids = coco.getAnnIds(imgIds=img_id_list, iscrowd=0)
imgs = coco.loadImgs(img_id_list)
instances = coco.loadAnns(img_anno_ids)
categories = coco.loadCats(cat_ids)
img_dict = {
"annotations": instances,
"images": imgs,
"categories": categories
}
        if img_id_list == train_files_ids:
            with open(osp.join(save_dir, 'train.json'), 'w+') as json_file:
                json.dump(img_dict, json_file, cls=MyEncoder)
        elif img_id_list == val_files_ids:
            with open(osp.join(save_dir, 'val.json'), 'w+') as json_file:
                json.dump(img_dict, json_file, cls=MyEncoder)
        elif img_id_list == test_files_ids and len(test_files_ids):
            with open(osp.join(save_dir, 'test.json'), 'w+') as json_file:
                json.dump(img_dict, json_file, cls=MyEncoder)
return train_num, val_num, test_num
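An illustrative call (directory names are placeholders); with 100 images, val_percent=0.2 and test_percent=0.1 give a 70/20/10 split:

train_num, val_num, test_num = split_coco_dataset(
    dataset_dir='MyCocoDataset',   # must contain annotations.json
    val_percent=0.2,
    test_percent=0.1,
    save_dir='MyCocoDataset')
# Writes train.json / val.json (and test.json when test_percent > 0)
# under save_dir.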
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path as osp
import random
from .utils import list_files, is_pic
import paddlex.utils.logging as logging
def split_imagenet_dataset(dataset_dir, val_percent, test_percent, save_dir):
all_files = list_files(dataset_dir)
label_list = list()
train_image_anno_list = list()
val_image_anno_list = list()
test_image_anno_list = list()
for file in all_files:
if not is_pic(file):
continue
label, image_name = osp.split(file)
if label not in label_list:
label_list.append(label)
label_list = sorted(label_list)
for i in range(len(label_list)):
image_list = list_files(osp.join(dataset_dir, label_list[i]))
image_anno_list = list()
for img in image_list:
image_anno_list.append([osp.join(label_list[i], img), i])
random.shuffle(image_anno_list)
image_num = len(image_anno_list)
val_num = int(image_num * val_percent)
test_num = int(image_num * test_percent)
train_num = image_num - val_num - test_num
train_image_anno_list += image_anno_list[:train_num]
val_image_anno_list += image_anno_list[train_num:train_num + val_num]
test_image_anno_list += image_anno_list[train_num + val_num:]
with open(
osp.join(save_dir, 'train_list.txt'), mode='w',
encoding='utf-8') as f:
for x in train_image_anno_list:
file, label = x
f.write('{} {}\n'.format(file, label))
with open(
osp.join(save_dir, 'val_list.txt'), mode='w',
encoding='utf-8') as f:
for x in val_image_anno_list:
file, label = x
f.write('{} {}\n'.format(file, label))
if len(test_image_anno_list):
with open(
osp.join(save_dir, 'test_list.txt'), mode='w',
encoding='utf-8') as f:
for x in test_image_anno_list:
file, label = x
f.write('{} {}\n'.format(file, label))
with open(
osp.join(save_dir, 'labels.txt'), mode='w', encoding='utf-8') as f:
for l in sorted(label_list):
f.write('{}\n'.format(l))
return len(train_image_anno_list), len(val_image_anno_list), len(
test_image_anno_list)
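The list files written above follow a `relative/path label_index` convention; a sketch of the expected output (directory and file names are made up):

# Illustrative output layout after splitting an ImageNet-style folder:
#   train_list.txt:  daisy/001.jpg 0
#                    rose/042.jpg 1
#   labels.txt:      daisy
#                    rose
train, val, test = split_imagenet_dataset('MyClsDataset', 0.2, 0.1,
                                          'MyClsDataset')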
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path as osp
import random
from .utils import list_files, is_pic, replace_ext, read_seg_ann
import paddlex.utils.logging as logging
def split_seg_dataset(dataset_dir, val_percent, test_percent, save_dir):
if not osp.exists(osp.join(dataset_dir, "JPEGImages")):
logging.error("\'JPEGImages\' is not found in {}!".format(dataset_dir))
if not osp.exists(osp.join(dataset_dir, "Annotations")):
logging.error("\'Annotations\' is not found in {}!".format(
dataset_dir))
all_image_files = list_files(osp.join(dataset_dir, "JPEGImages"))
image_anno_list = list()
label_list = list()
for image_file in all_image_files:
if not is_pic(image_file):
continue
anno_name = replace_ext(image_file, "png")
if osp.exists(osp.join(dataset_dir, "Annotations", anno_name)):
image_anno_list.append([image_file, anno_name])
else:
anno_name = replace_ext(image_file, "PNG")
if osp.exists(osp.join(dataset_dir, "Annotations", anno_name)):
image_anno_list.append([image_file, anno_name])
else:
logging.error("The annotation file {} doesn't exist!".format(
anno_name))
if not osp.exists(osp.join(dataset_dir, "labels.txt")):
for image_anno in image_anno_list:
            labels = read_seg_ann(
                osp.join(dataset_dir, "Annotations", image_anno[1]))
for i in labels:
if i not in label_list:
label_list.append(i)
        # If the largest class label exceeds the number of classes, add the missing labels
if len(label_list) != max(label_list) + 1:
label_list = [i for i in range(max(label_list) + 1)]
random.shuffle(image_anno_list)
image_num = len(image_anno_list)
val_num = int(image_num * val_percent)
test_num = int(image_num * test_percent)
train_num = image_num - val_num - test_num
train_image_anno_list = image_anno_list[:train_num]
val_image_anno_list = image_anno_list[train_num:train_num + val_num]
test_image_anno_list = image_anno_list[train_num + val_num:]
with open(
osp.join(save_dir, 'train_list.txt'), mode='w',
encoding='utf-8') as f:
for x in train_image_anno_list:
file = osp.join("JPEGImages", x[0])
label = osp.join("Annotations", x[1])
f.write('{} {}\n'.format(file, label))
with open(
osp.join(save_dir, 'val_list.txt'), mode='w',
encoding='utf-8') as f:
for x in val_image_anno_list:
file = osp.join("JPEGImages", x[0])
label = osp.join("Annotations", x[1])
f.write('{} {}\n'.format(file, label))
if len(test_image_anno_list):
with open(
osp.join(save_dir, 'test_list.txt'), mode='w',
encoding='utf-8') as f:
for x in test_image_anno_list:
file = osp.join("JPEGImages", x[0])
label = osp.join("Annotations", x[1])
f.write('{} {}\n'.format(file, label))
if len(label_list):
with open(
osp.join(save_dir, 'labels.txt'), mode='w',
encoding='utf-8') as f:
for l in sorted(label_list):
f.write('{}\n'.format(l))
return train_num, val_num, test_num
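The segmentation split writes paired lines, one image and one mask per line; a sketch of the produced list format (names are made up):

# Each line of train_list.txt pairs an image with its annotation mask:
#   JPEGImages/0001.jpg Annotations/0001.png
train, val, test = split_seg_dataset('MySegDataset', 0.2, 0.1, 'MySegDataset')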
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import os.path as osp
from PIL import Image
import numpy as np
import json
class MyEncoder(json.JSONEncoder):
    # Customize how numpy values are serialized into the json file
def default(self, obj):
if isinstance(obj, np.integer):
return int(obj)
elif isinstance(obj, np.floating):
return float(obj)
elif isinstance(obj, np.ndarray):
return obj.tolist()
else:
return super(MyEncoder, self).default(obj)
def list_files(dirname):
""" 列出目录下所有文件(包括所属的一级子目录下文件)
Args:
dirname: 目录路径
"""
def filter_file(f):
if f.startswith('.'):
return True
return False
all_files = list()
dirs = list()
for f in os.listdir(dirname):
if filter_file(f):
continue
if osp.isdir(osp.join(dirname, f)):
dirs.append(f)
else:
all_files.append(f)
for d in dirs:
for f in os.listdir(osp.join(dirname, d)):
if filter_file(f):
continue
if osp.isdir(osp.join(dirname, d, f)):
continue
all_files.append(osp.join(d, f))
return all_files
def is_pic(filename):
""" 判断文件是否为图片格式
Args:
filename: 文件路径
"""
suffixes = {'JPEG', 'jpeg', 'JPG', 'jpg', 'BMP', 'bmp', 'PNG', 'png'}
suffix = filename.strip().split('.')[-1]
if suffix not in suffixes:
return False
return True
def replace_ext(filename, new_ext):
""" 替换文件后缀
Args:
filename: 文件路径
new_ext: 需要替换的新的后缀
"""
items = filename.split(".")
items[-1] = new_ext
new_filename = ".".join(items)
return new_filename
def read_seg_ann(pngfile):
""" 解析语义分割的标注png图片
Args:
pngfile: 包含标注信息的png图片路径
"""
grt = np.asarray(Image.open(pngfile))
labels = list(np.unique(grt))
if 255 in labels:
labels.remove(255)
return labels
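Quick sanity checks for the helpers above (purely illustrative file names):

assert is_pic('cat.JPG') and not is_pic('notes.txt')
assert replace_ext('img/0001.jpg', 'png') == 'img/0001.png'
# read_seg_ann('mask.png') returns the sorted unique labels with the
# 255 ignore value removed, e.g. [0, 1, 2].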
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path as osp
import random
import xml.etree.ElementTree as ET
from .utils import list_files, is_pic, replace_ext
import paddlex.utils.logging as logging
def split_voc_dataset(dataset_dir, val_percent, test_percent, save_dir):
if not osp.exists(osp.join(dataset_dir, "JPEGImages")):
logging.error("\'JPEGImages\' is not found in {}!".format(dataset_dir))
if not osp.exists(osp.join(dataset_dir, "Annotations")):
logging.error("\'Annotations\' is not found in {}!".format(
dataset_dir))
all_image_files = list_files(osp.join(dataset_dir, "JPEGImages"))
image_anno_list = list()
label_list = list()
for image_file in all_image_files:
if not is_pic(image_file):
continue
anno_name = replace_ext(image_file, "xml")
if osp.exists(osp.join(dataset_dir, "Annotations", anno_name)):
image_anno_list.append([image_file, anno_name])
try:
tree = ET.parse(
osp.join(dataset_dir, "Annotations", anno_name))
except:
raise Exception("文件{}不是一个良构的xml文件,请检查标注文件".format(
osp.join(dataset_dir, "Annotations", anno_name)))
objs = tree.findall("object")
for i, obj in enumerate(objs):
cname = obj.find('name').text
                if cname not in label_list:
label_list.append(cname)
else:
logging.error("The annotation file {} doesn't exist!".format(
anno_name))
random.shuffle(image_anno_list)
image_num = len(image_anno_list)
val_num = int(image_num * val_percent)
test_num = int(image_num * test_percent)
train_num = image_num - val_num - test_num
train_image_anno_list = image_anno_list[:train_num]
val_image_anno_list = image_anno_list[train_num:train_num + val_num]
test_image_anno_list = image_anno_list[train_num + val_num:]
with open(
osp.join(save_dir, 'train_list.txt'), mode='w',
encoding='utf-8') as f:
for x in train_image_anno_list:
file = osp.join("JPEGImages", x[0])
label = osp.join("Annotations", x[1])
f.write('{} {}\n'.format(file, label))
with open(
osp.join(save_dir, 'val_list.txt'), mode='w',
encoding='utf-8') as f:
for x in val_image_anno_list:
file = osp.join("JPEGImages", x[0])
label = osp.join("Annotations", x[1])
f.write('{} {}\n'.format(file, label))
if len(test_image_anno_list):
with open(
osp.join(save_dir, 'test_list.txt'), mode='w',
encoding='utf-8') as f:
for x in test_image_anno_list:
file = osp.join("JPEGImages", x[0])
label = osp.join("Annotations", x[1])
f.write('{} {}\n'.format(file, label))
with open(
osp.join(save_dir, 'labels.txt'), mode='w', encoding='utf-8') as f:
for l in sorted(label_list):
f.write('{}\n'.format(l))
return train_num, val_num, test_num
#!/usr/bin/env python
# coding: utf-8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .dataset_split.coco_split import split_coco_dataset
from .dataset_split.voc_split import split_voc_dataset
from .dataset_split.imagenet_split import split_imagenet_dataset
from .dataset_split.seg_split import split_seg_dataset
def dataset_split(dataset_dir, dataset_format, val_value, test_value,
save_dir):
if dataset_format == "coco":
train_num, val_num, test_num = split_coco_dataset(
dataset_dir, val_value, test_value, save_dir)
elif dataset_format == "voc":
train_num, val_num, test_num = split_voc_dataset(
dataset_dir, val_value, test_value, save_dir)
elif dataset_format == "seg":
train_num, val_num, test_num = split_seg_dataset(
dataset_dir, val_value, test_value, save_dir)
elif dataset_format == "imagenet":
train_num, val_num, test_num = split_imagenet_dataset(
dataset_dir, val_value, test_value, save_dir)
print("Dataset Split Done.")
print("Train samples: {}".format(train_num))
print("Eval samples: {}".format(val_num))
print("Test samples: {}".format(test_num))
print("Split files saved in {}".format(save_dir))
@@ -147,7 +147,7 @@ class LabelMe2COCO(X2COCO):
            img_name_part = osp.splitext(img_file)[0]
            json_file = osp.join(json_dir, img_name_part + ".json")
            if not osp.exists(json_file):
-                os.remove(osp.join(image_dir, img_file))
+                os.remove(osp.join(img_dir, img_file))
                continue
            image_id = image_id + 1
            with open(json_file, mode='r', \
@@ -220,7 +220,7 @@ class EasyData2COCO(X2COCO):
            img_name_part = osp.splitext(img_file)[0]
            json_file = osp.join(json_dir, img_name_part + ".json")
            if not osp.exists(json_file):
-                os.remove(osp.join(image_dir, img_file))
+                os.remove(osp.join(img_dir, img_file))
                continue
            image_id = image_id + 1
            with open(json_file, mode='r', \
@@ -317,7 +317,7 @@ class JingLing2COCO(X2COCO):
            img_name_part = osp.splitext(img_file)[0]
            json_file = osp.join(json_dir, img_name_part + ".json")
            if not osp.exists(json_file):
-                os.remove(osp.join(image_dir, img_file))
+                os.remove(osp.join(img_dir, img_file))
                continue
            image_id = image_id + 1
            with open(json_file, mode='r', \
......
@@ -23,6 +23,7 @@ import shutil
import numpy as np
import PIL.Image
from .base import MyEncoder, is_pic, get_encoding
+import math

class X2Seg(object):
    def __init__(self):
@@ -140,7 +141,7 @@ class JingLing2Seg(X2Seg):
            img_name_part = osp.splitext(img_name)[0]
            json_file = osp.join(json_dir, img_name_part + ".json")
            if not osp.exists(json_file):
-                os.remove(os.remove(osp.join(image_dir, img_name)))
+                os.remove(osp.join(image_dir, img_name))
                continue
            with open(json_file, mode="r", \
                      encoding=get_encoding(json_file)) as j:
@@ -226,7 +227,7 @@ class LabelMe2Seg(X2Seg):
            img_name_part = osp.splitext(img_name)[0]
            json_file = osp.join(json_dir, img_name_part + ".json")
            if not osp.exists(json_file):
-                os.remove(os.remove(osp.join(image_dir, img_name)))
+                os.remove(osp.join(image_dir, img_name))
                continue
            img_file = osp.join(image_dir, img_name)
            img = np.asarray(PIL.Image.open(img_file))
@@ -260,7 +261,7 @@ class EasyData2Seg(X2Seg):
            img_name_part = osp.splitext(img_name)[0]
            json_file = osp.join(json_dir, img_name_part + ".json")
            if not osp.exists(json_file):
-                os.remove(os.remove(osp.join(image_dir, img_name)))
+                os.remove(osp.join(image_dir, img_name))
                continue
            with open(json_file, mode="r", \
                      encoding=get_encoding(json_file)) as j:
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -29,8 +29,9 @@ def log(level=2, message="", use_color=False):
current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array)
if paddlex.log_level >= level:
if use_color:
print("\033[1;31;40m{} [{}]\t{}\033[0m".format(
    current_time, levels[level], message).encode("utf-8").decode(
        "latin1"))
else:
print("{} [{}]\t{}".format(current_time, levels[level], message)
    .encode("utf-8").decode("latin1"))
...
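Two details of the colored branch above are worth noting. `\033[1;31;40m ... \033[0m` is an ANSI escape sequence (bold red text on a black background, reset at the end), and the trailing `.encode("utf-8").decode("latin1")` appears to be a byte-level workaround so that printing non-ASCII messages cannot raise `UnicodeEncodeError` on consoles with a narrow default codec. A small illustration (the log text is made up):

```python
# ANSI escape: 1 = bold, 31 = red foreground, 40 = black background.
print("\033[1;31;40m{}\033[0m".format("2020-07-01 12:00:00 [ERROR]\tdemo"))

# Every UTF-8 byte maps to some latin-1 character, so the re-encoded string is
# always printable (non-ASCII text degrades to mojibake instead of crashing).
print("中文日志".encode("utf-8").decode("latin1"))
```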
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -531,8 +531,8 @@ def save_mask_inference_model(dirname,
if isinstance(target_vars, Variable):
    target_vars = [target_vars]
elif export_for_deployment:
    if not (bool(target_vars) and
            all(isinstance(var, Variable) for var in target_vars)):
        raise ValueError("'target_vars' should be a list of Variable.")
main_program = _get_valid_program(main_program)
...
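The rewrapped condition above is semantically unchanged: `target_vars` must be a non-empty list whose entries are all `Variable` objects. Spelled out as a standalone check (the function and argument names are assumptions for illustration):

```python
def check_target_vars(target_vars, Variable):
    # Reject None, an empty list, and any list containing non-Variable entries.
    if not (bool(target_vars) and
            all(isinstance(var, Variable) for var in target_vars)):
        raise ValueError("'target_vars' should be a list of Variable.")
```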
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -50,6 +50,7 @@ def get_environ_info():
info['num'] = fluid.core.get_cuda_device_count()
return info
def path_normalization(path):
win_sep = "\\"
other_sep = "/"
...@@ -59,6 +60,7 @@ def path_normalization(path):
path = other_sep.join(path.split(win_sep))
return path
def parse_param_file(param_file, return_shape=True):
from paddle.fluid.proto.framework_pb2 import VarType
f = open(param_file, 'rb')
...
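Judging from the fragments visible above, `path_normalization` rewrites Windows-style separators into POSIX-style ones so that file lists written on one OS stay usable on another. A self-contained sketch reconstructed from those visible lines (the elided middle of the function may contain additional logic, so treat this as illustrative):

```python
def path_normalization(path):
    win_sep = "\\"
    other_sep = "/"
    # Split on backslashes and re-join with forward slashes.
    path = other_sep.join(path.split(win_sep))
    return path

print(path_normalization("insect_det\\JPEGImages\\0001.jpg"))
# -> insect_det/JPEGImages/0001.jpg
```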
...@@ -8,3 +8,4 @@ paddleslim == 1.0.1
shapely shapely
x2paddle x2paddle
paddlepaddle-gpu paddlepaddle-gpu
opencv-python
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...@@ -31,7 +31,7 @@ setuptools.setup(
install_requires=[
    "pycocotools;platform_system!='Windows'", 'pyyaml', 'colorama', 'tqdm',
    'paddleslim==1.0.1', 'visualdl>=2.0.0b', 'paddlehub>=1.6.2',
    'shapely>=1.7.0', "opencv-python"
],
classifiers=[
    "Programming Language :: Python :: 3",
...
...@@ -2,17 +2,19 @@
This directory collects the code for pruning-aware training with PaddleX. Every script downloads its dataset automatically and trains on a single GPU card.
PaddleX provides two pruning-training workflows:
1. Compute the pruning configuration yourself (recommended). The overall workflow is:

> 1. Train the original model on your data;
> 2. With the model from step 1, compute the sensitivity of each model parameter on the validation set and save the sensitivity information to a local file;
> 3. Train the original model on the data again, this time passing the parameter-sensitivity file from step 2 when calling the `train` API;
> 4. During training, the model structure is pruned according to the sensitivity file, and training then continues iteratively.

2. Use the parameter-sensitivity file precomputed by PaddleX. The overall workflow is:

> 1. When calling the `train` API, set the `sensitivities_file` parameter to the string `DEFAULT`;
> 2. During training, the precomputed model-parameter sensitivities are downloaded automatically, the model structure is pruned accordingly, and training continues iteratively.

Compared with the second workflow, the first adds two extra steps (training the original model yourself and computing the parameter sensitivities), but experiments show that it reaches higher accuracy and yields better-pruned models, so it is the recommended choice whenever time and compute budgets allow. A runnable sketch of the second, lighter workflow is shown below.
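As a concrete illustration of workflow 2, the sketch below adapts the vegetable-classification tutorial to pruned training. The dataset URL and transforms mirror the tutorial scripts elsewhere in this commit; `sensitivities_file` and `eval_metric_loss` follow the parameter names used in the `train` signatures shown later in this diff, and the hyperparameter values are illustrative rather than tuned:

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
from paddlex.cls import transforms

# Download and extract the vegetable classification dataset.
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')

train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224), transforms.Normalize()
])

train_dataset = pdx.datasets.ImageNet(
    data_dir='vegetables_cls',
    file_list='vegetables_cls/train_list.txt',
    label_list='vegetables_cls/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
    data_dir='vegetables_cls',
    file_list='vegetables_cls/val_list.txt',
    label_list='vegetables_cls/labels.txt',
    transforms=eval_transforms)

model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    train_batch_size=32,
    eval_dataset=eval_dataset,
    learning_rate=0.025,
    # 'DEFAULT' downloads PaddleX's precomputed sensitivity file and prunes
    # the model structure before iterative training continues.
    sensitivities_file='DEFAULT',
    # Maximum acceptable drop of the evaluation metric caused by pruning.
    eval_metric_loss=0.05,
    save_dir='output/mobilenetv2_prune',
    use_vdl=True)
```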
## Start pruning training
...
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -29,13 +29,11 @@ def train(model_dir=None, sensitivities_file=None, eval_metric_loss=0.05):
# Define the transforms used for training and validation
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.RandomHorizontalFlip(), transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# Define the datasets used for training and validation
...
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
...
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
...@@ -28,17 +28,14 @@ def train(model_dir, sensitivities_file, eval_metric_loss):
# Define the transforms used for training and validation
train_transforms = transforms.Compose([
    transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
    transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
        target_size=608, interp='RANDOM'), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.Resize(
        target_size=608, interp='CUBIC'), transforms.Normalize()
])
# Define the datasets used for training and validation
...
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
...
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
...@@ -28,15 +28,12 @@ def train(model_dir, sensitivities_file, eval_metric_loss):
# Define the transforms used for training and validation
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
    transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByLong(long_size=512),
    transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
...
...@@ -4,15 +4,31 @@
|Code | Model task | Dataset |
|------|--------|---------|
|image_classification/alexnet.py | Image classification: AlexNet | Vegetable classification |
|image_classification/mobilenetv2.py | Image classification: MobileNetV2 | Vegetable classification |
|image_classification/mobilenetv3_small_ssld.py | Image classification: MobileNetV3_small_ssld | Vegetable classification |
|image_classification/resnet50_vd_ssld.py | Image classification: ResNet50_vd_ssld | Vegetable classification |
|image_classification/shufflenetv2.py | Image classification: ShuffleNetV2 | Vegetable classification |
|object_detection/faster_rcnn_hrnet_fpn.py | Object detection: FasterRCNN | Insect detection |
|object_detection/faster_rcnn_r18_fpn.py | Object detection: FasterRCNN | Insect detection |
|object_detection/faster_rcnn_r50_fpn.py | Object detection: FasterRCNN | Insect detection |
|object_detection/ppyolo.py | Object detection: PPYOLO | Insect detection |
|object_detection/yolov3_darknet53.py | Object detection: YOLOv3 | Insect detection |
|object_detection/yolov3_mobilenetv1.py | Object detection: YOLOv3 | Insect detection |
|object_detection/yolov3_mobilenetv3.py | Object detection: YOLOv3 | Insect detection |
|instance_segmentation/mask_rcnn_hrnet_fpn.py | Instance segmentation: MaskRCNN | Xiaoduxiong (toy bear) sorting |
|instance_segmentation/mask_rcnn_r18_fpn.py | Instance segmentation: MaskRCNN | Xiaoduxiong (toy bear) sorting |
|instance_segmentation/mask_rcnn_r50_fpn.py | Instance segmentation: MaskRCNN | Xiaoduxiong (toy bear) sorting |
|semantic_segmentation/deeplabv3p_mobilenetv2.py | Semantic segmentation: DeepLabV3 | Optic disc segmentation |
|semantic_segmentation/deeplabv3p_mobilenetv2_x0.25.py | Semantic segmentation: DeepLabV3 | Optic disc segmentation |
|semantic_segmentation/deeplabv3p_xception65.py | Semantic segmentation: DeepLabV3 | Optic disc segmentation |
|semantic_segmentation/fast_scnn.py | Semantic segmentation: FastSCNN | Optic disc segmentation |
|semantic_segmentation/hrnet.py | Semantic segmentation: HRNet | Optic disc segmentation |
|semantic_segmentation/unet.py | Semantic segmentation: UNet | Optic disc segmentation |
## Start training
After installing PaddleX, start a training run with a command such as:
```
python image_classification/mobilenetv2.py
```
...@@ -17,4 +17,4 @@ python mobilenetv3_small_ssld.py
visualdl --logdir output/mobilenetv3_small_ssld/vdl_log --port 8001
```
Once the service is running, open https://0.0.0.0:8001 or https://localhost:8001 in a browser.
...@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.AlexNet(num_classes=len(train_dataset.labels))
# AlexNet requires an explicitly fixed input_shape
model.fixed_input_shape = [224, 224]
...
...@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/classification.html#train
...
...@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.MobileNetV3_small_ssld(num_classes=len(train_dataset.labels))
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/datasets.html#paddlex-datasets-imagenet
...
...@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.ResNet50_vd_ssld(num_classes=len(train_dataset.labels))
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/classification.html#train
...
...@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.ShuffleNetV2(num_classes=len(train_dataset.labels))
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/classification.html#train
...
...@@ -13,15 +13,15 @@ pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
    transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333),
    transforms.Padding(coarsest_stride=32),
])
...@@ -38,10 +38,7 @@ eval_dataset = pdx.datasets.CocoDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes must be the number of classes including the background class, i.e. number of target classes + 1
num_classes = len(train_dataset.labels) + 1
...
...@@ -13,16 +13,14 @@ pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
    transforms.Normalize(), transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
# Define the datasets used for training and validation
...@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.CocoDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes must be the number of classes including the background class, i.e. number of target classes + 1
num_classes = len(train_dataset.labels) + 1
...
...@@ -13,16 +13,14 @@ pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
    transforms.Normalize(), transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
# Define the datasets used for training and validation
...@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.CocoDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes must be the number of classes including the background class, i.e. number of target classes + 1
num_classes = len(train_dataset.labels) + 1
...
...@@ -13,16 +13,14 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
    transforms.Normalize(), transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
# Define the datasets used for training and validation
...@@ -40,10 +38,7 @@ eval_dataset = pdx.datasets.VOCDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes must be the number of classes including the background class, i.e. number of target classes + 1
num_classes = len(train_dataset.labels) + 1
...
...@@ -13,15 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
    transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333),
    transforms.Padding(coarsest_stride=32),
])
...@@ -40,10 +40,7 @@ eval_dataset = pdx.datasets.VOCDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes must be the number of classes including the background class, i.e. number of target classes + 1
num_classes = len(train_dataset.labels) + 1
...
...@@ -13,15 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
    transforms.Normalize(),
    transforms.ResizeByShort(
        short_size=800, max_size=1333),
    transforms.Padding(coarsest_stride=32),
])
...@@ -40,10 +40,7 @@ eval_dataset = pdx.datasets.VOCDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes must be the number of classes including the background class, i.e. number of target classes + 1
num_classes = len(train_dataset.labels) + 1
...
# Environment variable configuration, used to control whether the GPU is used
# Documentation: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html#gpu
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
# Download and extract the insect detection dataset
insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
target_size=608, interp='RANDOM'), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.Resize(
target_size=608, interp='CUBIC'), transforms.Normalize()
])
# Define the datasets used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/datasets.html#paddlex-datasets-vocdetection
train_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/train_list.txt',
label_list='insect_det/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/val_list.txt',
label_list='insect_det/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
model = pdx.det.PPYOLO(num_classes=num_classes)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#train
# Parameter descriptions and tuning guide: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
model.train(
num_epochs=270,
train_dataset=train_dataset,
train_batch_size=8,
eval_dataset=eval_dataset,
learning_rate=0.000125,
lr_decay_epochs=[210, 240],
save_dir='output/ppyolo',
use_vdl=True)
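Once training completes, the best checkpoint under `save_dir` can be loaded back for inference. A minimal sketch, with the test image path as a placeholder and `pdx.load_model`/`pdx.det.visualize` used as in the PaddleX prediction tutorials:

```python
import paddlex as pdx

# Load the best model saved during training.
model = pdx.load_model('output/ppyolo/best_model')
# Predict on a single image (the file name is illustrative).
image_name = 'insect_det/JPEGImages/0001.jpg'
result = model.predict(image_name)
# Save a visualization that keeps only boxes with score >= 0.5.
pdx.det.visualize(image_name, result, threshold=0.5, save_dir='./output/ppyolo')
```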
...@@ -13,18 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
    transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
        target_size=608, interp='RANDOM'), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.Resize(
        target_size=608, interp='CUBIC'), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -42,10 +39,7 @@ eval_dataset = pdx.datasets.VOCDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
...
...@@ -17,13 +17,15 @@ train_transforms = transforms.Compose([
transforms.RandomDistort(),
transforms.RandomExpand(),
transforms.RandomCrop(),
transforms.Resize(
    target_size=608, interp='RANDOM'),
transforms.RandomHorizontalFlip(),
transforms.Normalize(),
])
eval_transforms = transforms.Compose([
transforms.Resize(
    target_size=608, interp='CUBIC'),
transforms.Normalize(),
])
...@@ -42,10 +44,7 @@ eval_dataset = pdx.datasets.VOCDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
...
...@@ -13,18 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
    transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
    transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
        target_size=608, interp='RANDOM'), transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.Resize(
        target_size=608, interp='CUBIC'), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -42,10 +39,7 @@ eval_dataset = pdx.datasets.VOCDetection(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
...
...@@ -13,16 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
    transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByLong(long_size=512),
    transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -40,15 +37,12 @@ eval_dataset = pdx.datasets.SegDataset(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-deeplabv3p
model = pdx.seg.DeepLabv3p(
    num_classes=num_classes, backbone='MobileNetV2_x1.0')
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#train
# Parameter descriptions and tuning guide: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
...
# Environment variable configuration, used to control whether the GPU is used
# Documentation: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html#gpu
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
from paddlex.seg import transforms
# Download and extract the optic disc segmentation dataset
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByLong(long_size=512),
transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/datasets.html#paddlex-datasets-segdataset
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-deeplabv3p
model = pdx.seg.DeepLabv3p(
num_classes=num_classes,
backbone='MobileNetV3_large_x1_0_ssld',
pooling_crop_size=(512, 512))
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#train
# Parameter descriptions and tuning guide: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
model.train(
num_epochs=40,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
save_dir='output/deeplabv3p_mobilenetv3_large_ssld',
use_vdl=True)
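The trained segmentation model can be loaded back the same way. A minimal sketch (the image path is a placeholder; `pdx.seg.visualize` overlays the predicted mask on the input image):

```python
import paddlex as pdx

model = pdx.load_model('output/deeplabv3p_mobilenetv3_large_ssld/best_model')
image_name = 'optic_disc_seg/JPEGImages/H0005.jpg'
result = model.predict(image_name)
# Blend the predicted mask with the original image; weight controls the blend.
pdx.seg.visualize(image_name, result, weight=0.4, save_dir='./output')
```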
...@@ -13,16 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
    transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByLong(long_size=512),
    transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -40,13 +37,8 @@ eval_dataset = pdx.datasets.SegDataset(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-fastscnn
model = pdx.seg.FastSCNN(num_classes=num_classes)
...
...@@ -13,16 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
    transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByLong(long_size=512),
    transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -40,10 +37,7 @@ eval_dataset = pdx.datasets.SegDataset(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-hrnet
...
...@@ -13,15 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used for training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
    transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByLong(long_size=512),
    transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
...@@ -39,10 +37,7 @@ eval_dataset = pdx.datasets.SegDataset(
    transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-deeplabv3p
...