diff --git a/.github/ISSUE_TEMPLATE/4_gui.md b/.github/ISSUE_TEMPLATE/4_gui.md
index 053f70da760ce35d6b4f53a00c81770b3ff48bf1..780c8b903b9137f72037e311213443c8678f61d9 100644
--- a/.github/ISSUE_TEMPLATE/4_gui.md
+++ b/.github/ISSUE_TEMPLATE/4_gui.md
@@ -3,4 +3,4 @@ name: 4. PaddleX GUI usage questions
about: Questions about using the PaddleX GUI client
---
-PaddleX GUI: https://www.paddlepaddle.org.cn/paddle/paddleX
+PaddleX GUI: https://www.paddlepaddle.org.cn/paddle/paddleX (please keep this line in the body of your issue)
diff --git a/README.md b/README.md
index 21cabc03a5cf8a8ecf3eab9cf2f35abd898be620..774482fed12c8f4e9bc7a7810f0143206d24bb67 100644
--- a/README.md
+++ b/README.md
@@ -1,55 +1,36 @@
+
+
+
+
+ PaddleX -- PaddlePaddle's full-pipeline development toolkit, helping developers quickly bring real-world industrial projects to production in low-code form
+
[](LICENSE)
[](https://github.com/PaddlePaddle/PaddleX/releases)



-**PaddleX -- the full-featured PaddlePaddle development toolkit** integrates the core capabilities of the PaddlePaddle vision suites (PaddleClas, PaddleDetection, PaddleSeg), the model compression tool PaddleSlim, the visualization tool VisualDL, and the lightweight inference engine Paddle Lite, fused with the PaddlePaddle team's rich hands-on experience and technical know-how. It connects the whole deep learning workflow end to end -- from data preparation through model training and optimization to multi-platform deployment -- giving developers best practices for full-pipeline PaddlePaddle development.
-
-**PaddleX offers a minimal API design and an officially maintained GUI for download**, lowering the barrier to entry as far as possible. Developers can use the **PaddleX GUI** to quickly experience the full deep learning development workflow, or use the **PaddleX API** directly for more flexible development.
-
-Going further, PaddleX also provides solid support for users who need to customize or integrate it for their own scenarios and requirements.
+PaddleX integrates PaddlePaddle's intelligent-vision capabilities for **image classification**, **object detection**, **semantic segmentation**, and **instance segmentation**, connects the whole deep learning workflow end to end -- from **data preparation** through **model training and optimization** to **multi-platform deployment** -- and provides a **unified task API** plus a **graphical development interface demo**. Developers no longer need to install separate toolkits and can complete the full PaddlePaddle workflow quickly in **low-code** form.
-## Three Key Features of PaddleX
+**PaddleX** has been validated in more than ten real-world industry scenarios including **quality inspection**, **security**, **facility inspection**, **remote sensing**, **retail**, and **healthcare**, distilling hands-on industry experience **and providing rich case-study tutorials** to support developers' path to production at every step.
-### End-to-End Workflow
-- **Data preparation**: Compatible with common data protocols such as ImageNet, VOC, and COCO, and seamlessly integrated with Labelme, 精灵标注助手, and the [EasyData intelligent data service platform](https://ai.baidu.com/easydata/), helping developers finish data preparation faster.
-
-- **Data preprocessing and augmentation**: Provides Transforms, a minimal set of image preprocessing and augmentation methods, adapted to the imgaug augmentation library with support for hundreds of augmentation strategies, helping developers quickly mitigate the problem of training on small datasets.
-
-- **Model training**: Integrates the [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), and [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) vision toolkits and provides a large set of carefully selected, industry-proven, high-quality pretrained models, so developers reach industrial-grade model quality faster.
-
-- **Model tuning**: Built-in model interpretability module and the [VisualDL](https://github.com/PaddlePaddle/VisualDL) visualization tool, so developers can understand a model's feature-extraction regions and the evolution of training parameters more intuitively, and optimize models quickly.
-
-- **Secure multi-platform deployment**: Built-in [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) model compression tool and a **model encryption deployment module**, seamlessly connected with the native Paddle Inference library and the high-performance edge inference engine [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite), enabling fast, high-performance, secure multi-platform deployment.
-
-### Grounded in Industry Practice
-
-- **Industry validation**: Validated in more than ten real-world industry scenarios, including **quality inspection**, **security**, **facility inspection**, **remote sensing**, **retail**, and **healthcare**, adapting to industry data formats and deployment requirements.
-- **Distilled experience**: Distills hands-on industry experience and **provides rich case-study tutorials** to accelerate developers' path to production.
-- **Co-built with industry developers**: Absorbs code contributions from real industry developers -- from industry, back to industry.
-
-
-
-## Easy to Use, Easy to Integrate
-
-- **Easy to use**: A unified full-workflow API -- train a model in 5 steps, and deploy with high performance in 10 lines of Python/C++ code.
-- **Easy to integrate**: Supports customization and integration so developers can build products that fit their own industrial needs, and officially provides the **PaddleX GUI**, a cross-platform visual tool built on the PaddleX API, letting developers quickly experience the full PaddlePaddle deep learning workflow and inspiring customized development.
## Installation
**PaddleX offers two development modes to meet different user needs:**
-1. **Python development mode:** With a concise, easy-to-understand Python API, PaddleX gives developers the smoothest deep learning development experience while balancing feature completeness, development flexibility, and ease of integration.
+1. **Python development mode:**
+
+   With a concise, easy-to-understand Python API, PaddleX gives developers the smoothest deep learning development experience while balancing feature completeness, development flexibility, and ease of integration.
**Prerequisites**
> - paddlepaddle >= 1.8.0
-> - python >= 3.5
+> - python >= 3.6
> - cython
> - pycocotools
@@ -59,12 +40,29 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple
For detailed installation instructions, see [PaddleX Installation](https://paddlex.readthedocs.io/zh_CN/develop/install.html)
-2. **PaddleX GUI mode:** A no-code visual client built on the PaddleX API, letting developers validate industrial projects quickly and serving as a reference for building their own deep learning software/applications.
+2. **PaddleX GUI mode:**
+
+   A no-code visual client built on the PaddleX API, letting developers validate industrial projects quickly and serving as a reference for building their own deep learning software/applications.
- Visit the [PaddleX homepage](https://www.paddlepaddle.org.cn/paddle/paddlex) to download the PaddleX GUI one-click portable installer.
- See the [PaddleX GUI tutorial](./docs/gui/how_to_use.md) for details on how to use the PaddleX GUI.
+
+
+## Product Modules
+
+- **Data preparation**: Compatible with common data protocols such as ImageNet, VOC, and COCO, and seamlessly integrated with Labelme, 精灵标注助手, and the [EasyData intelligent data service platform](https://ai.baidu.com/easydata/), helping developers finish data preparation faster.
+
+- **Data preprocessing and augmentation**: Provides Transforms, a minimal set of image preprocessing and augmentation methods, adapted to the imgaug augmentation library with support for **hundreds of augmentation strategies**, helping developers quickly mitigate the problem of training on small datasets.
+
+- **Model training**: Integrates the [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), and [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) vision toolkits and provides a large set of carefully selected, industry-proven, high-quality pretrained models, so developers reach industrial-grade model quality faster.
+
+- **Model tuning**: Built-in model interpretability module and the [VisualDL](https://github.com/PaddlePaddle/VisualDL) visualization tool, so developers can understand a model's feature-extraction regions and the evolution of training parameters more intuitively, and optimize models quickly.
+
+- **Secure multi-platform deployment**: Built-in [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) model compression tool and a **model encryption deployment module**, seamlessly connected with the native Paddle Inference library and the high-performance edge inference engine [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite), enabling fast, high-performance, secure multi-platform deployment.
+
+
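+The modules above compose into a single Python workflow. A minimal sketch of image classification (the paths and transform choices are illustrative, not part of the official docs):
+
+```python
+import paddlex as pdx
+from paddlex.cls import transforms
+
+# Data preprocessing and augmentation (Transforms)
+train_transforms = transforms.Compose([
+    transforms.RandomCrop(crop_size=224),
+    transforms.RandomHorizontalFlip(),
+    transforms.Normalize()
+])
+
+# Data preparation: an ImageNet-format classification dataset
+train_dataset = pdx.datasets.ImageNet(
+    data_dir='my_dataset',                  # hypothetical directory
+    file_list='my_dataset/train_list.txt',  # hypothetical file list
+    label_list='my_dataset/labels.txt',     # hypothetical label list
+    transforms=train_transforms)
+
+# Model training from an ImageNet-pretrained backbone
+model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
+model.train(num_epochs=10, train_dataset=train_dataset, save_dir='output')
+```
+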
## Full Documentation and API Reference
@@ -74,7 +72,7 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple
- [PaddleX Model Training Tutorials](https://paddlex.readthedocs.io/zh_CN/develop/train/index.html)
- [PaddleX API Reference](https://paddlex.readthedocs.io/zh_CN/develop/apis/index.html)
-## Online Project Examples
+### Online Project Examples
To help developers master the PaddleX API faster, we have created a series of complete example tutorials; you can run PaddleX projects online via the AIStudio one-stop development platform.
@@ -83,15 +81,36 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple
- [PaddleX Quick Start -- Faster-RCNN AI insect recognition](https://aistudio.baidu.com/aistudio/projectdetail/439888)
- [PaddleX Quick Start -- DeepLabv3+ optic disc segmentation](https://aistudio.baidu.com/aistudio/projectdetail/440197)
-## Communication and Feedback
-- Project homepage: https://www.paddlepaddle.org.cn/paddle/paddlex
-- PaddleX user QQ group: 1045148026 (scan the QR code below with mobile QQ to join)
-
+
+## End-to-End Industrial Application Cases
+
+(continuously updated)
+
+* Industrial facility inspection:
+  * [Industrial meter reading](https://paddlex.readthedocs.io/zh_CN/develop/examples/meter_reader.html)
+
+* Industrial quality inspection:
+  * Battery separator defect detection (coming soon)
+
+* [Portrait segmentation](https://paddlex.readthedocs.io/zh_CN/develop/examples/human_segmentation.html)
+
+
## [FAQ](./docs/gui/faq.md)
+
+
+## Communication and Feedback
+
+- Project homepage: https://www.paddlepaddle.org.cn/paddle/paddlex
+- PaddleX user QQ group: 957286141 (scan the QR code below with mobile QQ to join)
+  
+
+
+
## Changelog
+
> [Release history and changes](https://paddlex.readthedocs.io/zh_CN/develop/change_log.html)
- 2020.07.13 v1.1.0
@@ -99,6 +118,8 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple
- 2020.05.20 v1.0.0
- 2020.05.17 v0.1.8
+
+
## Contributing
-We warmly welcome you to contribute code to PaddleX or offer suggestions. If you can fix an issue or add a new feature, feel free to submit a Pull Request.
+We warmly welcome you to contribute code to PaddleX or offer suggestions. If you can fix an issue or add a new feature, feel free to submit a Pull Request。
diff --git a/deploy/cpp/CMakeLists.txt b/deploy/cpp/CMakeLists.txt
index 48bf6455e9bc5659d62e06dda946be3a8bbadb64..349afa2cae5bf40721cafdf38bbf28ddd621beeb 100644
--- a/deploy/cpp/CMakeLists.txt
+++ b/deploy/cpp/CMakeLists.txt
@@ -17,7 +17,6 @@ SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(ENCRYPTION_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
-
if (NOT WIN32)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
@@ -51,7 +50,9 @@ endmacro()
if (WITH_ENCRYPTION)
-add_definitions( -DWITH_ENCRYPTION=${WITH_ENCRYPTION})
+ if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
+ add_definitions( -DWITH_ENCRYPTION=${WITH_ENCRYPTION})
+ endif()
endif()
if (WITH_MKL)
@@ -62,8 +63,10 @@ if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_inference_dir")
endif()
-if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
+if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
+ if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
+ endif()
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
@@ -111,10 +114,17 @@ if (WIN32)
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
unset(OpenCV_DIR CACHE)
else ()
- find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
+ if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64") # x86_64 aarch64
+ set(OpenCV_INCLUDE_DIRS "/usr/include/opencv4")
+ file(GLOB OpenCV_LIBS /usr/lib/aarch64-linux-gnu/libopencv_*${CMAKE_SHARED_LIBRARY_SUFFIX})
+ message("OpenCV libs: ${OpenCV_LIBS}")
+ else()
+ find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
+ endif()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
endif ()
+
include_directories(${OpenCV_INCLUDE_DIRS})
if (WIN32)
@@ -260,9 +270,11 @@ endif()
if(WITH_ENCRYPTION)
if(NOT WIN32)
+ if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
include_directories("${ENCRYPTION_DIR}/include")
link_directories("${ENCRYPTION_DIR}/lib")
set(DEPS ${DEPS} ${ENCRYPTION_DIR}/lib/libpmodel-decrypt${CMAKE_SHARED_LIBRARY_SUFFIX})
+ endif()
else()
include_directories("${ENCRYPTION_DIR}/include")
link_directories("${ENCRYPTION_DIR}/lib")
@@ -276,6 +288,7 @@ if (NOT WIN32)
endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
+
add_library(paddlex_inference SHARED src/visualize.cpp src/transforms.cpp src/paddlex.cpp)
ADD_DEPENDENCIES(paddlex_inference ext-yaml-cpp)
target_link_libraries(paddlex_inference ${DEPS})
@@ -292,6 +305,19 @@ add_executable(segmenter demo/segmenter.cpp src/transforms.cpp src/paddlex.cpp s
ADD_DEPENDENCIES(segmenter ext-yaml-cpp)
target_link_libraries(segmenter ${DEPS})
+add_executable(video_classifier demo/video_classifier.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
+ADD_DEPENDENCIES(video_classifier ext-yaml-cpp)
+target_link_libraries(video_classifier ${DEPS})
+
+add_executable(video_detector demo/video_detector.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
+ADD_DEPENDENCIES(video_detector ext-yaml-cpp)
+target_link_libraries(video_detector ${DEPS})
+
+add_executable(video_segmenter demo/video_segmenter.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
+ADD_DEPENDENCIES(video_segmenter ext-yaml-cpp)
+target_link_libraries(video_segmenter ${DEPS})
+
+
if (WIN32 AND WITH_MKL)
add_custom_command(TARGET classifier POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
@@ -313,7 +339,27 @@ if (WIN32 AND WITH_MKL)
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
- COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
+ )
+ add_custom_command(TARGET video_classifier POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
+ )
+ add_custom_command(TARGET video_detector POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
+ )
+ add_custom_command(TARGET video_segmenter POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
)
# for encryption
if (EXISTS "${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll")
@@ -329,6 +375,18 @@ if (WIN32 AND WITH_MKL)
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
)
+ add_custom_command(TARGET video_classifier POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
+ )
+ add_custom_command(TARGET video_detector POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
+ )
+ add_custom_command(TARGET video_segmenter POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
+ )
endif()
endif()
diff --git a/deploy/cpp/demo/classifier.cpp b/deploy/cpp/demo/classifier.cpp
index db3687492789f47a3bb49643b87f9b946f05137d..cf3bb5ccf64c43ec42d59a9b73fdced6b50b8dc5 100644
--- a/deploy/cpp/demo/classifier.cpp
+++ b/deploy/cpp/demo/classifier.cpp
@@ -37,7 +37,6 @@ DEFINE_int32(batch_size, 1, "Batch size of inference");
DEFINE_int32(thread_num,
omp_get_num_procs(),
"Number of preprocessing threads");
-DEFINE_bool(use_ir_optim, true, "use ir optimization");
int main(int argc, char** argv) {
// Parsing command-line
@@ -52,16 +51,15 @@ int main(int argc, char** argv) {
return -1;
}
- // 加载模型
+ // Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
- FLAGS_key,
- FLAGS_use_ir_optim);
+ FLAGS_key);
- // 进行预测
+ // Predict
int imgs = 1;
if (FLAGS_image_list != "") {
std::ifstream inf(FLAGS_image_list);
@@ -69,7 +67,7 @@ int main(int argc, char** argv) {
std::cerr << "Fail to open file " << FLAGS_image_list << std::endl;
return -1;
}
- // 多batch预测
+ // Mini-batch predict
std::string image_path;
std::vector<std::string> image_paths;
while (getline(inf, image_path)) {
@@ -77,7 +75,7 @@ int main(int argc, char** argv) {
}
imgs = image_paths.size();
for (int i = 0; i < image_paths.size(); i += FLAGS_batch_size) {
- // 读图像
+ // Read image
int im_vec_size =
    std::min(static_cast<int>(image_paths.size()), i + FLAGS_batch_size);
std::vector<cv::Mat> im_vec(im_vec_size - i);
diff --git a/deploy/cpp/demo/detector.cpp b/deploy/cpp/demo/detector.cpp
index 32fbaafddc9cdbcfddf69164197143238bf26ca4..ef7fd782715bef5d9cc1dae43c87ceaa123e914f 100644
--- a/deploy/cpp/demo/detector.cpp
+++ b/deploy/cpp/demo/detector.cpp
@@ -43,10 +43,9 @@ DEFINE_double(threshold,
DEFINE_int32(thread_num,
omp_get_num_procs(),
"Number of preprocessing threads");
-DEFINE_bool(use_ir_optim, true, "use ir optimization");
int main(int argc, char** argv) {
- // 解析命令行参数
+ // Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
@@ -57,17 +56,16 @@ int main(int argc, char** argv) {
std::cerr << "--image or --image_list need to be defined" << std::endl;
return -1;
}
- // 加载模型
+ // Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
- FLAGS_key,
- FLAGS_use_ir_optim);
+ FLAGS_key);
int imgs = 1;
std::string save_dir = "output";
- // 进行预测
+ // Predict
if (FLAGS_image_list != "") {
std::ifstream inf(FLAGS_image_list);
if (!inf) {
@@ -92,7 +90,7 @@ int main(int argc, char** argv) {
im_vec[j - i] = std::move(cv::imread(image_paths[j], 1));
}
model.predict(im_vec, &results, thread_num);
- // 输出结果目标框
+ // Output predicted bounding boxes
for (int j = 0; j < im_vec_size - i; ++j) {
for (int k = 0; k < results[j].boxes.size(); ++k) {
std::cout << "image file: " << image_paths[i + j] << ", ";
@@ -106,7 +104,7 @@ int main(int argc, char** argv) {
<< results[j].boxes[k].coordinate[3] << ")" << std::endl;
}
}
- // 可视化
+ // Visualize results
for (int j = 0; j < im_vec_size - i; ++j) {
cv::Mat vis_img = PaddleX::Visualize(
im_vec[j], results[j], model.labels, FLAGS_threshold);
@@ -120,7 +118,7 @@ int main(int argc, char** argv) {
PaddleX::DetResult result;
cv::Mat im = cv::imread(FLAGS_image, 1);
model.predict(im, &result);
- // 输出结果目标框
+ // Output predicted bounding boxes
for (int i = 0; i < result.boxes.size(); ++i) {
std::cout << "image file: " << FLAGS_image << std::endl;
std::cout << ", predict label: " << result.boxes[i].category
@@ -132,7 +130,7 @@ int main(int argc, char** argv) {
<< result.boxes[i].coordinate[3] << ")" << std::endl;
}
- // 可视化
+ // Visualize results
cv::Mat vis_img =
PaddleX::Visualize(im, result, model.labels, FLAGS_threshold);
std::string save_path =
diff --git a/deploy/cpp/demo/segmenter.cpp b/deploy/cpp/demo/segmenter.cpp
index b3b8fad9ac2dce33722c71d9d50d354349298230..d13a328f5beecc90fe9257a4f32ee63a8fe609a5 100644
--- a/deploy/cpp/demo/segmenter.cpp
+++ b/deploy/cpp/demo/segmenter.cpp
@@ -39,10 +39,9 @@ DEFINE_int32(batch_size, 1, "Batch size of inference");
DEFINE_int32(thread_num,
omp_get_num_procs(),
"Number of preprocessing threads");
-DEFINE_bool(use_ir_optim, false, "use ir optimization");
int main(int argc, char** argv) {
- // 解析命令行参数
+ // Parsing command-line
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_model_dir == "") {
@@ -54,16 +53,15 @@ int main(int argc, char** argv) {
return -1;
}
- // 加载模型
+ // Load model
PaddleX::Model model;
model.Init(FLAGS_model_dir,
FLAGS_use_gpu,
FLAGS_use_trt,
FLAGS_gpu_id,
- FLAGS_key,
- FLAGS_use_ir_optim);
+ FLAGS_key);
int imgs = 1;
- // 进行预测
+ // Predict
if (FLAGS_image_list != "") {
std::ifstream inf(FLAGS_image_list);
if (!inf) {
@@ -88,7 +86,7 @@ int main(int argc, char** argv) {
im_vec[j - i] = std::move(cv::imread(image_paths[j], 1));
}
model.predict(im_vec, &results, thread_num);
- // 可视化
+ // Visualize results
for (int j = 0; j < im_vec_size - i; ++j) {
cv::Mat vis_img =
PaddleX::Visualize(im_vec[j], results[j], model.labels);
@@ -102,7 +100,7 @@ int main(int argc, char** argv) {
PaddleX::SegResult result;
cv::Mat im = cv::imread(FLAGS_image, 1);
model.predict(im, &result);
- // 可视化
+ // Visualize results
cv::Mat vis_img = PaddleX::Visualize(im, result, model.labels);
std::string save_path =
PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_image);
diff --git a/deploy/cpp/demo/video_classifier.cpp b/deploy/cpp/demo/video_classifier.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..96be867d40800455184b7938dc829e8a0b8f8390
--- /dev/null
+++ b/deploy/cpp/demo/video_classifier.cpp
@@ -0,0 +1,186 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <glog/logging.h>
+#include <omp.h>
+
+#include <algorithm>
+#include <chrono>  // NOLINT
+#include <fstream>
+#include <iostream>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "include/paddlex/paddlex.h"
+#include "include/paddlex/visualize.h"
+
+#if defined(__arm__) || defined(__aarch64__)
+#include <opencv2/videoio/legacy/constants_c.h>
+#endif
+
+using namespace std::chrono; // NOLINT
+
+DEFINE_string(model_dir, "", "Path of inference model");
+DEFINE_bool(use_gpu, false, "Whether to use GPU for inference");
+DEFINE_bool(use_trt, false, "Whether to use TensorRT for inference");
+DEFINE_int32(gpu_id, 0, "GPU card id");
+DEFINE_string(key, "", "key of encryption");
+DEFINE_bool(use_camera, false, "Whether to infer from camera input");
+DEFINE_int32(camera_id, 0, "Camera id");
+DEFINE_string(video_path, "", "Path of input video");
+DEFINE_bool(show_result, false, "show the result of each frame with a window");
+DEFINE_bool(save_result, true, "save the result of each frame to a video");
+DEFINE_string(save_dir, "output", "Path to save visualized image");
+
+int main(int argc, char** argv) {
+ // Parsing command-line
+ google::ParseCommandLineFlags(&argc, &argv, true);
+
+ if (FLAGS_model_dir == "") {
+ std::cerr << "--model_dir need to be defined" << std::endl;
+ return -1;
+ }
+  if (FLAGS_video_path == "" && FLAGS_use_camera == false) {
+ std::cerr << "--video_path or --use_camera need to be defined" << std::endl;
+ return -1;
+ }
+
+ // Load model
+ PaddleX::Model model;
+ model.Init(FLAGS_model_dir,
+ FLAGS_use_gpu,
+ FLAGS_use_trt,
+ FLAGS_gpu_id,
+ FLAGS_key);
+
+ // Open video
+ cv::VideoCapture capture;
+ if (FLAGS_use_camera) {
+ capture.open(FLAGS_camera_id);
+ if (!capture.isOpened()) {
+ std::cout << "Can not open the camera "
+ << FLAGS_camera_id << "."
+ << std::endl;
+ return -1;
+ }
+ } else {
+ capture.open(FLAGS_video_path);
+ if (!capture.isOpened()) {
+ std::cout << "Can not open the video "
+ << FLAGS_video_path << "."
+ << std::endl;
+ return -1;
+ }
+ }
+
+ // Create a VideoWriter
+ cv::VideoWriter video_out;
+ std::string video_out_path;
+ if (FLAGS_save_result) {
+ // Get video information: resolution, fps
+    int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
+    int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
+    int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
+ int video_fourcc;
+ if (FLAGS_use_camera) {
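+      // 828601953 is the FOURCC code 'avc1' (H.264)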
+ video_fourcc = 828601953;
+ } else {
+      video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
+ }
+
+ if (FLAGS_use_camera) {
+ time_t now = time(0);
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir,
+ std::to_string(now) + ".mp4");
+ } else {
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
+ }
+ video_out.open(video_out_path.c_str(),
+ video_fourcc,
+ video_fps,
+ cv::Size(video_width, video_height),
+ true);
+ if (!video_out.isOpened()) {
+ std::cout << "Create video writer failed!" << std::endl;
+ return -1;
+ }
+ }
+
+ PaddleX::ClsResult result;
+ cv::Mat frame;
+ int key;
+ while (capture.read(frame)) {
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ key = cv::waitKey(1);
+      // Press ESC to exit; the result video is saved before exiting
+ if (key == 27) {
+ break;
+ }
+ } else if (frame.empty()) {
+ break;
+ }
+ // Begin to predict
+ model.predict(frame, &result);
+ // Visualize results
+ cv::Mat vis_img = frame.clone();
+ auto colormap = PaddleX::GenerateColorMap(model.labels.size());
+ int c1 = colormap[3 * result.category_id + 0];
+ int c2 = colormap[3 * result.category_id + 1];
+ int c3 = colormap[3 * result.category_id + 2];
+ cv::Scalar text_color = cv::Scalar(c1, c2, c3);
+ std::string text = result.category;
+  text += std::to_string(static_cast<int>(result.score * 100)) + "%";
+  int font_face = cv::FONT_HERSHEY_SIMPLEX;
+  double font_scale = 0.5;
+  int thickness = 1;
+ cv::Size text_size =
+ cv::getTextSize(text, font_face, font_scale, thickness, nullptr);
+ cv::Point origin;
+ origin.x = frame.cols / 2;
+ origin.y = frame.rows / 2;
+ cv::Rect text_back = cv::Rect(origin.x,
+ origin.y - text_size.height,
+ text_size.width,
+ text_size.height);
+ cv::rectangle(vis_img, text_back, text_color, -1);
+ cv::putText(vis_img,
+ text,
+ origin,
+ font_face,
+ font_scale,
+ cv::Scalar(255, 255, 255),
+ thickness);
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::imshow("video_classifier", vis_img);
+ }
+ if (FLAGS_save_result) {
+ video_out.write(vis_img);
+ }
+ std::cout << "Predict label: " << result.category
+ << ", label_id:" << result.category_id
+ << ", score: " << result.score << std::endl;
+ }
+ capture.release();
+ if (FLAGS_save_result) {
+ video_out.release();
+ std::cout << "Visualized output saved as " << video_out_path << std::endl;
+ }
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::destroyAllWindows();
+ }
+ return 0;
+}
diff --git a/deploy/cpp/demo/video_detector.cpp b/deploy/cpp/demo/video_detector.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..ee4d5bdb138d03020042e60d41ded0ca1efde46d
--- /dev/null
+++ b/deploy/cpp/demo/video_detector.cpp
@@ -0,0 +1,159 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <glog/logging.h>
+#include <omp.h>
+
+#include <algorithm>
+#include <chrono>  // NOLINT
+#include <fstream>
+#include <iostream>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "include/paddlex/paddlex.h"
+#include "include/paddlex/visualize.h"
+
+#if defined(__arm__) || defined(__aarch64__)
+#include <opencv2/videoio/legacy/constants_c.h>
+#endif
+
+using namespace std::chrono; // NOLINT
+
+DEFINE_string(model_dir, "", "Path of inference model");
+DEFINE_bool(use_gpu, false, "Whether to use GPU for inference");
+DEFINE_bool(use_trt, false, "Whether to use TensorRT for inference");
+DEFINE_int32(gpu_id, 0, "GPU card id");
+DEFINE_bool(use_camera, false, "Whether to infer from camera input");
+DEFINE_int32(camera_id, 0, "Camera id");
+DEFINE_string(video_path, "", "Path of input video");
+DEFINE_bool(show_result, false, "show the result of each frame with a window");
+DEFINE_bool(save_result, true, "save the result of each frame to a video");
+DEFINE_string(key, "", "key of encryption");
+DEFINE_string(save_dir, "output", "Path to save visualized image");
+DEFINE_double(threshold,
+ 0.5,
+ "The minimum scores of target boxes which are shown");
+
+int main(int argc, char** argv) {
+ // Parsing command-line
+ google::ParseCommandLineFlags(&argc, &argv, true);
+
+ if (FLAGS_model_dir == "") {
+ std::cerr << "--model_dir need to be defined" << std::endl;
+ return -1;
+ }
+  if (FLAGS_video_path == "" && FLAGS_use_camera == false) {
+ std::cerr << "--video_path or --use_camera need to be defined" << std::endl;
+ return -1;
+ }
+ // Load model
+ PaddleX::Model model;
+ model.Init(FLAGS_model_dir,
+ FLAGS_use_gpu,
+ FLAGS_use_trt,
+ FLAGS_gpu_id,
+ FLAGS_key);
+ // Open video
+ cv::VideoCapture capture;
+ if (FLAGS_use_camera) {
+ capture.open(FLAGS_camera_id);
+ if (!capture.isOpened()) {
+ std::cout << "Can not open the camera "
+ << FLAGS_camera_id << "."
+ << std::endl;
+ return -1;
+ }
+ } else {
+ capture.open(FLAGS_video_path);
+ if (!capture.isOpened()) {
+ std::cout << "Can not open the video "
+ << FLAGS_video_path << "."
+ << std::endl;
+ return -1;
+ }
+ }
+
+ // Create a VideoWriter
+ cv::VideoWriter video_out;
+ std::string video_out_path;
+ if (FLAGS_save_result) {
+ // Get video information: resolution, fps
+    int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
+    int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
+    int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
+ int video_fourcc;
+ if (FLAGS_use_camera) {
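+      // 828601953 is the FOURCC code 'avc1' (H.264)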
+ video_fourcc = 828601953;
+ } else {
+      video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
+ }
+
+ if (FLAGS_use_camera) {
+ time_t now = time(0);
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir,
+ std::to_string(now) + ".mp4");
+ } else {
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
+ }
+ video_out.open(video_out_path.c_str(),
+ video_fourcc,
+ video_fps,
+ cv::Size(video_width, video_height),
+ true);
+ if (!video_out.isOpened()) {
+ std::cout << "Create video writer failed!" << std::endl;
+ return -1;
+ }
+ }
+
+ PaddleX::DetResult result;
+ cv::Mat frame;
+ int key;
+ while (capture.read(frame)) {
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ key = cv::waitKey(1);
+      // Press ESC to exit; the result video is saved before exiting
+ if (key == 27) {
+ break;
+ }
+ } else if (frame.empty()) {
+ break;
+ }
+ // Begin to predict
+ model.predict(frame, &result);
+ // Visualize results
+ cv::Mat vis_img =
+ PaddleX::Visualize(frame, result, model.labels, FLAGS_threshold);
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::imshow("video_detector", vis_img);
+ }
+ if (FLAGS_save_result) {
+ video_out.write(vis_img);
+ }
+ result.clear();
+ }
+ capture.release();
+ if (FLAGS_save_result) {
+ std::cout << "Visualized output saved as " << video_out_path << std::endl;
+ video_out.release();
+ }
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::destroyAllWindows();
+ }
+ return 0;
+}
diff --git a/deploy/cpp/demo/video_segmenter.cpp b/deploy/cpp/demo/video_segmenter.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..6a835117cd1434b5f26e0fb660e6fe07ef56e607
--- /dev/null
+++ b/deploy/cpp/demo/video_segmenter.cpp
@@ -0,0 +1,157 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <glog/logging.h>
+#include <omp.h>
+
+#include <algorithm>
+#include <chrono>  // NOLINT
+#include <fstream>
+#include <iostream>
+#include <map>
+#include <string>
+#include <utility>
+#include <vector>
+#include "include/paddlex/paddlex.h"
+#include "include/paddlex/visualize.h"
+
+#if defined(__arm__) || defined(__aarch64__)
+#include <opencv2/videoio/legacy/constants_c.h>
+#endif
+
+using namespace std::chrono; // NOLINT
+
+DEFINE_string(model_dir, "", "Path of inference model");
+DEFINE_bool(use_gpu, false, "Whether to use GPU for inference");
+DEFINE_bool(use_trt, false, "Whether to use TensorRT for inference");
+DEFINE_int32(gpu_id, 0, "GPU card id");
+DEFINE_string(key, "", "key of encryption");
+DEFINE_bool(use_camera, false, "Whether to infer from camera input");
+DEFINE_int32(camera_id, 0, "Camera id");
+DEFINE_string(video_path, "", "Path of input video");
+DEFINE_bool(show_result, false, "show the result of each frame with a window");
+DEFINE_bool(save_result, true, "save the result of each frame to a video");
+DEFINE_string(save_dir, "output", "Path to save visualized image");
+
+int main(int argc, char** argv) {
+ // Parsing command-line
+ google::ParseCommandLineFlags(&argc, &argv, true);
+
+ if (FLAGS_model_dir == "") {
+ std::cerr << "--model_dir need to be defined" << std::endl;
+ return -1;
+ }
+  if (FLAGS_video_path == "" && FLAGS_use_camera == false) {
+ std::cerr << "--video_path or --use_camera need to be defined" << std::endl;
+ return -1;
+ }
+
+ // Load model
+ PaddleX::Model model;
+ model.Init(FLAGS_model_dir,
+ FLAGS_use_gpu,
+ FLAGS_use_trt,
+ FLAGS_gpu_id,
+ FLAGS_key);
+ // Open video
+ cv::VideoCapture capture;
+ if (FLAGS_use_camera) {
+ capture.open(FLAGS_camera_id);
+ if (!capture.isOpened()) {
+ std::cout << "Can not open the camera "
+ << FLAGS_camera_id << "."
+ << std::endl;
+ return -1;
+ }
+ } else {
+ capture.open(FLAGS_video_path);
+ if (!capture.isOpened()) {
+ std::cout << "Can not open the video "
+ << FLAGS_video_path << "."
+ << std::endl;
+ return -1;
+ }
+ }
+
+
+ // Create a VideoWriter
+ cv::VideoWriter video_out;
+ std::string video_out_path;
+ if (FLAGS_save_result) {
+ // Get video information: resolution, fps
+    int video_width = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
+    int video_height = static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
+    int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
+ int video_fourcc;
+ if (FLAGS_use_camera) {
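+      // 828601953 is the FOURCC code 'avc1' (H.264)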
+ video_fourcc = 828601953;
+ } else {
+      video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
+ }
+
+ if (FLAGS_use_camera) {
+ time_t now = time(0);
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir,
+ std::to_string(now) + ".mp4");
+ } else {
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
+ }
+ video_out.open(video_out_path.c_str(),
+ video_fourcc,
+ video_fps,
+ cv::Size(video_width, video_height),
+ true);
+ if (!video_out.isOpened()) {
+ std::cout << "Create video writer failed!" << std::endl;
+ return -1;
+ }
+ }
+
+ PaddleX::SegResult result;
+ cv::Mat frame;
+ int key;
+ while (capture.read(frame)) {
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ key = cv::waitKey(1);
+      // Press ESC to exit; the result video is saved before exiting
+ if (key == 27) {
+ break;
+ }
+ } else if (frame.empty()) {
+ break;
+ }
+ // Begin to predict
+ model.predict(frame, &result);
+ // Visualize results
+ cv::Mat vis_img = PaddleX::Visualize(frame, result, model.labels);
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::imshow("video_segmenter", vis_img);
+ }
+ if (FLAGS_save_result) {
+ video_out.write(vis_img);
+ }
+ result.clear();
+ }
+ capture.release();
+ if (FLAGS_save_result) {
+ video_out.release();
+ std::cout << "Visualized output saved as " << video_out_path << std::endl;
+ }
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::destroyAllWindows();
+ }
+ return 0;
+}
diff --git a/deploy/cpp/include/paddlex/visualize.h b/deploy/cpp/include/paddlex/visualize.h
index 9b80ca367bc8e45334c951cb6dd32069c67c9dbd..873cea10ad5f725a4a4c477559de0b659f94a7b5 100644
--- a/deploy/cpp/include/paddlex/visualize.h
+++ b/deploy/cpp/include/paddlex/visualize.h
@@ -23,9 +23,9 @@
#else // Linux/Unix
#include <dirent.h>
// #include <sys/io.h>
-#ifdef __arm__ // for arm
-#include
-#include
+#if defined(__arm__) || defined(__aarch64__)  // for arm
+#include <aarch64-linux-gnu/sys/stat.h>
+#include <aarch64-linux-gnu/sys/types.h>
#else
#include <sys/stat.h>
#include <sys/types.h>
diff --git a/deploy/cpp/scripts/bootstrap.sh b/deploy/cpp/scripts/bootstrap.sh
index 283d75928a68a507d852ec61eb89e115e581146f..bb9756204e9e610365f67aa37dc78d1b5eaf80b8 100644
--- a/deploy/cpp/scripts/bootstrap.sh
+++ b/deploy/cpp/scripts/bootstrap.sh
@@ -7,12 +7,12 @@ if [ ! -d "./paddlex-encryption" ]; then
fi
# download pre-compiled opencv lib
-OPENCV_URL=https://paddleseg.bj.bcebos.com/deploy/docker/opencv3gcc4.8.tar.bz2
-if [ ! -d "./deps/opencv3gcc4.8" ]; then
+OPENCV_URL=https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2
+if [ ! -d "./deps/opencv3.4.6gcc4.8ffmpeg/" ]; then
mkdir -p deps
cd deps
wget -c ${OPENCV_URL}
- tar xvfj opencv3gcc4.8.tar.bz2
- rm -rf opencv3gcc4.8.tar.bz2
+ tar xvfj opencv3.4.6gcc4.8ffmpeg.tar.gz2
+ rm -rf opencv3.4.6gcc4.8ffmpeg.tar.gz2
cd ..
fi
diff --git a/deploy/cpp/scripts/build.sh b/deploy/cpp/scripts/build.sh
index e87d7bf4797f1833d88379df0587733958639b06..6d6ad25b24170a27639f9b1d651888c4027dbeed 100644
--- a/deploy/cpp/scripts/build.sh
+++ b/deploy/cpp/scripts/build.sh
@@ -24,7 +24,7 @@ ENCRYPTION_DIR=$(pwd)/paddlex-encryption
# OpenCV path; no need to change if using the bundled pre-compiled version
sh $(pwd)/scripts/bootstrap.sh # download the pre-compiled OpenCV
-OPENCV_DIR=$(pwd)/deps/opencv3gcc4.8/
+OPENCV_DIR=$(pwd)/deps/opencv3.4.6gcc4.8ffmpeg/
# No changes needed below
rm -rf build
@@ -42,4 +42,4 @@ cmake .. \
-DCUDNN_LIB=${CUDNN_LIB} \
-DENCRYPTION_DIR=${ENCRYPTION_DIR} \
-DOPENCV_DIR=${OPENCV_DIR}
-make
+make -j16
diff --git a/deploy/cpp/scripts/jetson_bootstrap.sh b/deploy/cpp/scripts/jetson_bootstrap.sh
deleted file mode 100644
index ebd95d0f20439674bbae2628ab7f8d89b7b4beca..0000000000000000000000000000000000000000
--- a/deploy/cpp/scripts/jetson_bootstrap.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-# download pre-compiled opencv lib
-OPENCV_URL=https://bj.bcebos.com/paddlex/deploy/tools/opencv3_aarch.tgz
-if [ ! -d "./deps/opencv3" ]; then
- mkdir -p deps
- cd deps
- wget -c ${OPENCV_URL}
- tar xvfz opencv3_aarch.tgz
- rm -rf opencv3_aarch.tgz
- cd ..
-fi
diff --git a/deploy/cpp/scripts/jetson_build.sh b/deploy/cpp/scripts/jetson_build.sh
index 95bec3cac95be5cf686d63ec5b0f49f62e706586..bb2957e351900872189773eeaa41a75d36ec3471 100644
--- a/deploy/cpp/scripts/jetson_build.sh
+++ b/deploy/cpp/scripts/jetson_build.sh
@@ -14,14 +14,7 @@ WITH_STATIC_LIB=OFF
# Path to CUDA libs
CUDA_LIB=/usr/local/cuda/lib64
# Path to cuDNN libs
-CUDNN_LIB=/usr/local/cuda/lib64
-
-# Whether to load an encrypted model
-WITH_ENCRYPTION=OFF
-
-# OpenCV path; no need to change if using the bundled pre-compiled version
-sh $(pwd)/scripts/jetson_bootstrap.sh # download the pre-compiled OpenCV
-OPENCV_DIR=$(pwd)/deps/opencv3
+CUDNN_LIB=/usr/lib/aarch64-linux-gnu
# No changes needed below
rm -rf build
@@ -31,12 +24,9 @@ cmake .. \
-DWITH_GPU=${WITH_GPU} \
-DWITH_MKL=${WITH_MKL} \
-DWITH_TENSORRT=${WITH_TENSORRT} \
- -DWITH_ENCRYPTION=${WITH_ENCRYPTION} \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DPADDLE_DIR=${PADDLE_DIR} \
-DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
- -DCUDNN_LIB=${CUDNN_LIB} \
- -DENCRYPTION_DIR=${ENCRYPTION_DIR} \
- -DOPENCV_DIR=${OPENCV_DIR}
+ -DCUDNN_LIB=${CUDNN_LIB}
make
diff --git a/deploy/cpp/src/paddlex.cpp b/deploy/cpp/src/paddlex.cpp
index 1bd30863e894910581384296edd2f656b79ffe21..47dc5b9e9e9104e2d4983a8ac077e5a0810610cf 100644
--- a/deploy/cpp/src/paddlex.cpp
+++ b/deploy/cpp/src/paddlex.cpp
@@ -65,7 +65,11 @@ void Model::create_predictor(const std::string& model_dir,
config.SwitchUseFeedFetchOps(false);
config.SwitchSpecifyInputNames(true);
// Enable IR graph optimization
+#if defined(__arm__) || defined(__aarch64__)
+ config.SwitchIrOptim(false);
+#else
config.SwitchIrOptim(use_ir_optim);
+#endif
// Enable memory optimization
config.EnableMemoryOptim();
if (use_trt) {
diff --git a/deploy/cpp/src/transforms.cpp b/deploy/cpp/src/transforms.cpp
index 626b2053d2473bcf66fcb1a760d9ce2e101324f4..f623fc664e9d66002e0eb0065d034d90965eddf7 100644
--- a/deploy/cpp/src/transforms.cpp
+++ b/deploy/cpp/src/transforms.cpp
@@ -15,6 +15,7 @@
#include <iostream>
#include <string>
#include <vector>
+#include <math.h>
#include "include/paddlex/transforms.h"
@@ -60,8 +61,8 @@ bool ResizeByShort::Run(cv::Mat* im, ImageBlob* data) {
data->reshape_order_.push_back("resize");
float scale = GenerateScale(*im);
-  int width = static_cast<int>(scale * im->cols);
-  int height = static_cast<int>(scale * im->rows);
+  int width = static_cast<int>(round(scale * im->cols));
+  int height = static_cast<int>(round(scale * im->rows));
cv::resize(*im, *im, cv::Size(width, height), 0, 0, cv::INTER_LINEAR);
data->new_im_size_[0] = im->rows;
diff --git a/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java b/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java
index 940ebaa234db2e34faa2daaf74dfacc0e9d131fe..d88ec4bfa7017fede63ffccc154bcf4a34a8a878 100644
--- a/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java
+++ b/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java
@@ -23,6 +23,7 @@ import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
+import java.util.Date;
import java.util.HashMap;
import java.util.List;
@@ -101,6 +102,15 @@ public class Transforms {
if (info.containsKey("coarsest_stride")) {
padding.coarsest_stride = (int) info.get("coarsest_stride");
}
+ if (info.containsKey("im_padding_value")) {
+            List<Double> im_padding_value = (List<Double>) info.get("im_padding_value");
+ if (im_padding_value.size()!=3){
+ Log.e(TAG, "len of im_padding_value in padding must == 3.");
+ }
+            for (int k = 0; k < im_padding_value.size(); k++){
+                padding.im_padding_value[k] = im_padding_value.get(k).floatValue();
+            }
+        }
        ListIterator<Map.Entry<String, int[]>> reverseReshapeInfo = new ArrayList<Map.Entry<String, int[]>>(imageBlob.getReshapeInfo().entrySet()).listIterator(imageBlob.getReshapeInfo().size());
        while (reverseReshapeInfo.hasPrevious()) {
            Map.Entry<String, int[]> entry = reverseReshapeInfo.previous();
@@ -135,10 +138,7 @@ public class Visualize {
Size sz = new Size(entry.getValue()[0], entry.getValue()[1]);
Imgproc.resize(mask, mask, sz,0,0,Imgproc.INTER_LINEAR);
}
- Log.i(TAG, "postprocess operator: " + entry.getKey());
- Log.i(TAG, "shape:: " + String.valueOf(mask.width()) + ","+ String.valueOf(mask.height()));
}
-
Mat dst = new Mat();
List<Mat> listMat = Arrays.asList(visualizeMat, mask);
Core.merge(listMat, dst);
diff --git a/deploy/lite/export_lite.py b/deploy/lite/export_lite.py
index 85276c8b59b1994712fb66d061bbdfa10359e251..c75c49a0829dbb375aada2dfeac0991142022a08 100644
--- a/deploy/lite/export_lite.py
+++ b/deploy/lite/export_lite.py
@@ -1,4 +1,4 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
diff --git a/docs/README.md b/docs/README.md
index df45f14400cac1d6816e9a55ad92dc59ba141650..4c1c6a00b7866487b4d9c53c0a9139f4e38de730 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,6 +1,6 @@
# PaddleX Documentation
-All PaddleX documentation lives in this directory tree. The docs are organized in Read the Docs style; you can browse the [online documentation](https://paddlex.readthedocs.io/zh_CN/latest/index.html) directly.
+All PaddleX documentation lives in this directory tree. The docs are organized in Read the Docs style; you can browse the [online documentation](https://paddlex.readthedocs.io/zh_CN/develop/index.html) directly.
## Building the Docs
Build the documentation in this directory with the following steps
diff --git a/docs/apis/datasets.md b/docs/apis/datasets.md
index 94b8cbc7650ba89daea569cef817d41f4a0f8a73..1107d03c8fd946820118c20b33e5f736fac654bb 100644
--- a/docs/apis/datasets.md
+++ b/docs/apis/datasets.md
@@ -5,7 +5,7 @@
```
paddlex.datasets.ImageNet(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=8, parallel_method='process', shuffle=False)
```
-Reads an image classification dataset in ImageNet format and preprocesses the samples accordingly. For an introduction to the ImageNet dataset format, see [Dataset Format Description](../data/format/index.html)
+Reads an image classification dataset in ImageNet format and preprocesses the samples accordingly. For an introduction to the ImageNet dataset format, see [Dataset Format Description](../data/format/classification.md)
Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/image_classification/mobilenetv2.py)
@@ -26,7 +26,7 @@ paddlex.datasets.ImageNet(data_dir, file_list, label_list, transforms=None, num_
paddlex.datasets.VOCDetection(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='process', shuffle=False)
```
-> Reads an object detection dataset in PascalVOC format and preprocesses the samples accordingly. For an introduction to the PascalVOC dataset format, see [Dataset Format Description](../data/format/index.html)
+> Reads an object detection dataset in PascalVOC format and preprocesses the samples accordingly. For an introduction to the PascalVOC dataset format, see [Dataset Format Description](../data/format/detection.md)
> Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_darknet53.py)
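+
+A minimal construction sketch (the paths below are illustrative):
+
+```python
+import paddlex as pdx
+from paddlex.det import transforms
+
+train_transforms = transforms.Compose([transforms.Normalize()])
+train_dataset = pdx.datasets.VOCDetection(
+    data_dir='insect_det',                  # hypothetical directory
+    file_list='insect_det/train_list.txt',  # hypothetical file list
+    label_list='insect_det/labels.txt',     # hypothetical label list
+    transforms=train_transforms)
+```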
@@ -47,7 +47,7 @@ paddlex.datasets.VOCDetection(data_dir, file_list, label_list, transforms=None,
paddlex.datasets.CocoDetection(data_dir, ann_file, transforms=None, num_workers='auto', buffer_size=100, parallel_method='process', shuffle=False)
```
-> Reads a detection dataset in MSCOCO format and preprocesses the samples accordingly; datasets in this format can likewise be used to train instance segmentation models. For an introduction to the MSCOCO dataset format, see [Dataset Format Description](../data/format/index.html)
+> Reads a detection dataset in MSCOCO format and preprocesses the samples accordingly; datasets in this format can likewise be used to train instance segmentation models. For an introduction to the MSCOCO dataset format, see [Dataset Format Description](../data/format/instance_segmentation.md)
> Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/instance_segmentation/mask_rcnn_r50_fpn.py)
@@ -67,7 +67,7 @@ paddlex.datasets.CocoDetection(data_dir, ann_file, transforms=None, num_workers=
paddlex.datasets.SegDataset(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='process', shuffle=False)
```
-> Reads a semantic segmentation dataset and preprocesses the samples accordingly. For an introduction to the semantic segmentation dataset format, see [Dataset Format Description](../data/format/index.html)
+> Reads a semantic segmentation dataset and preprocesses the samples accordingly. For an introduction to the semantic segmentation dataset format, see [Dataset Format Description](../data/format/segmentation.md)
> Example: [code file](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/unet.py)
diff --git a/docs/apis/deploy.md b/docs/apis/deploy.md
index dd6812452b09b54fd2cf8def2f0085d3dff603d4..3f924ebee2893cfca77cad459f4cb9c7a6b2acb1 100644
--- a/docs/apis/deploy.md
+++ b/docs/apis/deploy.md
@@ -7,7 +7,7 @@
A unified predictor for image classification, object detection, instance segmentation, and semantic segmentation, delivering high-performance inference.
```
-paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, use_trt=False, use_glog=False, memory_optimize=True)
+paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, mkl_thread_num=4, use_trt=False, use_glog=False, memory_optimize=True)
```
**Parameters**
@@ -16,6 +16,7 @@ paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, use_
> * **use_gpu** (bool): Whether to use the GPU for inference.
> * **gpu_id** (int): ID of the GPU to use.
> * **use_mkl** (bool): Whether to use the MKL-DNN acceleration library.
+> * **mkl_thread_num** (int): Number of threads when using the MKL-DNN acceleration library. Defaults to 4.
> * **use_trt** (bool): Whether to use the TensorRT inference engine.
> * **use_glog** (bool): Whether to print intermediate logs.
> * **memory_optimize** (bool): Whether to optimize memory usage.
@@ -40,7 +41,7 @@ predict(image, topk=1)
> **Parameters**
>
> > * **image** (str|np.ndarray): Path of the image to predict, or a numpy array (HWC layout, BGR format).
-> > * **topk** (int): Parameter used in image classification: return the top-k most likely classes
+> > * **topk** (int): Parameter used in image classification: return the top-k most likely classes.
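+
+A minimal usage sketch (the exported model directory is illustrative):
+
+```python
+import paddlex as pdx
+
+# './inference_model' is a hypothetical directory of an exported inference model
+predictor = pdx.deploy.Predictor('./inference_model', use_gpu=True)
+result = predictor.predict(image='test.jpg', topk=3)
+```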
### batch_predict
```
diff --git a/docs/apis/models/detection.md b/docs/apis/models/detection.md
index 8a8e6a752a3595bff702f6b88b8f4a7ead161806..b3873ce5eba6516c4296d13d2d99510d5e3e6e45 100755
--- a/docs/apis/models/detection.md
+++ b/docs/apis/models/detection.md
@@ -1,5 +1,129 @@
# Object Detection
+## paddlex.det.PPYOLO
+
+```python
+paddlex.det.PPYOLO(num_classes=80, backbone='ResNet50_vd_ssld', with_dcn_v2=True, anchors=None, anchor_masks=None, use_coord_conv=True, use_iou_aware=True, use_spp=True, use_drop_block=True, scale_x_y=1.05, ignore_threshold=0.7, label_smooth=False, use_iou_loss=True, use_matrix_nms=True, nms_score_threshold=0.01, nms_topk=1000, nms_keep_topk=100, nms_iou_threshold=0.45, train_random_shapes=[320, 352, 384, 416, 448, 480, 512, 544, 576, 608])
+```
+
+> Builds a PPYOLO detector. **Note: for PPYOLO, num_classes does not need to include the background class; e.g. if the targets are human and dog, set num_classes to 2. This differs from FasterRCNN/MaskRCNN.**
+
+> **Parameters**
+>
+> > - **num_classes** (int): Number of classes. Defaults to 80.
+> > - **backbone** (str): Backbone network of PPYOLO; the valid range is ['ResNet50_vd_ssld']. Defaults to 'ResNet50_vd_ssld'.
+> > - **with_dcn_v2** (bool): Whether the backbone uses the DCNv2 structure. Defaults to True.
+> > - **anchors** (list|tuple): Widths and heights of the anchor boxes; None means the default
+> > [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+> [59, 119], [116, 90], [156, 198], [373, 326]].
+> > - **anchor_masks** (list|tuple): Anchor mask indices used when computing the PPYOLO loss; None means the default
+> > [[6, 7, 8], [3, 4, 5], [0, 1, 2]].
+> > - **use_coord_conv** (bool): Whether to use CoordConv. Defaults to True.
+> > - **use_iou_aware** (bool): Whether to use the IoU-Aware branch. Defaults to True.
+> > - **use_spp** (bool): Whether to use the Spatial Pyramid Pooling structure. Defaults to True.
+> > - **use_drop_block** (bool): Whether to use DropBlock. Defaults to True.
+> > - **scale_x_y** (float): Scaling factor applied when adjusting the box center coordinates. Defaults to 1.05.
+> > - **use_iou_loss** (bool): Whether to use IoU loss. Defaults to True.
+> > - **use_matrix_nms** (bool): Whether to use Matrix NMS. Defaults to True.
+> > - **ignore_threshold** (float): When computing the PPYOLO loss, the confidence of predicted boxes whose IoU exceeds `ignore_threshold` is ignored. Defaults to 0.7.
+> > - **nms_score_threshold** (float): Confidence score threshold for detection boxes; boxes scoring below it are discarded. Defaults to 0.01.
+> > - **nms_topk** (int): Maximum number of detection boxes kept by confidence before NMS. Defaults to 1000.
+> > - **nms_keep_topk** (int): Total number of detection boxes kept per image after NMS. Defaults to 100.
+> > - **nms_iou_threshold** (float): IoU threshold used to suppress detection boxes during NMS. Defaults to 0.45.
+> > - **label_smooth** (bool): Whether to use label smoothing. Defaults to False.
+> > - **train_random_shapes** (list|tuple): Image sizes randomly sampled from this list during training. Defaults to [320, 352, 384, 416, 448, 480, 512, 544, 576, 608].
+
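+A minimal construction sketch (the two-class setting below is illustrative):
+
+```python
+import paddlex as pdx
+
+# 2 foreground classes (e.g. human and dog); background is not counted
+model = pdx.det.PPYOLO(num_classes=2, backbone='ResNet50_vd_ssld')
+```
+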
+### train
+
+```python
+train(self, num_epochs, train_dataset, train_batch_size=8, eval_dataset=None, save_interval_epochs=20, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=1.0/8000, warmup_steps=1000, warmup_start_lr=0.0, lr_decay_epochs=[213, 240], lr_decay_gamma=0.1, metric=None, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05, early_stop=False, early_stop_patience=5, resume_checkpoint=None, use_ema=True, ema_decay=0.9998)
+```
+
+> Training interface of the PPYOLO model. The function has a built-in `piecewise` learning-rate decay strategy and a `momentum` optimizer.
+
+> **Parameters**
+>
+> > - **num_epochs** (int): Number of training epochs.
+> > - **train_dataset** (paddlex.datasets): Training dataset reader.
+> > - **train_batch_size** (int): Batch size of the training data. Detection currently supports single-card evaluation only; the training batch size divided by the number of cards gives the validation batch size. Defaults to 8.
+> > - **eval_dataset** (paddlex.datasets): Validation dataset reader.
+> > - **save_interval_epochs** (int): Model save interval, in epochs. Defaults to 20.
+> > - **log_interval_steps** (int): Training-log output interval, in steps. Defaults to 2.
+> > - **save_dir** (str): Directory where models are saved. Defaults to 'output'.
+> > - **pretrain_weights** (str): If a path, load the pretrained model under that path; if 'IMAGENET', automatically download weights pretrained on ImageNet images; if 'COCO', automatically download weights pretrained on the COCO dataset; if None, use no pretrained model. Defaults to None.
+> > - **optimizer** (paddle.fluid.optimizer): Optimizer. If None, the default is used: the fluid.layers.piecewise_decay decay strategy with fluid.optimizer.Momentum.
+> > - **learning_rate** (float): Learning rate of the default optimizer. Defaults to 1.0/8000.
+> > - **warmup_steps** (int): Number of warmup steps of the default optimizer. Defaults to 1000.
+> > - **warmup_start_lr** (int): Starting warmup learning rate of the default optimizer. Defaults to 0.0.
+> > - **lr_decay_epochs** (list): Epochs at which the default optimizer's learning rate decays. Defaults to [213, 240].
+> > - **lr_decay_gamma** (float): Learning-rate decay rate of the default optimizer. Defaults to 0.1.
+> > - **metric** (bool): Evaluation method used during training; the valid range is ['COCO', 'VOC']. Defaults to None.
+> > - **use_vdl** (bool): Whether to use VisualDL for visualization. Defaults to False.
+> > - **sensitivities_file** (str): If a path, load the sensitivity information under that path for pruning; if 'DEFAULT', automatically download sensitivity information computed on PascalVOC for pruning; if None, do not prune. Defaults to None.
+> > - **eval_metric_loss** (float): Tolerable accuracy loss. Defaults to 0.05.
+> > - **early_stop** (bool): Whether to use the early-stopping strategy. Defaults to False.
+> > - **early_stop_patience** (int): With early stopping enabled, training stops if validation accuracy drops or stays flat for `early_stop_patience` consecutive epochs. Defaults to 5.
+> > - **resume_checkpoint** (str): Path of the model saved by a previous run from which to resume training. If None, training is not resumed. Defaults to None.
+> > - **use_ema** (bool): Whether to compute exponential moving averages of the parameters. Defaults to True.
+> > - **ema_decay** (float): Exponential decay rate. Defaults to 0.9998.
+
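+A minimal training sketch (the dataset readers are assumed to be defined as in the `paddlex.datasets` docs; the values mirror the defaults above):
+
+```python
+model.train(
+    num_epochs=270,
+    train_dataset=train_dataset,  # a paddlex.datasets reader defined elsewhere
+    train_batch_size=8,
+    eval_dataset=eval_dataset,    # optional validation reader
+    learning_rate=1.0 / 8000,
+    lr_decay_epochs=[213, 240],
+    pretrain_weights='COCO',
+    save_dir='output/ppyolo')
+```
+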
+### evaluate
+
+```python
+evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
+```
+
+> Evaluation interface of the PPYOLO model. After evaluation it returns the validation metric `box_map` (when metric is 'VOC') or `box_mmap` (when metric is 'COCO').
+
+> **Parameters**
+>
+> > - **eval_dataset** (paddlex.datasets): Validation dataset reader.
+> > - **batch_size** (int): Batch size of the validation data. Defaults to 1.
+> > - **epoch_id** (int): Training epoch of the model being evaluated.
+> > - **metric** (bool): Evaluation method; the valid range is ['COCO', 'VOC']. Defaults to None, in which case it is chosen automatically from the dataset the user passes in: 'VOC' for VOCDetection, 'COCO' for CocoDetection; EasyData-style datasets also use 'VOC'.
+> > - **return_details** (bool): Whether to return detailed information. Defaults to False.
+> >
+> **Returns**
+>
+> > - **tuple** (metrics, eval_details) | **dict** (metrics): When `return_details` is True, returns (metrics, eval_details); when False, returns metrics only. metrics is a dict with the key 'bbox_mmap' or 'bbox_map', i.e. the mean of the mAP results over multiple IoU thresholds (mmAP) or the mean average precision (mAP), respectively. eval_details is a dict with the key 'bbox', a list of predictions each consisting of image id, predicted box class id, box coordinates, and box score, and the key 'gt', information about the ground-truth boxes.
+
+### predict
+
+```python
+predict(self, img_file, transforms=None)
+```
+
+> Prediction interface of the PPYOLO model. Note that the image preprocessing pipeline used at prediction time is saved into `YOLOv3.test_transforms` and `YOLOv3.eval_transforms` at save time only if eval_dataset was defined during training. If it was not, the user must define `test_transforms` and pass it to the `predict` interface when calling it.
+
+> **Parameters**
+>
+> > - **img_file** (str|np.ndarray): Path of the image to predict, or a numpy array (HWC layout, BGR format).
+> > - **transforms** (paddlex.det.transforms): Data preprocessing operations.
+>
+> **Returns**
+>
+> > - **list**: List of predictions. Each element is a dict with the keys 'bbox', 'category', 'category_id', 'score', giving each predicted object's box coordinates, class, class id, and confidence. Box coordinates are [xmin, ymin, w, h], i.e. the top-left x, y coordinates and the box width and height.
+
+
+### batch_predict
+
+```python
+batch_predict(self, img_file_list, transforms=None, thread_num=2)
+```
+
+> Batch prediction interface of the PPYOLO model. Note that the image preprocessing pipeline used at prediction time is saved into `YOLOv3.test_transforms` and `YOLOv3.eval_transforms` at save time only if eval_dataset was defined during training. If it was not, the user must define `test_transforms` and pass it to the `batch_predict` interface when calling it.
+
+> **Parameters**
+>
+> > - **img_file_list** (str|np.ndarray): Predicts all images in the list (or tuple) at once; elements are image paths or numpy arrays (HWC layout, BGR format).
+> > - **transforms** (paddlex.det.transforms): Data preprocessing operations.
+> > - **thread_num** (int): Number of threads used to preprocess the images concurrently.
+>
+> **Returns**
+>
+> > - **list**: Each element is itself a list holding one image's predictions. Within it, each element is a dict with the keys 'bbox', 'category', 'category_id', 'score', giving each predicted object's box coordinates, class, class id, and confidence. Box coordinates are [xmin, ymin, w, h], i.e. the top-left x, y coordinates and the box width and height.
+
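+A minimal prediction sketch (the image path is illustrative):
+
+```python
+result = model.predict('test.jpg')
+# Draw boxes scoring above 0.3 and save the visualization
+pdx.det.visualize('test.jpg', result, threshold=0.3, save_dir='./output')
+```
+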
+
## paddlex.det.YOLOv3
```python
@@ -21,7 +145,7 @@ paddlex.det.YOLOv3(num_classes=80, backbone='MobileNetV1', anchors=None, anchor_
> > - **nms_score_threshold** (float): Confidence score threshold for detection boxes; boxes scoring below it are discarded. Defaults to 0.01.
> > - **nms_topk** (int): Maximum number of detection boxes kept by confidence before NMS. Defaults to 1000.
> > - **nms_keep_topk** (int): Total number of detection boxes kept per image after NMS. Defaults to 100.
-> > - **nms_iou_threshold** (float): IOU threshold used to suppress detection boxes during NMS. Defaults to 0.45.
+> > - **nms_iou_threshold** (float): IoU threshold used to suppress detection boxes during NMS. Defaults to 0.45.
> > - **label_smooth** (bool): Whether to use label smoothing. Defaults to False.
> > - **train_random_shapes** (list|tuple): Image sizes randomly sampled from this list during training. Defaults to [320, 352, 384, 416, 448, 480, 512, 544, 576, 608].
diff --git a/docs/apis/models/instance_segmentation.md b/docs/apis/models/instance_segmentation.md
index a054aaeb565f4f5dff2a56c00a3c72e39205a89b..494cde32a1888897b5771e6d94d8691d6ff79ce8 100755
--- a/docs/apis/models/instance_segmentation.md
+++ b/docs/apis/models/instance_segmentation.md
@@ -101,4 +101,4 @@ batch_predict(self, img_file_list, transforms=None, thread_num=2)
>
> **Returns**
>
-> > - **list**: Each element is itself a list holding one image's predictions. Within it, each element is a dict, keys 'bbox', 'mask', 'category', 'category_id', 'score', giving each predicted object's box coordinates, mask, class, class id, and confidence. Box coordinates are [xmin, ymin, w, h], i.e. the top-left x, y coordinates and the box width and height. The mask is a binary image of the original size in which 1 marks pixels belonging to the predicted class and 0 marks background.
+> > - **list**: Each element is itself a list holding one image's predictions. Within it, each element is a dict with the keys: 'bbox', 'mask', 'category', 'category_id', 'score', giving each predicted object's box coordinates, mask, class, class id, and confidence. Box coordinates are [xmin, ymin, w, h], i.e. the top-left x, y coordinates and the box width and height. The mask is a binary image of the original size in which 1 marks pixels belonging to the predicted class and 0 marks background.
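+
+A minimal sketch of consuming these results (the image paths are illustrative):
+
+```python
+results = model.batch_predict(['img1.jpg', 'img2.jpg'])
+for image_result in results:
+    for obj in image_result:
+        # each obj also carries obj['mask'], a binary image of the original size
+        print(obj['category'], obj['score'], obj['bbox'])
+```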
diff --git a/docs/apis/models/semantic_segmentation.md b/docs/apis/models/semantic_segmentation.md
index bcd41a2fb24cb5d547a42e53d87678f539bc4bcc..82b758d98f243e6f653c5e8d39d181b45e150587 100755
--- a/docs/apis/models/semantic_segmentation.md
+++ b/docs/apis/models/semantic_segmentation.md
@@ -3,7 +3,7 @@
## paddlex.seg.DeepLabv3p
```python
-paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
+paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, pooling_crop_size=None)
```
@@ -12,7 +12,7 @@ paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride
> **Parameters**
> > - **num_classes** (int): Number of classes.
-> > - **backbone** (str): Backbone network of DeepLabv3+, which computes the feature maps; the valid range is ['Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0']. Defaults to 'MobileNetV2_x1.0'.
+> > - **backbone** (str): Backbone network of DeepLabv3+, which computes the feature maps; the valid range is ['Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld']. Defaults to 'MobileNetV2_x1.0'.
> > - **output_stride** (int): Downsampling factor of the backbone's output feature map relative to the input, usually 8 or 16. Defaults to 16.
> > - **aspp_with_sep_conv** (bool): Whether the ASPP module uses separable convolutions. Defaults to True.
> > - **decoder_use_sep_conv** (bool): Whether the decoder module uses separable convolutions. Defaults to True.
@@ -22,6 +22,7 @@ paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride
> > - **use_dice_loss** (bool): Whether to use dice loss as the network's loss function; only usable for two-class segmentation, and may be combined with bce loss. When both `use_bce_loss` and `use_dice_loss` are False, cross-entropy loss is used. Defaults to False.
> > - **class_weight** (list|str): Per-class weights of the cross-entropy loss. When `class_weight` is a list, its length must equal `num_classes`. When it is a str, weight.lower() must be 'dynamic', in which case the weights are recomputed each round from the per-class pixel proportions, each class weighted as its proportion * num_classes. With the default None, every class is weighted 1, i.e. the ordinary cross-entropy loss.
> > - **ignore_index** (int): Label value to ignore; pixels labeled `ignore_index` do not contribute to the loss. Defaults to 255.
+> > - **pooling_crop_size** (int): When the backbone is `MobileNetV3_large_x1_0_ssld`, set this to the model input size used during training, in [W, H] format; e.g. for a model input size of [512, 512], `pooling_crop_size` should be [512, 512]. It is used when taking the image average in the encoder module: if None, the plain mean is computed; if set to the model input size, the mean is computed with the `avg_pool` operator. Defaults to None.
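+
+A minimal construction sketch for the newly added backbone (the values are illustrative):
+
+```python
+import paddlex as pdx
+
+# pooling_crop_size must match the training input size for this backbone
+model = pdx.seg.DeepLabv3p(
+    num_classes=2,
+    backbone='MobileNetV3_large_x1_0_ssld',
+    pooling_crop_size=[512, 512])
+```
+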
### train
@@ -69,7 +70,7 @@ evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False):
> **Returns**
> >
> > - **dict**: When `return_details` is False, returns a dict with the keys 'miou', 'category_iou', 'macc',
-> > 'category_acc', and 'kappa', i.e. mean iou, per-class iou, mean accuracy, per-class accuracy, and the kappa coefficient.
+> > 'category_acc', and 'kappa', i.e. mean IoU, per-class IoU, mean accuracy, per-class accuracy, and the kappa coefficient.
> > - **tuple** (metrics, eval_details): When `return_details` is True, additionally returns a dict (eval_details)
> > with the key 'confusion_matrix', the confusion matrix of the evaluation.
diff --git a/docs/apis/slim.md b/docs/apis/slim.md
index 894d83ad7e05ced84ea334a6065a0838205afb6c..a0a99b984b8e698a59bd192a6e0a6889a8281311 100755
--- a/docs/apis/slim.md
+++ b/docs/apis/slim.md
@@ -26,16 +26,16 @@ paddlex.slim.cal_params_sensitivities(model, save_file, eval_dataset, batch_size
```
paddlex.slim.export_quant_model(model, test_dataset, batch_size=2, batch_num=10, save_dir='./quant_model', cache_dir='./temp')
```
-导出量化模型,该接口实现了Post Quantization量化方式,需要传入测试数据集,并设定`batch_size`和`batch_num`。量化过程中会以数量为`batch_size` X `batch_num`的样本数据的计算结果为统计信息完成模型的量化。
+导出量化模型,该接口实现了Post Quantization量化方式,需要传入测试数据集,并设定`batch_size`和`batch_num`。量化过程中会以数量为`batch_size` * `batch_num`的样本数据的计算结果为统计信息完成模型的量化。
**参数**
* **model**(paddlex.cls.models/paddlex.det.models/paddlex.seg.models): paddlex加载的模型。
-* **test_dataset**(paddlex.dataset): 测试数据集
-* **batch_size**(int): 进行前向计算时的批数据大小
-* **batch_num**(int): 进行向前计算时批数据数量
-* **save_dir**(str): 量化后模型的保存目录
-* **cache_dir**(str): 量化过程中的统计数据临时存储目录
+* **test_dataset**(paddlex.dataset): 测试数据集。
+* **batch_size**(int): 进行前向计算时的批数据大小。
+* **batch_num**(int): 进行前向计算时的批数据数量。
+* **save_dir**(str): 量化后模型的保存目录。
+* **cache_dir**(str): 量化过程中的统计数据临时存储目录。
**使用示例**
diff --git a/docs/apis/transforms/augment.md b/docs/apis/transforms/augment.md
index 7b7fb5f908df7e4c69c1a5e6a92662da0894e2e5..ec221f4a8b596d2a47d5c5e23a6333b81f7fb0f7 100644
--- a/docs/apis/transforms/augment.md
+++ b/docs/apis/transforms/augment.md
@@ -10,11 +10,11 @@ PaddleX对于图像分类、目标检测、实例分割和语义分割内置了
| :------- | :------------|
| 图像分类 | [RandomCrop](cls_transforms.html#randomcrop)、[RandomHorizontalFlip](cls_transforms.html#randomhorizontalflip)、[RandomVerticalFlip](cls_transforms.html#randomverticalflip)、<br>[RandomRotate](cls_transforms.html#randomrotate)、 [RandomDistort](cls_transforms.html#randomdistort) |
|目标检测<br>实例分割| [RandomHorizontalFlip](det_transforms.html#randomhorizontalflip)、[RandomDistort](det_transforms.html#randomdistort)、[RandomCrop](det_transforms.html#randomcrop)、<br>[MixupImage](det_transforms.html#mixupimage)(仅支持YOLOv3模型)、[RandomExpand](det_transforms.html#randomexpand) |
-|语义分割 | [RandomHorizontalFlip](seg_transforms.html#randomhorizontalflip)、[RandomVerticalFlip](seg_transforms.html#randomverticalflip)、[RandomRangeScaling](seg_transforms.html#randomrangescaling)、<br>[RandomStepScaling](seg_transforms.html#randomstepscaling)、[RandomPaddingCrop](seg_transforms.html#randompaddingcrop)、 [RandomBlur](seg_transforms.html#randomblur)、<br>[RandomRotate](seg_transforms.html#randomrotate)、[RandomScaleAspect](seg_transforms.html#randomscaleaspect)、[RandomDistort](seg_transforms.html#randomdistort) |
+|语义分割 | [RandomHorizontalFlip](seg_transforms.html#randomhorizontalflip)、[RandomVerticalFlip](seg_transforms.html#randomverticalflip)、[ResizeRangeScaling](seg_transforms.html#resizerangescaling)、<br>[ResizeStepScaling](seg_transforms.html#resizestepscaling)、[RandomPaddingCrop](seg_transforms.html#randompaddingcrop)、 [RandomBlur](seg_transforms.html#randomblur)、<br>[RandomRotate](seg_transforms.html#randomrotate)、[RandomScaleAspect](seg_transforms.html#randomscaleaspect)、[RandomDistort](seg_transforms.html#randomdistort) |
## imgaug增强库的支持
-PaddleX目前已适配imgaug图像增强库,用户可以直接在PaddleX构造`transforms`时,调用imgaug的方法, 如下示例
+PaddleX目前已适配imgaug图像增强库,用户可以直接在PaddleX构造`transforms`时,调用imgaug的方法,如下示例:
```
import paddlex as pdx
from paddlex.cls import transforms
diff --git a/docs/apis/transforms/seg_transforms.md b/docs/apis/transforms/seg_transforms.md
index cb24c5869238342fcb3fab0e5b259da8d63da741..f353a8f4436e2793cb4cc7a4c9a086ad4883a87f 100755
--- a/docs/apis/transforms/seg_transforms.md
+++ b/docs/apis/transforms/seg_transforms.md
@@ -16,7 +16,7 @@ paddlex.seg.transforms.Compose(transforms)
```python
paddlex.seg.transforms.RandomHorizontalFlip(prob=0.5)
```
-以一定的概率对图像进行水平翻转,模型训练时的数据增强操作。
+以一定的概率对图像进行水平翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机水平翻转的概率。默认值为0.5。
@@ -25,7 +25,7 @@ paddlex.seg.transforms.RandomHorizontalFlip(prob=0.5)
```python
paddlex.seg.transforms.RandomVerticalFlip(prob=0.1)
```
-以一定的概率对图像进行垂直翻转,模型训练时的数据增强操作。
+以一定的概率对图像进行垂直翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机垂直翻转的概率。默认值为0.1。
@@ -59,7 +59,7 @@ paddlex.seg.transforms.ResizeByLong(long_size)
```python
paddlex.seg.transforms.ResizeRangeScaling(min_value=400, max_value=600)
```
-对图像长边随机resize到指定范围内,短边按比例进行缩放,模型训练时的数据增强操作。
+对图像长边随机resize到指定范围内,短边按比例进行缩放,模型训练时的数据增强操作。
### 参数
* **min_value** (int): 图像长边resize后的最小值。默认值400。
* **max_value** (int): 图像长边resize后的最大值。默认值600。
@@ -124,7 +124,7 @@ paddlex.seg.transforms.RandomBlur(prob=0.1)
```python
paddlex.seg.transforms.RandomRotate(rotate_range=15, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
```
-对图像进行随机旋转, 模型训练时的数据增强操作。
+对图像进行随机旋转,模型训练时的数据增强操作。
在旋转区间[-rotate_range, rotate_range]内,对图像进行随机旋转,当存在标注图像时,同步进行,
并对旋转后的图像和标注图像进行相应的padding。
@@ -138,7 +138,7 @@ paddlex.seg.transforms.RandomRotate(rotate_range=15, im_padding_value=[127.5, 12
```python
paddlex.seg.transforms.RandomScaleAspect(min_scale=0.5, aspect_ratio=0.33)
```
-裁剪并resize回原始尺寸的图像和标注图像,模型训练时的数据增强操作。
+裁剪并resize回原始尺寸的图像和标注图像,模型训练时的数据增强操作。
按照一定的面积比和宽高比对图像进行裁剪,并resize回原始图像大小,当存在标注图时,同步进行。
### 参数
diff --git a/docs/apis/visualize.md b/docs/apis/visualize.md
index cc9a449245aa45078983044b7b11ed35a122f6de..e9570901f79f6ee3eb28dad3cc5fc88fb86fbffe 100755
--- a/docs/apis/visualize.md
+++ b/docs/apis/visualize.md
@@ -131,7 +131,7 @@ paddlex.transforms.visualize(dataset,
对数据预处理/增强中间结果进行可视化。
可使用VisualDL查看中间结果:
1. VisualDL启动方式: visualdl --logdir vdl_output --port 8001
-2. 浏览器打开 https://0.0.0.0:8001即可,
+2. 浏览器打开 http://0.0.0.0:8001 即可,
其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
### 参数
diff --git a/docs/appendix/model_zoo.md b/docs/appendix/model_zoo.md
index 811c6f745fba0de095a84f7a1b5ae0b1d526b6ec..4c2f3911f2d8afa95bfe0009cf37212d24d43065 100644
--- a/docs/appendix/model_zoo.md
+++ b/docs/appendix/model_zoo.md
@@ -45,6 +45,7 @@
|[FasterRCNN-ResNet101-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_fpn_1x.tar)| 244.2MB | 119.788 | 38.7 |
|[FasterRCNN-ResNet101_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_2x.tar) |244.3MB | 156.097 | 40.5 |
|[FasterRCNN-HRNet_W18-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) |115.5MB | 81.592 | 36 |
+|[PPYOLO](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | 329.1MB | - |45.9 |
|[YOLOv3-DarkNet53](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar)|249.2MB | 42.672 | 38.9 |
|[YOLOv3-MobileNetV1](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar) |99.2MB | 15.442 | 29.3 |
|[YOLOv3-MobileNetV3_large](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v3.pdparams)|100.7MB | 143.322 | 31.6 |
@@ -80,6 +81,7 @@
| 模型 | 模型大小 | 预测时间(毫秒) | mIoU(%) |
|:-------|:-----------|:-------------|:----------|
+| [DeepLabv3_MobileNetV3_large_x1_0_ssld](https://paddleseg.bj.bcebos.com/models/deeplabv3p_mobilenetv3_large_cityscapes.tar.gz) | 9.3MB | - | 73.28 |
| [DeepLabv3_MobileNetv2_x1.0](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz) | 14.7MB | - | 69.8 |
| [DeepLabv3_Xception65](https://paddleseg.bj.bcebos.com/models/xception65_bn_cityscapes.tgz) | 329.3MB | - | 79.3 |
| [HRNet_W18](https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz) | 77.3MB | - | 79.36 |
diff --git a/docs/data/format/classification.md b/docs/data/format/classification.md
index 131e283b256ec99b53cb14b30ed504739395972e..bd461549b6c51ee6f5e3c0e66fbd9431decbabf8 100644
--- a/docs/data/format/classification.md
+++ b/docs/data/format/classification.md
@@ -25,12 +25,12 @@ MyDataset/ # 图像分类数据集根目录
**为了用于训练,我们需要在`MyDataset`目录下准备`train_list.txt`, `val_list.txt`和`labels.txt`三个文件**,分别用于表示训练集列表,验证集列表和类别标签列表。[点击下载图像分类示例数据集](https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz)
-
+
**labels.txt**
diff --git a/docs/data/format/detection.md b/docs/data/format/detection.md
index 82c3110043b39e5a0d008f4cd1c9b4a7fe1aa040..0ba830add2e4d03a62e6d36f1e75b54da60639db 100644
--- a/docs/data/format/detection.md
+++ b/docs/data/format/detection.md
@@ -22,12 +22,10 @@ MyDataset/ # 目标检测数据集根目录
**为了用于训练,我们需要在`MyDataset`目录下准备`train_list.txt`, `val_list.txt`和`labels.txt`三个文件**,分别用于表示训练集列表,验证集列表和类别标签列表。[点击下载目标检测示例数据集](https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz)
-
**labels.txt**
diff --git a/docs/data/format/instance_segmentation.md b/docs/data/format/instance_segmentation.md
index c4f4e424e93745b7c5f2be2aed52905c47b8f574..4d4239dd09309fd7c1520eb71e6e6c24ac13d3b1 100644
--- a/docs/data/format/instance_segmentation.md
+++ b/docs/data/format/instance_segmentation.md
@@ -18,14 +18,12 @@ MyDataset/ # 实例分割数据集根目录
在PaddleX中,为了区分训练集和验证集,在`MyDataset`同级目录,使用不同的json表示数据的划分,例如`train.json`和`val.json`。[点击下载实例分割示例数据集](https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz)。
-
-MSCOCO数据的标注文件采用json格式,用户可使用Labelme, 精灵标注助手或EasyData等标注工具进行标注,参见[数据标注工具](../annotations.md)
+MSCOCO数据的标注文件采用json格式,用户可使用Labelme, 精灵标注助手或EasyData等标注工具进行标注,参见[数据标注工具](../annotation.md)
## PaddleX加载数据集
示例代码如下,
diff --git a/docs/data/format/segmentation.md b/docs/data/format/segmentation.md
index 30dbffe0825ed78226180e119d253e97cc05f75b..e9d00ca6c08e1e75de8823f2efb74600c2ae0f26 100644
--- a/docs/data/format/segmentation.md
+++ b/docs/data/format/segmentation.md
@@ -23,12 +23,10 @@ MyDataset/ # 语义分割数据集根目录
**为了用于训练,我们需要在`MyDataset`目录下准备`train_list.txt`, `val_list.txt`和`labels.txt`三个文件**,分别用于表示训练集列表,验证集列表和类别标签列表。[点击下载语义分割示例数据集](https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz)
-
**labels.txt**
diff --git a/docs/deploy/hub_serving.md b/docs/deploy/hub_serving.md
new file mode 100644
index 0000000000000000000000000000000000000000..b0c020bafe9b97bb8a9d64b7982669af8d601f71
--- /dev/null
+++ b/docs/deploy/hub_serving.md
@@ -0,0 +1,153 @@
+# 轻量级服务化部署
+## 简介
+借助`PaddleHub-Serving`,可以将`PaddleX`的`Inference Model`进行快速部署,以提供在线预测的能力。
+
+关于`PaddleHub-Serving`的更多信息,可参照[PaddleHub-Serving](https://github.com/PaddlePaddle/PaddleHub/blob/develop/docs/tutorial/serving.md)。
+
+**注意:使用此方式部署,需确保自己Python环境中PaddleHub的版本高于1.8.0, 可在命令终端输入`pip show paddlehub`确认版本信息。**
+
+
+下面,我们按照步骤,实现将一个图像分类模型[MobileNetV3_small_ssld](https://bj.bcebos.com/paddlex/models/mobilenetv3_small_ssld_imagenet.tar.gz)转换成`PaddleHub`的预训练模型,并利用`PaddleHub-Serving`实现一键部署。
+
+
+## 模型部署
+
+### 1 部署模型准备
+部署模型的格式均为目录下包含`__model__`,`__params__`和`model.yml`三个文件,如若不然,则参照[部署模型导出文档](./export_model.md)进行导出。
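+
+可通过如下Python片段快速检查模型目录是否满足上述部署格式(目录名`./inference_model`为假设,请替换为实际路径):
+
+```python
+import os
+
+model_dir = './inference_model'  # 假设的部署模型目录
+required = ['__model__', '__params__', 'model.yml']
+missing = [f for f in required if not os.path.exists(os.path.join(model_dir, f))]
+if missing:
+    print('缺少部署所需文件:', missing)
+```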
+
+### 2 模型转换
+首先,我们将`PaddleX`的`Inference Model`转换成`PaddleHub`的预训练模型,使用命令`hub convert`即可一键转换,对此命令的说明如下:
+
+```shell
+$ hub convert --model_dir XXXX \
+ --module_name XXXX \
+ --module_version XXXX \
+ --output_dir XXXX
+```
+**参数**:
+
+|参数|用途|
+|-|-|
+|--model_dir/-m|`PaddleX Inference Model`所在的目录|
+|--module_name/-n|生成预训练模型的名称|
+|--module_version/-v|生成预训练模型的版本,默认为`1.0.0`|
+|--output_dir/-o|生成预训练模型的存放位置,默认为`{module_name}_{timestamp}`|
+
+因此,我们仅需要一行命令即可完成预训练模型的转换。
+
+```shell
+ hub convert --model_dir mobilenetv3_small_ssld_imagenet_hub --module_name mobilenetv3_small_ssld_imagenet_hub
+```
+
+转换成功后会打印提示信息,如下:
+```shell
+The converted module is stored in `MobileNetV3_small_ssld_hub_1596077881.868501`.
+```
+等待生成成功的提示后,我们就在输出目录中得到了一个`PaddleHub`预训练模型。
+
+### 3 模型安装
+在上一步模型转换中,我们得到了一个`.tar.gz`格式的预训练模型压缩包,在进行部署之前需要先安装到本机,使用命令`hub install`即可一键安装,对此命令的说明如下:
+```shell
+$ hub install ${MODULE}
+```
+其中${MODULE}为要安装的预训练模型文件路径。
+
+因此,我们使用`hub install`命令安装:
+```shell
+hub install MobileNetV3_small_ssld_hub_1596077881.868501/mobilenetv3_small_ssld_imagenet_hub.tar.gz
+```
+安装成功后会打印提示信息,如下:
+```shell
+Successfully installed mobilenetv3_small_ssld_imagenet_hub
+```
+
+### 4 模型部署
+下面,我们只需要使用`hub serving`命令即可完成模型的一键部署,对此命令的说明如下:
+```shell
+$ hub serving start --modules/-m [Module1==Version1, Module2==Version2, ...] \
+                    --port/-p XXXX \
+                    --config/-c XXXX
+```
+
+**参数**:
+
+|参数|用途|
+|-|-|
+|--modules/-m|PaddleHub Serving预安装模型,以多个Module==Version键值对的形式列出<br>*`当不指定Version时,默认选择最新版本`*|
+|--port/-p|服务端口,默认为8866|
+|--config/-c|使用配置文件配置模型|
+
+因此,我们仅需要一行命令即可完成模型的部署,如下:
+
+```shell
+$ hub serving start -m mobilenetv3_small_ssld_imagenet_hub
+```
+等待模型加载后,此预训练模型就已经部署在机器上了。
+
+我们还可以使用配置文件对部署的模型进行更多配置,配置文件格式如下:
+```json
+{
+ "modules_info": {
+ "mobilenetv3_small_ssld_imagenet_hub": {
+ "init_args": {
+ "version": "1.0.0"
+ },
+ "predict_args": {
+ "batch_size": 1,
+ "use_gpu": false
+ }
+ }
+ },
+ "port": 8866
+}
+
+```
+|参数|用途|
+|-|-|
+|modules_info|PaddleHub Serving预安装模型,以字典列表形式列出,key为模型名称。其中:<br>`init_args`为模型加载时输入的参数,等同于`paddlehub.Module(**init_args)`<br>`predict_args`为模型预测时输入的参数,以`mobilenetv3_small_ssld_imagenet_hub`为例,等同于`mobilenetv3_small_ssld_imagenet_hub.batch_predict(**predict_args)`|
+|port|服务端口,默认为8866|
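+
+上述配置文件也可以用Python脚本生成,如下为一个示意(文件名`config.json`为假设,可自行命名):
+
+```python
+import json
+
+# 与上文配置内容一致的字典,写入config.json
+config = {
+    "modules_info": {
+        "mobilenetv3_small_ssld_imagenet_hub": {
+            "init_args": {"version": "1.0.0"},
+            "predict_args": {"batch_size": 1, "use_gpu": False}
+        }
+    },
+    "port": 8866
+}
+with open('config.json', 'w') as f:
+    json.dump(config, f, indent=4)
+```
+随后执行`hub serving start -c config.json`即可按该配置启动服务。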
+
+### 5 测试
+在第二步模型安装的同时,会生成一个客户端请求示例,存放在模型安装目录,默认为`${HUB_HOME}/.paddlehub/modules`,对于此例,我们可以在`~/.paddlehub/modules/mobilenetv3_small_ssld_imagenet_hub`找到此客户端示例`serving_client_demo.py`,代码如下:
+
+```python
+# coding: utf8
+import requests
+import json
+import cv2
+import base64
+
+
+def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+if __name__ == '__main__':
+ # 获取图片的base64编码格式
+ img1 = cv2_to_base64(cv2.imread("IMAGE_PATH1"))
+ img2 = cv2_to_base64(cv2.imread("IMAGE_PATH2"))
+ data = {'images': [img1, img2]}
+ # 指定content-type
+ headers = {"Content-type": "application/json"}
+ # 发送HTTP请求
+ url = "http://127.0.0.1:8866/predict/mobilenetv3_small_ssld_imagenet_hub"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+```
+使用的测试图片如下:
+
+![测试图片](../images/xiaoduxiong.jpeg)
+
+将代码中的`IMAGE_PATH1`、`IMAGE_PATH2`改成想要进行预测的图片路径后,在命令行执行:
+```shell
+python ~/.paddlehub/modules/mobilenetv3_small_ssld_imagenet_hub/serving_client_demo.py
+```
+即可收到预测结果,如下:
+```shell
+[[{'category': 'envelope', 'category_id': 549, 'score': 0.2141510397195816}]]
+```
+
+到此,我们就完成了`PaddleX`模型的一键部署。
diff --git a/docs/deploy/index.rst b/docs/deploy/index.rst
index 13aa36073b9b8385dcfc1a52bcd8be23a18f2e5e..cbcea218e2698dd4f7d0388887f497973f363d2b 100755
--- a/docs/deploy/index.rst
+++ b/docs/deploy/index.rst
@@ -7,6 +7,7 @@
:caption: 文档目录:
export_model.md
+ hub_serving.md
server/index
nvidia-jetson.md
paddlelite/index
diff --git a/docs/deploy/nvidia-jetson.md b/docs/deploy/nvidia-jetson.md
index a707f4689198f0c1c626d75938a53df76ebfa881..5cd4c76b6d24f0308023dcd49fcf053696876b6a 100644
--- a/docs/deploy/nvidia-jetson.md
+++ b/docs/deploy/nvidia-jetson.md
@@ -1,11 +1,11 @@
# Nvidia Jetson开发板
## 说明
-本文档在 `Linux`平台使用`GCC 7.4`测试过,如果需要使用更高G++版本编译使用,则需要重新编译Paddle预测库,请参考: [Nvidia Jetson嵌入式硬件预测库源码编译](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12)。
+本文档在基于Nvidia Jetpack 4.4的`Linux`平台上使用`GCC 7.4`测试过,如需使用不同G++版本,则需要重新编译Paddle预测库,请参考: [NVIDIA Jetson嵌入式硬件预测库源码编译](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12)。
## 前置条件
* G++ 7.4
-* CUDA 9.0 / CUDA 10.0, CUDNN 7+ (仅在使用GPU版本的预测库时需要)
+* CUDA 10.0 / CUDNN 8 (仅在使用GPU版本的预测库时需要)
* CMake 3.0+
请确保系统已经安装好上述基本软件,**下面所有示例以工作目录 `/root/projects/`演示**。
@@ -57,13 +57,6 @@ CUDA_LIB=/usr/local/cuda/lib64
# CUDNN 的 lib 路径
CUDNN_LIB=/usr/local/cuda/lib64
-# 是否加载加密后的模型
-WITH_ENCRYPTION=OFF
-
-# OPENCV 路径, 如果使用自带预编译版本可不修改
-sh $(pwd)/scripts/jetson_bootstrap.sh # 下载预编译版本的opencv
-OPENCV_DIR=$(pwd)/deps/opencv3/
-
# 以下无需改动
rm -rf build
mkdir -p build
@@ -77,18 +70,13 @@ cmake .. \
-DPADDLE_DIR=${PADDLE_DIR} \
-DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
- -DCUDNN_LIB=${CUDNN_LIB} \
- -DENCRYPTION_DIR=${ENCRYPTION_DIR} \
- -DOPENCV_DIR=${OPENCV_DIR}
+ -DCUDNN_LIB=${CUDNN_LIB}
make
```
-**注意:** linux环境下编译会自动下载OPENCV和YAML,如果编译环境无法访问外网,可手动下载:
+**注意:** linux环境下编译会自动下载YAML,如果编译环境无法访问外网,可手动下载:
-- [opencv3_aarch.tgz](https://bj.bcebos.com/paddlex/deploy/tools/opencv3_aarch.tgz)
- [yaml-cpp.zip](https://bj.bcebos.com/paddlex/deploy/deps/yaml-cpp.zip)
-opencv3_aarch.tgz文件下载后解压,然后在script/build.sh中指定`OPENCE_DIR`为解压后的路径。
-
yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://bj.bcebos.com/paddlex/deploy/deps/yaml-cpp.zip` 中的网址,改为下载文件的路径。
修改脚本设置好主要参数后,执行`build`脚本:
@@ -100,7 +88,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
**在加载模型前,请检查你的模型目录中文件应该包括`model.yml`、`__model__`和`__params__`三个文件。如若不满足这个条件,请参考[模型导出为Inference文档](export_model.md)将模型导出为部署格式。**
-编译成功后,预测demo的可执行程序分别为`build/demo/detector`,`build/demo/classifier`,`build/demo/segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
+* 编译成功后,图片预测demo的可执行程序分别为`build/demo/detector`,`build/demo/classifier`,`build/demo/segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
@@ -111,10 +99,26 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
| use_trt | 是否使用 TensorRT 预测, 支持值为0或1(默认值为0) |
| gpu_id | GPU 设备ID, 默认值为0 |
| save_dir | 保存可视化结果的路径, 默认值为"output",**classfier无该参数** |
-| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
| batch_size | 预测的批量大小,默认为1 |
| thread_num | 预测的线程数,默认为cpu处理器个数 |
-| use_ir_optim | 是否使用图优化策略,支持值为0或1(默认值为1,图像分割默认值为0)|
+
+* 编译成功后,视频预测demo的可执行程序分别为`build/demo/video_detector`,`build/demo/video_classifier`,`build/demo/video_segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
+
+| 参数 | 说明 |
+| ---- | ---- |
+| model_dir | 导出的预测模型所在路径 |
+| use_camera | 是否使用摄像头预测,支持值为0或1(默认值为0) |
+| camera_id | 摄像头设备ID,默认值为0 |
+| video_path | 视频文件的路径 |
+| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0) |
+| use_trt | 是否使用 TensorRT 预测, 支持值为0或1(默认值为0) |
+| gpu_id | GPU 设备ID, 默认值为0 |
+| show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果(因加入了延迟处理,故显示结果不能反映真实的帧率),支持值为0或1(默认值为0) |
+| save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
+| save_dir | 保存可视化结果的路径, 默认值为"output" |
+
+**注意:若系统无GUI,则不要将show_result设置为1。当使用摄像头预测时,按`ESC`键可关闭摄像头并退出预测程序。**
+
## 样例
@@ -143,3 +147,21 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
./build/demo/detector --model_dir=/root/projects/inference_model --image_list=/root/projects/images_list.txt --use_gpu=1 --save_dir=output --batch_size=2 --thread_num=2
```
图片文件`可视化预测结果`会保存在`save_dir`参数设置的目录下。
+
+**样例三:**
+
+使用摄像头预测:
+
+```shell
+./build/demo/video_detector --model_dir=/root/projects/inference_model --use_camera=1 --use_gpu=1 --save_dir=output --save_result=1
+```
+当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。
+
+**样例四:**
+
+对视频文件进行预测:
+
+```shell
+./build/demo/video_detector --model_dir=/root/projects/inference_model --video_path=/path/to/video_file --use_gpu=1 --save_dir=output --show_result=1 --save_result=1
+```
+当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。如果系统有GUI,通过将`show_result`设置为1在屏幕上观看可视化预测结果。
diff --git a/docs/deploy/paddlelite/slim/prune.md b/docs/deploy/paddlelite/slim/prune.md
index f246d8b875e4800b4f7e502d0d7557992a429e79..e8f0eac6d6a562f977431708ba9a04bacb3a0ee1 100644
--- a/docs/deploy/paddlelite/slim/prune.md
+++ b/docs/deploy/paddlelite/slim/prune.md
@@ -49,7 +49,7 @@ PaddleX提供了两种方式:
### 语义分割
实验背景:使用UNet模型,数据集为视盘分割示例数据,剪裁训练代码见[tutorials/compress/segmentation](https://github.com/PaddlePaddle/PaddleX/tree/develop/tutorials/compress/segmentation)
-| 模型 | 剪裁情况 | 模型大小 | mIOU(%) |GPU预测速度 | CPU预测速度 |
+| 模型 | 剪裁情况 | 模型大小 | mIoU(%) |GPU预测速度 | CPU预测速度 |
| :-----| :--------| :-------- | :---------- |:---------- | :---------|
|UNet | 无剪裁(原模型)| 77M | 91.22 |33.28ms |9523.55ms |
|UNet | 方案一(eval_metric_loss=0.10) |26M | 90.37 |21.04ms |3936.20ms |
diff --git a/docs/deploy/paddlelite/slim/quant.md b/docs/deploy/paddlelite/slim/quant.md
index 1903b84a3d2a4bda76055392d6a7115a2eda324d..705a1cadd903141f09ade715abf86a0c651355c1 100644
--- a/docs/deploy/paddlelite/slim/quant.md
+++ b/docs/deploy/paddlelite/slim/quant.md
@@ -6,7 +6,7 @@
定点量化使用更少的比特数(如8-bit、3-bit、2-bit等)表示神经网络的权重和激活值,从而加速模型推理速度。PaddleX提供了训练后量化技术,其原理可参见[训练后量化原理](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html#id14),该量化使用KL散度确定量化比例因子,将FP32模型转成INT8模型,且不需要重新训练,可以快速得到量化模型。
## 使用PaddleX量化模型
-PaddleX提供了`export_quant_model`接口,让用户以接口的形式对训练后的模型进行量化。点击查看[量化接口使用文档](../../../apis/slim.html)。
+PaddleX提供了`export_quant_model`接口,让用户以接口的形式对训练后的模型进行量化。点击查看[量化接口使用文档](../../../apis/slim.md)。
## 量化性能对比
模型量化后的性能对比指标请查阅[PaddleSlim模型库](https://paddlepaddle.github.io/PaddleSlim/model_zoo.html)
diff --git a/docs/deploy/server/cpp/linux.md b/docs/deploy/server/cpp/linux.md
index c7813ede08082555268eba5a46a77cbcd4cab13e..d81569e6d280d06e3637dd13a012e38169b615a2 100644
--- a/docs/deploy/server/cpp/linux.md
+++ b/docs/deploy/server/cpp/linux.md
@@ -116,7 +116,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
**在加载模型前,请检查你的模型目录中文件应该包括`model.yml`、`__model__`和`__params__`三个文件。如若不满足这个条件,请参考[模型导出为Inference文档](../../export_model.md)将模型导出为部署格式。**
-编译成功后,预测demo的可执行程序分别为`build/demo/detector`,`build/demo/classifier`,`build/demo/segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
+* 编译成功后,图片预测demo的可执行程序分别为`build/demo/detector`,`build/demo/classifier`,`build/demo/segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
@@ -130,7 +130,24 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
| batch_size | 预测的批量大小,默认为1 |
| thread_num | 预测的线程数,默认为cpu处理器个数 |
-| use_ir_optim | 是否使用图优化策略,支持值为0或1(默认值为1,图像分割默认值为0)|
+
+* 编译成功后,视频预测demo的可执行程序分别为`build/demo/video_detector`,`build/demo/video_classifier`,`build/demo/video_segmenter`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
+
+| 参数 | 说明 |
+| ---- | ---- |
+| model_dir | 导出的预测模型所在路径 |
+| use_camera | 是否使用摄像头预测,支持值为0或1(默认值为0) |
+| camera_id | 摄像头设备ID,默认值为0 |
+| video_path | 视频文件的路径 |
+| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0) |
+| use_trt | 是否使用 TensorRT 预测, 支持值为0或1(默认值为0) |
+| gpu_id | GPU 设备ID, 默认值为0 |
+| show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果(因加入了延迟处理,故显示结果不能反映真实的帧率),支持值为0或1(默认值为0) |
+| save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
+| save_dir | 保存可视化结果的路径, 默认值为"output"|
+| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
+
+**注意:若系统无GUI,则不要将show_result设置为1。当使用摄像头预测时,按`ESC`键可关闭摄像头并退出预测程序。**
## 样例
@@ -138,7 +155,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
> 关于预测速度的说明:加载模型后前几张图片的预测速度会较慢,这是因为运行启动时涉及到内存显存初始化等步骤,通常在预测20-30张图片后模型的预测速度达到稳定。
-`样例一`:
+**样例一:**
不使用`GPU`测试图片 `/root/projects/images/xiaoduxiong.jpeg`
@@ -148,7 +165,7 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
图片文件`可视化预测结果`会保存在`save_dir`参数设置的目录下。
-`样例二`:
+**样例二:**
使用`GPU`预测多个图片`/root/projects/image_list.txt`,image_list.txt内容的格式如下:
```
@@ -161,3 +178,21 @@ yaml-cpp.zip文件下载后无需解压,在cmake/yaml.cmake中将`URL https://
./build/demo/detector --model_dir=/root/projects/inference_model --image_list=/root/projects/images_list.txt --use_gpu=1 --save_dir=output --batch_size=2 --thread_num=2
```
图片文件`可视化预测结果`会保存在`save_dir`参数设置的目录下。
+
+**样例三:**
+
+使用摄像头预测:
+
+```shell
+./build/demo/video_detector --model_dir=/root/projects/inference_model --use_camera=1 --use_gpu=1 --save_dir=output --save_result=1
+```
+当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。
+
+**样例四:**
+
+对视频文件进行预测:
+
+```shell
+./build/demo/video_detector --model_dir=/root/projects/inference_model --video_path=/path/to/video_file --use_gpu=1 --save_dir=output --show_result=1 --save_result=1
+```
+当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。如果系统有GUI,通过将`show_result`设置为1在屏幕上观看可视化预测结果。
diff --git a/docs/deploy/server/cpp/windows.md b/docs/deploy/server/cpp/windows.md
index 641d1cba9262e60bf43a152f288e23bda4b74464..4c5ef9e201424cca4b3bcb291ffa74df9c45546b 100644
--- a/docs/deploy/server/cpp/windows.md
+++ b/docs/deploy/server/cpp/windows.md
@@ -101,7 +101,7 @@ D:
cd D:\projects\PaddleX\deploy\cpp\out\build\x64-Release
```
-编译成功后,预测demo的入口程序为`paddlex_inference\detector.exe`,`paddlex_inference\classifier.exe`,`paddlex_inference\segmenter.exe`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
+* 编译成功后,图片预测demo的入口程序为`paddlex_inference\detector.exe`,`paddlex_inference\classifier.exe`,`paddlex_inference\segmenter.exe`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
@@ -114,7 +114,24 @@ cd D:\projects\PaddleX\deploy\cpp\out\build\x64-Release
| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
| batch_size | 预测的批量大小,默认为1 |
| thread_num | 预测的线程数,默认为cpu处理器个数 |
-| use_ir_optim | 是否使用图优化策略,支持值为0或1(默认值为1,图像分割默认值为0)|
+
+* 编译成功后,视频预测demo的入口程序为`paddlex_inference\video_detector.exe`,`paddlex_inference\video_classifier.exe`,`paddlex_inference\video_segmenter.exe`,用户可根据自己的模型类型选择,其主要命令参数说明如下:
+
+| 参数 | 说明 |
+| ---- | ---- |
+| model_dir | 导出的预测模型所在路径 |
+| use_camera | 是否使用摄像头预测,支持值为0或1(默认值为0) |
+| camera_id | 摄像头设备ID,默认值为0 |
+| video_path | 视频文件的路径 |
+| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0) |
+| gpu_id | GPU 设备ID, 默认值为0 |
+| show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果(因加入了延迟处理,故显示结果不能反映真实的帧率),支持值为0或1(默认值为0) |
+| save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
+| save_dir | 保存可视化结果的路径, 默认值为"output" |
+| key | 加密过程中产生的密钥信息,默认值为""表示加载的是未加密的模型 |
+
+**注意:若系统无GUI,则不要将show_result设置为1。当使用摄像头预测时,按`ESC`键可关闭摄像头并退出预测程序。**
+
## 样例
@@ -157,3 +174,18 @@ D:\images\xiaoduxiongn.jpeg
```
`--key`传入加密工具输出的密钥,例如`kLAl1qOs5uRbFt0/RrIDTZW2+tOf5bzvUIaHGF8lJ1c=`, 图片文件可视化预测结果会保存在`save_dir`参数设置的目录下。
+
+### 样例四:(使用未加密的模型开启摄像头预测)
+
+```shell
+.\paddlex_inference\video_detector.exe --model_dir=D:\projects\inference_model --use_camera=1 --use_gpu=1 --save_dir=output
+```
+当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。
+
+### 样例五:(使用未加密的模型对视频文件做预测)
+
+
+```shell
+.\paddlex_inference\video_detector.exe --model_dir=D:\projects\inference_model --video_path=D:\projects\video_test.mp4 --use_gpu=1 --show_result=1 --save_dir=output
+```
+当`save_result`设置为1时,`可视化预测结果`会以视频文件的格式保存在`save_dir`参数设置的目录下。如果系统有GUI,通过将`show_result`设置为1在屏幕上观看可视化预测结果。
diff --git a/docs/deploy/server/encryption.md b/docs/deploy/server/encryption.md
index dcf67b1b8de31afcd638b5858624b52a5c17872a..c172cc802bc859f427e13f5684f092a5b8c5fc1f 100644
--- a/docs/deploy/server/encryption.md
+++ b/docs/deploy/server/encryption.md
@@ -51,7 +51,7 @@ paddlex-encryption
|
├── lib # libpmodel-encrypt.so和libpmodel-decrypt.so动态库
|
-└── tool # paddlex_encrypt_tool
+└── tool # paddle_encrypt_tool
```
Windows加密工具包含内容为:
@@ -61,7 +61,7 @@ paddlex-encryption
|
├── lib # pmodel-encrypt.dll和pmodel-decrypt.dll动态库 pmodel-encrypt.lib和pmodel-encrypt.lib静态库
|
-└── tool # paddlex_encrypt_tool.exe 模型加密工具
+└── tool # paddle_encrypt_tool.exe 模型加密工具
```
### 1.3 加密PaddleX模型
@@ -71,13 +71,13 @@ paddlex-encryption
Linux平台:
```
# 假设模型在/root/projects下
-./paddlex-encryption/tool/paddlex_encrypt_tool -model_dir /root/projects/paddlex_inference_model -save_dir /root/projects/paddlex_encrypted_model
+./paddlex-encryption/tool/paddle_encrypt_tool -model_dir /root/projects/paddlex_inference_model -save_dir /root/projects/paddlex_encrypted_model
```
Windows平台:
```
# 假设模型在D:/projects下
-.\paddlex-encryption\tool\paddlex_encrypt_tool.exe -model_dir D:\projects\paddlex_inference_model -save_dir D:\projects\paddlex_encrypted_model
+.\paddlex-encryption\tool\paddle_encrypt_tool.exe -model_dir D:\projects\paddlex_inference_model -save_dir D:\projects\paddlex_encrypted_model
```
`-model_dir`用于指定inference模型路径(参考[导出inference模型](../export_model.md)将模型导出为inference格式模型),可使用[导出小度熊识别模型](../export_model.md)中导出的`inference_model`。加密完成后,加密过的模型会保存至指定的`-save_dir`下,包含`__model__.encrypted`、`__params__.encrypted`和`model.yml`三个文件,同时生成密钥信息,命令输出如下图所示,密钥为`kLAl1qOs5uRbFt0/RrIDTZW2+tOf5bzvUIaHGF8lJ1c=`
diff --git a/docs/deploy/server/python.md b/docs/deploy/server/python.md
index 36b0891176bb9cf86078a3c9f9dfe5b48419613b..08571b22291a0b8e91b141162ec4927c6204c56b 100644
--- a/docs/deploy/server/python.md
+++ b/docs/deploy/server/python.md
@@ -27,7 +27,26 @@ import paddlex as pdx
predictor = pdx.deploy.Predictor('./inference_model')
image_list = ['xiaoduxiong_test_image/JPEGImages/WeChatIMG110.jpeg',
'xiaoduxiong_test_image/JPEGImages/WeChatIMG111.jpeg']
-result = predictor.predict(image_list=image_list)
+result = predictor.batch_predict(image_list=image_list)
+```
+
+* 视频流预测
+```
+import cv2
+import paddlex as pdx
+predictor = pdx.deploy.Predictor('./inference_model')
+cap = cv2.VideoCapture(0)
+while cap.isOpened():
+ ret, frame = cap.read()
+ if ret:
+ result = predictor.predict(frame)
+ vis_img = pdx.det.visualize(frame, result, threshold=0.6, save_dir=None)
+ cv2.imshow('Xiaoduxiong', vis_img)
+ if cv2.waitKey(1) & 0xFF == ord('q'):
+ break
+ else:
+ break
+cap.release()
+cv2.destroyAllWindows()
```
> 关于预测速度的说明:加载模型后前几张图片的预测速度会较慢,这是因为运行启动时涉及到内存显存初始化等步骤,通常在预测20-30张图片后模型的预测速度达到稳定。
diff --git a/docs/examples/human_segmentation.md b/docs/examples/human_segmentation.md
index b4c707709c9ea0304a44daec085ea4fa1ca2678c..504132bcad5476309d0944fb6d5f94787fb6025f 100644
--- a/docs/examples/human_segmentation.md
+++ b/docs/examples/human_segmentation.md
@@ -1,12 +1,12 @@
# 人像分割模型
-本教程基于PaddleX核心分割模型实现人像分割,开放预训练模型和测试数据、支持视频流人像分割、提供模型Fine-tune到Paddle Lite移动端部署的全流程应用指南。
+本教程基于PaddleX核心分割模型实现人像分割,开放预训练模型和测试数据、支持视频流人像分割、提供模型Fine-tune到Paddle Lite移动端及Nvidia Jetson嵌入式设备部署的全流程应用指南。
## 预训练模型和测试数据
#### 预训练模型
-本案例开放了两个在大规模人像数据集上训练好的模型,以满足服务器端场景和移动端场景的需求。使用这些模型可以快速体验视频流人像分割,也可以部署到移动端进行实时人像分割,也可以用于完成模型Fine-tuning。
+本案例开放了两个在大规模人像数据集上训练好的模型,以满足服务器端场景和移动端场景的需求。使用这些模型可以快速体验视频流人像分割,也可以部署到移动端或嵌入式设备进行实时人像分割,还可以用于完成模型Fine-tuning。
| 模型类型 | Checkpoint Parameter | Inference Model | Quant Inference Model | 备注 |
| --- | --- | --- | ---| --- |
@@ -243,15 +243,17 @@ python quant_offline.py --model_dir output/best_model \
* `--save_dir`: 量化模型保存路径
* `--image_shape`: 网络输入图像大小(w, h)
-## Paddle Lite移动端部署
+## 推理部署
+
+### Paddle Lite移动端部署
本案例将人像分割模型在移动端进行部署,部署流程展示如下,通用的移动端部署流程参见[Paddle Lite移动端部署](../../docs/deploy/paddlelite/android.md)。
-### 1. 将PaddleX模型导出为inference模型
+#### 1. 将PaddleX模型导出为inference模型
本案例使用humanseg_mobile_quant预训练模型,该模型已经是inference模型,不需要再执行模型导出步骤。如果不使用预训练模型,则执行上一章节`模型训练`中的`模型导出`将自己训练的模型导出为inference格式。
-### 2. 将inference模型优化为Paddle Lite模型
+#### 2. 将inference模型优化为Paddle Lite模型
下载并解压 [模型优化工具opt](https://bj.bcebos.com/paddlex/deploy/lite/model_optimize_tool_11cbd50e.tar.gz),进入模型优化工具opt所在路径后,执行以下命令:
@@ -273,16 +275,16 @@ python quant_offline.py --model_dir output/best_model \
更详细的使用方法和参数含义请参考: [使用opt转化模型](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html)
-### 3. 移动端预测
+#### 3. 移动端预测
PaddleX提供了基于PaddleX Android SDK的安卓demo,可供用户体验图像分类、目标检测、实例分割和语义分割,该demo位于`PaddleX/deploy/lite/android/demo`,用户将模型、配置文件和测试图片拷贝至该demo下进行预测。
-#### 3.1 前置依赖
+##### 3.1 前置依赖
* Android Studio 3.4
* Android手机或开发板
-#### 3.2 拷贝模型、配置文件和测试图片
+##### 3.2 拷贝模型、配置文件和测试图片
* 将Lite模型(.nb文件)拷贝到`PaddleX/deploy/lite/android/demo/app/src/main/assets/model/`目录下, 根据.nb文件的名字,修改文件`PaddleX/deploy/lite/android/demo/app/src/main/res/values/strings.xml`中的`MODEL_PATH_DEFAULT`;
@@ -290,7 +292,7 @@ PaddleX提供了基于PaddleX Android SDK的安卓demo,可供用户体验图
* 将测试图片拷贝到`PaddleX/deploy/lite/android/demo/app/src/main/assets/images/`目录下,根据图片文件的名字,修改文件`PaddleX/deploy/lite/android/demo/app/src/main/res/values/strings.xml`中的`IMAGE_PATH_DEFAULT`。
-#### 3.3 导入工程并运行
+##### 3.3 导入工程并运行
* 打开Android Studio,在"Welcome to Android Studio"窗口点击"Open an existing Android Studio project",在弹出的路径选择窗口中进入`PaddleX/deploy/lite/android/demo`目录,然后点击右下角的"Open"按钮,导入工程;
@@ -303,3 +305,58 @@ PaddleX提供了基于PaddleX Android SDK的安卓demo,可供用户体验图
测试图片及其分割结果如下所示:

+
+### Nvidia Jetson嵌入式设备部署
+
+#### c++部署
+
+step 1. 下载PaddleX源码
+
+```
+git clone https://github.com/PaddlePaddle/PaddleX
+```
+
+step 2. 将`PaddleX/examples/human_segmentation/deploy/cpp`下的`human_segmenter.cpp`和`CMakeLists.txt`拷贝至`PaddleX/deploy/cpp`目录下,拷贝之前可以将`PaddleX/deploy/cpp`下原本的`CMakeLists.txt`做好备份。
+
+step 3. 按照[Nvidia Jetson开发板部署](../deploy/nvidia-jetson.md)中的Step2至Step3完成C++预测代码的编译。
+
+step 4. 编译成功后,可执行程序为`build/human_segmenter`,其主要命令参数说明如下:
+
+ | 参数 | 说明 |
+ | ---- | ---- |
+ | model_dir | 人像分割模型路径 |
+ | use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
+ | gpu_id | GPU 设备ID, 默认值为0 |
+ | use_camera | 是否使用摄像头采集图片,支持值为0或1(默认值为0) |
+ | camera_id | 摄像头设备ID,默认值为0 |
+ | video_path | 视频文件的路径 |
+ | show_result | 对视频文件做预测时,是否在屏幕上实时显示预测可视化结果,支持值为0或1(默认值为0) |
+ | save_result | 是否将每帧的预测可视结果保存为视频文件,支持值为0或1(默认值为1) |
+ | image | 待预测的图片路径 |
+ | save_dir | 保存可视化结果的路径, 默认值为"output"|
+
+step 5. 推理预测
+
+ 用于部署推理的模型应为inference格式,本案例使用humanseg_server_inference预训练模型,该模型已经是inference模型,不需要再执行模型导出步骤。如果不使用预训练模型,则执行第2章节`模型训练`中的`模型导出`将自己训练的模型导出为inference格式。
+
+ * 使用未加密的模型对单张图片做预测
+
+ 待测试图片位于本案例提供的测试数据中,可以替换成自己的图片。
+
+ ```shell
+ ./build/human_segmenter --model_dir=/path/to/humanseg_server_inference --image=/path/to/data/mini_supervisely/Images/pexels-photo-63776.png --use_gpu=1 --save_dir=output
+ ```
+
+ * 使用未加密的模型开启摄像头做预测
+
+ ```shell
+ ./build/human_segmenter --model_dir=/path/to/humanseg_server_inference --use_camera=1 --save_result=1 --use_gpu=1 --save_dir=output
+ ```
+
+ * 使用未加密的模型对视频文件做预测
+
+ 待测试视频文件位于本案例提供的测试数据中,可以替换成自己的视频文件。
+
+ ```shell
+ ./build/human_segmenter --model_dir=/path/to/humanseg_server_inference --video_path=/path/to/data/mini_supervisely/video_test.mp4 --save_result=1 --use_gpu=1 --save_dir=output
+ ```
diff --git a/docs/examples/meter_reader.md b/docs/examples/meter_reader.md
index 6eabe48aa124672ce33caba16f2e93cdb62edc92..670d7d1399b55c672b17ed903663bf26c8a6ef84 100644
--- a/docs/examples/meter_reader.md
+++ b/docs/examples/meter_reader.md
@@ -46,13 +46,13 @@
#### 测试表计读数
-1. 下载PaddleX源码:
+step 1. 下载PaddleX源码:
```
git clone https://github.com/PaddlePaddle/PaddleX
```
-2. 预测执行文件位于`PaddleX/examples/meter_reader/`,进入该目录:
+step 2. 预测执行文件位于`PaddleX/examples/meter_reader/`,进入该目录:
```
cd PaddleX/examples/meter_reader/
@@ -76,7 +76,7 @@ cd PaddleX/examples/meter_reader/
| use_erode | 是否使用图像腐蚀对分割预测图进行细分,默认为False |
| erode_kernel | 图像腐蚀操作时的卷积核大小,默认值为4 |
-3. 预测
+step 3. 预测
若要使用GPU,则指定GPU卡号(以0号卡为例):
@@ -112,17 +112,17 @@ python3 reader_infer.py --detector_dir /path/to/det_inference_model --segmenter_
#### c++部署
-1. 下载PaddleX源码:
+step 1. 下载PaddleX源码:
```
git clone https://github.com/PaddlePaddle/PaddleX
```
-2. 将`PaddleX\examples\meter_reader\deploy\cpp`下的`meter_reader`文件夹和`CMakeList.txt`拷贝至`PaddleX\deploy\cpp`目录下,拷贝之前可以将`PaddleX\deploy\cpp`下原本的`CMakeList.txt`做好备份。
+step 2. 将`PaddleX\examples\meter_reader\deploy\cpp`下的`meter_reader`文件夹和`CMakeLists.txt`拷贝至`PaddleX\deploy\cpp`目录下,拷贝之前可以将`PaddleX\deploy\cpp`下原本的`CMakeLists.txt`做好备份。
-3. 按照[Windows平台部署](../deploy/server/cpp/windows.md)中的Step2至Step4完成C++预测代码的编译。
+step 3. 按照[Windows平台部署](../deploy/server/cpp/windows.md)中的Step2至Step4完成C++预测代码的编译。
-4. 编译成功后,可执行文件在`out\build\x64-Release`目录下,打开`cmd`,并切换到该目录:
+step 4. 编译成功后,可执行文件在`out\build\x64-Release`目录下,打开`cmd`,并切换到该目录:
```
cd PaddleX\deploy\cpp\out\build\x64-Release
@@ -139,8 +139,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
| gpu_id | GPU 设备ID, 默认值为0 |
| save_dir | 保存可视化结果的路径, 默认值为"output"|
- | det_key | 检测模型加密过程中产生的密钥信息,默认值为""表示加载的是未加密的检测模型 |
- | seg_key | 分割模型加密过程中产生的密钥信息,默认值为""表示加载的是未加密的分割模型 |
| seg_batch_size | 分割的批量大小,默认为2 |
| thread_num | 分割预测的线程数,默认为cpu处理器个数 |
| use_camera | 是否使用摄像头采集图片,支持值为0或1(默认值为0) |
@@ -149,7 +147,7 @@ git clone https://github.com/PaddlePaddle/PaddleX
| erode_kernel | 图像腐蚀操作时的卷积核大小,默认值为4 |
| score_threshold | 检测模型输出结果中,预测得分低于该阈值的框将被滤除,默认值为0.5|
-5. 推理预测:
+step 5. 推理预测:
用于部署推理的模型应为inference格式,本案例提供的预训练模型均为inference格式,如若是重新训练的模型,需参考[部署模型导出](../deploy/export_model.md)将模型导出为inference格式。
@@ -160,6 +158,13 @@ git clone https://github.com/PaddlePaddle/PaddleX
```
* 使用未加密的模型对图像列表做预测
+ 图像列表image_list.txt内容的格式如下,因绝对路径不同,暂未提供该文件,用户可根据实际情况自行生成:
+ ```
+ \path\to\images\1.jpg
+ \path\to\images\2.jpg
+ ...
+ \path\to\images\n.jpg
+ ```
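+
+ 该列表文件也可用如下Python片段生成(其中图片目录与`.jpg`后缀均为假设,请按实际情况修改):
+
+ ```python
+ # 遍历图片目录,将各图片路径逐行写入image_list.txt
+ import glob
+
+ with open('image_list.txt', 'w') as f:
+     for path in sorted(glob.glob(r'\path\to\images\*.jpg')):
+         f.write(path + '\n')
+ ```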
```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --image_list=\path\to\meter_test\image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
@@ -171,29 +176,29 @@ git clone https://github.com/PaddlePaddle/PaddleX
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```
- * 使用加密后的模型对单张图片做预测
+ * 使用加密后的模型对单张图片做预测
- 如果未对模型进行加密,请参考[加密PaddleX模型](../deploy/server/encryption.html#paddlex)对模型进行加密。例如加密后的检测模型所在目录为`\path\to\encrypted_det_inference_model`,密钥为`yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`;加密后的分割模型所在目录为`\path\to\encrypted_seg_inference_model`,密钥为`DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`
+ 如果未对模型进行加密,请参考[加密PaddleX模型](../deploy/server/encryption.html#paddlex)对模型进行加密。例如加密后的检测模型所在目录为`\path\to\encrypted_det_inference_model`,密钥为`yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`;加密后的分割模型所在目录为`\path\to\encrypted_seg_inference_model`,密钥为`DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`
- ```shell
- .\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\encrypted_det_inference_model --seg_model_dir=\path\to\encrypted_seg_inference_model --image=\path\to\test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
+ ```shell
+ .\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\encrypted_det_inference_model --seg_model_dir=\path\to\encrypted_seg_inference_model --image=\path\to\test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
```
### Linux系统的jetson嵌入式设备安全部署
#### c++部署
-1. 下载PaddleX源码:
+step 1. 下载PaddleX源码:
```
git clone https://github.com/PaddlePaddle/PaddleX
```
-2. 将`PaddleX/examples/meter_reader/deploy/cpp`下的`meter_reader`文件夹和`CMakeList.txt`拷贝至`PaddleX/deploy/cpp`目录下,拷贝之前可以将`PaddleX/deploy/cpp`下原本的`CMakeList.txt`做好备份。
+step 2. 将`PaddleX/examples/meter_reader/deploy/cpp`下的`meter_reader`文件夹和`CMakeLists.txt`拷贝至`PaddleX/deploy/cpp`目录下,拷贝之前可以将`PaddleX/deploy/cpp`下原本的`CMakeLists.txt`做好备份。
-3. 按照[Nvidia Jetson开发板部署](../deploy/nvidia-jetson.md)中的Step2至Step3完成C++预测代码的编译。
+step 3. 按照[Nvidia Jetson开发板部署](../deploy/nvidia-jetson.md)中的Step2至Step3完成C++预测代码的编译。
-4. 编译成功后,可执行程为`build/meter_reader/meter_reader`,其主要命令参数说明如下:
+step 4. 编译成功后,可执行程序为`build/meter_reader/meter_reader`,其主要命令参数说明如下:
| 参数 | 说明 |
| ---- | ---- |
@@ -204,8 +209,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
| use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
| gpu_id | GPU 设备ID, 默认值为0 |
| save_dir | 保存可视化结果的路径, 默认值为"output"|
- | det_key | 检测模型加密过程中产生的密钥信息,默认值为""表示加载的是未加密的检测模型 |
- | seg_key | 分割模型加密过程中产生的密钥信息,默认值为""表示加载的是未加密的分割模型 |
| seg_batch_size | 分割的批量大小,默认为2 |
| thread_num | 分割预测的线程数,默认为cpu处理器个数 |
| use_camera | 是否使用摄像头采集图片,支持值为0或1(默认值为0) |
@@ -214,7 +217,7 @@ git clone https://github.com/PaddlePaddle/PaddleX
| erode_kernel | 图像腐蚀操作时的卷积核大小,默认值为4 |
| score_threshold | 检测模型输出结果中,预测得分低于该阈值的框将被滤除,默认值为0.5|
-5. 推理预测:
+step 5. 推理预测:
用于部署推理的模型应为inference格式,本案例提供的预训练模型均为inference格式,如若是重新训练的模型,需参考[部署模型导出](../deploy/export_model.md)将模型导出为inference格式。
@@ -225,7 +228,13 @@ git clone https://github.com/PaddlePaddle/PaddleX
```
* 使用未加密的模型对图像列表做预测
-
+ 图像列表image_list.txt内容的格式如下,因绝对路径不同,暂未提供该文件,用户可根据实际情况自行生成:
+ ```
+ \path\to\images\1.jpg
+ \path\to\images\2.jpg
+ ...
+ \path\to\images\n.jpg
+ ```
```shell
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --image_list=/path/to/image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
```
@@ -236,15 +245,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```
- * 使用加密后的模型对单张图片做预测
-
- 如果未对模型进行加密,请参考[加密PaddleX模型](../deploy/server/encryption.html#paddlex)对模型进行加密。例如加密后的检测模型所在目录为`/path/to/encrypted_det_inference_model`,密钥为`yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`;加密后的分割模型所在目录为`/path/to/encrypted_seg_inference_model`,密钥为`DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`
-
- ```shell
- ./build/meter_reader/meter_reader --det_model_dir=/path/to/encrypted_det_inference_model --seg_model_dir=/path/to/encrypted_seg_inference_model --image=/path/to/test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
- ```
-
-
## 模型训练
diff --git a/docs/examples/solutions.md b/docs/examples/solutions.md
index 329d78626b506d2e486d0ed77201f5863e99f40f..ed1304c5e2067414790ced1bd01103110f87f619 100644
--- a/docs/examples/solutions.md
+++ b/docs/examples/solutions.md
@@ -42,6 +42,7 @@ PaddleX针对图像分类、目标检测、实例分割和语义分割4种视觉
| YOLOv3-MobileNetV3_large | 适用于追求高速预测的移动端场景 | 100.7MB | 143.322 | - | - | 31.6 |
| YOLOv3-MobileNetV1 | 精度相对偏低,适用于追求高速预测的服务器端场景 | 99.2MB| 15.422 | - | - | 29.3 |
| YOLOv3-DarkNet53 | 在预测速度和模型精度上都有较好的表现,适用于大多数的服务器端场景| 249.2MB | 42.672 | - | - | 38.9 |
+| PPYOLO | 预测速度和模型精度都比YOLOv3-DarkNet53优异,适用于大多数的服务器端场景 | 329.1MB | - | - | - | 45.9 |
| FasterRCNN-ResNet50-FPN | 经典的二阶段检测器,预测速度相对较慢,适用于重视模型精度的服务器端场景 | 167.7MB | 83.189 | - | - | 37.2 |
| FasterRCNN-HRNet_W18-FPN | 适用于对图像分辨率较为敏感、对目标细节预测要求更高的服务器端场景 | 115.5MB | 81.592 | - | - | 36 |
| FasterRCNN-ResNet101_vd-FPN | 超高精度模型,预测时间更长,在处理较大数据量时有较高的精度,适用于服务器端场景 | 244.3MB | 156.097 | - | - | 40.5 |
@@ -74,11 +75,12 @@ PaddleX目前提供了实例分割MaskRCNN模型,支持5种不同的backbone
> 表中GPU预测速度是使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla P40)。
> 表中CPU预测速度 (测试CPU型号为)。
> 表中骁龙855预测速度是使用处理器为骁龙855的手机测试得到。
-> 测速时模型的输入大小为1024 x 2048,mIOU为Cityscapes数据集上评估所得。
+> 测速时模型的输入大小为1024 x 2048,mIoU为Cityscapes数据集上评估所得。
-| 模型 | 模型特点 | 存储体积 | GPU预测速度 | CPU(x86)预测速度(毫秒) | 骁龙855(ARM)预测速度 (毫秒)| mIOU |
+| 模型 | 模型特点 | 存储体积 | GPU预测速度 | CPU(x86)预测速度(毫秒) | 骁龙855(ARM)预测速度 (毫秒)| mIoU |
| :---- | :------- | :---------- | :---------- | :----- | :----- |:--- |
| DeepLabv3p-MobileNetV2_x1.0 | 轻量级模型,适用于移动端场景| - | - | - | - | 69.8% |
+| DeepLabv3-MobileNetV3_large_x1_0_ssld | 轻量级模型,适用于移动端场景| - | - | - | - | 73.28% |
| HRNet_W18_Small_v1 | 轻量高速,适用于移动端场景 | - | - | - | - | - |
| FastSCNN | 轻量高速,适用于追求高速预测的移动端或服务器端场景 | - | - | - | - | 69.64 |
| HRNet_W18 | 高精度模型,适用于对图像分辨率较为敏感、对目标细节预测要求更高的服务器端场景| - | - | - | - | 79.36 |
diff --git a/docs/gui/faq.md b/docs/gui/faq.md
index f90bcbf7dd878ecfcae077cb2cf07bd851ae03b4..2f9f0a9dcc69f203d8b10b22778761de78385abf 100644
--- a/docs/gui/faq.md
+++ b/docs/gui/faq.md
@@ -33,4 +33,4 @@
**如果您有任何问题或建议,欢迎以issue的形式,或加入PaddleX官方QQ群(1045148026)直接反馈您的问题和需求**
-
\ No newline at end of file
+![](images/QR2.jpg)
diff --git a/docs/gui/images/QR2.jpg b/docs/gui/images/QR2.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..633be6a2d8d74a0bd52bee93327bdbcf7ff96139
Binary files /dev/null and b/docs/gui/images/QR2.jpg differ
diff --git a/docs/images/vdl1.jpg b/docs/images/vdl1.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5b0c90d28bc9bda583008fe2fb9729a7c3e06df6
Binary files /dev/null and b/docs/images/vdl1.jpg differ
diff --git a/docs/images/vdl2.jpg b/docs/images/vdl2.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..502a5f861104e2b20869b06cf8eb215ec58f0435
Binary files /dev/null and b/docs/images/vdl2.jpg differ
diff --git a/docs/images/vdl3.jpg b/docs/images/vdl3.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a16d6924d8867949ecae258ee588296845c6da86
Binary files /dev/null and b/docs/images/vdl3.jpg differ
diff --git a/docs/images/xiaoduxiong.jpeg b/docs/images/xiaoduxiong.jpeg
new file mode 100644
index 0000000000000000000000000000000000000000..d8e64639827da47e64033c00b82ef85be6c0b42f
Binary files /dev/null and b/docs/images/xiaoduxiong.jpeg differ
diff --git a/docs/train/classification.md b/docs/train/classification.md
index 76c947e8dda482d7c78d952ba2c593e61feadfd3..008a6d9713de990ffc0a04b4ca8031b7c7c047b9 100644
--- a/docs/train/classification.md
+++ b/docs/train/classification.md
@@ -29,4 +29,4 @@ python mobilenetv3_small_ssld.py
- 【**重要**】针对自己的机器环境和数据,调整训练参数?先了解下PaddleX中训练参数作用。[——>>传送门](../appendix/parameters.md)
- 【**有用**】没有机器资源?使用AIStudio免费的GPU资源在线训练模型。[——>>传送门](https://aistudio.baidu.com/aistudio/projectdetail/450925)
-- 【**拓展**】更多图像分类模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/index.html)。
+- 【**拓展**】更多图像分类模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/classification.md)。
diff --git a/docs/train/index.rst b/docs/train/index.rst
index 54a8a1a7d39019a33a87d1c94ce04b76eb6fb8e8..b922c31268712c9a7c471491fa867746f0d93781 100755
--- a/docs/train/index.rst
+++ b/docs/train/index.rst
@@ -13,3 +13,4 @@ PaddleX集成了PaddleClas、PaddleDetection和PaddleSeg三大CV工具套件中
instance_segmentation.md
semantic_segmentation.md
prediction.md
+ visualdl.md
diff --git a/docs/train/instance_segmentation.md b/docs/train/instance_segmentation.md
index de0f14eaea631e5b398b7fcc6669fcda96878907..2170dbc03577b240945407cfa272e0dd0b5c8a31 100644
--- a/docs/train/instance_segmentation.md
+++ b/docs/train/instance_segmentation.md
@@ -27,4 +27,4 @@ python mask_rcnn_r50_fpn.py
- 【**重要**】针对自己的机器环境和数据,调整训练参数?先了解下PaddleX中训练参数作用。[——>>传送门](../appendix/parameters.md)
- 【**有用**】没有机器资源?使用AIStudio免费的GPU资源在线训练模型。[——>>传送门](https://aistudio.baidu.com/aistudio/projectdetail/450925)
-- 【**拓展**】更多实例分割模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/index.html)。
+- 【**拓展**】更多实例分割模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/instance_segmentation.md)。
diff --git a/docs/train/object_detection.md b/docs/train/object_detection.md
index 4b7da69a865f07d73691f13045bfc7792df783c1..f671ee0cd0ed297a9b012061fb296d12ed2945f2 100644
--- a/docs/train/object_detection.md
+++ b/docs/train/object_detection.md
@@ -13,6 +13,7 @@ PaddleX目前提供了FasterRCNN和YOLOv3两种检测结构,多种backbone模
| [YOLOv3-MobileNetV1](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_mobilenetv1.py) | 29.3% | 99.2MB | 15.442ms | - | 模型小,预测速度快,适用于低性能或移动端设备 |
| [YOLOv3-MobileNetV3](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_mobilenetv3.py) | 31.6% | 100.7MB | 143.322ms | - | 模型小,移动端上预测速度有优势 |
| [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/yolov3_darknet53.py) | 38.9% | 249.2MB | 42.672ms | - | 模型较大,预测速度快,适用于服务端 |
+| [PPYOLO](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/ppyolo.py) | 45.9% | 329.1MB | - | - | 模型较大,预测速度比YOLOv3-DarkNet53更快,适用于服务端 |
| [FasterRCNN-ResNet50-FPN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/faster_rcnn_r50_fpn.py) | 37.2% | 167.7MB | 197.715ms | - | 模型精度高,适用于服务端部署 |
| [FasterRCNN-ResNet18-FPN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/faster_rcnn_r18_fpn.py) | 32.6% | 173.2MB | - | - | 模型精度高,适用于服务端部署 |
| [FasterRCNN-HRNet-FPN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/object_detection/faster_rcnn_hrnet_fpn.py) | 36.0% | 115.5MB | 81.592ms | - | 模型精度高,预测速度快,适用于服务端部署 |
@@ -31,4 +32,4 @@ python yolov3_mobilenetv1.py
- 【**重要**】针对自己的机器环境和数据,调整训练参数?先了解下PaddleX中训练参数作用。[——>>传送门](../appendix/parameters.md)
- 【**有用**】没有机器资源?使用AIStudio免费的GPU资源在线训练模型。[——>>传送门](https://aistudio.baidu.com/aistudio/projectdetail/450925)
-- 【**拓展**】更多目标检测模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/index.html)。
+- 【**拓展**】更多目标检测模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/detection.md)。
diff --git a/docs/train/semantic_segmentation.md b/docs/train/semantic_segmentation.md
index 391df0aca7b3103dc89068cc7a2603bcc86226b0..2224db4e7d8779e37821574672f91e92b93ab87e 100644
--- a/docs/train/semantic_segmentation.md
+++ b/docs/train/semantic_segmentation.md
@@ -4,15 +4,16 @@
PaddleX目前提供了DeepLabv3p、UNet、HRNet和FastSCNN四种语义分割结构,多种backbone模型,可满足开发者不同场景和性能的需求。
-- **mIOU**: 模型在CityScape数据集上的测试精度
+- **mIoU**: 模型在CityScape数据集上的测试精度
- **预测速度**:单张图片的预测用时(不包括预处理和后处理)
- "-"表示指标暂未更新
-| 模型(点击获取代码) | mIOU | 模型大小 | GPU预测速度 | Arm预测速度 | 备注 |
+| 模型(点击获取代码) | mIoU | 模型大小 | GPU预测速度 | Arm预测速度 | 备注 |
| :---------------- | :------- | :------- | :--------- | :--------- | :----- |
| [DeepLabv3p-MobileNetV2-x0.25](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2_x0.25.py) | - | 2.9MB | - | - | 模型小,预测速度快,适用于低性能或移动端设备 |
| [DeepLabv3p-MobileNetV2-x1.0](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2.py) | 69.8% | 11MB | - | - | 模型小,预测速度快,适用于低性能或移动端设备 |
-| [DeepLabv3p-Xception65](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_xception65.pyy) | 79.3% | 158MB | - | - | 模型大,精度高,适用于服务端 |
+| [DeepLabv3_MobileNetV3_large_x1_0_ssld](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv3_large_ssld.py) | 73.28% | 9.3MB | - | - | 模型小,预测速度快,精度较高,适用于低性能或移动端设备 |
+| [DeepLabv3p-Xception65](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/deeplabv3p_xception65.py) | 79.3% | 158MB | - | - | 模型大,精度高,适用于服务端 |
| [UNet](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/unet.py) | - | 52MB | - | - | 模型较大,精度高,适用于服务端 |
| [HRNet](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/hrnet.py) | 79.4% | 37MB | - | - | 模型较小,模型精度高,适用于服务端部署 |
| [FastSCNN](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/semantic_segmentation/fast_scnn.py) | - | 4.5MB | - | - | 模型小,预测速度快,适用于低性能或移动端设备 |
@@ -30,4 +31,4 @@ python deeplabv3p_mobilenetv2_x0.25.py
- 【**重要**】针对自己的机器环境和数据,调整训练参数?先了解下PaddleX中训练参数作用。[——>>传送门](../appendix/parameters.md)
- 【**有用**】没有机器资源?使用AIStudio免费的GPU资源在线训练模型。[——>>传送门](https://aistudio.baidu.com/aistudio/projectdetail/450925)
-- 【**拓展**】更多语义分割模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/index.html)。
+- 【**拓展**】更多语义分割模型,查阅[PaddleX模型库](../appendix/model_zoo.md)和[API使用文档](../apis/models/semantic_segmentation.md)。
diff --git a/docs/train/visualdl.md b/docs/train/visualdl.md
new file mode 100755
index 0000000000000000000000000000000000000000..ac94d6d2c31e838924e7a393024ec4aac75c227a
--- /dev/null
+++ b/docs/train/visualdl.md
@@ -0,0 +1,26 @@
+# VisualDL可视化训练指标
+在使用PaddleX训练模型的过程中,各个训练指标和评估指标会直接输出到标准输出流,同时也可通过VisualDL对训练过程中的指标进行可视化,只需在调用`train`函数时,将`use_vdl`参数设为`True`即可,如下代码所示:
+```
+import paddlex
+
+# 假设train_dataset与eval_dataset为已提前定义的训练/评估数据集
+model = paddlex.cls.ResNet50(num_classes=1000)
+model.train(num_epochs=120, train_dataset=train_dataset,
+ train_batch_size=32, eval_dataset=eval_dataset,
+ log_interval_steps=10, save_interval_epochs=10,
+ save_dir='./output', use_vdl=True)
+```
+
+模型在训练过程中,会在`save_dir`下生成`vdl_log`目录,通过在命令行终端执行以下命令,启动VisualDL。
+```
+visualdl --logdir=output/vdl_log --port=8008
+```
+在浏览器打开`http://0.0.0.0:8008`便可直接查看随训练迭代动态变化的各个指标(0.0.0.0表示启动VisualDL所在服务器的IP,本机使用0.0.0.0即可)。
+
+在训练分类模型过程中,使用VisualDL进行可视化的示例图如下所示。
+
+> 训练过程中每个Step的`Loss`和相应`Top1准确率`变化趋势:
+![](../images/vdl1.jpg)
+
+> 训练过程中每个Step的`学习率lr`和相应`Top5准确率`变化趋势:
+![](../images/vdl2.jpg)
+
+> 训练过程中,每次保存模型时,模型在验证数据集上的`Top1准确率`和`Top5准确率`:
+![](../images/vdl3.jpg)
diff --git a/examples/human_segmentation/deploy/cpp/CMakeLists.txt b/examples/human_segmentation/deploy/cpp/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..fc7a68f389710370d7e7bb0aa11f96596d3f8819
--- /dev/null
+++ b/examples/human_segmentation/deploy/cpp/CMakeLists.txt
@@ -0,0 +1,321 @@
+cmake_minimum_required(VERSION 3.0)
+project(PaddleX CXX C)
+
+option(WITH_MKL "Compile human_segmenter with MKL/OpenBlas support, default use MKL." ON)
+option(WITH_GPU "Compile human_segmenter with GPU/CPU, default use CPU." ON)
+if (NOT WIN32)
+ option(WITH_STATIC_LIB "Compile human_segmenter with static/shared library, default use static." OFF)
+else()
+ option(WITH_STATIC_LIB "Compile human_segmenter with static/shared library, default use static." ON)
+endif()
+option(WITH_TENSORRT "Compile human_segmenter with TensorRT." OFF)
+option(WITH_ENCRYPTION "Compile human_segmenter with encryption tool." OFF)
+
+SET(TENSORRT_DIR "" CACHE PATH "Location of libraries")
+SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
+SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
+SET(ENCRYPTION_DIR"" CACHE PATH "Location of libraries")
+SET(CUDA_LIB "" CACHE PATH "Location of libraries")
+
+if (NOT WIN32)
+ set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
+ set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
+else()
+ set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/paddlex_inference)
+ set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/paddlex_inference)
+ set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/paddlex_inference)
+endif()
+
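+# yaml-cpp在Linux下编译为动态库,在Windows下编译为静态库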
+if (NOT WIN32)
+ SET(YAML_BUILD_TYPE ON CACHE BOOL "yaml build shared library.")
+else()
+ SET(YAML_BUILD_TYPE OFF CACHE BOOL "yaml build shared library.")
+endif()
+include(cmake/yaml-cpp.cmake)
+
+include_directories("${CMAKE_SOURCE_DIR}/")
+include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/ext-yaml-cpp/include")
+link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
+
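+# 在MSVC下将编译选项中的/MD替换为/MT,以使用静态运行时库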
+macro(safe_set_static_flag)
+ foreach(flag_var
+ CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
+ CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
+ if(${flag_var} MATCHES "/MD")
+ string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
+ endif(${flag_var} MATCHES "/MD")
+ endforeach(flag_var)
+endmacro()
+
+
+if (WITH_ENCRYPTION)
+add_definitions( -DWITH_ENCRYPTION=${WITH_ENCRYPTION})
+endif()
+
+if (WITH_MKL)
+ ADD_DEFINITIONS(-DUSE_MKL)
+endif()
+
+if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
+ message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_inference_dir")
+endif()
+
+if (NOT (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64"))
+ if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
+ message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
+ endif()
+endif()
+
+include_directories("${CMAKE_SOURCE_DIR}/")
+include_directories("${PADDLE_DIR}/")
+include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
+include_directories("${PADDLE_DIR}/third_party/install/glog/include")
+include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
+include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
+if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/include")
+ include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
+endif()
+if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/include")
+ include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
+endif()
+# zlib does not exist in 1.8.1
+if (EXISTS "${PADDLE_DIR}/third_party/install/zlib/include")
+ include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
+endif()
+
+include_directories("${PADDLE_DIR}/third_party/boost")
+include_directories("${PADDLE_DIR}/third_party/eigen3")
+
+if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
+ link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
+endif()
+if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
+ link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
+endif()
+
+if (EXISTS "${PADDLE_DIR}/third_party/install/zlib/lib")
+ link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
+endif()
+
+link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
+link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
+link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
+link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
+link_directories("${PADDLE_DIR}/paddle/lib/")
+link_directories("${CMAKE_CURRENT_BINARY_DIR}")
+
+if (WIN32)
+ include_directories("${PADDLE_DIR}/paddle/fluid/inference")
+ include_directories("${PADDLE_DIR}/paddle/include")
+ link_directories("${PADDLE_DIR}/paddle/fluid/inference")
+ find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
+ unset(OpenCV_DIR CACHE)
+else ()
+ if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64") # x86_64 aarch64
+ set(OpenCV_INCLUDE_DIRS "/usr/include/opencv4")
+ file(GLOB OpenCV_LIBS /usr/lib/aarch64-linux-gnu/libopencv_*${CMAKE_SHARED_LIBRARY_SUFFIX})
+ message("OpenCV libs: ${OpenCV_LIBS}")
+ else()
+ find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
+ endif()
+ include_directories("${PADDLE_DIR}/paddle/include")
+ link_directories("${PADDLE_DIR}/paddle/lib")
+endif ()
+include_directories(${OpenCV_INCLUDE_DIRS})
+
+if (WIN32)
+ add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
+ find_package(OpenMP REQUIRED)
+ if (OPENMP_FOUND)
+ message("OPENMP FOUND")
+ set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} ${OpenMP_C_FLAGS}")
+ set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} ${OpenMP_C_FLAGS}")
+ set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} ${OpenMP_CXX_FLAGS}")
+ set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} ${OpenMP_CXX_FLAGS}")
+ endif()
+ set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
+ set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
+ set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
+ set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
+ if (WITH_STATIC_LIB)
+ safe_set_static_flag()
+ add_definitions(-DSTATIC_LIB)
+ endif()
+else()
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O2 -fopenmp -std=c++11")
+ set(CMAKE_STATIC_LIBRARY_PREFIX "")
+endif()
+
+if (WITH_GPU)
+ if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
+ message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda/lib64")
+ endif()
+ if (NOT WIN32)
+ if (NOT DEFINED CUDNN_LIB)
+ message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn/")
+ endif()
+ endif(NOT WIN32)
+endif()
+
+
+if (NOT WIN32)
+ if (WITH_TENSORRT AND WITH_GPU)
+ include_directories("${TENSORRT_DIR}/include")
+ link_directories("${TENSORRT_DIR}/lib")
+ endif()
+endif(NOT WIN32)
+
+if (NOT WIN32)
+ set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
+ if(EXISTS ${NGRAPH_PATH})
+ include(GNUInstallDirs)
+ include_directories("${NGRAPH_PATH}/include")
+ link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
+ set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
+ endif()
+endif()
+
+if(WITH_MKL)
+ include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
+ if (WIN32)
+ set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
+ ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
+ else ()
+ set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
+ ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
+ execute_process(COMMAND cp -r ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib)
+ endif ()
+ set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
+ if(EXISTS ${MKLDNN_PATH})
+ include_directories("${MKLDNN_PATH}/include")
+ if (WIN32)
+ set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
+ else ()
+ set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
+ endif ()
+ endif()
+else()
+ set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
+endif()
+
+if (WIN32)
+ if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX}")
+ set(DEPS
+ ${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
+ else()
+ set(DEPS
+ ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
+ endif()
+endif()
+
+if(WITH_STATIC_LIB)
+ set(DEPS
+ ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
+else()
+ if (NOT WIN32)
+ set(DEPS
+ ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
+ else()
+ set(DEPS
+ ${PADDLE_DIR}/paddle/lib/paddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
+ endif()
+endif()
+
+if (NOT WIN32)
+ set(DEPS ${DEPS}
+ ${MATH_LIB} ${MKLDNN_LIB}
+ glog gflags protobuf z xxhash yaml-cpp
+ )
+ if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
+ set(DEPS ${DEPS} snappystream)
+ endif()
+ if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
+ set(DEPS ${DEPS} snappy)
+ endif()
+else()
+ set(DEPS ${DEPS}
+ ${MATH_LIB} ${MKLDNN_LIB}
+ glog gflags_static libprotobuf xxhash libyaml-cppmt)
+
+ if (EXISTS "${PADDLE_DIR}/third_party/install/zlib/lib")
+ set(DEPS ${DEPS} zlibstatic)
+ endif()
+ set(DEPS ${DEPS} libcmt shlwapi)
+ if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
+ set(DEPS ${DEPS} snappy)
+ endif()
+ if (EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
+ set(DEPS ${DEPS} snappystream)
+ endif()
+endif(NOT WIN32)
+
+if(WITH_GPU)
+ if(NOT WIN32)
+ if (WITH_TENSORRT)
+ set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
+ endif()
+ set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
+ else()
+ set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
+ set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
+ set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
+ endif()
+endif()
+
+if(WITH_ENCRYPTION)
+ if(NOT WIN32)
+ include_directories("${ENCRYPTION_DIR}/include")
+ link_directories("${ENCRYPTION_DIR}/lib")
+ set(DEPS ${DEPS} ${ENCRYPTION_DIR}/lib/libpmodel-decrypt${CMAKE_SHARED_LIBRARY_SUFFIX})
+ else()
+ include_directories("${ENCRYPTION_DIR}/include")
+ link_directories("${ENCRYPTION_DIR}/lib")
+ set(DEPS ${DEPS} ${ENCRYPTION_DIR}/lib/pmodel-decrypt${CMAKE_STATIC_LIBRARY_SUFFIX})
+ endif()
+endif()
+
+if (NOT WIN32)
+ set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread")
+ set(DEPS ${DEPS} ${EXTERNAL_LIB})
+endif()
+
+set(DEPS ${DEPS} ${OpenCV_LIBS})
+add_library(paddlex_inference SHARED src/visualize.cpp src/transforms.cpp src/paddlex.cpp)
+ADD_DEPENDENCIES(paddlex_inference ext-yaml-cpp)
+target_link_libraries(paddlex_inference ${DEPS})
+
+add_executable(human_segmenter human_segmenter.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp)
+ADD_DEPENDENCIES(human_segmenter ext-yaml-cpp)
+target_link_libraries(human_segmenter ${DEPS})
+
+
+if (WIN32 AND WITH_MKL)
+ add_custom_command(TARGET human_segmenter POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
+ )
+ # for encryption
+ if (EXISTS "${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll")
+ add_custom_command(TARGET human_segmenter POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./pmodel-decrypt.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll ./release/pmodel-decrypt.dll
+ )
+ endif()
+endif()
+
+file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/visualize.h"
+DESTINATION "${CMAKE_BINARY_DIR}/include/" )
+file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/config_parser.h"
+DESTINATION "${CMAKE_BINARY_DIR}/include/" )
+file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/transforms.h"
+DESTINATION "${CMAKE_BINARY_DIR}/include/" )
+file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/results.h"
+DESTINATION "${CMAKE_BINARY_DIR}/include/" )
+file(COPY "${CMAKE_SOURCE_DIR}/include/paddlex/paddlex.h"
+DESTINATION "${CMAKE_BINARY_DIR}/include/" )
diff --git a/examples/human_segmentation/deploy/cpp/human_segmenter.cpp b/examples/human_segmentation/deploy/cpp/human_segmenter.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..479c7a7fd469f6fcfa2cf7b980114893a4febd78
--- /dev/null
+++ b/examples/human_segmentation/deploy/cpp/human_segmenter.cpp
@@ -0,0 +1,208 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <glog/logging.h>
+#include <gflags/gflags.h>
+
+#include <algorithm>
+#include <chrono>  // NOLINT
+#include <ctime>
+#include <fstream>
+#include <iostream>
+#include <string>
+#include <utility>
+#include <vector>
+#include "include/paddlex/paddlex.h"
+#include "include/paddlex/visualize.h"
+
+#if defined(__arm__) || defined(__aarch64__)
+#include <opencv2/videoio/legacy/constants_c.h>
+#endif
+
+using namespace std::chrono; // NOLINT
+
+DEFINE_string(model_dir, "", "Path of inference model");
+DEFINE_bool(use_gpu, false, "Inferring with GPU or CPU");
+DEFINE_bool(use_trt, false, "Inferring with TensorRT");
+DEFINE_int32(gpu_id, 0, "GPU card id");
+DEFINE_string(key, "", "Key of encryption");
+DEFINE_string(image, "", "Path of test image file");
+DEFINE_bool(use_camera, false, "Inferring with Camera");
+DEFINE_int32(camera_id, 0, "Camera id");
+DEFINE_string(video_path, "", "Path of input video");
+DEFINE_bool(show_result, false, "show the result of each frame with a window");
+DEFINE_bool(save_result, true, "save the result of each frame to a video");
+DEFINE_string(save_dir, "output", "Path to save visualized image");
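+
+// Illustrative invocations (paths are placeholders):
+//   ./human_segmenter --model_dir=/path/to/inference_model --image=test.jpg
+//   ./human_segmenter --model_dir=/path/to/inference_model --use_camera=1 \
+//       --save_dir=output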
+
+int main(int argc, char** argv) {
+  // Parse command-line flags
+ google::ParseCommandLineFlags(&argc, &argv, true);
+
+  if (FLAGS_model_dir == "") {
+    std::cerr << "--model_dir needs to be defined" << std::endl;
+    return -1;
+  }
+  if (FLAGS_image == "" && FLAGS_video_path == ""
+      && FLAGS_use_camera == false) {
+    std::cerr << "--image, --video_path or --use_camera needs to be defined"
+              << std::endl;
+    return -1;
+  }
+
+ // Load model
+ PaddleX::Model model;
+ model.Init(FLAGS_model_dir,
+ FLAGS_use_gpu,
+ FLAGS_use_trt,
+ FLAGS_gpu_id,
+ FLAGS_key);
+ if (FLAGS_use_camera || FLAGS_video_path != "") {
+ // Open video
+ cv::VideoCapture capture;
+ if (FLAGS_use_camera) {
+ capture.open(FLAGS_camera_id);
+ if (!capture.isOpened()) {
+        std::cout << "Cannot open the camera "
+ << FLAGS_camera_id << "."
+ << std::endl;
+ return -1;
+ }
+ } else {
+ capture.open(FLAGS_video_path);
+ if (!capture.isOpened()) {
+        std::cout << "Cannot open the video "
+ << FLAGS_video_path << "."
+ << std::endl;
+ return -1;
+ }
+ }
+
+ // Create a VideoWriter
+ cv::VideoWriter video_out;
+ std::string video_out_path;
+ if (FLAGS_save_result) {
+ // Get video information: resolution, fps
+      int video_width =
+          static_cast<int>(capture.get(CV_CAP_PROP_FRAME_WIDTH));
+      int video_height =
+          static_cast<int>(capture.get(CV_CAP_PROP_FRAME_HEIGHT));
+      int video_fps = static_cast<int>(capture.get(CV_CAP_PROP_FPS));
+ int video_fourcc;
+ if (FLAGS_use_camera) {
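+        // 828601953 is the integer form of the FourCC 'avc1' (H.264),
+        // matching the .mp4 file written for camera capture below.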
+ video_fourcc = 828601953;
+ } else {
+        video_fourcc = static_cast<int>(capture.get(CV_CAP_PROP_FOURCC));
+ }
+ if (FLAGS_use_camera) {
+ time_t now = time(0);
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir,
+ std::to_string(now) + ".mp4");
+ } else {
+ video_out_path =
+ PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_video_path);
+ }
+ video_out.open(video_out_path.c_str(),
+ video_fourcc,
+ video_fps,
+ cv::Size(video_width, video_height),
+ true);
+ if (!video_out.isOpened()) {
+        std::cout << "Failed to create the video writer!" << std::endl;
+ return -1;
+ }
+ }
+
+ PaddleX::SegResult result;
+ cv::Mat frame;
+ int key;
+ while (capture.read(frame)) {
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ key = cv::waitKey(1);
+        // Exit when `ESC` is pressed; the result video written so far is saved
+ if (key == 27) {
+ break;
+ }
+ } else if (frame.empty()) {
+ break;
+ }
+ // Begin to predict
+ model.predict(frame, &result);
+ // Visualize results
+      std::vector<uint8_t> label_map(result.label_map.data.begin(),
+                                     result.label_map.data.end());
+ cv::Mat mask(result.label_map.shape[0],
+ result.label_map.shape[1],
+ CV_8UC1,
+ label_map.data());
+ int rows = result.label_map.shape[0];
+ int cols = result.label_map.shape[1];
+ cv::Mat vis_img = frame.clone();
+ for (int i = 0; i < rows; i++) {
+ for (int j = 0; j < cols; j++) {
+          int category_id = static_cast<int>(mask.at<uchar>(i, j));
+          if (category_id == 0) {
+            vis_img.at<cv::Vec3b>(i, j)[0] = 255;
+            vis_img.at<cv::Vec3b>(i, j)[1] = 255;
+            vis_img.at<cv::Vec3b>(i, j)[2] = 255;
+ }
+ }
+ }
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::imshow("human_seg", vis_img);
+ }
+ if (FLAGS_save_result) {
+ video_out.write(vis_img);
+ }
+ result.clear();
+ }
+ capture.release();
+ if (FLAGS_save_result) {
+ video_out.release();
+ std::cout << "Visualized output saved as " << video_out_path << std::endl;
+ }
+ if (FLAGS_show_result || FLAGS_use_camera) {
+ cv::destroyAllWindows();
+ }
+ } else {
+ PaddleX::SegResult result;
+ cv::Mat im = cv::imread(FLAGS_image, 1);
+ model.predict(im, &result);
+ // Visualize results
+    std::vector<uint8_t> label_map(result.label_map.data.begin(),
+                                   result.label_map.data.end());
+ cv::Mat mask(result.label_map.shape[0],
+ result.label_map.shape[1],
+ CV_8UC1,
+ label_map.data());
+ int rows = result.label_map.shape[0];
+ int cols = result.label_map.shape[1];
+ cv::Mat vis_img = im.clone();
+ for (int i = 0; i < rows; i++) {
+ for (int j = 0; j < cols; j++) {
+        int category_id = static_cast<int>(mask.at<uchar>(i, j));
+        if (category_id == 0) {
+          vis_img.at<cv::Vec3b>(i, j)[0] = 255;
+          vis_img.at<cv::Vec3b>(i, j)[1] = 255;
+          vis_img.at<cv::Vec3b>(i, j)[2] = 255;
+ }
+ }
+ }
+ std::string save_path =
+ PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_image);
+ cv::imwrite(save_path, vis_img);
+ result.clear();
+ std::cout << "Visualized output saved as " << save_path << std::endl;
+ }
+ return 0;
+}
diff --git a/examples/meter_reader/README.md b/examples/meter_reader/README.md
index f8c8388f395bbf64e7111e873f4e269702b3c6eb..ce5666f5afeecb0dc97dd78429ae132ae52a7723 100644
--- a/examples/meter_reader/README.md
+++ b/examples/meter_reader/README.md
@@ -148,8 +148,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
  | use_gpu | Whether to use the GPU for prediction; valid values are 0 or 1 (default: 0) |
  | gpu_id | GPU device ID (default: 0) |
  | save_dir | Path for saving visualized results (default: "output") |
- | det_key | Key generated when the detection model was encrypted; the default "" loads an unencrypted detection model |
- | seg_key | Key generated when the segmentation model was encrypted; the default "" loads an unencrypted segmentation model |
  | seg_batch_size | Batch size for segmentation (default: 2) |
  | thread_num | Number of threads for segmentation prediction (default: number of CPU cores) |
  | use_camera | Whether to capture images from a camera; valid values are 0 or 1 (default: 0) |
@@ -163,13 +161,20 @@ git clone https://github.com/PaddlePaddle/PaddleX
 Models used for deployment inference must be in the inference format. The pretrained models provided in this case are already in inference format; if you retrained a model, export it by following [Exporting an inference model](https://paddlex.readthedocs.io/zh_CN/latest/tutorials/deploy/deploy_server/deploy_python.html#inference).
 * Predict on a single image with an unencrypted model
-
```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --image=\path\to\meter_test\20190822_168.jpg --use_gpu=1 --use_erode=1 --save_dir=output
```
 * Predict on an image list with an unencrypted model
+  The image list file image_list.txt uses the format shown below. Because absolute paths differ across machines, the file is not provided; generate it for your own environment:
+ ```
+ \path\to\images\1.jpg
+ \path\to\images\2.jpg
+ ...
+ \path\to\images\n.jpg
+ ```
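+  On Windows, one way to generate such a list (illustrative path) is:
+  ```
+  dir /b /s \path\to\images\*.jpg > image_list.txt
+  ```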
+
```shell
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --image_list=\path\to\meter_test\image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
```
@@ -180,12 +185,12 @@ git clone https://github.com/PaddlePaddle/PaddleX
.\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\det_inference_model --seg_model_dir=\path\to\seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```
- * Predict on a single image with an encrypted model
+  * Predict on a single image with an encrypted model
- If the models have not been encrypted yet, encrypt them by following [Encrypting PaddleX models](../../docs/deploy/server/encryption.md#13-加密paddlex模型). For example, say the encrypted detection model lives in `\path\to\encrypted_det_inference_model` with key `yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`, and the encrypted segmentation model lives in `\path\to\encrypted_seg_inference_model` with key `DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`
+  If the models have not been encrypted yet, encrypt them by following [Encrypting PaddleX models](../../docs/deploy/server/encryption.md#13-加密paddlex模型). For example, say the encrypted detection model lives in `\path\to\encrypted_det_inference_model` with key `yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`, and the encrypted segmentation model lives in `\path\to\encrypted_seg_inference_model` with key `DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`
- ```shell
- .\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\encrypted_det_inference_model --seg_model_dir=\path\to\encrypted_seg_inference_model --image=\path\to\test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
+ ```shell
+ .\paddlex_inference\meter_reader.exe --det_model_dir=\path\to\encrypted_det_inference_model --seg_model_dir=\path\to\encrypted_seg_inference_model --image=\path\to\test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
```
 ### Secure deployment on Jetson embedded devices running Linux
@@ -213,8 +218,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
  | use_gpu | Whether to use the GPU for prediction; valid values are 0 or 1 (default: 0) |
  | gpu_id | GPU device ID (default: 0) |
  | save_dir | Path for saving visualized results (default: "output") |
- | det_key | Key generated when the detection model was encrypted; the default "" loads an unencrypted detection model |
- | seg_key | Key generated when the segmentation model was encrypted; the default "" loads an unencrypted segmentation model |
  | seg_batch_size | Batch size for segmentation (default: 2) |
  | thread_num | Number of threads for segmentation prediction (default: number of CPU cores) |
  | use_camera | Whether to capture images from a camera; valid values are 0 or 1 (default: 0) |
@@ -234,6 +237,13 @@ git clone https://github.com/PaddlePaddle/PaddleX
```
 * Predict on an image list with an unencrypted model
+  The image list file image_list.txt uses the format shown below. Because absolute paths differ across machines, the file is not provided; generate it for your own environment:
+ ```
+ \path\to\images\1.jpg
+ \path\to\images\2.jpg
+ ...
+ \path\to\images\n.jpg
+ ```
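+  On Linux, one way to generate such a list (illustrative path) is:
+  ```
+  find /path/to/images -name '*.jpg' > image_list.txt
+  ```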
```shell
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --image_list=/path/to/image_list.txt --use_gpu=1 --use_erode=1 --save_dir=output
@@ -245,15 +255,6 @@ git clone https://github.com/PaddlePaddle/PaddleX
./build/meter_reader/meter_reader --det_model_dir=/path/to/det_inference_model --seg_model_dir=/path/to/seg_inference_model --use_camera=1 --use_gpu=1 --use_erode=1 --save_dir=output
```
- * Predict on a single image with an encrypted model
-
- If the models have not been encrypted yet, encrypt them by following [Encrypting PaddleX models](../../docs/deploy/server/encryption.md#13-加密paddlex模型). For example, say the encrypted detection model lives in `/path/to/encrypted_det_inference_model` with key `yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0=`, and the encrypted segmentation model lives in `/path/to/encrypted_seg_inference_model` with key `DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=`
-
- ```shell
- ./build/meter_reader/meter_reader --det_model_dir=/path/to/encrypted_det_inference_model --seg_model_dir=/path/to/encrypted_seg_inference_model --image=/path/to/test.jpg --use_gpu=1 --use_erode=1 --save_dir=output --det_key yEBLDiBOdlj+5EsNNrABhfDuQGkdcreYcHcncqwdbx0= --seg_key DbVS64I9pFRo5XmQ8MNV2kSGsfEr4FKA6OH9OUhRrsY=
- ```
-
-
 ## Model training
diff --git a/examples/meter_reader/deploy/cpp/meter_reader/meter_reader.cpp b/examples/meter_reader/deploy/cpp/meter_reader/meter_reader.cpp
index 79307fa05eb7b99c753fd978bcec9f0eb1e2f534..04c6f0e5316e9024c4f103e120a72f2f98f34203 100644
--- a/examples/meter_reader/deploy/cpp/meter_reader/meter_reader.cpp
+++ b/examples/meter_reader/deploy/cpp/meter_reader/meter_reader.cpp
@@ -51,7 +51,8 @@ DEFINE_string(seg_key, "", "Segmenter model key of encryption");
DEFINE_string(image, "", "Path of test image file");
DEFINE_string(image_list, "", "Path of test image list file");
DEFINE_string(save_dir, "output", "Path to save visualized image");
-DEFINE_double(score_threshold, 0.5, "Detected bbox whose score is lower than this threshlod is filtered");
+DEFINE_double(score_threshold, 0.5,
+              "Detected bbox whose score is lower than this threshold is filtered");
void predict(const cv::Mat &input_image, PaddleX::Model *det_model,
PaddleX::Model *seg_model, const std::string save_dir,
@@ -207,7 +208,7 @@ int main(int argc, char **argv) {
return -1;
}
- // 加载模型
+ // Load model
PaddleX::Model det_model;
det_model.Init(FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_use_trt,
FLAGS_gpu_id, FLAGS_det_key);
diff --git a/paddlex/__init__.py b/paddlex/__init__.py
index 404c1789118cd1a4edc1e320892dbef30fca0fef..25fd9f4ec65108feae0cb62743d91468967b88c4 100644
--- a/paddlex/__init__.py
+++ b/paddlex/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cls.py b/paddlex/cls.py
index 90c5eefce512c966a04975ebfe6457613012c872..7711fe77b4b7a7632401e30a2aeb4b6801ddf35f 100644
--- a/paddlex/cls.py
+++ b/paddlex/cls.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/command.py b/paddlex/command.py
index bd2922a439e7b7921c5b5a1307d0c947cf5fd982..4fde4b879c55eb4a278b9089fa4b4b9b0d38c7a5 100644
--- a/paddlex/command.py
+++ b/paddlex/command.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -15,6 +15,7 @@
from six import text_type as _text_type
import argparse
import sys
+import os.path as osp
import paddlex.utils.logging as logging
@@ -91,6 +92,33 @@ def arg_parser():
"-fs",
default=None,
help="export inference model with fixed input shape:[w,h]")
+ parser.add_argument(
+ "--split_dataset",
+ "-sd",
+ action="store_true",
+ default=False,
+ help="split dataset with the split value")
+ parser.add_argument(
+ "--format",
+ "-f",
+ default=None,
+ help="define dataset format(ImageNet/COCO/VOC/Seg)")
+ parser.add_argument(
+ "--dataset_dir",
+ "-dd",
+ type=_text_type,
+ default=None,
+ help="define the path of dataset to be splited")
+ parser.add_argument(
+ "--val_value",
+ "-vv",
+ default=None,
+ help="define the value of validation dataset(E.g 0.2)")
+ parser.add_argument(
+ "--test_value",
+ "-tv",
+ default=None,
+ help="define the value of test dataset(E.g 0.1)")
return parser
@@ -159,6 +187,30 @@ def main():
pdx.tools.convert.dataset_conversion(args.source, args.to, args.pics,
args.annotations, args.save_dir)
+ if args.split_dataset:
+        assert args.dataset_dir is not None, "--dataset_dir should be defined while splitting the dataset"
+        assert args.format is not None, "--format should be defined while splitting the dataset"
+        assert args.val_value is not None, "--val_value should be defined while splitting the dataset"
+
+ dataset_dir = args.dataset_dir
+ dataset_format = args.format.lower()
+ val_value = float(args.val_value)
+ test_value = float(args.test_value
+ if args.test_value is not None else 0)
+ save_dir = dataset_dir
+
+        if dataset_format not in ["coco", "imagenet", "voc", "seg"]:
+            logging.error(
+                "The dataset format is incorrectly defined (supported formats: COCO/ImageNet/VOC/Seg)"
+            )
+ if not osp.exists(dataset_dir):
+ logging.error("The path of dataset to be splited doesn't exist.")
+        if val_value <= 0 or val_value >= 1 or test_value < 0 or test_value >= 1 or val_value + test_value >= 1:
+            logging.error(
+                "Invalid split ratios: require 0 < val_value < 1, 0 <= test_value < 1 and val_value + test_value < 1."
+            )
+
+ pdx.tools.split.dataset_split(dataset_dir, dataset_format, val_value,
+ test_value, save_dir)
+
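+# Illustrative CLI usage of the splitter above (path and ratios are
+# placeholders):
+#   paddlex --split_dataset --format VOC --dataset_dir ./MyDataset \
+#           --val_value 0.2 --test_value 0.1
+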
if __name__ == "__main__":
main()
diff --git a/paddlex/convertor.py b/paddlex/convertor.py
index 4224993cccb3429055ec5bf1a5703329c800e9cf..c41ec959cb7da2b999e300df4234aa35b0611d6d 100644
--- a/paddlex/convertor.py
+++ b/paddlex/convertor.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/__init__.py b/paddlex/cv/__init__.py
index de2ed215de0a00a69da827683ad6563afd862ed9..0d1a546e7c0513619335dd86d6dcdfbfd0f8e042 100644
--- a/paddlex/cv/__init__.py
+++ b/paddlex/cv/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -26,6 +26,7 @@ ResNet50 = models.ResNet50
DarkNet53 = models.DarkNet53
# detection
YOLOv3 = models.YOLOv3
+PPYOLO = models.PPYOLO
#EAST = models.EAST
FasterRCNN = models.FasterRCNN
MaskRCNN = models.MaskRCNN
diff --git a/paddlex/cv/datasets/__init__.py b/paddlex/cv/datasets/__init__.py
index 926f4942d844c7562de9e977b10446328b9f8303..bd5275246eaf0f9357417de28c6f7c4eb68f3f07 100644
--- a/paddlex/cv/datasets/__init__.py
+++ b/paddlex/cv/datasets/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/datasets/coco.py b/paddlex/cv/datasets/coco.py
index 264b2da1e6a6aa9e15bf8a2ae9b3fbdc3ee75f1b..8cc93c3a677e4d79562fc2161e99c57b6c508d28 100644
--- a/paddlex/cv/datasets/coco.py
+++ b/paddlex/cv/datasets/coco.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -15,6 +15,8 @@
from __future__ import absolute_import
import copy
import os.path as osp
+import six
+import sys
import random
import numpy as np
import paddlex.utils.logging as logging
@@ -48,6 +50,12 @@ class CocoDetection(VOCDetection):
shuffle=False):
from pycocotools.coco import COCO
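+        # The COCO polygon/RLE handling below depends on shapely; import it up
+        # front and re-raise with the original traceback if it is missing.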
+ try:
+ import shapely.ops
+ from shapely.geometry import Polygon, MultiPolygon, GeometryCollection
+ except:
+ six.reraise(*sys.exc_info())
+
super(VOCDetection, self).__init__(
transforms=transforms,
num_workers=num_workers,
diff --git a/paddlex/cv/datasets/dataset.py b/paddlex/cv/datasets/dataset.py
index 15ba5055d0635069bc53245cacc298f6f3d6f0ef..82a29f5443c56c9caab2ad725e72493e0bc4bd51 100644
--- a/paddlex/cv/datasets/dataset.py
+++ b/paddlex/cv/datasets/dataset.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -115,7 +115,7 @@ def multithread_reader(mapper,
while not isinstance(sample, EndSignal):
batch_data.append(sample)
if len(batch_data) == batch_size:
- batch_data = generate_minibatch(batch_data)
+ batch_data = generate_minibatch(batch_data, mapper=mapper)
yield batch_data
batch_data = []
sample = out_queue.get()
@@ -127,11 +127,11 @@ def multithread_reader(mapper,
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
- batch_data = generate_minibatch(batch_data)
+ batch_data = generate_minibatch(batch_data, mapper=mapper)
yield batch_data
batch_data = []
if not drop_last and len(batch_data) != 0:
- batch_data = generate_minibatch(batch_data)
+ batch_data = generate_minibatch(batch_data, mapper=mapper)
yield batch_data
batch_data = []
@@ -188,18 +188,21 @@ def multiprocess_reader(mapper,
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
- batch_data = generate_minibatch(batch_data)
+ batch_data = generate_minibatch(batch_data, mapper=mapper)
yield batch_data
batch_data = []
if len(batch_data) != 0 and not drop_last:
- batch_data = generate_minibatch(batch_data)
+ batch_data = generate_minibatch(batch_data, mapper=mapper)
yield batch_data
batch_data = []
return queue_reader
-def generate_minibatch(batch_data, label_padding_value=255):
+def generate_minibatch(batch_data, label_padding_value=255, mapper=None):
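+    # If the reader's mapper carries batch-level transforms (ops applied to the
+    # whole minibatch rather than one sample), run them before any padding.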
+ if mapper is not None and mapper.batch_transforms is not None:
+ for op in mapper.batch_transforms:
+ batch_data = op(batch_data)
# if batch_size is 1, do not pad the image
if len(batch_data) == 1:
return batch_data
@@ -218,14 +221,13 @@ def generate_minibatch(batch_data, label_padding_value=255):
(im_c, max_shape[1], max_shape[2]), dtype=np.float32)
padding_im[:, :im_h, :im_w] = data[0]
if len(data) > 2:
- # padding the image, label and insert 'padding' into `im_info` of segmentation during evaluating phase.
+ # padding the image, label and insert 'padding' into `im_info` of segmentation during evaluating phase.
if len(data[1]) == 0 or 'padding' not in [
data[1][i][0] for i in range(len(data[1]))
]:
data[1].append(('padding', [im_h, im_w]))
padding_batch.append((padding_im, data[1], data[2]))
-
elif len(data) > 1:
if isinstance(data[1], np.ndarray) and len(data[1].shape) > 1:
# padding the image and label of segmentation during the training
diff --git a/paddlex/cv/datasets/easydata_cls.py b/paddlex/cv/datasets/easydata_cls.py
index 9c07aa3cfaf87ecf569cebf670dc523efee96fdd..68607b18b8b66f316120fd378d683bfc4b421873 100644
--- a/paddlex/cv/datasets/easydata_cls.py
+++ b/paddlex/cv/datasets/easydata_cls.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/datasets/easydata_det.py b/paddlex/cv/datasets/easydata_det.py
index 65d5471bfd6ab8651cbdc856963d5b7f65dc9acf..445b4e6a725c19b9002c463a75e6361f164fefba 100644
--- a/paddlex/cv/datasets/easydata_det.py
+++ b/paddlex/cv/datasets/easydata_det.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -25,6 +25,7 @@ from .voc import VOCDetection
from .dataset import is_pic
from .dataset import get_encoding
+
class EasyDataDet(VOCDetection):
"""读取EasyDataDet格式的检测数据集,并对样本进行相应的处理。
@@ -41,7 +42,7 @@ class EasyDataDet(VOCDetection):
             thread mode or 'process' mode. Defaults to 'process' (on Windows and Mac, thread is forced and this argument has no effect).
         shuffle (bool): Whether to shuffle the samples in the dataset. Defaults to False.
"""
-
+
def __init__(self,
data_dir,
file_list,
@@ -60,12 +61,12 @@ class EasyDataDet(VOCDetection):
self.file_list = list()
self.labels = list()
self._epoch = 0
-
+
annotations = {}
annotations['images'] = []
annotations['categories'] = []
annotations['annotations'] = []
-
+
cname2cid = {}
label_id = 1
with open(label_list, encoding=get_encoding(label_list)) as fr:
@@ -80,7 +81,7 @@ class EasyDataDet(VOCDetection):
'id': v,
'name': k
})
-
+
from pycocotools.mask import decode
ct = 0
ann_ct = 0
@@ -95,8 +96,8 @@ class EasyDataDet(VOCDetection):
if not osp.isfile(json_file):
continue
if not osp.exists(img_file):
- raise IOError(
- 'The image file {} is not exist!'.format(img_file))
+                raise IOError('The image file {} does not exist!'.format(
+                    img_file))
with open(json_file, mode='r', \
encoding=get_encoding(json_file)) as j:
json_info = json.load(j)
@@ -127,21 +128,15 @@ class EasyDataDet(VOCDetection):
mask = decode(mask_dict)
gt_poly[i] = self.mask2polygon(mask)
annotations['annotations'].append({
- 'iscrowd':
- 0,
- 'image_id':
- int(im_id[0]),
+ 'iscrowd': 0,
+ 'image_id': int(im_id[0]),
'bbox': [x1, y1, x2 - x1 + 1, y2 - y1 + 1],
- 'area':
- float((x2 - x1 + 1) * (y2 - y1 + 1)),
- 'segmentation':
- [[x1, y1, x1, y2, x2, y2, x2, y1]] if gt_poly[i] is None else gt_poly[i],
- 'category_id':
- cname2cid[cname],
- 'id':
- ann_ct,
- 'difficult':
- 0
+ 'area': float((x2 - x1 + 1) * (y2 - y1 + 1)),
+ 'segmentation': [[x1, y1, x1, y2, x2, y2, x2, y1]]
+ if gt_poly[i] is None else gt_poly[i],
+ 'category_id': cname2cid[cname],
+ 'id': ann_ct,
+ 'difficult': 0
})
ann_ct += 1
im_info = {
@@ -162,14 +157,10 @@ class EasyDataDet(VOCDetection):
self.file_list.append([img_file, voc_rec])
ct += 1
annotations['images'].append({
- 'height':
- im_h,
- 'width':
- im_w,
- 'id':
- int(im_id[0]),
- 'file_name':
- osp.split(img_file)[1]
+ 'height': im_h,
+ 'width': im_w,
+ 'id': int(im_id[0]),
+ 'file_name': osp.split(img_file)[1]
})
if not len(self.file_list) > 0:
@@ -181,13 +172,13 @@ class EasyDataDet(VOCDetection):
self.coco_gt = COCO()
self.coco_gt.dataset = annotations
self.coco_gt.createIndex()
-
+
def mask2polygon(self, mask):
contours, hierarchy = cv2.findContours(
- (mask).astype(np.uint8), cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
+ (mask).astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
segmentation = []
for contour in contours:
contour_list = contour.flatten().tolist()
if len(contour_list) > 4:
segmentation.append(contour_list)
- return segmentation
\ No newline at end of file
+ return segmentation
diff --git a/paddlex/cv/datasets/easydata_seg.py b/paddlex/cv/datasets/easydata_seg.py
index 5e938cca10a346bf1c92ae65413c801d589da5e9..6b706fbd63d77c1b6f2c693cd43fb8b5c50a1e24 100644
--- a/paddlex/cv/datasets/easydata_seg.py
+++ b/paddlex/cv/datasets/easydata_seg.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -25,6 +25,7 @@ from .dataset import Dataset
from .dataset import get_encoding
from .dataset import is_pic
+
class EasyDataSeg(Dataset):
"""读取EasyDataSeg语义分割任务数据集,并对样本进行相应的处理。
@@ -67,7 +68,7 @@ class EasyDataSeg(Dataset):
cname2cid[line.strip()] = label_id
label_id += 1
self.labels.append(line.strip())
-
+
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
img_file, json_file = [osp.join(data_dir, x) \
@@ -79,8 +80,8 @@ class EasyDataSeg(Dataset):
if not osp.isfile(json_file):
continue
if not osp.exists(img_file):
- raise IOError(
- 'The image file {} is not exist!'.format(img_file))
+                raise IOError('The image file {} does not exist!'.format(
+                    img_file))
with open(json_file, mode='r', \
encoding=get_encoding(json_file)) as j:
json_info = json.load(j)
@@ -97,7 +98,8 @@ class EasyDataSeg(Dataset):
mask_dict['counts'] = obj['mask'].encode()
mask = decode(mask_dict)
mask *= cid
- conflict_index = np.where(((lable_npy > 0) & (mask == cid)) == True)
+ conflict_index = np.where(((lable_npy > 0) &
+ (mask == cid)) == True)
mask[conflict_index] = 0
lable_npy += mask
self.file_list.append([img_file, lable_npy])
diff --git a/paddlex/cv/datasets/imagenet.py b/paddlex/cv/datasets/imagenet.py
index 41024e6a37b365d91a83f075c8b289e1f9f8a826..ea93d583d6c35eff5b23f495ee006b8582effe3d 100644
--- a/paddlex/cv/datasets/imagenet.py
+++ b/paddlex/cv/datasets/imagenet.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/datasets/shared_queue/__init__.py b/paddlex/cv/datasets/shared_queue/__init__.py
index f4c3990e67d6ade96d20abd1aa34b34b1ff891cb..29a5e0e06754274dc83fca71dcc722b086115aa4 100644
--- a/paddlex/cv/datasets/shared_queue/__init__.py
+++ b/paddlex/cv/datasets/shared_queue/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/datasets/shared_queue/queue.py b/paddlex/cv/datasets/shared_queue/queue.py
index 157df0a51ee3d552c810bafe5e826c1072c75649..85b126fa7bd62fca5dd831320e4fe42c4aa3c10c 100644
--- a/paddlex/cv/datasets/shared_queue/queue.py
+++ b/paddlex/cv/datasets/shared_queue/queue.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/datasets/shared_queue/sharedmemory.py b/paddlex/cv/datasets/shared_queue/sharedmemory.py
index 2712fc42b728ee87bf4413fab869cbc9e7609029..c05834e02747cc7a9db1a9d218764869c4aac4fd 100644
--- a/paddlex/cv/datasets/shared_queue/sharedmemory.py
+++ b/paddlex/cv/datasets/shared_queue/sharedmemory.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -278,8 +278,8 @@ class PageAllocator(object):
def set_alloc_info(self, alloc_pos, used_pages):
""" set allocating position to new value
"""
- memcopy(self._base[4:12], struct.pack(
- str('II'), alloc_pos, used_pages))
+ memcopy(self._base[4:12],
+ struct.pack(str('II'), alloc_pos, used_pages))
def set_page_status(self, start, page_num, status):
""" set pages from 'start' to 'end' with new same status 'status'
@@ -525,8 +525,8 @@ class SharedMemoryMgr(object):
logger.info('destroy [%s]' % (self))
if not self._released and not self._allocator.empty():
- logger.debug(
- 'not empty when delete this SharedMemoryMgr[%s]' % (self))
+ logger.debug('not empty when delete this SharedMemoryMgr[%s]' %
+ (self))
else:
self._released = True
diff --git a/paddlex/cv/datasets/voc.py b/paddlex/cv/datasets/voc.py
index 410c9f7d4a7d02c5743491723226a5cfbdd6c182..fae619b31bbf2a173fe949618c997b98a616636b 100644
--- a/paddlex/cv/datasets/voc.py
+++ b/paddlex/cv/datasets/voc.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/models/__init__.py b/paddlex/cv/models/__init__.py
index 1c7e4b35bc7387c3f5c536e74edc0feafa1811d9..679f8bf52cfe4b8a4a611dd5ad7641845e05efba 100644
--- a/paddlex/cv/models/__init__.py
+++ b/paddlex/cv/models/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -38,6 +38,7 @@ from .classifier import HRNet_W18
from .classifier import AlexNet
from .base import BaseAPI
from .yolo_v3 import YOLOv3
+from .ppyolo import PPYOLO
from .faster_rcnn import FasterRCNN
from .mask_rcnn import MaskRCNN
from .unet import UNet
diff --git a/paddlex/cv/models/base.py b/paddlex/cv/models/base.py
index 399a6708faeeb694052d5b4c27c95dd13bf71d6b..19bf4f034a2fb2c0c42126843913517f8c7cb56a 100644
--- a/paddlex/cv/models/base.py
+++ b/paddlex/cv/models/base.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -246,8 +246,8 @@ class BaseAPI:
logging.info(
"Load pretrain weights from {}.".format(pretrain_weights),
use_color=True)
- paddlex.utils.utils.load_pretrain_weights(self.exe, self.train_prog,
- pretrain_weights, fuse_bn)
+ paddlex.utils.utils.load_pretrain_weights(
+ self.exe, self.train_prog, pretrain_weights, fuse_bn)
         # Prune the model
if sensitivities_file is not None:
import paddleslim
@@ -351,7 +351,9 @@ class BaseAPI:
logging.info("Model saved in {}.".format(save_dir))
def export_inference_model(self, save_dir):
- test_input_names = [var.name for var in list(self.test_inputs.values())]
+ test_input_names = [
+ var.name for var in list(self.test_inputs.values())
+ ]
test_outputs = list(self.test_outputs.values())
with fluid.scope_guard(self.scope):
if self.__class__.__name__ == 'MaskRCNN':
@@ -389,7 +391,8 @@ class BaseAPI:
         # Marker file indicating the model was saved successfully
open(osp.join(save_dir, '.success'), 'w').close()
- logging.info("Model for inference deploy saved in {}.".format(save_dir))
+ logging.info("Model for inference deploy saved in {}.".format(
+ save_dir))
def train_loop(self,
num_epochs,
@@ -516,11 +519,13 @@ class BaseAPI:
eta = ((num_epochs - i) * total_num_steps - step - 1
) * avg_step_time
if time_eval_one_epoch is not None:
- eval_eta = (total_eval_times - i // save_interval_epochs
- ) * time_eval_one_epoch
+ eval_eta = (
+ total_eval_times - i // save_interval_epochs
+ ) * time_eval_one_epoch
else:
- eval_eta = (total_eval_times - i // save_interval_epochs
- ) * total_num_steps_eval * avg_step_time
+ eval_eta = (
+ total_eval_times - i // save_interval_epochs
+ ) * total_num_steps_eval * avg_step_time
eta_str = seconds_to_hms(eta + eval_eta)
logging.info(
@@ -543,6 +548,8 @@ class BaseAPI:
current_save_dir = osp.join(save_dir, "epoch_{}".format(i + 1))
if not osp.isdir(current_save_dir):
os.makedirs(current_save_dir)
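+            # When EMA is enabled (e.g. PPYOLO trained with use_ema=True),
+            # evaluate and checkpoint with the moving-average weights; they are
+            # restored right after saving (see restore_program below).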
+ if getattr(self, 'use_ema', False):
+ self.exe.run(self.ema.apply_program)
if eval_dataset is not None and eval_dataset.num_samples > 0:
self.eval_metrics, self.eval_details = self.evaluate(
eval_dataset=eval_dataset,
@@ -569,6 +576,8 @@ class BaseAPI:
log_writer.add_scalar(
"Metrics/Eval(Epoch): {}".format(k), v, i + 1)
self.save_model(save_dir=current_save_dir)
+ if getattr(self, 'use_ema', False):
+ self.exe.run(self.ema.restore_program)
time_eval_one_epoch = time.time() - eval_epoch_start_time
eval_epoch_start_time = time.time()
if best_model_epoch > 0:
diff --git a/paddlex/cv/models/classifier.py b/paddlex/cv/models/classifier.py
index 3e8b70ea35a2f40ba2dadd98a385de68480bcd8e..7f1c3527d8c681e8737e6a65a898ec083495bf4b 100644
--- a/paddlex/cv/models/classifier.py
+++ b/paddlex/cv/models/classifier.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/models/deeplabv3p.py b/paddlex/cv/models/deeplabv3p.py
index d5afd4659d56531a6c483456e6e8ae35d40400b2..fe1c294ae61d5d7e6e18696e56ff22909d8cc6c8 100644
--- a/paddlex/cv/models/deeplabv3p.py
+++ b/paddlex/cv/models/deeplabv3p.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -37,7 +37,7 @@ class DeepLabv3p(BaseAPI):
         num_classes (int): Number of classes.
         backbone (str): Backbone network of DeepLabv3+ used to compute the feature maps. One of ['Xception65', 'Xception41',
             'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
-            'MobileNetV2_x2.0']. Defaults to 'MobileNetV2_x1.0'.
+            'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld']. Defaults to 'MobileNetV2_x1.0'.
         output_stride (int): Downsampling factor of the backbone's output feature map relative to the input, usually 8 or 16. Defaults to 16.
         aspp_with_sep_conv (bool): Whether the ASPP module uses separable convolutions. Defaults to True.
         decoder_use_sep_conv (bool): Whether the decoder module uses separable convolutions. Defaults to True.
@@ -51,10 +51,13 @@ class DeepLabv3p(BaseAPI):
             computed automatically: each class is weighted by its pixel ratio * num_classes. With the default None, every class has weight 1,
             i.e. the ordinary cross-entropy loss.
         ignore_index (int): Label value to ignore; pixels labeled ignore_index do not contribute to the loss. Defaults to 255.
+        pooling_crop_size (list): When the backbone is MobileNetV3_large_x1_0_ssld, this must be set to the model input size used during training, as [W, H].
+            It is used when the encoder computes the image-level average: if None, a plain mean is taken; if set to the model input size, the 'pool' operator is used instead.
+            Defaults to None.
     Raises:
         ValueError: use_bce_loss or use_dice_loss is True while num_classes > 2.
         ValueError: backbone is not one of ['Xception65', 'Xception41', 'MobileNetV2_x0.25',
-            'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0'].
+            'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld'].
         ValueError: class_weight is a list whose length does not equal num_classes,
             or class_weight is a str and class_weight.lower() != 'dynamic'.
         TypeError: class_weight is not None but is neither a list nor a str.
@@ -71,7 +74,8 @@ class DeepLabv3p(BaseAPI):
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
- ignore_index=255):
+ ignore_index=255,
+ pooling_crop_size=None):
self.init_params = locals()
super(DeepLabv3p, self).__init__('segmenter')
# dice_loss或bce_loss只适用两类分割中
@@ -85,12 +89,12 @@ class DeepLabv3p(BaseAPI):
if backbone not in [
'Xception65', 'Xception41', 'MobileNetV2_x0.25',
'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
- 'MobileNetV2_x2.0'
+ 'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld'
]:
raise ValueError(
"backbone: {} is set wrong. it should be one of "
"('Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5',"
- " 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0')".
+ " 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0', 'MobileNetV3_large_x1_0_ssld')".
format(backbone))
if class_weight is not None:
@@ -121,6 +125,30 @@ class DeepLabv3p(BaseAPI):
self.labels = None
self.sync_bn = True
self.fixed_input_shape = None
+ self.pooling_stride = [1, 1]
+ self.pooling_crop_size = pooling_crop_size
+ self.aspp_with_se = False
+ self.se_use_qsigmoid = False
+ self.aspp_convs_filters = 256
+ self.aspp_with_concat_projection = True
+ self.add_image_level_feature = True
+ self.use_sum_merge = False
+ self.conv_filters = 256
+ self.output_is_logits = False
+ self.backbone_lr_mult_list = None
+ if 'MobileNetV3' in backbone:
+ self.output_stride = 32
+ self.pooling_stride = (4, 5)
+ self.aspp_with_se = True
+ self.se_use_qsigmoid = True
+ self.aspp_convs_filters = 128
+ self.aspp_with_concat_projection = False
+ self.add_image_level_feature = False
+ self.use_sum_merge = True
+ self.output_is_logits = True
+ if self.output_is_logits:
+ self.conv_filters = self.num_classes
+ self.backbone_lr_mult_list = [0.15, 0.35, 0.65, 0.85, 1]
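+        # Illustrative configuration (values are examples, not defaults):
+        #   DeepLabv3p(num_classes=2, backbone='MobileNetV3_large_x1_0_ssld',
+        #              pooling_crop_size=[512, 512])
+        # pooling_crop_size must match the [W, H] input size used in training,
+        # as documented above.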
def _get_backbone(self, backbone):
def mobilenetv2(backbone):
@@ -167,10 +195,22 @@ class DeepLabv3p(BaseAPI):
end_points=end_points,
decode_points=decode_points)
+ def mobilenetv3(backbone):
+ scale = 1.0
+ lr_mult_list = self.backbone_lr_mult_list
+ return paddlex.cv.nets.MobileNetV3(
+ scale=scale,
+ model_name='large',
+ output_stride=self.output_stride,
+ lr_mult_list=lr_mult_list,
+ for_seg=True)
+
if 'Xception' in backbone:
return xception(backbone)
elif 'MobileNetV2' in backbone:
return mobilenetv2(backbone)
+ elif 'MobileNetV3' in backbone:
+ return mobilenetv3(backbone)
def build_net(self, mode='train'):
model = paddlex.cv.nets.segmentation.DeepLabv3p(
@@ -186,7 +226,17 @@ class DeepLabv3p(BaseAPI):
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index,
- fixed_input_shape=self.fixed_input_shape)
+ fixed_input_shape=self.fixed_input_shape,
+ pooling_stride=self.pooling_stride,
+ pooling_crop_size=self.pooling_crop_size,
+ aspp_with_se=self.aspp_with_se,
+ se_use_qsigmoid=self.se_use_qsigmoid,
+ aspp_convs_filters=self.aspp_convs_filters,
+ aspp_with_concat_projection=self.aspp_with_concat_projection,
+ add_image_level_feature=self.add_image_level_feature,
+ use_sum_merge=self.use_sum_merge,
+ conv_filters=self.conv_filters,
+ output_is_logits=self.output_is_logits)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
@@ -360,18 +410,16 @@ class DeepLabv3p(BaseAPI):
pred = pred[0:num_samples]
for i in range(num_samples):
- one_pred = pred[i].astype('uint8')
+ one_pred = np.squeeze(pred[i]).astype('uint8')
one_label = labels[i]
for info in im_info[i][::-1]:
if info[0] == 'resize':
w, h = info[1][1], info[1][0]
- one_pred = cv2.resize(one_pred, (w, h), cv2.INTER_NEAREST)
+ one_pred = cv2.resize(one_pred, (w, h),
+ cv2.INTER_NEAREST)
elif info[0] == 'padding':
w, h = info[1][1], info[1][0]
one_pred = one_pred[0:h, 0:w]
- else:
- raise Exception("Unexpected info '{}' in im_info".format(
- info[0]))
one_pred = one_pred.astype('int64')
one_pred = one_pred[np.newaxis, :, :, np.newaxis]
one_label = one_label[np.newaxis, np.newaxis, :, :]
@@ -429,9 +477,6 @@ class DeepLabv3p(BaseAPI):
w, h = info[1][1], info[1][0]
pred = pred[0:h, 0:w]
logit = logit[0:h, 0:w, :]
- else:
- raise Exception("Unexpected info '{}' in im_info".format(
- info[0]))
pred_list.append(pred)
logit_list.append(logit)
diff --git a/paddlex/cv/models/fast_scnn.py b/paddlex/cv/models/fast_scnn.py
index 5f66e4df6ede1b48c0363b5b8a496b23021454ef..36f6ffbb887ce868c38578dec18e099a71fb7f02 100644
--- a/paddlex/cv/models/fast_scnn.py
+++ b/paddlex/cv/models/fast_scnn.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/models/faster_rcnn.py b/paddlex/cv/models/faster_rcnn.py
index a6b8f2a118c6aa1681f853da243b812aaf8b030a..3ab4da52899a7d122a68d2de17666addc8ae4849 100644
--- a/paddlex/cv/models/faster_rcnn.py
+++ b/paddlex/cv/models/faster_rcnn.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/models/hrnet.py b/paddlex/cv/models/hrnet.py
index 691114da8caffb2bf86860ed51cd07e449ae7cd7..8d9a224de34c91ea9663d2fe4cbed2683f817662 100644
--- a/paddlex/cv/models/hrnet.py
+++ b/paddlex/cv/models/hrnet.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/models/load_model.py b/paddlex/cv/models/load_model.py
index 7c9aa265d4e863e0e3b97e4460a98313b58e40dd..afccc44506079eea4b6043610dedefb0a8be5334 100644
--- a/paddlex/cv/models/load_model.py
+++ b/paddlex/cv/models/load_model.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -103,8 +103,8 @@ def load_model(model_dir, fixed_input_shape=None):
model.model_type, info['Transforms'], info['BatchTransforms'])
model.eval_transforms = copy.deepcopy(model.test_transforms)
else:
- model.test_transforms = build_transforms(model.model_type,
- info['Transforms'], to_rgb)
+ model.test_transforms = build_transforms(
+ model.model_type, info['Transforms'], to_rgb)
model.eval_transforms = copy.deepcopy(model.test_transforms)
if '_Attributes' in info:
diff --git a/paddlex/cv/models/mask_rcnn.py b/paddlex/cv/models/mask_rcnn.py
index 888cd21725b68ea7e467681f2ac42789c2a72d81..7f31cd530ff0d6660e65661531b442941c88a336 100644
--- a/paddlex/cv/models/mask_rcnn.py
+++ b/paddlex/cv/models/mask_rcnn.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -280,8 +280,9 @@ class MaskRCNN(FasterRCNN):
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
results = list()
- logging.info("Start to evaluating(total_samples={}, total_steps={})...".
- format(eval_dataset.num_samples, total_steps))
+        logging.info(
+            "Start evaluating (total_samples={}, total_steps={})...".format(
+                eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(
enumerate(data_generator()), total=total_steps):
images = np.array([d[0] for d in data]).astype('float32')
@@ -325,7 +326,8 @@ class MaskRCNN(FasterRCNN):
zip(['bbox_map', 'segm_map'],
[ap_stats[0][1], ap_stats[1][1]]))
else:
- metrics = OrderedDict(zip(['bbox_map', 'segm_map'], [0.0, 0.0]))
+ metrics = OrderedDict(
+ zip(['bbox_map', 'segm_map'], [0.0, 0.0]))
elif metric == 'COCO':
if isinstance(ap_stats[0], np.ndarray) and isinstance(ap_stats[1],
np.ndarray):
@@ -429,8 +431,8 @@ class MaskRCNN(FasterRCNN):
if transforms is None:
transforms = self.test_transforms
im, im_resize_info, im_shape = FasterRCNN._preprocess(
- img_file_list, transforms, self.model_type, self.__class__.__name__,
- thread_num)
+ img_file_list, transforms, self.model_type,
+ self.__class__.__name__, thread_num)
with fluid.scope_guard(self.scope):
result = self.exe.run(self.test_prog,
diff --git a/paddlex/cv/models/ppyolo.py b/paddlex/cv/models/ppyolo.py
new file mode 100644
index 0000000000000000000000000000000000000000..e82dea4b10b4857d4aeea86e1c4998fdaa7358dc
--- /dev/null
+++ b/paddlex/cv/models/ppyolo.py
@@ -0,0 +1,565 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+import math
+import tqdm
+import os.path as osp
+import numpy as np
+from multiprocessing.pool import ThreadPool
+import paddle.fluid as fluid
+from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
+from paddle.fluid.optimizer import ExponentialMovingAverage
+import paddlex.utils.logging as logging
+import paddlex
+import copy
+from paddlex.cv.transforms import arrange_transforms
+from paddlex.cv.datasets import generate_minibatch
+from .base import BaseAPI
+from collections import OrderedDict
+from .utils.detection_eval import eval_results, bbox2out
+
+
+class PPYOLO(BaseAPI):
+ """构建PPYOLO,并实现其训练、评估、预测和模型导出。
+
+ Args:
+ num_classes (int): 类别数。默认为80。
+ backbone (str): PPYOLO的backbone网络,取值范围为['ResNet50_vd']。默认为'ResNet50_vd'。
+ with_dcn_v2 (bool): Backbone是否使用DCNv2结构。默认为True。
+ anchors (list|tuple): anchor框的宽度和高度,为None时表示使用默认值
+ [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+ [59, 119], [116, 90], [156, 198], [373, 326]]。
+ anchor_masks (list|tuple): 在计算PPYOLO损失时,使用anchor的mask索引,为None时表示使用默认值
+ [[6, 7, 8], [3, 4, 5], [0, 1, 2]]。
+ use_coord_conv (bool): 是否使用CoordConv。默认值为True。
+ use_iou_aware (bool): 是否使用IoU Aware分支。默认值为True。
+ use_spp (bool): 是否使用Spatial Pyramid Pooling结构。默认值为True。
+ use_drop_block (bool): 是否使用Drop Block。默认值为True。
+ scale_x_y (float): 调整中心点位置时的系数因子。默认值为1.05。
+ use_iou_loss (bool): 是否使用IoU loss。默认值为True。
+ use_matrix_nms (bool): 是否使用Matrix NMS。默认值为True。
+ ignore_threshold (float): 在计算PPYOLO损失时,IoU大于`ignore_threshold`的预测框的置信度被忽略。默认为0.7。
+ nms_score_threshold (float): 检测框的置信度得分阈值,置信度得分低于阈值的框应该被忽略。默认为0.01。
+ nms_topk (int): 进行NMS时,根据置信度保留的最大检测框数。默认为1000。
+ nms_keep_topk (int): 进行NMS后,每个图像要保留的总检测框数。默认为100。
+ nms_iou_threshold (float): 进行NMS时,用于剔除检测框IOU的阈值。默认为0.45。
+ label_smooth (bool): 是否使用label smooth。默认值为False。
+ train_random_shapes (list|tuple): 训练时从列表中随机选择图像大小。默认值为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
+ """
+
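+    # Illustrative usage (hypothetical names; assumes PPYOLO is exposed as
+    # paddlex.det.PPYOLO like the other detectors, and that train_dataset /
+    # eval_dataset are prepared paddlex.datasets readers):
+    #   model = paddlex.det.PPYOLO(num_classes=len(train_dataset.labels))
+    #   model.train(num_epochs=270, train_dataset=train_dataset,
+    #               train_batch_size=8, eval_dataset=eval_dataset,
+    #               learning_rate=1.0 / 8000, save_dir='output/ppyolo')
+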
+ def __init__(
+ self,
+ num_classes=80,
+ backbone='ResNet50_vd_ssld',
+ with_dcn_v2=True,
+ # YOLO Head
+ anchors=None,
+ anchor_masks=None,
+ use_coord_conv=True,
+ use_iou_aware=True,
+ use_spp=True,
+ use_drop_block=True,
+ scale_x_y=1.05,
+ # PPYOLO Loss
+ ignore_threshold=0.7,
+ label_smooth=False,
+ use_iou_loss=True,
+ # NMS
+ use_matrix_nms=True,
+ nms_score_threshold=0.01,
+ nms_topk=1000,
+ nms_keep_topk=100,
+ nms_iou_threshold=0.45,
+ train_random_shapes=[
+ 320, 352, 384, 416, 448, 480, 512, 544, 576, 608
+ ]):
+ self.init_params = locals()
+ super(PPYOLO, self).__init__('detector')
+ backbones = ['ResNet50_vd_ssld']
+ assert backbone in backbones, "backbone should be one of {}".format(
+ backbones)
+ self.backbone = backbone
+ self.num_classes = num_classes
+ self.anchors = anchors
+ self.anchor_masks = anchor_masks
+ if anchors is None:
+ self.anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+ [59, 119], [116, 90], [156, 198], [373, 326]]
+ if anchor_masks is None:
+ self.anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ self.ignore_threshold = ignore_threshold
+ self.nms_score_threshold = nms_score_threshold
+ self.nms_topk = nms_topk
+ self.nms_keep_topk = nms_keep_topk
+ self.nms_iou_threshold = nms_iou_threshold
+ self.label_smooth = label_smooth
+ self.sync_bn = True
+ self.train_random_shapes = train_random_shapes
+ self.fixed_input_shape = None
+ self.use_fine_grained_loss = False
+ if use_coord_conv or use_iou_aware or use_spp or use_drop_block or use_iou_loss:
+ self.use_fine_grained_loss = True
+ self.use_coord_conv = use_coord_conv
+ self.use_iou_aware = use_iou_aware
+ self.use_spp = use_spp
+ self.use_drop_block = use_drop_block
+ self.use_iou_loss = use_iou_loss
+ self.scale_x_y = scale_x_y
+ self.max_height = 608
+ self.max_width = 608
+ self.use_matrix_nms = use_matrix_nms
+ self.use_ema = False
+ self.with_dcn_v2 = with_dcn_v2
+
+ def _get_backbone(self, backbone_name):
+ if backbone_name.startswith('ResNet50_vd'):
+ backbone = paddlex.cv.nets.ResNet(
+ norm_type='sync_bn',
+ layers=50,
+ freeze_norm=False,
+ norm_decay=0.,
+ feature_maps=[3, 4, 5],
+ freeze_at=0,
+ variant='d',
+ dcn_v2_stages=[5] if self.with_dcn_v2 else [])
+ return backbone
+
+ def build_net(self, mode='train'):
+ model = paddlex.cv.nets.detection.YOLOv3(
+ backbone=self._get_backbone(self.backbone),
+ num_classes=self.num_classes,
+ mode=mode,
+ anchors=self.anchors,
+ anchor_masks=self.anchor_masks,
+ ignore_threshold=self.ignore_threshold,
+ label_smooth=self.label_smooth,
+ nms_score_threshold=self.nms_score_threshold,
+ nms_topk=self.nms_topk,
+ nms_keep_topk=self.nms_keep_topk,
+ nms_iou_threshold=self.nms_iou_threshold,
+ fixed_input_shape=self.fixed_input_shape,
+ coord_conv=self.use_coord_conv,
+ iou_aware=self.use_iou_aware,
+ scale_x_y=self.scale_x_y,
+ spp=self.use_spp,
+ drop_block=self.use_drop_block,
+ use_matrix_nms=self.use_matrix_nms,
+ use_fine_grained_loss=self.use_fine_grained_loss,
+ use_iou_loss=self.use_iou_loss,
+ batch_size=self.batch_size_per_gpu
+ if hasattr(self, 'batch_size_per_gpu') else 8)
+ # Parentheses matter here: `and` binds tighter than `or`, so without them the
+ # IoU-aware branch would take this path even outside training.
+ if mode == 'train' and (self.use_iou_loss or self.use_iou_aware):
+ model.max_height = self.max_height
+ model.max_width = self.max_width
+ inputs = model.generate_inputs()
+ model_out = model.build_net(inputs)
+ outputs = OrderedDict([('bbox', model_out)])
+ if mode == 'train':
+ self.optimizer.minimize(model_out)
+ outputs = OrderedDict([('loss', model_out)])
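+ # When enabled, keep an exponential moving average of the trainable weights;
+ # thres_steps ramps the effective decay with the global step counter.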
+ if self.use_ema:
+ global_steps = _decay_step_counter()
+ self.ema = ExponentialMovingAverage(
+ self.ema_decay, thres_steps=global_steps)
+ self.ema.update()
+ return inputs, outputs
+
+ def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
+ lr_decay_epochs, lr_decay_gamma,
+ num_steps_each_epoch):
+ if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
+ logging.error(
+ "In function train(), parameters should satisfy: warmup_steps <= lr_decay_epochs[0]*num_samples_in_train_dataset",
+ exit=False)
+ logging.error(
+ "See this doc for more information: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/appendix/parameters.md#notice",
+ exit=False)
+ logging.error(
+ "warmup_steps should less than {} or lr_decay_epochs[0] greater than {}, please modify 'lr_decay_epochs' or 'warmup_steps' in train function".
+ format(lr_decay_epochs[0] * num_steps_each_epoch, warmup_steps
+ // num_steps_each_epoch))
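+ # Piecewise schedule: the LR stays at learning_rate until lr_decay_epochs[0],
+ # then shrinks by lr_decay_gamma at each boundary, with a linear warmup from
+ # warmup_start_lr over the first warmup_steps steps.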
+ boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
+ values = [(lr_decay_gamma**i) * learning_rate
+ for i in range(len(lr_decay_epochs) + 1)]
+ lr_decay = fluid.layers.piecewise_decay(
+ boundaries=boundaries, values=values)
+ lr_warmup = fluid.layers.linear_lr_warmup(
+ learning_rate=lr_decay,
+ warmup_steps=warmup_steps,
+ start_lr=warmup_start_lr,
+ end_lr=learning_rate)
+ optimizer = fluid.optimizer.Momentum(
+ learning_rate=lr_warmup,
+ momentum=0.9,
+ regularization=fluid.regularizer.L2DecayRegularizer(5e-04))
+ return optimizer
+
+ def train(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size=8,
+ eval_dataset=None,
+ save_interval_epochs=20,
+ log_interval_steps=2,
+ save_dir='output',
+ pretrain_weights='IMAGENET',
+ optimizer=None,
+ learning_rate=1.0 / 8000,
+ warmup_steps=1000,
+ warmup_start_lr=0.0,
+ lr_decay_epochs=[213, 240],
+ lr_decay_gamma=0.1,
+ metric=None,
+ use_vdl=False,
+ sensitivities_file=None,
+ eval_metric_loss=0.05,
+ early_stop=False,
+ early_stop_patience=5,
+ resume_checkpoint=None,
+ use_ema=True,
+ ema_decay=0.9998):
+ """训练。
+
+ Args:
+ num_epochs (int): 训练迭代轮数。
+ train_dataset (paddlex.datasets): 训练数据读取器。
+ train_batch_size (int): 训练数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与显卡
+ 数量之商为验证数据batch大小。默认值为8。
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为20。
+ log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为10。
+ save_dir (str): 模型保存路径。默认值为'output'。
+ pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
+ 则自动下载在ImageNet图片数据上预训练的模型权重;若为字符串'COCO',
+ 则自动下载在COCO数据集上预训练的模型权重;若为None,则不使用预训练模型。默认为'IMAGENET'。
+ optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
+ fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
+ learning_rate (float): 默认优化器的学习率。默认为1.0/8000。
+ warmup_steps (int): 默认优化器进行warmup过程的步数。默认为1000。
+ warmup_start_lr (int): 默认优化器warmup的起始学习率。默认为0.0。
+ lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[213, 240]。
+ lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认值为None。
+ use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
+ sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
+ 则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
+ eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
+ early_stop (bool): 是否使用提前终止训练策略。默认值为False。
+ early_stop_patience (int): 当使用提前终止训练策略时,如果验证集精度在`early_stop_patience`个epoch内
+ 连续下降或持平,则终止训练。默认值为5。
+ resume_checkpoint (str): 恢复训练时指定上次训练保存的模型路径。若为None,则不会恢复训练。默认值为None。
+ use_ema (bool): 是否使用指数衰减计算参数的滑动平均值。默认值为True。
+ ema_decay (float): 指数衰减率。默认值为0.9998。
+
+ Raises:
+ ValueError: 评估类型不在指定列表中。
+ ValueError: 模型从inference model进行加载。
+ """
+ if not self.trainable:
+ raise ValueError("Model is not trainable from load_model method.")
+ if metric is None:
+ if isinstance(train_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ elif isinstance(train_dataset, paddlex.datasets.VOCDetection) or \
+ isinstance(train_dataset, paddlex.datasets.EasyDataDet):
+ metric = 'VOC'
+ else:
+ raise ValueError(
+ "train_dataset should be datasets.VOCDetection or datasets.COCODetection or datasets.EasyDataDet."
+ )
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+ self.metric = metric
+
+ self.labels = train_dataset.labels
+ # Build the training network
+ if optimizer is None:
+ # Build the default optimization strategy
+ num_steps_each_epoch = train_dataset.num_samples // train_batch_size
+ optimizer = self.default_optimizer(
+ learning_rate=learning_rate,
+ warmup_steps=warmup_steps,
+ warmup_start_lr=warmup_start_lr,
+ lr_decay_epochs=lr_decay_epochs,
+ lr_decay_gamma=lr_decay_gamma,
+ num_steps_each_epoch=num_steps_each_epoch)
+ self.optimizer = optimizer
+ self.use_ema = use_ema
+ self.ema_decay = ema_decay
+
+ self.batch_size_per_gpu = int(train_batch_size /
+ paddlex.env_info['num'])
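+ # The fine-grained loss builds its target tensors per device, so build_net
+ # needs the per-GPU batch size rather than the global one.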
+ if self.use_fine_grained_loss:
+ for transform in train_dataset.transforms.transforms:
+ if isinstance(transform, paddlex.det.transforms.Resize):
+ self.max_height = transform.target_size
+ self.max_width = transform.target_size
+ break
+ if train_dataset.transforms.batch_transforms is None:
+ train_dataset.transforms.batch_transforms = list()
+ define_random_shape = False
+ for bt in train_dataset.transforms.batch_transforms:
+ if isinstance(bt, paddlex.det.transforms.BatchRandomShape):
+ define_random_shape = True
+ if not define_random_shape:
+ if isinstance(self.train_random_shapes,
+ (list, tuple)) and len(self.train_random_shapes) > 0:
+ train_dataset.transforms.batch_transforms.append(
+ paddlex.det.transforms.BatchRandomShape(
+ random_shapes=self.train_random_shapes))
+ if self.use_fine_grained_loss:
+ self.max_height = max(self.max_height,
+ max(self.train_random_shapes))
+ self.max_width = max(self.max_width,
+ max(self.train_random_shapes))
+ if self.use_fine_grained_loss:
+ define_generate_target = False
+ for bt in train_dataset.transforms.batch_transforms:
+ if isinstance(bt, paddlex.det.transforms.GenerateYoloTarget):
+ define_generate_target = True
+ if not define_generate_target:
+ train_dataset.transforms.batch_transforms.append(
+ paddlex.det.transforms.GenerateYoloTarget(
+ anchors=self.anchors,
+ anchor_masks=self.anchor_masks,
+ num_classes=self.num_classes,
+ downsample_ratios=[32, 16, 8]))
+ # Build the training, validation and prediction programs
+ self.build_program()
+ # Initialize the network weights
+ self.net_initialize(
+ startup_prog=fluid.default_startup_program(),
+ pretrain_weights=pretrain_weights,
+ save_dir=save_dir,
+ sensitivities_file=sensitivities_file,
+ eval_metric_loss=eval_metric_loss,
+ resume_checkpoint=resume_checkpoint)
+ # Run the training loop
+ self.train_loop(
+ num_epochs=num_epochs,
+ train_dataset=train_dataset,
+ train_batch_size=train_batch_size,
+ eval_dataset=eval_dataset,
+ save_interval_epochs=save_interval_epochs,
+ log_interval_steps=log_interval_steps,
+ save_dir=save_dir,
+ use_vdl=use_vdl,
+ early_stop=early_stop,
+ early_stop_patience=early_stop_patience)
+
+ def evaluate(self,
+ eval_dataset,
+ batch_size=1,
+ epoch_id=None,
+ metric=None,
+ return_details=False):
+ """评估。
+
+ Args:
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ batch_size (int): 验证数据批大小。默认为1。
+ epoch_id (int): 当前评估模型所在的训练轮数。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
+ 根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
+ 如为COCODetection,则metric为'COCO'。
+ return_details (bool): 是否返回详细信息。
+
+ Returns:
+ tuple (metrics, eval_details) | dict (metrics): 当return_details为True时,返回(metrics, eval_details),
+ 当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'或者’bbox_map‘,
+ 分别表示平均准确率平均值在各个IoU阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。
+ eval_details为dict,包含关键字:'bbox',对应元素预测结果列表,每个预测结果由图像id、
+ 预测框类别id、预测框坐标、预测框得分;’gt‘:真实标注框相关信息。
+ """
+ arrange_transforms(
+ model_type=self.model_type,
+ class_name=self.__class__.__name__,
+ transforms=eval_dataset.transforms,
+ mode='eval')
+ if metric is None:
+ if hasattr(self, 'metric') and self.metric is not None:
+ metric = self.metric
+ else:
+ if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
+ metric = 'VOC'
+ else:
+ raise Exception(
+ "eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
+ )
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+
+ total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
+ results = list()
+
+ data_generator = eval_dataset.generator(
+ batch_size=batch_size, drop_last=False)
+ logging.info(
+ "Start to evaluating(total_samples={}, total_steps={})...".format(
+ eval_dataset.num_samples, total_steps))
+ for step, data in tqdm.tqdm(
+ enumerate(data_generator()), total=total_steps):
+ images = np.array([d[0] for d in data])
+ im_sizes = np.array([d[1] for d in data])
+ feed_data = {'image': images, 'im_size': im_sizes}
+ with fluid.scope_guard(self.scope):
+ outputs = self.exe.run(
+ self.test_prog,
+ feed=[feed_data],
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False)
+ res = {
+ 'bbox': (np.array(outputs[0]),
+ outputs[0].recursive_sequence_lengths())
+ }
+ res_id = [np.array([d[2]]) for d in data]
+ res['im_id'] = (res_id, [])
+ if metric == 'VOC':
+ res_gt_box = [d[3].reshape(-1, 4) for d in data]
+ res_gt_label = [d[4].reshape(-1, 1) for d in data]
+ res_is_difficult = [d[5].reshape(-1, 1) for d in data]
+ res_id = [np.array([d[2]]) for d in data]
+ res['gt_box'] = (res_gt_box, [])
+ res['gt_label'] = (res_gt_label, [])
+ res['is_difficult'] = (res_is_difficult, [])
+ results.append(res)
+ logging.debug("[EVAL] Epoch={}, Step={}/{}".format(epoch_id, step +
+ 1, total_steps))
+ box_ap_stats, eval_details = eval_results(
+ results, metric, eval_dataset.coco_gt, with_background=False)
+ evaluate_metrics = OrderedDict(
+ zip(['bbox_mmap'
+ if metric == 'COCO' else 'bbox_map'], box_ap_stats))
+ if return_details:
+ return evaluate_metrics, eval_details
+ return evaluate_metrics
+
+ @staticmethod
+ def _preprocess(images, transforms, model_type, class_name, thread_num=1):
+ arrange_transforms(
+ model_type=model_type,
+ class_name=class_name,
+ transforms=transforms,
+ mode='test')
+ pool = ThreadPool(thread_num)
+ batch_data = pool.map(transforms, images)
+ pool.close()
+ pool.join()
+ padding_batch = generate_minibatch(batch_data)
+ im = np.array(
+ [data[0] for data in padding_batch],
+ dtype=padding_batch[0][0].dtype)
+ im_size = np.array([data[1] for data in padding_batch], dtype=np.int32)
+
+ return im, im_size
+
+ @staticmethod
+ def _postprocess(res, batch_size, num_classes, labels):
+ clsid2catid = dict({i: i for i in range(num_classes)})
+ xywh_results = bbox2out([res], clsid2catid)
+ preds = [[] for i in range(batch_size)]
+ for xywh_res in xywh_results:
+ image_id = xywh_res['image_id']
+ del xywh_res['image_id']
+ xywh_res['category'] = labels[xywh_res['category_id']]
+ preds[image_id].append(xywh_res)
+
+ return preds
+
+ def predict(self, img_file, transforms=None):
+ """预测。
+
+ Args:
+ img_file (str|np.ndarray): 预测图像路径,或者是解码后的排列格式为(H, W, C)且类型为float32且为BGR格式的数组。
+ transforms (paddlex.det.transforms): 数据预处理操作。
+
+ Returns:
+ list: 预测结果列表,每个预测结果由预测框类别标签、
+ 预测框类别名称、预测框坐标(坐标格式为[xmin, ymin, w, h])、
+ 预测框得分组成。
+ """
+ if transforms is None and not hasattr(self, 'test_transforms'):
+ raise Exception("transforms need to be defined, now is None.")
+ if isinstance(img_file, (str, np.ndarray)):
+ images = [img_file]
+ else:
+ raise Exception("img_file must be str/np.ndarray")
+
+ if transforms is None:
+ transforms = self.test_transforms
+ im, im_size = PPYOLO._preprocess(images, transforms, self.model_type,
+ self.__class__.__name__)
+
+ with fluid.scope_guard(self.scope):
+ result = self.exe.run(self.test_prog,
+ feed={'image': im,
+ 'im_size': im_size},
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False,
+ use_program_cache=True)
+
+ res = {
+ k: (np.array(v), v.recursive_sequence_lengths())
+ for k, v in zip(list(self.test_outputs.keys()), result)
+ }
+ res['im_id'] = (np.array(
+ [[i] for i in range(len(images))]).astype('int32'), [[]])
+ preds = PPYOLO._postprocess(res,
+ len(images), self.num_classes, self.labels)
+ return preds[0]
+
+ def batch_predict(self, img_file_list, transforms=None, thread_num=2):
+ """预测。
+
+ Args:
+ img_file_list (list|tuple): 对列表(或元组)中的图像同时进行预测,列表中的元素可以是图像路径,也可以是解码后的排列格式为(H,W,C)
+ 且类型为float32且为BGR格式的数组。
+ transforms (paddlex.det.transforms): 数据预处理操作。
+ thread_num (int): 并发执行各图像预处理时的线程数。
+ Returns:
+ list: 每个元素都为列表,表示各图像的预测结果。在各图像的预测结果列表中,每个预测结果由预测框类别标签、
+ 预测框类别名称、预测框坐标(坐标格式为[xmin, ymin, w, h])、
+ 预测框得分组成。
+ """
+ if transforms is None and not hasattr(self, 'test_transforms'):
+ raise Exception("transforms need to be defined, now is None.")
+
+ if not isinstance(img_file_list, (list, tuple)):
+ raise Exception("im_file must be list/tuple")
+
+ if transforms is None:
+ transforms = self.test_transforms
+ im, im_size = PPYOLO._preprocess(img_file_list, transforms,
+ self.model_type,
+ self.__class__.__name__, thread_num)
+
+ with fluid.scope_guard(self.scope):
+ result = self.exe.run(self.test_prog,
+ feed={'image': im,
+ 'im_size': im_size},
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False,
+ use_program_cache=True)
+
+ res = {
+ k: (np.array(v), v.recursive_sequence_lengths())
+ for k, v in zip(list(self.test_outputs.keys()), result)
+ }
+ res['im_id'] = (np.array(
+ [[i] for i in range(len(img_file_list))]).astype('int32'), [[]])
+ preds = PPYOLO._postprocess(res,
+ len(img_file_list), self.num_classes,
+ self.labels)
+ return preds
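
A minimal end-to-end sketch of how the new class is driven, assuming this PR exposes it as `paddlex.det.PPYOLO` the same way `YOLOv3` is exposed; the dataset paths are hypothetical, and the transform/dataset signatures follow the existing PaddleX det API:

```python
import paddlex as pdx
from paddlex.det import transforms

# Hypothetical VOC-layout dataset; signatures as in the existing PaddleX det API.
train_transforms = transforms.Compose([
    transforms.Resize(target_size=608),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize(),
])
train_dataset = pdx.datasets.VOCDetection(
    data_dir='dataset',
    file_list='dataset/train_list.txt',
    label_list='dataset/labels.txt',
    transforms=train_transforms,
    shuffle=True)

model = pdx.det.PPYOLO(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=270,
    train_dataset=train_dataset,
    train_batch_size=8,
    learning_rate=1.0 / 8000,
    save_dir='output/ppyolo')
# Each result is a dict with category_id, category, bbox [xmin, ymin, w, h], score.
result = model.predict('dataset/sample.jpg')
```
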
diff --git a/paddlex/cv/models/slim/post_quantization.py b/paddlex/cv/models/slim/post_quantization.py
index c5570087821d8441174aa276d8e5ce22d5ff8e03..e110980bb481466164bc6bfc0a9dfcaabbe4e128 100644
--- a/paddlex/cv/models/slim/post_quantization.py
+++ b/paddlex/cv/models/slim/post_quantization.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -80,7 +80,9 @@ class PaddleXPostTrainingQuantization(PostTrainingQuantization):
self._support_activation_quantize_type = [
'range_abs_max', 'moving_average_abs_max', 'abs_max'
]
- self._support_weight_quantize_type = ['abs_max', 'channel_wise_abs_max']
+ self._support_weight_quantize_type = [
+ 'abs_max', 'channel_wise_abs_max'
+ ]
self._support_algo_type = ['KL', 'abs_max', 'min_max']
self._support_quantize_op_type = \
list(set(QuantizationTransformPass._supported_quantizable_op_type +
@@ -240,8 +242,8 @@ class PaddleXPostTrainingQuantization(PostTrainingQuantization):
'[Calculate weight] Weight_id={}/{}, time_each_weight={} s.'.
format(
str(ct),
- str(len(self._quantized_weight_var_name)), str(end -
- start)))
+ str(len(self._quantized_weight_var_name)),
+ str(end - start)))
ct += 1
ct = 1
diff --git a/paddlex/cv/models/slim/prune.py b/paddlex/cv/models/slim/prune.py
index 749977ac30e2ec087f6026bc6be55005f3c5b0d5..4ff3e237d13a156f96f21360a5cb8393dbdd9e40 100644
--- a/paddlex/cv/models/slim/prune.py
+++ b/paddlex/cv/models/slim/prune.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -288,8 +288,8 @@ def get_params_ratios(sensitivities_file, eval_metric_loss=0.05):
if not osp.exists(sensitivities_file):
raise Exception('The sensitivities file is not exists!')
sensitivitives = paddleslim.prune.load_sensitivities(sensitivities_file)
- params_ratios = paddleslim.prune.get_ratios_by_loss(
- sensitivitives, eval_metric_loss)
+ params_ratios = paddleslim.prune.get_ratios_by_loss(sensitivitives,
+ eval_metric_loss)
return params_ratios
diff --git a/paddlex/cv/models/slim/prune_config.py b/paddlex/cv/models/slim/prune_config.py
index 64d7c45c7d5072f5d3826cc041ac175baa76f4fa..d5e6325e805f6dda7987c1e0e909950e43aa5218 100644
--- a/paddlex/cv/models/slim/prune_config.py
+++ b/paddlex/cv/models/slim/prune_config.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/models/slim/visualize.py b/paddlex/cv/models/slim/visualize.py
index 79e885a9f9a51ff86fa24f73e12c9dbc869e0acc..4be6721632cd7c8d26309cedb686466d2c0ec776 100644
--- a/paddlex/cv/models/slim/visualize.py
+++ b/paddlex/cv/models/slim/visualize.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -47,8 +47,7 @@ def visualize(model, sensitivities_file, save_dir='./'):
y.append(loss_thresh)
plt.plot(x, y, color='green', linewidth=0.5, marker='o', markersize=3)
my_x_ticks = np.arange(
- min(np.array(x)) - 0.01,
- max(np.array(x)) + 0.01, 0.05)
+ min(np.array(x)) - 0.01, max(np.array(x)) + 0.01, 0.05)
my_y_ticks = np.arange(0.05, 1, 0.05)
plt.xticks(my_x_ticks, rotation=15, fontsize=8)
plt.yticks(my_y_ticks, fontsize=8)
diff --git a/paddlex/cv/models/unet.py b/paddlex/cv/models/unet.py
index 34c597b0e190122c3ba80c485378273abff20b65..7cce07b990003e04506e330ef74d356914d6182f 100644
--- a/paddlex/cv/models/unet.py
+++ b/paddlex/cv/models/unet.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/paddlex/cv/models/utils/detection_eval.py b/paddlex/cv/models/utils/detection_eval.py
index d2c0ae8abf867baddfc767bd6e1a73cf5d36ea3d..656cfaeff2607592a1a41eac06db036d43c6cac0 100644
--- a/paddlex/cv/models/utils/detection_eval.py
+++ b/paddlex/cv/models/utils/detection_eval.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -158,8 +158,8 @@ def loadRes(coco_obj, anns):
for id, ann in enumerate(anns):
ann['id'] = id + 1
elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
- res.dataset['categories'] = copy.deepcopy(
- coco_obj.dataset['categories'])
+ res.dataset['categories'] = copy.deepcopy(coco_obj.dataset[
+ 'categories'])
for id, ann in enumerate(anns):
bb = ann['bbox']
x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]]
@@ -169,8 +169,8 @@ def loadRes(coco_obj, anns):
ann['id'] = id + 1
ann['iscrowd'] = 0
elif 'segmentation' in anns[0]:
- res.dataset['categories'] = copy.deepcopy(
- coco_obj.dataset['categories'])
+ res.dataset['categories'] = copy.deepcopy(coco_obj.dataset[
+ 'categories'])
for id, ann in enumerate(anns):
# now only support compressed RLE format as segmentation results
ann['area'] = maskUtils.area(ann['segmentation'])
@@ -179,8 +179,8 @@ def loadRes(coco_obj, anns):
ann['id'] = id + 1
ann['iscrowd'] = 0
elif 'keypoints' in anns[0]:
- res.dataset['categories'] = copy.deepcopy(
- coco_obj.dataset['categories'])
+ res.dataset['categories'] = copy.deepcopy(coco_obj.dataset[
+ 'categories'])
for id, ann in enumerate(anns):
s = ann['keypoints']
x = s[0::3]
@@ -375,8 +375,8 @@ def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
expand_bbox = expand_boxes(bbox, scale)
expand_bbox = expand_bbox.astype(np.int32)
- padded_mask = np.zeros((resolution + 2, resolution + 2),
- dtype=np.float32)
+ padded_mask = np.zeros(
+ (resolution + 2, resolution + 2), dtype=np.float32)
for j in range(num):
xmin, ymin, xmax, ymax = expand_bbox[j].tolist()
@@ -404,7 +404,8 @@ def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
x0 - xmin):(x1 - xmin)]
segm = mask_util.encode(
- np.array(im_mask[:, :, np.newaxis], order='F'))[0]
+ np.array(
+ im_mask[:, :, np.newaxis], order='F'))[0]
catid = clsid2catid[clsid]
segm['counts'] = segm['counts'].decode('utf8')
coco_res = {
@@ -571,8 +572,8 @@ def prune_zero_padding(gt_box, gt_label, difficult=None):
gt_box[i, 2] == 0 and gt_box[i, 3] == 0:
break
valid_cnt += 1
- return (gt_box[:valid_cnt], gt_label[:valid_cnt],
- difficult[:valid_cnt] if difficult is not None else None)
+ return (gt_box[:valid_cnt], gt_label[:valid_cnt], difficult[:valid_cnt]
+ if difficult is not None else None)
def bbox_area(bbox, is_bbox_normalized):
@@ -694,8 +695,9 @@ class DetectionMAP(object):
"""
mAP = 0.
valid_cnt = 0
- for id, (score_pos, count) in enumerate(
- zip(self.class_score_poss, self.class_gt_counts)):
+ for id, (
+ score_pos, count
+ ) in enumerate(zip(self.class_score_poss, self.class_gt_counts)):
if count == 0: continue
if len(score_pos) == 0:
valid_cnt += 1
diff --git a/paddlex/cv/models/utils/pretrain_weights.py b/paddlex/cv/models/utils/pretrain_weights.py
index 0d969981a5fae2ae015beed74e852fa06514ec79..cd8213c96dd44a5469f6e4f85de1c1b80d9d9d18 100644
--- a/paddlex/cv/models/utils/pretrain_weights.py
+++ b/paddlex/cv/models/utils/pretrain_weights.py
@@ -116,10 +116,14 @@ coco_pretrain = {
'DeepLabv3p_MobileNetV2_x1.0_COCO':
'https://bj.bcebos.com/v1/paddleseg/deeplab_mobilenet_x1_0_coco.tgz',
'DeepLabv3p_Xception65_COCO':
- 'https://paddleseg.bj.bcebos.com/models/xception65_coco.tgz'
+ 'https://paddleseg.bj.bcebos.com/models/xception65_coco.tgz',
+ 'PPYOLO_ResNet50_vd_ssld_COCO':
+ 'https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams'
}
cityscapes_pretrain = {
+ 'DeepLabv3p_MobileNetV3_large_x1_0_ssld_CITYSCAPES':
+ 'https://paddleseg.bj.bcebos.com/models/deeplabv3p_mobilenetv3_large_cityscapes.tar.gz',
'DeepLabv3p_MobileNetV2_x1.0_CITYSCAPES':
'https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz',
'DeepLabv3p_Xception65_CITYSCAPES':
@@ -142,7 +146,8 @@ def get_pretrain_weights(flag, class_name, backbone, save_dir):
if flag == 'COCO':
if class_name == 'DeepLabv3p' and backbone in [
'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5',
- 'MobileNetV2_x1.5', 'MobileNetV2_x2.0'
+ 'MobileNetV2_x1.5', 'MobileNetV2_x2.0',
+ 'MobileNetV3_large_x1_0_ssld'
]:
model_name = '{}_{}'.format(class_name, backbone)
logging.warning(warning_info.format(model_name, flag, 'IMAGENET'))
@@ -226,7 +231,9 @@ def get_pretrain_weights(flag, class_name, backbone, save_dir):
new_save_dir = save_dir
if hasattr(paddlex, 'pretrain_dir'):
new_save_dir = paddlex.pretrain_dir
- if class_name in ['YOLOv3', 'FasterRCNN', 'MaskRCNN', 'DeepLabv3p']:
+ if class_name in [
+ 'YOLOv3', 'FasterRCNN', 'MaskRCNN', 'DeepLabv3p', 'PPYOLO'
+ ]:
backbone = '{}_{}'.format(class_name, backbone)
backbone = "{}_{}".format(backbone, flag)
if flag == 'COCO':
diff --git a/paddlex/cv/models/utils/seg_eval.py b/paddlex/cv/models/utils/seg_eval.py
index 745f75a48064e3b90902e0a0d48764db7deeba17..84b395a251f3d1772023313e2b659944a4a96dae 100644
--- a/paddlex/cv/models/utils/seg_eval.py
+++ b/paddlex/cv/models/utils/seg_eval.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -24,8 +24,8 @@ class ConfusionMatrix(object):
"""
def __init__(self, num_classes=2, streaming=False):
- self.confusion_matrix = np.zeros([num_classes, num_classes],
- dtype='int64')
+ self.confusion_matrix = np.zeros(
+ [num_classes, num_classes], dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
@@ -42,15 +42,15 @@ class ConfusionMatrix(object):
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
# Accumuate ([row=label, col=pred], 1) into sparse matrix
- spm = csr_matrix((one, (label, pred)),
- shape=(self.num_classes, self.num_classes))
+ spm = csr_matrix(
+ (one, (label, pred)), shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
- self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
- dtype='int64')
+ self.confusion_matrix = np.zeros(
+ [self.num_classes, self.num_classes], dtype='int64')
def mean_iou(self):
iou_list = []
diff --git a/paddlex/cv/models/utils/visualize.py b/paddlex/cv/models/utils/visualize.py
index 7e1fbbc74932cd9cca06327bf757a566b6d30547..ef3bb958794576e979b084640f8b518c5f1eded7 100644
--- a/paddlex/cv/models/utils/visualize.py
+++ b/paddlex/cv/models/utils/visualize.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -404,8 +404,9 @@ def draw_pr_curve(eval_details_file=None,
plt.plot(x, sr_array, color=color, label=nm, linewidth=1)
plt.legend(loc="lower left", fontsize=5)
plt.savefig(
- os.path.join(save_dir,
- "./{}_pr_curve(iou-{}).png".format(style, iou_thresh)),
+ os.path.join(
+ save_dir,
+ "./{}_pr_curve(iou-{}).png".format(style, iou_thresh)),
dpi=800)
plt.close()
diff --git a/paddlex/cv/models/yolo_v3.py b/paddlex/cv/models/yolo_v3.py
index 32b74df408b0ce68b632b81cb08536a8d6c9115a..cf0282dd78dfda2e6332095415a5794d55a00212 100644
--- a/paddlex/cv/models/yolo_v3.py
+++ b/paddlex/cv/models/yolo_v3.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -15,21 +15,11 @@
from __future__ import absolute_import
import math
import tqdm
-import os.path as osp
-import numpy as np
-from multiprocessing.pool import ThreadPool
-import paddle.fluid as fluid
-import paddlex.utils.logging as logging
import paddlex
-import copy
-from paddlex.cv.transforms import arrange_transforms
-from paddlex.cv.datasets import generate_minibatch
-from .base import BaseAPI
-from collections import OrderedDict
-from .utils.detection_eval import eval_results, bbox2out
+from .ppyolo import PPYOLO
-class YOLOv3(BaseAPI):
+class YOLOv3(PPYOLO):
"""构建YOLOv3,并实现其训练、评估、预测和模型导出。
Args:
@@ -45,7 +35,7 @@ class YOLOv3(BaseAPI):
nms_score_threshold (float): 检测框的置信度得分阈值,置信度得分低于阈值的框应该被忽略。默认为0.01。
nms_topk (int): 进行NMS时,根据置信度保留的最大检测框数。默认为1000。
nms_keep_topk (int): 进行NMS后,每个图像要保留的总检测框数。默认为100。
- nms_iou_threshold (float): 进行NMS时,用于剔除检测框IOU的阈值。默认为0.45。
+ nms_iou_threshold (float): 进行NMS时,用于剔除检测框IoU的阈值。默认为0.45。
label_smooth (bool): 是否使用label smooth。默认值为False。
train_random_shapes (list|tuple): 训练时从列表中随机选择图像大小。默认值为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
"""
@@ -65,12 +55,12 @@ class YOLOv3(BaseAPI):
320, 352, 384, 416, 448, 480, 512, 544, 576, 608
]):
self.init_params = locals()
- super(YOLOv3, self).__init__('detector')
backbones = [
'DarkNet53', 'ResNet34', 'MobileNetV1', 'MobileNetV3_large'
]
assert backbone in backbones, "backbone should be one of {}".format(
backbones)
+ super(PPYOLO, self).__init__('detector')
self.backbone = backbone
self.num_classes = num_classes
self.anchors = anchors
@@ -84,6 +74,16 @@ class YOLOv3(BaseAPI):
self.sync_bn = True
self.train_random_shapes = train_random_shapes
self.fixed_input_shape = None
+ self.use_fine_grained_loss = False
+ self.use_coord_conv = False
+ self.use_iou_aware = False
+ self.use_spp = False
+ self.use_drop_block = False
+ self.use_iou_loss = False
+ self.scale_x_y = 1.
+ self.use_matrix_nms = False
+ self.use_ema = False
+ self.with_dcn_v2 = False
def _get_backbone(self, backbone_name):
if backbone_name == 'DarkNet53':
@@ -104,59 +104,6 @@ class YOLOv3(BaseAPI):
norm_type='sync_bn', model_name=model_name)
return backbone
- def build_net(self, mode='train'):
- model = paddlex.cv.nets.detection.YOLOv3(
- backbone=self._get_backbone(self.backbone),
- num_classes=self.num_classes,
- mode=mode,
- anchors=self.anchors,
- anchor_masks=self.anchor_masks,
- ignore_threshold=self.ignore_threshold,
- label_smooth=self.label_smooth,
- nms_score_threshold=self.nms_score_threshold,
- nms_topk=self.nms_topk,
- nms_keep_topk=self.nms_keep_topk,
- nms_iou_threshold=self.nms_iou_threshold,
- train_random_shapes=self.train_random_shapes,
- fixed_input_shape=self.fixed_input_shape)
- inputs = model.generate_inputs()
- model_out = model.build_net(inputs)
- outputs = OrderedDict([('bbox', model_out)])
- if mode == 'train':
- self.optimizer.minimize(model_out)
- outputs = OrderedDict([('loss', model_out)])
- return inputs, outputs
-
- def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
- lr_decay_epochs, lr_decay_gamma,
- num_steps_each_epoch):
- if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
- logging.error(
- "In function train(), parameters should satisfy: warmup_steps <= lr_decay_epochs[0]*num_samples_in_train_dataset",
- exit=False)
- logging.error(
- "See this doc for more information: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/appendix/parameters.md#notice",
- exit=False)
- logging.error(
- "warmup_steps should less than {} or lr_decay_epochs[0] greater than {}, please modify 'lr_decay_epochs' or 'warmup_steps' in train function".
- format(lr_decay_epochs[0] * num_steps_each_epoch, warmup_steps
- // num_steps_each_epoch))
- boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
- values = [(lr_decay_gamma**i) * learning_rate
- for i in range(len(lr_decay_epochs) + 1)]
- lr_decay = fluid.layers.piecewise_decay(
- boundaries=boundaries, values=values)
- lr_warmup = fluid.layers.linear_lr_warmup(
- learning_rate=lr_decay,
- warmup_steps=warmup_steps,
- start_lr=warmup_start_lr,
- end_lr=learning_rate)
- optimizer = fluid.optimizer.Momentum(
- learning_rate=lr_warmup,
- momentum=0.9,
- regularization=fluid.regularizer.L2DecayRegularizer(5e-04))
- return optimizer
-
def train(self,
num_epochs,
train_dataset,
@@ -214,259 +161,11 @@ class YOLOv3(BaseAPI):
ValueError: 评估类型不在指定列表中。
ValueError: 模型从inference model进行加载。
"""
- if not self.trainable:
- raise ValueError("Model is not trainable from load_model method.")
- if metric is None:
- if isinstance(train_dataset, paddlex.datasets.CocoDetection):
- metric = 'COCO'
- elif isinstance(train_dataset, paddlex.datasets.VOCDetection) or \
- isinstance(train_dataset, paddlex.datasets.EasyDataDet):
- metric = 'VOC'
- else:
- raise ValueError(
- "train_dataset should be datasets.VOCDetection or datasets.COCODetection or datasets.EasyDataDet."
- )
- assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
- self.metric = metric
-
- self.labels = train_dataset.labels
- # 构建训练网络
- if optimizer is None:
- # 构建默认的优化策略
- num_steps_each_epoch = train_dataset.num_samples // train_batch_size
- optimizer = self.default_optimizer(
- learning_rate=learning_rate,
- warmup_steps=warmup_steps,
- warmup_start_lr=warmup_start_lr,
- lr_decay_epochs=lr_decay_epochs,
- lr_decay_gamma=lr_decay_gamma,
- num_steps_each_epoch=num_steps_each_epoch)
- self.optimizer = optimizer
- # 构建训练、验证、预测网络
- self.build_program()
- # 初始化网络权重
- self.net_initialize(
- startup_prog=fluid.default_startup_program(),
- pretrain_weights=pretrain_weights,
- save_dir=save_dir,
- sensitivities_file=sensitivities_file,
- eval_metric_loss=eval_metric_loss,
- resume_checkpoint=resume_checkpoint)
- # 训练
- self.train_loop(
- num_epochs=num_epochs,
- train_dataset=train_dataset,
- train_batch_size=train_batch_size,
- eval_dataset=eval_dataset,
- save_interval_epochs=save_interval_epochs,
- log_interval_steps=log_interval_steps,
- save_dir=save_dir,
- use_vdl=use_vdl,
- early_stop=early_stop,
- early_stop_patience=early_stop_patience)
-
- def evaluate(self,
- eval_dataset,
- batch_size=1,
- epoch_id=None,
- metric=None,
- return_details=False):
- """评估。
-
- Args:
- eval_dataset (paddlex.datasets): 验证数据读取器。
- batch_size (int): 验证数据批大小。默认为1。
- epoch_id (int): 当前评估模型所在的训练轮数。
- metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
- 根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
- 如为COCODetection,则metric为'COCO'。
- return_details (bool): 是否返回详细信息。
-
- Returns:
- tuple (metrics, eval_details) | dict (metrics): 当return_details为True时,返回(metrics, eval_details),
- 当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'或者’bbox_map‘,
- 分别表示平均准确率平均值在各个IoU阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。
- eval_details为dict,包含关键字:'bbox',对应元素预测结果列表,每个预测结果由图像id、
- 预测框类别id、预测框坐标、预测框得分;’gt‘:真实标注框相关信息。
- """
- arrange_transforms(
- model_type=self.model_type,
- class_name=self.__class__.__name__,
- transforms=eval_dataset.transforms,
- mode='eval')
- if metric is None:
- if hasattr(self, 'metric') and self.metric is not None:
- metric = self.metric
- else:
- if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
- metric = 'COCO'
- elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
- metric = 'VOC'
- else:
- raise Exception(
- "eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
- )
- assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
-
- total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
- results = list()
-
- data_generator = eval_dataset.generator(
- batch_size=batch_size, drop_last=False)
- logging.info(
- "Start to evaluating(total_samples={}, total_steps={})...".format(
- eval_dataset.num_samples, total_steps))
- for step, data in tqdm.tqdm(
- enumerate(data_generator()), total=total_steps):
- images = np.array([d[0] for d in data])
- im_sizes = np.array([d[1] for d in data])
- feed_data = {'image': images, 'im_size': im_sizes}
- with fluid.scope_guard(self.scope):
- outputs = self.exe.run(
- self.test_prog,
- feed=[feed_data],
- fetch_list=list(self.test_outputs.values()),
- return_numpy=False)
- res = {
- 'bbox': (np.array(outputs[0]),
- outputs[0].recursive_sequence_lengths())
- }
- res_id = [np.array([d[2]]) for d in data]
- res['im_id'] = (res_id, [])
- if metric == 'VOC':
- res_gt_box = [d[3].reshape(-1, 4) for d in data]
- res_gt_label = [d[4].reshape(-1, 1) for d in data]
- res_is_difficult = [d[5].reshape(-1, 1) for d in data]
- res_id = [np.array([d[2]]) for d in data]
- res['gt_box'] = (res_gt_box, [])
- res['gt_label'] = (res_gt_label, [])
- res['is_difficult'] = (res_is_difficult, [])
- results.append(res)
- logging.debug("[EVAL] Epoch={}, Step={}/{}".format(epoch_id, step +
- 1, total_steps))
- box_ap_stats, eval_details = eval_results(
- results, metric, eval_dataset.coco_gt, with_background=False)
- evaluate_metrics = OrderedDict(
- zip(['bbox_mmap'
- if metric == 'COCO' else 'bbox_map'], box_ap_stats))
- if return_details:
- return evaluate_metrics, eval_details
- return evaluate_metrics
-
- @staticmethod
- def _preprocess(images, transforms, model_type, class_name, thread_num=1):
- arrange_transforms(
- model_type=model_type,
- class_name=class_name,
- transforms=transforms,
- mode='test')
- pool = ThreadPool(thread_num)
- batch_data = pool.map(transforms, images)
- pool.close()
- pool.join()
- padding_batch = generate_minibatch(batch_data)
- im = np.array(
- [data[0] for data in padding_batch],
- dtype=padding_batch[0][0].dtype)
- im_size = np.array([data[1] for data in padding_batch], dtype=np.int32)
-
- return im, im_size
-
- @staticmethod
- def _postprocess(res, batch_size, num_classes, labels):
- clsid2catid = dict({i: i for i in range(num_classes)})
- xywh_results = bbox2out([res], clsid2catid)
- preds = [[] for i in range(batch_size)]
- for xywh_res in xywh_results:
- image_id = xywh_res['image_id']
- del xywh_res['image_id']
- xywh_res['category'] = labels[xywh_res['category_id']]
- preds[image_id].append(xywh_res)
-
- return preds
-
- def predict(self, img_file, transforms=None):
- """预测。
-
- Args:
- img_file (str|np.ndarray): 预测图像路径,或者是解码后的排列格式为(H, W, C)且类型为float32且为BGR格式的数组。
- transforms (paddlex.det.transforms): 数据预处理操作。
-
- Returns:
- list: 预测结果列表,每个预测结果由预测框类别标签、
- 预测框类别名称、预测框坐标(坐标格式为[xmin, ymin, w, h])、
- 预测框得分组成。
- """
- if transforms is None and not hasattr(self, 'test_transforms'):
- raise Exception("transforms need to be defined, now is None.")
- if isinstance(img_file, (str, np.ndarray)):
- images = [img_file]
- else:
- raise Exception("img_file must be str/np.ndarray")
-
- if transforms is None:
- transforms = self.test_transforms
- im, im_size = YOLOv3._preprocess(images, transforms, self.model_type,
- self.__class__.__name__)
-
- with fluid.scope_guard(self.scope):
- result = self.exe.run(self.test_prog,
- feed={'image': im,
- 'im_size': im_size},
- fetch_list=list(self.test_outputs.values()),
- return_numpy=False,
- use_program_cache=True)
-
- res = {
- k: (np.array(v), v.recursive_sequence_lengths())
- for k, v in zip(list(self.test_outputs.keys()), result)
- }
- res['im_id'] = (np.array(
- [[i] for i in range(len(images))]).astype('int32'), [[]])
- preds = YOLOv3._postprocess(res,
- len(images), self.num_classes, self.labels)
- return preds[0]
-
- def batch_predict(self, img_file_list, transforms=None, thread_num=2):
- """预测。
-
- Args:
- img_file_list (list|tuple): 对列表(或元组)中的图像同时进行预测,列表中的元素可以是图像路径,也可以是解码后的排列格式为(H,W,C)
- 且类型为float32且为BGR格式的数组。
- transforms (paddlex.det.transforms): 数据预处理操作。
- thread_num (int): 并发执行各图像预处理时的线程数。
- Returns:
- list: 每个元素都为列表,表示各图像的预测结果。在各图像的预测结果列表中,每个预测结果由预测框类别标签、
- 预测框类别名称、预测框坐标(坐标格式为[xmin, ymin, w, h])、
- 预测框得分组成。
- """
- if transforms is None and not hasattr(self, 'test_transforms'):
- raise Exception("transforms need to be defined, now is None.")
-
- if not isinstance(img_file_list, (list, tuple)):
- raise Exception("im_file must be list/tuple")
-
- if transforms is None:
- transforms = self.test_transforms
- im, im_size = YOLOv3._preprocess(img_file_list, transforms,
- self.model_type,
- self.__class__.__name__, thread_num)
-
- with fluid.scope_guard(self.scope):
- result = self.exe.run(self.test_prog,
- feed={'image': im,
- 'im_size': im_size},
- fetch_list=list(self.test_outputs.values()),
- return_numpy=False,
- use_program_cache=True)
- res = {
- k: (np.array(v), v.recursive_sequence_lengths())
- for k, v in zip(list(self.test_outputs.keys()), result)
- }
- res['im_id'] = (np.array(
- [[i] for i in range(len(img_file_list))]).astype('int32'), [[]])
- preds = YOLOv3._postprocess(res,
- len(img_file_list), self.num_classes,
- self.labels)
- return preds
+ return super(YOLOv3, self).train(
+ num_epochs, train_dataset, train_batch_size, eval_dataset,
+ save_interval_epochs, log_interval_steps, save_dir,
+ pretrain_weights, optimizer, learning_rate, warmup_steps,
+ warmup_start_lr, lr_decay_epochs, lr_decay_gamma, metric, use_vdl,
+ sensitivities_file, eval_metric_loss, early_stop,
+ early_stop_patience, resume_checkpoint, False)
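
Net effect of this refactor: `YOLOv3` becomes a thin subclass of `PPYOLO` with every PP-YOLO trick switched off and EMA forced off (the trailing `False` maps to `use_ema`), so existing user code keeps working unchanged. A sketch, reusing the hypothetical `train_dataset` from the PPYOLO example above:

```python
import paddlex as pdx

# train_dataset built as in the PPYOLO example above (hypothetical data).
model = pdx.det.YOLOv3(num_classes=len(train_dataset.labels), backbone='DarkNet53')
model.train(num_epochs=270, train_dataset=train_dataset, train_batch_size=8)
```
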
diff --git a/paddlex/cv/nets/__init__.py b/paddlex/cv/nets/__init__.py
index 5b427fe31be957f92611f7cfc6a9e6102a3c9616..c95b0e9281a3bceb2f241580999bac79073837e0 100644
--- a/paddlex/cv/nets/__init__.py
+++ b/paddlex/cv/nets/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/alexnet.py b/paddlex/cv/nets/alexnet.py
index 6770f437d982428cd8d5ed7edb44e00915754139..d95363401d90397e1038bc23129a81f579bf5363 100644
--- a/paddlex/cv/nets/alexnet.py
+++ b/paddlex/cv/nets/alexnet.py
@@ -1,4 +1,4 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/backbone_utils.py b/paddlex/cv/nets/backbone_utils.py
index 454be850a0c54d1d0bca63655eccaee662967e61..962887148a8a4a0c9afbd1f7d16192828f5502b2 100644
--- a/paddlex/cv/nets/backbone_utils.py
+++ b/paddlex/cv/nets/backbone_utils.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/densenet.py b/paddlex/cv/nets/densenet.py
index 76997c48de412e52cf914c32057f8a1bd0c06f9d..aab1ee57dd98f4bb29d866c8248d0cdc0b1df970 100644
--- a/paddlex/cv/nets/densenet.py
+++ b/paddlex/cv/nets/densenet.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/paddlex/cv/nets/detection/__init__.py b/paddlex/cv/nets/detection/__init__.py
index 7b9d5d547c8aa7f9dc8254a389624a238843039d..f6e01683575746f7434719bf80ef0cee528b9ab6 100644
--- a/paddlex/cv/nets/detection/__init__.py
+++ b/paddlex/cv/nets/detection/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/detection/iou_aware.py b/paddlex/cv/nets/detection/iou_aware.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a85a70a62c41b6a10c78cbcd1250d63cd534349
--- /dev/null
+++ b/paddlex/cv/nets/detection/iou_aware.py
@@ -0,0 +1,85 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from paddle import fluid
+
+
+def _split_ioup(output, an_num, num_classes):
+ """
+ Split new output feature map to output, predicted iou
+ along channel dimension
+ """
+ ioup = fluid.layers.slice(output, axes=[1], starts=[0], ends=[an_num])
+ ioup = fluid.layers.sigmoid(ioup)
+
+ oriout = fluid.layers.slice(
+ output, axes=[1], starts=[an_num], ends=[an_num * (num_classes + 6)])
+
+ return (ioup, oriout)
+
+
+def _de_sigmoid(x, eps=1e-7):
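+ # Inverse of the sigmoid (a clipped logit): the fused score is stored back in
+ # pre-sigmoid form because YOLO box decoding applies sigmoid to it again later.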
+ x = fluid.layers.clip(x, eps, 1 / eps)
+ one = fluid.layers.fill_constant(
+ shape=[1, 1, 1, 1], dtype=x.dtype, value=1.)
+ x = fluid.layers.clip((one / x - 1.0), eps, 1 / eps)
+ x = -fluid.layers.log(x)
+ return x
+
+
+def _postprocess_output(ioup, output, an_num, num_classes, iou_aware_factor):
+ """
+ post process output objectness score
+ """
+ tensors = []
+ stride = output.shape[1] // an_num
+ for m in range(an_num):
+ tensors.append(
+ fluid.layers.slice(
+ output,
+ axes=[1],
+ starts=[stride * m + 0],
+ ends=[stride * m + 4]))
+ obj = fluid.layers.slice(
+ output, axes=[1], starts=[stride * m + 4], ends=[stride * m + 5])
+ obj = fluid.layers.sigmoid(obj)
+ ip = fluid.layers.slice(ioup, axes=[1], starts=[m], ends=[m + 1])
+
+ new_obj = fluid.layers.pow(obj, (
+ 1 - iou_aware_factor)) * fluid.layers.pow(ip, iou_aware_factor)
+ new_obj = _de_sigmoid(new_obj)
+
+ tensors.append(new_obj)
+
+ tensors.append(
+ fluid.layers.slice(
+ output,
+ axes=[1],
+ starts=[stride * m + 5],
+ ends=[stride * m + 5 + num_classes]))
+
+ output = fluid.layers.concat(tensors, axis=1)
+
+ return output
+
+
+def get_iou_aware_score(output, an_num, num_classes, iou_aware_factor):
+ ioup, output = _split_ioup(output, an_num, num_classes)
+ output = _postprocess_output(ioup, output, an_num, num_classes,
+ iou_aware_factor)
+ return output
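
For intuition, the objectness fusion in `_postprocess_output` can be checked with plain floats; the 0.4 factor below is only an assumed value (the real `iou_aware_factor` is supplied by the caller):

```python
def fused_objectness(obj, iou_pred, iou_aware_factor=0.4):
    # new_obj = obj**(1 - factor) * iou**factor, as in _postprocess_output above
    return obj ** (1 - iou_aware_factor) * iou_pred ** iou_aware_factor

print(fused_objectness(0.9, 0.5))   # ~0.71: confident but badly localized, score drops
print(fused_objectness(0.7, 0.95))  # ~0.79: well-localized box is promoted
```
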
diff --git a/paddlex/cv/nets/detection/loss/__init__.py b/paddlex/cv/nets/detection/loss/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..517155e601df11f8a556b4f08f472c26be178794
--- /dev/null
+++ b/paddlex/cv/nets/detection/loss/__init__.py
@@ -0,0 +1,21 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import yolo_loss
+from . import iou_aware_loss
+from . import iou_loss
+
+from .yolo_loss import *
+from .iou_aware_loss import *
+from .iou_loss import *
diff --git a/paddlex/cv/nets/detection/loss/iou_aware_loss.py b/paddlex/cv/nets/detection/loss/iou_aware_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..64796eb7d92543a73a053bc1349ba3806d1eea5e
--- /dev/null
+++ b/paddlex/cv/nets/detection/loss/iou_aware_loss.py
@@ -0,0 +1,77 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy as np
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.initializer import NumpyArrayInitializer
+
+from paddle import fluid
+from .iou_loss import IouLoss
+
+
+class IouAwareLoss(IouLoss):
+ """
+ iou aware loss, see https://arxiv.org/abs/1912.05992
+ Args:
+ loss_weight (float): iou aware loss weight, default is 1.0
+ max_height (int): max height of input to support random shape input
+ max_width (int): max width of input to support random shape input
+ """
+
+ def __init__(self, loss_weight=1.0, max_height=608, max_width=608):
+ super(IouAwareLoss, self).__init__(
+ loss_weight=loss_weight,
+ max_height=max_height,
+ max_width=max_width)
+
+ def __call__(self,
+ ioup,
+ x,
+ y,
+ w,
+ h,
+ tx,
+ ty,
+ tw,
+ th,
+ anchors,
+ downsample_ratio,
+ batch_size,
+ scale_x_y,
+ eps=1.e-10):
+ '''
+ Args:
+ ioup ([Variables]): the predicted iou
+ x | y | w | h ([Variables]): the output of yolov3 for encoded x|y|w|h
+ tx |ty |tw |th ([Variables]): the target of yolov3 for encoded x|y|w|h
+ anchors ([float]): list of anchors for current output layer
+ downsample_ratio (float): the downsample ratio for current output layer
+ batch_size (int): training batch size
+ eps (float): small value that prevents the denominator from being zero
+ '''
+
+ pred = self._bbox_transform(x, y, w, h, anchors, downsample_ratio,
+ batch_size, False, scale_x_y, eps)
+ gt = self._bbox_transform(tx, ty, tw, th, anchors, downsample_ratio,
+ batch_size, True, scale_x_y, eps)
+ iouk = self._iou(pred, gt, ioup, eps)
+ iouk.stop_gradient = True
+
+ loss_iou_aware = fluid.layers.cross_entropy(
+ ioup, iouk, soft_label=True)
+ loss_iou_aware = loss_iou_aware * self._loss_weight
+ return loss_iou_aware
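
Because `iouk` is detached (`stop_gradient = True`), the loss above is a soft-label cross entropy against a constant IoU target. A scalar NumPy sketch of what `fluid.layers.cross_entropy(..., soft_label=True)` evaluates per anchor:

```python
import numpy as np

def iou_aware_loss_scalar(ioup, iouk, loss_weight=1.0, eps=1e-10):
    # soft-label cross entropy with a single channel: -target * log(prediction)
    return loss_weight * -(iouk * np.log(ioup + eps))

print(iou_aware_loss_scalar(0.9, 0.9))  # ~0.095: predicted IoU close to the real one
print(iou_aware_loss_scalar(0.2, 0.9))  # ~1.449: over-pessimistic IoU head is penalized
```
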
diff --git a/paddlex/cv/nets/detection/loss/iou_loss.py b/paddlex/cv/nets/detection/loss/iou_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..da1beeaf9b5ad6be4c61c27d71bcac24e37f2b9a
--- /dev/null
+++ b/paddlex/cv/nets/detection/loss/iou_loss.py
@@ -0,0 +1,235 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy as np
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.initializer import NumpyArrayInitializer
+
+from paddle import fluid
+
+
+class IouLoss(object):
+ """
+ iou loss, see https://arxiv.org/abs/1908.03851
+ loss = 1.0 - iou * iou
+ Args:
+ loss_weight (float): iou loss weight, default is 2.5
+ max_height (int): max height of input to support random shape input
+ max_width (int): max width of input to support random shape input
+ ciou_term (bool): whether to add ciou_term
+ loss_square (bool): whether to square the iou term
+ """
+
+ def __init__(self,
+ loss_weight=2.5,
+ max_height=608,
+ max_width=608,
+ ciou_term=False,
+ loss_square=True):
+ self._loss_weight = loss_weight
+ self._MAX_HI = max_height
+ self._MAX_WI = max_width
+ self.ciou_term = ciou_term
+ self.loss_square = loss_square
+
+ def __call__(self,
+ x,
+ y,
+ w,
+ h,
+ tx,
+ ty,
+ tw,
+ th,
+ anchors,
+ downsample_ratio,
+ batch_size,
+ scale_x_y=1.,
+ ioup=None,
+ eps=1.e-10):
+ '''
+ Args:
+ x | y | w | h ([Variables]): the output of yolov3 for encoded x|y|w|h
+ tx |ty |tw |th ([Variables]): the target of yolov3 for encoded x|y|w|h
+ anchors ([float]): list of anchors for current output layer
+ downsample_ratio (float): the downsample ratio for current output layer
+ batch_size (int): training batch size
+ eps (float): small value that prevents the denominator from being zero
+ '''
+ pred = self._bbox_transform(x, y, w, h, anchors, downsample_ratio,
+ batch_size, False, scale_x_y, eps)
+ gt = self._bbox_transform(tx, ty, tw, th, anchors, downsample_ratio,
+ batch_size, True, scale_x_y, eps)
+ iouk = self._iou(pred, gt, ioup, eps)
+ if self.loss_square:
+ loss_iou = 1. - iouk * iouk
+ else:
+ loss_iou = 1. - iouk
+ loss_iou = loss_iou * self._loss_weight
+
+ return loss_iou
+
+ def _iou(self, pred, gt, ioup=None, eps=1.e-10):
+ x1, y1, x2, y2 = pred
+ x1g, y1g, x2g, y2g = gt
+ x2 = fluid.layers.elementwise_max(x1, x2)
+ y2 = fluid.layers.elementwise_max(y1, y2)
+
+ xkis1 = fluid.layers.elementwise_max(x1, x1g)
+ ykis1 = fluid.layers.elementwise_max(y1, y1g)
+ xkis2 = fluid.layers.elementwise_min(x2, x2g)
+ ykis2 = fluid.layers.elementwise_min(y2, y2g)
+
+ intsctk = (xkis2 - xkis1) * (ykis2 - ykis1)
+ intsctk = intsctk * fluid.layers.greater_than(
+ xkis2, xkis1) * fluid.layers.greater_than(ykis2, ykis1)
+ unionk = (x2 - x1) * (y2 - y1) + (x2g - x1g) * (y2g - y1g
+ ) - intsctk + eps
+ iouk = intsctk / unionk
+ if self.ciou_term:
+ ciou = self.get_ciou_term(pred, gt, iouk, eps)
+ iouk = iouk - ciou
+ return iouk
+
+ def get_ciou_term(self, pred, gt, iouk, eps):
+ x1, y1, x2, y2 = pred
+ x1g, y1g, x2g, y2g = gt
+
+ cx = (x1 + x2) / 2
+ cy = (y1 + y2) / 2
+ w = (x2 - x1) + fluid.layers.cast((x2 - x1) == 0, 'float32')
+ h = (y2 - y1) + fluid.layers.cast((y2 - y1) == 0, 'float32')
+
+ cxg = (x1g + x2g) / 2
+ cyg = (y1g + y2g) / 2
+ wg = x2g - x1g
+ hg = y2g - y1g
+
+ # A or B
+ xc1 = fluid.layers.elementwise_min(x1, x1g)
+ yc1 = fluid.layers.elementwise_min(y1, y1g)
+ xc2 = fluid.layers.elementwise_max(x2, x2g)
+ yc2 = fluid.layers.elementwise_max(y2, y2g)
+
+ # DIOU term
+ dist_intersection = (cx - cxg) * (cx - cxg) + (cy - cyg) * (cy - cyg)
+ dist_union = (xc2 - xc1) * (xc2 - xc1) + (yc2 - yc1) * (yc2 - yc1)
+ diou_term = (dist_intersection + eps) / (dist_union + eps)
+ # CIOU term
+ ciou_term = 0
+ ar_gt = wg / hg
+ ar_pred = w / h
+ arctan = fluid.layers.atan(ar_gt) - fluid.layers.atan(ar_pred)
+ ar_loss = 4. / np.pi / np.pi * arctan * arctan
+ alpha = ar_loss / (1 - iouk + ar_loss + eps)
+ alpha.stop_gradient = True
+ ciou_term = alpha * ar_loss
+ return diou_term + ciou_term
+
+ def _bbox_transform(self, dcx, dcy, dw, dh, anchors, downsample_ratio,
+ batch_size, is_gt, scale_x_y, eps):
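+ # Decodes YOLO head offsets into normalized [x1, y1, x2, y2] corners: centers
+ # come from the grid cell index plus (scaled) sigmoid offsets (raw offsets for
+ # ground truth), sizes from anchor * e^delta / image size; the grids are built
+ # at the max input size and cropped to the actual feature-map shape.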
+ grid_x = int(self._MAX_WI / downsample_ratio)
+ grid_y = int(self._MAX_HI / downsample_ratio)
+ an_num = len(anchors) // 2
+
+ shape_fmp = fluid.layers.shape(dcx)
+ shape_fmp.stop_gradient = True
+ # generate the grid_w x grid_h center of feature map
+ idx_i = np.array([[i for i in range(grid_x)]])
+ idx_j = np.array([[j for j in range(grid_y)]]).transpose()
+ gi_np = np.repeat(idx_i, grid_y, axis=0)
+ gi_np = np.reshape(gi_np, newshape=[1, 1, grid_y, grid_x])
+ gi_np = np.tile(gi_np, reps=[batch_size, an_num, 1, 1])
+ gj_np = np.repeat(idx_j, grid_x, axis=1)
+ gj_np = np.reshape(gj_np, newshape=[1, 1, grid_y, grid_x])
+ gj_np = np.tile(gj_np, reps=[batch_size, an_num, 1, 1])
+ gi_max = self._create_tensor_from_numpy(gi_np.astype(np.float32))
+ gi = fluid.layers.crop(x=gi_max, shape=dcx)
+ gi.stop_gradient = True
+ gj_max = self._create_tensor_from_numpy(gj_np.astype(np.float32))
+ gj = fluid.layers.crop(x=gj_max, shape=dcx)
+ gj.stop_gradient = True
+
+ grid_x_act = fluid.layers.cast(shape_fmp[3], dtype="float32")
+ grid_x_act.stop_gradient = True
+ grid_y_act = fluid.layers.cast(shape_fmp[2], dtype="float32")
+ grid_y_act.stop_gradient = True
+ if is_gt:
+ cx = fluid.layers.elementwise_add(dcx, gi) / grid_x_act
+ # `.gradient = True` was a no-op typo; ground-truth centers must not carry gradient.
+ cx.stop_gradient = True
+ cy = fluid.layers.elementwise_add(dcy, gj) / grid_y_act
+ cy.stop_gradient = True
+ else:
+ dcx_sig = fluid.layers.sigmoid(dcx)
+ dcy_sig = fluid.layers.sigmoid(dcy)
+ if (abs(scale_x_y - 1.0) > eps):
+ dcx_sig = scale_x_y * dcx_sig - 0.5 * (scale_x_y - 1)
+ dcy_sig = scale_x_y * dcy_sig - 0.5 * (scale_x_y - 1)
+ cx = fluid.layers.elementwise_add(dcx_sig, gi) / grid_x_act
+ cy = fluid.layers.elementwise_add(dcy_sig, gj) / grid_y_act
+
+ anchor_w_ = [anchors[i] for i in range(0, len(anchors)) if i % 2 == 0]
+ anchor_w_np = np.array(anchor_w_)
+ anchor_w_np = np.reshape(anchor_w_np, newshape=[1, an_num, 1, 1])
+ anchor_w_np = np.tile(
+ anchor_w_np, reps=[batch_size, 1, grid_y, grid_x])
+ anchor_w_max = self._create_tensor_from_numpy(
+ anchor_w_np.astype(np.float32))
+ anchor_w = fluid.layers.crop(x=anchor_w_max, shape=dcx)
+ anchor_w.stop_gradient = True
+ anchor_h_ = [anchors[i] for i in range(0, len(anchors)) if i % 2 == 1]
+ anchor_h_np = np.array(anchor_h_)
+ anchor_h_np = np.reshape(anchor_h_np, newshape=[1, an_num, 1, 1])
+ anchor_h_np = np.tile(
+ anchor_h_np, reps=[batch_size, 1, grid_y, grid_x])
+ anchor_h_max = self._create_tensor_from_numpy(
+ anchor_h_np.astype(np.float32))
+ anchor_h = fluid.layers.crop(x=anchor_h_max, shape=dcx)
+ anchor_h.stop_gradient = True
+ # decode box size: pw = anchor_w * e^dw / img_w, ph = anchor_h * e^dh / img_h (normalized)
+ exp_dw = fluid.layers.exp(dw)
+ exp_dh = fluid.layers.exp(dh)
+ pw = fluid.layers.elementwise_mul(exp_dw, anchor_w) / \
+ (grid_x_act * downsample_ratio)
+ ph = fluid.layers.elementwise_mul(exp_dh, anchor_h) / \
+ (grid_y_act * downsample_ratio)
+ if is_gt:
+ exp_dw.stop_gradient = True
+ exp_dh.stop_gradient = True
+ pw.stop_gradient = True
+ ph.stop_gradient = True
+
+ x1 = cx - 0.5 * pw
+ y1 = cy - 0.5 * ph
+ x2 = cx + 0.5 * pw
+ y2 = cy + 0.5 * ph
+ if is_gt:
+ x1.stop_gradient = True
+ y1.stop_gradient = True
+ x2.stop_gradient = True
+ y2.stop_gradient = True
+
+ return x1, y1, x2, y2
+
+ def _create_tensor_from_numpy(self, numpy_array):
+ paddle_array = fluid.layers.create_parameter(
+ attr=ParamAttr(),
+ shape=numpy_array.shape,
+ dtype=numpy_array.dtype,
+ default_initializer=NumpyArrayInitializer(numpy_array))
+ paddle_array.stop_gradient = True
+ return paddle_array
diff --git a/paddlex/cv/nets/detection/loss/yolo_loss.py b/paddlex/cv/nets/detection/loss/yolo_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..4d948600f6f7e00fd05734f64337efa06c208ab4
--- /dev/null
+++ b/paddlex/cv/nets/detection/loss/yolo_loss.py
@@ -0,0 +1,371 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from paddle import fluid
+try:
+ from collections.abc import Sequence
+except Exception:
+ from collections import Sequence
+
+
+class YOLOv3Loss(object):
+ """
+ Combined loss for YOLOv3 network
+
+ Args:
+ batch_size (int): training batch size
+ ignore_thresh (float): threshold to ignore confidence loss
+ label_smooth (bool): whether to use label smoothing
+ use_fine_grained_loss (bool): whether to use the fine-grained YOLOv3
+ loss instead of fluid.layers.yolov3_loss
+ """
+
+ def __init__(self,
+ batch_size=8,
+ ignore_thresh=0.7,
+ label_smooth=True,
+ use_fine_grained_loss=False,
+ iou_loss=None,
+ iou_aware_loss=None,
+ downsample=[32, 16, 8],
+ scale_x_y=1.,
+ match_score=False):
+ self._batch_size = batch_size
+ self._ignore_thresh = ignore_thresh
+ self._label_smooth = label_smooth
+ self._use_fine_grained_loss = use_fine_grained_loss
+ self._iou_loss = iou_loss
+ self._iou_aware_loss = iou_aware_loss
+ self.downsample = downsample
+ self.scale_x_y = scale_x_y
+ self.match_score = match_score
+
+ def __call__(self, outputs, gt_box, gt_label, gt_score, targets, anchors,
+ anchor_masks, mask_anchors, num_classes, prefix_name):
+ if self._use_fine_grained_loss:
+ return self._get_fine_grained_loss(
+ outputs, targets, gt_box, self._batch_size, num_classes,
+ mask_anchors, self._ignore_thresh)
+ else:
+ losses = []
+ for i, output in enumerate(outputs):
+ scale_x_y = self.scale_x_y if not isinstance(
+ self.scale_x_y, Sequence) else self.scale_x_y[i]
+ anchor_mask = anchor_masks[i]
+ loss = fluid.layers.yolov3_loss(
+ x=output,
+ gt_box=gt_box,
+ gt_label=gt_label,
+ gt_score=gt_score,
+ anchors=anchors,
+ anchor_mask=anchor_mask,
+ class_num=num_classes,
+ ignore_thresh=self._ignore_thresh,
+ downsample_ratio=self.downsample[i],
+ use_label_smooth=self._label_smooth,
+ scale_x_y=scale_x_y,
+ name=prefix_name + "yolo_loss" + str(i))
+
+ losses.append(fluid.layers.reduce_mean(loss))
+
+ return {'loss': sum(losses)}
+
+ def _get_fine_grained_loss(self,
+ outputs,
+ targets,
+ gt_box,
+ batch_size,
+ num_classes,
+ mask_anchors,
+ ignore_thresh,
+ eps=1.e-10):
+ """
+ Calculate fine grained YOLOv3 loss
+
+ Args:
+ outputs ([Variables]): List of Variables, output of backbone stages
+ targets ([Variables]): List of Variables, the targets for yolo
+ loss calculation.
+ gt_box (Variable): The ground-truth bounding boxes.
+ batch_size (int): The training batch size.
+ num_classes (int): Number of classes in the dataset.
+ mask_anchors ([[float]]): list of anchors in each output layer.
+ ignore_thresh (float): if a predicted bbox overlaps any gt_box by
+ more than ignore_thresh, its objectness
+ loss will be ignored.
+
+ Returns:
+ Type: dict
+ xy_loss (Variable): YOLOv3 (x, y) coordinates loss
+ wh_loss (Variable): YOLOv3 (w, h) coordinates loss
+ obj_loss (Variable): YOLOv3 objectness score loss
+ cls_loss (Variable): YOLOv3 classification loss
+
+ """
+
+ assert len(outputs) == len(targets), \
+ "YOLOv3 output layer number not equal target number"
+
+ loss_xys, loss_whs, loss_objs, loss_clss = [], [], [], []
+ if self._iou_loss is not None:
+ loss_ious = []
+ if self._iou_aware_loss is not None:
+ loss_iou_awares = []
+ for i, (output, target,
+ anchors) in enumerate(zip(outputs, targets, mask_anchors)):
+ downsample = self.downsample[i]
+ an_num = len(anchors) // 2
+ if self._iou_aware_loss is not None:
+ ioup, output = self._split_ioup(output, an_num, num_classes)
+ x, y, w, h, obj, cls = self._split_output(output, an_num,
+ num_classes)
+ tx, ty, tw, th, tscale, tobj, tcls = self._split_target(target)
+
+ tscale_tobj = tscale * tobj
+
+ scale_x_y = self.scale_x_y if not isinstance(
+ self.scale_x_y, Sequence) else self.scale_x_y[i]
+
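+ # when scale_x_y == 1.0, use BCE on the raw xy logits; otherwise rescale
+ # the sigmoid to reduce grid sensitivity (as in YOLOv4) and use an L1
+ # loss on the decoded offsets instead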
+ if (abs(scale_x_y - 1.0) < eps):
+ loss_x = fluid.layers.sigmoid_cross_entropy_with_logits(
+ x, tx) * tscale_tobj
+ loss_x = fluid.layers.reduce_sum(loss_x, dim=[1, 2, 3])
+ loss_y = fluid.layers.sigmoid_cross_entropy_with_logits(
+ y, ty) * tscale_tobj
+ loss_y = fluid.layers.reduce_sum(loss_y, dim=[1, 2, 3])
+ else:
+ dx = scale_x_y * fluid.layers.sigmoid(x) - 0.5 * (scale_x_y -
+ 1.0)
+ dy = scale_x_y * fluid.layers.sigmoid(y) - 0.5 * (scale_x_y -
+ 1.0)
+ loss_x = fluid.layers.abs(dx - tx) * tscale_tobj
+ loss_x = fluid.layers.reduce_sum(loss_x, dim=[1, 2, 3])
+ loss_y = fluid.layers.abs(dy - ty) * tscale_tobj
+ loss_y = fluid.layers.reduce_sum(loss_y, dim=[1, 2, 3])
+
+ # NOTE: we refined loss function of (w, h) as L1Loss
+ loss_w = fluid.layers.abs(w - tw) * tscale_tobj
+ loss_w = fluid.layers.reduce_sum(loss_w, dim=[1, 2, 3])
+ loss_h = fluid.layers.abs(h - th) * tscale_tobj
+ loss_h = fluid.layers.reduce_sum(loss_h, dim=[1, 2, 3])
+ if self._iou_loss is not None:
+ loss_iou = self._iou_loss(x, y, w, h, tx, ty, tw, th, anchors,
+ downsample, self._batch_size,
+ scale_x_y)
+ loss_iou = loss_iou * tscale_tobj
+ loss_iou = fluid.layers.reduce_sum(loss_iou, dim=[1, 2, 3])
+ loss_ious.append(fluid.layers.reduce_mean(loss_iou))
+
+ if self._iou_aware_loss is not None:
+ loss_iou_aware = self._iou_aware_loss(
+ ioup, x, y, w, h, tx, ty, tw, th, anchors, downsample,
+ self._batch_size, scale_x_y)
+ loss_iou_aware = loss_iou_aware * tobj
+ loss_iou_aware = fluid.layers.reduce_sum(
+ loss_iou_aware, dim=[1, 2, 3])
+ loss_iou_awares.append(
+ fluid.layers.reduce_mean(loss_iou_aware))
+
+ loss_obj_pos, loss_obj_neg = self._calc_obj_loss(
+ output, obj, tobj, gt_box, self._batch_size, anchors,
+ num_classes, downsample, self._ignore_thresh, scale_x_y)
+
+ loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(cls,
+ tcls)
+ loss_cls = fluid.layers.elementwise_mul(loss_cls, tobj, axis=0)
+ loss_cls = fluid.layers.reduce_sum(loss_cls, dim=[1, 2, 3, 4])
+
+ loss_xys.append(fluid.layers.reduce_mean(loss_x + loss_y))
+ loss_whs.append(fluid.layers.reduce_mean(loss_w + loss_h))
+ loss_objs.append(
+ fluid.layers.reduce_mean(loss_obj_pos + loss_obj_neg))
+ loss_clss.append(fluid.layers.reduce_mean(loss_cls))
+
+ losses_all = {
+ "loss_xy": fluid.layers.sum(loss_xys),
+ "loss_wh": fluid.layers.sum(loss_whs),
+ "loss_obj": fluid.layers.sum(loss_objs),
+ "loss_cls": fluid.layers.sum(loss_clss),
+ }
+ if self._iou_loss is not None:
+ losses_all["loss_iou"] = fluid.layers.sum(loss_ious)
+ if self._iou_aware_loss is not None:
+ losses_all["loss_iou_aware"] = fluid.layers.sum(loss_iou_awares)
+ return losses_all
+
+ def _split_ioup(self, output, an_num, num_classes):
+ """
+ Split the output feature map into the predicted IoU and the
+ remaining output along the channel dimension
+ """
+ ioup = fluid.layers.slice(output, axes=[1], starts=[0], ends=[an_num])
+ ioup = fluid.layers.sigmoid(ioup)
+ oriout = fluid.layers.slice(
+ output,
+ axes=[1],
+ starts=[an_num],
+ ends=[an_num * (num_classes + 6)])
+ return (ioup, oriout)
+
+ def _split_output(self, output, an_num, num_classes):
+ """
+ Split the output feature map into x, y, w, h, objectness and
+ classification along the channel dimension
+ """
+ x = fluid.layers.strided_slice(
+ output,
+ axes=[1],
+ starts=[0],
+ ends=[output.shape[1]],
+ strides=[5 + num_classes])
+ y = fluid.layers.strided_slice(
+ output,
+ axes=[1],
+ starts=[1],
+ ends=[output.shape[1]],
+ strides=[5 + num_classes])
+ w = fluid.layers.strided_slice(
+ output,
+ axes=[1],
+ starts=[2],
+ ends=[output.shape[1]],
+ strides=[5 + num_classes])
+ h = fluid.layers.strided_slice(
+ output,
+ axes=[1],
+ starts=[3],
+ ends=[output.shape[1]],
+ strides=[5 + num_classes])
+ obj = fluid.layers.strided_slice(
+ output,
+ axes=[1],
+ starts=[4],
+ ends=[output.shape[1]],
+ strides=[5 + num_classes])
+ clss = []
+ stride = output.shape[1] // an_num
+ for m in range(an_num):
+ clss.append(
+ fluid.layers.slice(
+ output,
+ axes=[1],
+ starts=[stride * m + 5],
+ ends=[stride * m + 5 + num_classes]))
+ cls = fluid.layers.transpose(
+ fluid.layers.stack(
+ clss, axis=1), perm=[0, 1, 3, 4, 2])
+
+ return (x, y, w, h, obj, cls)
+
+ def _split_target(self, target):
+ """
+ Split the target into x, y, w, h, objectness and classification
+ along dimension 2
+
+ target is in shape [N, an_num, 6 + class_num, H, W]
+ """
+ tx = target[:, :, 0, :, :]
+ ty = target[:, :, 1, :, :]
+ tw = target[:, :, 2, :, :]
+ th = target[:, :, 3, :, :]
+
+ tscale = target[:, :, 4, :, :]
+ tobj = target[:, :, 5, :, :]
+
+ tcls = fluid.layers.transpose(
+ target[:, :, 6:, :, :], perm=[0, 1, 3, 4, 2])
+ tcls.stop_gradient = True
+
+ return (tx, ty, tw, th, tscale, tobj, tcls)
+
+ def _calc_obj_loss(self, output, obj, tobj, gt_box, batch_size, anchors,
+ num_classes, downsample, ignore_thresh, scale_x_y):
+ # If a predicted bbox overlaps any gt_bbox by more than ignore_thresh,
+ # its objectness loss is ignored. The procedure is as follows:
+
+ # 1. get pred bboxes, the same as in YOLOv3 inference mode, using yolo_box
+ # NOTE: img_size is set to 1.0 to get normalized pred bboxes
+ bbox, prob = fluid.layers.yolo_box(
+ x=output,
+ img_size=fluid.layers.ones(
+ shape=[batch_size, 2], dtype="int32"),
+ anchors=anchors,
+ class_num=num_classes,
+ conf_thresh=0.,
+ downsample_ratio=downsample,
+ clip_bbox=False,
+ scale_x_y=scale_x_y)
+
+ # 2. split pred bbox and gt bbox by sample, calculate IoU between pred bbox
+ # and gt bbox in each sample
+ if batch_size > 1:
+ preds = fluid.layers.split(bbox, batch_size, dim=0)
+ gts = fluid.layers.split(gt_box, batch_size, dim=0)
+ else:
+ preds = [bbox]
+ gts = [gt_box]
+ probs = [prob]
+ ious = []
+ for pred, gt in zip(preds, gts):
+
+ def box_xywh2xyxy(box):
+ x = box[:, 0]
+ y = box[:, 1]
+ w = box[:, 2]
+ h = box[:, 3]
+ return fluid.layers.stack(
+ [
+ x - w / 2.,
+ y - h / 2.,
+ x + w / 2.,
+ y + h / 2.,
+ ], axis=1)
+
+ pred = fluid.layers.squeeze(pred, axes=[0])
+ gt = box_xywh2xyxy(fluid.layers.squeeze(gt, axes=[0]))
+ ious.append(fluid.layers.iou_similarity(pred, gt))
+
+ iou = fluid.layers.stack(ious, axis=0)
+ # 3. Get iou_mask by IoU between gt bbox and prediction bbox,
+ # Get obj_mask by tobj(holds gt_score), calculate objectness loss
+
+ max_iou = fluid.layers.reduce_max(iou, dim=-1)
+ iou_mask = fluid.layers.cast(max_iou <= ignore_thresh, dtype="float32")
+ if self.match_score:
+ max_prob = fluid.layers.reduce_max(prob, dim=-1)
+ iou_mask = iou_mask * fluid.layers.cast(
+ max_prob <= 0.25, dtype="float32")
+ output_shape = fluid.layers.shape(output)
+ an_num = len(anchors) // 2
+ iou_mask = fluid.layers.reshape(iou_mask, (-1, an_num, output_shape[2],
+ output_shape[3]))
+ iou_mask.stop_gradient = True
+
+ # NOTE: tobj holds gt_score, obj_mask holds object existence mask
+ obj_mask = fluid.layers.cast(tobj > 0., dtype="float32")
+ obj_mask.stop_gradient = True
+
+ # For positive objectness grids, the objectness loss is always calculated;
+ # for negative objectness grids, it is calculated only where iou_mask == 1.0
+ loss_obj = fluid.layers.sigmoid_cross_entropy_with_logits(obj,
+ obj_mask)
+ loss_obj_pos = fluid.layers.reduce_sum(loss_obj * tobj, dim=[1, 2, 3])
+ loss_obj_neg = fluid.layers.reduce_sum(
+ loss_obj * (1.0 - obj_mask) * iou_mask, dim=[1, 2, 3])
+
+ return loss_obj_pos, loss_obj_neg
diff --git a/paddlex/cv/nets/detection/ops.py b/paddlex/cv/nets/detection/ops.py
new file mode 100644
index 0000000000000000000000000000000000000000..b1ff6823092f52d8f595bc7a49db3dde2d447c7a
--- /dev/null
+++ b/paddlex/cv/nets/detection/ops.py
@@ -0,0 +1,270 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+from numbers import Integral
+import math
+import six
+
+import paddle
+from paddle import fluid
+
+
+def DropBlock(input, block_size, keep_prob, is_test):
+ if is_test:
+ return input
+
+ def CalculateGamma(input, block_size, keep_prob):
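+ # seed probability gamma, as in the DropBlock paper (Ghiasi et al. 2018):
+ # gamma = (1 - keep_prob) / block_size^2 * feat_size^2 / (feat_size - block_size + 1)^2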
+ input_shape = fluid.layers.shape(input)
+ feat_shape_tmp = fluid.layers.slice(input_shape, [0], [3], [4])
+ feat_shape_tmp = fluid.layers.cast(feat_shape_tmp, dtype="float32")
+ feat_shape_t = fluid.layers.reshape(feat_shape_tmp, [1, 1, 1, 1])
+ feat_area = fluid.layers.pow(feat_shape_t, factor=2)
+
+ block_shape_t = fluid.layers.fill_constant(
+ shape=[1, 1, 1, 1], value=block_size, dtype='float32')
+ block_area = fluid.layers.pow(block_shape_t, factor=2)
+
+ useful_shape_t = feat_shape_t - block_shape_t + 1
+ useful_area = fluid.layers.pow(useful_shape_t, factor=2)
+
+ upper_t = feat_area * (1 - keep_prob)
+ bottom_t = block_area * useful_area
+ output = upper_t / bottom_t
+ return output
+
+ gamma = CalculateGamma(input, block_size=block_size, keep_prob=keep_prob)
+ input_shape = fluid.layers.shape(input)
+ p = fluid.layers.expand_as(gamma, input)
+
+ input_shape_tmp = fluid.layers.cast(input_shape, dtype="int64")
+ random_matrix = fluid.layers.uniform_random(
+ input_shape_tmp, dtype='float32', min=0.0, max=1.0)
+ one_zero_m = fluid.layers.less_than(random_matrix, p)
+ one_zero_m.stop_gradient = True
+ one_zero_m = fluid.layers.cast(one_zero_m, dtype="float32")
+
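+ # max pooling expands each sampled seed into a block_size x block_size
+ # region; mask then zeroes those blocks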
+ mask_flag = fluid.layers.pool2d(
+ one_zero_m,
+ pool_size=block_size,
+ pool_type='max',
+ pool_stride=1,
+ pool_padding=block_size // 2)
+ mask = 1.0 - mask_flag
+
+ elem_numel = fluid.layers.reduce_prod(input_shape)
+ elem_numel_m = fluid.layers.cast(elem_numel, dtype="float32")
+ elem_numel_m.stop_gradient = True
+
+ elem_sum = fluid.layers.reduce_sum(mask)
+ elem_sum_m = fluid.layers.cast(elem_sum, dtype="float32")
+ elem_sum_m.stop_gradient = True
+
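+ # rescale kept activations by (num_elements / num_kept) to preserve the
+ # expected feature magnitude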
+ output = input * mask * elem_numel_m / elem_sum_m
+ return output
+
+
+class MultiClassNMS(object):
+ def __init__(self,
+ score_threshold=.05,
+ nms_top_k=-1,
+ keep_top_k=100,
+ nms_threshold=.5,
+ normalized=False,
+ nms_eta=1.0,
+ background_label=0):
+ super(MultiClassNMS, self).__init__()
+ self.score_threshold = score_threshold
+ self.nms_top_k = nms_top_k
+ self.keep_top_k = keep_top_k
+ self.nms_threshold = nms_threshold
+ self.normalized = normalized
+ self.nms_eta = nms_eta
+ self.background_label = background_label
+
+ def __call__(self, bboxes, scores):
+ return fluid.layers.multiclass_nms(
+ bboxes=bboxes,
+ scores=scores,
+ score_threshold=self.score_threshold,
+ nms_top_k=self.nms_top_k,
+ keep_top_k=self.keep_top_k,
+ normalized=self.normalized,
+ nms_threshold=self.nms_threshold,
+ nms_eta=self.nms_eta,
+ background_label=self.background_label)
+
+
+class MatrixNMS(object):
+ def __init__(self,
+ score_threshold=.05,
+ post_threshold=.05,
+ nms_top_k=-1,
+ keep_top_k=100,
+ use_gaussian=False,
+ gaussian_sigma=2.,
+ normalized=False,
+ background_label=0):
+ super(MatrixNMS, self).__init__()
+ self.score_threshold = score_threshold
+ self.post_threshold = post_threshold
+ self.nms_top_k = nms_top_k
+ self.keep_top_k = keep_top_k
+ self.normalized = normalized
+ self.use_gaussian = use_gaussian
+ self.gaussian_sigma = gaussian_sigma
+ self.background_label = background_label
+
+ def __call__(self, bboxes, scores):
+ return paddle.fluid.layers.matrix_nms(
+ bboxes=bboxes,
+ scores=scores,
+ score_threshold=self.score_threshold,
+ post_threshold=self.post_threshold,
+ nms_top_k=self.nms_top_k,
+ keep_top_k=self.keep_top_k,
+ normalized=self.normalized,
+ use_gaussian=self.use_gaussian,
+ gaussian_sigma=self.gaussian_sigma,
+ background_label=self.background_label)
+
+
+class MultiClassSoftNMS(object):
+ def __init__(
+ self,
+ score_threshold=0.01,
+ keep_top_k=300,
+ softnms_sigma=0.5,
+ normalized=False,
+ background_label=0, ):
+ super(MultiClassSoftNMS, self).__init__()
+ self.score_threshold = score_threshold
+ self.keep_top_k = keep_top_k
+ self.softnms_sigma = softnms_sigma
+ self.normalized = normalized
+ self.background_label = background_label
+
+ def __call__(self, bboxes, scores):
+ def create_tmp_var(program, name, dtype, shape, lod_level):
+ return program.current_block().create_var(
+ name=name, dtype=dtype, shape=shape, lod_level=lod_level)
+
+ def _soft_nms_for_cls(dets, sigma, thres):
+ """soft_nms_for_cls"""
+ dets_final = []
+ while len(dets) > 0:
+ maxpos = np.argmax(dets[:, 0])
+ dets_final.append(dets[maxpos].copy())
+ ts, tx1, ty1, tx2, ty2 = dets[maxpos]
+ scores = dets[:, 0]
+ # force remove bbox at maxpos
+ scores[maxpos] = -1
+ x1 = dets[:, 1]
+ y1 = dets[:, 2]
+ x2 = dets[:, 3]
+ y2 = dets[:, 4]
+ eta = 0 if self.normalized else 1
+ areas = (x2 - x1 + eta) * (y2 - y1 + eta)
+ xx1 = np.maximum(tx1, x1)
+ yy1 = np.maximum(ty1, y1)
+ xx2 = np.minimum(tx2, x2)
+ yy2 = np.minimum(ty2, y2)
+ w = np.maximum(0.0, xx2 - xx1 + eta)
+ h = np.maximum(0.0, yy2 - yy1 + eta)
+ inter = w * h
+ ovr = inter / (areas + areas[maxpos] - inter)
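+ # Gaussian Soft-NMS: decay overlapping scores by exp(-IoU^2 / sigma)
+ # instead of hard suppression (cf. Bodla et al. 2017)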
+ weight = np.exp(-(ovr * ovr) / sigma)
+ scores = scores * weight
+ idx_keep = np.where(scores >= thres)
+ dets[:, 0] = scores
+ dets = dets[idx_keep]
+ dets_final = np.array(dets_final).reshape(-1, 5)
+ return dets_final
+
+ def _soft_nms(bboxes, scores):
+ class_nums = scores.shape[-1]
+
+ softnms_thres = self.score_threshold
+ softnms_sigma = self.softnms_sigma
+ keep_top_k = self.keep_top_k
+
+ cls_boxes = [[] for _ in range(class_nums)]
+ cls_ids = [[] for _ in range(class_nums)]
+
+ start_idx = 1 if self.background_label == 0 else 0
+ for j in range(start_idx, class_nums):
+ inds = np.where(scores[:, j] >= softnms_thres)[0]
+ scores_j = scores[inds, j]
+ rois_j = bboxes[inds, j, :] if len(
+ bboxes.shape) > 2 else bboxes[inds, :]
+ dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype(
+ np.float32, copy=False)
+ cls_rank = np.argsort(-dets_j[:, 0])
+ dets_j = dets_j[cls_rank]
+
+ cls_boxes[j] = _soft_nms_for_cls(
+ dets_j, sigma=softnms_sigma, thres=softnms_thres)
+ cls_ids[j] = np.array([j] * cls_boxes[j].shape[0]).reshape(-1,
+ 1)
+
+ cls_boxes = np.vstack(cls_boxes[start_idx:])
+ cls_ids = np.vstack(cls_ids[start_idx:])
+ pred_result = np.hstack([cls_ids, cls_boxes])
+
+ # Limit to max_per_image detections **over all classes**
+ image_scores = cls_boxes[:, 0]
+ if len(image_scores) > keep_top_k:
+ image_thresh = np.sort(image_scores)[-keep_top_k]
+ keep = np.where(cls_boxes[:, 0] >= image_thresh)[0]
+ pred_result = pred_result[keep, :]
+
+ return pred_result
+
+ def _batch_softnms(bboxes, scores):
+ batch_offsets = bboxes.lod()
+ bboxes = np.array(bboxes)
+ scores = np.array(scores)
+ out_offsets = [0]
+ pred_res = []
+ if len(batch_offsets) > 0:
+ batch_offset = batch_offsets[0]
+ for i in range(len(batch_offset) - 1):
+ s, e = batch_offset[i], batch_offset[i + 1]
+ pred = _soft_nms(bboxes[s:e], scores[s:e])
+ out_offsets.append(pred.shape[0] + out_offsets[-1])
+ pred_res.append(pred)
+ else:
+ assert len(bboxes.shape) == 3
+ assert len(scores.shape) == 3
+ for i in range(bboxes.shape[0]):
+ pred = _soft_nms(bboxes[i], scores[i])
+ out_offsets.append(pred.shape[0] + out_offsets[-1])
+ pred_res.append(pred)
+
+ res = fluid.LoDTensor()
+ res.set_lod([out_offsets])
+ if len(pred_res) == 0:
+ pred_res = np.array([[1]], dtype=np.float32)
+ res.set(np.vstack(pred_res).astype(np.float32), fluid.CPUPlace())
+ return res
+
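+ # run the numpy soft-NMS inside the graph via py_func; the result is a
+ # LoDTensor with per-sample offsets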
+ pred_result = create_tmp_var(
+ fluid.default_main_program(),
+ name='softnms_pred_result',
+ dtype='float32',
+ shape=[-1, 6],
+ lod_level=1)
+ fluid.layers.py_func(
+ func=_batch_softnms, x=[bboxes, scores], out=pred_result)
+ return pred_result
diff --git a/paddlex/cv/nets/detection/yolo_v3.py b/paddlex/cv/nets/detection/yolo_v3.py
index 4b02132e2b4df9b4bc0f5eaaf72271d53bd31dee..b73cdc768737a54ff6b01eb7977c3c508ba5c0e3 100644
--- a/paddlex/cv/nets/detection/yolo_v3.py
+++ b/paddlex/cv/nets/detection/yolo_v3.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -16,25 +16,50 @@ from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from collections import OrderedDict
+from .ops import MultiClassNMS, MultiClassSoftNMS, MatrixNMS
+from .ops import DropBlock
+from .loss.yolo_loss import YOLOv3Loss
+from .loss.iou_loss import IouLoss
+from .loss.iou_aware_loss import IouAwareLoss
+from .iou_aware import get_iou_aware_score
+try:
+ from collections.abc import Sequence
+except Exception:
+ from collections import Sequence
class YOLOv3:
- def __init__(self,
- backbone,
- num_classes,
- mode='train',
- anchors=None,
- anchor_masks=None,
- ignore_threshold=0.7,
- label_smooth=False,
- nms_score_threshold=0.01,
- nms_topk=1000,
- nms_keep_topk=100,
- nms_iou_threshold=0.45,
- train_random_shapes=[
- 320, 352, 384, 416, 448, 480, 512, 544, 576, 608
- ],
- fixed_input_shape=None):
+ def __init__(
+ self,
+ backbone,
+ mode='train',
+ # YOLOv3Head
+ num_classes=80,
+ anchors=None,
+ anchor_masks=None,
+ coord_conv=False,
+ iou_aware=False,
+ iou_aware_factor=0.4,
+ scale_x_y=1.,
+ spp=False,
+ drop_block=False,
+ use_matrix_nms=False,
+ # YOLOv3Loss
+ batch_size=8,
+ ignore_threshold=0.7,
+ label_smooth=False,
+ use_fine_grained_loss=False,
+ use_iou_loss=False,
+ iou_loss_weight=2.5,
+ iou_aware_loss_weight=1.0,
+ max_height=608,
+ max_width=608,
+ # NMS
+ nms_score_threshold=0.01,
+ nms_topk=1000,
+ nms_keep_topk=100,
+ nms_iou_threshold=0.45,
+ fixed_input_shape=None):
if anchors is None:
anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
[59, 119], [116, 90], [156, 198], [373, 326]]
@@ -46,56 +71,114 @@ class YOLOv3:
self.mode = mode
self.num_classes = num_classes
self.backbone = backbone
- self.ignore_thresh = ignore_threshold
- self.label_smooth = label_smooth
- self.nms_score_threshold = nms_score_threshold
- self.nms_topk = nms_topk
- self.nms_keep_topk = nms_keep_topk
- self.nms_iou_threshold = nms_iou_threshold
self.norm_decay = 0.0
self.prefix_name = ''
- self.train_random_shapes = train_random_shapes
+ self.use_fine_grained_loss = use_fine_grained_loss
self.fixed_input_shape = fixed_input_shape
+ self.coord_conv = coord_conv
+ self.iou_aware = iou_aware
+ self.iou_aware_factor = iou_aware_factor
+ self.scale_x_y = scale_x_y
+ self.use_spp = spp
+ self.drop_block = drop_block
- def _head(self, feats):
+ if use_matrix_nms:
+ self.nms = MatrixNMS(
+ background_label=-1,
+ keep_top_k=nms_keep_topk,
+ normalized=False,
+ score_threshold=nms_score_threshold,
+ post_threshold=0.01)
+ else:
+ self.nms = MultiClassNMS(
+ background_label=-1,
+ keep_top_k=nms_keep_topk,
+ nms_threshold=nms_iou_threshold,
+ nms_top_k=nms_topk,
+ normalized=False,
+ score_threshold=nms_score_threshold)
+ self.iou_loss = None
+ self.iou_aware_loss = None
+ if use_iou_loss:
+ self.iou_loss = IouLoss(
+ loss_weight=iou_loss_weight,
+ max_height=max_height,
+ max_width=max_width)
+ if iou_aware:
+ self.iou_aware_loss = IouAwareLoss(
+ loss_weight=iou_aware_loss_weight,
+ max_height=max_height,
+ max_width=max_width)
+ self.yolo_loss = YOLOv3Loss(
+ batch_size=batch_size,
+ ignore_thresh=ignore_threshold,
+ scale_x_y=scale_x_y,
+ label_smooth=label_smooth,
+ use_fine_grained_loss=self.use_fine_grained_loss,
+ iou_loss=self.iou_loss,
+ iou_aware_loss=self.iou_aware_loss)
+ self.conv_block_num = 2
+ self.block_size = 3
+ self.keep_prob = 0.9
+ self.downsample = [32, 16, 8]
+ self.clip_bbox = True
+
+ def _head(self, input, is_train=True):
outputs = []
+
+ # get last out_layer_num blocks in reverse order
out_layer_num = len(self.anchor_masks)
- blocks = feats[-1:-out_layer_num - 1:-1]
- route = None
+ blocks = input[-1:-out_layer_num - 1:-1]
+ route = None
for i, block in enumerate(blocks):
- if i > 0:
+ if i > 0:  # concat the upsampled route with every block except the first
block = fluid.layers.concat(input=[route, block], axis=1)
route, tip = self._detection_block(
block,
- channel=512 // (2**i),
- name=self.prefix_name + 'yolo_block.{}'.format(i))
+ channel=64 * (2**out_layer_num) // (2**i),
+ is_first=i == 0,
+ is_test=(not is_train),
+ conv_block_num=self.conv_block_num,
+ name=self.prefix_name + "yolo_block.{}".format(i))
- num_filters = len(self.anchor_masks[i]) * (self.num_classes + 5)
- block_out = fluid.layers.conv2d(
- input=tip,
- num_filters=num_filters,
- filter_size=1,
- stride=1,
- padding=0,
- act=None,
- param_attr=ParamAttr(name=self.prefix_name +
- 'yolo_output.{}.conv.weights'.format(i)),
- bias_attr=ParamAttr(
- regularizer=L2Decay(0.0),
- name=self.prefix_name +
- 'yolo_output.{}.conv.bias'.format(i)))
- outputs.append(block_out)
+ # out channel number = mask_num * (5 + class_num)
+ if self.iou_aware:
+ num_filters = len(self.anchor_masks[i]) * (
+ self.num_classes + 6)
+ else:
+ num_filters = len(self.anchor_masks[i]) * (
+ self.num_classes + 5)
+ with fluid.name_scope('yolo_output'):
+ block_out = fluid.layers.conv2d(
+ input=tip,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=None,
+ param_attr=ParamAttr(
+ name=self.prefix_name +
+ "yolo_output.{}.conv.weights".format(i)),
+ bias_attr=ParamAttr(
+ regularizer=L2Decay(0.),
+ name=self.prefix_name +
+ "yolo_output.{}.conv.bias".format(i)))
+ outputs.append(block_out)
if i < len(blocks) - 1:
+ # do not perform upsample in the last detection_block
route = self._conv_bn(
input=route,
ch_out=256 // (2**i),
filter_size=1,
stride=1,
padding=0,
- name=self.prefix_name + 'yolo_transition.{}'.format(i))
+ is_test=(not is_train),
+ name=self.prefix_name + "yolo_transition.{}".format(i))
+ # upsample
route = self._upsample(route)
+
return outputs
def _parse_anchors(self, anchors):
@@ -116,6 +199,54 @@ class YOLOv3:
assert mask < anchor_num, "anchor mask index overflow"
self.mask_anchors[-1].extend(anchors[mask])
+ def _create_tensor_from_numpy(self, numpy_array):
+ paddle_array = fluid.layers.create_global_var(
+ shape=numpy_array.shape, value=0., dtype=numpy_array.dtype)
+ fluid.layers.assign(numpy_array, paddle_array)
+ return paddle_array
+
+ def _add_coord(self, input, is_test=True):
+ if not self.coord_conv:
+ return input
+
+ # NOTE: this branch is used when exporting the model for TensorRT
+ # inference. Only batch_size=1 is supported since the input shape must
+ # be fixed, so we create tensors with fixed shapes from numpy arrays.
+ if is_test and input.shape[2] > 0 and input.shape[3] > 0:
+ batch_size = 1
+ grid_x = int(input.shape[3])
+ grid_y = int(input.shape[2])
+ idx_i = np.array(
+ [[i / (grid_x - 1) * 2.0 - 1 for i in range(grid_x)]],
+ dtype='float32')
+ gi_np = np.repeat(idx_i, grid_y, axis=0)
+ gi_np = np.reshape(gi_np, newshape=[1, 1, grid_y, grid_x])
+ gi_np = np.tile(gi_np, reps=[batch_size, 1, 1, 1])
+
+ x_range = self._create_tensor_from_numpy(gi_np.astype(np.float32))
+ x_range.stop_gradient = True
+ y_range = self._create_tensor_from_numpy(
+ gi_np.transpose([0, 1, 3, 2]).astype(np.float32))
+ y_range.stop_gradient = True
+
+ # NOTE: in training mode, H and W vary due to random input shapes,
+ # so implement add_coord with the shape as a Variable
+ else:
+ input_shape = fluid.layers.shape(input)
+ b = input_shape[0]
+ h = input_shape[2]
+ w = input_shape[3]
+
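+ # build a coordinate channel linearly spaced over [-1, 1] along the
+ # width; the transpose below gives the height channel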
+ x_range = fluid.layers.range(0, w, 1, 'float32') / ((w - 1.) / 2.)
+ x_range = x_range - 1.
+ x_range = fluid.layers.unsqueeze(x_range, [0, 1, 2])
+ x_range = fluid.layers.expand(x_range, [b, 1, h, 1])
+ x_range.stop_gradient = True
+ y_range = fluid.layers.transpose(x_range, [0, 1, 3, 2])
+ y_range.stop_gradient = True
+
+ return fluid.layers.concat([input, x_range, y_range], axis=1)
+
def _conv_bn(self,
input,
ch_out,
@@ -151,18 +282,52 @@ class YOLOv3:
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
+ def _spp_module(self, input, is_test=True, name=""):
+ output1 = input
+ output2 = fluid.layers.pool2d(
+ input=output1,
+ pool_size=5,
+ pool_stride=1,
+ pool_padding=2,
+ ceil_mode=False,
+ pool_type='max')
+ output3 = fluid.layers.pool2d(
+ input=output1,
+ pool_size=9,
+ pool_stride=1,
+ pool_padding=4,
+ ceil_mode=False,
+ pool_type='max')
+ output4 = fluid.layers.pool2d(
+ input=output1,
+ pool_size=13,
+ pool_stride=1,
+ pool_padding=6,
+ ceil_mode=False,
+ pool_type='max')
+ output = fluid.layers.concat(
+ input=[output1, output2, output3, output4], axis=1)
+ return output
+
def _upsample(self, input, scale=2, name=None):
out = fluid.layers.resize_nearest(
input=input, scale=float(scale), name=name, align_corners=False)
return out
- def _detection_block(self, input, channel, name=None):
- assert channel % 2 == 0, "channel({}) cannot be divided by 2 in detection block({})".format(
- channel, name)
+ def _detection_block(self,
+ input,
+ channel,
+ conv_block_num=2,
+ is_first=False,
+ is_test=True,
+ name=None):
+ assert channel % 2 == 0, \
+ "channel {} cannot be divided by 2 in detection block {}" \
+ .format(channel, name)
- is_test = False if self.mode == 'train' else True
conv = input
- for i in range(2):
+ for j in range(conv_block_num):
+ conv = self._add_coord(conv, is_test=is_test)
conv = self._conv_bn(
conv,
channel,
@@ -170,7 +335,17 @@ class YOLOv3:
stride=1,
padding=0,
is_test=is_test,
- name='{}.{}.0'.format(name, i))
+ name='{}.{}.0'.format(name, j))
+ if self.use_spp and is_first and j == 1:
+ conv = self._spp_module(conv, is_test=is_test, name="spp")
+ conv = self._conv_bn(
+ conv,
+ 512,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ is_test=is_test,
+ name='{}.{}.spp.conv'.format(name, j))
conv = self._conv_bn(
conv,
channel * 2,
@@ -178,7 +353,21 @@ class YOLOv3:
stride=1,
padding=1,
is_test=is_test,
- name='{}.{}.1'.format(name, i))
+ name='{}.{}.1'.format(name, j))
+ if self.drop_block and j == 0 and not is_first:
+ conv = DropBlock(
+ conv,
+ block_size=self.block_size,
+ keep_prob=self.keep_prob,
+ is_test=is_test)
+
+ if self.drop_block and is_first:
+ conv = DropBlock(
+ conv,
+ block_size=self.block_size,
+ keep_prob=self.keep_prob,
+ is_test=is_test)
+ conv = self._add_coord(conv, is_test=is_test)
route = self._conv_bn(
conv,
channel,
@@ -187,8 +376,9 @@ class YOLOv3:
padding=0,
is_test=is_test,
name='{}.2'.format(name))
+ new_route = self._add_coord(route, is_test=is_test)
tip = self._conv_bn(
- route,
+ new_route,
channel * 2,
filter_size=3,
stride=1,
@@ -197,54 +387,44 @@ class YOLOv3:
name='{}.tip'.format(name))
return route, tip
- def _get_loss(self, inputs, gt_box, gt_label, gt_score):
- losses = []
- downsample = 32
- for i, input in enumerate(inputs):
- loss = fluid.layers.yolov3_loss(
- x=input,
- gt_box=gt_box,
- gt_label=gt_label,
- gt_score=gt_score,
- anchors=self.anchors,
- anchor_mask=self.anchor_masks[i],
- class_num=self.num_classes,
- ignore_thresh=self.ignore_thresh,
- downsample_ratio=downsample,
- use_label_smooth=self.label_smooth,
- name=self.prefix_name + 'yolo_loss' + str(i))
- losses.append(fluid.layers.reduce_mean(loss))
- downsample //= 2
- return sum(losses)
+ def _get_loss(self, inputs, gt_box, gt_label, gt_score, targets):
+ loss = self.yolo_loss(inputs, gt_box, gt_label, gt_score, targets,
+ self.anchors, self.anchor_masks,
+ self.mask_anchors, self.num_classes,
+ self.prefix_name)
+ total_loss = fluid.layers.sum(list(loss.values()))
+ return total_loss
def _get_prediction(self, inputs, im_size):
boxes = []
scores = []
- downsample = 32
for i, input in enumerate(inputs):
+ if self.iou_aware:
+ input = get_iou_aware_score(input,
+ len(self.anchor_masks[i]),
+ self.num_classes,
+ self.iou_aware_factor)
+ scale_x_y = self.scale_x_y if not isinstance(
+ self.scale_x_y, Sequence) else self.scale_x_y[i]
+
box, score = fluid.layers.yolo_box(
x=input,
img_size=im_size,
anchors=self.mask_anchors[i],
class_num=self.num_classes,
- conf_thresh=self.nms_score_threshold,
- downsample_ratio=downsample,
- name=self.prefix_name + 'yolo_box' + str(i))
+ conf_thresh=self.nms.score_threshold,
+ downsample_ratio=self.downsample[i],
+ name=self.prefix_name + 'yolo_box' + str(i),
+ clip_bbox=self.clip_bbox,
+ scale_x_y=self.scale_x_y)
boxes.append(box)
scores.append(fluid.layers.transpose(score, perm=[0, 2, 1]))
- downsample //= 2
+
yolo_boxes = fluid.layers.concat(boxes, axis=1)
yolo_scores = fluid.layers.concat(scores, axis=2)
- pred = fluid.layers.multiclass_nms(
- bboxes=yolo_boxes,
- scores=yolo_scores,
- score_threshold=self.nms_score_threshold,
- nms_top_k=self.nms_topk,
- keep_top_k=self.nms_keep_topk,
- nms_threshold=self.nms_iou_threshold,
- normalized=False,
- nms_eta=1.0,
- background_label=-1)
+ if type(self.nms) is MultiClassSoftNMS:
+ yolo_scores = fluid.layers.transpose(yolo_scores, perm=[0, 2, 1])
+ pred = self.nms(bboxes=yolo_boxes, scores=yolo_scores)
return pred
def generate_inputs(self):
@@ -267,6 +447,25 @@ class YOLOv3:
dtype='float32', shape=[None, None], name='gt_score')
inputs['im_size'] = fluid.data(
dtype='int32', shape=[None, 2], name='im_size')
+ if self.use_fine_grained_loss:
+ downsample = 32
+ for i, mask in enumerate(self.anchor_masks):
+ if self.fixed_input_shape is not None:
+ target_shape = [
+ self.fixed_input_shape[1] // downsample,
+ self.fixed_input_shape[0] // downsample
+ ]
+ else:
+ target_shape = [None, None]
+ inputs['target{}'.format(i)] = fluid.data(
+ dtype='float32',
+ lod_level=0,
+ shape=[
+ None, len(mask), 6 + self.num_classes,
+ target_shape[0], target_shape[1]
+ ],
+ name='target{}'.format(i))
+ downsample //= 2
elif self.mode == 'eval':
inputs['im_size'] = fluid.data(
dtype='int32', shape=[None, 2], name='im_size')
@@ -285,28 +484,12 @@ class YOLOv3:
def build_net(self, inputs):
image = inputs['image']
- if self.mode == 'train':
- if isinstance(self.train_random_shapes,
- (list, tuple)) and len(self.train_random_shapes) > 0:
- import numpy as np
- shapes = np.array(self.train_random_shapes)
- shapes = np.stack([shapes, shapes], axis=1).astype('float32')
- shapes_tensor = fluid.layers.assign(shapes)
- index = fluid.layers.uniform_random(
- shape=[1], dtype='float32', min=0.0, max=1)
- index = fluid.layers.cast(
- index * len(self.train_random_shapes), dtype='int32')
- shape = fluid.layers.gather(shapes_tensor, index)
- shape = fluid.layers.reshape(shape, [-1])
- shape = fluid.layers.cast(shape, dtype='int32')
- image = fluid.layers.resize_nearest(
- image, out_shape=shape, align_corners=False)
feats = self.backbone(image)
if isinstance(feats, OrderedDict):
feat_names = list(feats.keys())
feats = [feats[name] for name in feat_names]
- head_outputs = self._head(feats)
+ head_outputs = self._head(feats, self.mode == 'train')
if self.mode == 'train':
gt_box = inputs['gt_box']
gt_label = inputs['gt_label']
@@ -320,8 +503,15 @@ class YOLOv3:
whwh = fluid.layers.cast(whwh, dtype='float32')
whwh.stop_gradient = True
normalized_box = fluid.layers.elementwise_div(gt_box, whwh)
+
+ targets = []
+ if self.use_fine_grained_loss:
+ for i, mask in enumerate(self.anchor_masks):
+ k = 'target{}'.format(i)
+ if k in inputs:
+ targets.append(inputs[k])
return self._get_loss(head_outputs, normalized_box, gt_label,
- gt_score)
+ gt_score, targets)
else:
im_size = inputs['im_size']
return self._get_prediction(head_outputs, im_size)
diff --git a/paddlex/cv/nets/mobilenet_v1.py b/paddlex/cv/nets/mobilenet_v1.py
index c9b99255fb36eb9a9b44ea12ba5ed3c099620db4..01c9ed1750f3909330d917842625e39a38b11cae 100755
--- a/paddlex/cv/nets/mobilenet_v1.py
+++ b/paddlex/cv/nets/mobilenet_v1.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/mobilenet_v2.py b/paddlex/cv/nets/mobilenet_v2.py
index ee0db962e7c4906d8e6a079f63a3db13e5debbef..0d4421be76fbbf4ca09bb532e5ca04bf41254e7b 100644
--- a/paddlex/cv/nets/mobilenet_v2.py
+++ b/paddlex/cv/nets/mobilenet_v2.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -30,10 +30,10 @@ class MobileNetV2:
self.output_stride = output_stride
self.end_points = end_points
self.decode_points = decode_points
- self.bottleneck_params_list = [(1, 16, 1, 1), (6, 24, 2, 2),
- (6, 32, 3, 2), (6, 64, 4, 2),
- (6, 96, 3, 1), (6, 160, 3, 2),
- (6, 320, 1, 1)]
+ self.bottleneck_params_list = [
+ (1, 16, 1, 1), (6, 24, 2, 2), (6, 32, 3, 2), (6, 64, 4, 2),
+ (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1)
+ ]
self.modify_bottle_params(output_stride)
def __call__(self, input):
@@ -104,11 +104,10 @@ class MobileNetV2:
output = fluid.layers.pool2d(
input=output, pool_type='avg', global_pooling=True)
- output = fluid.layers.fc(
- input=output,
- size=self.num_classes,
- param_attr=ParamAttr(name='fc10_weights'),
- bias_attr=ParamAttr(name='fc10_offset'))
+ output = fluid.layers.fc(input=output,
+ size=self.num_classes,
+ param_attr=ParamAttr(name='fc10_weights'),
+ bias_attr=ParamAttr(name='fc10_offset'))
return output
def modify_bottle_params(self, output_stride=None):
@@ -239,4 +238,4 @@ class MobileNetV2:
padding=1,
expansion_factor=t,
name=name + '_' + str(i + 1))
- return last_residual_block, depthwise_output
\ No newline at end of file
+ return last_residual_block, depthwise_output
diff --git a/paddlex/cv/nets/mobilenet_v3.py b/paddlex/cv/nets/mobilenet_v3.py
index 6adcee03d7bb9c5ffab0ceb7198083e3534e7ab9..67a43bf193d28ad7ddd92815a17d6619ba4475ba 100644
--- a/paddlex/cv/nets/mobilenet_v3.py
+++ b/paddlex/cv/nets/mobilenet_v3.py
@@ -42,7 +42,9 @@ class MobileNetV3():
extra_block_filters=[[256, 512], [128, 256], [128, 256],
[64, 128]],
num_classes=None,
- lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0]):
+ lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0],
+ for_seg=False,
+ output_stride=None):
assert len(lr_mult_list) == 5, \
"lr_mult_list length in MobileNetV3 must be 5 but got {}!!".format(
len(lr_mult_list))
@@ -57,48 +59,112 @@ class MobileNetV3():
self.num_classes = num_classes
self.lr_mult_list = lr_mult_list
self.curr_stage = 0
- if model_name == "large":
- self.cfg = [
- # kernel_size, expand, channel, se_block, act_mode, stride
- [3, 16, 16, False, 'relu', 1],
- [3, 64, 24, False, 'relu', 2],
- [3, 72, 24, False, 'relu', 1],
- [5, 72, 40, True, 'relu', 2],
- [5, 120, 40, True, 'relu', 1],
- [5, 120, 40, True, 'relu', 1],
- [3, 240, 80, False, 'hard_swish', 2],
- [3, 200, 80, False, 'hard_swish', 1],
- [3, 184, 80, False, 'hard_swish', 1],
- [3, 184, 80, False, 'hard_swish', 1],
- [3, 480, 112, True, 'hard_swish', 1],
- [3, 672, 112, True, 'hard_swish', 1],
- [5, 672, 160, True, 'hard_swish', 2],
- [5, 960, 160, True, 'hard_swish', 1],
- [5, 960, 160, True, 'hard_swish', 1],
- ]
- self.cls_ch_squeeze = 960
- self.cls_ch_expand = 1280
- self.lr_interval = 3
- elif model_name == "small":
- self.cfg = [
- # kernel_size, expand, channel, se_block, act_mode, stride
- [3, 16, 16, True, 'relu', 2],
- [3, 72, 24, False, 'relu', 2],
- [3, 88, 24, False, 'relu', 1],
- [5, 96, 40, True, 'hard_swish', 2],
- [5, 240, 40, True, 'hard_swish', 1],
- [5, 240, 40, True, 'hard_swish', 1],
- [5, 120, 48, True, 'hard_swish', 1],
- [5, 144, 48, True, 'hard_swish', 1],
- [5, 288, 96, True, 'hard_swish', 2],
- [5, 576, 96, True, 'hard_swish', 1],
- [5, 576, 96, True, 'hard_swish', 1],
- ]
- self.cls_ch_squeeze = 576
- self.cls_ch_expand = 1280
- self.lr_interval = 2
+ self.for_seg = for_seg
+ self.decode_point = None
+
+ if self.for_seg:
+ if model_name == "large":
+ self.cfg = [
+ # k, exp, c, se, nl, s,
+ [3, 16, 16, False, 'relu', 1],
+ [3, 64, 24, False, 'relu', 2],
+ [3, 72, 24, False, 'relu', 1],
+ [5, 72, 40, True, 'relu', 2],
+ [5, 120, 40, True, 'relu', 1],
+ [5, 120, 40, True, 'relu', 1],
+ [3, 240, 80, False, 'hard_swish', 2],
+ [3, 200, 80, False, 'hard_swish', 1],
+ [3, 184, 80, False, 'hard_swish', 1],
+ [3, 184, 80, False, 'hard_swish', 1],
+ [3, 480, 112, True, 'hard_swish', 1],
+ [3, 672, 112, True, 'hard_swish', 1],
+ # The number of channels in the last 4 stages is reduced by a
+ # factor of 2 compared to the standard implementation.
+ [5, 336, 80, True, 'hard_swish', 2],
+ [5, 480, 80, True, 'hard_swish', 1],
+ [5, 480, 80, True, 'hard_swish', 1],
+ ]
+ self.cls_ch_squeeze = 480
+ self.cls_ch_expand = 1280
+ self.lr_interval = 3
+ elif model_name == "small":
+ self.cfg = [
+ # k, exp, c, se, nl, s,
+ [3, 16, 16, True, 'relu', 2],
+ [3, 72, 24, False, 'relu', 2],
+ [3, 88, 24, False, 'relu', 1],
+ [5, 96, 40, True, 'hard_swish', 2],
+ [5, 240, 40, True, 'hard_swish', 1],
+ [5, 240, 40, True, 'hard_swish', 1],
+ [5, 120, 48, True, 'hard_swish', 1],
+ [5, 144, 48, True, 'hard_swish', 1],
+ # The number of channels in the last 4 stages is reduced by a
+ # factor of 2 compared to the standard implementation.
+ [5, 144, 48, True, 'hard_swish', 2],
+ [5, 288, 48, True, 'hard_swish', 1],
+ [5, 288, 48, True, 'hard_swish', 1],
+ ]
+ else:
+ raise NotImplementedError
else:
- raise NotImplementedError
+ if model_name == "large":
+ self.cfg = [
+ # kernel_size, expand, channel, se_block, act_mode, stride
+ [3, 16, 16, False, 'relu', 1],
+ [3, 64, 24, False, 'relu', 2],
+ [3, 72, 24, False, 'relu', 1],
+ [5, 72, 40, True, 'relu', 2],
+ [5, 120, 40, True, 'relu', 1],
+ [5, 120, 40, True, 'relu', 1],
+ [3, 240, 80, False, 'hard_swish', 2],
+ [3, 200, 80, False, 'hard_swish', 1],
+ [3, 184, 80, False, 'hard_swish', 1],
+ [3, 184, 80, False, 'hard_swish', 1],
+ [3, 480, 112, True, 'hard_swish', 1],
+ [3, 672, 112, True, 'hard_swish', 1],
+ [5, 672, 160, True, 'hard_swish', 2],
+ [5, 960, 160, True, 'hard_swish', 1],
+ [5, 960, 160, True, 'hard_swish', 1],
+ ]
+ self.cls_ch_squeeze = 960
+ self.cls_ch_expand = 1280
+ self.lr_interval = 3
+ elif model_name == "small":
+ self.cfg = [
+ # kernel_size, expand, channel, se_block, act_mode, stride
+ [3, 16, 16, True, 'relu', 2],
+ [3, 72, 24, False, 'relu', 2],
+ [3, 88, 24, False, 'relu', 1],
+ [5, 96, 40, True, 'hard_swish', 2],
+ [5, 240, 40, True, 'hard_swish', 1],
+ [5, 240, 40, True, 'hard_swish', 1],
+ [5, 120, 48, True, 'hard_swish', 1],
+ [5, 144, 48, True, 'hard_swish', 1],
+ [5, 288, 96, True, 'hard_swish', 2],
+ [5, 576, 96, True, 'hard_swish', 1],
+ [5, 576, 96, True, 'hard_swish', 1],
+ ]
+ self.cls_ch_squeeze = 576
+ self.cls_ch_expand = 1280
+ self.lr_interval = 2
+ else:
+ raise NotImplementedError
+
+ if self.for_seg:
+ self.modify_bottle_params(output_stride)
+
+ def modify_bottle_params(self, output_stride=None):
+ if output_stride is not None and output_stride % 2 != 0:
+ raise Exception("output stride must to be even number")
+ if output_stride is None:
+ return
+ else:
+ stride = 2
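+ # once the cumulative stride exceeds output_stride, force the
+ # remaining stage strides to 1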
+ for i, _cfg in enumerate(self.cfg):
+ stride = stride * _cfg[-1]
+ if stride > output_stride:
+ s = 1
+ self.cfg[i][-1] = s
def _conv_bn_layer(self,
input,
@@ -153,6 +219,14 @@ class MobileNetV3():
bn = fluid.layers.relu6(bn)
return bn
+ def make_divisible(self, v, divisor=8, min_value=None):
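+ # round v to the nearest multiple of divisor, never dropping below 90% of v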
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
+
def _hard_swish(self, x):
return x * fluid.layers.relu6(x + 3) / 6.
@@ -220,6 +294,9 @@ class MobileNetV3():
use_cudnn=False,
name=name + '_depthwise')
+ if self.curr_stage == 5:
+ self.decode_point = conv1
+
if use_se:
conv1 = self._se_block(
input=conv1, num_out_filter=num_mid_filter, name=name + '_se')
@@ -282,7 +359,7 @@ class MobileNetV3():
conv = self._conv_bn_layer(
input,
filter_size=3,
- num_filters=inplanes if scale <= 1.0 else int(inplanes * scale),
+ num_filters=self.make_divisible(inplanes * scale),
stride=2,
padding=1,
num_groups=1,
@@ -290,6 +367,7 @@ class MobileNetV3():
act='hard_swish',
name='conv1')
i = 0
+ inplanes = self.make_divisible(inplanes * scale)
for layer_cfg in cfg:
self.block_stride *= layer_cfg[5]
if layer_cfg[5] == 2:
@@ -297,19 +375,32 @@ class MobileNetV3():
conv = self._residual_unit(
input=conv,
num_in_filter=inplanes,
- num_mid_filter=int(scale * layer_cfg[1]),
- num_out_filter=int(scale * layer_cfg[2]),
+ num_mid_filter=self.make_divisible(scale * layer_cfg[1]),
+ num_out_filter=self.make_divisible(scale * layer_cfg[2]),
act=layer_cfg[4],
stride=layer_cfg[5],
filter_size=layer_cfg[0],
use_se=layer_cfg[3],
name='conv' + str(i + 2))
-
- inplanes = int(scale * layer_cfg[2])
+ inplanes = self.make_divisible(scale * layer_cfg[2])
i += 1
self.curr_stage = i
blocks.append(conv)
+ if self.for_seg:
+ conv = self._conv_bn_layer(
+ input=conv,
+ filter_size=1,
+ num_filters=self.make_divisible(scale * self.cls_ch_squeeze),
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ act='hard_swish',
+ name='conv_last')
+
+ return conv, self.decode_point
+
if self.num_classes:
conv = self._conv_bn_layer(
input=conv,
diff --git a/paddlex/cv/nets/resnet.py b/paddlex/cv/nets/resnet.py
index ff7a8d17ac9862f319d81ddcc5cb938918677692..779a756a4ad709bde0665a5c437d6423b31653b7 100644
--- a/paddlex/cv/nets/resnet.py
+++ b/paddlex/cv/nets/resnet.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/segmentation/__init__.py b/paddlex/cv/nets/segmentation/__init__.py
index 8c7d9674ae79a3ee6145c1c92612498ac7340faa..998fa183ea0d3f85f316a1fb1c3abe2e41009165 100644
--- a/paddlex/cv/nets/segmentation/__init__.py
+++ b/paddlex/cv/nets/segmentation/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/segmentation/deeplabv3p.py b/paddlex/cv/nets/segmentation/deeplabv3p.py
index 1100ed3a37ddc80c8bcfa7e2a44f4e452701c4b0..7d597a606a88a78513452c37357b806c4dfa156f 100644
--- a/paddlex/cv/nets/segmentation/deeplabv3p.py
+++ b/paddlex/cv/nets/segmentation/deeplabv3p.py
@@ -1,5 +1,5 @@
# coding: utf8
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -21,7 +21,7 @@ from collections import OrderedDict
import paddle.fluid as fluid
from .model_utils.libs import scope, name_scope
-from .model_utils.libs import bn, bn_relu, relu
+from .model_utils.libs import bn, bn_relu, relu, qsigmoid
from .model_utils.libs import conv, max_pool, deconv
from .model_utils.libs import separate_conv
from .model_utils.libs import sigmoid_to_softmax
@@ -82,7 +82,17 @@ class DeepLabv3p(object):
use_dice_loss=False,
class_weight=None,
ignore_index=255,
- fixed_input_shape=None):
+ fixed_input_shape=None,
+ pooling_stride=[1, 1],
+ pooling_crop_size=None,
+ aspp_with_se=False,
+ se_use_qsigmoid=False,
+ aspp_convs_filters=256,
+ aspp_with_concat_projection=True,
+ add_image_level_feature=True,
+ use_sum_merge=False,
+ conv_filters=256,
+ output_is_logits=False):
# dice_loss and bce_loss only apply to binary (two-class) segmentation
if num_classes > 2 and (use_bce_loss or use_dice_loss):
raise ValueError(
@@ -117,6 +127,17 @@ class DeepLabv3p(object):
self.encoder_with_aspp = encoder_with_aspp
self.enable_decoder = enable_decoder
self.fixed_input_shape = fixed_input_shape
+ self.output_is_logits = output_is_logits
+ self.aspp_convs_filters = aspp_convs_filters
+ self.output_stride = output_stride
+ self.pooling_crop_size = pooling_crop_size
+ self.pooling_stride = pooling_stride
+ self.se_use_qsigmoid = se_use_qsigmoid
+ self.aspp_with_concat_projection = aspp_with_concat_projection
+ self.add_image_level_feature = add_image_level_feature
+ self.aspp_with_se = aspp_with_se
+ self.use_sum_merge = use_sum_merge
+ self.conv_filters = conv_filters
def _encoder(self, input):
# Encoder: ASPP architecture -- pooling + 1x1 conv + three parallel dilated convs at different rates, then a 1x1 conv after concat
@@ -129,19 +150,36 @@ class DeepLabv3p(object):
elif self.output_stride == 8:
aspp_ratios = [12, 24, 36]
else:
- raise Exception("DeepLabv3p only support stride 8 or 16")
+ aspp_ratios = []
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.06))
+
+ concat_logits = []
with scope('encoder'):
- channel = 256
+ channel = self.aspp_convs_filters
with scope("image_pool"):
- image_avg = fluid.layers.reduce_mean(
- input, [2, 3], keep_dim=True)
- image_avg = bn_relu(
+ if self.pooling_crop_size is None:
+ image_avg = fluid.layers.reduce_mean(
+ input, [2, 3], keep_dim=True)
+ else:
+ pool_w = int((self.pooling_crop_size[0] - 1.0) /
+ self.output_stride + 1.0)
+ pool_h = int((self.pooling_crop_size[1] - 1.0) /
+ self.output_stride + 1.0)
+ image_avg = fluid.layers.pool2d(
+ input,
+ pool_size=(pool_h, pool_w),
+ pool_stride=self.pooling_stride,
+ pool_type='avg',
+ pool_padding='VALID')
+
+ act = qsigmoid if self.se_use_qsigmoid else bn_relu
+
+ image_avg = act(
conv(
image_avg,
channel,
@@ -151,8 +189,10 @@ class DeepLabv3p(object):
padding=0,
param_attr=param_attr))
input_shape = fluid.layers.shape(input)
- image_avg = fluid.layers.resize_bilinear(
- image_avg, input_shape[2:], align_corners=False)
+ image_avg = fluid.layers.resize_bilinear(image_avg,
+ input_shape[2:])
+ if self.add_image_level_feature:
+ concat_logits.append(image_avg)
with scope("aspp0"):
aspp0 = bn_relu(
@@ -164,77 +204,160 @@ class DeepLabv3p(object):
groups=1,
padding=0,
param_attr=param_attr))
- with scope("aspp1"):
- if self.aspp_with_sep_conv:
- aspp1 = separate_conv(
- input,
- channel,
- 1,
- 3,
- dilation=aspp_ratios[0],
- act=relu)
- else:
- aspp1 = bn_relu(
- conv(
+ concat_logits.append(aspp0)
+
+ if aspp_ratios:
+ with scope("aspp1"):
+ if self.aspp_with_sep_conv:
+ aspp1 = separate_conv(
input,
channel,
- stride=1,
- filter_size=3,
+ 1,
+ 3,
dilation=aspp_ratios[0],
- padding=aspp_ratios[0],
- param_attr=param_attr))
- with scope("aspp2"):
- if self.aspp_with_sep_conv:
- aspp2 = separate_conv(
- input,
- channel,
- 1,
- 3,
- dilation=aspp_ratios[1],
- act=relu)
- else:
- aspp2 = bn_relu(
- conv(
+ act=relu)
+ else:
+ aspp1 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[0],
+ padding=aspp_ratios[0],
+ param_attr=param_attr))
+ concat_logits.append(aspp1)
+ with scope("aspp2"):
+ if self.aspp_with_sep_conv:
+ aspp2 = separate_conv(
input,
channel,
- stride=1,
- filter_size=3,
+ 1,
+ 3,
dilation=aspp_ratios[1],
- padding=aspp_ratios[1],
- param_attr=param_attr))
- with scope("aspp3"):
- if self.aspp_with_sep_conv:
- aspp3 = separate_conv(
- input,
- channel,
- 1,
- 3,
- dilation=aspp_ratios[2],
- act=relu)
- else:
- aspp3 = bn_relu(
- conv(
+ act=relu)
+ else:
+ aspp2 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[1],
+ padding=aspp_ratios[1],
+ param_attr=param_attr))
+ concat_logits.append(aspp2)
+ with scope("aspp3"):
+ if self.aspp_with_sep_conv:
+ aspp3 = separate_conv(
input,
channel,
- stride=1,
- filter_size=3,
+ 1,
+ 3,
dilation=aspp_ratios[2],
- padding=aspp_ratios[2],
- param_attr=param_attr))
+ act=relu)
+ else:
+ aspp3 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[2],
+ padding=aspp_ratios[2],
+ param_attr=param_attr))
+ concat_logits.append(aspp3)
+
with scope("concat"):
- data = fluid.layers.concat(
- [image_avg, aspp0, aspp1, aspp2, aspp3], axis=1)
- data = bn_relu(
+ data = fluid.layers.concat(concat_logits, axis=1)
+ if self.aspp_with_concat_projection:
+ data = bn_relu(
+ conv(
+ data,
+ channel,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+ data = fluid.layers.dropout(data, 0.9)
+ if self.aspp_with_se:
+ data = data * image_avg
+ return data
+
+ def _decoder_with_sum_merge(self, encode_data, decode_shortcut,
+ param_attr):
+ decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
+ encode_data = fluid.layers.resize_bilinear(encode_data,
+ decode_shortcut_shape[2:])
+
+ encode_data = conv(
+ encode_data,
+ self.conv_filters,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr)
+
+ with scope('merge'):
+ decode_shortcut = conv(
+ decode_shortcut,
+ self.conv_filters,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr)
+
+ return encode_data + decode_shortcut
+
+ def _decoder_with_concat(self, encode_data, decode_shortcut, param_attr):
+ with scope('concat'):
+ decode_shortcut = bn_relu(
+ conv(
+ decode_shortcut,
+ 48,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+
+ decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
+ encode_data = fluid.layers.resize_bilinear(
+ encode_data, decode_shortcut_shape[2:])
+ encode_data = fluid.layers.concat(
+ [encode_data, decode_shortcut], axis=1)
+ if self.decoder_use_sep_conv:
+ with scope("separable_conv1"):
+ encode_data = separate_conv(
+ encode_data, self.conv_filters, 1, 3, dilation=1, act=relu)
+ with scope("separable_conv2"):
+ encode_data = separate_conv(
+ encode_data, self.conv_filters, 1, 3, dilation=1, act=relu)
+ else:
+ with scope("decoder_conv1"):
+ encode_data = bn_relu(
conv(
- data,
- channel,
- 1,
- 1,
- groups=1,
- padding=0,
+ encode_data,
+ self.conv_filters,
+ stride=1,
+ filter_size=3,
+ dilation=1,
+ padding=1,
param_attr=param_attr))
- data = fluid.layers.dropout(data, 0.9)
- return data
+ with scope("decoder_conv2"):
+ encode_data = bn_relu(
+ conv(
+ encode_data,
+ self.conv_filters,
+ stride=1,
+ filter_size=3,
+ dilation=1,
+ padding=1,
+ param_attr=param_attr))
+ return encode_data
def _decoder(self, encode_data, decode_shortcut):
# Decoder configuration
@@ -246,54 +369,14 @@ class DeepLabv3p(object):
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.06))
+
with scope('decoder'):
- with scope('concat'):
- decode_shortcut = bn_relu(
- conv(
- decode_shortcut,
- 48,
- 1,
- 1,
- groups=1,
- padding=0,
- param_attr=param_attr))
+ if self.use_sum_merge:
+ return self._decoder_with_sum_merge(
+ encode_data, decode_shortcut, param_attr)
- decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
- encode_data = fluid.layers.resize_bilinear(
- encode_data,
- decode_shortcut_shape[2:],
- align_corners=False)
- encode_data = fluid.layers.concat(
- [encode_data, decode_shortcut], axis=1)
- if self.decoder_use_sep_conv:
- with scope("separable_conv1"):
- encode_data = separate_conv(
- encode_data, 256, 1, 3, dilation=1, act=relu)
- with scope("separable_conv2"):
- encode_data = separate_conv(
- encode_data, 256, 1, 3, dilation=1, act=relu)
- else:
- with scope("decoder_conv1"):
- encode_data = bn_relu(
- conv(
- encode_data,
- 256,
- stride=1,
- filter_size=3,
- dilation=1,
- padding=1,
- param_attr=param_attr))
- with scope("decoder_conv2"):
- encode_data = bn_relu(
- conv(
- encode_data,
- 256,
- stride=1,
- filter_size=3,
- dilation=1,
- padding=1,
- param_attr=param_attr))
- return encode_data
+ return self._decoder_with_concat(encode_data, decode_shortcut,
+ param_attr)
def _get_loss(self, logit, label, mask):
avg_loss = 0
@@ -337,8 +420,11 @@ class DeepLabv3p(object):
self.num_classes = 1
image = inputs['image']
- data, decode_shortcuts = self.backbone(image)
- decode_shortcut = decode_shortcuts[self.backbone.decode_points]
+ if 'MobileNetV3' in self.backbone.__class__.__name__:
+ data, decode_shortcut = self.backbone(image)
+ else:
+ data, decode_shortcuts = self.backbone(image)
+ decode_shortcut = decode_shortcuts[self.backbone.decode_points]
        # encoder/decoder setup
if self.encoder_with_aspp:
@@ -353,19 +439,22 @@ class DeepLabv3p(object):
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.01))
- with scope('logit'):
- with fluid.name_scope('last_conv'):
- logit = conv(
- data,
- self.num_classes,
- 1,
- stride=1,
- padding=0,
- bias_attr=True,
- param_attr=param_attr)
- image_shape = fluid.layers.shape(image)
- logit = fluid.layers.resize_bilinear(
- logit, image_shape[2:], align_corners=False)
+ if not self.output_is_logits:
+ with scope('logit'):
+ with fluid.name_scope('last_conv'):
+ logit = conv(
+ data,
+ self.num_classes,
+ 1,
+ stride=1,
+ padding=0,
+ bias_attr=True,
+ param_attr=param_attr)
+ else:
+ logit = data
+
+ image_shape = fluid.layers.shape(image)
+ logit = fluid.layers.resize_bilinear(logit, image_shape[2:])
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
diff --git a/paddlex/cv/nets/segmentation/fast_scnn.py b/paddlex/cv/nets/segmentation/fast_scnn.py
index 71866e56df9adf31c45d841a7bcde3a062c3067a..8e86f4bffa275c3d7660d3d2f7b01151c2785c41 100644
--- a/paddlex/cv/nets/segmentation/fast_scnn.py
+++ b/paddlex/cv/nets/segmentation/fast_scnn.py
@@ -1,5 +1,5 @@
# coding: utf8
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/segmentation/hrnet.py b/paddlex/cv/nets/segmentation/hrnet.py
index 5193b3f1cb01efef0fec7c82ef5a24c0fa551a35..b74c044951f62a0dcc70fbc9964f42f781f4d573 100644
--- a/paddlex/cv/nets/segmentation/hrnet.py
+++ b/paddlex/cv/nets/segmentation/hrnet.py
@@ -1,5 +1,5 @@
# coding: utf8
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -77,12 +77,9 @@ class HRNet(object):
st4 = self.backbone(image)
# upsample
shape = fluid.layers.shape(st4[0])[-2:]
- st4[1] = fluid.layers.resize_bilinear(
- st4[1], out_shape=shape, align_corners=False, align_mode=1)
- st4[2] = fluid.layers.resize_bilinear(
- st4[2], out_shape=shape, align_corners=False, align_mode=1)
- st4[3] = fluid.layers.resize_bilinear(
- st4[3], out_shape=shape, align_corners=False, align_mode=1)
+ st4[1] = fluid.layers.resize_bilinear(st4[1], out_shape=shape)
+ st4[2] = fluid.layers.resize_bilinear(st4[2], out_shape=shape)
+ st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=shape)
out = fluid.layers.concat(st4, axis=1)
last_channels = sum(self.backbone.channels[str(self.backbone.width)][
@@ -107,8 +104,7 @@ class HRNet(object):
bias_attr=False)
input_shape = fluid.layers.shape(image)[-2:]
- logit = fluid.layers.resize_bilinear(
- out, input_shape, align_corners=False, align_mode=1)
+ logit = fluid.layers.resize_bilinear(out, input_shape)
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
diff --git a/paddlex/cv/nets/segmentation/model_utils/__init__.py b/paddlex/cv/nets/segmentation/model_utils/__init__.py
index 87ab6f19957bbcd460056c5def700b0c7e14424f..6e872dbfb0ae09c1896cd36cde15e8ceaf387200 100644
--- a/paddlex/cv/nets/segmentation/model_utils/__init__.py
+++ b/paddlex/cv/nets/segmentation/model_utils/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/nets/segmentation/model_utils/libs.py b/paddlex/cv/nets/segmentation/model_utils/libs.py
index 01fdad2cec6ce4b13cea2b7c957fb648edb4aeb2..68ddd35beff56697135d4a8b3ffb1862426ca07d 100644
--- a/paddlex/cv/nets/segmentation/model_utils/libs.py
+++ b/paddlex/cv/nets/segmentation/model_utils/libs.py
@@ -1,5 +1,5 @@
# coding: utf8
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -112,6 +112,10 @@ def bn_relu(data, norm_type='bn', eps=1e-5):
return fluid.layers.relu(bn(data, norm_type=norm_type, eps=eps))
+def qsigmoid(data):
+ return fluid.layers.relu6(data + 3) * 0.16667
+
+
def relu(data):
return fluid.layers.relu(data)
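
The qsigmoid helper added above is the usual hard-sigmoid approximation relu6(x + 3) / 6, with 0.16667 standing in for 1/6. A minimal NumPy sketch of the same curve, assuming nothing beyond the formula itself:

    import numpy as np

    def qsigmoid_np(x):
        # hard sigmoid: clip(x + 3, 0, 6) * 1/6, mirroring relu6(x + 3) * 0.16667
        return np.clip(x + 3., 0., 6.) * 0.16667

    x = np.array([-4., -1., 0., 1., 4.])
    print(qsigmoid_np(x))          # approx. [0.    0.333 0.5   0.667 1.   ]
    print(1. / (1. + np.exp(-x)))  # exact sigmoid, for comparison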
@@ -148,7 +152,8 @@ def separate_conv(input,
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
- initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
+ initializer=fluid.initializer.TruncatedNormal(
+ loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
@@ -166,7 +171,8 @@ def separate_conv(input,
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
- initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+ initializer=fluid.initializer.TruncatedNormal(
+ loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
diff --git a/paddlex/cv/nets/segmentation/model_utils/loss.py b/paddlex/cv/nets/segmentation/model_utils/loss.py
index 60c21bd2fc159cf049dc46c0f43130481b80d896..4b93c4a7dbef876235c6a766af58be529cf56ed4 100644
--- a/paddlex/cv/nets/segmentation/model_utils/loss.py
+++ b/paddlex/cv/nets/segmentation/model_utils/loss.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -24,8 +24,9 @@ def softmax_with_loss(logit,
ignore_index=255):
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
- label, fluid.layers.assign(
- np.array([num_classes - 1], dtype=np.int32)))
+ label,
+ fluid.layers.assign(np.array(
+ [num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
@@ -60,8 +61,8 @@ def softmax_with_loss(logit,
'Expect weight is a list, string or Variable, but receive {}'.
format(type(weight)))
weight = fluid.layers.reshape(weight, [1, num_classes])
- weighted_label_one_hot = fluid.layers.elementwise_mul(
- label_one_hot, weight)
+ weighted_label_one_hot = fluid.layers.elementwise_mul(label_one_hot,
+ weight)
probs = fluid.layers.softmax(logit)
loss = fluid.layers.cross_entropy(
probs,
diff --git a/paddlex/cv/nets/segmentation/unet.py b/paddlex/cv/nets/segmentation/unet.py
index 3f11eaae3132a3b83025646786a4ce348728ac76..a18f9c00c071d93c4cd4c004685a1c7472bed1a8 100644
--- a/paddlex/cv/nets/segmentation/unet.py
+++ b/paddlex/cv/nets/segmentation/unet.py
@@ -1,5 +1,5 @@
# coding: utf8
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -144,8 +144,7 @@ class UNet(object):
with scope("up"):
if self.upsample_mode == 'bilinear':
short_cut_shape = fluid.layers.shape(short_cut)
- data = fluid.layers.resize_bilinear(
- data, short_cut_shape[2:], align_corners=False)
+ data = fluid.layers.resize_bilinear(data, short_cut_shape[2:])
else:
data = deconv(
data,
diff --git a/paddlex/cv/nets/shufflenet_v2.py b/paddlex/cv/nets/shufflenet_v2.py
index 23045ee0d7279011ad93160e778dfd88862b9953..84254e37c4e24ede3745ecc8af17836f1676a43f 100644
--- a/paddlex/cv/nets/shufflenet_v2.py
+++ b/paddlex/cv/nets/shufflenet_v2.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -96,11 +96,12 @@ class ShuffleNetV2():
pool_stride=1,
pool_padding=0,
pool_type='avg')
- output = fluid.layers.fc(
- input=output,
- size=self.num_classes,
- param_attr=ParamAttr(initializer=MSRA(), name='fc6_weights'),
- bias_attr=ParamAttr(name='fc6_offset'))
+ output = fluid.layers.fc(input=output,
+ size=self.num_classes,
+ param_attr=ParamAttr(
+ initializer=MSRA(),
+ name='fc6_weights'),
+ bias_attr=ParamAttr(name='fc6_offset'))
return output
def conv_bn_layer(self,
@@ -122,7 +123,8 @@ class ShuffleNetV2():
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
- param_attr=ParamAttr(initializer=MSRA(), name=name + '_weights'),
+ param_attr=ParamAttr(
+ initializer=MSRA(), name=name + '_weights'),
bias_attr=False)
out = int((input.shape[2] - 1) / float(stride) + 1)
bn_name = name + '_bn'
diff --git a/paddlex/cv/nets/xception.py b/paddlex/cv/nets/xception.py
index a24a9304362f450981937e402894a6319ced6e33..b06ad1c3b1ad90d9b277426df6f1c86b3f6a297f 100644
--- a/paddlex/cv/nets/xception.py
+++ b/paddlex/cv/nets/xception.py
@@ -1,5 +1,5 @@
# coding: utf8
-# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -69,8 +69,7 @@ class Xception():
def __call__(
self,
- input,
- ):
+ input, ):
self.stride = 2
self.block_point = 0
self.short_cuts = dict()
@@ -140,7 +139,8 @@ class Xception():
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
- initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
+ initializer=fluid.initializer.TruncatedNormal(
+ loc=0.0, scale=0.09))
with scope("entry_flow"):
with scope("conv1"):
data = bn_relu(
@@ -178,10 +178,10 @@ class Xception():
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
- stride = strides[i] if check_stride(
- s * strides[i], output_stride) else 1
- data, short_cuts = self.xception_block(
- data, chns[i], [1, 1, stride])
+ stride = strides[i] if check_stride(s * strides[i],
+ output_stride) else 1
+ data, short_cuts = self.xception_block(data, chns[i],
+ [1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
@@ -205,8 +205,8 @@ class Xception():
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
- stride = strides[i] if check_stride(
- s * strides[i], output_stride) else 1
+ stride = strides[i] if check_stride(s * strides[i],
+ output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, strides[i]], skip_conv=False)
s = s * stride
@@ -302,16 +302,15 @@ class Xception():
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.09))
with scope('shortcut'):
- skip = bn(
- conv(
- input,
- channels[-1],
- 1,
- strides[-1],
- groups=1,
- padding=0,
- param_attr=param_attr),
- eps=1e-3)
+ skip = bn(conv(
+ input,
+ channels[-1],
+ 1,
+ strides[-1],
+ groups=1,
+ padding=0,
+ param_attr=param_attr),
+ eps=1e-3)
else:
skip = input
return data + skip, results
@@ -329,4 +328,4 @@ def xception_41(num_classes=None):
def xception_71(num_classes=None):
model = Xception(num_classes, 71)
- return model
\ No newline at end of file
+ return model
diff --git a/paddlex/cv/transforms/__init__.py b/paddlex/cv/transforms/__init__.py
index fc8494c7fd279fce03e70993a64349be38d11cfb..445ab164546f62dbc992588a4f9252c07df617c1 100644
--- a/paddlex/cv/transforms/__init__.py
+++ b/paddlex/cv/transforms/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -91,7 +91,10 @@ def arrange_transforms(model_type, class_name, transforms, mode='train'):
elif model_type == 'segmenter':
arrange_transform = seg_transforms.ArrangeSegmenter
elif model_type == 'detector':
- arrange_name = 'Arrange{}'.format(class_name)
+ if class_name == "PPYOLO":
+ arrange_name = 'ArrangeYOLOv3'
+ else:
+ arrange_name = 'Arrange{}'.format(class_name)
arrange_transform = getattr(det_transforms, arrange_name)
else:
raise Exception("Unrecognized model type: {}".format(self.model_type))
diff --git a/paddlex/cv/transforms/box_utils.py b/paddlex/cv/transforms/box_utils.py
index 02f3c4d4c12af392ffde26e9a783d6ca9122e865..14d139f6fcbd2364301f391961b44238bf6faefe 100644
--- a/paddlex/cv/transforms/box_utils.py
+++ b/paddlex/cv/transforms/box_utils.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -170,7 +170,8 @@ def expand_segms(segms, x, y, height, width, ratio):
0).astype(mask.dtype)
expanded_mask[y:y + height, x:x + width] = mask
rle = mask_util.encode(
- np.array(expanded_mask, order='F', dtype=np.uint8))
+ np.array(
+ expanded_mask, order='F', dtype=np.uint8))
return rle
expanded_segms = []
diff --git a/paddlex/cv/transforms/cls_transforms.py b/paddlex/cv/transforms/cls_transforms.py
index 4166cd170ecf1a0f1b840a804b0d0e28615a04ba..361d9a00649502c522fbe50d3366d95570506e7f 100644
--- a/paddlex/cv/transforms/cls_transforms.py
+++ b/paddlex/cv/transforms/cls_transforms.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -46,7 +46,7 @@ class Compose(ClsTransform):
raise ValueError('The length of transforms ' + \
'must be equal or larger than 1!')
self.transforms = transforms
-
+ self.batch_transforms = None
        # Check the ops in transforms; currently PaddleX-defined ops and imgaug ops are supported
for op in self.transforms:
if not isinstance(op, ClsTransform):
diff --git a/paddlex/cv/transforms/det_transforms.py b/paddlex/cv/transforms/det_transforms.py
index 9154f03cf9975625041728d8656bf838ad36c434..32603bac5141c10c7ceedb59bf438b281f86ccf0 100644
--- a/paddlex/cv/transforms/det_transforms.py
+++ b/paddlex/cv/transforms/det_transforms.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -55,6 +55,7 @@ class Compose(DetTransform):
raise ValueError('The length of transforms ' + \
'must be equal or larger than 1!')
self.transforms = transforms
+ self.batch_transforms = None
self.use_mixup = False
for t in self.transforms:
if type(t).__name__ == 'MixupImage':
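
Each Compose in this patch gains a batch_transforms slot initialized to None; the patch itself only adds the attribute. A hedged sketch of how the batch-level ops defined later in this diff might be attached; the wiring below is an assumption, not code from the patch:

    from paddlex.cv.transforms import det_transforms

    train_transforms = det_transforms.Compose([
        det_transforms.RandomHorizontalFlip(),
        det_transforms.Normalize()])

    # assumption: batch_transforms holds ops applied per batch rather than
    # per sample, such as the multi-scale BatchRandomShape added below
    train_transforms.batch_transforms = [
        det_transforms.BatchRandomShape(
            random_shapes=[320, 416, 608], interp='RANDOM')]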
@@ -1385,3 +1386,187 @@ class ComposedYOLOv3Transforms(Compose):
mean=mean, std=std)
]
super(ComposedYOLOv3Transforms, self).__init__(transforms)
+
+
+class BatchRandomShape(DetTransform):
+ """调整图像大小(resize)。
+
+ 对batch数据中的每张图像全部resize到random_shapes中任意一个大小。
+ 注意:当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize。
+
+ Args:
+ random_shapes (list): resize大小选择列表。
+ 默认为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
+ interp (str): resize的插值方式,与opencv的插值方式对应,取值范围为
+ ['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"RANDOM"。
+ Raises:
+ ValueError: 插值方式不在['NEAREST', 'LINEAR', 'CUBIC',
+ 'AREA', 'LANCZOS4', 'RANDOM']中。
+ """
+
+ # The interpolation mode
+ interp_dict = {
+ 'NEAREST': cv2.INTER_NEAREST,
+ 'LINEAR': cv2.INTER_LINEAR,
+ 'CUBIC': cv2.INTER_CUBIC,
+ 'AREA': cv2.INTER_AREA,
+ 'LANCZOS4': cv2.INTER_LANCZOS4
+ }
+
+ def __init__(
+ self,
+ random_shapes=[320, 352, 384, 416, 448, 480, 512, 544, 576, 608],
+ interp='RANDOM'):
+ if not (interp == "RANDOM" or interp in self.interp_dict):
+ raise ValueError("interp should be one of {}".format(
+ self.interp_dict.keys()))
+ self.random_shapes = random_shapes
+ self.interp = interp
+
+ def __call__(self, batch_data):
+ """
+ Args:
+            batch_data (list): Batch data, each element holding an image and
+                its related information.
+        Returns:
+            list: Batch data with every image resized to the chosen shape.
+ """
+ shape = np.random.choice(self.random_shapes)
+
+ if self.interp == "RANDOM":
+ interp = random.choice(list(self.interp_dict.keys()))
+ else:
+ interp = self.interp
+ for data_id, data in enumerate(batch_data):
+ data_list = list(data)
+ im = data_list[0]
+ im = np.swapaxes(im, 1, 0)
+ im = np.swapaxes(im, 1, 2)
+ im = resize(im, shape, self.interp_dict[interp])
+ im = np.swapaxes(im, 1, 2)
+ im = np.swapaxes(im, 1, 0)
+ data_list[0] = im
+ batch_data[data_id] = tuple(data_list)
+ return batch_data
+
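
BatchRandomShape receives images in CHW layout, while the cv2-based resize helper expects HWC; the two pairs of swapaxes calls around the resize perform that conversion and its inverse. A toy NumPy sketch of the round trip, independent of the patch:

    import numpy as np

    im_chw = np.arange(3 * 4 * 6, dtype='float32').reshape(3, 4, 6)

    # swapaxes(1, 0) followed by swapaxes(1, 2): CHW -> HWC
    im_hwc = np.swapaxes(np.swapaxes(im_chw, 1, 0), 1, 2)
    print(im_hwc.shape)  # (4, 6, 3), the layout cv2.resize expects

    # ... the resize operates on the HWC array here ...

    # the same two swaps in reverse order restore CHW
    im_back = np.swapaxes(np.swapaxes(im_hwc, 1, 2), 1, 0)
    print(np.array_equal(im_back, im_chw))  # True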
+
+class GenerateYoloTarget(object):
+ """生成YOLOv3的ground truth(真实标注框)在不同特征层的位置转换信息。
+ 该transform只在YOLOv3计算细粒度loss时使用。
+
+ Args:
+ anchors (list|tuple): anchor框的宽度和高度。
+ anchor_masks (list|tuple): 在计算损失时,使用anchor的mask索引。
+ num_classes (int): 类别数。默认为80。
+ iou_thresh (float): iou阈值,当anchor和真实标注框的iou大于该阈值时,计入target。默认为1.0。
+ """
+
+ def __init__(self,
+ anchors,
+ anchor_masks,
+ downsample_ratios,
+ num_classes=80,
+ iou_thresh=1.):
+ super(GenerateYoloTarget, self).__init__()
+ self.anchors = anchors
+ self.anchor_masks = anchor_masks
+ self.downsample_ratios = downsample_ratios
+ self.num_classes = num_classes
+ self.iou_thresh = iou_thresh
+
+ def __call__(self, batch_data):
+ """
+ Args:
+            batch_data (list): Batch data, each element holding an image and
+                its related information.
+        Returns:
+            list: Batch data, where each element gains the new fields:
+                    - target0 (np.ndarray): location targets of the YOLOv3
+                        ground truth on feature level 0, with shape (number of
+                        anchors on level 0, 6 + number of classes, h of level
+                        0, w of level 0).
+                    - target1 (np.ndarray): location targets of the YOLOv3
+                        ground truth on feature level 1, with shape (number of
+                        anchors on level 1, 6 + number of classes, h of level
+                        1, w of level 1).
+                    - ...
+                    - targetn (np.ndarray): location targets of the YOLOv3
+                        ground truth on feature level n, with shape (number of
+                        anchors on level n, 6 + number of classes, h of level
+                        n, w of level n).
+                    n is determined by the length of anchor_masks.
+ """
+ im = batch_data[0][0]
+ h = im.shape[1]
+ w = im.shape[2]
+ an_hw = np.array(self.anchors) / np.array([[w, h]])
+ for data_id, data in enumerate(batch_data):
+ gt_bbox = data[1]
+ gt_class = data[2]
+ gt_score = data[3]
+ im_shape = data[4]
+ origin_h = float(im_shape[0])
+ origin_w = float(im_shape[1])
+ data_list = list(data)
+ for i, (
+ mask, downsample_ratio
+ ) in enumerate(zip(self.anchor_masks, self.downsample_ratios)):
+ grid_h = int(h / downsample_ratio)
+ grid_w = int(w / downsample_ratio)
+ target = np.zeros(
+ (len(mask), 6 + self.num_classes, grid_h, grid_w),
+ dtype=np.float32)
+ for b in range(gt_bbox.shape[0]):
+ gx = gt_bbox[b, 0] / float(origin_w)
+ gy = gt_bbox[b, 1] / float(origin_h)
+ gw = gt_bbox[b, 2] / float(origin_w)
+ gh = gt_bbox[b, 3] / float(origin_h)
+ cls = gt_class[b]
+ score = gt_score[b]
+ if gw <= 0. or gh <= 0. or score <= 0.:
+ continue
+ # find best match anchor index
+ best_iou = 0.
+ best_idx = -1
+ for an_idx in range(an_hw.shape[0]):
+ iou = jaccard_overlap(
+ [0., 0., gw, gh],
+ [0., 0., an_hw[an_idx, 0], an_hw[an_idx, 1]])
+ if iou > best_iou:
+ best_iou = iou
+ best_idx = an_idx
+ gi = int(gx * grid_w)
+ gj = int(gy * grid_h)
+                    # the gt box should be regressed on this level if the
+                    # best-matching anchor index is in this level's anchor mask
+ if best_idx in mask:
+ best_n = mask.index(best_idx)
+ # x, y, w, h, scale
+ target[best_n, 0, gj, gi] = gx * grid_w - gi
+ target[best_n, 1, gj, gi] = gy * grid_h - gj
+ target[best_n, 2, gj, gi] = np.log(
+ gw * w / self.anchors[best_idx][0])
+ target[best_n, 3, gj, gi] = np.log(
+ gh * h / self.anchors[best_idx][1])
+ target[best_n, 4, gj, gi] = 2.0 - gw * gh
+ # objectness record gt_score
+ target[best_n, 5, gj, gi] = score
+ # classification
+ target[best_n, 6 + cls, gj, gi] = 1.
+ # For non-matched anchors, calculate the target if the iou
+ # between anchor and gt is larger than iou_thresh
+ if self.iou_thresh < 1:
+ for idx, mask_i in enumerate(mask):
+ if mask_i == best_idx: continue
+ iou = jaccard_overlap(
+ [0., 0., gw, gh],
+ [0., 0., an_hw[mask_i, 0], an_hw[mask_i, 1]])
+ if iou > self.iou_thresh:
+ # x, y, w, h, scale
+ target[idx, 0, gj, gi] = gx * grid_w - gi
+ target[idx, 1, gj, gi] = gy * grid_h - gj
+ target[idx, 2, gj, gi] = np.log(
+ gw * w / self.anchors[mask_i][0])
+ target[idx, 3, gj, gi] = np.log(
+ gh * h / self.anchors[mask_i][1])
+ target[idx, 4, gj, gi] = 2.0 - gw * gh
+ # objectness record gt_score
+ target[idx, 5, gj, gi] = score
+ # classification
+ target[idx, 6 + cls, gj, gi] = 1.
+ data_list.append(target)
+ batch_data[data_id] = tuple(data_list)
+ return batch_data
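
To follow the target arithmetic above, here is a worked toy example (plain NumPy, made-up numbers) of how one ground-truth box is encoded: the normalized center maps to a grid cell (gi, gj), with sub-cell offsets and log-scale width/height targets computed against the best-matching anchor.

    import numpy as np

    w = h = 608                     # network input size (toy value)
    grid_w = grid_h = w // 32       # 19x19 grid at downsample ratio 32
    anchor_w, anchor_h = 116., 90.  # one anchor, in input pixels

    # ground-truth box: center (304, 152), size (100, 80), normalized
    gx, gy = 304. / w, 152. / h
    gw, gh = 100. / w, 80. / h

    gi, gj = int(gx * grid_w), int(gy * grid_h)  # cell indices -> (9, 4)
    tx, ty = gx * grid_w - gi, gy * grid_h - gj  # in-cell offsets -> (0.5, 0.75)
    tw = np.log(gw * w / anchor_w)               # log-scale width target
    th = np.log(gh * h / anchor_h)               # log-scale height target
    scale = 2.0 - gw * gh                        # down-weights large boxes
    print(gi, gj, tx, ty, tw, th, scale)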
diff --git a/paddlex/cv/transforms/imgaug_support.py b/paddlex/cv/transforms/imgaug_support.py
index edaaba958d7501861ae36eac3dab8900af1ddb8f..d6163c2c22c595374a7af50f046857dc83e7b47a 100644
--- a/paddlex/cv/transforms/imgaug_support.py
+++ b/paddlex/cv/transforms/imgaug_support.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/cv/transforms/ops.py b/paddlex/cv/transforms/ops.py
index dd517d4ccb7c113cfd00460e5df27125248bb602..64363f72ce56a99676a8b8aa4e4d5497a1cb8600 100644
--- a/paddlex/cv/transforms/ops.py
+++ b/paddlex/cv/transforms/ops.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -69,8 +69,8 @@ def random_crop(im,
(float(im.shape[1]) / im.shape[0]) / (w**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
- target_area = im.shape[0] * im.shape[1] * np.random.uniform(
- scale_min, scale_max)
+ target_area = im.shape[0] * im.shape[1] * np.random.uniform(scale_min,
+ scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
@@ -146,6 +146,7 @@ def brightness(im, brightness_lower, brightness_upper):
im += delta
return im
+
def rotate(im, rotate_lower, rotate_upper):
rotate_delta = np.random.uniform(rotate_lower, rotate_upper)
im = im.rotate(int(rotate_delta))
diff --git a/paddlex/cv/transforms/seg_transforms.py b/paddlex/cv/transforms/seg_transforms.py
index 8c92c911b11fb2056817550aba1b2dcdf2c9eda0..327a7c5d0a6d382b16f317fc730883020384ebe0 100644
--- a/paddlex/cv/transforms/seg_transforms.py
+++ b/paddlex/cv/transforms/seg_transforms.py
@@ -1,5 +1,5 @@
# coding: utf8
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -49,6 +49,7 @@ class Compose(SegTransform):
raise ValueError('The length of transforms ' + \
'must be equal or larger than 1!')
self.transforms = transforms
+ self.batch_transforms = None
self.to_rgb = False
        # Check the ops in transforms; currently PaddleX-defined ops and imgaug ops are supported
for op in self.transforms:
@@ -72,8 +73,6 @@ class Compose(SegTransform):
tuple: 根据网络所需字段所组成的tuple;字段由transforms中的最后一个数据预处理操作决定。
"""
- if im_info is None:
- im_info = list()
if isinstance(im, np.ndarray):
if len(im.shape) != 3:
raise Exception(
@@ -85,6 +84,8 @@ class Compose(SegTransform):
except:
            raise ValueError('Can\'t read the image file {}!'.format(im))
im = im.astype('float32')
+ if im_info is None:
+ im_info = [('origin_shape', im.shape[0:2])]
if self.to_rgb:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
if label is not None:
diff --git a/paddlex/cv/transforms/visualize.py b/paddlex/cv/transforms/visualize.py
index 2efb0fb8f26f1f5d1ec3f2e6f3239b38f3336c12..ef2f451a1f721d16521c85d10566f3c8f8d44349 100644
--- a/paddlex/cv/transforms/visualize.py
+++ b/paddlex/cv/transforms/visualize.py
@@ -1,10 +1,10 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
-# http://www.apache.org/licenses/LICENSE-2.0
+# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
@@ -48,181 +48,192 @@ def _draw_rectangle_and_cname(img, xmin, ymin, xmax, ymax, cname, color):
thickness=line_width)
return img
+
def cls_compose(im, label=None, transforms=None, vdl_writer=None, step=0):
- """
+ """
Args:
        im (str/np.ndarray): Image path, or image data as np.ndarray.
        label (int): Class index of each image.
        vdl_writer (visualdl.LogWriter): VisualDL writer in which the logs are saved.
            When None, no logs are saved. Defaults to None.
        step (int): Round of data preprocessing; effective only when vdl_writer is not None. Defaults to 0.
-
+
Returns:
        tuple: Tuple composed of the fields required by the network;
            the fields are decided by the last data-preprocessing op in transforms.
"""
- if isinstance(im, np.ndarray):
- if len(im.shape) != 3:
+ if isinstance(im, np.ndarray):
+ if len(im.shape) != 3:
+ raise Exception(
+ "im should be 3-dimension, but now is {}-dimensions".format(
+ len(im.shape)))
+ else:
+ try:
+ im = cv2.imread(im).astype('float32')
+ except:
+            raise TypeError('Can\'t read the image file {}!'.format(im))
+ im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
+ if vdl_writer is not None:
+ vdl_writer.add_image(
+ tag='0. OriginalImage/' + str(step), img=im, step=0)
+ op_id = 1
+ for op in transforms:
+ if isinstance(op, ClsTransform):
+ if vdl_writer is not None and hasattr(op, 'prob'):
+ op.prob = 1.0
+ outputs = op(im, label)
+ im = outputs[0]
+ if len(outputs) == 2:
+ label = outputs[1]
+ if isinstance(op, pdx.cv.transforms.cls_transforms.Normalize):
+ continue
+ else:
+ import imgaug.augmenters as iaa
+ if isinstance(op, iaa.Augmenter):
+ im = execute_imgaug(op, im)
+ outputs = (im, )
+ if label is not None:
+ outputs = (im, label)
+ if vdl_writer is not None:
+ tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
+ vdl_writer.add_image(tag=tag, img=im, step=0)
+ op_id += 1
+
+
+def det_compose(im,
+ im_info=None,
+ label_info=None,
+ transforms=None,
+ vdl_writer=None,
+ step=0,
+ labels=[],
+ catid2color=None):
+ def decode_image(im_file, im_info, label_info):
+ if im_info is None:
+ im_info = dict()
+ if isinstance(im_file, np.ndarray):
+ if len(im_file.shape) != 3:
raise Exception(
- "im should be 3-dimension, but now is {}-dimensions".
- format(len(im.shape)))
+ "im should be 3-dimensions, but now is {}-dimensions".
+ format(len(im_file.shape)))
+ im = im_file
else:
try:
- im = cv2.imread(im).astype('float32')
+ im = cv2.imread(im_file).astype('float32')
except:
- raise TypeError('Can\'t read The image file {}!'.format(im))
+                raise TypeError('Can\'t read the image file {}!'.format(
+ im_file))
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
- if vdl_writer is not None:
- vdl_writer.add_image(tag='0. OriginalImage/' + str(step),
- img=im,
- step=0)
- op_id = 1
- for op in transforms:
- if isinstance(op, ClsTransform):
- if vdl_writer is not None and hasattr(op, 'prob'):
- op.prob = 1.0
- outputs = op(im, label)
- im = outputs[0]
- if len(outputs) == 2:
- label = outputs[1]
- if isinstance(op, pdx.cv.transforms.cls_transforms.Normalize):
- continue
+ # make default im_info with [h, w, 1]
+ im_info['im_resize_info'] = np.array(
+ [im.shape[0], im.shape[1], 1.], dtype=np.float32)
+ im_info['image_shape'] = np.array([im.shape[0],
+ im.shape[1]]).astype('int32')
+ use_mixup = False
+ for t in transforms:
+ if type(t).__name__ == 'MixupImage':
+ use_mixup = True
+ if not use_mixup:
+ if 'mixup' in im_info:
+ del im_info['mixup']
+ # decode mixup image
+ if 'mixup' in im_info:
+ im_info['mixup'] = \
+ decode_image(im_info['mixup'][0],
+ im_info['mixup'][1],
+ im_info['mixup'][2])
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+ outputs = decode_image(im, im_info, label_info)
+ im = outputs[0]
+ im_info = outputs[1]
+ if len(outputs) == 3:
+ label_info = outputs[2]
+ if vdl_writer is not None:
+ vdl_writer.add_image(
+ tag='0. OriginalImage/' + str(step), img=im, step=0)
+ op_id = 1
+ bboxes = label_info['gt_bbox']
+ transforms = [None] + transforms
+ for op in transforms:
+ if im is None:
+ return None
+ if isinstance(op, DetTransform) or op is None:
+ if vdl_writer is not None and hasattr(op, 'prob'):
+ op.prob = 1.0
+ if op is not None:
+ outputs = op(im, im_info, label_info)
else:
- import imgaug.augmenters as iaa
- if isinstance(op, iaa.Augmenter):
- im = execute_imgaug(op, im)
- outputs = (im, )
- if label is not None:
- outputs = (im, label)
+ outputs = (im, im_info, label_info)
+ im = outputs[0]
+ vdl_im = im
if vdl_writer is not None:
- tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
- vdl_writer.add_image(tag=tag,
- img=im,
- step=0)
- op_id += 1
-
-def det_compose(im, im_info=None, label_info=None, transforms=None, vdl_writer=None, step=0,
- labels=[], catid2color=None):
- def decode_image(im_file, im_info, label_info):
- if im_info is None:
- im_info = dict()
- if isinstance(im_file, np.ndarray):
- if len(im_file.shape) != 3:
- raise Exception(
- "im should be 3-dimensions, but now is {}-dimensions".
- format(len(im_file.shape)))
- im = im_file
- else:
- try:
- im = cv2.imread(im_file).astype('float32')
- except:
- raise TypeError('Can\'t read The image file {}!'.format(
- im_file))
- im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
- # make default im_info with [h, w, 1]
- im_info['im_resize_info'] = np.array(
- [im.shape[0], im.shape[1], 1.], dtype=np.float32)
- im_info['image_shape'] = np.array([im.shape[0],
- im.shape[1]]).astype('int32')
- use_mixup = False
- for t in transforms:
- if type(t).__name__ == 'MixupImage':
- use_mixup = True
- if not use_mixup:
- if 'mixup' in im_info:
- del im_info['mixup']
- # decode mixup image
- if 'mixup' in im_info:
- im_info['mixup'] = \
- decode_image(im_info['mixup'][0],
- im_info['mixup'][1],
- im_info['mixup'][2])
- if label_info is None:
- return (im, im_info)
- else:
- return (im, im_info, label_info)
-
- outputs = decode_image(im, im_info, label_info)
- im = outputs[0]
- im_info = outputs[1]
- if len(outputs) == 3:
- label_info = outputs[2]
- if vdl_writer is not None:
- vdl_writer.add_image(tag='0. OriginalImage/' + str(step),
- img=im,
- step=0)
- op_id = 1
- bboxes = label_info['gt_bbox']
- transforms = [None] + transforms
- for op in transforms:
- if im is None:
- return None
- if isinstance(op, DetTransform) or op is None:
- if vdl_writer is not None and hasattr(op, 'prob'):
- op.prob = 1.0
- if op is not None:
- outputs = op(im, im_info, label_info)
- else:
- outputs = (im, im_info, label_info)
- im = outputs[0]
- vdl_im = im
- if vdl_writer is not None:
- if isinstance(op, pdx.cv.transforms.det_transforms.ResizeByShort):
- scale = outputs[1]['im_resize_info'][2]
- bboxes = bboxes * scale
- elif isinstance(op, pdx.cv.transforms.det_transforms.Resize):
- h = outputs[1]['image_shape'][0]
- w = outputs[1]['image_shape'][1]
- target_size = op.target_size
- if isinstance(target_size, int):
- h_scale = float(target_size) / h
- w_scale = float(target_size) / w
- else:
- h_scale = float(target_size[0]) / h
- w_scale = float(target_size[1]) / w
- bboxes[:,0] = bboxes[:,0] * w_scale
- bboxes[:,1] = bboxes[:,1] * h_scale
- bboxes[:,2] = bboxes[:,2] * w_scale
- bboxes[:,3] = bboxes[:,3] * h_scale
+ if isinstance(op,
+ pdx.cv.transforms.det_transforms.ResizeByShort):
+ scale = outputs[1]['im_resize_info'][2]
+ bboxes = bboxes * scale
+ elif isinstance(op, pdx.cv.transforms.det_transforms.Resize):
+ h = outputs[1]['image_shape'][0]
+ w = outputs[1]['image_shape'][1]
+ target_size = op.target_size
+ if isinstance(target_size, int):
+ h_scale = float(target_size) / h
+ w_scale = float(target_size) / w
else:
- bboxes = outputs[2]['gt_bbox']
- if not isinstance(op, pdx.cv.transforms.det_transforms.RandomHorizontalFlip):
- for i in range(bboxes.shape[0]):
- bbox = bboxes[i]
- cname = labels[outputs[2]['gt_class'][i][0]-1]
- vdl_im = _draw_rectangle_and_cname(vdl_im,
- int(bbox[0]),
- int(bbox[1]),
- int(bbox[2]),
- int(bbox[3]),
- cname,
- catid2color[outputs[2]['gt_class'][i][0]-1])
- if isinstance(op, pdx.cv.transforms.det_transforms.Normalize):
- continue
- else:
- im = execute_imgaug(op, im)
- if label_info is not None:
- outputs = (im, im_info, label_info)
+ h_scale = float(target_size[0]) / h
+ w_scale = float(target_size[1]) / w
+ bboxes[:, 0] = bboxes[:, 0] * w_scale
+ bboxes[:, 1] = bboxes[:, 1] * h_scale
+ bboxes[:, 2] = bboxes[:, 2] * w_scale
+ bboxes[:, 3] = bboxes[:, 3] * h_scale
else:
- outputs = (im, im_info)
- vdl_im = im
- if vdl_writer is not None:
- tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
- if op is None:
- tag = str(op_id) + '. OriginalImageWithGTBox/' + str(step)
- vdl_writer.add_image(tag=tag,
- img=vdl_im,
- step=0)
- op_id += 1
-
-def seg_compose(im, im_info=None, label=None, transforms=None, vdl_writer=None, step=0):
+ bboxes = outputs[2]['gt_bbox']
+ if not isinstance(op, (
+ pdx.cv.transforms.det_transforms.RandomHorizontalFlip,
+ pdx.cv.transforms.det_transforms.Padding)):
+ for i in range(bboxes.shape[0]):
+ bbox = bboxes[i]
+ cname = labels[outputs[2]['gt_class'][i][0] - 1]
+ vdl_im = _draw_rectangle_and_cname(
+ vdl_im,
+ int(bbox[0]),
+ int(bbox[1]),
+ int(bbox[2]),
+ int(bbox[3]), cname,
+ catid2color[outputs[2]['gt_class'][i][0] - 1])
+ if isinstance(op, pdx.cv.transforms.det_transforms.Normalize):
+ continue
+ else:
+ im = execute_imgaug(op, im)
+ if label_info is not None:
+ outputs = (im, im_info, label_info)
+ else:
+ outputs = (im, im_info)
+ vdl_im = im
+ if vdl_writer is not None:
+ tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
+ if op is None:
+ tag = str(op_id) + '. OriginalImageWithGTBox/' + str(step)
+ vdl_writer.add_image(tag=tag, img=vdl_im, step=0)
+ op_id += 1
+
+
+def seg_compose(im,
+ im_info=None,
+ label=None,
+ transforms=None,
+ vdl_writer=None,
+ step=0):
if im_info is None:
im_info = list()
if isinstance(im, np.ndarray):
if len(im.shape) != 3:
raise Exception(
- "im should be 3-dimensions, but now is {}-dimensions".
- format(len(im.shape)))
+ "im should be 3-dimensions, but now is {}-dimensions".format(
+ len(im.shape)))
else:
try:
im = cv2.imread(im).astype('float32')
@@ -233,9 +244,8 @@ def seg_compose(im, im_info=None, label=None, transforms=None, vdl_writer=None,
if not isinstance(label, np.ndarray):
label = np.asarray(Image.open(label))
if vdl_writer is not None:
- vdl_writer.add_image(tag='0. OriginalImage' + '/' + str(step),
- img=im,
- step=0)
+ vdl_writer.add_image(
+ tag='0. OriginalImage' + '/' + str(step), img=im, step=0)
op_id = 1
for op in transforms:
if isinstance(op, SegTransform):
@@ -254,19 +264,18 @@ def seg_compose(im, im_info=None, label=None, transforms=None, vdl_writer=None,
else:
outputs = (im, im_info)
if vdl_writer is not None:
- tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
- vdl_writer.add_image(tag=tag,
- img=im,
- step=0)
+ tag = str(op_id) + '. ' + op.__class__.__name__ + '/' + str(step)
+ vdl_writer.add_image(tag=tag, img=im, step=0)
op_id += 1
+
def visualize(dataset, img_count=3, save_dir='vdl_output'):
    '''Visualize the intermediate results of data preprocessing/augmentation.
    The intermediate results can be inspected with VisualDL:
    1. Start VisualDL: visualdl --logdir vdl_output --port 8001
    2. Open https://0.0.0.0:8001 in a browser,
       where 0.0.0.0 means local access; for a remote service, replace it with that machine's IP
-
+
Args:
        dataset (paddlex.datasets): Dataset reader.
        img_count (int): Number of images to run preprocessing/augmentation on. Defaults to 3.
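
A hedged usage sketch for visualize: the dataset below is hypothetical (paths and transform choices are assumptions), while the function itself and the VisualDL command come from this file.

    import paddlex as pdx
    from paddlex.cv.transforms.visualize import visualize

    # hypothetical classification dataset; any paddlex.datasets reader should do
    dataset = pdx.datasets.ImageNet(
        data_dir='mini_dataset',
        file_list='mini_dataset/train_list.txt',
        label_list='mini_dataset/labels.txt',
        transforms=pdx.cls.transforms.Compose(
            [pdx.cls.transforms.RandomCrop(), pdx.cls.transforms.Normalize()]))

    visualize(dataset, img_count=3, save_dir='vdl_output')
    # then inspect with: visualdl --logdir vdl_output --port 8001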
diff --git a/paddlex/deploy.py b/paddlex/deploy.py
index c5d114a230f83241df743166ad51bb04ad71f499..ced22aee21e787c3ecf3e6b9e7d51b348ed27077 100644
--- a/paddlex/deploy.py
+++ b/paddlex/deploy.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -19,7 +19,9 @@ import yaml
import paddlex
import paddle.fluid as fluid
from paddlex.cv.transforms import build_transforms
-from paddlex.cv.models import BaseClassifier, YOLOv3, FasterRCNN, MaskRCNN, DeepLabv3p
+from paddlex.cv.models import BaseClassifier
+from paddlex.cv.models import PPYOLO, FasterRCNN, MaskRCNN
+from paddlex.cv.models import DeepLabv3p
class Predictor:
@@ -28,6 +30,7 @@ class Predictor:
use_gpu=True,
gpu_id=0,
use_mkl=False,
+ mkl_thread_num=4,
use_trt=False,
use_glog=False,
memory_optimize=True):
@@ -38,6 +41,7 @@ class Predictor:
        use_gpu: whether to use the GPU, defaults to True
        gpu_id: ID of the GPU to use, defaults to 0
        use_mkl: whether to use the MKL-DNN library (CPU only), defaults to False
+        mkl_thread_num: number of MKL-DNN computation threads, defaults to 4
        use_trt: whether to use TensorRT, defaults to False
        use_glog: whether to enable glog logging, defaults to False
        memory_optimize: whether to enable memory optimization, defaults to True
@@ -72,13 +76,15 @@ class Predictor:
to_rgb = False
self.transforms = build_transforms(self.model_type,
self.info['Transforms'], to_rgb)
- self.predictor = self.create_predictor(
- use_gpu, gpu_id, use_mkl, use_trt, use_glog, memory_optimize)
+ self.predictor = self.create_predictor(use_gpu, gpu_id, use_mkl,
+ mkl_thread_num, use_trt,
+ use_glog, memory_optimize)
def create_predictor(self,
use_gpu=True,
gpu_id=0,
use_mkl=False,
+ mkl_thread_num=4,
use_trt=False,
use_glog=False,
memory_optimize=True):
@@ -93,6 +99,7 @@ class Predictor:
config.disable_gpu()
if use_mkl:
config.enable_mkldnn()
+ config.set_cpu_math_library_num_threads(mkl_thread_num)
if use_glog:
config.enable_glog_info()
else:
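
A usage sketch for the new mkl_thread_num knob, assuming the Predictor's usual model_dir first argument and predict() entry point (neither is shown in this hunk):

    import paddlex as pdx

    predictor = pdx.deploy.Predictor(
        './inference_model',  # assumed path to an exported model
        use_gpu=False,
        use_mkl=True,         # MKL-DNN only takes effect on CPU
        mkl_thread_num=8)     # threads for the CPU math library
    result = predictor.predict('test.jpg')  # assumed test image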
@@ -124,8 +131,8 @@ class Predictor:
thread_num=thread_num)
res['image'] = im
elif self.model_type == "detector":
- if self.model_name == "YOLOv3":
- im, im_size = YOLOv3._preprocess(
+ if self.model_name in ["PPYOLO", "YOLOv3"]:
+ im, im_size = PPYOLO._preprocess(
image,
self.transforms,
self.model_type,
@@ -185,8 +192,8 @@ class Predictor:
res = {'bbox': (results[0][0], offset_to_lengths(results[0][1])), }
res['im_id'] = (np.array(
[[i] for i in range(batch_size)]).astype('int32'), [[]])
- if self.model_name == "YOLOv3":
- preds = YOLOv3._postprocess(res, batch_size, self.num_classes,
+ if self.model_name in ["PPYOLO", "YOLOv3"]:
+ preds = PPYOLO._postprocess(res, batch_size, self.num_classes,
self.labels)
elif self.model_name == "FasterRCNN":
preds = FasterRCNN._postprocess(res, batch_size,
diff --git a/paddlex/det.py b/paddlex/det.py
index ee56a934c23e7d329499f527d2ba44ea55fc573f..4f38068c4b1950450a39f3949adac8021c61da80 100644
--- a/paddlex/det.py
+++ b/paddlex/det.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -17,6 +17,7 @@ from . import cv
FasterRCNN = cv.models.FasterRCNN
YOLOv3 = cv.models.YOLOv3
+PPYOLO = cv.models.PPYOLO
MaskRCNN = cv.models.MaskRCNN
transforms = cv.transforms.det_transforms
visualize = cv.models.utils.visualize.visualize_detection
diff --git a/paddlex/interpret/__init__.py b/paddlex/interpret/__init__.py
index 55c92c92a32c3fa6e34497e2d70589f63b180956..576329dc831ea2ed7a4a7e62aa37032fba72ae03 100644
--- a/paddlex/interpret/__init__.py
+++ b/paddlex/interpret/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/interpret/as_data_reader/__init__.py b/paddlex/interpret/as_data_reader/__init__.py
index 1d11e265597c7c8e39098a228108da3bb954b892..569da2ac4e130501487482ddfc63568c369d1ddf 100644
--- a/paddlex/interpret/as_data_reader/__init__.py
+++ b/paddlex/interpret/as_data_reader/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/interpret/as_data_reader/data_path_utils.py b/paddlex/interpret/as_data_reader/data_path_utils.py
index 1c915050bed935c4e7f6ea34be6a231f7c05f44c..8e934b0e3f122274ce5815739c9da2994b29f9c3 100644
--- a/paddlex/interpret/as_data_reader/data_path_utils.py
+++ b/paddlex/interpret/as_data_reader/data_path_utils.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -14,6 +14,7 @@
import os
+
def _find_classes(dir):
# Faster and available in Python 3.5 and above
classes = [d.name for d in os.scandir(dir) if d.is_dir()]
diff --git a/paddlex/interpret/as_data_reader/readers.py b/paddlex/interpret/as_data_reader/readers.py
index 4b551177334c1da6546a605f2cee00518d90c57a..5e87b0eb4384bec75a9cccdd006ec307cdc6d77d 100644
--- a/paddlex/interpret/as_data_reader/readers.py
+++ b/paddlex/interpret/as_data_reader/readers.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -138,8 +138,10 @@ class ReaderConfig(object):
...
"""
+
def __init__(self, dataset_dir, is_test):
- image_paths, labels, self.num_classes = self.get_dataset_info(dataset_dir, is_test)
+ image_paths, labels, self.num_classes = self.get_dataset_info(
+ dataset_dir, is_test)
random_per = np.random.permutation(range(len(image_paths)))
self.image_paths = image_paths[random_per]
self.labels = labels[random_per]
@@ -147,7 +149,8 @@ class ReaderConfig(object):
def get_reader(self):
def reader():
- IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp')
+ IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
+ '.tif', '.tiff', '.webp')
target_size = 256
crop_size = 224
@@ -171,7 +174,8 @@ class ReaderConfig(object):
return reader
def get_dataset_info(self, dataset_dir, is_test=False):
- IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp')
+ IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
+ '.tif', '.tiff', '.webp')
# read
if is_test:
@@ -199,7 +203,8 @@ class ReaderConfig(object):
def create_reader(list_image_path, list_label=None, is_test=False):
def reader():
- IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp')
+ IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
+ '.tif', '.tiff', '.webp')
target_size = 256
crop_size = 224
diff --git a/paddlex/interpret/core/__init__.py b/paddlex/interpret/core/__init__.py
index 1d11e265597c7c8e39098a228108da3bb954b892..569da2ac4e130501487482ddfc63568c369d1ddf 100644
--- a/paddlex/interpret/core/__init__.py
+++ b/paddlex/interpret/core/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/interpret/core/_session_preparation.py b/paddlex/interpret/core/_session_preparation.py
index 81d3b22b216f07047b6a3a4c39701a03ec96a964..0b192e00d1e7480b56a4c06730a6ed1dc23b0eed 100644
--- a/paddlex/interpret/core/_session_preparation.py
+++ b/paddlex/interpret/core/_session_preparation.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/paddlex/interpret/core/interpretation.py b/paddlex/interpret/core/interpretation.py
index ca3b1cf3371f244a1ab55e6940de2cd382fd7ab3..54f57d80faac0402f15cf96da1661e0e3d295fcd 100644
--- a/paddlex/interpret/core/interpretation.py
+++ b/paddlex/interpret/core/interpretation.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/paddlex/interpret/core/interpretation_algorithms.py b/paddlex/interpret/core/interpretation_algorithms.py
index 2805af601a91314a5d554511af04b53eef7b653a..49cc6d835d2ca76bb56ace3059a93d6b60f91be8 100644
--- a/paddlex/interpret/core/interpretation_algorithms.py
+++ b/paddlex/interpret/core/interpretation_algorithms.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/paddlex/interpret/core/normlime_base.py b/paddlex/interpret/core/normlime_base.py
index 8270099b17c858688903354bffcfa412ed8c804c..1aaafd5b981314b62931a9168b4062d05cd5ffdb 100644
--- a/paddlex/interpret/core/normlime_base.py
+++ b/paddlex/interpret/core/normlime_base.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/paddlex/interpret/interpretation_predict.py b/paddlex/interpret/interpretation_predict.py
index 2ebe9a87d7fc80a6379331b7e4d0ef7c2da304bb..b06bd099c893cc4802075ad159e81a66e08863e9 100644
--- a/paddlex/interpret/interpretation_predict.py
+++ b/paddlex/interpret/interpretation_predict.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/interpret/visualize.py b/paddlex/interpret/visualize.py
index 2d7c096175ce0ff7f10c33696cac42a9f1a64e99..63a0e00bddca37b4208388f9dbb4cabc63811061 100644
--- a/paddlex/interpret/visualize.py
+++ b/paddlex/interpret/visualize.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/seg.py b/paddlex/seg.py
index fdfdffd4639c6b3ddb75ac20ca0b3ecf4edd2328..a6cc4a9823cdecd9725e53fe03854167d61f8368 100644
--- a/paddlex/seg.py
+++ b/paddlex/seg.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/slim.py b/paddlex/slim.py
index 407119dc624b9d74807cb9215e00eb3144b7093f..2b307a8315d0429adcedc934a5f543f950785ff3 100644
--- a/paddlex/slim.py
+++ b/paddlex/slim.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
diff --git a/paddlex/tools/__init__.py b/paddlex/tools/__init__.py
index 364bb470ccbf7832ca0a72400bc21359fafcd398..ceddcdace25e31a7c26b4bc4a417ca067367b8d9 100644
--- a/paddlex/tools/__init__.py
+++ b/paddlex/tools/__init__.py
@@ -14,4 +14,5 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-from .convert import *
\ No newline at end of file
+from .convert import *
+from .split import *
diff --git a/paddlex/tools/dataset_split/__init__.py b/paddlex/tools/dataset_split/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/paddlex/tools/dataset_split/coco_split.py b/paddlex/tools/dataset_split/coco_split.py
new file mode 100644
index 0000000000000000000000000000000000000000..dbedf9c86d3f789593fb571c9e15508c9c8b8f09
--- /dev/null
+++ b/paddlex/tools/dataset_split/coco_split.py
@@ -0,0 +1,64 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os.path as osp
+import random
+import json
+from pycocotools.coco import COCO
+from .utils import MyEncoder
+import paddlex.utils.logging as logging
+
+
+def split_coco_dataset(dataset_dir, val_percent, test_percent, save_dir):
+ if not osp.exists(osp.join(dataset_dir, "annotations.json")):
+ logging.error("\'annotations.json\' is not found in {}!".format(
+ dataset_dir))
+
+ annotation_file = osp.join(dataset_dir, "annotations.json")
+ coco = COCO(annotation_file)
+ img_ids = coco.getImgIds()
+ cat_ids = coco.getCatIds()
+ anno_ids = coco.getAnnIds()
+
+ val_num = int(len(img_ids) * val_percent)
+ test_num = int(len(img_ids) * test_percent)
+ train_num = len(img_ids) - val_num - test_num
+
+ random.shuffle(img_ids)
+ train_files_ids = img_ids[:train_num]
+ val_files_ids = img_ids[train_num:train_num + val_num]
+ test_files_ids = img_ids[train_num + val_num:]
+
+ for img_id_list in [train_files_ids, val_files_ids, test_files_ids]:
+ img_anno_ids = coco.getAnnIds(imgIds=img_id_list, iscrowd=0)
+ imgs = coco.loadImgs(img_id_list)
+ instances = coco.loadAnns(img_anno_ids)
+ categories = coco.loadCats(cat_ids)
+ img_dict = {
+ "annotations": instances,
+ "images": imgs,
+ "categories": categories
+ }
+
+        if img_id_list == train_files_ids:
+            with open(osp.join(save_dir, 'train.json'), 'w+') as json_file:
+                json.dump(img_dict, json_file, cls=MyEncoder)
+        elif img_id_list == val_files_ids:
+            with open(osp.join(save_dir, 'val.json'), 'w+') as json_file:
+                json.dump(img_dict, json_file, cls=MyEncoder)
+        elif img_id_list == test_files_ids and len(test_files_ids):
+            with open(osp.join(save_dir, 'test.json'), 'w+') as json_file:
+                json.dump(img_dict, json_file, cls=MyEncoder)
+
+ return train_num, val_num, test_num
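
A hedged usage sketch for split_coco_dataset; the paths are assumptions, and seeding is optional but makes the shuffled split reproducible:

    import random
    from paddlex.tools.dataset_split.coco_split import split_coco_dataset

    random.seed(0)  # the split shuffles img_ids; seed for reproducibility
    train_num, val_num, test_num = split_coco_dataset(
        dataset_dir='./coco_ds',  # assumed; must contain annotations.json
        val_percent=0.2,
        test_percent=0.1,
        save_dir='./coco_ds')     # writes train.json / val.json / test.json
    print(train_num, val_num, test_num)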
diff --git a/paddlex/tools/dataset_split/imagenet_split.py b/paddlex/tools/dataset_split/imagenet_split.py
new file mode 100644
index 0000000000000000000000000000000000000000..06bcdd37f8db8b88c49a45ada45a09d28d136bff
--- /dev/null
+++ b/paddlex/tools/dataset_split/imagenet_split.py
@@ -0,0 +1,75 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os.path as osp
+import random
+from .utils import list_files, is_pic
+import paddlex.utils.logging as logging
+
+
+def split_imagenet_dataset(dataset_dir, val_percent, test_percent, save_dir):
+ all_files = list_files(dataset_dir)
+ label_list = list()
+ train_image_anno_list = list()
+ val_image_anno_list = list()
+ test_image_anno_list = list()
+ for file in all_files:
+ if not is_pic(file):
+ continue
+ label, image_name = osp.split(file)
+ if label not in label_list:
+ label_list.append(label)
+ label_list = sorted(label_list)
+
+ for i in range(len(label_list)):
+ image_list = list_files(osp.join(dataset_dir, label_list[i]))
+ image_anno_list = list()
+ for img in image_list:
+ image_anno_list.append([osp.join(label_list[i], img), i])
+ random.shuffle(image_anno_list)
+ image_num = len(image_anno_list)
+ val_num = int(image_num * val_percent)
+ test_num = int(image_num * test_percent)
+ train_num = image_num - val_num - test_num
+
+ train_image_anno_list += image_anno_list[:train_num]
+ val_image_anno_list += image_anno_list[train_num:train_num + val_num]
+ test_image_anno_list += image_anno_list[train_num + val_num:]
+
+ with open(
+ osp.join(save_dir, 'train_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in train_image_anno_list:
+ file, label = x
+ f.write('{} {}\n'.format(file, label))
+ with open(
+ osp.join(save_dir, 'val_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in val_image_anno_list:
+ file, label = x
+ f.write('{} {}\n'.format(file, label))
+ if len(test_image_anno_list):
+ with open(
+ osp.join(save_dir, 'test_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in test_image_anno_list:
+ file, label = x
+ f.write('{} {}\n'.format(file, label))
+ with open(
+ osp.join(save_dir, 'labels.txt'), mode='w', encoding='utf-8') as f:
+ for l in sorted(label_list):
+ f.write('{}\n'.format(l))
+
+ return len(train_image_anno_list), len(val_image_anno_list), len(
+ test_image_anno_list)
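+
+# Each line of the generated list files pairs a relative image path with its
+# label index, e.g. "class_a/img_001.jpg 0" (hypothetical names); labels.txt
+# holds one class directory name per line.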
diff --git a/paddlex/tools/dataset_split/seg_split.py b/paddlex/tools/dataset_split/seg_split.py
new file mode 100644
index 0000000000000000000000000000000000000000..b16a5123a6acc5697217727b1da652ef672dc5d3
--- /dev/null
+++ b/paddlex/tools/dataset_split/seg_split.py
@@ -0,0 +1,96 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os.path as osp
+import random
+from .utils import list_files, is_pic, replace_ext, read_seg_ann
+import paddlex.utils.logging as logging
+
+
+def split_seg_dataset(dataset_dir, val_percent, test_percent, save_dir):
+ if not osp.exists(osp.join(dataset_dir, "JPEGImages")):
+ logging.error("\'JPEGImages\' is not found in {}!".format(dataset_dir))
+ if not osp.exists(osp.join(dataset_dir, "Annotations")):
+ logging.error("\'Annotations\' is not found in {}!".format(
+ dataset_dir))
+
+ all_image_files = list_files(osp.join(dataset_dir, "JPEGImages"))
+
+ image_anno_list = list()
+ label_list = list()
+ for image_file in all_image_files:
+ if not is_pic(image_file):
+ continue
+ anno_name = replace_ext(image_file, "png")
+ if osp.exists(osp.join(dataset_dir, "Annotations", anno_name)):
+ image_anno_list.append([image_file, anno_name])
+ else:
+ anno_name = replace_ext(image_file, "PNG")
+ if osp.exists(osp.join(dataset_dir, "Annotations", anno_name)):
+ image_anno_list.append([image_file, anno_name])
+ else:
+ logging.error("The annotation file {} doesn't exist!".format(
+ anno_name))
+
+ if not osp.exists(osp.join(dataset_dir, "labels.txt")):
+        for image_anno in image_anno_list:
+            labels = read_seg_ann(
+                osp.join(dataset_dir, "Annotations", image_anno[1]))
+ for i in labels:
+ if i not in label_list:
+ label_list.append(i)
+        # If label values are not contiguous from 0, fill in missing labels
+        if len(label_list) != max(label_list) + 1:
+            label_list = list(range(max(label_list) + 1))
+
+ random.shuffle(image_anno_list)
+ image_num = len(image_anno_list)
+ val_num = int(image_num * val_percent)
+ test_num = int(image_num * test_percent)
+ train_num = image_num - val_num - test_num
+
+ train_image_anno_list = image_anno_list[:train_num]
+ val_image_anno_list = image_anno_list[train_num:train_num + val_num]
+ test_image_anno_list = image_anno_list[train_num + val_num:]
+
+ with open(
+ osp.join(save_dir, 'train_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in train_image_anno_list:
+ file = osp.join("JPEGImages", x[0])
+ label = osp.join("Annotations", x[1])
+ f.write('{} {}\n'.format(file, label))
+ with open(
+ osp.join(save_dir, 'val_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in val_image_anno_list:
+ file = osp.join("JPEGImages", x[0])
+ label = osp.join("Annotations", x[1])
+ f.write('{} {}\n'.format(file, label))
+ if len(test_image_anno_list):
+ with open(
+ osp.join(save_dir, 'test_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in test_image_anno_list:
+ file = osp.join("JPEGImages", x[0])
+ label = osp.join("Annotations", x[1])
+ f.write('{} {}\n'.format(file, label))
+ if len(label_list):
+ with open(
+ osp.join(save_dir, 'labels.txt'), mode='w',
+ encoding='utf-8') as f:
+ for l in sorted(label_list):
+ f.write('{}\n'.format(l))
+
+ return train_num, val_num, test_num
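+
+# Each line of the generated list files pairs an image with its mask,
+# e.g. "JPEGImages/xxx.jpg Annotations/xxx.png" (hypothetical names).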
diff --git a/paddlex/tools/dataset_split/utils.py b/paddlex/tools/dataset_split/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..30b4b928b1cc9e3a4176f3606cb64cd9a5348118
--- /dev/null
+++ b/paddlex/tools/dataset_split/utils.py
@@ -0,0 +1,102 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import os.path as osp
+from PIL import Image
+import numpy as np
+import json
+
+
+class MyEncoder(json.JSONEncoder):
+    # Convert numpy types to native Python types for JSON serialization
+ def default(self, obj):
+ if isinstance(obj, np.integer):
+ return int(obj)
+ elif isinstance(obj, np.floating):
+ return float(obj)
+ elif isinstance(obj, np.ndarray):
+ return obj.tolist()
+ else:
+ return super(MyEncoder, self).default(obj)
+
+
+def list_files(dirname):
+ """ 列出目录下所有文件(包括所属的一级子目录下文件)
+
+ Args:
+ dirname: 目录路径
+ """
+
+ def filter_file(f):
+ if f.startswith('.'):
+ return True
+ return False
+
+ all_files = list()
+ dirs = list()
+ for f in os.listdir(dirname):
+ if filter_file(f):
+ continue
+ if osp.isdir(osp.join(dirname, f)):
+ dirs.append(f)
+ else:
+ all_files.append(f)
+ for d in dirs:
+ for f in os.listdir(osp.join(dirname, d)):
+ if filter_file(f):
+ continue
+ if osp.isdir(osp.join(dirname, d, f)):
+ continue
+ all_files.append(osp.join(d, f))
+ return all_files
+
+
+def is_pic(filename):
+ """ 判断文件是否为图片格式
+
+ Args:
+ filename: 文件路径
+ """
+ suffixes = {'JPEG', 'jpeg', 'JPG', 'jpg', 'BMP', 'bmp', 'PNG', 'png'}
+ suffix = filename.strip().split('.')[-1]
+ if suffix not in suffixes:
+ return False
+ return True
+
+
+def replace_ext(filename, new_ext):
+ """ 替换文件后缀
+
+ Args:
+ filename: 文件路径
+ new_ext: 需要替换的新的后缀
+ """
+ items = filename.split(".")
+ items[-1] = new_ext
+ new_filename = ".".join(items)
+ return new_filename
+
+
+def read_seg_ann(pngfile):
+ """ 解析语义分割的标注png图片
+
+ Args:
+ pngfile: 包含标注信息的png图片路径
+ """
+ grt = np.asarray(Image.open(pngfile))
+ labels = list(np.unique(grt))
+ if 255 in labels:
+ labels.remove(255)
+ return labels
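+
+
+# MyEncoder usage sketch: json.dumps({'area': np.int64(3)}) raises a
+# TypeError, while json.dumps({'area': np.int64(3)}, cls=MyEncoder)
+# returns '{"area": 3}'.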
diff --git a/paddlex/tools/dataset_split/voc_split.py b/paddlex/tools/dataset_split/voc_split.py
new file mode 100644
index 0000000000000000000000000000000000000000..588f9e62e4688b12315f6afb6815009df9838fa5
--- /dev/null
+++ b/paddlex/tools/dataset_split/voc_split.py
@@ -0,0 +1,91 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os.path as osp
+import random
+import xml.etree.ElementTree as ET
+from .utils import list_files, is_pic, replace_ext
+import paddlex.utils.logging as logging
+
+
+def split_voc_dataset(dataset_dir, val_percent, test_percent, save_dir):
+ if not osp.exists(osp.join(dataset_dir, "JPEGImages")):
+ logging.error("\'JPEGImages\' is not found in {}!".format(dataset_dir))
+ if not osp.exists(osp.join(dataset_dir, "Annotations")):
+ logging.error("\'Annotations\' is not found in {}!".format(
+ dataset_dir))
+
+ all_image_files = list_files(osp.join(dataset_dir, "JPEGImages"))
+
+ image_anno_list = list()
+ label_list = list()
+ for image_file in all_image_files:
+ if not is_pic(image_file):
+ continue
+ anno_name = replace_ext(image_file, "xml")
+ if osp.exists(osp.join(dataset_dir, "Annotations", anno_name)):
+ image_anno_list.append([image_file, anno_name])
+            try:
+                tree = ET.parse(
+                    osp.join(dataset_dir, "Annotations", anno_name))
+            except ET.ParseError:
+                raise Exception("{} is not a well-formed xml file, please check the annotation file".format(
+                    osp.join(dataset_dir, "Annotations", anno_name)))
+ objs = tree.findall("object")
+            for obj in objs:
+                cname = obj.find('name').text
+                if cname not in label_list:
+                    label_list.append(cname)
+ else:
+ logging.error("The annotation file {} doesn't exist!".format(
+ anno_name))
+
+ random.shuffle(image_anno_list)
+ image_num = len(image_anno_list)
+ val_num = int(image_num * val_percent)
+ test_num = int(image_num * test_percent)
+ train_num = image_num - val_num - test_num
+
+ train_image_anno_list = image_anno_list[:train_num]
+ val_image_anno_list = image_anno_list[train_num:train_num + val_num]
+ test_image_anno_list = image_anno_list[train_num + val_num:]
+
+ with open(
+ osp.join(save_dir, 'train_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in train_image_anno_list:
+ file = osp.join("JPEGImages", x[0])
+ label = osp.join("Annotations", x[1])
+ f.write('{} {}\n'.format(file, label))
+ with open(
+ osp.join(save_dir, 'val_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in val_image_anno_list:
+ file = osp.join("JPEGImages", x[0])
+ label = osp.join("Annotations", x[1])
+ f.write('{} {}\n'.format(file, label))
+ if len(test_image_anno_list):
+ with open(
+ osp.join(save_dir, 'test_list.txt'), mode='w',
+ encoding='utf-8') as f:
+ for x in test_image_anno_list:
+ file = osp.join("JPEGImages", x[0])
+ label = osp.join("Annotations", x[1])
+ f.write('{} {}\n'.format(file, label))
+ with open(
+ osp.join(save_dir, 'labels.txt'), mode='w', encoding='utf-8') as f:
+ for l in sorted(label_list):
+ f.write('{}\n'.format(l))
+
+ return train_num, val_num, test_num
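+
+# Each line of the generated list files pairs an image with its XML
+# annotation, e.g. "JPEGImages/xxx.jpg Annotations/xxx.xml" (hypothetical names).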
diff --git a/paddlex/tools/split.py b/paddlex/tools/split.py
new file mode 100644
index 0000000000000000000000000000000000000000..23394f026cfdb39e4a6ac25e7cd5cf8a8f379462
--- /dev/null
+++ b/paddlex/tools/split.py
@@ -0,0 +1,41 @@
+#!/usr/bin/env python
+# coding: utf-8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .dataset_split.coco_split import split_coco_dataset
+from .dataset_split.voc_split import split_voc_dataset
+from .dataset_split.imagenet_split import split_imagenet_dataset
+from .dataset_split.seg_split import split_seg_dataset
+
+
+def dataset_split(dataset_dir, dataset_format, val_value, test_value,
+ save_dir):
+ if dataset_format == "coco":
+ train_num, val_num, test_num = split_coco_dataset(
+ dataset_dir, val_value, test_value, save_dir)
+ elif dataset_format == "voc":
+ train_num, val_num, test_num = split_voc_dataset(
+ dataset_dir, val_value, test_value, save_dir)
+ elif dataset_format == "seg":
+ train_num, val_num, test_num = split_seg_dataset(
+ dataset_dir, val_value, test_value, save_dir)
+    elif dataset_format == "imagenet":
+        train_num, val_num, test_num = split_imagenet_dataset(
+            dataset_dir, val_value, test_value, save_dir)
+    else:
+        raise ValueError(
+            "dataset_format must be one of 'coco', 'voc', 'seg', 'imagenet'")
+ print("Dataset Split Done.")
+ print("Train samples: {}".format(train_num))
+ print("Eval samples: {}".format(val_num))
+ print("Test samples: {}".format(test_num))
+ print("Split files saved in {}".format(save_dir))
diff --git a/paddlex/tools/x2coco.py b/paddlex/tools/x2coco.py
index b99d5d8226551d972a811e61819f319a36b359a9..4d16c72c4de4755fd27a2c8b12236b70b64a829a 100644
--- a/paddlex/tools/x2coco.py
+++ b/paddlex/tools/x2coco.py
@@ -147,7 +147,7 @@ class LabelMe2COCO(X2COCO):
img_name_part = osp.splitext(img_file)[0]
json_file = osp.join(json_dir, img_name_part + ".json")
if not osp.exists(json_file):
- os.remove(osp.join(image_dir, img_file))
+ os.remove(osp.join(img_dir, img_file))
continue
image_id = image_id + 1
with open(json_file, mode='r', \
@@ -220,7 +220,7 @@ class EasyData2COCO(X2COCO):
img_name_part = osp.splitext(img_file)[0]
json_file = osp.join(json_dir, img_name_part + ".json")
if not osp.exists(json_file):
- os.remove(osp.join(image_dir, img_file))
+ os.remove(osp.join(img_dir, img_file))
continue
image_id = image_id + 1
with open(json_file, mode='r', \
@@ -317,7 +317,7 @@ class JingLing2COCO(X2COCO):
img_name_part = osp.splitext(img_file)[0]
json_file = osp.join(json_dir, img_name_part + ".json")
if not osp.exists(json_file):
- os.remove(osp.join(image_dir, img_file))
+ os.remove(osp.join(img_dir, img_file))
continue
image_id = image_id + 1
with open(json_file, mode='r', \
diff --git a/paddlex/tools/x2seg.py b/paddlex/tools/x2seg.py
index 1935a49c375ffc09de122401b5176bba281c9ba3..8a6c25bd0d85ae9d93c48b19d4ca154c5bdab029 100644
--- a/paddlex/tools/x2seg.py
+++ b/paddlex/tools/x2seg.py
@@ -23,6 +23,7 @@ import shutil
import numpy as np
import PIL.Image
from .base import MyEncoder, is_pic, get_encoding
+import math
class X2Seg(object):
def __init__(self):
@@ -140,7 +141,7 @@ class JingLing2Seg(X2Seg):
img_name_part = osp.splitext(img_name)[0]
json_file = osp.join(json_dir, img_name_part + ".json")
if not osp.exists(json_file):
- os.remove(os.remove(osp.join(image_dir, img_name)))
+ os.remove(osp.join(image_dir, img_name))
continue
with open(json_file, mode="r", \
encoding=get_encoding(json_file)) as j:
@@ -226,7 +227,7 @@ class LabelMe2Seg(X2Seg):
img_name_part = osp.splitext(img_name)[0]
json_file = osp.join(json_dir, img_name_part + ".json")
if not osp.exists(json_file):
- os.remove(os.remove(osp.join(image_dir, img_name)))
+ os.remove(osp.join(image_dir, img_name))
continue
img_file = osp.join(image_dir, img_name)
img = np.asarray(PIL.Image.open(img_file))
@@ -260,7 +261,7 @@ class EasyData2Seg(X2Seg):
img_name_part = osp.splitext(img_name)[0]
json_file = osp.join(json_dir, img_name_part + ".json")
if not osp.exists(json_file):
- os.remove(os.remove(osp.join(image_dir, img_name)))
+ os.remove(osp.join(image_dir, img_name))
continue
with open(json_file, mode="r", \
encoding=get_encoding(json_file)) as j:
diff --git a/paddlex/utils/__init__.py b/paddlex/utils/__init__.py
index 2e7d1bb3899fd42490416c391ec8f60e54493b5f..9b7e3c68a2de609892880abb37ec487c7d07a30d 100644
--- a/paddlex/utils/__init__.py
+++ b/paddlex/utils/__init__.py
@@ -1,11 +1,11 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/paddlex/utils/logging.py b/paddlex/utils/logging.py
index adfcea515273286f37921ec13999fb2234ce404f..a89abaeda9a1462db558a75834a3d29ecfd06d80 100644
--- a/paddlex/utils/logging.py
+++ b/paddlex/utils/logging.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -29,8 +29,9 @@ def log(level=2, message="", use_color=False):
current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array)
if paddlex.log_level >= level:
if use_color:
- print("\033[1;31;40m{} [{}]\t{}\033[0m".format(current_time, levels[
- level], message).encode("utf-8").decode("latin1"))
+ print("\033[1;31;40m{} [{}]\t{}\033[0m".format(
+ current_time, levels[level], message).encode("utf-8").decode(
+ "latin1"))
else:
print("{} [{}]\t{}".format(current_time, levels[level], message)
.encode("utf-8").decode("latin1"))
diff --git a/paddlex/utils/save.py b/paddlex/utils/save.py
index 397022d3c1e2d2110e900051a666f820de523204..228d685281df8d9db2e2f5dad78fd18c129b767c 100644
--- a/paddlex/utils/save.py
+++ b/paddlex/utils/save.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -531,8 +531,8 @@ def save_mask_inference_model(dirname,
if isinstance(target_vars, Variable):
target_vars = [target_vars]
elif export_for_deployment:
- if not (bool(target_vars)
- and all(isinstance(var, Variable) for var in target_vars)):
+ if not (bool(target_vars) and
+ all(isinstance(var, Variable) for var in target_vars)):
raise ValueError("'target_vars' should be a list of Variable.")
main_program = _get_valid_program(main_program)
diff --git a/paddlex/utils/utils.py b/paddlex/utils/utils.py
index 6af574bb3403bb47f6f41dcb1223ec43407f8e92..7b7bca86fbc17e8d030edc14b9c4f60d17d4b8a4 100644
--- a/paddlex/utils/utils.py
+++ b/paddlex/utils/utils.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -50,6 +50,7 @@ def get_environ_info():
info['num'] = fluid.core.get_cuda_device_count()
return info
+
def path_normalization(path):
win_sep = "\\"
other_sep = "/"
@@ -59,6 +60,7 @@ def path_normalization(path):
path = other_sep.join(path.split(win_sep))
return path
+
def parse_param_file(param_file, return_shape=True):
from paddle.fluid.proto.framework_pb2 import VarType
f = open(param_file, 'rb')
diff --git a/requirements.txt b/requirements.txt
index f7804c2e632fcc7cad515e42e325ba797222f81f..2e290c13f57924b752185417999a7642cf3b78b8 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -8,3 +8,4 @@ paddleslim == 1.0.1
shapely
x2paddle
paddlepaddle-gpu
+opencv-python
diff --git a/setup.py b/setup.py
index c7dbd5b9b33368877907d2de04beb8a32f2b714a..edcee85e8f42edbda41a7feb7557f6f3c5524d34 100644
--- a/setup.py
+++ b/setup.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -31,7 +31,7 @@ setuptools.setup(
install_requires=[
"pycocotools;platform_system!='Windows'", 'pyyaml', 'colorama', 'tqdm',
'paddleslim==1.0.1', 'visualdl>=2.0.0b', 'paddlehub>=1.6.2',
- 'shapely>=1.7.0'
+ 'shapely>=1.7.0', "opencv-python"
],
classifiers=[
"Programming Language :: Python :: 3",
diff --git a/tutorials/compress/README.md b/tutorials/compress/README.md
index 0d2f2eeff72181e40382f30b2b4ceb0ddc96e682..b9e343b71ffeb58e00f75a532a79bd3c04628c40 100644
--- a/tutorials/compress/README.md
+++ b/tutorials/compress/README.md
@@ -2,17 +2,19 @@
本目录下整理了使用PaddleX进行模型剪裁训练的代码,代码均会自动下载数据,并使用单张GPU卡进行训练。
PaddleX提供了两种剪裁训练方式,
-1. 用户自行计算剪裁配置(推荐),整体流程为
-> 1.使用数据训练原始模型;
-> 2.使用第1步训练好的模型,在验证集上计算各个模型参数的敏感度,并将敏感信息保存至本地文件
-> 3.再次使用数据训练原始模型,在训练时调用`train`接口时,传入第2步计算得到的参数敏感信息文件,
-> 4.模型在训练过程中,会根据传入的参数敏感信息文件,对模型结构剪裁后,继续迭代训练
->
-2. 使用PaddleX预先计算好的参数敏感度信息文件,整体流程为
-> 1. 在训练调用`train`接口时,将`sensetivities_file`参数设为`DEFAULT`字符串
-> 2. 在训练过程中,会自动下载PaddleX预先计算好的模型参数敏感度信息,并对模型结构剪裁,继而迭代训练
-
-上述两种方式,第1种方法相对比第2种方法少了两步(即用户训练原始模型+自行计算参数敏感度信息),实验验证第1种方法的精度会更高,剪裁的模型效果更好,因此在时间和计算成本允许的前提下,更推荐使用第1种方法。
+1. 用户自行计算剪裁配置(推荐),整体流程为
+
+> 1. 使用数据训练原始模型;
+> 2. 使用第1步训练好的模型,在验证集上计算各个模型参数的敏感度,并将敏感度信息保存至本地文件;
+> 3. 再次使用数据训练原始模型,在调用`train`接口时,传入第2步计算得到的参数敏感度信息文件;
+> 4. 模型在训练过程中,会根据传入的参数敏感度信息文件,对模型结构剪裁后,继续迭代训练。
+
+2. 使用PaddleX预先计算好的参数敏感度信息文件,整体流程为
+
+> 1. 在调用`train`接口时,将`sensetivities_file`参数设为`DEFAULT`字符串
+> 2. 在训练过程中,会自动下载PaddleX预先计算好的模型参数敏感度信息,并对模型结构剪裁,继而迭代训练
+
+上述两种方式中,第1种方法相比第2种方法多两步(即用户训练原始模型+自行计算参数敏感度信息)。实验验证第1种方法的精度更高,剪裁后的模型效果更好,因此在时间和计算成本允许的前提下,更推荐使用第1种方法(第2种方式的调用示意见下方代码)。
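+
+以第2种方式为例的最简示意(数据集与模型定义从略;`sensetivities_file`、`eval_metric_loss`等参数名以本目录示例代码为准):
+
+```
+model.train(
+    num_epochs=10,
+    train_dataset=train_dataset,
+    eval_dataset=eval_dataset,
+    sensetivities_file='DEFAULT',
+    eval_metric_loss=0.05,
+    save_dir='output/prune')
+```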
## 开始剪裁训练
diff --git a/tutorials/compress/classification/cal_sensitivities_file.py b/tutorials/compress/classification/cal_sensitivities_file.py
index b762ec26031d4b971d6311f13ef79ce721ecb670..08fd165ef92b7d00fd6dda071ccf03aff4853707 100644
--- a/tutorials/compress/classification/cal_sensitivities_file.py
+++ b/tutorials/compress/classification/cal_sensitivities_file.py
@@ -1,11 +1,11 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
diff --git a/tutorials/compress/classification/mobilenetv2.py b/tutorials/compress/classification/mobilenetv2.py
index 86fb3795c9103def6b72daede56856a8ce9388cd..0271577fa72bdc3bbc292132e43c05487c5307b1 100644
--- a/tutorials/compress/classification/mobilenetv2.py
+++ b/tutorials/compress/classification/mobilenetv2.py
@@ -1,11 +1,11 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -29,13 +29,11 @@ def train(model_dir=None, sensitivities_file=None, eval_metric_loss=0.05):
# 定义训练和验证时的transforms
train_transforms = transforms.Compose([
transforms.RandomCrop(crop_size=224),
- transforms.RandomHorizontalFlip(),
- transforms.Normalize()
+ transforms.RandomHorizontalFlip(), transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
- transforms.CenterCrop(crop_size=224),
- transforms.Normalize()
+ transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# 定义训练和验证所用的数据集
diff --git a/tutorials/compress/detection/cal_sensitivities_file.py b/tutorials/compress/detection/cal_sensitivities_file.py
index d1111a434d8e669bc23b3cf86f245b64c1bbb9a1..f374842f5d99a559ba6def3abf736c83b24994fa 100644
--- a/tutorials/compress/detection/cal_sensitivities_file.py
+++ b/tutorials/compress/detection/cal_sensitivities_file.py
@@ -1,4 +1,4 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
diff --git a/tutorials/compress/detection/yolov3_mobilenet.py b/tutorials/compress/detection/yolov3_mobilenet.py
index 8c125d0980757180453b912999f10ae13c978c18..7bc79b9f6dd0935c84cafcf3b814aca8fecdbae1 100644
--- a/tutorials/compress/detection/yolov3_mobilenet.py
+++ b/tutorials/compress/detection/yolov3_mobilenet.py
@@ -1,4 +1,4 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
@@ -28,17 +28,14 @@ def train(model_dir, sensitivities_file, eval_metric_loss):
# 定义训练和验证时的transforms
train_transforms = transforms.Compose([
- transforms.MixupImage(mixup_epoch=250),
- transforms.RandomDistort(),
- transforms.RandomExpand(),
- transforms.RandomCrop(),
- transforms.Resize(target_size=608, interp='RANDOM'),
- transforms.RandomHorizontalFlip(),
- transforms.Normalize()
+ transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
+ transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
+ target_size=608, interp='RANDOM'),
+ transforms.RandomHorizontalFlip(), transforms.Normalize()
])
eval_transforms = transforms.Compose([
- transforms.Resize(target_size=608, interp='CUBIC'),
- transforms.Normalize()
+ transforms.Resize(
+ target_size=608, interp='CUBIC'), transforms.Normalize()
])
# 定义训练和验证所用的数据集
diff --git a/tutorials/compress/segmentation/cal_sensitivities_file.py b/tutorials/compress/segmentation/cal_sensitivities_file.py
index 542488afe902ef02f82cab3ef9b58f9f65dd53ba..c52c0d42032dc2687be6351ab01901afd15d73fb 100644
--- a/tutorials/compress/segmentation/cal_sensitivities_file.py
+++ b/tutorials/compress/segmentation/cal_sensitivities_file.py
@@ -1,4 +1,4 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
diff --git a/tutorials/compress/segmentation/unet.py b/tutorials/compress/segmentation/unet.py
index 7895443d59e483bedd9e5a5cf267d5278c33770f..8a0b013ef72ba51700809a03ad000f5549ddcc5f 100644
--- a/tutorials/compress/segmentation/unet.py
+++ b/tutorials/compress/segmentation/unet.py
@@ -1,4 +1,4 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
@@ -28,15 +28,12 @@ def train(model_dir, sensitivities_file, eval_metric_loss):
# 定义训练和验证时的transforms
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.ResizeRangeScaling(),
- transforms.RandomPaddingCrop(crop_size=512),
- transforms.Normalize()
+ transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByLong(long_size=512),
- transforms.Padding(target_size=512),
- transforms.Normalize()
+ transforms.Padding(target_size=512), transforms.Normalize()
])
# 定义训练和验证所用的数据集
diff --git a/tutorials/train/README.md b/tutorials/train/README.md
index 1900143bceb3435da8ffa04a7fed7b0205e04477..e480378a19c75cb622d149ca1da89be7d85baf84 100644
--- a/tutorials/train/README.md
+++ b/tutorials/train/README.md
@@ -4,15 +4,31 @@
|代码 | 模型任务 | 数据 |
|------|--------|---------|
-|classification/mobilenetv2.py | 图像分类MobileNetV2 | 蔬菜分类 |
-|classification/resnet50.py | 图像分类ResNet50 | 蔬菜分类 |
-|detection/faster_rcnn_r50_fpn.py | 目标检测FasterRCNN | 昆虫检测 |
-|detection/mask_rcnn_f50_fpn.py | 实例分割MaskRCNN | 垃圾分拣 |
-|segmentation/deeplabv3p.py | 语义分割DeepLabV3| 视盘分割 |
-|segmentation/unet.py | 语义分割UNet | 视盘分割 |
+|image_classification/alexnet.py | 图像分类AlexNet | 蔬菜分类 |
+|image_classification/mobilenetv2.py | 图像分类MobileNetV2 | 蔬菜分类 |
+|image_classification/mobilenetv3_small_ssld.py | 图像分类MobileNetV3_small_ssld | 蔬菜分类 |
+|image_classification/resnet50_vd_ssld.py | 图像分类ResNet50_vd_ssld | 蔬菜分类 |
+|image_classification/shufflenetv2.py | 图像分类ShuffleNetV2 | 蔬菜分类 |
+|object_detection/faster_rcnn_hrnet_fpn.py | 目标检测FasterRCNN | 昆虫检测 |
+|object_detection/faster_rcnn_r18_fpn.py | 目标检测FasterRCNN | 昆虫检测 |
+|object_detection/faster_rcnn_r50_fpn.py | 目标检测FasterRCNN | 昆虫检测 |
+|object_detection/ppyolo.py | 目标检测PPYOLO | 昆虫检测 |
+|object_detection/yolov3_darknet53.py | 目标检测YOLOv3 | 昆虫检测 |
+|object_detection/yolov3_mobilenetv1.py | 目标检测YOLOv3 | 昆虫检测 |
+|object_detection/yolov3_mobilenetv3.py | 目标检测YOLOv3 | 昆虫检测 |
+|instance_segmentation/mask_rcnn_hrnet_fpn.py | 实例分割MaskRCNN | 小度熊分拣 |
+|instance_segmentation/mask_rcnn_r18_fpn.py | 实例分割MaskRCNN | 小度熊分拣 |
+|instance_segmentation/mask_rcnn_r50_fpn.py | 实例分割MaskRCNN | 小度熊分拣 |
+|semantic_segmentation/deeplabv3p_mobilenetv2.py | 语义分割DeepLabV3 | 视盘分割 |
+|semantic_segmentation/deeplabv3p_mobilenetv3_large_ssld.py | 语义分割DeepLabV3 | 视盘分割 |
+|semantic_segmentation/deeplabv3p_mobilenetv2_x0.25.py | 语义分割DeepLabV3 | 视盘分割 |
+|semantic_segmentation/deeplabv3p_xception65.py | 语义分割DeepLabV3 | 视盘分割 |
+|semantic_segmentation/fast_scnn.py | 语义分割FastSCNN | 视盘分割 |
+|semantic_segmentation/hrnet.py | 语义分割HRNet | 视盘分割 |
+|semantic_segmentation/unet.py | 语义分割UNet | 视盘分割 |
## 开始训练
在安装PaddleX后,使用如下命令开始训练
```
-python classification/mobilenetv2.py
+python image_classification/mobilenetv2.py
```
diff --git a/tutorials/train/image_classification/README.md b/tutorials/train/image_classification/README.md
index 9d09c6e274e5315d650de7010041414d02da8740..4343d34cd93823b6b2c6d4a6b56446cf428f42f0 100644
--- a/tutorials/train/image_classification/README.md
+++ b/tutorials/train/image_classification/README.md
@@ -17,4 +17,4 @@ python mobilenetv3_small_ssld.py
visualdl --logdir output/mobilenetv3_small_ssld/vdl_log --port 8001
```
-服务启动后,使用浏览器打开 https://0.0.0.0:8001 或 https://localhost:8001
+After the service starts, open http://0.0.0.0:8001 or http://localhost:8001 in your browser
diff --git a/tutorials/train/image_classification/alexnet.py b/tutorials/train/image_classification/alexnet.py
index bec066962abd8955f6021c8d578e6543eefa0a70..7eb76b94697c7a19127abdc9362ff27abf48e36d 100644
--- a/tutorials/train/image_classification/alexnet.py
+++ b/tutorials/train/image_classification/alexnet.py
@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# 定义训练和验证时的transforms
# API说明https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomCrop(crop_size=224),
- transforms.RandomHorizontalFlip(),
+ transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
- transforms.CenterCrop(crop_size=224),
- transforms.Normalize()
+ transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001或https://localhost:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.AlexNet(num_classes=len(train_dataset.labels))
# AlexNet需要指定确定的input_shape
model.fixed_input_shape = [224, 224]
diff --git a/tutorials/train/image_classification/mobilenetv2.py b/tutorials/train/image_classification/mobilenetv2.py
index 7533aab7bc0fc2498d17fd1bd554f595253c05b8..940c3c499a58c3079d8542375cc14c23c46d70ab 100644
--- a/tutorials/train/image_classification/mobilenetv2.py
+++ b/tutorials/train/image_classification/mobilenetv2.py
@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# 定义训练和验证时的transforms
# API说明https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomCrop(crop_size=224),
- transforms.RandomHorizontalFlip(),
+ transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
- transforms.CenterCrop(crop_size=224),
- transforms.Normalize()
+ transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
# API说明:https://paddlex.readthedocs.io/zh_CN/develop/apis/models/classification.html#train
diff --git a/tutorials/train/image_classification/mobilenetv3_small_ssld.py b/tutorials/train/image_classification/mobilenetv3_small_ssld.py
index 8f13312d835b582ec673635f11b4c3fff1c95dda..7c3fb7ffcdc43517de6a7437529d5106c83fb435 100644
--- a/tutorials/train/image_classification/mobilenetv3_small_ssld.py
+++ b/tutorials/train/image_classification/mobilenetv3_small_ssld.py
@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# 定义训练和验证时的transforms
# API说明https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomCrop(crop_size=224),
- transforms.RandomHorizontalFlip(),
+ transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
- transforms.CenterCrop(crop_size=224),
- transforms.Normalize()
+ transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.MobileNetV3_small_ssld(num_classes=len(train_dataset.labels))
# API说明:https://paddlex.readthedocs.io/zh_CN/develop/apis/datasets.html#paddlex-datasets-imagenet
diff --git a/tutorials/train/image_classification/resnet50_vd_ssld.py b/tutorials/train/image_classification/resnet50_vd_ssld.py
index b72ebc52d74f6a0023b830c33f5afc31fb4b7196..547e65fcc922c8576243dffdd07f9bfa65364687 100644
--- a/tutorials/train/image_classification/resnet50_vd_ssld.py
+++ b/tutorials/train/image_classification/resnet50_vd_ssld.py
@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# 定义训练和验证时的transforms
# API说明https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomCrop(crop_size=224),
- transforms.RandomHorizontalFlip(),
+ transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
- transforms.CenterCrop(crop_size=224),
- transforms.Normalize()
+ transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.ResNet50_vd_ssld(num_classes=len(train_dataset.labels))
# API说明:https://paddlex.readthedocs.io/zh_CN/develop/apis/models/classification.html#train
diff --git a/tutorials/train/image_classification/shufflenetv2.py b/tutorials/train/image_classification/shufflenetv2.py
index cdfa1889ba926f4728277929b76536ddaea75c04..23c338b071706ef3a139f4807b3e7d0500e8d1c4 100644
--- a/tutorials/train/image_classification/shufflenetv2.py
+++ b/tutorials/train/image_classification/shufflenetv2.py
@@ -13,14 +13,12 @@ pdx.utils.download_and_decompress(veg_dataset, path='./')
# 定义训练和验证时的transforms
# API说明https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/cls_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomCrop(crop_size=224),
- transforms.RandomHorizontalFlip(),
+ transforms.RandomCrop(crop_size=224), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
- transforms.CenterCrop(crop_size=224),
- transforms.Normalize()
+ transforms.CenterCrop(crop_size=224), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.ImageNet(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
model = pdx.cls.ShuffleNetV2(num_classes=len(train_dataset.labels))
# API说明:https://paddlex.readthedocs.io/zh_CN/develop/apis/models/classification.html#train
diff --git a/tutorials/train/instance_segmentation/mask_rcnn_hrnet_fpn.py b/tutorials/train/instance_segmentation/mask_rcnn_hrnet_fpn.py
index f78446546cd793f96cb074f0a1701a718f7d84b4..6450d6fd1efa4e71049ccf04e88e5a45b0e8a0b3 100644
--- a/tutorials/train/instance_segmentation/mask_rcnn_hrnet_fpn.py
+++ b/tutorials/train/instance_segmentation/mask_rcnn_hrnet_fpn.py
@@ -13,15 +13,15 @@ pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.RandomHorizontalFlip(), transforms.Normalize(),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32),
])
@@ -38,10 +38,7 @@ eval_dataset = pdx.datasets.CocoDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1
num_classes = len(train_dataset.labels) + 1
diff --git a/tutorials/train/instance_segmentation/mask_rcnn_r18_fpn.py b/tutorials/train/instance_segmentation/mask_rcnn_r18_fpn.py
index dc16b66b3941e0d639fd45dbaa691ec51bc5cfbd..d4f9bd640e50329457908a5be7d40529785be7e5 100644
--- a/tutorials/train/instance_segmentation/mask_rcnn_r18_fpn.py
+++ b/tutorials/train/instance_segmentation/mask_rcnn_r18_fpn.py
@@ -13,16 +13,14 @@ pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.RandomHorizontalFlip(), transforms.Normalize(),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.Normalize(), transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
# 定义训练和验证所用的数据集
@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.CocoDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1
num_classes = len(train_dataset.labels) + 1
diff --git a/tutorials/train/instance_segmentation/mask_rcnn_r50_fpn.py b/tutorials/train/instance_segmentation/mask_rcnn_r50_fpn.py
index e87c88e5d8feba36df1bd65430058a4f413ba73c..9a93ec35c0178693dbbde5dc564246e443f55fb3 100644
--- a/tutorials/train/instance_segmentation/mask_rcnn_r50_fpn.py
+++ b/tutorials/train/instance_segmentation/mask_rcnn_r50_fpn.py
@@ -13,16 +13,14 @@ pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.RandomHorizontalFlip(), transforms.Normalize(),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.Normalize(), transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
# 定义训练和验证所用的数据集
@@ -38,10 +36,7 @@ eval_dataset = pdx.datasets.CocoDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1
num_classes = len(train_dataset.labels) + 1
diff --git a/tutorials/train/object_detection/faster_rcnn_hrnet_fpn.py b/tutorials/train/object_detection/faster_rcnn_hrnet_fpn.py
index e46d3ae56b57aa90cdcecdcce3ad3ee1ad67d098..c948d16b40d14ab723cd3b8fa0dce472c3f49118 100644
--- a/tutorials/train/object_detection/faster_rcnn_hrnet_fpn.py
+++ b/tutorials/train/object_detection/faster_rcnn_hrnet_fpn.py
@@ -13,16 +13,14 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.RandomHorizontalFlip(), transforms.Normalize(),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.Normalize(), transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
# 定义训练和验证所用的数据集
@@ -40,10 +38,7 @@ eval_dataset = pdx.datasets.VOCDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1
num_classes = len(train_dataset.labels) + 1
diff --git a/tutorials/train/object_detection/faster_rcnn_r18_fpn.py b/tutorials/train/object_detection/faster_rcnn_r18_fpn.py
index 0ae82d3ec8166159649f09d33b3f2ad094c3c6ee..46679f22018b330b3e44eb668ee4c890a7af13fb 100644
--- a/tutorials/train/object_detection/faster_rcnn_r18_fpn.py
+++ b/tutorials/train/object_detection/faster_rcnn_r18_fpn.py
@@ -13,15 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.RandomHorizontalFlip(), transforms.Normalize(),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32),
])
@@ -40,10 +40,7 @@ eval_dataset = pdx.datasets.VOCDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1
num_classes = len(train_dataset.labels) + 1
diff --git a/tutorials/train/object_detection/faster_rcnn_r50_fpn.py b/tutorials/train/object_detection/faster_rcnn_r50_fpn.py
index 0f26bfa9a5c571419c5b4b2f6e553f383d011399..fde705bfbb0b1732a4146222851b790098619fcf 100644
--- a/tutorials/train/object_detection/faster_rcnn_r50_fpn.py
+++ b/tutorials/train/object_detection/faster_rcnn_r50_fpn.py
@@ -13,15 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
- transforms.Padding(coarsest_stride=32)
+ transforms.RandomHorizontalFlip(), transforms.Normalize(),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333), transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
transforms.Normalize(),
- transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.ResizeByShort(
+ short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32),
])
@@ -40,10 +40,7 @@ eval_dataset = pdx.datasets.VOCDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1
num_classes = len(train_dataset.labels) + 1
diff --git a/tutorials/train/object_detection/ppyolo.py b/tutorials/train/object_detection/ppyolo.py
new file mode 100644
index 0000000000000000000000000000000000000000..63b47a95671692e89761251e9a1059cac9b542eb
--- /dev/null
+++ b/tutorials/train/object_detection/ppyolo.py
@@ -0,0 +1,58 @@
+# Environment variable configuration, controls whether the GPU is used
+# Docs: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html#gpu
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.det import transforms
+import paddlex as pdx
+
+# Download and extract the insect detection dataset
+insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
+pdx.utils.download_and_decompress(insect_dataset, path='./')
+
+# Define the transforms used for training and evaluation
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
+train_transforms = transforms.Compose([
+ transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
+ transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
+ target_size=608, interp='RANDOM'), transforms.RandomHorizontalFlip(),
+ transforms.Normalize()
+])
+
+eval_transforms = transforms.Compose([
+ transforms.Resize(
+ target_size=608, interp='CUBIC'), transforms.Normalize()
+])
+
+# Define the training and evaluation datasets
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/datasets.html#paddlex-datasets-vocdetection
+train_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/train_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/val_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
+num_classes = len(train_dataset.labels)
+
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
+model = pdx.det.PPYOLO(num_classes=num_classes)
+
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#train
+# Parameter descriptions and tuning guide: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
+model.train(
+ num_epochs=270,
+ train_dataset=train_dataset,
+ train_batch_size=8,
+ eval_dataset=eval_dataset,
+ learning_rate=0.000125,
+ lr_decay_epochs=[210, 240],
+ save_dir='output/ppyolo',
+ use_vdl=True)
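+
+# After training, an inference sketch (hypothetical image path):
+#   model = pdx.load_model('output/ppyolo/best_model')
+#   result = model.predict('insect_det/JPEGImages/xxx.jpeg')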
diff --git a/tutorials/train/object_detection/yolov3_darknet53.py b/tutorials/train/object_detection/yolov3_darknet53.py
index 085be4bf7ffa3f9eca31f3b2807d83f00544b455..7e5b0b07dbdddf7859528556819700d785ad2845 100644
--- a/tutorials/train/object_detection/yolov3_darknet53.py
+++ b/tutorials/train/object_detection/yolov3_darknet53.py
@@ -13,18 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.MixupImage(mixup_epoch=250),
- transforms.RandomDistort(),
- transforms.RandomExpand(),
- transforms.RandomCrop(),
- transforms.Resize(target_size=608, interp='RANDOM'),
- transforms.RandomHorizontalFlip(),
+ transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
+ transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
+ target_size=608, interp='RANDOM'), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
- transforms.Resize(target_size=608, interp='CUBIC'),
- transforms.Normalize()
+ transforms.Resize(
+ target_size=608, interp='CUBIC'), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -42,10 +39,7 @@ eval_dataset = pdx.datasets.VOCDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/yolov3_darknet/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API说明: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
diff --git a/tutorials/train/object_detection/yolov3_mobilenetv1.py b/tutorials/train/object_detection/yolov3_mobilenetv1.py
index bfc2bea0716c1bc0b7c27cb8014d6215eed8306c..e565ce0714b67669afcbeb827c45cee9d38370b4 100644
--- a/tutorials/train/object_detection/yolov3_mobilenetv1.py
+++ b/tutorials/train/object_detection/yolov3_mobilenetv1.py
@@ -17,13 +17,15 @@ train_transforms = transforms.Compose([
transforms.RandomDistort(),
transforms.RandomExpand(),
transforms.RandomCrop(),
- transforms.Resize(target_size=608, interp='RANDOM'),
+ transforms.Resize(
+ target_size=608, interp='RANDOM'),
transforms.RandomHorizontalFlip(),
transforms.Normalize(),
])
eval_transforms = transforms.Compose([
- transforms.Resize(target_size=608, interp='CUBIC'),
+ transforms.Resize(
+ target_size=608, interp='CUBIC'),
transforms.Normalize(),
])
@@ -42,10 +44,7 @@ eval_dataset = pdx.datasets.VOCDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/yolov3_darknet/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API说明: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
diff --git a/tutorials/train/object_detection/yolov3_mobilenetv3.py b/tutorials/train/object_detection/yolov3_mobilenetv3.py
index 85570781851665a9ab28a718ecf85a0b078508a3..a80f34899ca1e8b6fb42a790b4782543880ae992 100644
--- a/tutorials/train/object_detection/yolov3_mobilenetv3.py
+++ b/tutorials/train/object_detection/yolov3_mobilenetv3.py
@@ -13,18 +13,15 @@ pdx.utils.download_and_decompress(insect_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/det_transforms.html
train_transforms = transforms.Compose([
- transforms.MixupImage(mixup_epoch=250),
- transforms.RandomDistort(),
- transforms.RandomExpand(),
- transforms.RandomCrop(),
- transforms.Resize(target_size=608, interp='RANDOM'),
- transforms.RandomHorizontalFlip(),
+ transforms.MixupImage(mixup_epoch=250), transforms.RandomDistort(),
+ transforms.RandomExpand(), transforms.RandomCrop(), transforms.Resize(
+ target_size=608, interp='RANDOM'), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
- transforms.Resize(target_size=608, interp='CUBIC'),
- transforms.Normalize()
+ transforms.Resize(
+ target_size=608, interp='CUBIC'), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -42,10 +39,7 @@ eval_dataset = pdx.datasets.VOCDetection(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/yolov3_darknet/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API说明: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/detection.html#paddlex-det-yolov3
diff --git a/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2.py b/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2.py
index fc5b738a0641604f28fd83a47b795313c13bcd39..ea7891ac8d607d3954cbf39614da13d17137dabe 100644
--- a/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2.py
+++ b/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv2.py
@@ -13,16 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# 定义训练和验证时的transforms
# API说明 https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.ResizeRangeScaling(),
- transforms.RandomPaddingCrop(crop_size=512),
- transforms.Normalize()
+ transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
- transforms.ResizeByLong(long_size=512),
- transforms.Padding(target_size=512),
- transforms.Normalize()
+ transforms.ResizeByLong(long_size=512),
+ transforms.Padding(target_size=512), transforms.Normalize()
])
# 定义训练和验证所用的数据集
@@ -40,15 +37,12 @@ eval_dataset = pdx.datasets.SegDataset(
transforms=eval_transforms)
# 初始化模型,并进行训练
-# 可使用VisualDL查看训练指标
-# VisualDL启动方式: visualdl --logdir output/deeplab/vdl_log --port 8001
-# 浏览器打开 https://0.0.0.0:8001即可
-# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API说明:https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-deeplabv3p
-model = pdx.seg.DeepLabv3p(num_classes=num_classes, backbone='MobileNetV2_x1.0')
-
+model = pdx.seg.DeepLabv3p(
+ num_classes=num_classes, backbone='MobileNetV2_x1.0')
# API说明:https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#train
# 各参数介绍与调整说明:https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
diff --git a/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv3_large_ssld.py b/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv3_large_ssld.py
new file mode 100644
index 0000000000000000000000000000000000000000..9be782cde1394115feea973eea483b4bc2b24ea0
--- /dev/null
+++ b/tutorials/train/semantic_segmentation/deeplabv3p_mobilenetv3_large_ssld.py
@@ -0,0 +1,58 @@
+# Environment variable configuration, used to control whether the GPU is used
+# Documentation: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html#gpu
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+import paddlex as pdx
+from paddlex.seg import transforms
+
+# Download and decompress the optic disc segmentation dataset
+optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
+pdx.utils.download_and_decompress(optic_dataset, path='./')
+
+# Define the transforms used during training and validation
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
+])
+
+eval_transforms = transforms.Compose([
+ transforms.ResizeByLong(long_size=512),
+ transforms.Padding(target_size=512), transforms.Normalize()
+])
+
+# Define the datasets used for training and validation
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/datasets.html#paddlex-datasets-segdataset
+train_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/train_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/val_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
+num_classes = len(train_dataset.labels)
+
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-deeplabv3p
+model = pdx.seg.DeepLabv3p(
+ num_classes=num_classes,
+ backbone='MobileNetV3_large_x1_0_ssld',
+ pooling_crop_size=(512, 512))
+
+# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#train
+# Parameter descriptions and tuning guide: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
+model.train(
+ num_epochs=40,
+ train_dataset=train_dataset,
+ train_batch_size=4,
+ eval_dataset=eval_dataset,
+ learning_rate=0.01,
+ save_dir='output/deeplabv3p_mobilenetv3_large_ssld',
+ use_vdl=True)
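The new tutorial ends after train(); a short inference sketch follows for completeness. It assumes the PaddleX 1.x load_model/predict/visualize API, and the image filename is a placeholder:

    import paddlex as pdx
    # Load the best checkpoint written under the save_dir used above
    model = pdx.load_model('output/deeplabv3p_mobilenetv3_large_ssld/best_model')
    # Single-image prediction; 'optic_disc.jpg' is a placeholder path
    result = model.predict('optic_disc.jpg')
    # Blend the predicted mask with the input image and save the result
    pdx.seg.visualize('optic_disc.jpg', result, weight=0.4, save_dir='./output')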
diff --git a/tutorials/train/semantic_segmentation/fast_scnn.py b/tutorials/train/semantic_segmentation/fast_scnn.py
index 38fa51a7ab6242795dfd16c322d004b733e62a74..bb1de91df483e7f13da1681f21b8c468c9a09244 100644
--- a/tutorials/train/semantic_segmentation/fast_scnn.py
+++ b/tutorials/train/semantic_segmentation/fast_scnn.py
@@ -13,16 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used during training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.ResizeRangeScaling(),
- transforms.RandomPaddingCrop(crop_size=512),
- transforms.Normalize()
+ transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
- transforms.ResizeByLong(long_size=512),
- transforms.Padding(target_size=512),
- transforms.Normalize()
+ transforms.ResizeByLong(long_size=512),
+ transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
@@ -40,13 +37,8 @@ eval_dataset = pdx.datasets.SegDataset(
transforms=eval_transforms)
# Initialize the model and start training
-# Training metrics can be viewed with VisualDL
-# Launch VisualDL with: visualdl --logdir output/unet/vdl_log --port 8001
-# Then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 is for local access; for a remote server, replace it with that machine's IP
-
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
-
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-fastscnn
model = pdx.seg.FastSCNN(num_classes=num_classes)
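For deployment, a trained PaddleX model is normally exported to an inference model first. A sketch of the command-line export, assuming the --export_inference flag from the PaddleX 1.x deployment docs; both paths below are placeholders:

    paddlex --export_inference --model_dir=output/fast_scnn/best_model --save_dir=inference_model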
diff --git a/tutorials/train/semantic_segmentation/hrnet.py b/tutorials/train/semantic_segmentation/hrnet.py
index 9526e99b352eee73ca3ee4d308ec9fe36250f7d1..91514ea0218dfd7830bdce75ab2987509b62b0ce 100644
--- a/tutorials/train/semantic_segmentation/hrnet.py
+++ b/tutorials/train/semantic_segmentation/hrnet.py
@@ -13,16 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used during training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.ResizeRangeScaling(),
- transforms.RandomPaddingCrop(crop_size=512),
- transforms.Normalize()
+ transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
- transforms.ResizeByLong(long_size=512),
- transforms.Padding(target_size=512),
- transforms.Normalize()
+ transforms.ResizeByLong(long_size=512),
+ transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
@@ -40,10 +37,7 @@ eval_dataset = pdx.datasets.SegDataset(
transforms=eval_transforms)
# Initialize the model and start training
-# Training metrics can be viewed with VisualDL
-# Launch VisualDL with: visualdl --logdir output/unet/vdl_log --port 8001
-# Then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 is for local access; for a remote server, replace it with that machine's IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-hrnet
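Long segmentation runs are sometimes interrupted. A minimal sketch of resuming from a saved epoch, assuming the resume_checkpoint parameter of the PaddleX 1.x train() API; the epoch path and save_dir are assumptions, not taken from this patch:

    # Resume HRNet training from a previously saved epoch directory
    model = pdx.seg.HRNet(num_classes=num_classes)
    model.train(
        num_epochs=20,
        train_dataset=train_dataset,
        train_batch_size=4,
        eval_dataset=eval_dataset,
        save_dir='output/hrnet',
        resume_checkpoint='output/hrnet/epoch_10',  # placeholder checkpoint
        use_vdl=True)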
diff --git a/tutorials/train/semantic_segmentation/unet.py b/tutorials/train/semantic_segmentation/unet.py
index c0ba72666d4b386667cc747077916eaf251675a9..81d346988cf634c2e07e981f48d2b610bf44d81d 100644
--- a/tutorials/train/semantic_segmentation/unet.py
+++ b/tutorials/train/semantic_segmentation/unet.py
@@ -13,15 +13,13 @@ pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms used during training and validation
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/transforms/seg_transforms.html
train_transforms = transforms.Compose([
- transforms.RandomHorizontalFlip(),
- transforms.ResizeRangeScaling(),
- transforms.RandomPaddingCrop(crop_size=512),
- transforms.Normalize()
+ transforms.RandomHorizontalFlip(), transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512), transforms.Normalize()
])
eval_transforms = transforms.Compose([
- transforms.ResizeByLong(long_size=512), transforms.Padding(target_size=512),
- transforms.Normalize()
+ transforms.ResizeByLong(long_size=512),
+ transforms.Padding(target_size=512), transforms.Normalize()
])
# Define the datasets used for training and validation
@@ -39,10 +37,7 @@ eval_dataset = pdx.datasets.SegDataset(
transforms=eval_transforms)
# Initialize the model and start training
-# Training metrics can be viewed with VisualDL
-# Launch VisualDL with: visualdl --logdir output/unet/vdl_log --port 8001
-# Then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 is for local access; for a remote server, replace it with that machine's IP
+# Training metrics can be viewed with VisualDL; see https://paddlex.readthedocs.io/zh_CN/develop/train/visualdl.html
num_classes = len(train_dataset.labels)
# API reference: https://paddlex.readthedocs.io/zh_CN/develop/apis/models/semantic_segmentation.html#paddlex-seg-deeplabv3p