diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..c796dfd89f473684cce1584ebb28f2850063a78f Binary files /dev/null and b/.DS_Store differ diff --git a/.github/ISSUE_TEMPLATE/1_data.md b/.github/ISSUE_TEMPLATE/1_data.md index 05627aa353d1cf06074445d2bb5344d94727fedf..674da8e938470dd8da8a5d7c6cbf946c58e3eca4 100644 --- a/.github/ISSUE_TEMPLATE/1_data.md +++ b/.github/ISSUE_TEMPLATE/1_data.md @@ -2,5 +2,4 @@ name: 1. 数据类问题 about: 数据标注、格式转换等问题 --- - -说明数据类型(图像分类、目标检测、实例分割或语义分割) +数据类型:请说明你的数据类型,如图像分类、目标检测、实例分割或语义分割 diff --git a/.github/ISSUE_TEMPLATE/2_train.md b/.github/ISSUE_TEMPLATE/2_train.md index 489159731bfef42773dffa15cd30582d5c53f992..51edf12bdacd66cea3347cb174c08ab2aee56a33 100644 --- a/.github/ISSUE_TEMPLATE/2_train.md +++ b/.github/ISSUE_TEMPLATE/2_train.md @@ -3,4 +3,8 @@ name: 2. 模型训练 about: 模型训练中的问题 --- -如模型训练出错,建议贴上模型训练代码,以便开发人员分析,并快速响应 +问题类型:模型训练 +**问题描述** + +==================== +请在这里描述您在使用过程中的问题,如模型训练出错,建议贴上模型训练代码,以便开发人员分析,并快速响应 diff --git a/.github/ISSUE_TEMPLATE/3_deploy.md b/.github/ISSUE_TEMPLATE/3_deploy.md index d012d10125c957e702f3877dc087b7331baceb0a..fc74abd33050ba2ee9b27a09ec0f5bb638ebc139 100644 --- a/.github/ISSUE_TEMPLATE/3_deploy.md +++ b/.github/ISSUE_TEMPLATE/3_deploy.md @@ -3,4 +3,9 @@ name: 3. 模型部署 about: 模型部署相关问题,包括C++、Python、Paddle Lite等 --- -说明您的部署环境,部署需求,模型类型和应用场景等,便于开发人员快速响应。 +问题类型:模型部署 +**问题描述** + +======================== + +请在这里描述您在使用过程中的问题,说明您的部署环境,部署需求,模型类型和应用场景等,便于开发人员快速响应。 diff --git a/.github/ISSUE_TEMPLATE/4_gui.md b/.github/ISSUE_TEMPLATE/4_gui.md index 780c8b903b9137f72037e311213443c8678f61d9..7ba7fe77ff08cd18d4ccd81c686a2c03968b40f6 100644 --- a/.github/ISSUE_TEMPLATE/4_gui.md +++ b/.github/ISSUE_TEMPLATE/4_gui.md @@ -2,5 +2,8 @@ name: 4. PaddleX GUI使用问题 about: Paddle GUI客户端使用问题 --- +问题类型:PaddleX GUI +**问题描述** -PaddleX GUI: https://www.paddlepaddle.org.cn/paddle/paddleX (请在ISSUE内容中保留此行内容) +=================================== +请在这里描述您在使用GUI过程中的问题 diff --git a/.github/ISSUE_TEMPLATE/5_other.md b/.github/ISSUE_TEMPLATE/5_other.md index 8ddfe49b544621918355f5c114c1124bdecc8ef3..f347d4f5cdc87b51ec37742a28a836eb8943bd71 100644 --- a/.github/ISSUE_TEMPLATE/5_other.md +++ b/.github/ISSUE_TEMPLATE/5_other.md @@ -2,3 +2,10 @@ name: 5. 其它类型问题 about: 所有问题都可以在这里提 --- + +问题类型:其它 +**问题描述** + +======================== + +请在这里描述您的问题 diff --git a/README.md b/README.md index add63566f2632a0e535504a94da0605ce0618bc7..22392ad80dc3950a6d815cf8cc176b2e2f13e901 100644 --- a/README.md +++ b/README.md @@ -14,10 +14,13 @@ ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg) ![QQGroup](https://img.shields.io/badge/QQ_Group-1045148026-52B6EF?style=social&logo=tencent-qq&logoColor=000&logoWidth=20) +[完整PaddleX在线使用文档目录](https://paddlex.readthedocs.io/zh_CN/develop/index.html) + 集成飞桨智能视觉领域**图像分类**、**目标检测**、**语义分割**、**实例分割**任务能力,将深度学习开发全流程从**数据准备**、**模型训练与优化**到**多端部署**端到端打通,并提供**统一任务API接口**及**图形化开发界面Demo**。开发者无需分别安装不同套件,以**低代码**的形式即可快速完成飞桨全流程开发。 **PaddleX** 经过**质检**、**安防**、**巡检**、**遥感**、**零售**、**医疗**等十多个行业实际应用场景验证,沉淀产业实际经验,**并提供丰富的案例实践教程**,全程助力开发者产业实践落地。 +![](./docs/gui/images/paddlexoverview.png) ## 安装 @@ -29,7 +32,7 @@ 通过简洁易懂的Python API,在兼顾功能全面性、开发灵活性、集成方便性的基础上,给开发者最流畅的深度学习开发体验。
**前置依赖** -> - paddlepaddle >= 1.8.0 +> - paddlepaddle >= 1.8.4 > - python >= 3.6 > - cython > - pycocotools @@ -44,10 +47,11 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple 无代码开发的可视化客户端,应用Paddle API实现,使开发者快速进行产业项目验证,并为用户开发自有深度学习软件/应用提供参照。 -- 前往[PaddleX官网](https://www.paddlepaddle.org.cn/paddle/paddlex),申请下载Paddle X GUI一键绿色安装包。 +- 前往[PaddleX官网](https://www.paddlepaddle.org.cn/paddle/paddlex),申请下载PaddleX GUI一键绿色安装包。 - 前往[PaddleX GUI使用教程](./docs/gui/how_to_use.md)了解PaddleX GUI使用详情。 +- [PaddleX GUI安装环境说明](./docs/gui/download.md) ## 产品模块说明 @@ -104,15 +108,15 @@ pip install paddlex -i https://mirror.baidu.com/pypi/simple ## 交流与反馈 - 项目官网:https://www.paddlepaddle.org.cn/paddle/paddlex -- PaddleX用户交流群:1045148026 (手机QQ扫描如下二维码快速加入) - ![](./docs/gui/images/QR.jpg) +- PaddleX用户交流群:957286141 (手机QQ扫描如下二维码快速加入) + ![](./docs/gui/images/QR2.jpg) ## 更新日志 > [历史版本及更新内容](https://paddlex.readthedocs.io/zh_CN/develop/change_log.html) - +- 2020.09.05 v1.2.0 - 2020.07.13 v1.1.0 - 2020.07.12 v1.0.8 - 2020.05.20 v1.0.0 diff --git a/deploy/README.md b/deploy/README.md index 7fe3219882c3c8d863824829baf6742b74759d2f..15fbe898d3a4ebbf488b5c0fc1f665bf847f3aa9 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -14,3 +14,5 @@ - [模型量化](../docs/deploy/paddlelite/slim/quant.md) - [模型裁剪](../docs/deploy/paddlelite/slim/prune.md) - [Android平台](../docs/deploy/paddlelite/android.md) +- [OpenVINO部署](../docs/deploy/openvino/introduction.md) +- [树莓派部署](../docs/deploy/raspberry/Raspberry.md) \ No newline at end of file diff --git a/deploy/cpp/CMakeLists.txt b/deploy/cpp/CMakeLists.txt index 349afa2cae5bf40721cafdf38bbf28ddd621beeb..a54979683cd14d2a352cb789b9d6dc7bd26d0a46 100644 --- a/deploy/cpp/CMakeLists.txt +++ b/deploy/cpp/CMakeLists.txt @@ -320,46 +320,34 @@ target_link_libraries(video_segmenter ${DEPS}) if (WIN32 AND WITH_MKL) add_custom_command(TARGET classifier POST_BUILD - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/mkldnn.dll ) add_custom_command(TARGET detector POST_BUILD - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different 
${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/mkldnn.dll ) add_custom_command(TARGET segmenter POST_BUILD - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/mkldnn.dll ) add_custom_command(TARGET video_classifier POST_BUILD - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/mkldnn.dll ) add_custom_command(TARGET video_detector POST_BUILD - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E 
copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/mkldnn.dll ) add_custom_command(TARGET video_segmenter POST_BUILD - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/Release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/Release/libiomp5md.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/Release/mkldnn.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll - COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./paddlex_inference/mklml.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./paddlex_inference/libiomp5md.dll + COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./paddlex_inference/mkldnn.dll ) # for encryption if (EXISTS "${ENCRYPTION_DIR}/lib/pmodel-decrypt.dll") diff --git a/deploy/cpp/demo/classifier.cpp b/deploy/cpp/demo/classifier.cpp index cf3bb5ccf64c43ec42d59a9b73fdced6b50b8dc5..548eaff411a737ea0ffcfca63d36a7f18cd9d994 100644 --- a/deploy/cpp/demo/classifier.cpp +++ b/deploy/cpp/demo/classifier.cpp @@ -29,6 +29,10 @@ using namespace std::chrono; // NOLINT DEFINE_string(model_dir, "", "Path of inference model"); DEFINE_bool(use_gpu, false, "Infering with GPU or CPU"); DEFINE_bool(use_trt, false, "Infering with TensorRT"); +DEFINE_bool(use_mkl, true, "Infering with MKL"); +DEFINE_int32(mkl_thread_num, + omp_get_num_procs(), + "Number of mkl threads"); DEFINE_int32(gpu_id, 0, "GPU card id"); DEFINE_string(key, "", "key of encryption"); DEFINE_string(image, "", "Path of test image file"); @@ -56,6 +60,8 @@ int main(int argc, char** argv) { model.Init(FLAGS_model_dir, FLAGS_use_gpu, FLAGS_use_trt, + FLAGS_use_mkl, + FLAGS_mkl_thread_num, FLAGS_gpu_id, FLAGS_key); diff --git a/deploy/cpp/demo/detector.cpp b/deploy/cpp/demo/detector.cpp index ef7fd782715bef5d9cc1dae43c87ceaa123e914f..f5fefc05d0bbc4bbd482c23f0db8c066b7d1013b 100644 --- a/deploy/cpp/demo/detector.cpp +++ b/deploy/cpp/demo/detector.cpp @@ -31,6 +31,10 @@ using namespace std::chrono; // NOLINT DEFINE_string(model_dir, "", "Path of inference model"); DEFINE_bool(use_gpu, false, "Infering with GPU or CPU"); DEFINE_bool(use_trt, false, "Infering with TensorRT"); +DEFINE_bool(use_mkl, true, "Infering with MKL"); +DEFINE_int32(mkl_thread_num, + omp_get_num_procs(), + "Number of mkl threads"); DEFINE_int32(gpu_id, 0, "GPU card id"); DEFINE_string(key, "", "key of encryption"); DEFINE_string(image, "", "Path of test image file"); @@ -61,6 +65,8 @@ int main(int argc, char** argv) { model.Init(FLAGS_model_dir, FLAGS_use_gpu, FLAGS_use_trt, + FLAGS_use_mkl, + FLAGS_mkl_thread_num, FLAGS_gpu_id,
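// editorial aside (not part of the patch): every demo updated here passes the
// two new arguments in the same fixed position -- use_gpu, use_trt, use_mkl,
// mkl_thread_num, gpu_id, key -- so out-of-tree callers of Model::Init must
// switch to this order (the matching default parameters live in paddlex.h below).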
FLAGS_key); int imgs = 1; diff --git a/deploy/cpp/demo/segmenter.cpp b/deploy/cpp/demo/segmenter.cpp index d13a328f5beecc90fe9257a4f32ee63a8fe609a5..0d888001490759f65790d51837e2e69a6f448c4b 100644 --- a/deploy/cpp/demo/segmenter.cpp +++ b/deploy/cpp/demo/segmenter.cpp @@ -30,6 +30,10 @@ using namespace std::chrono; // NOLINT DEFINE_string(model_dir, "", "Path of inference model"); DEFINE_bool(use_gpu, false, "Infering with GPU or CPU"); DEFINE_bool(use_trt, false, "Infering with TensorRT"); +DEFINE_bool(use_mkl, true, "Infering with MKL"); +DEFINE_int32(mkl_thread_num, + omp_get_num_procs(), + "Number of mkl threads"); DEFINE_int32(gpu_id, 0, "GPU card id"); DEFINE_string(key, "", "key of encryption"); DEFINE_string(image, "", "Path of test image file"); @@ -58,6 +62,8 @@ int main(int argc, char** argv) { model.Init(FLAGS_model_dir, FLAGS_use_gpu, FLAGS_use_trt, + FLAGS_use_mkl, + FLAGS_mkl_thread_num, FLAGS_gpu_id, FLAGS_key); int imgs = 1; diff --git a/deploy/cpp/demo/video_classifier.cpp b/deploy/cpp/demo/video_classifier.cpp index 96be867d40800455184b7938dc829e8a0b8f8390..c0485791ccb42fc880ab384ae2cf5e1d9d48b1ae 100644 --- a/deploy/cpp/demo/video_classifier.cpp +++ b/deploy/cpp/demo/video_classifier.cpp @@ -35,8 +35,12 @@ using namespace std::chrono; // NOLINT DEFINE_string(model_dir, "", "Path of inference model"); DEFINE_bool(use_gpu, false, "Infering with GPU or CPU"); DEFINE_bool(use_trt, false, "Infering with TensorRT"); +DEFINE_bool(use_mkl, true, "Infering with MKL"); DEFINE_int32(gpu_id, 0, "GPU card id"); DEFINE_string(key, "", "key of encryption"); +DEFINE_int32(mkl_thread_num, + omp_get_num_procs(), + "Number of mkl threads"); DEFINE_bool(use_camera, false, "Infering with Camera"); DEFINE_int32(camera_id, 0, "Camera id"); DEFINE_string(video_path, "", "Path of input video"); @@ -62,6 +66,8 @@ int main(int argc, char** argv) { model.Init(FLAGS_model_dir, FLAGS_use_gpu, FLAGS_use_trt, + FLAGS_use_mkl, + FLAGS_mkl_thread_num, FLAGS_gpu_id, FLAGS_key); diff --git a/deploy/cpp/demo/video_detector.cpp b/deploy/cpp/demo/video_detector.cpp index ee4d5bdb138d03020042e60d41ded0ca1efde46d..e617dbd1339b73676225a65a667a42a06abfa63e 100644 --- a/deploy/cpp/demo/video_detector.cpp +++ b/deploy/cpp/demo/video_detector.cpp @@ -35,6 +35,7 @@ using namespace std::chrono; // NOLINT DEFINE_string(model_dir, "", "Path of inference model"); DEFINE_bool(use_gpu, false, "Infering with GPU or CPU"); DEFINE_bool(use_trt, false, "Infering with TensorRT"); +DEFINE_bool(use_mkl, true, "Infering with MKL"); DEFINE_int32(gpu_id, 0, "GPU card id"); DEFINE_bool(use_camera, false, "Infering with Camera"); DEFINE_int32(camera_id, 0, "Camera id"); @@ -42,6 +43,9 @@ DEFINE_string(video_path, "", "Path of input video"); DEFINE_bool(show_result, false, "show the result of each frame with a window"); DEFINE_bool(save_result, true, "save the result of each frame to a video"); DEFINE_string(key, "", "key of encryption"); +DEFINE_int32(mkl_thread_num, + omp_get_num_procs(), + "Number of mkl threads"); DEFINE_string(save_dir, "output", "Path to save visualized image"); DEFINE_double(threshold, 0.5, @@ -64,6 +68,8 @@ int main(int argc, char** argv) { model.Init(FLAGS_model_dir, FLAGS_use_gpu, FLAGS_use_trt, + FLAGS_use_mkl, + FLAGS_mkl_thread_num, FLAGS_gpu_id, FLAGS_key); // Open video diff --git a/deploy/cpp/demo/video_segmenter.cpp b/deploy/cpp/demo/video_segmenter.cpp index 6a835117cd1434b5f26e0fb660e6fe07ef56e607..35af64f4b00ea5983653bb135394da9389539604 100644 --- a/deploy/cpp/demo/video_segmenter.cpp +++ 
b/deploy/cpp/demo/video_segmenter.cpp @@ -35,8 +35,12 @@ using namespace std::chrono; // NOLINT DEFINE_string(model_dir, "", "Path of inference model"); DEFINE_bool(use_gpu, false, "Infering with GPU or CPU"); DEFINE_bool(use_trt, false, "Infering with TensorRT"); +DEFINE_bool(use_mkl, true, "Infering with MKL"); DEFINE_int32(gpu_id, 0, "GPU card id"); DEFINE_string(key, "", "key of encryption"); +DEFINE_int32(mkl_thread_num, + omp_get_num_procs(), + "Number of mkl threads"); DEFINE_bool(use_camera, false, "Infering with Camera"); DEFINE_int32(camera_id, 0, "Camera id"); DEFINE_string(video_path, "", "Path of input video"); @@ -62,6 +66,8 @@ int main(int argc, char** argv) { model.Init(FLAGS_model_dir, FLAGS_use_gpu, FLAGS_use_trt, + FLAGS_use_mkl, + FLAGS_mkl_thread_num, FLAGS_gpu_id, FLAGS_key); // Open video diff --git a/deploy/cpp/include/paddlex/paddlex.h b/deploy/cpp/include/paddlex/paddlex.h index 00b1a05ac8127d403dd7325f3357ece75ec23a58..327058e4bd3251f41be82309f154b41eae11027c 100644 --- a/deploy/cpp/include/paddlex/paddlex.h +++ b/deploy/cpp/include/paddlex/paddlex.h @@ -70,6 +70,8 @@ class Model { * @param model_dir: the directory which contains model.yml * @param use_gpu: use gpu or not when infering * @param use_trt: use Tensor RT or not when infering + * @param use_mkl: use mkl or not when infering + * @param mkl_thread_num: number of threads for mkldnn when infering * @param gpu_id: the id of gpu when infering with using gpu * @param key: the key of encryption when using encrypted model * @param use_ir_optim: use ir optimization when infering @@ -77,15 +79,26 @@ class Model { void Init(const std::string& model_dir, bool use_gpu = false, bool use_trt = false, + bool use_mkl = true, + int mkl_thread_num = 4, int gpu_id = 0, std::string key = "", bool use_ir_optim = true) { - create_predictor(model_dir, use_gpu, use_trt, gpu_id, key, use_ir_optim); + create_predictor( + model_dir, + use_gpu, + use_trt, + use_mkl, + mkl_thread_num, + gpu_id, + key, + use_ir_optim); } - void create_predictor(const std::string& model_dir, bool use_gpu = false, bool use_trt = false, + bool use_mkl = true, + int mkl_thread_num = 4, int gpu_id = 0, std::string key = "", bool use_ir_optim = true); @@ -219,5 +232,7 @@ class Model { std::vector outputs_; // a predictor which run the model predicting std::unique_ptr predictor_; + // input channel + int input_channel_; }; } // namespace PaddleX diff --git a/deploy/cpp/include/paddlex/results.h b/deploy/cpp/include/paddlex/results.h index 72caa1f5d4f78275ca9c4de55aa89bc22edd02e5..e3526bf69b854d19a99cc001df226c5d51c7094d 100644 --- a/deploy/cpp/include/paddlex/results.h +++ b/deploy/cpp/include/paddlex/results.h @@ -37,7 +37,7 @@ struct Mask { }; /* - * @brief + * @brief * This class represents target box in detection or instance segmentation tasks. 
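 * (editorial aside, not part of the patch: with this change set, Box::mask
 * ends up holding an already binarized mask at the final box size -- see the
 * postprocess rework in paddlex.cpp and the simplified Visualize() in
 * visualize.cpp further below)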
* */ struct Box { @@ -47,7 +47,7 @@ // confidence score float score; std::vector coordinate; - Mask mask; + Mask mask; }; /* diff --git a/deploy/cpp/include/paddlex/transforms.h b/deploy/cpp/include/paddlex/transforms.h index 7e936dc17f4b6e58cdb8cdc36639173ccc24177c..46d0768b1bc6bcb2f2d70b541dd29314653873ac 100644 --- a/deploy/cpp/include/paddlex/transforms.h +++ b/deploy/cpp/include/paddlex/transforms.h @@ -21,6 +21,7 @@ #include #include #include +#include #include #include @@ -81,6 +82,16 @@ class Normalize : public Transform { virtual void Init(const YAML::Node& item) { mean_ = item["mean"].as>(); std_ = item["std"].as>(); + if (item["min_val"].IsDefined()) { + min_val_ = item["min_val"].as>(); + } else { + min_val_ = std::vector(mean_.size(), 0.); + } + if (item["max_val"].IsDefined()) { + max_val_ = item["max_val"].as>(); + } else { + max_val_ = std::vector(mean_.size(), 255.); + } } virtual bool Run(cv::Mat* im, ImageBlob* data); @@ -88,6 +99,8 @@ private: std::vector mean_; std::vector std_; + std::vector min_val_; + std::vector max_val_; }; /* @@ -216,8 +229,7 @@ class Padding : public Transform { } if (item["im_padding_value"].IsDefined()) { im_value_ = item["im_padding_value"].as>(); - } - else { + } else { im_value_ = {0, 0, 0}; } } @@ -229,6 +241,25 @@ int height_ = 0; std::vector im_value_; }; + +/* + * @brief + * This class executes the clip operation on an image matrix + * */ +class Clip : public Transform { + public: + virtual void Init(const YAML::Node& item) { + min_val_ = item["min_val"].as>(); + max_val_ = item["max_val"].as>(); + } + + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + std::vector min_val_; + std::vector max_val_; +}; + /* * @brief * This class is transform operations manager. It stores all necessary diff --git a/deploy/cpp/scripts/bootstrap.sh b/deploy/cpp/scripts/bootstrap.sh index bb9756204e9e610365f67aa37dc78d1b5eaf80b8..e2434d13277a0f058158ba3cfcc883430825c745 100644 --- a/deploy/cpp/scripts/bootstrap.sh +++ b/deploy/cpp/scripts/bootstrap.sh @@ -8,10 +8,37 @@ fi # download pre-compiled opencv lib OPENCV_URL=https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2 +{ + system_name=`awk -F= '/^NAME/{print $2}' /etc/os-release ` +} || { + echo "[ERROR] There's a problem, possibly because your system is not Ubuntu; refer to this doc for more information: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/deploy/opencv.md" + exit -1 +} + +# download pre-compiled opencv lib +OPENCV_URL=https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2 +if [ $system_name == '"Ubuntu"' ] +then + system_version=`awk -F= '/^VERSION_ID/{print $2}' /etc/os-release ` + if [ $system_version == '"18.04"' ] + then + OPENCV_URL=https://bj.bcebos.com/paddlex/deploy/opencv3.4.6gcc4.8ffmpeg_ubuntu_18.04.tar.gz2 + elif [ $system_version == '"16.04"' ] + then + OPENCV_URL=https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2 + else + echo "[ERROR] Cannot find pre-compiled opencv lib for your system environment; refer to this doc for more information: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/deploy/opencv.md" + exit -1 + fi +else + echo "[ERROR] Cannot find pre-compiled opencv lib for your system environment; refer to this doc for more information: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/deploy/opencv.md" + exit -1 +fi + if [ !
-d "./deps/opencv3.4.6gcc4.8ffmpeg/" ]; then mkdir -p deps cd deps - wget -c ${OPENCV_URL} + wget -c ${OPENCV_URL} -O opencv3.4.6gcc4.8ffmpeg.tar.gz2 tar xvfj opencv3.4.6gcc4.8ffmpeg.tar.gz2 rm -rf opencv3.4.6gcc4.8ffmpeg.tar.gz2 cd .. diff --git a/deploy/cpp/scripts/build.sh b/deploy/cpp/scripts/build.sh index 6d6ad25b24170a27639f9b1d651888c4027dbeed..790e2160194a3d5fc73f4c4c608ab31af0f6a5e7 100644 --- a/deploy/cpp/scripts/build.sh +++ b/deploy/cpp/scripts/build.sh @@ -5,9 +5,9 @@ WITH_MKL=ON # 是否集成 TensorRT(仅WITH_GPU=ON 有效) WITH_TENSORRT=OFF # TensorRT 的路径,如果需要集成TensorRT,需修改为您实际安装的TensorRT路径 -TENSORRT_DIR=/root/projects/TensorRT/ +TENSORRT_DIR=$(pwd)/TensorRT/ # Paddle 预测库路径, 请修改为您实际安装的预测库路径 -PADDLE_DIR=/root/projects/fluid_inference +PADDLE_DIR=$(pwd)/fluid_inference # Paddle 的预测库是否使用静态库来编译 # 使用TensorRT时,Paddle的预测库通常为动态库 WITH_STATIC_LIB=OFF @@ -16,14 +16,18 @@ CUDA_LIB=/usr/local/cuda/lib64 # CUDNN 的 lib 路径 CUDNN_LIB=/usr/local/cuda/lib64 +{ + bash $(pwd)/scripts/bootstrap.sh # 下载预编译版本的加密工具和opencv依赖库 +} || { + echo "Fail to execute script/bootstrap.sh" + exit -1 +} + # 是否加载加密后的模型 WITH_ENCRYPTION=ON # 加密工具的路径, 如果使用自带预编译版本可不修改 -sh $(pwd)/scripts/bootstrap.sh # 下载预编译版本的加密工具 ENCRYPTION_DIR=$(pwd)/paddlex-encryption - # OPENCV 路径, 如果使用自带预编译版本可不修改 -sh $(pwd)/scripts/bootstrap.sh # 下载预编译版本的opencv OPENCV_DIR=$(pwd)/deps/opencv3.4.6gcc4.8ffmpeg/ # 以下无需改动 diff --git a/deploy/cpp/src/paddlex.cpp b/deploy/cpp/src/paddlex.cpp index 47dc5b9e9e9104e2d4983a8ac077e5a0810610cf..6d3c23094c3944b9359c701c4c3359c26313d1e3 100644 --- a/deploy/cpp/src/paddlex.cpp +++ b/deploy/cpp/src/paddlex.cpp @@ -11,16 +11,25 @@ // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. + +#include #include #include #include #include #include "include/paddlex/paddlex.h" + +#include +#include +#include + namespace PaddleX { void Model::create_predictor(const std::string& model_dir, bool use_gpu, bool use_trt, + bool use_mkl, + int mkl_thread_num, int gpu_id, std::string key, bool use_ir_optim) { @@ -40,7 +49,7 @@ void Model::create_predictor(const std::string& model_dir, } #endif if (yaml_input == "") { - // 读取配置文件 + // read yaml file std::ifstream yaml_fin(yaml_file); yaml_fin.seekg(0, std::ios::end); size_t yaml_file_size = yaml_fin.tellg(); @@ -48,7 +57,7 @@ void Model::create_predictor(const std::string& model_dir, yaml_fin.seekg(0); yaml_fin.read(&yaml_input[0], yaml_file_size); } - // 读取配置文件内容 + // load yaml file if (!load_config(yaml_input)) { std::cerr << "Parse file 'model.yml' failed!" 
<< std::endl; exit(-1); @@ -57,6 +66,15 @@ void Model::create_predictor(const std::string& model_dir, if (key == "") { config.SetModel(model_file, params_file); } + if (use_mkl && !use_gpu) { + if (name != "HRNet" && name != "DeepLabv3p" && name != "PPYOLO") { + config.EnableMKLDNN(); + config.SetCpuMathLibraryNumThreads(mkl_thread_num); + } else { + std::cerr << "HRNet/DeepLabv3p/PPYOLO are not supported " + << "for the use of mkldnn" << std::endl; + } + } if (use_gpu) { config.EnableUseGpu(100, gpu_id); } else { @@ -64,15 +82,15 @@ void Model::create_predictor(const std::string& model_dir, } config.SwitchUseFeedFetchOps(false); config.SwitchSpecifyInputNames(true); - // 开启图优化 + // enable graph Optim #if defined(__arm__) || defined(__aarch64__) config.SwitchIrOptim(false); #else config.SwitchIrOptim(use_ir_optim); #endif - // 开启内存优化 + // enable Memory Optim config.EnableMemoryOptim(); - if (use_trt) { + if (use_trt && use_gpu) { config.EnableTensorRtEngine( 1 << 20 /* workspace_size*/, 32 /* max_batch_size*/, @@ -108,14 +126,19 @@ bool Model::load_config(const std::string& yaml_input) { return false; } } - // 构建数据处理流 + // build data preprocess stream transforms_.Init(config["Transforms"], to_rgb); - // 读入label list + // read label list labels.clear(); for (const auto& item : config["_Attributes"]["labels"]) { int index = labels.size(); labels[index] = item.as(); } + if (config["_init_params"]["input_channel"].IsDefined()) { + input_channel_ = config["_init_params"]["input_channel"].as(); + } else { + input_channel_ = 3; + } return true; } @@ -152,19 +175,19 @@ bool Model::predict(const cv::Mat& im, ClsResult* result) { "to function predict()!" << std::endl; return false; } - // 处理输入图像 + // im preprocess if (!preprocess(im, &inputs_)) { std::cerr << "Preprocess failed!" << std::endl; return false; } - // 使用加载的模型进行预测 + // predict auto in_tensor = predictor_->GetInputTensor("image"); int h = inputs_.new_im_size_[0]; int w = inputs_.new_im_size_[1]; - in_tensor->Reshape({1, 3, h, w}); + in_tensor->Reshape({1, input_channel_, h, w}); in_tensor->copy_from_cpu(inputs_.im_data_.data()); predictor_->ZeroCopyRun(); - // 取出模型的输出结果 + // get result auto output_names = predictor_->GetOutputNames(); auto output_tensor = predictor_->GetOutputTensor(output_names[0]); std::vector output_shape = output_tensor->shape(); @@ -174,7 +197,7 @@ bool Model::predict(const cv::Mat& im, ClsResult* result) { } outputs_.resize(size); output_tensor->copy_to_cpu(outputs_.data()); - // 对模型输出结果进行后处理 + // postprocess auto ptr = std::max_element(std::begin(outputs_), std::end(outputs_)); result->category_id = std::distance(std::begin(outputs_), ptr); result->score = *ptr; @@ -198,27 +221,27 @@ bool Model::predict(const std::vector& im_batch, return false; } inputs_batch_.assign(im_batch.size(), ImageBlob()); - // 处理输入图像 + // preprocess if (!preprocess(im_batch, &inputs_batch_, thread_num)) { std::cerr << "Preprocess failed!" 
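// editorial aside (not part of the patch): two behaviors introduced above are
// easy to miss -- MKL-DNN is skipped for HRNet/DeepLabv3p/PPYOLO, whose ops the
// patch reports as unsupported; TensorRT now additionally requires use_gpu; and
// input_channel_ (read from model.yml, default 3) drives every tensor Reshape
// below instead of a hard-coded 3.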
<< std::endl; return false; } - // 使用加载的模型进行预测 + // predict int batch_size = im_batch.size(); auto in_tensor = predictor_->GetInputTensor("image"); int h = inputs_batch_[0].new_im_size_[0]; int w = inputs_batch_[0].new_im_size_[1]; - in_tensor->Reshape({batch_size, 3, h, w}); - std::vector inputs_data(batch_size * 3 * h * w); + in_tensor->Reshape({batch_size, input_channel_, h, w}); + std::vector inputs_data(batch_size * input_channel_ * h * w); for (int i = 0; i < batch_size; ++i) { std::copy(inputs_batch_[i].im_data_.begin(), inputs_batch_[i].im_data_.end(), - inputs_data.begin() + i * 3 * h * w); + inputs_data.begin() + i * input_channel_ * h * w); } in_tensor->copy_from_cpu(inputs_data.data()); // in_tensor->copy_from_cpu(inputs_.im_data_.data()); predictor_->ZeroCopyRun(); - // 取出模型的输出结果 + // get result auto output_names = predictor_->GetOutputNames(); auto output_tensor = predictor_->GetOutputTensor(output_names[0]); std::vector output_shape = output_tensor->shape(); @@ -228,7 +251,7 @@ bool Model::predict(const std::vector& im_batch, } outputs_.resize(size); output_tensor->copy_to_cpu(outputs_.data()); - // 对模型输出结果进行后处理 + // postprocess (*results).clear(); (*results).resize(batch_size); int single_batch_size = size / batch_size; @@ -258,7 +281,7 @@ bool Model::predict(const cv::Mat& im, DetResult* result) { return false; } - // 处理输入图像 + // preprocess if (!preprocess(im, &inputs_)) { std::cerr << "Preprocess failed!" << std::endl; return false; @@ -267,10 +290,10 @@ bool Model::predict(const cv::Mat& im, DetResult* result) { int h = inputs_.new_im_size_[0]; int w = inputs_.new_im_size_[1]; auto im_tensor = predictor_->GetInputTensor("image"); - im_tensor->Reshape({1, 3, h, w}); + im_tensor->Reshape({1, input_channel_, h, w}); im_tensor->copy_from_cpu(inputs_.im_data_.data()); - if (name == "YOLOv3") { + if (name == "YOLOv3" || name == "PPYOLO") { auto im_size_tensor = predictor_->GetInputTensor("im_size"); im_size_tensor->Reshape({1, 2}); im_size_tensor->copy_from_cpu(inputs_.ori_im_size_.data()); @@ -288,7 +311,7 @@ bool Model::predict(const cv::Mat& im, DetResult* result) { im_info_tensor->copy_from_cpu(im_info); im_shape_tensor->copy_from_cpu(im_shape); } - // 使用加载的模型进行预测 + // predict predictor_->ZeroCopyRun(); std::vector output_box; @@ -306,7 +329,7 @@ bool Model::predict(const cv::Mat& im, DetResult* result) { return true; } int num_boxes = size / 6; - // 解析预测框box + // box postprocess for (int i = 0; i < num_boxes; ++i) { Box box; box.category_id = static_cast(round(output_box[i * 6])); @@ -321,7 +344,7 @@ bool Model::predict(const cv::Mat& im, DetResult* result) { box.coordinate = {xmin, ymin, w, h}; result->boxes.push_back(std::move(box)); } - // 实例分割需解析mask + // mask postprocess if (name == "MaskRCNN") { std::vector output_mask; auto output_mask_tensor = predictor_->GetOutputTensor(output_names[1]); @@ -337,12 +360,22 @@ bool Model::predict(const cv::Mat& im, DetResult* result) { result->mask_resolution = output_mask_shape[2]; for (int i = 0; i < result->boxes.size(); ++i) { Box* box = &result->boxes[i]; - auto begin_mask = - output_mask.begin() + (i * classes + box->category_id) * mask_pixels; - auto end_mask = begin_mask + mask_pixels; - box->mask.data.assign(begin_mask, end_mask); box->mask.shape = {static_cast(box->coordinate[2]), static_cast(box->coordinate[3])}; + auto begin_mask = + output_mask.data() + (i * classes + box->category_id) * mask_pixels; + cv::Mat bin_mask(result->mask_resolution, + result->mask_resolution, + CV_32FC1, + begin_mask); + 
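// editorial aside (not part of the patch): the added block below resizes the
// fixed mask_resolution score map to the final box size and binarizes it at
// 0.5 during postprocessing, which is why Visualize() in visualize.cpp
// (further below) no longer resizes or thresholds masks itself.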
cv::resize(bin_mask, + bin_mask, + cv::Size(box->mask.shape[0], box->mask.shape[1])); + cv::threshold(bin_mask, bin_mask, 0.5, 1, cv::THRESH_BINARY); + auto mask_int_begin = reinterpret_cast(bin_mask.data); + auto mask_int_end = + mask_int_begin + box->mask.shape[0] * box->mask.shape[1]; + box->mask.data.assign(mask_int_begin, mask_int_end); } } return true; @@ -366,12 +399,12 @@ bool Model::predict(const std::vector& im_batch, inputs_batch_.assign(im_batch.size(), ImageBlob()); int batch_size = im_batch.size(); - // 处理输入图像 + // preprocess if (!preprocess(im_batch, &inputs_batch_, thread_num)) { std::cerr << "Preprocess failed!" << std::endl; return false; } - // 对RCNN类模型做批量padding + // RCNN model padding if (batch_size > 1) { if (name == "FasterRCNN" || name == "MaskRCNN") { int max_h = -1; @@ -411,15 +444,15 @@ int h = inputs_batch_[0].new_im_size_[0]; int w = inputs_batch_[0].new_im_size_[1]; auto im_tensor = predictor_->GetInputTensor("image"); - im_tensor->Reshape({batch_size, 3, h, w}); - std::vector inputs_data(batch_size * 3 * h * w); + im_tensor->Reshape({batch_size, input_channel_, h, w}); + std::vector inputs_data(batch_size * input_channel_ * h * w); for (int i = 0; i < batch_size; ++i) { std::copy(inputs_batch_[i].im_data_.begin(), inputs_batch_[i].im_data_.end(), - inputs_data.begin() + i * 3 * h * w); + inputs_data.begin() + i * input_channel_ * h * w); } im_tensor->copy_from_cpu(inputs_data.data()); - if (name == "YOLOv3") { + if (name == "YOLOv3" || name == "PPYOLO") { auto im_size_tensor = predictor_->GetInputTensor("im_size"); im_size_tensor->Reshape({batch_size, 2}); std::vector inputs_data_size(batch_size * 2); @@ -452,10 +485,10 @@ im_info_tensor->copy_from_cpu(im_info.data()); im_shape_tensor->copy_from_cpu(im_shape.data()); } - // 使用加载的模型进行预测 + // predict predictor_->ZeroCopyRun(); - // 读取所有box + // get all box std::vector output_box; auto output_names = predictor_->GetOutputNames(); auto output_box_tensor = predictor_->GetOutputTensor(output_names[0]); @@ -472,7 +505,7 @@ } auto lod_vector = output_box_tensor->lod(); int num_boxes = size / 6; - // 解析预测框box + // box postprocess (*results).clear(); (*results).resize(batch_size); for (int i = 0; i < lod_vector[0].size() - 1; ++i) { @@ -492,7 +525,7 @@ } } - // 实例分割需解析mask + // mask postprocess if (name == "MaskRCNN") { std::vector output_mask; auto output_mask_tensor = predictor_->GetOutputTensor(output_names[1]); @@ -509,14 +542,24 @@ for (int i = 0; i < lod_vector[0].size() - 1; ++i) { (*results)[i].mask_resolution = output_mask_shape[2]; for (int j = 0; j < (*results)[i].boxes.size(); ++j) { Box* box = &(*results)[i].boxes[j]; int category_id = box->category_id; - auto begin_mask = output_mask.begin() + - (mask_idx * classes + category_id) * mask_pixels; - auto end_mask = begin_mask + mask_pixels; - box->mask.data.assign(begin_mask, end_mask); box->mask.shape = {static_cast(box->coordinate[2]), - static_cast(box->coordinate[3])}; + static_cast(box->coordinate[3])}; + auto begin_mask = + output_mask.data() + (mask_idx * classes + box->category_id) * mask_pixels; + cv::Mat bin_mask(output_mask_shape[2], + output_mask_shape[2], + CV_32FC1, + begin_mask); + cv::resize(bin_mask, + bin_mask, + cv::Size(box->mask.shape[0], box->mask.shape[1])); +
cv::threshold(bin_mask, bin_mask, 0.5, 1, cv::THRESH_BINARY); + auto mask_int_begin = reinterpret_cast(bin_mask.data); + auto mask_int_end = + mask_int_begin + box->mask.shape[0] * box->mask.shape[1]; + box->mask.data.assign(mask_int_begin, mask_int_end); mask_idx++; } } @@ -537,7 +580,7 @@ bool Model::predict(const cv::Mat& im, SegResult* result) { return false; } - // 处理输入图像 + // preprocess if (!preprocess(im, &inputs_)) { std::cerr << "Preprocess failed!" << std::endl; return false; @@ -546,13 +589,13 @@ bool Model::predict(const cv::Mat& im, SegResult* result) { int h = inputs_.new_im_size_[0]; int w = inputs_.new_im_size_[1]; auto im_tensor = predictor_->GetInputTensor("image"); - im_tensor->Reshape({1, 3, h, w}); + im_tensor->Reshape({1, input_channel_, h, w}); im_tensor->copy_from_cpu(inputs_.im_data_.data()); - // 使用加载的模型进行预测 + // predict predictor_->ZeroCopyRun(); - // 获取预测置信度,经过argmax后的labelmap + // get labelmap auto output_names = predictor_->GetOutputNames(); auto output_label_tensor = predictor_->GetOutputTensor(output_names[0]); std::vector output_label_shape = output_label_tensor->shape(); @@ -565,7 +608,7 @@ bool Model::predict(const cv::Mat& im, SegResult* result) { result->label_map.data.resize(size); output_label_tensor->copy_to_cpu(result->label_map.data.data()); - // 获取预测置信度scoremap + // get scoremap auto output_score_tensor = predictor_->GetOutputTensor(output_names[1]); std::vector output_score_shape = output_score_tensor->shape(); size = 1; @@ -577,7 +620,7 @@ bool Model::predict(const cv::Mat& im, SegResult* result) { result->score_map.data.resize(size); output_score_tensor->copy_to_cpu(result->score_map.data.data()); - // 解析输出结果到原图大小 + // get origin image result std::vector label_map(result->label_map.data.begin(), result->label_map.data.end()); cv::Mat mask_label(result->label_map.shape[1], @@ -647,7 +690,7 @@ bool Model::predict(const std::vector& im_batch, return false; } - // 处理输入图像 + // preprocess inputs_batch_.assign(im_batch.size(), ImageBlob()); if (!preprocess(im_batch, &inputs_batch_, thread_num)) { std::cerr << "Preprocess failed!" 
<< std::endl; @@ -660,20 +703,20 @@ bool Model::predict(const std::vector& im_batch, int h = inputs_batch_[0].new_im_size_[0]; int w = inputs_batch_[0].new_im_size_[1]; auto im_tensor = predictor_->GetInputTensor("image"); - im_tensor->Reshape({batch_size, 3, h, w}); - std::vector inputs_data(batch_size * 3 * h * w); + im_tensor->Reshape({batch_size, input_channel_, h, w}); + std::vector inputs_data(batch_size * input_channel_ * h * w); for (int i = 0; i < batch_size; ++i) { std::copy(inputs_batch_[i].im_data_.begin(), inputs_batch_[i].im_data_.end(), - inputs_data.begin() + i * 3 * h * w); + inputs_data.begin() + i * input_channel_ * h * w); } im_tensor->copy_from_cpu(inputs_data.data()); // im_tensor->copy_from_cpu(inputs_.im_data_.data()); - // 使用加载的模型进行预测 + // predict predictor_->ZeroCopyRun(); - // 获取预测置信度,经过argmax后的labelmap + // get labelmap auto output_names = predictor_->GetOutputNames(); auto output_label_tensor = predictor_->GetOutputTensor(output_names[0]); std::vector output_label_shape = output_label_tensor->shape(); @@ -698,7 +741,7 @@ bool Model::predict(const std::vector& im_batch, (*results)[i].label_map.data.data()); } - // 获取预测置信度scoremap + // get scoremap auto output_score_tensor = predictor_->GetOutputTensor(output_names[1]); std::vector output_score_shape = output_score_tensor->shape(); size = 1; @@ -722,7 +765,7 @@ bool Model::predict(const std::vector& im_batch, (*results)[i].score_map.data.data()); } - // 解析输出结果到原图大小 + // get origin image result for (int i = 0; i < batch_size; ++i) { std::vector label_map((*results)[i].label_map.data.begin(), (*results)[i].label_map.data.end()); diff --git a/deploy/cpp/src/transforms.cpp b/deploy/cpp/src/transforms.cpp index f623fc664e9d66002e0eb0065d034d90965eddf7..bf4fbb70a11c00b7a259824ed2544afef43e3631 100644 --- a/deploy/cpp/src/transforms.cpp +++ b/deploy/cpp/src/transforms.cpp @@ -12,12 +12,13 @@ // See the License for the specific language governing permissions and // limitations under the License. +#include "include/paddlex/transforms.h" + +#include + #include #include #include -#include - -#include "include/paddlex/transforms.h" namespace PaddleX { @@ -28,16 +29,20 @@ std::map interpolations = {{"LINEAR", cv::INTER_LINEAR}, {"LANCZOS4", cv::INTER_LANCZOS4}}; bool Normalize::Run(cv::Mat* im, ImageBlob* data) { - for (int h = 0; h < im->rows; h++) { - for (int w = 0; w < im->cols; w++) { - im->at(h, w)[0] = - (im->at(h, w)[0] / 255.0 - mean_[0]) / std_[0]; - im->at(h, w)[1] = - (im->at(h, w)[1] / 255.0 - mean_[1]) / std_[1]; - im->at(h, w)[2] = - (im->at(h, w)[2] / 255.0 - mean_[2]) / std_[2]; - } + std::vector range_val; + for (int c = 0; c < im->channels(); c++) { + range_val.push_back(max_val_[c] - min_val_[c]); } + + std::vector split_im; + cv::split(*im, split_im); + for (int c = 0; c < im->channels(); c++) { + cv::subtract(split_im[c], cv::Scalar(min_val_[c]), split_im[c]); + cv::divide(split_im[c], cv::Scalar(range_val[c]), split_im[c]); + cv::subtract(split_im[c], cv::Scalar(mean_[c]), split_im[c]); + cv::divide(split_im[c], cv::Scalar(std_[c]), split_im[c]); + } + cv::merge(split_im, *im); return true; } @@ -111,11 +116,22 @@ bool Padding::Run(cv::Mat* im, ImageBlob* data) { << ", but they should be greater than 0." 
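// editorial aside (not part of the patch): the rewritten Padding below builds
// one CV_32FC1 plane per channel and merges them, replacing the fixed
// three-channel cv::Scalar border; im_value_ can therefore carry a distinct
// padding value for every channel once input_channel != 3.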
<< std::endl; return false; } - cv::Scalar value = cv::Scalar(im_value_[0], im_value_[1], im_value_[2]); - cv::copyMakeBorder( - *im, *im, 0, padding_h, 0, padding_w, cv::BORDER_CONSTANT, value); + std::vector padded_im_per_channel; + for (size_t i = 0; i < im->channels(); i++) { + const cv::Mat per_channel = cv::Mat(im->rows + padding_h, + im->cols + padding_w, + CV_32FC1, + cv::Scalar(im_value_[i])); + padded_im_per_channel.push_back(per_channel); + } + cv::Mat padded_im; + cv::merge(padded_im_per_channel, padded_im); + cv::Rect im_roi = cv::Rect(0, 0, im->cols, im->rows); + im->copyTo(padded_im(im_roi)); + *im = padded_im; data->new_im_size_[0] = im->rows; data->new_im_size_[1] = im->cols; + return true; } @@ -161,12 +177,26 @@ bool Resize::Run(cv::Mat* im, ImageBlob* data) { return true; } +bool Clip::Run(cv::Mat* im, ImageBlob* data) { + std::vector split_im; + cv::split(*im, split_im); + for (int c = 0; c < im->channels(); c++) { + cv::threshold(split_im[c], split_im[c], max_val_[c], max_val_[c], + cv::THRESH_TRUNC); + cv::subtract(cv::Scalar(0), split_im[c], split_im[c]); + cv::threshold(split_im[c], split_im[c], min_val_[c], min_val_[c], + cv::THRESH_TRUNC); + cv::divide(split_im[c], cv::Scalar(-1), split_im[c]); + } + cv::merge(split_im, *im); + return true; +} + void Transforms::Init(const YAML::Node& transforms_node, bool to_rgb) { transforms_.clear(); to_rgb_ = to_rgb; for (const auto& item : transforms_node) { std::string name = item.begin()->first.as(); - std::cout << "trans name: " << name << std::endl; std::shared_ptr transform = CreateTransform(name); transform->Init(item.begin()->second); transforms_.push_back(transform); @@ -187,6 +217,8 @@ std::shared_ptr Transforms::CreateTransform( return std::make_shared(); } else if (transform_name == "ResizeByLong") { return std::make_shared(); + } else if (transform_name == "Clip") { + return std::make_shared(); } else { std::cerr << "There's unexpected transform(name='" << transform_name << "')." 
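// editorial aside (not part of the patch): the Clip transform registered above
// clamps each channel into [min_val_, max_val_] -- THRESH_TRUNC caps values at
// max_val_, then the negate / THRESH_TRUNC / divide-by-(-1) sequence caps them
// from below (exact for the common min_val_ = 0 case).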
<< std::endl; @@ -195,11 +227,11 @@ std::shared_ptr Transforms::CreateTransform( } bool Transforms::Run(cv::Mat* im, ImageBlob* data) { - // 按照transforms中预处理算子顺序处理图像 + // do all preprocess ops by order if (to_rgb_) { cv::cvtColor(*im, *im, cv::COLOR_BGR2RGB); } - (*im).convertTo(*im, CV_32FC3); + (*im).convertTo(*im, CV_32FC(im->channels())); data->ori_im_size_[0] = im->rows; data->ori_im_size_[1] = im->cols; data->new_im_size_[0] = im->rows; @@ -211,8 +243,8 @@ bool Transforms::Run(cv::Mat* im, ImageBlob* data) { } } - // 将图像由NHWC转为NCHW格式 - // 同时转为连续的内存块存储到ImageBlob + // data format NHWC to NCHW + // img data save to ImageBlob int h = im->rows; int w = im->cols; int c = im->channels(); diff --git a/deploy/cpp/src/visualize.cpp b/deploy/cpp/src/visualize.cpp index afc1733b497269b706bf4e07d82f3a7aa43087f5..d6efc7f9f5c19c436d9bc32a7a7330a0749b9dd5 100644 --- a/deploy/cpp/src/visualize.cpp +++ b/deploy/cpp/src/visualize.cpp @@ -47,7 +47,7 @@ cv::Mat Visualize(const cv::Mat& img, boxes[i].coordinate[2], boxes[i].coordinate[3]); - // 生成预测框和标题 + // draw box and title std::string text = boxes[i].category; int c1 = colormap[3 * boxes[i].category_id + 0]; int c2 = colormap[3 * boxes[i].category_id + 1]; @@ -63,13 +63,13 @@ cv::Mat Visualize(const cv::Mat& img, origin.x = roi.x; origin.y = roi.y; - // 生成预测框标题的背景 + // background cv::Rect text_back = cv::Rect(boxes[i].coordinate[0], boxes[i].coordinate[1] - text_size.height, text_size.width, text_size.height); - // 绘图和文字 + // draw cv::rectangle(vis_img, roi, roi_color, 2); cv::rectangle(vis_img, text_back, roi_color, -1); cv::putText(vis_img, @@ -80,18 +80,16 @@ cv::Mat Visualize(const cv::Mat& img, cv::Scalar(255, 255, 255), thickness); - // 生成实例分割mask + // mask if (boxes[i].mask.data.size() == 0) { continue; } - cv::Mat bin_mask(result.mask_resolution, - result.mask_resolution, + std::vector mask_data; + mask_data.assign(boxes[i].mask.data.begin(), boxes[i].mask.data.end()); + cv::Mat bin_mask(boxes[i].mask.shape[1], + boxes[i].mask.shape[0], CV_32FC1, - boxes[i].mask.data.data()); - cv::resize(bin_mask, - bin_mask, - cv::Size(boxes[i].mask.shape[0], boxes[i].mask.shape[1])); - cv::threshold(bin_mask, bin_mask, 0.5, 1, cv::THRESH_BINARY); + mask_data.data()); cv::Mat full_mask = cv::Mat::zeros(vis_img.size(), CV_8UC1); bin_mask.copyTo(full_mask(roi)); cv::Mat mask_ch[3]; diff --git a/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java b/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java index 940ebaa234db2e34faa2daaf74dfacc0e9d131fe..d88ec4bfa7017fede63ffccc154bcf4a34a8a878 100644 --- a/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java +++ b/deploy/lite/android/sdk/src/main/java/com/baidu/paddlex/preprocess/Transforms.java @@ -23,6 +23,7 @@ import org.opencv.core.Scalar; import org.opencv.core.Size; import org.opencv.imgproc.Imgproc; import java.util.ArrayList; +import java.util.Date; import java.util.HashMap; import java.util.List; @@ -101,6 +102,15 @@ public class Transforms { if (info.containsKey("coarsest_stride")) { padding.coarsest_stride = (int) info.get("coarsest_stride"); } + if (info.containsKey("im_padding_value")) { + List im_padding_value = (List) info.get("im_padding_value"); + if (im_padding_value.size()!=3){ + Log.e(TAG, "len of im_padding_value in padding must == 3."); + } + for (int k =0; i> reverseReshapeInfo = new ArrayList>(imageBlob.getReshapeInfo().entrySet()).listIterator(imageBlob.getReshapeInfo().size()); while 
(reverseReshapeInfo.hasPrevious()) { Map.Entry entry = reverseReshapeInfo.previous(); diff --git a/deploy/openvino/CMakeLists.txt b/deploy/openvino/CMakeLists.txt old mode 100644 new mode 100755 index 8e32a9592fce38918e46ad9ab9e4b2d1fc97cd6e..e219c8537c40af153b48e5025d07f9292482686a --- a/deploy/openvino/CMakeLists.txt +++ b/deploy/openvino/CMakeLists.txt @@ -8,7 +8,9 @@ SET(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH}) SET(OPENVINO_DIR "" CACHE PATH "Location of libraries") SET(OPENCV_DIR "" CACHE PATH "Location of libraries") SET(GFLAGS_DIR "" CACHE PATH "Location of libraries") +SET(GLOG_DIR "" CACHE PATH "Location of libraries") SET(NGRAPH_LIB "" CACHE PATH "Location of libraries") +SET(ARCH "" CACHE PATH "Location of libraries") include(cmake/yaml-cpp.cmake) @@ -27,6 +29,12 @@ macro(safe_set_static_flag) endforeach(flag_var) endmacro() +if(NOT WIN32) + if (NOT DEFINED ARCH OR ${ARCH} STREQUAL "") + message(FATAL_ERROR "please set ARCH with -DARCH=x86 OR armv7") + endif() +endif() + if (NOT DEFINED OPENVINO_DIR OR ${OPENVINO_DIR} STREQUAL "") message(FATAL_ERROR "please set OPENVINO_DIR with -DOPENVINO_DIR=/path/inference_engine") endif() @@ -39,19 +47,32 @@ if (NOT DEFINED GFLAGS_DIR OR ${GFLAGS_DIR} STREQUAL "") message(FATAL_ERROR "please set GFLAGS_DIR with -DGFLAGS_DIR=/path/gflags") endif() +if (NOT DEFINED GLOG_DIR OR ${GLOG_DIR} STREQUAL "") + message(FATAL_ERROR "please set GLOG_DIR with -DGLOG_DIR=/path/glog") +endif() + if (NOT DEFINED NGRAPH_LIB OR ${NGRAPH_LIB} STREQUAL "") message(FATAL_ERROR "please set NGRAPH_DIR with -DNGRAPH_DIR=/path/ngraph") endif() include_directories("${OPENVINO_DIR}") include_directories("${OPENVINO_DIR}/include") -link_directories("${OPENVINO_DIR}/lib") include_directories("${OPENVINO_DIR}/external/tbb/include/tbb") -link_directories("${OPENVINO_DIR}/external/tbb/lib") +link_directories("${OPENVINO_DIR}/lib") +link_directories("${OPENVINO_DIR}/external/tbb/lib") +if(WIN32) + link_directories("${OPENVINO_DIR}/lib/intel64/Release") + link_directories("${OPENVINO_DIR}/bin/intel64/Release") +endif() + + link_directories("${GFLAGS_DIR}/lib") include_directories("${GFLAGS_DIR}/include") +link_directories("${GLOG_DIR}/lib") +include_directories("${GLOG_DIR}/include") + link_directories("${NGRAPH_LIB}") link_directories("${NGRAPH_LIB}/lib") @@ -79,14 +100,29 @@ else() set(CMAKE_STATIC_LIBRARY_PREFIX "") endif() - -if(WITH_STATIC_LIB) - set(DEPS ${OPENVINO_DIR}/lib/intel64/libinference_engine${CMAKE_STATIC_LIBRARY_SUFFIX}) - set(DEPS ${DEPS} ${OPENVINO_DIR}/lib/intel64/libinference_engine_legacy${CMAKE_STATIC_LIBRARY_SUFFIX}) +if(WIN32) + set(DEPS ${OPENVINO_DIR}/lib/intel64/Release/inference_engine${CMAKE_STATIC_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${OPENVINO_DIR}/lib/intel64/Release/inference_engine_legacy${CMAKE_STATIC_LIBRARY_SUFFIX}) else() - set(DEPS ${OPENVINO_DIR}/lib/intel64/libinference_engine${CMAKE_SHARED_LIBRARY_SUFFIX}) - set(DEPS ${DEPS} ${OPENVINO_DIR}/lib/intel64/libinference_engine_legacy${CMAKE_SHARED_LIBRARY_SUFFIX}) -endif() + if (ARCH STREQUAL "armv7") + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=armv7-a") + if(WITH_STATIC_LIB) + set(DEPS ${OPENVINO_DIR}/lib/armv7l/libinference_engine${CMAKE_STATIC_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${OPENVINO_DIR}/lib/armv7l/libinference_engine_legacy${CMAKE_STATIC_LIBRARY_SUFFIX}) + else() + set(DEPS ${OPENVINO_DIR}/lib/armv7l/libinference_engine${CMAKE_SHARED_LIBRARY_SUFFIX}) + set(DEPS ${DEPS}
${OPENVINO_DIR}/lib/armv7l/libinference_engine_legacy${CMAKE_SHARED_LIBRARY_SUFFIX}) + endif() + else() + if(WITH_STATIC_LIB) + set(DEPS ${OPENVINO_DIR}/lib/intel64/libinference_engine${CMAKE_STATIC_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${OPENVINO_DIR}/lib/intel64/libinference_engine_legacy${CMAKE_STATIC_LIBRARY_SUFFIX}) + else() + set(DEPS ${OPENVINO_DIR}/lib/intel64/libinference_engine${CMAKE_SHARED_LIBRARY_SUFFIX}) + set(DEPS ${DEPS} ${OPENVINO_DIR}/lib/intel64/libinference_engine_legacy${CMAKE_SHARED_LIBRARY_SUFFIX}) + endif() + endif() +endif(WIN32) if (NOT WIN32) set(DEPS ${DEPS} @@ -94,7 +130,7 @@ if (NOT WIN32) ) else() set(DEPS ${DEPS} - glog gflags_static libprotobuf zlibstatic xxhash libyaml-cppmt) + glog gflags_static libyaml-cppmt) set(DEPS ${DEPS} libcmt shlwapi) endif(NOT WIN32) @@ -105,7 +141,14 @@ if (NOT WIN32) endif() set(DEPS ${DEPS} ${OpenCV_LIBS}) -add_executable(classifier src/classifier.cpp src/transforms.cpp src/paddlex.cpp) +add_executable(classifier demo/classifier.cpp src/transforms.cpp src/paddlex.cpp) ADD_DEPENDENCIES(classifier ext-yaml-cpp) target_link_libraries(classifier ${DEPS}) +add_executable(segmenter demo/segmenter.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp) +ADD_DEPENDENCIES(segmenter ext-yaml-cpp) +target_link_libraries(segmenter ${DEPS}) + +add_executable(detector demo/detector.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp) +ADD_DEPENDENCIES(detector ext-yaml-cpp) +target_link_libraries(detector ${DEPS}) diff --git a/deploy/openvino/CMakeSettings.json b/deploy/openvino/CMakeSettings.json old mode 100644 new mode 100755 index 861839dbc67816aeb96ca1ab174d95ca7dd292ef..bb3873b6022deb06ccec99830ed4d0d89aa42f6b --- a/deploy/openvino/CMakeSettings.json +++ b/deploy/openvino/CMakeSettings.json @@ -1,27 +1,47 @@ { - "configurations": [ + "configurations": [ + { + "name": "x64-Release", + "generator": "Ninja", + "configurationType": "RelWithDebInfo", + "inheritEnvironments": [ "msvc_x64_x64" ], + "buildRoot": "${projectDir}\\out\\build\\${name}", + "installRoot": "${projectDir}\\out\\install\\${name}", + "cmakeCommandArgs": "", + "buildCommandArgs": "-v", + "ctestCommandArgs": "", + "variables": [ { - "name": "x64-Release", - "generator": "Ninja", - "configurationType": "RelWithDebInfo", - "inheritEnvironments": [ "msvc_x64_x64" ], - "buildRoot": "${projectDir}\\out\\build\\${name}", - "installRoot": "${projectDir}\\out\\install\\${name}", - "cmakeCommandArgs": "", - "buildCommandArgs": "-v", - "ctestCommandArgs": "", - "variables": [ - { - "name": "OPENCV_DIR", - "value": "C:/projects/opencv", - "type": "PATH" - }, - { - "name": "OPENVINO_LIB", - "value": "C:/projetcs/inference_engine", - "type": "PATH" - } - ] + "name": "OPENCV_DIR", + "value": "/path/to/opencv", + "type": "PATH" + }, + { + "name": "OPENVINO_DIR", + "value": "C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/inference_engine", + "type": "PATH" + }, + { + "name": "NGRAPH_LIB", + "value": "C:/Program Files (x86)/IntelSWTools/openvino/deployment_tools/ngraph/lib", + "type": "PATH" + }, + { + "name": "GFLAGS_DIR", + "value": "/path/to/gflags", + "type": "PATH" + }, + { + "name": "WITH_STATIC_LIB", + "value": "True", + "type": "BOOL" + }, + { + "name": "GLOG_DIR", + "value": "/path/to/glog", + "type": "PATH" } - ] -} + ] + } + ] +} \ No newline at end of file diff --git a/deploy/openvino/cmake/yaml-cpp.cmake b/deploy/openvino/cmake/yaml-cpp.cmake old mode 100644 new mode 100755 index 
30d904dc76196cf106abccb47c003eed485691f1..726433d904908ce96c51442246fc884d0899de04 --- a/deploy/openvino/cmake/yaml-cpp.cmake +++ b/deploy/openvino/cmake/yaml-cpp.cmake @@ -1,4 +1,3 @@ -find_package(Git REQUIRED) include(ExternalProject) diff --git a/deploy/openvino/src/classifier.cpp b/deploy/openvino/demo/classifier.cpp old mode 100644 new mode 100755 similarity index 87% rename from deploy/openvino/src/classifier.cpp rename to deploy/openvino/demo/classifier.cpp index 38c0da9b86d8b6d9c7d248aeb8526dfe1deab148..2180cb40e390affa2dd1ddcd720d900c715aab75 --- a/deploy/openvino/src/classifier.cpp +++ b/deploy/openvino/demo/classifier.cpp @@ -22,7 +22,7 @@ #include "include/paddlex/paddlex.h" DEFINE_string(model_dir, "", "Path of inference model"); -DEFINE_string(cfg_dir, "", "Path of inference model"); +DEFINE_string(cfg_file, "", "Path of PaddleX model yml file"); DEFINE_string(device, "CPU", "Device name"); DEFINE_string(image, "", "Path of test image file"); DEFINE_string(image_list, "", "Path of test image list file"); @@ -35,8 +35,8 @@ int main(int argc, char** argv) { std::cerr << "--model_dir need to be defined" << std::endl; return -1; } - if (FLAGS_cfg_dir == "") { - std::cerr << "--cfg_dir need to be defined" << std::endl; + if (FLAGS_cfg_file == "") { + std::cerr << "--cfg_file need to be defined" << std::endl; return -1; } if (FLAGS_image == "" & FLAGS_image_list == "") { return -1; } - // 加载模型 + // load model PaddleX::Model model; - model.Init(FLAGS_model_dir, FLAGS_cfg_dir, FLAGS_device); + model.Init(FLAGS_model_dir, FLAGS_cfg_file, FLAGS_device); - // 进行预测 + // predict if (FLAGS_image_list != "") { std::ifstream inf(FLAGS_image_list); if (!inf) { diff --git a/deploy/openvino/demo/detector.cpp b/deploy/openvino/demo/detector.cpp new file mode 100644 index 0000000000000000000000000000000000000000..66a31cefc0fa500ad77353e0f9bdd43e4564cc81 --- /dev/null +++ b/deploy/openvino/demo/detector.cpp @@ -0,0 +1,110 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License.
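// editorial sketch (assumed invocation, not part of the patch): the new
// OpenVINO detector demo below is typically driven as
//   ./detector --model_dir=/path/to/model.xml --cfg_file=/path/to/model.yml \
//              --image=test.jpg --save_dir=./output --threshold=0.5
// where model.xml is the converted OpenVINO IR and model.yml the exported
// PaddleX config; the flag names match the DEFINE_* declarations that follow.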
+ +#include +#include + +#include +#include // NOLINT +#include +#include +#include +#include +#include + +#include "include/paddlex/paddlex.h" +#include "include/paddlex/visualize.h" + +using namespace std::chrono; // NOLINT + +DEFINE_string(model_dir, "", "Path of openvino model xml file"); +DEFINE_string(cfg_file, "", "Path of PaddleX model yaml file"); +DEFINE_string(image, "", "Path of test image file"); +DEFINE_string(image_list, "", "Path of test image list file"); +DEFINE_string(device, "CPU", "Device name"); +DEFINE_string(save_dir, "", "Path to save visualized image"); +DEFINE_int32(batch_size, 1, "Batch size of infering"); +DEFINE_double(threshold, + 0.5, + "The minimum scores of target boxes which are shown"); + +int main(int argc, char** argv) { + google::ParseCommandLineFlags(&argc, &argv, true); + if (FLAGS_model_dir == "") { + std::cerr << "--model_dir need to be defined" << std::endl; + return -1; + } + if (FLAGS_cfg_file == "") { + std::cerr << "--cfg_file need to be defined" << std::endl; + return -1; + } + if (FLAGS_image == "" & FLAGS_image_list == "") { + std::cerr << "--image or --image_list need to be defined" << std::endl; + return -1; + } + + // load model + PaddleX::Model model; + model.Init(FLAGS_model_dir, FLAGS_cfg_file, FLAGS_device); + + int imgs = 1; + auto colormap = PaddleX::GenerateColorMap(model.labels.size()); + // predict + if (FLAGS_image_list != "") { + std::ifstream inf(FLAGS_image_list); + if (!inf) { + std::cerr << "Fail to open file " << FLAGS_image_list << std::endl; + return -1; + } + std::string image_path; + while (getline(inf, image_path)) { + PaddleX::DetResult result; + cv::Mat im = cv::imread(image_path, 1); + model.predict(im, &result); + if (FLAGS_save_dir != "") { + cv::Mat vis_img = PaddleX::Visualize( + im, result, model.labels, colormap, FLAGS_threshold); + std::string save_path = + PaddleX::generate_save_path(FLAGS_save_dir, image_path); + cv::imwrite(save_path, vis_img); + std::cout << "Visualized output saved as " << save_path << std::endl; + } + } + } else { + PaddleX::DetResult result; + cv::Mat im = cv::imread(FLAGS_image, 1); + model.predict(im, &result); + for (int i = 0; i < result.boxes.size(); ++i) { + std::cout << "image file: " << FLAGS_image << std::endl; + std::cout << ", predict label: " << result.boxes[i].category + << ", label_id:" << result.boxes[i].category_id + << ", score: " << result.boxes[i].score + << ", box(xmin, ymin, w, h):(" << result.boxes[i].coordinate[0] + << ", " << result.boxes[i].coordinate[1] << ", " + << result.boxes[i].coordinate[2] << ", " + << result.boxes[i].coordinate[3] << ")" << std::endl; + } + if (FLAGS_save_dir != "") { + // visualize + cv::Mat vis_img = PaddleX::Visualize( + im, result, model.labels, colormap, FLAGS_threshold); + std::string save_path = + PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_image); + cv::imwrite(save_path, vis_img); + result.clear(); + std::cout << "Visualized output saved as " << save_path << std::endl; + } + } + return 0; +} diff --git a/deploy/openvino/demo/segmenter.cpp b/deploy/openvino/demo/segmenter.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bb6886aae8def104a9a3923443f9609684b3b154 --- /dev/null +++ b/deploy/openvino/demo/segmenter.cpp @@ -0,0 +1,90 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include + +#include +#include +#include +#include +#include +#include +#include "include/paddlex/paddlex.h" +#include "include/paddlex/visualize.h" + + +DEFINE_string(model_dir, "", "Path of openvino model xml file"); +DEFINE_string(cfg_file, "", "Path of PaddleX model yaml file"); +DEFINE_string(image, "", "Path of test image file"); +DEFINE_string(image_list, "", "Path of test image list file"); +DEFINE_string(device, "CPU", "Device name"); +DEFINE_string(save_dir, "", "Path to save visualized image"); +DEFINE_int32(batch_size, 1, "Batch size of infering"); + + +int main(int argc, char** argv) { + google::ParseCommandLineFlags(&argc, &argv, true); + if (FLAGS_model_dir == "") { + std::cerr << "--model_dir need to be defined" << std::endl; + return -1; + } + if (FLAGS_cfg_file == "") { + std::cerr << "--cfg_file need to be defined" << std::endl; + return -1; + } + if (FLAGS_image == "" & FLAGS_image_list == "") { + std::cerr << "--image or --image_list need to be defined" << std::endl; + return -1; + } + + // load model + PaddleX::Model model; + model.Init(FLAGS_model_dir, FLAGS_cfg_file, FLAGS_device); + int imgs = 1; + auto colormap = PaddleX::GenerateColorMap(model.labels.size()); + + if (FLAGS_image_list != "") { + std::ifstream inf(FLAGS_image_list); + if (!inf) { + std::cerr << "Fail to open file " << FLAGS_image_list < #include #include +#include +#include #include "yaml-cpp/yaml.h" @@ -30,35 +32,40 @@ #include "include/paddlex/config_parser.h" #include "include/paddlex/results.h" #include "include/paddlex/transforms.h" -using namespace InferenceEngine; + namespace PaddleX { class Model { public: void Init(const std::string& model_dir, - const std::string& cfg_dir, + const std::string& cfg_file, std::string device) { - create_predictor(model_dir, cfg_dir, device); + create_predictor(model_dir, cfg_file, device); } void create_predictor(const std::string& model_dir, - const std::string& cfg_dir, + const std::string& cfg_file, std::string device); bool load_config(const std::string& model_dir); - bool preprocess(cv::Mat* input_im); + bool preprocess(cv::Mat* input_im, ImageBlob* inputs); bool predict(const cv::Mat& im, ClsResult* result); + bool predict(const cv::Mat& im, DetResult* result); + + bool predict(const cv::Mat& im, SegResult* result); + + std::string type; std::string name; - std::vector labels; + std::map labels; Transforms transforms_; - Blob::Ptr inputs_; - Blob::Ptr output_; - CNNNetwork network_; - ExecutableNetwork executable_network_; + ImageBlob inputs_; + InferenceEngine::Blob::Ptr output_; + InferenceEngine::CNNNetwork network_; + InferenceEngine::ExecutableNetwork executable_network_; }; -} // namespce of PaddleX +} // namespace PaddleX diff --git a/deploy/openvino/include/paddlex/results.h b/deploy/openvino/include/paddlex/results.h old mode 100644 new mode 100755 index de90c4a85130f42c0201f0d671fd3e2d53b0f37d..7a77e0e2df4dbe4889f7be176df173b00dc454fa --- a/deploy/openvino/include/paddlex/results.h +++ b/deploy/openvino/include/paddlex/results.h @@ -61,11 +61,11 @@ class DetResult : public BaseResult { class SegResult : public BaseResult { 
public: - Mask label_map; + Mask label_map; Mask score_map; void clear() { label_map.clear(); score_map.clear(); } }; -} // namespce of PaddleX +} // namespace PaddleX diff --git a/deploy/openvino/include/paddlex/transforms.h b/deploy/openvino/include/paddlex/transforms.h old mode 100644 new mode 100755 index fa76a82999173ea01c80b8ea3b67ca1bc4f95fc7..b179c09fbbff082cd3844c8217d3d9e76e5b25c7 --- a/deploy/openvino/include/paddlex/transforms.h +++ b/deploy/openvino/include/paddlex/transforms.h @@ -16,26 +16,54 @@ #include -#include -#include #include #include +#include +#include #include +#include #include #include #include - #include -using namespace InferenceEngine; + namespace PaddleX { +/* + * @brief + * This class represents object for storing all preprocessed data + * */ +class ImageBlob { + public: + // Original image height and width + InferenceEngine::Blob::Ptr ori_im_size_; + + // Newest image height and width after process + std::vector new_im_size_ = std::vector(2); + // Image height and width before resize + std::vector> im_size_before_resize_; + // Reshape order + std::vector reshape_order_; + // Resize scale + float scale = 1.0; + // Buffer for image data after preprocessing + InferenceEngine::Blob::Ptr blob; + + void clear() { + im_size_before_resize_.clear(); + reshape_order_.clear(); + } +}; + + + // Abstraction of preprocessing opration class class Transform { public: virtual void Init(const YAML::Node& item) = 0; - virtual bool Run(cv::Mat* im) = 0; + virtual bool Run(cv::Mat* im, ImageBlob* data) = 0; }; class Normalize : public Transform { @@ -45,7 +73,7 @@ class Normalize : public Transform { std_ = item["std"].as>(); } - virtual bool Run(cv::Mat* im); + virtual bool Run(cv::Mat* im, ImageBlob* data); private: std::vector mean_; @@ -61,8 +89,8 @@ class ResizeByShort : public Transform { } else { max_size_ = -1; } - }; - virtual bool Run(cv::Mat* im); + } + virtual bool Run(cv::Mat* im, ImageBlob* data); private: float GenerateScale(const cv::Mat& im); @@ -70,6 +98,55 @@ class ResizeByShort : public Transform { int max_size_; }; +/* + * @brief + * This class execute resize by long operation on image matrix. At first, it resizes + * the long side of image matrix to specified length. Accordingly, the short side + * will be resized in the same proportion. + * */ +class ResizeByLong : public Transform { + public: + virtual void Init(const YAML::Node& item) { + long_size_ = item["long_size"].as(); + } + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + int long_size_; +}; + +/* + * @brief + * This class execute resize operation on image matrix. It resizes width and height + * to specified length. 
+ * */ +class Resize : public Transform { + public: + virtual void Init(const YAML::Node& item) { + if (item["interp"].IsDefined()) { + interp_ = item["interp"].as(); + } + if (item["target_size"].IsScalar()) { + height_ = item["target_size"].as(); + width_ = item["target_size"].as(); + } else if (item["target_size"].IsSequence()) { + std::vector target_size = item["target_size"].as>(); + width_ = target_size[0]; + height_ = target_size[1]; + } + if (height_ <= 0 || width_ <= 0) { + std::cerr << "[Resize] target_size should greater than 0" << std::endl; + exit(-1); + } + } + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + int height_; + int width_; + std::string interp_; +}; + class CenterCrop : public Transform { public: @@ -83,22 +160,65 @@ class CenterCrop : public Transform { height_ = crop_size[1]; } } - virtual bool Run(cv::Mat* im); + virtual bool Run(cv::Mat* im, ImageBlob* data); private: int height_; int width_; }; + +/* + * @brief + * This class execute padding operation on image matrix. It makes border on edge + * of image matrix. + * */ +class Padding : public Transform { + public: + virtual void Init(const YAML::Node& item) { + if (item["coarsest_stride"].IsDefined()) { + coarsest_stride_ = item["coarsest_stride"].as(); + if (coarsest_stride_ < 1) { + std::cerr << "[Padding] coarest_stride should greater than 0" + << std::endl; + exit(-1); + } + } + if (item["target_size"].IsDefined()) { + if (item["target_size"].IsScalar()) { + width_ = item["target_size"].as(); + height_ = item["target_size"].as(); + } else if (item["target_size"].IsSequence()) { + width_ = item["target_size"].as>()[0]; + height_ = item["target_size"].as>()[1]; + } + } + if (item["im_padding_value"].IsDefined()) { + im_value_ = item["im_padding_value"].as>(); + } else { + im_value_ = {0, 0, 0}; + } + } + + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + int coarsest_stride_ = -1; + int width_ = 0; + int height_ = 0; + std::vector im_value_; +}; + class Transforms { public: - void Init(const YAML::Node& node, bool to_rgb = true); + void Init(const YAML::Node& node, std::string type, bool to_rgb = true); std::shared_ptr CreateTransform(const std::string& name); - bool Run(cv::Mat* im, Blob::Ptr blob); + bool Run(cv::Mat* im, ImageBlob* data); private: std::vector> transforms_; bool to_rgb_ = true; + std::string type_; }; } // namespace PaddleX diff --git a/deploy/openvino/include/paddlex/visualize.h b/deploy/openvino/include/paddlex/visualize.h new file mode 100644 index 0000000000000000000000000000000000000000..d3eb094f525dc2c4e878dbfe11916dc98c63dd49 --- /dev/null +++ b/deploy/openvino/include/paddlex/visualize.h @@ -0,0 +1,97 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
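+
+// Typical call pattern (illustrative sketch; assumes a loaded PaddleX::Model
+// `model` and a DetResult `result` populated by its predict call):
+//   auto colormap = PaddleX::GenerateColorMap(model.labels.size());
+//   cv::Mat vis = PaddleX::Visualize(im, result, model.labels, colormap, 0.5);
+//   cv::imwrite(PaddleX::generate_save_path(save_dir, image_path), vis);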
+
+#pragma once
+
+#include <iostream>
+#include <map>
+#include <vector>
+#ifdef _WIN32
+#include <direct.h>
+#include <io.h>
+#else  // Linux/Unix
+#include <dirent.h>
+#include <sys/io.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+#endif
+#include <string>
+
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+
+#include "include/paddlex/results.h"
+
+#ifdef _WIN32
+#define OS_PATH_SEP "\\"
+#else
+#define OS_PATH_SEP "/"
+#endif
+
+namespace PaddleX {
+
+/*
+ * @brief
+ * Generate visualization colormap for each class
+ *
+ * @param num_class: number of classes
+ * @return color map, the size of vector is 3 * num_class
+ * */
+std::vector<int> GenerateColorMap(int num_class);
+
+
+/*
+ * @brief
+ * Visualize the detection result
+ *
+ * @param img: initial image matrix
+ * @param results: the detection result
+ * @param labels: label map
+ * @param colormap: visualization color map
+ * @return visualized image matrix
+ * */
+cv::Mat Visualize(const cv::Mat& img,
+                  const DetResult& results,
+                  const std::map<int, std::string>& labels,
+                  const std::vector<int>& colormap,
+                  float threshold = 0.5);
+
+/*
+ * @brief
+ * Visualize the segmentation result
+ *
+ * @param img: initial image matrix
+ * @param result: the segmentation result
+ * @param labels: label map
+ * @param colormap: visualization color map
+ * @return visualized image matrix
+ * */
+cv::Mat Visualize(const cv::Mat& img,
+                  const SegResult& result,
+                  const std::map<int, std::string>& labels,
+                  const std::vector<int>& colormap);
+
+/*
+ * @brief
+ * generate save path for visualized image matrix
+ *
+ * @param save_dir: directory for saving visualized image matrix
+ * @param file_path: source image file path
+ * @return path of saving visualized result
+ * */
+std::string generate_save_path(const std::string& save_dir,
+                               const std::string& file_path);
+}  // namespace PaddleX
diff --git a/deploy/openvino/python/__init__.py b/deploy/openvino/python/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..abf198b97e6e818e1fbe59006f98492640bcee54
--- /dev/null
+++ b/deploy/openvino/python/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/deploy/openvino/python/convertor.py b/deploy/openvino/python/convertor.py
new file mode 100644
index 0000000000000000000000000000000000000000..f04720374b933f4472125a754a800ada1c48cae2
--- /dev/null
+++ b/deploy/openvino/python/convertor.py
@@ -0,0 +1,101 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
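+
+# Example invocation (illustrative sketch only; paths are placeholders and
+# the flags are defined in arg_parser below):
+#   python convertor.py --model_dir inference_model --save_dir openvino_model \
+#       --fixed_input_shape [224,224] --data_type FP32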
+
+import os
+from six import text_type as _text_type
+import argparse
+import sys
+from utils import logging
+import paddlex as pdx
+
+
+def arg_parser():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model_dir",
+        "-m",
+        type=_text_type,
+        default=None,
+        help="define model directory path")
+    parser.add_argument(
+        "--save_dir",
+        "-s",
+        type=_text_type,
+        default=None,
+        help="path to save inference model")
+    parser.add_argument(
+        "--fixed_input_shape",
+        "-fs",
+        default=None,
+        help="export openvino model with fixed input shape: [w,h]")
+    parser.add_argument(
+        "--data_type",
+        "-dp",
+        default="FP32",
+        help="optional, FP32 or FP16, the data type of the openvino IR")
+    return parser
+
+
+def export_openvino_model(model, args):
+    if model.model_type == "detector" or model.__class__.__name__ == "FastSCNN":
+        logging.error(
+            "Only image classification models and semantic segmentation models "
+            "(except FastSCNN) are supported for export to openvino")
+    try:
+        import x2paddle
+        if x2paddle.__version__ < '0.7.4':
+            logging.error("You need to upgrade x2paddle >= 0.7.4")
+    except ImportError:
+        logging.error(
+            "You need to install x2paddle first: pip install x2paddle>=0.7.4")
+
+    import x2paddle.convert as x2pc
+    x2pc.paddle2onnx(args.model_dir, args.save_dir)
+
+    import mo.main as mo
+    from mo.utils.cli_parser import get_onnx_cli_parser
+    onnx_parser = get_onnx_cli_parser()
+    onnx_parser.add_argument("--model_dir", type=_text_type)
+    onnx_parser.add_argument("--save_dir", type=_text_type)
+    onnx_parser.add_argument("--fixed_input_shape")
+    onnx_input = os.path.join(args.save_dir, 'x2paddle_model.onnx')
+    onnx_parser.set_defaults(input_model=onnx_input)
+    onnx_parser.set_defaults(output_dir=args.save_dir)
+    shape = '[1,3,'
+    shape = shape + args.fixed_input_shape[1:]
+    if model.__class__.__name__ == "YOLOV3":
+        shape = shape + ",[1,2]"
+        inputs = "image,im_size"
+        onnx_parser.set_defaults(input=inputs)
+    onnx_parser.set_defaults(input_shape=shape)
+    mo.main(onnx_parser, 'onnx')
+
+
+def main():
+    parser = arg_parser()
+    args = parser.parse_args()
+    assert args.model_dir is not None, "--model_dir should be defined while exporting openvino model"
+    assert args.save_dir is not None, "--save_dir should be defined to create openvino model"
+    model = pdx.load_model(args.model_dir)
+    if model.status == "Normal" or model.status == "Prune":
+        logging.error(
+            "Only inference models are supported; please export the model "
+            "as an inference model first,",
+            exit=False)
+    export_openvino_model(model, args)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/deploy/openvino/python/demo.py b/deploy/openvino/python/demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..93ecaab8e526977402a798c21b8b8c5696f1f70b
--- /dev/null
+++ b/deploy/openvino/python/demo.py
@@ -0,0 +1,78 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
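+
+# Example invocation (illustrative sketch only; paths are placeholders and
+# the flags are defined in arg_parser below):
+#   python demo.py --model_dir openvino_model/model.xml \
+#       --cfg_file openvino_model/model.yml --img test.jpg --device CPU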
+
+import sys
+import os
+import argparse
+import deploy
+
+
+def arg_parser():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model_dir",
+        "-m",
+        type=str,
+        default=None,
+        help="path to openvino model .xml file")
+    parser.add_argument(
+        "--device",
+        "-d",
+        type=str,
+        default='CPU',
+        help="Specify the target device to infer on: [CPU, GPU, FPGA, HDDL, MYRIAD, HETERO]. "
+        "Default value is CPU")
+    parser.add_argument(
+        "--img", "-i", type=str, default=None, help="path to an image file")
+    parser.add_argument(
+        "--img_list", "-l", type=str, default=None, help="path to an image list file")
+    parser.add_argument(
+        "--cfg_file",
+        "-c",
+        type=str,
+        default=None,
+        help="path to PaddleX model yml file")
+    return parser
+
+
+def main():
+    parser = arg_parser()
+    args = parser.parse_args()
+    model_xml = args.model_dir
+    model_yaml = args.cfg_file
+
+    # init model
+    if "CPU" not in args.device:
+        predictor = deploy.Predictor(model_xml, model_yaml, args.device)
+    else:
+        predictor = deploy.Predictor(model_xml, model_yaml)
+
+    # predict
+    if args.img_list is not None:
+        with open(args.img_list) as f:
+            for im_path in f:
+                print(im_path)
+                predictor.predict(im_path.strip('\n'))
+    else:
+        im_path = args.img
+        predictor.predict(im_path)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/deploy/openvino/python/deploy.py b/deploy/openvino/python/deploy.py
new file mode 100644
index 0000000000000000000000000000000000000000..b43f96d9894775d4bf0e54c5c8c56c4a9ed87fb4
--- /dev/null
+++ b/deploy/openvino/python/deploy.py
@@ -0,0 +1,227 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
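+
+# Intended API sketch (illustrative; this mirrors how demo.py drives the
+# Predictor class defined below):
+#   predictor = Predictor("model.xml", "model.yml", device="CPU")
+#   predictor.predict("test.jpg")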
+ +import sys +import os +import os.path as osp +import time +import cv2 +import numpy as np +import yaml +from six import text_type as _text_type +from openvino.inference_engine import IECore + + +class Predictor: + def __init__(self, model_xml, model_yaml, device="CPU"): + self.device = device + if not osp.exists(model_xml): + print("model xml file is not exists in {}".format(model_xml)) + self.model_xml = model_xml + self.model_bin = osp.splitext(model_xml)[0] + ".bin" + if not osp.exists(model_yaml): + print("model yaml file is not exists in {}".format(model_yaml)) + with open(model_yaml) as f: + self.info = yaml.load(f.read(), Loader=yaml.Loader) + self.model_type = self.info['_Attributes']['model_type'] + self.model_name = self.info['Model'] + self.num_classes = self.info['_Attributes']['num_classes'] + self.labels = self.info['_Attributes']['labels'] + if self.info['Model'] == 'MaskRCNN': + if self.info['_init_params']['with_fpn']: + self.mask_head_resolution = 28 + else: + self.mask_head_resolution = 14 + transforms_mode = self.info.get('TransformsMode', 'RGB') + if transforms_mode == 'RGB': + to_rgb = True + else: + to_rgb = False + self.transforms = self.build_transforms(self.info['Transforms'], + to_rgb) + self.predictor, self.net = self.create_predictor() + self.total_time = 0 + self.count_num = 0 + + def create_predictor(self): + + #initialization for specified device + print("Creating Inference Engine") + ie = IECore() + print("Loading network files:\n\t{}\n\t{}".format(self.model_xml, + self.model_bin)) + net = ie.read_network(model=self.model_xml, weights=self.model_bin) + net.batch_size = 1 + network_config = {} + if self.device == "MYRIAD": + network_config = {'VPU_HW_STAGES_OPTIMIZATION': 'NO'} + exec_net = ie.load_network( + network=net, device_name=self.device, config=network_config) + return exec_net, net + + def build_transforms(self, transforms_info, to_rgb=True): + if self.model_type == "classifier": + import transforms.cls_transforms as transforms + elif self.model_type == "detector": + import transforms.det_transforms as transforms + elif self.model_type == "segmenter": + import transforms.seg_transforms as transforms + op_list = list() + for op_info in transforms_info: + op_name = list(op_info.keys())[0] + op_attr = op_info[op_name] + if not hasattr(transforms, op_name): + raise Exception( + "There's no operator named '{}' in transforms of {}". 
+ format(op_name, self.model_type)) + op_list.append(getattr(transforms, op_name)(**op_attr)) + eval_transforms = transforms.Compose(op_list) + if hasattr(eval_transforms, 'to_rgb'): + eval_transforms.to_rgb = to_rgb + self.arrange_transforms(eval_transforms) + return eval_transforms + + def arrange_transforms(self, eval_transforms): + if self.model_type == 'classifier': + import transforms.cls_transforms as transforms + arrange_transform = transforms.ArrangeClassifier + elif self.model_type == 'segmenter': + import transforms.seg_transforms as transforms + arrange_transform = transforms.ArrangeSegmenter + elif self.model_type == 'detector': + import transforms.det_transforms as transforms + arrange_name = 'Arrange{}'.format(self.model_name) + arrange_transform = getattr(transforms, arrange_name) + else: + raise Exception("Unrecognized model type: {}".format( + self.model_type)) + if type(eval_transforms.transforms[-1]).__name__.startswith('Arrange'): + eval_transforms.transforms[-1] = arrange_transform(mode='test') + else: + eval_transforms.transforms.append(arrange_transform(mode='test')) + + def raw_predict(self, preprocessed_input): + self.count_num += 1 + feed_dict = {} + if self.model_name == "YOLOv3": + inputs = self.net.inputs + for name in inputs: + if (len(inputs[name].shape) == 2): + feed_dict[name] = preprocessed_input['im_size'] + elif (len(inputs[name].shape) == 4): + feed_dict[name] = preprocessed_input['image'] + else: + pass + else: + input_blob = next(iter(self.net.inputs)) + feed_dict[input_blob] = preprocessed_input['image'] + #Start sync inference + print("Starting inference in synchronous mode") + res = self.predictor.infer(inputs=feed_dict) + + #Processing output blob + print("Processing output blob") + return res + + def preprocess(self, image): + res = dict() + if self.model_type == "classifier": + im, = self.transforms(image) + im = np.expand_dims(im, axis=0).copy() + res['image'] = im + elif self.model_type == "detector": + if self.model_name == "YOLOv3": + im, im_shape = self.transforms(image) + im = np.expand_dims(im, axis=0).copy() + im_shape = np.expand_dims(im_shape, axis=0).copy() + res['image'] = im + res['im_size'] = im_shape + if self.model_name.count('RCNN') > 0: + im, im_resize_info, im_shape = self.transforms(image) + im = np.expand_dims(im, axis=0).copy() + im_resize_info = np.expand_dims(im_resize_info, axis=0).copy() + im_shape = np.expand_dims(im_shape, axis=0).copy() + res['image'] = im + res['im_info'] = im_resize_info + res['im_shape'] = im_shape + elif self.model_type == "segmenter": + im, im_info = self.transforms(image) + im = np.expand_dims(im, axis=0).copy() + res['image'] = im + res['im_info'] = im_info + return res + + def classifier_postprocess(self, preds, topk=1): + """ 对分类模型的预测结果做后处理 + """ + true_topk = min(self.num_classes, topk) + output_name = next(iter(self.net.outputs)) + pred_label = np.argsort(-preds[output_name][0])[:true_topk] + result = [{ + 'category_id': l, + 'category': self.labels[l], + 'score': preds[output_name][0][l], + } for l in pred_label] + print(result) + return result + + def segmenter_postprocess(self, preds, preprocessed_inputs): + """ 对语义分割结果做后处理 + """ + it = iter(self.net.outputs) + next(it) + score_name = next(it) + score_map = np.squeeze(preds[score_name]) + score_map = np.transpose(score_map, (1, 2, 0)) + label_name = next(it) + label_map = np.squeeze(preds[label_name]).astype('uint8') + im_info = preprocessed_inputs['im_info'] + for info in im_info[::-1]: + if info[0] == 'resize': + w, h = info[1][1], 
info[1][0]
+                label_map = cv2.resize(
+                    label_map, (w, h), interpolation=cv2.INTER_NEAREST)
+                score_map = cv2.resize(
+                    score_map, (w, h), interpolation=cv2.INTER_LINEAR)
+            elif info[0] == 'padding':
+                w, h = info[1][1], info[1][0]
+                label_map = label_map[0:h, 0:w]
+                score_map = score_map[0:h, 0:w, :]
+            else:
+                raise Exception("Unexpected info '{}' in im_info".format(
+                    info[0]))
+        return {'label_map': label_map, 'score_map': score_map}
+
+    def detector_postprocess(self, preds, preprocessed_inputs):
+        """Post-process the detection results: keep the output boxes with a
+        positive class id.
+        """
+        output_name = next(iter(self.net.outputs))
+        outputs = preds[output_name][0]
+        result = []
+        for out in outputs:
+            if out[0] > 0:
+                result.append(out.tolist())
+        print(result)
+        return result
+
+    def predict(self, image, topk=1, threshold=0.5):
+        preprocessed_input = self.preprocess(image)
+        model_pred = self.raw_predict(preprocessed_input)
+        if self.model_type == "classifier":
+            results = self.classifier_postprocess(model_pred, topk)
+        elif self.model_type == "detector":
+            results = self.detector_postprocess(model_pred, preprocessed_input)
+        elif self.model_type == "segmenter":
+            results = self.segmenter_postprocess(model_pred,
+                                                 preprocessed_input)
+        return results
diff --git a/deploy/openvino/python/transforms/__init__.py b/deploy/openvino/python/transforms/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..9ec4809004549b5d564e7d69feb5d3a32fbebc98
--- /dev/null
+++ b/deploy/openvino/python/transforms/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import cls_transforms
+from . import det_transforms
+from . import seg_transforms
diff --git a/deploy/openvino/python/transforms/cls_transforms.py b/deploy/openvino/python/transforms/cls_transforms.py
new file mode 100644
index 0000000000000000000000000000000000000000..120c2699238e99d57316eba86ebb2e845d4f3435
--- /dev/null
+++ b/deploy/openvino/python/transforms/cls_transforms.py
@@ -0,0 +1,281 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
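+
+# Illustrative composition of the operators defined below (a sketch of the
+# eval pipeline that ComposedClsTransforms builds at the end of this file):
+#   transforms = Compose([ResizeByShort(short_size=256),
+#                         CenterCrop(crop_size=224),
+#                         Normalize()])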
+ +from .ops import * +import random +import os.path as osp +import numpy as np +from PIL import Image, ImageEnhance + + +class ClsTransform: + """分类Transform的基类 + """ + + def __init__(self): + pass + + +class Compose(ClsTransform): + """根据数据预处理/增强算子对输入数据进行操作。 + 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。 + + Args: + transforms (list): 数据预处理/增强算子。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + + def __init__(self, transforms): + if not isinstance(transforms, list): + raise TypeError('The transforms must be a list!') + if len(transforms) < 1: + raise ValueError('The length of transforms ' + \ + 'must be equal or larger than 1!') + self.transforms = transforms + + def __call__(self, im, label=None): + """ + Args: + im (str/np.ndarray): 图像路径/图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + Returns: + tuple: 根据网络所需字段所组成的tuple; + 字段由transforms中的最后一个数据预处理操作决定。 + """ + if isinstance(im, np.ndarray): + if len(im.shape) != 3: + raise Exception( + "im should be 3-dimension, but now is {}-dimensions". + format(len(im.shape))) + else: + try: + im = cv2.imread(im).astype('float32') + except: + raise TypeError('Can\'t read The image file {}!'.format(im)) + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + for op in self.transforms: + outputs = op(im, label) + im = outputs[0] + if len(outputs) == 2: + label = outputs[1] + return outputs + + def add_augmenters(self, augmenters): + if not isinstance(augmenters, list): + raise Exception( + "augmenters should be list type in func add_augmenters()") + transform_names = [type(x).__name__ for x in self.transforms] + for aug in augmenters: + if type(aug).__name__ in transform_names: + print( + "{} is already in ComposedTransforms, need to remove it from add_augmenters().". + format(type(aug).__name__)) + self.transforms = augmenters + self.transforms + + +class Normalize(ClsTransform): + """对图像进行标准化。 + + 1. 对图像进行归一化到区间[0.0, 1.0]。 + 2. 对图像进行减均值除以标准差操作。 + + Args: + mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。 + std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。 + + """ + + def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): + self.mean = mean + self.std = std + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据; + 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。 + """ + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im = normalize(im, mean, std) + if label is None: + return (im, ) + else: + return (im, label) + + +class ResizeByShort(ClsTransform): + """根据图像短边对图像重新调整大小(resize)。 + + 1. 获取图像的长边和短边长度。 + 2. 根据短边与short_size的比例,计算长边的目标长度, + 此时高、宽的resize比例为short_size/原图短边长度。 + 3. 如果max_size>0,调整resize比例: + 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度; + 4. 
根据调整大小的比例对图像进行resize。 + + Args: + short_size (int): 调整大小后的图像目标短边长度。默认为256。 + max_size (int): 长边目标长度的最大限制。默认为-1。 + """ + + def __init__(self, short_size=256, max_size=-1): + self.short_size = short_size + self.max_size = max_size + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据; + 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。 + """ + im_short_size = min(im.shape[0], im.shape[1]) + im_long_size = max(im.shape[0], im.shape[1]) + scale = float(self.short_size) / im_short_size + if self.max_size > 0 and np.round(scale * + im_long_size) > self.max_size: + scale = float(self.max_size) / float(im_long_size) + resized_width = int(round(im.shape[1] * scale)) + resized_height = int(round(im.shape[0] * scale)) + im = cv2.resize( + im, (resized_width, resized_height), + interpolation=cv2.INTER_LINEAR) + + if label is None: + return (im, ) + else: + return (im, label) + + +class CenterCrop(ClsTransform): + """以图像中心点扩散裁剪长宽为`crop_size`的正方形 + + 1. 计算剪裁的起始点。 + 2. 剪裁图像。 + + Args: + crop_size (int): 裁剪的目标边长。默认为224。 + """ + + def __init__(self, crop_size=224): + self.crop_size = crop_size + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据; + 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。 + """ + im = center_crop(im, self.crop_size) + if label is None: + return (im, ) + else: + return (im, label) + + +class ArrangeClassifier(ClsTransform): + """获取训练/验证/预测所需信息。注意:此操作不需用户自己显示调用 + + Args: + mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。 + + Raises: + ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。 + """ + + def __init__(self, mode=None): + if mode not in ['train', 'eval', 'test', 'quant']: + raise ValueError( + "mode must be in ['train', 'eval', 'test', 'quant']!") + self.mode = mode + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当mode为'train'或'eval'时,返回(im, label),分别对应图像np.ndarray数据、 + 图像类别id;当mode为'test'或'quant'时,返回(im, ),对应图像np.ndarray数据。 + """ + im = permute(im, False).astype('float32') + if self.mode == 'train' or self.mode == 'eval': + outputs = (im, label) + else: + outputs = (im, ) + return outputs + + +class ComposedClsTransforms(Compose): + """ 分类模型的基础Transforms流程,具体如下 + 训练阶段: + 1. 随机从图像中crop一块子图,并resize成crop_size大小 + 2. 将1的输出按0.5的概率随机进行水平翻转 + 3. 将图像进行归一化 + 验证/预测阶段: + 1. 将图像按比例Resize,使得最小边长度为crop_size[0] * 1.14 + 2. 从图像中心crop出一个大小为crop_size的图像 + 3. 
将图像进行归一化 + + Args: + mode(str): 图像处理流程所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test' + crop_size(int|list): 输入模型里的图像大小 + mean(list): 图像均值 + std(list): 图像方差 + """ + + def __init__(self, + mode, + crop_size=[224, 224], + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]): + width = crop_size + if isinstance(crop_size, list): + if crop_size[0] != crop_size[1]: + raise Exception( + "In classifier model, width and height should be equal, please modify your parameter `crop_size`" + ) + width = crop_size[0] + if width % 32 != 0: + raise Exception( + "In classifier model, width and height should be multiple of 32, e.g 224、256、320...., please modify your parameter `crop_size`" + ) + + if mode == 'train': + pass + else: + # 验证/预测时的transforms + transforms = [ + ResizeByShort(short_size=int(width * 1.14)), + CenterCrop(crop_size=width), Normalize( + mean=mean, std=std) + ] + + super(ComposedClsTransforms, self).__init__(transforms) diff --git a/deploy/openvino/python/transforms/det_transforms.py b/deploy/openvino/python/transforms/det_transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..0e2d1dc30c0d0bb768839709da9cd74f2140d84a --- /dev/null +++ b/deploy/openvino/python/transforms/det_transforms.py @@ -0,0 +1,540 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
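+
+# Illustrative use of the operators defined below (a sketch only; deploy.py
+# appends the matching Arrange* step before running the composed transforms):
+#   transforms = Compose([ResizeByShort(short_size=800, max_size=1333),
+#                         Normalize(),
+#                         Padding(coarsest_stride=32)])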
+ +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +import random +import os.path as osp +import numpy as np + +import cv2 +from PIL import Image, ImageEnhance + +from .ops import * + + +class DetTransform: + """检测数据处理基类 + """ + + def __init__(self): + pass + + +class Compose(DetTransform): + """根据数据预处理/增强列表对输入数据进行操作。 + 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。 + + Args: + transforms (list): 数据预处理/增强列表。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + + def __init__(self, transforms): + if not isinstance(transforms, list): + raise TypeError('The transforms must be a list!') + if len(transforms) < 1: + raise ValueError('The length of transforms ' + \ + 'must be equal or larger than 1!') + self.transforms = transforms + self.use_mixup = False + for t in self.transforms: + if type(t).__name__ == 'MixupImage': + self.use_mixup = True + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (str/np.ndarray): 图像路径/图像np.ndarray数据。 + im_info (dict): 存储与图像相关的信息,dict中的字段如下: + - im_id (np.ndarray): 图像序列号,形状为(1,)。 + - image_shape (np.ndarray): 图像原始大小,形状为(2,), + image_shape[0]为高,image_shape[1]为宽。 + - mixup (list): list为[im, im_info, label_info],分别对应 + 与当前图像进行mixup的图像np.ndarray数据、图像相关信息、标注框相关信息; + 注意,当前epoch若无需进行mixup,则无该字段。 + label_info (dict): 存储与标注框相关的信息,dict中的字段如下: + - gt_bbox (np.ndarray): 真实标注框坐标[x1, y1, x2, y2],形状为(n, 4), + 其中n代表真实标注框的个数。 + - gt_class (np.ndarray): 每个真实标注框对应的类别序号,形状为(n, 1), + 其中n代表真实标注框的个数。 + - gt_score (np.ndarray): 每个真实标注框对应的混合得分,形状为(n, 1), + 其中n代表真实标注框的个数。 + - gt_poly (list): 每个真实标注框内的多边形分割区域,每个分割区域由点的x、y坐标组成, + 长度为n,其中n代表真实标注框的个数。 + - is_crowd (np.ndarray): 每个真实标注框中是否是一组对象,形状为(n, 1), + 其中n代表真实标注框的个数。 + - difficult (np.ndarray): 每个真实标注框中的对象是否为难识别对象,形状为(n, 1), + 其中n代表真实标注框的个数。 + Returns: + tuple: 根据网络所需字段所组成的tuple; + 字段由transforms中的最后一个数据预处理操作决定。 + """ + + def decode_image(im_file, im_info, label_info): + if im_info is None: + im_info = dict() + if isinstance(im_file, np.ndarray): + if len(im_file.shape) != 3: + raise Exception( + "im should be 3-dimensions, but now is {}-dimensions". + format(len(im_file.shape))) + im = im_file + else: + try: + im = cv2.imread(im_file).astype('float32') + except: + raise TypeError('Can\'t read The image file {}!'.format( + im_file)) + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + # make default im_info with [h, w, 1] + im_info['im_resize_info'] = np.array( + [im.shape[0], im.shape[1], 1.], dtype=np.float32) + im_info['image_shape'] = np.array([im.shape[0], + im.shape[1]]).astype('int32') + if not self.use_mixup: + if 'mixup' in im_info: + del im_info['mixup'] + # decode mixup image + if 'mixup' in im_info: + im_info['mixup'] = \ + decode_image(im_info['mixup'][0], + im_info['mixup'][1], + im_info['mixup'][2]) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + outputs = decode_image(im, im_info, label_info) + im = outputs[0] + im_info = outputs[1] + if len(outputs) == 3: + label_info = outputs[2] + for op in self.transforms: + if im is None: + return None + outputs = op(im, im_info, label_info) + im = outputs[0] + return outputs + + def add_augmenters(self, augmenters): + if not isinstance(augmenters, list): + raise Exception( + "augmenters should be list type in func add_augmenters()") + transform_names = [type(x).__name__ for x in self.transforms] + for aug in augmenters: + if type(aug).__name__ in transform_names: + print( + "{} is already in ComposedTransforms, need to remove it from add_augmenters().". 
+ format(type(aug).__name__)) + self.transforms = augmenters + self.transforms + + +class ResizeByShort(DetTransform): + """根据图像的短边调整图像大小(resize)。 + + 1. 获取图像的长边和短边长度。 + 2. 根据短边与short_size的比例,计算长边的目标长度, + 此时高、宽的resize比例为short_size/原图短边长度。 + 3. 如果max_size>0,调整resize比例: + 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。 + 4. 根据调整大小的比例对图像进行resize。 + + Args: + target_size (int): 短边目标长度。默认为800。 + max_size (int): 长边目标长度的最大限制。默认为1333。 + + Raises: + TypeError: 形参数据类型不满足需求。 + """ + + def __init__(self, short_size=800, max_size=1333): + self.max_size = int(max_size) + if not isinstance(short_size, int): + raise TypeError( + "Type of short_size is invalid. Must be Integer, now is {}". + format(type(short_size))) + self.short_size = short_size + if not (isinstance(self.max_size, int)): + raise TypeError("max_size: input type is invalid.") + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (numnp.ndarraypy): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + 其中,im_info更新字段为: + - im_resize_info (np.ndarray): resize后的图像高、resize后的图像宽、resize后的图像相对原始图的缩放比例 + 三者组成的np.ndarray,形状为(3,)。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + if im_info is None: + im_info = dict() + if not isinstance(im, np.ndarray): + raise TypeError("ResizeByShort: image type is not numpy.") + if len(im.shape) != 3: + raise ValueError('ResizeByShort: image is not 3-dimensional.') + im_short_size = min(im.shape[0], im.shape[1]) + im_long_size = max(im.shape[0], im.shape[1]) + scale = float(self.short_size) / im_short_size + if self.max_size > 0 and np.round(scale * + im_long_size) > self.max_size: + scale = float(self.max_size) / float(im_long_size) + resized_width = int(round(im.shape[1] * scale)) + resized_height = int(round(im.shape[0] * scale)) + im_resize_info = [resized_height, resized_width, scale] + im = cv2.resize( + im, (resized_width, resized_height), + interpolation=cv2.INTER_LINEAR) + im_info['im_resize_info'] = np.array(im_resize_info).astype(np.float32) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + +class Padding(DetTransform): + """1.将图像的长和宽padding至coarsest_stride的倍数。如输入图像为[300, 640], + `coarest_stride`为32,则由于300不为32的倍数,因此在图像最右和最下使用0值 + 进行padding,最终输出图像为[320, 640]。 + 2.或者,将图像的长和宽padding到target_size指定的shape,如输入的图像为[300,640], + a. `target_size` = 960,在图像最右和最下使用0值进行padding,最终输出 + 图像为[960, 960]。 + b. `target_size` = [640, 960],在图像最右和最下使用0值进行padding,最终 + 输出图像为[640, 960]。 + + 1. 如果coarsest_stride为1,target_size为None则直接返回。 + 2. 获取图像的高H、宽W。 + 3. 计算填充后图像的高H_new、宽W_new。 + 4. 构建大小为(H_new, W_new, 3)像素值为0的np.ndarray, + 并将原图的np.ndarray粘贴于左上角。 + + Args: + coarsest_stride (int): 填充后的图像长、宽为该参数的倍数,默认为1。 + target_size (int|list|tuple): 填充后的图像长、宽,默认为None,coarset_stride优先级更高。 + + Raises: + TypeError: 形参`target_size`数据类型不满足需求。 + ValueError: 形参`target_size`为(list|tuple)时,长度不满足需求。 + """ + + def __init__(self, coarsest_stride=1, target_size=None): + self.coarsest_stride = coarsest_stride + if target_size is not None: + if not isinstance(target_size, int): + if not isinstance(target_size, tuple) and not isinstance( + target_size, list): + raise TypeError( + "Padding: Type of target_size must in (int|list|tuple)." 
+ ) + elif len(target_size) != 2: + raise ValueError( + "Padding: Length of target_size must equal 2.") + self.target_size = target_size + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (numnp.ndarraypy): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + ValueError: coarsest_stride,target_size需有且只有一个被指定。 + ValueError: target_size小于原图的大小。 + """ + if im_info is None: + im_info = dict() + if not isinstance(im, np.ndarray): + raise TypeError("Padding: image type is not numpy.") + if len(im.shape) != 3: + raise ValueError('Padding: image is not 3-dimensional.') + im_h, im_w, im_c = im.shape[:] + + if isinstance(self.target_size, int): + padding_im_h = self.target_size + padding_im_w = self.target_size + elif isinstance(self.target_size, list) or isinstance(self.target_size, + tuple): + padding_im_w = self.target_size[0] + padding_im_h = self.target_size[1] + elif self.coarsest_stride > 0: + padding_im_h = int( + np.ceil(im_h / self.coarsest_stride) * self.coarsest_stride) + padding_im_w = int( + np.ceil(im_w / self.coarsest_stride) * self.coarsest_stride) + else: + raise ValueError( + "coarsest_stridei(>1) or target_size(list|int) need setting in Padding transform" + ) + pad_height = padding_im_h - im_h + pad_width = padding_im_w - im_w + if pad_height < 0 or pad_width < 0: + raise ValueError( + 'the size of image should be less than target_size, but the size of image ({}, {}), is larger than target_size ({}, {})' + .format(im_w, im_h, padding_im_w, padding_im_h)) + padding_im = np.zeros( + (padding_im_h, padding_im_w, im_c), dtype=np.float32) + padding_im[:im_h, :im_w, :] = im + if label_info is None: + return (padding_im, im_info) + else: + return (padding_im, im_info, label_info) + + +class Resize(DetTransform): + """调整图像大小(resize)。 + + - 当目标大小(target_size)类型为int时,根据插值方式, + 将图像resize为[target_size, target_size]。 + - 当目标大小(target_size)类型为list或tuple时,根据插值方式, + 将图像resize为target_size。 + 注意:当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize。 + + Args: + target_size (int/list/tuple): 短边目标长度。默认为608。 + interp (str): resize的插值方式,与opencv的插值方式对应,取值范围为 + ['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"LINEAR"。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 插值方式不在['NEAREST', 'LINEAR', 'CUBIC', + 'AREA', 'LANCZOS4', 'RANDOM']中。 + """ + + # The interpolation mode + interp_dict = { + 'NEAREST': cv2.INTER_NEAREST, + 'LINEAR': cv2.INTER_LINEAR, + 'CUBIC': cv2.INTER_CUBIC, + 'AREA': cv2.INTER_AREA, + 'LANCZOS4': cv2.INTER_LANCZOS4 + } + + def __init__(self, target_size=608, interp='LINEAR'): + self.interp = interp + if not (interp == "RANDOM" or interp in self.interp_dict): + raise ValueError("interp should be one of {}".format( + self.interp_dict.keys())) + if isinstance(target_size, list) or isinstance(target_size, tuple): + if len(target_size) != 2: + raise TypeError( + 'when target is list or tuple, it should include 2 elements, but it is {}' + .format(target_size)) + elif not isinstance(target_size, int): + raise TypeError( + "Type of target_size is invalid. 
Must be Integer or List or tuple, now is {}" + .format(type(target_size))) + + self.target_size = target_size + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + if im_info is None: + im_info = dict() + if not isinstance(im, np.ndarray): + raise TypeError("Resize: image type is not numpy.") + if len(im.shape) != 3: + raise ValueError('Resize: image is not 3-dimensional.') + if self.interp == "RANDOM": + interp = random.choice(list(self.interp_dict.keys())) + else: + interp = self.interp + im = resize(im, self.target_size, self.interp_dict[interp]) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + +class Normalize(DetTransform): + """对图像进行标准化。 + + 1. 归一化图像到到区间[0.0, 1.0]。 + 2. 对图像进行减均值除以标准差操作。 + + Args: + mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。 + std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。 + + Raises: + TypeError: 形参数据类型不满足需求。 + """ + + def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): + self.mean = mean + self.std = std + if not (isinstance(self.mean, list) and isinstance(self.std, list)): + raise TypeError("NormalizeImage: input type is invalid.") + from functools import reduce + if reduce(lambda x, y: x * y, self.std) == 0: + raise TypeError('NormalizeImage: std is invalid!') + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (numnp.ndarraypy): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + """ + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im = normalize(im, mean, std) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + +class ArrangeYOLOv3(DetTransform): + """获取YOLOv3模型训练/验证/预测所需信息。 + + Args: + mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。 + + Raises: + ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。 + """ + + def __init__(self, mode=None): + if mode not in ['train', 'eval', 'test', 'quant']: + raise ValueError( + "mode must be in ['train', 'eval', 'test', 'quant']!") + self.mode = mode + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当mode为'train'时,返回(im, gt_bbox, gt_class, gt_score, im_shape),分别对应 + 图像np.ndarray数据、真实标注框、真实标注框对应的类别、真实标注框混合得分、图像大小信息; + 当mode为'eval'时,返回(im, im_shape, im_id, gt_bbox, gt_class, difficult), + 分别对应图像np.ndarray数据、图像大小信息、图像id、真实标注框、真实标注框对应的类别、 + 真实标注框是否为难识别对象;当mode为'test'或'quant'时,返回(im, im_shape), + 分别对应图像np.ndarray数据、图像大小信息。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + im = permute(im, False) + if self.mode == 'train': + pass + elif self.mode == 'eval': + pass + else: + if im_info is None: + raise TypeError('Cannot do ArrangeYolov3! 
' + + 'Becasuse the im_info can not be None!') + im_shape = im_info['image_shape'] + outputs = (im, im_shape) + return outputs + + +class ComposedYOLOv3Transforms(Compose): + """YOLOv3模型的图像预处理流程,具体如下, + 训练阶段: + 1. 在前mixup_epoch轮迭代中,使用MixupImage策略,见https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#mixupimage + 2. 对图像进行随机扰动,包括亮度,对比度,饱和度和色调 + 3. 随机扩充图像,见https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#randomexpand + 4. 随机裁剪图像 + 5. 将4步骤的输出图像Resize成shape参数的大小 + 6. 随机0.5的概率水平翻转图像 + 7. 图像归一化 + 验证/预测阶段: + 1. 将图像Resize成shape参数大小 + 2. 图像归一化 + + Args: + mode(str): 图像处理流程所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test' + shape(list): 输入模型中图像的大小,输入模型的图像会被Resize成此大小 + mixup_epoch(int): 模型训练过程中,前mixup_epoch会使用mixup策略 + mean(list): 图像均值 + std(list): 图像方差 + """ + + def __init__(self, + mode, + shape=[608, 608], + mixup_epoch=250, + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]): + width = shape + if isinstance(shape, list): + if shape[0] != shape[1]: + raise Exception( + "In YOLOv3 model, width and height should be equal") + width = shape[0] + if width % 32 != 0: + raise Exception( + "In YOLOv3 model, width and height should be multiple of 32, e.g 224、256、320...." + ) + + if mode == 'train': + # 训练时的transforms,包含数据增强 + pass + else: + # 验证/预测时的transforms + transforms = [ + Resize( + target_size=width, interp='CUBIC'), Normalize( + mean=mean, std=std) + ] + super(ComposedYOLOv3Transforms, self).__init__(transforms) diff --git a/deploy/openvino/python/transforms/ops.py b/deploy/openvino/python/transforms/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..3f298d7824be48355b69973a1e14486172efcb08 --- /dev/null +++ b/deploy/openvino/python/transforms/ops.py @@ -0,0 +1,186 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import math +import numpy as np +from PIL import Image, ImageEnhance + + +def normalize(im, mean, std): + im = im / 255.0 + im -= mean + im /= std + return im + + +def permute(im, to_bgr=False): + im = np.swapaxes(im, 1, 2) + im = np.swapaxes(im, 1, 0) + if to_bgr: + im = im[[2, 1, 0], :, :] + return im + + +def resize_long(im, long_size=224, interpolation=cv2.INTER_LINEAR): + value = max(im.shape[0], im.shape[1]) + scale = float(long_size) / float(value) + resized_width = int(round(im.shape[1] * scale)) + resized_height = int(round(im.shape[0] * scale)) + + im = cv2.resize( + im, (resized_width, resized_height), interpolation=interpolation) + return im + + +def resize(im, target_size=608, interp=cv2.INTER_LINEAR): + if isinstance(target_size, list) or isinstance(target_size, tuple): + w = target_size[0] + h = target_size[1] + else: + w = target_size + h = target_size + im = cv2.resize(im, (w, h), interpolation=interp) + return im + + +def random_crop(im, + crop_size=224, + lower_scale=0.08, + lower_ratio=3. / 4, + upper_ratio=4. 
/ 3): + scale = [lower_scale, 1.0] + ratio = [lower_ratio, upper_ratio] + aspect_ratio = math.sqrt(np.random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. / aspect_ratio + bound = min((float(im.shape[0]) / im.shape[1]) / (h**2), + (float(im.shape[1]) / im.shape[0]) / (w**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + target_area = im.shape[0] * im.shape[1] * np.random.uniform( + scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + i = np.random.randint(0, im.shape[0] - h + 1) + j = np.random.randint(0, im.shape[1] - w + 1) + im = im[i:i + h, j:j + w, :] + im = cv2.resize(im, (crop_size, crop_size)) + return im + + +def center_crop(im, crop_size=224): + height, width = im.shape[:2] + w_start = (width - crop_size) // 2 + h_start = (height - crop_size) // 2 + w_end = w_start + crop_size + h_end = h_start + crop_size + im = im[h_start:h_end, w_start:w_end, :] + return im + + +def horizontal_flip(im): + if len(im.shape) == 3: + im = im[:, ::-1, :] + elif len(im.shape) == 2: + im = im[:, ::-1] + return im + + +def vertical_flip(im): + if len(im.shape) == 3: + im = im[::-1, :, :] + elif len(im.shape) == 2: + im = im[::-1, :] + return im + + +def bgr2rgb(im): + return im[:, :, ::-1] + + +def hue(im, hue_lower, hue_upper): + delta = np.random.uniform(hue_lower, hue_upper) + u = np.cos(delta * np.pi) + w = np.sin(delta * np.pi) + bt = np.array([[1.0, 0.0, 0.0], [0.0, u, -w], [0.0, w, u]]) + tyiq = np.array([[0.299, 0.587, 0.114], [0.596, -0.274, -0.321], + [0.211, -0.523, 0.311]]) + ityiq = np.array([[1.0, 0.956, 0.621], [1.0, -0.272, -0.647], + [1.0, -1.107, 1.705]]) + t = np.dot(np.dot(ityiq, bt), tyiq).T + im = np.dot(im, t) + return im + + +def saturation(im, saturation_lower, saturation_upper): + delta = np.random.uniform(saturation_lower, saturation_upper) + gray = im * np.array([[[0.299, 0.587, 0.114]]], dtype=np.float32) + gray = gray.sum(axis=2, keepdims=True) + gray *= (1.0 - delta) + im *= delta + im += gray + return im + + +def contrast(im, contrast_lower, contrast_upper): + delta = np.random.uniform(contrast_lower, contrast_upper) + im *= delta + return im + + +def brightness(im, brightness_lower, brightness_upper): + delta = np.random.uniform(brightness_lower, brightness_upper) + im += delta + return im + +def rotate(im, rotate_lower, rotate_upper): + rotate_delta = np.random.uniform(rotate_lower, rotate_upper) + im = im.rotate(int(rotate_delta)) + return im + + +def resize_padding(im, max_side_len=2400): + ''' + resize image to a size multiple of 32 which is required by the network + :param im: the resized image + :param max_side_len: limit of max image size to avoid out of memory in gpu + :return: the resized image and the resize ratio + ''' + h, w, _ = im.shape + + resize_w = w + resize_h = h + + # limit the max side + if max(resize_h, resize_w) > max_side_len: + ratio = float( + max_side_len) / resize_h if resize_h > resize_w else float( + max_side_len) / resize_w + else: + ratio = 1. 
+ resize_h = int(resize_h * ratio) + resize_w = int(resize_w * ratio) + + resize_h = resize_h if resize_h % 32 == 0 else (resize_h // 32 - 1) * 32 + resize_w = resize_w if resize_w % 32 == 0 else (resize_w // 32 - 1) * 32 + resize_h = max(32, resize_h) + resize_w = max(32, resize_w) + im = cv2.resize(im, (int(resize_w), int(resize_h))) + #im = cv2.resize(im, (512, 512)) + ratio_h = resize_h / float(h) + ratio_w = resize_w / float(w) + _ratio = np.array([ratio_h, ratio_w]).reshape(-1, 2) + return im, _ratio diff --git a/deploy/openvino/python/transforms/seg_transforms.py b/deploy/openvino/python/transforms/seg_transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..a3fb6241d415939a33f73a29b843f9ed45976463 --- /dev/null +++ b/deploy/openvino/python/transforms/seg_transforms.py @@ -0,0 +1,1054 @@ +# coding: utf8 +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .ops import * +import random +import os.path as osp +import numpy as np +from PIL import Image +import cv2 +from collections import OrderedDict + + +class SegTransform: + """ 分割transform基类 + """ + + def __init__(self): + pass + + +class Compose(SegTransform): + """根据数据预处理/增强算子对输入数据进行操作。 + 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。 + + Args: + transforms (list): 数据预处理/增强算子。 + + Raises: + TypeError: transforms不是list对象 + ValueError: transforms元素个数小于1。 + + """ + + def __init__(self, transforms): + if not isinstance(transforms, list): + raise TypeError('The transforms must be a list!') + if len(transforms) < 1: + raise ValueError('The length of transforms ' + \ + 'must be equal or larger than 1!') + self.transforms = transforms + self.to_rgb = False + + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (str/np.ndarray): 图像路径/图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (str/np.ndarray): 标注图像路径/标注图像np.ndarray数据。 + + Returns: + tuple: 根据网络所需字段所组成的tuple;字段由transforms中的最后一个数据预处理操作决定。 + """ + + if im_info is None: + im_info = list() + if isinstance(im, np.ndarray): + if len(im.shape) != 3: + raise Exception( + "im should be 3-dimensions, but now is {}-dimensions". 
+ format(len(im.shape))) + else: + try: + im = cv2.imread(im).astype('float32') + except: + raise ValueError('Can\'t read The image file {}!'.format(im)) + if self.to_rgb: + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + if label is not None: + if not isinstance(label, np.ndarray): + label = np.asarray(Image.open(label)) + for op in self.transforms: + if isinstance(op, SegTransform): + outputs = op(im, im_info, label) + im = outputs[0] + if len(outputs) >= 2: + im_info = outputs[1] + if len(outputs) == 3: + label = outputs[2] + else: + im = execute_imgaug(op, im) + if label is not None: + outputs = (im, im_info, label) + else: + outputs = (im, im_info) + return outputs + + def add_augmenters(self, augmenters): + if not isinstance(augmenters, list): + raise Exception( + "augmenters should be list type in func add_augmenters()") + transform_names = [type(x).__name__ for x in self.transforms] + for aug in augmenters: + if type(aug).__name__ in transform_names: + print("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__)) + self.transforms = augmenters + self.transforms + + +class RandomHorizontalFlip(SegTransform): + """以一定的概率对图像进行水平翻转。当存在标注图像时,则同步进行翻转。 + + Args: + prob (float): 随机水平翻转的概率。默认值为0.5。 + + """ + + def __init__(self, prob=0.5): + self.prob = prob + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if random.random() < self.prob: + im = horizontal_flip(im) + if label is not None: + label = horizontal_flip(label) + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class RandomVerticalFlip(SegTransform): + """以一定的概率对图像进行垂直翻转。当存在标注图像时,则同步进行翻转。 + + Args: + prob (float): 随机垂直翻转的概率。默认值为0.1。 + """ + + def __init__(self, prob=0.1): + self.prob = prob + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if random.random() < self.prob: + im = vertical_flip(im) + if label is not None: + label = vertical_flip(label) + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class Resize(SegTransform): + """调整图像大小(resize),当存在标注图像时,则同步进行处理。 + + - 当目标大小(target_size)类型为int时,根据插值方式, + 将图像resize为[target_size, target_size]。 + - 当目标大小(target_size)类型为list或tuple时,根据插值方式, + 将图像resize为target_size, target_size的输入应为[w, h]或(w, h)。 + + Args: + target_size (int|list|tuple): 目标大小。 + interp (str): resize的插值方式,与opencv的插值方式对应, + 可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。 + + Raises: + TypeError: target_size不是int/list/tuple。 + ValueError: target_size为list/tuple时元素个数不等于2。 + AssertionError: interp的取值不在['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4']之内。 + """ + + # The interpolation mode + interp_dict = { + 
'NEAREST': cv2.INTER_NEAREST,
+        'LINEAR': cv2.INTER_LINEAR,
+        'CUBIC': cv2.INTER_CUBIC,
+        'AREA': cv2.INTER_AREA,
+        'LANCZOS4': cv2.INTER_LANCZOS4
+    }
+
+    def __init__(self, target_size, interp='LINEAR'):
+        self.interp = interp
+        assert interp in self.interp_dict, "interp should be one of {}".format(
+            self.interp_dict.keys())
+        if isinstance(target_size, list) or isinstance(target_size, tuple):
+            if len(target_size) != 2:
+                raise ValueError(
+                    'when target is list or tuple, it should include 2 elements, but it is {}'
+                    .format(target_size))
+        elif not isinstance(target_size, int):
+            raise TypeError(
+                "Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
+                .format(type(target_size)))
+
+        self.target_size = target_size
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典和标注图像np.ndarray数据。
+                其中,im_info更新字段为:
+                    -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+
+        Raises:
+            ZeroDivisionError: im的短边为0。
+            TypeError: im不是np.ndarray数据。
+            ValueError: im不是3维np.ndarray。
+        """
+        if im_info is None:
+            im_info = list()
+        im_info.append(('resize', im.shape[:2]))
+
+        if not isinstance(im, np.ndarray):
+            raise TypeError("ResizeImage: image type is not np.ndarray.")
+        if len(im.shape) != 3:
+            raise ValueError('ResizeImage: image is not 3-dimensional.')
+        im_shape = im.shape
+        im_size_min = np.min(im_shape[0:2])
+        im_size_max = np.max(im_shape[0:2])
+        if float(im_size_min) == 0:
+            raise ZeroDivisionError('ResizeImage: min size of image is 0')
+
+        if isinstance(self.target_size, int):
+            resize_w = self.target_size
+            resize_h = self.target_size
+        else:
+            resize_w = self.target_size[0]
+            resize_h = self.target_size[1]
+        im_scale_x = float(resize_w) / float(im_shape[1])
+        im_scale_y = float(resize_h) / float(im_shape[0])
+
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=self.interp_dict[self.interp])
+        if label is not None:
+            label = cv2.resize(
+                label,
+                None,
+                None,
+                fx=im_scale_x,
+                fy=im_scale_y,
+                interpolation=self.interp_dict['NEAREST'])
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class ResizeByLong(SegTransform):
+    """对图像长边resize到固定值,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
+
+    Args:
+        long_size (int): resize后图像的长边大小。
+    """
+
+    def __init__(self, long_size):
+        self.long_size = long_size
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典和标注图像np.ndarray数据。
+                其中,im_info新增字段为:
+                    -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+        """
+        if im_info is None:
+            im_info = list()
+
+        im_info.append(('resize', im.shape[:2]))
+        im = resize_long(im, self.long_size)
+        if label is not None:
+            label = resize_long(label, self.long_size, cv2.INTER_NEAREST)
+
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class ResizeByShort(SegTransform):
+    """根据图像的短边调整图像大小(resize)。
+
+    1. 获取图像的长边和短边长度。
+    2. 根据短边与short_size的比例,计算长边的目标长度,
+       此时高、宽的resize比例为short_size/原图短边长度。
+    3. 如果max_size>0,调整resize比例:
+       如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。
+    4. 根据调整大小的比例对图像进行resize。
+
+    Args:
+        short_size (int): 短边目标长度。默认为800。
+        max_size (int): 长边目标长度的最大限制。默认为1333。
+
+    Raises:
+        TypeError: 形参数据类型不满足需求。
+    """
+
+    def __init__(self, short_size=800, max_size=1333):
+        self.max_size = int(max_size)
+        if not isinstance(short_size, int):
+            raise TypeError(
+                "Type of short_size is invalid. Must be Integer, now is {}".
+                format(type(short_size)))
+        self.short_size = short_size
+        if not (isinstance(self.max_size, int)):
+            raise TypeError("max_size: input type is invalid.")
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典和标注图像np.ndarray数据。
+                其中,im_info更新字段为:
+                    -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+
+        Raises:
+            TypeError: 形参数据类型不满足需求。
+            ValueError: 数据长度不匹配。
+        """
+        if im_info is None:
+            im_info = list()
+        if not isinstance(im, np.ndarray):
+            raise TypeError("ResizeByShort: image type is not numpy.")
+        if len(im.shape) != 3:
+            raise ValueError('ResizeByShort: image is not 3-dimensional.')
+        im_info.append(('resize', im.shape[:2]))
+        im_short_size = min(im.shape[0], im.shape[1])
+        im_long_size = max(im.shape[0], im.shape[1])
+        scale = float(self.short_size) / im_short_size
+        if self.max_size > 0 and np.round(scale *
+                                          im_long_size) > self.max_size:
+            scale = float(self.max_size) / float(im_long_size)
+        resized_width = int(round(im.shape[1] * scale))
+        resized_height = int(round(im.shape[0] * scale))
+        im = cv2.resize(
+            im, (resized_width, resized_height),
+            interpolation=cv2.INTER_NEAREST)
+        if label is not None:
+            label = cv2.resize(
+                label, (resized_width, resized_height),
+                interpolation=cv2.INTER_NEAREST)
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class ResizeRangeScaling(SegTransform):
+    """对图像长边随机resize到指定范围内,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
+
+    Args:
+        min_value (int): 图像长边resize后的最小值。默认值400。
+        max_value (int): 图像长边resize后的最大值。默认值600。
+
+    Raises:
+        ValueError: min_value大于max_value
+    """
+
+    def __init__(self, min_value=400, max_value=600):
+        if min_value > max_value:
+            raise ValueError('min_value must be less than max_value, '
+                             'but they are {} and {}.'.format(min_value,
+                                                              max_value))
+        self.min_value = min_value
+        self.max_value = max_value
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典和标注图像np.ndarray数据。
+        """
+        if self.min_value == self.max_value:
+            random_size = self.max_value
+        else:
+            random_size = int(
+                np.random.uniform(self.min_value, self.max_value) +
0.5) + im = resize_long(im, random_size, cv2.INTER_LINEAR) + if label is not None: + label = resize_long(label, random_size, cv2.INTER_NEAREST) + + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class ResizeStepScaling(SegTransform): + """对图像按照某一个比例resize,这个比例以scale_step_size为步长 + 在[min_scale_factor, max_scale_factor]随机变动。当存在标注图像时,则同步进行处理。 + + Args: + min_scale_factor(float), resize最小尺度。默认值0.75。 + max_scale_factor (float), resize最大尺度。默认值1.25。 + scale_step_size (float), resize尺度范围间隔。默认值0.25。 + + Raises: + ValueError: min_scale_factor大于max_scale_factor + """ + + def __init__(self, + min_scale_factor=0.75, + max_scale_factor=1.25, + scale_step_size=0.25): + if min_scale_factor > max_scale_factor: + raise ValueError( + 'min_scale_factor must be less than max_scale_factor, ' + 'but they are {} and {}.'.format(min_scale_factor, + max_scale_factor)) + self.min_scale_factor = min_scale_factor + self.max_scale_factor = max_scale_factor + self.scale_step_size = scale_step_size + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if self.min_scale_factor == self.max_scale_factor: + scale_factor = self.min_scale_factor + + elif self.scale_step_size == 0: + scale_factor = np.random.uniform(self.min_scale_factor, + self.max_scale_factor) + + else: + num_steps = int((self.max_scale_factor - self.min_scale_factor) / + self.scale_step_size + 1) + scale_factors = np.linspace(self.min_scale_factor, + self.max_scale_factor, + num_steps).tolist() + np.random.shuffle(scale_factors) + scale_factor = scale_factors[0] + + im = cv2.resize( + im, (0, 0), + fx=scale_factor, + fy=scale_factor, + interpolation=cv2.INTER_LINEAR) + if label is not None: + label = cv2.resize( + label, (0, 0), + fx=scale_factor, + fy=scale_factor, + interpolation=cv2.INTER_NEAREST) + + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class Normalize(SegTransform): + """对图像进行标准化。 + 1.尺度缩放到 [0,1]。 + 2.对图像进行减均值除以标准差操作。 + + Args: + mean (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。 + std (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。 + + Raises: + ValueError: mean或std不是list对象。std包含0。 + """ + + def __init__(self, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]): + self.mean = mean + self.std = std + if not (isinstance(self.mean, list) and isinstance(self.std, list)): + raise ValueError("{}: input type is invalid.".format(self)) + from functools import reduce + if reduce(lambda x, y: x * y, self.std) == 0: + raise ValueError('{}: std is invalid!'.format(self)) + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im = 
normalize(im, mean, std)
+
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class Padding(SegTransform):
+    """对图像或标注图像进行padding,padding方向为右和下。
+    根据提供的值对图像或标注图像进行padding操作。
+
+    Args:
+        target_size (int|list|tuple): padding后图像的大小。
+        im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+        label_padding_value (int): 标注图像padding的值。默认值为255。
+
+    Raises:
+        TypeError: target_size不是int|list|tuple。
+        ValueError: target_size为list|tuple时元素个数不等于2。
+    """
+
+    def __init__(self,
+                 target_size,
+                 im_padding_value=[127.5, 127.5, 127.5],
+                 label_padding_value=255):
+        if isinstance(target_size, list) or isinstance(target_size, tuple):
+            if len(target_size) != 2:
+                raise ValueError(
+                    'when target is list or tuple, it should include 2 elements, but it is {}'
+                    .format(target_size))
+        elif not isinstance(target_size, int):
+            raise TypeError(
+                "Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
+                .format(type(target_size)))
+        self.target_size = target_size
+        self.im_padding_value = im_padding_value
+        self.label_padding_value = label_padding_value
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典和标注图像np.ndarray数据。
+                其中,im_info新增字段为:
+                    -shape_before_padding (tuple): 保存padding之前图像的形状(h, w)。
+
+        Raises:
+            ValueError: 输入图像im或label的形状大于目标值
+        """
+        if im_info is None:
+            im_info = list()
+        im_info.append(('padding', im.shape[:2]))
+
+        im_height, im_width = im.shape[0], im.shape[1]
+        if isinstance(self.target_size, int):
+            target_height = self.target_size
+            target_width = self.target_size
+        else:
+            target_height = self.target_size[1]
+            target_width = self.target_size[0]
+        pad_height = target_height - im_height
+        pad_width = target_width - im_width
+        if pad_height < 0 or pad_width < 0:
+            raise ValueError(
+                'the size of image should be less than target_size, but the size of image ({}, {}) is larger than target_size ({}, {})'
+                .format(im_width, im_height, target_width, target_height))
+        else:
+            im = cv2.copyMakeBorder(
+                im,
+                0,
+                pad_height,
+                0,
+                pad_width,
+                cv2.BORDER_CONSTANT,
+                value=self.im_padding_value)
+            if label is not None:
+                label = cv2.copyMakeBorder(
+                    label,
+                    0,
+                    pad_height,
+                    0,
+                    pad_width,
+                    cv2.BORDER_CONSTANT,
+                    value=self.label_padding_value)
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
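As a quick illustration of the bookkeeping above (not part of the diff): `Padding` records the pre-padding shape in `im_info` so that deployment code can later crop the prediction back to the original size. A minimal sketch, assuming `seg_transforms.py` above is on the import path (the import path is a placeholder):

```python
# Minimal sketch of Padding's im_info bookkeeping.
import numpy as np
from seg_transforms import Padding  # placeholder import path

pad = Padding(target_size=512)
im = np.zeros((400, 300, 3), dtype=np.float32)  # H=400, W=300
im, im_info = pad(im)
print(im.shape)  # (512, 512, 3): padded on the bottom and right
print(im_info)   # [('padding', (400, 300))]: used later to undo the padding
```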
+
+
+class RandomPaddingCrop(SegTransform):
+    """对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作。
+
+    Args:
+        crop_size (int|list|tuple): 裁剪图像大小。默认为512。
+        im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+        label_padding_value (int): 标注图像padding的值。默认值为255。
+
+    Raises:
+        TypeError: crop_size不是int/list/tuple。
+        ValueError: crop_size为list/tuple时元素个数不等于2。
+    """
+
+    def __init__(self,
+                 crop_size=512,
+                 im_padding_value=[127.5, 127.5, 127.5],
+                 label_padding_value=255):
+        if isinstance(crop_size, list) or isinstance(crop_size, tuple):
+            if len(crop_size) != 2:
+                raise ValueError(
+                    'when crop_size is list or tuple, it should include 2 elements, but it is {}'
+                    .format(crop_size))
+        elif not isinstance(crop_size, int):
+            raise TypeError(
+                "Type of crop_size is invalid. Must be Integer or List or tuple, now is {}"
+                .format(type(crop_size)))
+        self.crop_size = crop_size
+        self.im_padding_value = im_padding_value
+        self.label_padding_value = label_padding_value
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典和标注图像np.ndarray数据。
+        """
+        if isinstance(self.crop_size, int):
+            crop_width = self.crop_size
+            crop_height = self.crop_size
+        else:
+            crop_width = self.crop_size[0]
+            crop_height = self.crop_size[1]
+
+        img_height = im.shape[0]
+        img_width = im.shape[1]
+
+        if img_height == crop_height and img_width == crop_width:
+            if label is None:
+                return (im, im_info)
+            else:
+                return (im, im_info, label)
+        else:
+            pad_height = max(crop_height - img_height, 0)
+            pad_width = max(crop_width - img_width, 0)
+            if (pad_height > 0 or pad_width > 0):
+                im = cv2.copyMakeBorder(
+                    im,
+                    0,
+                    pad_height,
+                    0,
+                    pad_width,
+                    cv2.BORDER_CONSTANT,
+                    value=self.im_padding_value)
+                if label is not None:
+                    label = cv2.copyMakeBorder(
+                        label,
+                        0,
+                        pad_height,
+                        0,
+                        pad_width,
+                        cv2.BORDER_CONSTANT,
+                        value=self.label_padding_value)
+                img_height = im.shape[0]
+                img_width = im.shape[1]
+
+            if crop_height > 0 and crop_width > 0:
+                h_off = np.random.randint(img_height - crop_height + 1)
+                w_off = np.random.randint(img_width - crop_width + 1)
+
+                im = im[h_off:(crop_height + h_off), w_off:(w_off + crop_width
+                                                            ), :]
+                if label is not None:
+                    label = label[h_off:(crop_height + h_off), w_off:(
+                        w_off + crop_width)]
+            if label is None:
+                return (im, im_info)
+            else:
+                return (im, im_info, label)
+
+
+class RandomBlur(SegTransform):
+    """以一定的概率对图像进行高斯模糊。
+
+    Args:
+        prob (float): 图像模糊概率。默认为0.1。
+    """
+
+    def __init__(self, prob=0.1):
+        self.prob = prob
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典和标注图像np.ndarray数据。
+        """
+        if self.prob <= 0:
+            n = 0
+        elif self.prob >= 1:
+            n = 1
+        else:
+            n = int(1.0 / self.prob)
+        if n > 0:
+            if np.random.randint(0, n) == 0:
+                radius = np.random.randint(3, 10)
+                if radius % 2 != 1:
+                    radius = radius + 1
+                if radius > 9:
+                    radius = 9
+                im = cv2.GaussianBlur(im, (radius, radius), 0, 0)
+
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class RandomScaleAspect(SegTransform):
+    """裁剪并resize回原始尺寸的图像和标注图像。
+    按照一定的面积比和宽高比对图像进行裁剪,并resize回原始图像的图像,当存在标注图时,同步进行。
+
+    Args:
+        min_scale (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。
+        aspect_ratio (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。
+    """
+
+    def __init__(self, min_scale=0.5, aspect_ratio=0.33):
+        self.min_scale = min_scale
+        self.aspect_ratio = aspect_ratio
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]),
('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if self.min_scale != 0 and self.aspect_ratio != 0: + img_height = im.shape[0] + img_width = im.shape[1] + for i in range(0, 10): + area = img_height * img_width + target_area = area * np.random.uniform(self.min_scale, 1.0) + aspectRatio = np.random.uniform(self.aspect_ratio, + 1.0 / self.aspect_ratio) + + dw = int(np.sqrt(target_area * 1.0 * aspectRatio)) + dh = int(np.sqrt(target_area * 1.0 / aspectRatio)) + if (np.random.randint(10) < 5): + tmp = dw + dw = dh + dh = tmp + + if (dh < img_height and dw < img_width): + h1 = np.random.randint(0, img_height - dh) + w1 = np.random.randint(0, img_width - dw) + + im = im[h1:(h1 + dh), w1:(w1 + dw), :] + label = label[h1:(h1 + dh), w1:(w1 + dw)] + im = cv2.resize( + im, (img_width, img_height), + interpolation=cv2.INTER_LINEAR) + label = cv2.resize( + label, (img_width, img_height), + interpolation=cv2.INTER_NEAREST) + break + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class RandomDistort(SegTransform): + """对图像进行随机失真。 + + 1. 对变换的操作顺序进行随机化操作。 + 2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。 + + Args: + brightness_range (float): 明亮度因子的范围。默认为0.5。 + brightness_prob (float): 随机调整明亮度的概率。默认为0.5。 + contrast_range (float): 对比度因子的范围。默认为0.5。 + contrast_prob (float): 随机调整对比度的概率。默认为0.5。 + saturation_range (float): 饱和度因子的范围。默认为0.5。 + saturation_prob (float): 随机调整饱和度的概率。默认为0.5。 + hue_range (int): 色调因子的范围。默认为18。 + hue_prob (float): 随机调整色调的概率。默认为0.5。 + """ + + def __init__(self, + brightness_range=0.5, + brightness_prob=0.5, + contrast_range=0.5, + contrast_prob=0.5, + saturation_range=0.5, + saturation_prob=0.5, + hue_range=18, + hue_prob=0.5): + self.brightness_range = brightness_range + self.brightness_prob = brightness_prob + self.contrast_range = contrast_range + self.contrast_prob = contrast_prob + self.saturation_range = saturation_range + self.saturation_prob = saturation_prob + self.hue_range = hue_range + self.hue_prob = hue_prob + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + brightness_lower = 1 - self.brightness_range + brightness_upper = 1 + self.brightness_range + contrast_lower = 1 - self.contrast_range + contrast_upper = 1 + self.contrast_range + saturation_lower = 1 - self.saturation_range + saturation_upper = 1 + self.saturation_range + hue_lower = -self.hue_range + hue_upper = self.hue_range + ops = [brightness, contrast, saturation, hue] + random.shuffle(ops) + params_dict = { + 'brightness': { + 'brightness_lower': brightness_lower, + 'brightness_upper': brightness_upper + }, + 'contrast': { + 'contrast_lower': contrast_lower, + 'contrast_upper': contrast_upper + }, + 'saturation': { + 'saturation_lower': saturation_lower, + 'saturation_upper': saturation_upper + }, + 'hue': { + 'hue_lower': hue_lower, + 'hue_upper': hue_upper + } + } + prob_dict = { + 
'brightness': self.brightness_prob,
+            'contrast': self.contrast_prob,
+            'saturation': self.saturation_prob,
+            'hue': self.hue_prob
+        }
+        for i in range(4):
+            params = params_dict[ops[i].__name__]
+            prob = prob_dict[ops[i].__name__]
+            params['im'] = im
+            if np.random.uniform(0, 1) < prob:
+                im = ops[i](**params)
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class ArrangeSegmenter(SegTransform):
+    """获取训练/验证/预测所需的信息。
+
+    Args:
+        mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
+
+    Raises:
+        ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内
+    """
+
+    def __init__(self, mode):
+        if mode not in ['train', 'eval', 'test', 'quant']:
+            raise ValueError(
+                "mode should be defined as one of ['train', 'eval', 'test', 'quant']!"
+            )
+        self.mode = mode
+
+    def __call__(self, im, im_info, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当mode为'train'或'eval'时,返回的tuple为(im, label),分别对应图像np.ndarray数据、标注图像np.ndarray数据;
+                当mode为'test'时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;当mode为
+                'quant'时,返回的tuple为(im,),为图像np.ndarray数据。
+        """
+        im = permute(im, False)
+        if self.mode == 'train' or self.mode == 'eval':
+            label = label[np.newaxis, :, :]
+            return (im, label)
+        elif self.mode == 'test':
+            return (im, im_info)
+        else:
+            return (im, )
+
+
+class ComposedSegTransforms(Compose):
+    """ 语义分割模型(UNet/DeepLabv3p)的图像处理流程,具体如下
+        训练阶段:
+        1. 随机对图像以0.5的概率水平翻转
+        2. 按不同的比例随机Resize原图
+        3. 从原图中随机crop出大小为train_crop_size大小的子图,如若crop出来的图小于train_crop_size,则会将图padding到对应大小
+        4. 图像归一化
+        预测阶段:
+        1. 图像归一化
+
+        Args:
+            mode(str): 图像处理所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test'
+            train_crop_size(list): 模型训练阶段,随机从原图crop的大小
+            mean(list): 图像均值
+            std(list): 图像方差
+    """
+
+    def __init__(self,
+                 mode,
+                 train_crop_size=[769, 769],
+                 mean=[0.5, 0.5, 0.5],
+                 std=[0.5, 0.5, 0.5]):
+        if mode == 'train':
+            # 训练时的transforms,包含数据增强(与类注释中的步骤1~4对应)
+            transforms = [
+                RandomHorizontalFlip(prob=0.5),
+                ResizeStepScaling(),
+                RandomPaddingCrop(crop_size=train_crop_size),
+                Normalize(mean=mean, std=std)
+            ]
+        else:
+            # 验证/预测时的transforms
+            transforms = [Normalize(mean=mean, std=std)]
+
+        super(ComposedSegTransforms, self).__init__(transforms)
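A short usage sketch of the pipeline defined above (illustration only, not part of the diff). The file name and import path are placeholders, and the prepended ops mirror the usual PaddleX eval flow of resizing and padding before normalization:

```python
# Minimal sketch, assuming seg_transforms.py above is importable.
from seg_transforms import ComposedSegTransforms, ResizeByLong, Padding

# 'test' mode builds [Normalize] only; prepend resize/padding ops so the
# input matches the exported model, e.g. a network expecting 512x512.
transforms = ComposedSegTransforms(mode='test')
transforms.add_augmenters([ResizeByLong(long_size=512),
                           Padding(target_size=512)])

# Compose.__call__ returns (im, im_info) when no label is given.
im, im_info = transforms('test.jpg')
print(im.shape)  # (512, 512, 3), float32, normalized
print(im_info)   # [('resize', ...), ('padding', ...)]
```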
diff --git a/deploy/openvino/scripts/bootstrap.sh b/deploy/openvino/scripts/bootstrap.sh
deleted file mode 100644
index f9fc1d1edc327370f7b5d8e7494cb88d4fd4d12c..0000000000000000000000000000000000000000
--- a/deploy/openvino/scripts/bootstrap.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-# download pre-compiled opencv lib
-OPENCV_URL=https://paddleseg.bj.bcebos.com/deploy/docker/opencv3gcc4.8.tar.bz2
-if [ ! -d "./deps/opencv3gcc4.8" ]; then
-    mkdir -p deps
-    cd deps
-    wget -c ${OPENCV_URL}
-    tar xvfj opencv3gcc4.8.tar.bz2
-    rm -rf opencv3gcc4.8.tar.bz2
-    cd ..
-fi
diff --git a/deploy/openvino/scripts/build.sh b/deploy/openvino/scripts/build.sh
old mode 100644
new mode 100755
index 17f988146a6147030be35bd4abee966b569caa5f..0e204b5cf87518da92405bed6d9987850f15e2fd
--- a/deploy/openvino/scripts/build.sh
+++ b/deploy/openvino/scripts/build.sh
@@ -1,14 +1,23 @@
-# openvino预编译库的路径
-OPENVINO_DIR=/path/to/inference_engine/
-# gflags预编译库的路径
-GFLAGS_DIR=/path/to/gflags
+# OpenVINO预编译库的路径
+OPENVINO_DIR=$INTEL_OPENVINO_DIR/inference_engine
+
 # ngraph lib的路径,编译openvino时通常会生成
-NGRAPH_LIB=/path/to/ngraph/lib/
+NGRAPH_LIB=$INTEL_OPENVINO_DIR/deployment_tools/ngraph/lib
+
+# gflags预编译库的路径
+GFLAGS_DIR=$(pwd)/deps/gflags
+# glog预编译库的路径
+GLOG_DIR=$(pwd)/deps/glog
+
+# opencv使用自带预编译版本
+OPENCV_DIR=$(pwd)/deps/opencv/
+
+# cpu架构
+ARCH=x86
+export ARCH
 
-# opencv预编译库的路径, 如果使用自带预编译版本可不修改
-OPENCV_DIR=$(pwd)/deps/opencv3gcc4.8/
-# 下载自带预编译版本
-sh $(pwd)/scripts/bootstrap.sh
+# 下载并编译third-party lib
+sh $(pwd)/scripts/install_third-party.sh
 
 rm -rf build
 mkdir -p build
@@ -16,6 +25,8 @@ cd build
 cmake .. \
     -DOPENCV_DIR=${OPENCV_DIR} \
     -DGFLAGS_DIR=${GFLAGS_DIR} \
+    -DGLOG_DIR=${GLOG_DIR} \
     -DOPENVINO_DIR=${OPENVINO_DIR} \
-    -DNGRAPH_LIB=${NGRAPH_LIB}
+    -DNGRAPH_LIB=${NGRAPH_LIB} \
+    -DARCH=${ARCH}
 make
diff --git a/deploy/openvino/scripts/install_third-party.sh b/deploy/openvino/scripts/install_third-party.sh
new file mode 100644
index 0000000000000000000000000000000000000000..8824f64a37d0a0c245cfb0be7e047b5828516be1
--- /dev/null
+++ b/deploy/openvino/scripts/install_third-party.sh
@@ -0,0 +1,37 @@
+# download third-party lib
+if [ ! -d "./deps" ]; then
+    mkdir deps
+fi
+if [ ! -d "./deps/gflags" ]; then
+    cd deps
+    git clone https://github.com/gflags/gflags
+    cd gflags
+    cmake .
+    make -j 8
+    cd ..
+    cd ..
+fi
+if [ ! -d "./deps/glog" ]; then
+    cd deps
+    git clone https://github.com/google/glog
+    sudo apt-get install autoconf automake libtool
+    cd glog
+    ./autogen.sh
+    ./configure
+    make -j 8
+    cd ..
+    cd ..
+fi
+
+if [ "$ARCH" = "x86" ]; then
+    OPENCV_URL=https://bj.bcebos.com/paddlex/deploy/x86opencv/opencv.tar.bz2
+else
+    OPENCV_URL=https://bj.bcebos.com/paddlex/deploy/armopencv/opencv.tar.bz2
+fi
+if [ ! -d "./deps/opencv" ]; then
+    cd deps
+    wget -c ${OPENCV_URL}
+    tar xvfj opencv.tar.bz2
+    rm -rf opencv.tar.bz2
+    cd ..
+fi
diff --git a/deploy/openvino/src/paddlex.cpp b/deploy/openvino/src/paddlex.cpp
old mode 100644
new mode 100755
index bdae99892735ccc67d2189e1b6dcc0a0789dcf95..f924b968d1a189846edfb20abc779131514cf0c5
--- a/deploy/openvino/src/paddlex.cpp
+++ b/deploy/openvino/src/paddlex.cpp
@@ -13,28 +13,47 @@
 // limitations under the License.
#include "include/paddlex/paddlex.h" +#include +#include -using namespace InferenceEngine; namespace PaddleX { void Model::create_predictor(const std::string& model_dir, - const std::string& cfg_dir, + const std::string& cfg_file, std::string device) { - Core ie; - network_ = ie.ReadNetwork(model_dir, model_dir.substr(0, model_dir.size() - 4) + ".bin"); + InferenceEngine::Core ie; + network_ = ie.ReadNetwork( + model_dir, model_dir.substr(0, model_dir.size() - 4) + ".bin"); network_.setBatchSize(1); - InputInfo::Ptr input_info = network_.getInputsInfo().begin()->second; - input_info->getPreProcess().setResizeAlgorithm(RESIZE_BILINEAR); - input_info->setLayout(Layout::NCHW); - input_info->setPrecision(Precision::FP32); - executable_network_ = ie.LoadNetwork(network_, device); - load_config(cfg_dir); + InferenceEngine::InputsDataMap inputInfo(network_.getInputsInfo()); + std::string imageInputName; + for (const auto & inputInfoItem : inputInfo) { + if (inputInfoItem.second->getTensorDesc().getDims().size() == 4) { + imageInputName = inputInfoItem.first; + inputInfoItem.second->setPrecision(InferenceEngine::Precision::FP32); + inputInfoItem.second->getPreProcess().setResizeAlgorithm( + InferenceEngine::RESIZE_BILINEAR); + inputInfoItem.second->setLayout(InferenceEngine::Layout::NCHW); + } + if (inputInfoItem.second->getTensorDesc().getDims().size() == 2) { + imageInputName = inputInfoItem.first; + inputInfoItem.second->setPrecision(InferenceEngine::Precision::FP32); + } + } + if (device == "MYRIAD") { + std::map networkConfig; + networkConfig["VPU_HW_STAGES_OPTIMIZATION"] = "ON"; + executable_network_ = ie.LoadNetwork(network_, device, networkConfig); + } else { + executable_network_ = ie.LoadNetwork(network_, device); + } + load_config(cfg_file); } -bool Model::load_config(const std::string& cfg_dir) { - YAML::Node config = YAML::LoadFile(cfg_dir); +bool Model::load_config(const std::string& cfg_file) { + YAML::Node config = YAML::LoadFile(cfg_file); type = config["_Attributes"]["model_type"].as(); name = config["Model"].as(); bool to_rgb = true; @@ -48,22 +67,26 @@ bool Model::load_config(const std::string& cfg_dir) { return false; } } - // 构建数据处理流 - transforms_.Init(config["Transforms"], to_rgb); - // 读入label list - labels.clear(); - labels = config["_Attributes"]["labels"].as>(); + // init preprocess ops + transforms_.Init(config["Transforms"], type, to_rgb); + // read label list + for (const auto& item : config["_Attributes"]["labels"]) { + int index = labels.size(); + labels[index] = item.as(); + } + return true; } -bool Model::preprocess(cv::Mat* input_im) { - if (!transforms_.Run(input_im, inputs_)) { +bool Model::preprocess(cv::Mat* input_im, ImageBlob* inputs) { + if (!transforms_.Run(input_im, inputs)) { return false; } return true; } bool Model::predict(const cv::Mat& im, ClsResult* result) { + inputs_.clear(); if (type == "detector") { std::cerr << "Loading model is a 'detector', DetResult should be passed to " "function predict()!" 
@@ -75,34 +98,221 @@ bool Model::predict(const cv::Mat& im, ClsResult* result) { << std::endl; return false; } - // 处理输入图像 - InferRequest infer_request = executable_network_.CreateInferRequest(); + // preprocess + InferenceEngine::InferRequest infer_request = + executable_network_.CreateInferRequest(); std::string input_name = network_.getInputsInfo().begin()->first; - inputs_ = infer_request.GetBlob(input_name); - - auto im_clone = im.clone(); - if (!preprocess(&im_clone)) { + inputs_.blob = infer_request.GetBlob(input_name); + cv::Mat im_clone = im.clone(); + if (!preprocess(&im_clone, &inputs_)) { std::cerr << "Preprocess failed!" << std::endl; return false; } + // predict infer_request.Infer(); std::string output_name = network_.getOutputsInfo().begin()->first; output_ = infer_request.GetBlob(output_name); - MemoryBlob::CPtr moutput = as(output_); + InferenceEngine::MemoryBlob::CPtr moutput = + InferenceEngine::as(output_); auto moutputHolder = moutput->rmap(); float* outputs_data = moutputHolder.as(); - // 对模型输出结果进行后处理 + // post process auto ptr = std::max_element(outputs_data, outputs_data+sizeof(outputs_data)); result->category_id = std::distance(outputs_data, ptr); result->score = *ptr; result->category = labels[result->category_id]; - //for (int i=0;iclear(); + if (type == "classifier") { + std::cerr << "Loading model is a 'classifier', ClsResult should be passed " + "to function predict()!" << std::endl; + return false; + } else if (type == "segmenter") { + std::cerr << "Loading model is a 'segmenter', SegResult should be passed " + "to function predict()!" << std::endl; + return false; + } + InferenceEngine::InferRequest infer_request = + executable_network_.CreateInferRequest(); + InferenceEngine::InputsDataMap input_maps = network_.getInputsInfo(); + std::string inputName; + for (const auto & input_map : input_maps) { + if (input_map.second->getTensorDesc().getDims().size() == 4) { + inputName = input_map.first; + inputs_.blob = infer_request.GetBlob(inputName); + } + if (input_map.second->getTensorDesc().getDims().size() == 2) { + inputName = input_map.first; + inputs_.ori_im_size_ = infer_request.GetBlob(inputName); + } + } + cv::Mat im_clone = im.clone(); + if (!preprocess(&im_clone, &inputs_)) { + std::cerr << "Preprocess failed!" 
<< std::endl; + return false; + } + + infer_request.Infer(); + + InferenceEngine::OutputsDataMap out_map = network_.getOutputsInfo(); + auto iter = out_map.begin(); + std::string outputName = iter->first; + InferenceEngine::Blob::Ptr output = infer_request.GetBlob(outputName); + InferenceEngine::MemoryBlob::CPtr moutput = + InferenceEngine::as(output); + InferenceEngine::TensorDesc blob_output = moutput->getTensorDesc(); + std::vector output_shape = blob_output.getDims(); + auto moutputHolder = moutput->rmap(); + float* data = moutputHolder.as(); + int size = 1; + for (auto& i : output_shape) { + size *= static_cast(i); + } + int num_boxes = size / 6; + for (int i = 0; i < num_boxes; ++i) { + if (data[i * 6] > 0) { + Box box; + box.category_id = static_cast(data[i * 6]); + box.category = labels[box.category_id]; + box.score = data[i * 6 + 1]; + float xmin = data[i * 6 + 2]; + float ymin = data[i * 6 + 3]; + float xmax = data[i * 6 + 4]; + float ymax = data[i * 6 + 5]; + float w = xmax - xmin + 1; + float h = ymax - ymin + 1; + box.coordinate = {xmin, ymin, w, h}; + result->boxes.push_back(std::move(box)); + } + } } -} // namespce of PaddleX + +bool Model::predict(const cv::Mat& im, SegResult* result) { + result->clear(); + inputs_.clear(); + if (type == "classifier") { + std::cerr << "Loading model is a 'classifier', ClsResult should be passed " + "to function predict()!" << std::endl; + return false; + } else if (type == "detector") { + std::cerr << "Loading model is a 'detector', DetResult should be passed to " + "function predict()!" << std::endl; + return false; + } + // init infer + InferenceEngine::InferRequest infer_request = + executable_network_.CreateInferRequest(); + std::string input_name = network_.getInputsInfo().begin()->first; + inputs_.blob = infer_request.GetBlob(input_name); + + // preprocess + cv::Mat im_clone = im.clone(); + if (!preprocess(&im_clone, &inputs_)) { + std::cerr << "Preprocess failed!" 
<< std::endl; + return false; + } + + // predict + infer_request.Infer(); + + InferenceEngine::OutputsDataMap out_map = network_.getOutputsInfo(); + auto iter = out_map.begin(); + iter++; + std::string output_name_score = iter->first; + InferenceEngine::Blob::Ptr output_score = + infer_request.GetBlob(output_name_score); + InferenceEngine::MemoryBlob::CPtr moutput_score = + InferenceEngine::as(output_score); + InferenceEngine::TensorDesc blob_score = moutput_score->getTensorDesc(); + std::vector output_score_shape = blob_score.getDims(); + int size = 1; + for (auto& i : output_score_shape) { + size *= static_cast(i); + result->score_map.shape.push_back(static_cast(i)); + } + result->score_map.data.resize(size); + auto moutputHolder_score = moutput_score->rmap(); + float* score_data = moutputHolder_score.as(); + memcpy(result->score_map.data.data(), score_data, moutput_score->byteSize()); + + iter++; + std::string output_name_label = iter->first; + InferenceEngine::Blob::Ptr output_label = + infer_request.GetBlob(output_name_label); + InferenceEngine::MemoryBlob::CPtr moutput_label = + InferenceEngine::as(output_label); + InferenceEngine::TensorDesc blob_label = moutput_label->getTensorDesc(); + std::vector output_label_shape = blob_label.getDims(); + size = 1; + for (auto& i : output_label_shape) { + size *= static_cast(i); + result->label_map.shape.push_back(static_cast(i)); + } + result->label_map.data.resize(size); + auto moutputHolder_label = moutput_label->rmap(); + int* label_data = moutputHolder_label.as(); + memcpy(result->label_map.data.data(), label_data, moutput_label->byteSize()); + + + + std::vector label_map(result->label_map.data.begin(), + result->label_map.data.end()); + cv::Mat mask_label(result->label_map.shape[1], + result->label_map.shape[2], + CV_8UC1, + label_map.data()); + + cv::Mat mask_score(result->score_map.shape[2], + result->score_map.shape[3], + CV_32FC1, + result->score_map.data.data()); + int idx = 1; + int len_postprocess = inputs_.im_size_before_resize_.size(); + for (std::vector::reverse_iterator iter = + inputs_.reshape_order_.rbegin(); + iter != inputs_.reshape_order_.rend(); + ++iter) { + if (*iter == "padding") { + auto before_shape = inputs_.im_size_before_resize_[len_postprocess - idx]; + inputs_.im_size_before_resize_.pop_back(); + auto padding_w = before_shape[0]; + auto padding_h = before_shape[1]; + mask_label = mask_label(cv::Rect(0, 0, padding_h, padding_w)); + mask_score = mask_score(cv::Rect(0, 0, padding_h, padding_w)); + } else if (*iter == "resize") { + auto before_shape = inputs_.im_size_before_resize_[len_postprocess - idx]; + inputs_.im_size_before_resize_.pop_back(); + auto resize_w = before_shape[0]; + auto resize_h = before_shape[1]; + cv::resize(mask_label, + mask_label, + cv::Size(resize_h, resize_w), + 0, + 0, + cv::INTER_NEAREST); + cv::resize(mask_score, + mask_score, + cv::Size(resize_h, resize_w), + 0, + 0, + cv::INTER_LINEAR); + } + ++idx; + } + result->label_map.data.assign(mask_label.begin(), + mask_label.end()); + result->label_map.shape = {mask_label.rows, mask_label.cols}; + result->score_map.data.assign(mask_score.begin(), + mask_score.end()); + result->score_map.shape = {mask_score.rows, mask_score.cols}; + return true; +} +} // namespace PaddleX diff --git a/deploy/openvino/src/transforms.cpp b/deploy/openvino/src/transforms.cpp old mode 100644 new mode 100755 index 1c7fe2ee68433a67645f5c91c18e525d62a6c4d3..b65eaf7fd2df4e48b0dcefbe9561eb28cd9c7ba7 --- a/deploy/openvino/src/transforms.cpp +++ 
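For readers following the C++ above, here is the same load-and-classify flow sketched with the OpenVINO Python API of the same generation (`openvino.inference_engine`). This is an illustration only: model paths, the input shape, and the device are placeholders, and the NCHW float tensor is assumed to have been produced by the preprocessing transforms already:

```python
# Rough Python counterpart of Model::create_predictor / predict above.
import numpy as np
from openvino.inference_engine import IECore

device = 'CPU'  # or 'MYRIAD' for a VPU stick
ie = IECore()
net = ie.read_network(model='model.xml', weights='model.bin')
net.batch_size = 1

config = {}
if device == 'MYRIAD':
    # same switch the C++ code sets for VPU devices
    config['VPU_HW_STAGES_OPTIMIZATION'] = 'ON'
exec_net = ie.load_network(network=net, device_name=device, config=config)

input_name = next(iter(net.input_info))
blob = np.zeros((1, 3, 224, 224), dtype=np.float32)  # preprocessed NCHW input
scores = next(iter(exec_net.infer({input_name: blob}).values())).flatten()
print('category_id =', int(scores.argmax()), 'score =', float(scores.max()))
```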
diff --git a/deploy/openvino/src/transforms.cpp b/deploy/openvino/src/transforms.cpp
old mode 100644
new mode 100755
index 1c7fe2ee68433a67645f5c91c18e525d62a6c4d3..b65eaf7fd2df4e48b0dcefbe9561eb28cd9c7ba7
--- a/deploy/openvino/src/transforms.cpp
+++ b/deploy/openvino/src/transforms.cpp
@@ -12,11 +12,15 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+#include "include/paddlex/transforms.h"
+
+#include <math.h>
+
+#include <iostream>
+#include <map>
 #include <string>
 #include <vector>
-#include "include/paddlex/transforms.h"
 
 namespace PaddleX {
 
@@ -26,7 +30,7 @@ std::map<std::string, int> interpolations = {{"LINEAR", cv::INTER_LINEAR},
                                              {"CUBIC", cv::INTER_CUBIC},
                                              {"LANCZOS4", cv::INTER_LANCZOS4}};
-bool Normalize::Run(cv::Mat* im){
+bool Normalize::Run(cv::Mat* im, ImageBlob* data) {
   for (int h = 0; h < im->rows; h++) {
     for (int w = 0; w < im->cols; w++) {
       im->at<cv::Vec3f>(h, w)[0] =
@@ -40,19 +44,6 @@
   return true;
 }
 
-bool CenterCrop::Run(cv::Mat* im) {
-  int height = static_cast<int>(im->rows);
-  int width = static_cast<int>(im->cols);
-  if (height < height_ || width < width_) {
-    std::cerr << "[CenterCrop] Image size less than crop size" << std::endl;
-    return false;
-  }
-  int offset_x = static_cast<int>((width - width_) / 2);
-  int offset_y = static_cast<int>((height - height_) / 2);
-  cv::Rect crop_roi(offset_x, offset_y, width_, height_);
-  *im = (*im)(crop_roi);
-  return true;
-}
 
 float ResizeByShort::GenerateScale(const cv::Mat& im) {
@@ -70,17 +61,115 @@ float ResizeByShort::GenerateScale(const cv::Mat& im) {
   return scale;
 }
 
-bool ResizeByShort::Run(cv::Mat* im) {
+bool ResizeByShort::Run(cv::Mat* im, ImageBlob* data) {
+  data->im_size_before_resize_.push_back({im->rows, im->cols});
+  data->reshape_order_.push_back("resize");
   float scale = GenerateScale(*im);
-  int width = static_cast<int>(scale * im->cols);
-  int height = static_cast<int>(scale * im->rows);
+  int width = static_cast<int>(round(scale * im->cols));
+  int height = static_cast<int>(round(scale * im->rows));
   cv::resize(*im, *im, cv::Size(width, height), 0, 0, cv::INTER_LINEAR);
+  data->new_im_size_[0] = im->rows;
+  data->new_im_size_[1] = im->cols;
+  data->scale = scale;
   return true;
 }
 
-void Transforms::Init(const YAML::Node& transforms_node, bool to_rgb) {
+bool CenterCrop::Run(cv::Mat* im, ImageBlob* data) {
+  int height = static_cast<int>(im->rows);
+  int width = static_cast<int>(im->cols);
+  if (height < height_ || width < width_) {
+    std::cerr << "[CenterCrop] Image size less than crop size" << std::endl;
+    return false;
+  }
+  int offset_x = static_cast<int>((width - width_) / 2);
+  int offset_y = static_cast<int>((height - height_) / 2);
+  cv::Rect crop_roi(offset_x, offset_y, width_, height_);
+  *im = (*im)(crop_roi);
+  data->new_im_size_[0] = im->rows;
+  data->new_im_size_[1] = im->cols;
+  return true;
+}
+
+
+bool Padding::Run(cv::Mat* im, ImageBlob* data) {
+  data->im_size_before_resize_.push_back({im->rows, im->cols});
+  data->reshape_order_.push_back("padding");
+
+  int padding_w = 0;
+  int padding_h = 0;
+  if (width_ > 1 && height_ > 1) {
+    padding_w = width_ - im->cols;
+    padding_h = height_ - im->rows;
+  } else if (coarsest_stride_ >= 1) {
+    int h = im->rows;
+    int w = im->cols;
+    padding_h =
+        ceil(h * 1.0 / coarsest_stride_) * coarsest_stride_ - im->rows;
+    padding_w =
+        ceil(w * 1.0 / coarsest_stride_) * coarsest_stride_ - im->cols;
+  }
+
+  if (padding_h < 0 || padding_w < 0) {
+    std::cerr << "[Padding] Computed padding_h=" << padding_h
+              << ", padding_w=" << padding_w
+              << ", but they should be no less than 0." << std::endl;
+    return false;
+  }
+  cv::Scalar value = cv::Scalar(im_value_[0], im_value_[1], im_value_[2]);
+  cv::copyMakeBorder(
+      *im, *im, 0, padding_h, 0, padding_w, cv::BORDER_CONSTANT, value);
+  data->new_im_size_[0] = im->rows;
+  data->new_im_size_[1] = im->cols;
+  return true;
+}
+
+bool ResizeByLong::Run(cv::Mat* im, ImageBlob* data) {
+  if (long_size_ <= 0) {
+    std::cerr << "[ResizeByLong] long_size should be greater than 0"
+              << std::endl;
+    return false;
+  }
+  data->im_size_before_resize_.push_back({im->rows, im->cols});
+  data->reshape_order_.push_back("resize");
+  int origin_w = im->cols;
+  int origin_h = im->rows;
+
+  int im_size_max = std::max(origin_w, origin_h);
+  float scale =
+      static_cast<float>(long_size_) / static_cast<float>(im_size_max);
+  cv::resize(*im, *im, cv::Size(), scale, scale, cv::INTER_NEAREST);
+  data->new_im_size_[0] = im->rows;
+  data->new_im_size_[1] = im->cols;
+  data->scale = scale;
+  return true;
+}
+
+bool Resize::Run(cv::Mat* im, ImageBlob* data) {
+  if (width_ <= 0 || height_ <= 0) {
+    std::cerr << "[Resize] width and height should be greater than 0"
+              << std::endl;
+    return false;
+  }
+  if (interpolations.count(interp_) <= 0) {
+    std::cerr << "[Resize] Invalid interpolation method: '" << interp_ << "'"
+              << std::endl;
+    return false;
+  }
+  data->im_size_before_resize_.push_back({im->rows, im->cols});
+  data->reshape_order_.push_back("resize");
+
+  cv::resize(
+      *im, *im, cv::Size(width_, height_), 0, 0, interpolations[interp_]);
+  data->new_im_size_[0] = im->rows;
+  data->new_im_size_[1] = im->cols;
+  return true;
+}
+
+void Transforms::Init(
+    const YAML::Node& transforms_node, std::string type, bool to_rgb) {
   transforms_.clear();
   to_rgb_ = to_rgb;
+  type_ = type;
   for (const auto& item : transforms_node) {
     std::string name = item.begin()->first.as<std::string>();
     std::cout << "trans name: " << name << std::endl;
@@ -94,10 +183,16 @@ std::shared_ptr<Transform> Transforms::CreateTransform(
     const std::string& transform_name) {
   if (transform_name == "Normalize") {
     return std::make_shared<Normalize>();
-  } else if (transform_name == "CenterCrop") {
-    return std::make_shared<CenterCrop>();
   } else if (transform_name == "ResizeByShort") {
     return std::make_shared<ResizeByShort>();
+  } else if (transform_name == "CenterCrop") {
+    return std::make_shared<CenterCrop>();
+  } else if (transform_name == "Resize") {
+    return std::make_shared<Resize>();
+  } else if (transform_name == "Padding") {
+    return std::make_shared<Padding>();
+  } else if (transform_name == "ResizeByLong") {
+    return std::make_shared<ResizeByLong>();
   } else {
     std::cerr << "There's unexpected transform(name='" << transform_name
               << "')." << std::endl;
@@ -105,27 +200,38 @@ std::shared_ptr<Transform> Transforms::CreateTransform(
   }
 }
 
-bool Transforms::Run(cv::Mat* im, Blob::Ptr blob) {
-  // 按照transforms中预处理算子顺序处理图像
+bool Transforms::Run(cv::Mat* im, ImageBlob* data) {
+  // preprocess in the order given by transforms_
  if (to_rgb_) {
    cv::cvtColor(*im, *im, cv::COLOR_BGR2RGB);
  }
  (*im).convertTo(*im, CV_32FC3);
+  if (type_ == "detector") {
+    InferenceEngine::LockedMemory<void> input2Mapped =
+        InferenceEngine::as<InferenceEngine::MemoryBlob>(
+            data->ori_im_size_)->wmap();
+    float *p = input2Mapped.as<float*>();
+    p[0] = im->rows;
+    p[1] = im->cols;
+  }
+  data->new_im_size_[0] = im->rows;
+  data->new_im_size_[1] = im->cols;
  for (int i = 0; i < transforms_.size(); ++i) {
-    if (!transforms_[i]->Run(im)) {
+    if (!transforms_[i]->Run(im, data)) {
      std::cerr << "Apply transforms to image failed!" << std::endl;
      return false;
    }
  }
 
-  // 将图像由NHWC转为NCHW格式
-  // 同时转为连续的内存块存储到Blob
-  SizeVector blobSize = blob->getTensorDesc().getDims();
+  // convert the image from NHWC to NCHW
+  // and copy it into the ImageBlob as one contiguous block
+  InferenceEngine::SizeVector blobSize = data->blob->getTensorDesc().getDims();
  const size_t width = blobSize[3];
  const size_t height = blobSize[2];
  const size_t channels = blobSize[1];
-  MemoryBlob::Ptr mblob = InferenceEngine::as<MemoryBlob>(blob);
+  InferenceEngine::MemoryBlob::Ptr mblob =
+      InferenceEngine::as<InferenceEngine::MemoryBlob>(data->blob);
  auto mblobHolder = mblob->wmap();
  float *blob_data = mblobHolder.as<float*>();
  for (size_t c = 0; c < channels; c++) {
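The tail of `Transforms::Run` above repacks the HWC image into the NCHW blob channel by channel; in NumPy terms the copy loop amounts to this (illustration only, not part of the diff):

```python
# NHWC -> NCHW repack, as done channel-by-channel in Transforms::Run.
import numpy as np

im = np.random.rand(224, 224, 3).astype(np.float32)       # H, W, C after preprocessing
blob = np.ascontiguousarray(im.transpose(2, 0, 1))[None]  # 1, C, H, W, contiguous
assert blob.shape == (1, 3, 224, 224)
```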
diff --git a/deploy/openvino/src/visualize.cpp b/deploy/openvino/src/visualize.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..dcfa8e7910d2e5e193bee2a74eb00eb65d60d7f0
--- /dev/null
+++ b/deploy/openvino/src/visualize.cpp
@@ -0,0 +1,148 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "include/paddlex/visualize.h"
+
+namespace PaddleX {
+std::vector<int> GenerateColorMap(int num_class) {
+  auto colormap = std::vector<int>(3 * num_class, 0);
+  for (int i = 0; i < num_class; ++i) {
+    int j = 0;
+    int lab = i;
+    while (lab) {
+      colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j));
+      colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j));
+      colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j));
+      ++j;
+      lab >>= 3;
+    }
+  }
+  return colormap;
+}
+
+cv::Mat Visualize(const cv::Mat& img,
+                  const DetResult& result,
+                  const std::map<int, std::string>& labels,
+                  const std::vector<int>& colormap,
+                  float threshold) {
+  cv::Mat vis_img = img.clone();
+  auto boxes = result.boxes;
+  for (int i = 0; i < boxes.size(); ++i) {
+    if (boxes[i].score < threshold) {
+      continue;
+    }
+    cv::Rect roi = cv::Rect(boxes[i].coordinate[0],
+                            boxes[i].coordinate[1],
+                            boxes[i].coordinate[2],
+                            boxes[i].coordinate[3]);
+
+    // draw box and title
+    std::string text = boxes[i].category;
+    int c1 = colormap[3 * boxes[i].category_id + 0];
+    int c2 = colormap[3 * boxes[i].category_id + 1];
+    int c3 = colormap[3 * boxes[i].category_id + 2];
+    cv::Scalar roi_color = cv::Scalar(c1, c2, c3);
+    text += std::to_string(static_cast<int>(boxes[i].score * 100)) + "%";
+    int font_face = cv::FONT_HERSHEY_SIMPLEX;
+    double font_scale = 0.5f;
+    int thickness = 1;
+    cv::Size text_size =
+        cv::getTextSize(text, font_face, font_scale, thickness, nullptr);
+    cv::Point origin;
+    origin.x = roi.x;
+    origin.y = roi.y;
+
+    // background
+    cv::Rect text_back = cv::Rect(boxes[i].coordinate[0],
+                                  boxes[i].coordinate[1] - text_size.height,
+                                  text_size.width,
+                                  text_size.height);
+
+    // draw
+    cv::rectangle(vis_img, roi, roi_color, 2);
+    cv::rectangle(vis_img, text_back, roi_color, -1);
+    cv::putText(vis_img,
+                text,
+                origin,
+                font_face,
+                font_scale,
+                cv::Scalar(255, 255, 255),
+                thickness);
+
+    // mask
+    if (boxes[i].mask.data.size() == 0) {
+      continue;
+    }
+    cv::Mat bin_mask(result.mask_resolution,
+                     result.mask_resolution,
+                     CV_32FC1,
+                     boxes[i].mask.data.data());
+    cv::resize(bin_mask,
+               bin_mask,
+               cv::Size(boxes[i].mask.shape[0], boxes[i].mask.shape[1]));
+    cv::threshold(bin_mask, bin_mask, 0.5, 1, cv::THRESH_BINARY);
+    cv::Mat full_mask = cv::Mat::zeros(vis_img.size(), CV_8UC1);
+    bin_mask.copyTo(full_mask(roi));
+    cv::Mat mask_ch[3];
+    mask_ch[0] = full_mask * c1;
+    mask_ch[1] = full_mask * c2;
+    mask_ch[2] = full_mask * c3;
+    cv::Mat mask;
+    cv::merge(mask_ch, 3, mask);
+    cv::addWeighted(vis_img, 1, mask, 0.5, 0, vis_img);
+  }
+  return vis_img;
+}
+
+cv::Mat Visualize(const cv::Mat& img,
+                  const SegResult& result,
+                  const std::map<int, std::string>& labels,
+                  const std::vector<int>& colormap) {
+  std::vector<uint8_t> label_map(result.label_map.data.begin(),
+                                 result.label_map.data.end());
+  cv::Mat mask(result.label_map.shape[0],
+               result.label_map.shape[1],
+               CV_8UC1,
+               label_map.data());
+  cv::Mat color_mask = cv::Mat::zeros(
+      result.label_map.shape[0], result.label_map.shape[1], CV_8UC3);
+  int rows = img.rows;
+  int cols = img.cols;
+  for (int i = 0; i < rows; i++) {
+    for (int j = 0; j < cols; j++) {
+      int category_id = static_cast<int>(mask.at<uchar>(i, j));
+      color_mask.at<cv::Vec3b>(i, j)[0] = colormap[3 * category_id + 0];
+      color_mask.at<cv::Vec3b>(i, j)[1] = colormap[3 * category_id + 1];
+      color_mask.at<cv::Vec3b>(i, j)[2] = colormap[3 * category_id + 2];
+    }
+  }
+  return color_mask;
+}
+
+std::string generate_save_path(const std::string& save_dir,
+                               const std::string& file_path) {
+  if (access(save_dir.c_str(), 0) < 0) {
+#ifdef _WIN32
+    mkdir(save_dir.c_str());
+#else
+    if (mkdir(save_dir.c_str(), S_IRWXU) < 0) {
+      std::cerr << "Fail to create " << save_dir << " directory." << std::endl;
+    }
+#endif
+  }
+  int pos = file_path.find_last_of(OS_PATH_SEP);
+  std::string image_name(file_path.substr(pos + 1));
+  return save_dir + OS_PATH_SEP + image_name;
+}
+}  // namespace PaddleX
diff --git a/deploy/raspberry/CMakeLists.txt b/deploy/raspberry/CMakeLists.txt
new file mode 100755
index 0000000000000000000000000000000000000000..c2d8a14da75dca2ec57d6afc3ef6b9d616a23617
--- /dev/null
+++ b/deploy/raspberry/CMakeLists.txt
@@ -0,0 +1,116 @@
+cmake_minimum_required(VERSION 3.0)
+project(PaddleX CXX C)
+
+
+option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static."
OFF) + +SET(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH}) +SET(LITE_DIR "" CACHE PATH "Location of libraries") +SET(OPENCV_DIR "" CACHE PATH "Location of libraries") +SET(NGRAPH_LIB "" CACHE PATH "Location of libraries") + + +include(cmake/yaml-cpp.cmake) + +include_directories("${CMAKE_SOURCE_DIR}/") +link_directories("${CMAKE_CURRENT_BINARY_DIR}") +include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/ext-yaml-cpp/include") +link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib") + +macro(safe_set_static_flag) + foreach(flag_var + CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE + CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO) + if(${flag_var} MATCHES "/MD") + string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}") + endif(${flag_var} MATCHES "/MD") + endforeach(flag_var) +endmacro() + +if (NOT DEFINED LITE_DIR OR ${LITE_DIR} STREQUAL "") + message(FATAL_ERROR "please set LITE_DIR with -LITE_DIR=/path/influence_engine") +endif() + +if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "") + message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv") +endif() + +if (NOT DEFINED GFLAGS_DIR OR ${GFLAGS_DIR} STREQUAL "") + message(FATAL_ERROR "please set GFLAGS_DIR with -DGFLAGS_DIR=/path/gflags") +endif() + + + + + +link_directories("${LITE_DIR}/lib") +include_directories("${LITE_DIR}/include") + + + +link_directories("${GFLAGS_DIR}/lib") +include_directories("${GFLAGS_DIR}/include") + + + + +if (WIN32) + find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH) + unset(OpenCV_DIR CACHE) +else () + find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/cmake NO_DEFAULT_PATH) +endif () + +include_directories(${OpenCV_INCLUDE_DIRS}) + +if (WIN32) + add_definitions("/DGOOGLE_GLOG_DLL_DECL=") + set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd") + set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT") + set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd") + set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT") + if (WITH_STATIC_LIB) + safe_set_static_flag() + add_definitions(-DSTATIC_LIB) + endif() +else() + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfloat-abi=hard -mfpu=neon-vfpv4 -g -o2 -fopenmp -std=c++11") + set(CMAKE_STATIC_LIBRARY_PREFIX "") +endif() + + +if(WITH_STATIC_LIB) + set(DEPS ${LITE_DIR}/lib/libpaddle_full_api_shared${CMAKE_STATIC_LIBRARY_SUFFIX}) +else() + set(DEPS ${LITE_DIR}/lib/libpaddle_full_api_shared${CMAKE_SHARED_LIBRARY_SUFFIX}) +endif() + +if (NOT WIN32) + set(DEPS ${DEPS} + glog gflags z yaml-cpp + ) +else() + set(DEPS ${DEPS} + glog gflags_static libprotobuf zlibstatic xxhash libyaml-cppmt) + set(DEPS ${DEPS} libcmt shlwapi) +endif(NOT WIN32) + + +if (NOT WIN32) + set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread") + set(DEPS ${DEPS} ${EXTERNAL_LIB}) +endif() + +set(DEPS ${DEPS} ${OpenCV_LIBS}) +add_executable(classifier demo/classifier.cpp src/transforms.cpp src/paddlex.cpp) +ADD_DEPENDENCIES(classifier ext-yaml-cpp) +target_link_libraries(classifier ${DEPS}) + + +add_executable(segmenter demo/segmenter.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp) +ADD_DEPENDENCIES(segmenter ext-yaml-cpp) +target_link_libraries(segmenter ${DEPS}) + +add_executable(detector demo/detector.cpp src/transforms.cpp src/paddlex.cpp src/visualize.cpp) +ADD_DEPENDENCIES(detector ext-yaml-cpp) +target_link_libraries(detector ${DEPS}) diff --git a/deploy/raspberry/cmake/yaml-cpp.cmake 
new file mode 100755
index 0000000000000000000000000000000000000000..726433d904908ce96c51442246fc884d0899de04
--- /dev/null
+++ b/deploy/raspberry/cmake/yaml-cpp.cmake
@@ -0,0 +1,29 @@
+
+include(ExternalProject)
+
+message("${CMAKE_BUILD_TYPE}")
+
+ExternalProject_Add(
+  ext-yaml-cpp
+  URL https://bj.bcebos.com/paddlex/deploy/deps/yaml-cpp.zip
+  URL_MD5 9542d6de397d1fbd649ed468cb5850e6
+  CMAKE_ARGS
+    -DYAML_CPP_BUILD_TESTS=OFF
+    -DYAML_CPP_BUILD_TOOLS=OFF
+    -DYAML_CPP_INSTALL=OFF
+    -DYAML_CPP_BUILD_CONTRIB=OFF
+    -DMSVC_SHARED_RT=OFF
+    -DBUILD_SHARED_LIBS=OFF
+    -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
+    -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
+    -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
+    -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
+    -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
+    -DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
+  PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp"
+  # Disable install step
+  INSTALL_COMMAND ""
+  LOG_DOWNLOAD ON
+  LOG_BUILD 1
+)
diff --git a/deploy/raspberry/demo/classifier.cpp b/deploy/raspberry/demo/classifier.cpp
new file mode 100755
index 0000000000000000000000000000000000000000..7754a5f1dddfa0d0567b1545c781b00361e8abbf
--- /dev/null
+++ b/deploy/raspberry/demo/classifier.cpp
@@ -0,0 +1,78 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <glog/logging.h>
+
+#include <fstream>
+#include <iostream>
+#include <string>
+#include <vector>
+
+#include "include/paddlex/paddlex.h"
+
+DEFINE_string(model_dir, "", "Path of inference model");
+DEFINE_string(cfg_file, "", "Path of PaddleX model yml file");
+DEFINE_string(image, "", "Path of test image file");
+DEFINE_string(image_list, "", "Path of test image list file");
+DEFINE_int32(thread_num, 1, "num of thread to infer");
+
+int main(int argc, char** argv) {
+  // Parse command-line flags
+  google::ParseCommandLineFlags(&argc, &argv, true);
+
+  if (FLAGS_model_dir == "") {
+    std::cerr << "--model_dir need to be defined" << std::endl;
+    return -1;
+  }
+  if (FLAGS_cfg_file == "") {
+    std::cerr << "--cfg_file need to be defined" << std::endl;
+    return -1;
+  }
+  if (FLAGS_image == "" && FLAGS_image_list == "") {
+    std::cerr << "--image or --image_list need to be defined" << std::endl;
+    return -1;
+  }
+
+  // load model
+  PaddleX::Model model;
+  model.Init(FLAGS_model_dir, FLAGS_cfg_file, FLAGS_thread_num);
+  std::cout << "init is done" << std::endl;
+  // predict
+  if (FLAGS_image_list != "") {
+    std::ifstream inf(FLAGS_image_list);
+    if (!inf) {
+      std::cerr << "Fail to open file " << FLAGS_image_list << std::endl;
+      return -1;
+    }
+    std::string image_path;
+    while (getline(inf, image_path)) {
+      PaddleX::ClsResult result;
+      cv::Mat im = cv::imread(image_path, 1);
+      model.predict(im, &result);
+      std::cout << "Predict label: " << result.category
+                << ", label_id:" << result.category_id
+                << ", score: " << result.score << std::endl;
+    }
+  } else {
+    PaddleX::ClsResult result;
+    cv::Mat im = cv::imread(FLAGS_image, 1);
+    model.predict(im, &result);
+    std::cout << "Predict label: " << result.category
+              << ", label_id:" << result.category_id
+              << ", score: " << result.score << std::endl;
+  }
+
+  return 0;
+}
diff --git a/deploy/raspberry/demo/detector.cpp b/deploy/raspberry/demo/detector.cpp
new file mode 100755
index 0000000000000000000000000000000000000000..e75ff2e62b50ad1fdde618c8d42cc9a0709fae3b
--- /dev/null
+++ b/deploy/raspberry/demo/detector.cpp
@@ -0,0 +1,111 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
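The classifier demo above (and the detector and segmenter demos that follow) reads `--image_list` as a plain text file with one image path per line, consumed via `getline()`. A minimal sketch of producing such a list; the file name and image paths are hypothetical:

```python
# One image path per line; the C++ demos read the file back line by line.
paths = ["images/cat.jpg", "images/dog.jpg"]
with open("image_list.txt", "w") as f:
    f.write("\n".join(paths) + "\n")
```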
+
+#include <glog/logging.h>
+#include <omp.h>
+
+#include <algorithm>
+#include <chrono>  // NOLINT
+#include <fstream>
+#include <iostream>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "include/paddlex/paddlex.h"
+#include "include/paddlex/visualize.h"
+
+using namespace std::chrono;  // NOLINT
+
+DEFINE_string(model_dir, "", "Path of inference model file");
+DEFINE_string(cfg_file, "", "Path of PaddleX model yaml file");
+DEFINE_string(image, "", "Path of test image file");
+DEFINE_string(image_list, "", "Path of test image list file");
+DEFINE_int32(thread_num, 1, "num of thread to infer");
+DEFINE_string(save_dir, "", "Path to save visualized image");
+DEFINE_int32(batch_size, 1, "Batch size of inferring");
+DEFINE_double(threshold,
+              0.5,
+              "The minimum scores of target boxes which are shown");
+
+int main(int argc, char** argv) {
+  google::ParseCommandLineFlags(&argc, &argv, true);
+  if (FLAGS_model_dir == "") {
+    std::cerr << "--model_dir need to be defined" << std::endl;
+    return -1;
+  }
+  if (FLAGS_cfg_file == "") {
+    std::cerr << "--cfg_file need to be defined" << std::endl;
+    return -1;
+  }
+  if (FLAGS_image == "" && FLAGS_image_list == "") {
+    std::cerr << "--image or --image_list need to be defined" << std::endl;
+    return -1;
+  }
+
+  // load model
+  PaddleX::Model model;
+  model.Init(FLAGS_model_dir, FLAGS_cfg_file, FLAGS_thread_num);
+
+  int imgs = 1;
+  auto colormap = PaddleX::GenerateColorMap(model.labels.size());
+  // predict
+  if (FLAGS_image_list != "") {
+    std::ifstream inf(FLAGS_image_list);
+    if (!inf) {
+      std::cerr << "Fail to open file " << FLAGS_image_list << std::endl;
+      return -1;
+    }
+    std::string image_path;
+    while (getline(inf, image_path)) {
+      PaddleX::DetResult result;
+      cv::Mat im = cv::imread(image_path, 1);
+      model.predict(im, &result);
+      if (FLAGS_save_dir != "") {
+        cv::Mat vis_img = PaddleX::Visualize(
+            im, result, model.labels, colormap, FLAGS_threshold);
+        std::string save_path =
+            PaddleX::generate_save_path(FLAGS_save_dir, image_path);
+        cv::imwrite(save_path, vis_img);
+        std::cout << "Visualized output saved as " << save_path << std::endl;
+      }
+    }
+  } else {
+    PaddleX::DetResult result;
+    cv::Mat im = cv::imread(FLAGS_image, 1);
+    model.predict(im, &result);
+    for (int i = 0; i < result.boxes.size(); ++i) {
+      std::cout << "image file: " << FLAGS_image << std::endl;
+      std::cout << ", predict label: " << result.boxes[i].category
+                << ", label_id:" << result.boxes[i].category_id
+                << ", score: " << result.boxes[i].score
+                << ", box(xmin, ymin, w, h):(" << result.boxes[i].coordinate[0]
+                << ", " << result.boxes[i].coordinate[1] << ", "
+                << result.boxes[i].coordinate[2] << ", "
+                << result.boxes[i].coordinate[3] << ")" << std::endl;
+    }
+    if (FLAGS_save_dir != "") {
+      // visualize
+      cv::Mat vis_img = PaddleX::Visualize(
+          im, result, model.labels, colormap, FLAGS_threshold);
+      std::string save_path =
+          PaddleX::generate_save_path(FLAGS_save_dir, FLAGS_image);
+      cv::imwrite(save_path, vis_img);
+      result.clear();
+      std::cout << "Visualized output saved as " << save_path << std::endl;
+    }
+  }
+  return 0;
+}
diff --git a/deploy/raspberry/demo/segmenter.cpp b/deploy/raspberry/demo/segmenter.cpp
new file mode 100755
index 0000000000000000000000000000000000000000..21bfcd1ae338fad61443e1dcfe8adc3a25165609
--- /dev/null
+++ b/deploy/raspberry/demo/segmenter.cpp
@@ -0,0 +1,91 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include + +#include +#include +#include +#include +#include +#include +#include "include/paddlex/paddlex.h" +#include "include/paddlex/visualize.h" + + +DEFINE_string(model_dir, "", "Path of openvino model xml file"); +DEFINE_string(cfg_file, "", "Path of PaddleX model yaml file"); +DEFINE_string(image, "", "Path of test image file"); +DEFINE_string(image_list, "", "Path of test image list file"); +DEFINE_string(save_dir, "", "Path to save visualized image"); +DEFINE_int32(batch_size, 1, "Batch size of infering"); +DEFINE_int32(thread_num, 1, "num of thread to infer"); + +int main(int argc, char** argv) { + google::ParseCommandLineFlags(&argc, &argv, true); + if (FLAGS_model_dir == "") { + std::cerr << "--model_dir need to be defined" << std::endl; + return -1; + } + if (FLAGS_cfg_file == "") { + std::cerr << "--cfg_file need to be defined" << std::endl; + return -1; + } + if (FLAGS_image == "" & FLAGS_image_list == "") { + std::cerr << "--image or --image_list need to be defined" << std::endl; + return -1; + } + + // load model + std::cout << "init start" << std::endl; + PaddleX::Model model; + model.Init(FLAGS_model_dir, FLAGS_cfg_file, FLAGS_thread_num); + std::cout << "init done" << std::endl; + int imgs = 1; + auto colormap = PaddleX::GenerateColorMap(model.labels.size()); + if (FLAGS_image_list != "") { + std::ifstream inf(FLAGS_image_list); + if (!inf) { + std::cerr << "Fail to open file " << FLAGS_image_list < +#include +#include +#include + +#include "yaml-cpp/yaml.h" + +#ifdef _WIN32 +#define OS_PATH_SEP "\\" +#else +#define OS_PATH_SEP "/" +#endif + +namespace PaddleX { + +// Inference model configuration parser +class ConfigPaser { + public: + ConfigPaser() {} + + ~ConfigPaser() {} + + bool load_config(const std::string& model_dir, + const std::string& cfg = "model.yml") { + // Load as a YAML::Node + YAML::Node config; + config = YAML::LoadFile(model_dir + OS_PATH_SEP + cfg); + + if (config["Transforms"].IsDefined()) { + YAML::Node transforms_ = config["Transforms"]; + } else { + std::cerr << "There's no field 'Transforms' in model.yml" << std::endl; + return false; + } + return true; + } + + YAML::Node Transforms_; +}; + +} // namespace PaddleX diff --git a/deploy/raspberry/include/paddlex/paddlex.h b/deploy/raspberry/include/paddlex/paddlex.h new file mode 100755 index 0000000000000000000000000000000000000000..7c4a7065b043140be09cd032a5465f4bb2951398 --- /dev/null +++ b/deploy/raspberry/include/paddlex/paddlex.h @@ -0,0 +1,79 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
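For readers following along in Python, this is roughly the check that `ConfigPaser::load_config` above performs, sketched with PyYAML; the model path is hypothetical:

```python
import yaml

# Load model.yml and require the 'Transforms' field, as load_config does.
with open("model_dir/model.yml") as f:
    cfg = yaml.safe_load(f)
if "Transforms" not in cfg:
    raise ValueError("There's no field 'Transforms' in model.yml")
transforms_cfg = cfg["Transforms"]  # ordered list of preprocessing ops
```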
+ +#pragma once + +#include +#include + +#include +#include +#include +#include +#include +#include + +#include "include/paddlex/config_parser.h" +#include "include/paddlex/results.h" +#include "include/paddlex/transforms.h" + + + +#include "yaml-cpp/yaml.h" + + + + +#ifdef _WIN32 +#define OS_PATH_SEP "\\" +#else +#define OS_PATH_SEP "/" +#endif + + + + +namespace PaddleX { + +class Model { + public: + void Init(const std::string& model_dir, + const std::string& cfg_file, + int thread_num) { + create_predictor(model_dir, cfg_file, thread_num); + } + + void create_predictor(const std::string& model_dir, + const std::string& cfg_file, + int thread_num); + + bool load_config(const std::string& model_dir); + + bool preprocess(cv::Mat* input_im, ImageBlob* inputs); + + bool predict(const cv::Mat& im, ClsResult* result); + + bool predict(const cv::Mat& im, DetResult* result); + + bool predict(const cv::Mat& im, SegResult* result); + + + std::string type; + std::string name; + std::map labels; + Transforms transforms_; + ImageBlob inputs_; + std::shared_ptr predictor_; +}; +} // namespace PaddleX diff --git a/deploy/raspberry/include/paddlex/results.h b/deploy/raspberry/include/paddlex/results.h new file mode 100755 index 0000000000000000000000000000000000000000..099e2c98b4b99c68b48c4dd99c8fbdfa1d2cf4fa --- /dev/null +++ b/deploy/raspberry/include/paddlex/results.h @@ -0,0 +1,71 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once + +#include +#include +#include + +namespace PaddleX { + +template +struct Mask { + std::vector data; + std::vector shape; + void clear() { + data.clear(); + shape.clear(); + } +}; + +struct Box { + int category_id; + std::string category; + float score; + std::vector coordinate; + Mask mask; +}; + +class BaseResult { + public: + std::string type = "base"; +}; + +class ClsResult : public BaseResult { + public: + int category_id; + std::string category; + float score; + std::string type = "cls"; +}; + +class DetResult : public BaseResult { + public: + std::vector boxes; + int mask_resolution; + std::string type = "det"; + void clear() { boxes.clear(); } +}; + +class SegResult : public BaseResult { + public: + Mask label_map; + Mask score_map; + void clear() { + label_map.clear(); + score_map.clear(); + } +}; +} // namespace PaddleX diff --git a/deploy/raspberry/include/paddlex/transforms.h b/deploy/raspberry/include/paddlex/transforms.h new file mode 100755 index 0000000000000000000000000000000000000000..60bf1750f8a10795d8123d2d98d43c68cf94a33d --- /dev/null +++ b/deploy/raspberry/include/paddlex/transforms.h @@ -0,0 +1,224 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once + +#include +#include + +#include +#include +#include +#include +#include +#include + +#include +#include +#include + + + +namespace PaddleX { + +/* + * @brief + * This class represents object for storing all preprocessed data + * */ +class ImageBlob { + public: + // Original image height and width + std::vector ori_im_size_ = std::vector(2); + + // Newest image height and width after process + std::vector new_im_size_ = std::vector(2); + // Image height and width before resize + std::vector> im_size_before_resize_; + // Reshape order + std::vector reshape_order_; + // Resize scale + float scale = 1.0; + // Buffer for image data after preprocessing + std::unique_ptr input_tensor_; + + void clear() { + im_size_before_resize_.clear(); + reshape_order_.clear(); + } +}; + + + +// Abstraction of preprocessing opration class +class Transform { + public: + virtual void Init(const YAML::Node& item) = 0; + virtual bool Run(cv::Mat* im, ImageBlob* data) = 0; +}; + +class Normalize : public Transform { + public: + virtual void Init(const YAML::Node& item) { + mean_ = item["mean"].as>(); + std_ = item["std"].as>(); + } + + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + std::vector mean_; + std::vector std_; +}; + +class ResizeByShort : public Transform { + public: + virtual void Init(const YAML::Node& item) { + short_size_ = item["short_size"].as(); + if (item["max_size"].IsDefined()) { + max_size_ = item["max_size"].as(); + } else { + max_size_ = -1; + } + } + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + float GenerateScale(const cv::Mat& im); + int short_size_; + int max_size_; +}; + +/* + * @brief + * This class execute resize by long operation on image matrix. At first, it resizes + * the long side of image matrix to specified length. Accordingly, the short side + * will be resized in the same proportion. + * */ +class ResizeByLong : public Transform { + public: + virtual void Init(const YAML::Node& item) { + long_size_ = item["long_size"].as(); + } + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + int long_size_; +}; + +/* + * @brief + * This class execute resize operation on image matrix. It resizes width and height + * to specified length. 
+ * */ +class Resize : public Transform { + public: + virtual void Init(const YAML::Node& item) { + if (item["interp"].IsDefined()) { + interp_ = item["interp"].as(); + } + if (item["target_size"].IsScalar()) { + height_ = item["target_size"].as(); + width_ = item["target_size"].as(); + } else if (item["target_size"].IsSequence()) { + std::vector target_size = item["target_size"].as>(); + width_ = target_size[0]; + height_ = target_size[1]; + } + if (height_ <= 0 || width_ <= 0) { + std::cerr << "[Resize] target_size should greater than 0" << std::endl; + exit(-1); + } + } + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + int height_; + int width_; + std::string interp_; +}; + + +class CenterCrop : public Transform { + public: + virtual void Init(const YAML::Node& item) { + if (item["crop_size"].IsScalar()) { + height_ = item["crop_size"].as(); + width_ = item["crop_size"].as(); + } else if (item["crop_size"].IsSequence()) { + std::vector crop_size = item["crop_size"].as>(); + width_ = crop_size[0]; + height_ = crop_size[1]; + } + } + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + int height_; + int width_; +}; + + +/* + * @brief + * This class execute padding operation on image matrix. It makes border on edge + * of image matrix. + * */ +class Padding : public Transform { + public: + virtual void Init(const YAML::Node& item) { + if (item["coarsest_stride"].IsDefined()) { + coarsest_stride_ = item["coarsest_stride"].as(); + if (coarsest_stride_ < 1) { + std::cerr << "[Padding] coarest_stride should greater than 0" + << std::endl; + exit(-1); + } + } + if (item["target_size"].IsDefined()) { + if (item["target_size"].IsScalar()) { + width_ = item["target_size"].as(); + height_ = item["target_size"].as(); + } else if (item["target_size"].IsSequence()) { + width_ = item["target_size"].as>()[0]; + height_ = item["target_size"].as>()[1]; + } + } + if (item["im_padding_value"].IsDefined()) { + im_value_ = item["im_padding_value"].as>(); + } else { + im_value_ = {0, 0, 0}; + } + } + + virtual bool Run(cv::Mat* im, ImageBlob* data); + + private: + int coarsest_stride_ = -1; + int width_ = 0; + int height_ = 0; + std::vector im_value_; +}; + +class Transforms { + public: + void Init(const YAML::Node& node, bool to_rgb = true); + std::shared_ptr CreateTransform(const std::string& name); + bool Run(cv::Mat* im, ImageBlob* data); + + private: + std::vector> transforms_; + bool to_rgb_ = true; +}; + +} // namespace PaddleX diff --git a/deploy/raspberry/include/paddlex/visualize.h b/deploy/raspberry/include/paddlex/visualize.h new file mode 100755 index 0000000000000000000000000000000000000000..d3eb094f525dc2c4e878dbfe11916dc98c63dd49 --- /dev/null +++ b/deploy/raspberry/include/paddlex/visualize.h @@ -0,0 +1,97 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#pragma once + +#include +#include +#include +#ifdef _WIN32 +#include +#include +#else // Linux/Unix +#include +#include +#include +#include +#include +#endif +#include + +#include +#include +#include + +#include "include/paddlex/results.h" + +#ifdef _WIN32 +#define OS_PATH_SEP "\\" +#else +#define OS_PATH_SEP "/" +#endif + +namespace PaddleX { + +/* + * @brief + * Generate visualization colormap for each class + * + * @param number of class + * @return color map, the size of vector is 3 * num_class + * */ +std::vector GenerateColorMap(int num_class); + + +/* + * @brief + * Visualize the detection result + * + * @param img: initial image matrix + * @param results: the detection result + * @param labels: label map + * @param colormap: visualization color map + * @return visualized image matrix + * */ +cv::Mat Visualize(const cv::Mat& img, + const DetResult& results, + const std::map& labels, + const std::vector& colormap, + float threshold = 0.5); + +/* + * @brief + * Visualize the segmentation result + * + * @param img: initial image matrix + * @param results: the detection result + * @param labels: label map + * @param colormap: visualization color map + * @return visualized image matrix + * */ +cv::Mat Visualize(const cv::Mat& img, + const SegResult& result, + const std::map& labels, + const std::vector& colormap); + +/* + * @brief + * generate save path for visualized image matrix + * + * @param save_dir: directory for saving visualized image matrix + * @param file_path: sourcen image file path + * @return path of saving visualized result + * */ +std::string generate_save_path(const std::string& save_dir, + const std::string& file_path); +} // namespace PaddleX diff --git a/deploy/raspberry/python/__init__.py b/deploy/raspberry/python/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..131e0650f5c8db6885ebab5cd342b37630f13be8 --- /dev/null +++ b/deploy/raspberry/python/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. \ No newline at end of file diff --git a/deploy/raspberry/python/demo.py b/deploy/raspberry/python/demo.py new file mode 100644 index 0000000000000000000000000000000000000000..512426bd380e58538e18ec71e722b1b510380b75 --- /dev/null +++ b/deploy/raspberry/python/demo.py @@ -0,0 +1,85 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
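As the comment on `GenerateColorMap` above notes, the colormap is a flat vector of length `3 * num_class`; the visualizers read the three channel values of class `k` at offsets `3*k`, `3*k+1`, `3*k+2` (OpenCV images are BGR-ordered). A sketch of that indexing with illustrative values:

```python
colormap = [0, 0, 0, 128, 0, 0, 0, 128, 0]  # 3 classes, 3 ints per class
k = 2
b, g, r = colormap[3 * k + 0], colormap[3 * k + 1], colormap[3 * k + 2]
assert (b, g, r) == (0, 128, 0)
```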
+
+import sys
+import os
+import argparse
+import deploy
+
+
+def arg_parser():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model_dir",
+        "-m",
+        type=str,
+        default=None,
+        help="path to inference model file")
+    parser.add_argument(
+        "--img", "-i", type=str, default=None, help="path to an image file")
+    parser.add_argument(
+        "--img_list",
+        "-l",
+        type=str,
+        default=None,
+        help="path to an image list file")
+    parser.add_argument(
+        "--cfg_file",
+        "-c",
+        type=str,
+        default=None,
+        help="path to PaddleX model yml file")
+    parser.add_argument(
+        "--thread_num",
+        "-t",
+        type=int,
+        default=1,
+        help="number of threads to infer with")
+    parser.add_argument(
+        "--input_shape",
+        "-ip",
+        type=str,
+        default=None,
+        help="image input shape of model [NCHW], like [1,3,224,224]")
+
+    return parser
+
+
+def main():
+    parser = arg_parser()
+    args = parser.parse_args()
+    model_nb = args.model_dir
+    model_yaml = args.cfg_file
+    thread_num = args.thread_num
+    input_shape = args.input_shape
+    input_shape = input_shape[1:-1].split(",", 3)
+    shape = list(map(int, input_shape))
+    # model init
+    predictor = deploy.Predictor(model_nb, model_yaml, thread_num, shape)
+
+    # predict
+    if args.img_list is not None:
+        f = open(args.img_list)
+        lines = f.readlines()
+        for im_path in lines:
+            print(im_path)
+            predictor.predict(im_path.strip('\n'))
+        f.close()
+    else:
+        im_path = args.img
+        predictor.predict(im_path)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/deploy/raspberry/python/transforms/__init__.py b/deploy/raspberry/python/transforms/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..9ec4809004549b5d564e7d69feb5d3a32fbebc98
--- /dev/null
+++ b/deploy/raspberry/python/transforms/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import cls_transforms
+from . import det_transforms
+from . import seg_transforms
diff --git a/deploy/raspberry/python/transforms/cls_transforms.py b/deploy/raspberry/python/transforms/cls_transforms.py
new file mode 100644
index 0000000000000000000000000000000000000000..120c2699238e99d57316eba86ebb2e845d4f3435
--- /dev/null
+++ b/deploy/raspberry/python/transforms/cls_transforms.py
@@ -0,0 +1,281 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
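A worked example of how `demo.py` above parses `--input_shape`; the value is assumed to be written without spaces, e.g. `[1,3,224,224]`:

```python
s = "[1,3,224,224]"
parts = s[1:-1].split(",", 3)   # strip the brackets, split into at most 4 fields
shape = list(map(int, parts))
assert shape == [1, 3, 224, 224]
```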
+ +from .ops import * +import random +import os.path as osp +import numpy as np +from PIL import Image, ImageEnhance + + +class ClsTransform: + """分类Transform的基类 + """ + + def __init__(self): + pass + + +class Compose(ClsTransform): + """根据数据预处理/增强算子对输入数据进行操作。 + 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。 + + Args: + transforms (list): 数据预处理/增强算子。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + + def __init__(self, transforms): + if not isinstance(transforms, list): + raise TypeError('The transforms must be a list!') + if len(transforms) < 1: + raise ValueError('The length of transforms ' + \ + 'must be equal or larger than 1!') + self.transforms = transforms + + def __call__(self, im, label=None): + """ + Args: + im (str/np.ndarray): 图像路径/图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + Returns: + tuple: 根据网络所需字段所组成的tuple; + 字段由transforms中的最后一个数据预处理操作决定。 + """ + if isinstance(im, np.ndarray): + if len(im.shape) != 3: + raise Exception( + "im should be 3-dimension, but now is {}-dimensions". + format(len(im.shape))) + else: + try: + im = cv2.imread(im).astype('float32') + except: + raise TypeError('Can\'t read The image file {}!'.format(im)) + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + for op in self.transforms: + outputs = op(im, label) + im = outputs[0] + if len(outputs) == 2: + label = outputs[1] + return outputs + + def add_augmenters(self, augmenters): + if not isinstance(augmenters, list): + raise Exception( + "augmenters should be list type in func add_augmenters()") + transform_names = [type(x).__name__ for x in self.transforms] + for aug in augmenters: + if type(aug).__name__ in transform_names: + print( + "{} is already in ComposedTransforms, need to remove it from add_augmenters().". + format(type(aug).__name__)) + self.transforms = augmenters + self.transforms + + +class Normalize(ClsTransform): + """对图像进行标准化。 + + 1. 对图像进行归一化到区间[0.0, 1.0]。 + 2. 对图像进行减均值除以标准差操作。 + + Args: + mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。 + std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。 + + """ + + def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): + self.mean = mean + self.std = std + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据; + 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。 + """ + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im = normalize(im, mean, std) + if label is None: + return (im, ) + else: + return (im, label) + + +class ResizeByShort(ClsTransform): + """根据图像短边对图像重新调整大小(resize)。 + + 1. 获取图像的长边和短边长度。 + 2. 根据短边与short_size的比例,计算长边的目标长度, + 此时高、宽的resize比例为short_size/原图短边长度。 + 3. 如果max_size>0,调整resize比例: + 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度; + 4. 
根据调整大小的比例对图像进行resize。 + + Args: + short_size (int): 调整大小后的图像目标短边长度。默认为256。 + max_size (int): 长边目标长度的最大限制。默认为-1。 + """ + + def __init__(self, short_size=256, max_size=-1): + self.short_size = short_size + self.max_size = max_size + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据; + 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。 + """ + im_short_size = min(im.shape[0], im.shape[1]) + im_long_size = max(im.shape[0], im.shape[1]) + scale = float(self.short_size) / im_short_size + if self.max_size > 0 and np.round(scale * + im_long_size) > self.max_size: + scale = float(self.max_size) / float(im_long_size) + resized_width = int(round(im.shape[1] * scale)) + resized_height = int(round(im.shape[0] * scale)) + im = cv2.resize( + im, (resized_width, resized_height), + interpolation=cv2.INTER_LINEAR) + + if label is None: + return (im, ) + else: + return (im, label) + + +class CenterCrop(ClsTransform): + """以图像中心点扩散裁剪长宽为`crop_size`的正方形 + + 1. 计算剪裁的起始点。 + 2. 剪裁图像。 + + Args: + crop_size (int): 裁剪的目标边长。默认为224。 + """ + + def __init__(self, crop_size=224): + self.crop_size = crop_size + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据; + 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。 + """ + im = center_crop(im, self.crop_size) + if label is None: + return (im, ) + else: + return (im, label) + + +class ArrangeClassifier(ClsTransform): + """获取训练/验证/预测所需信息。注意:此操作不需用户自己显示调用 + + Args: + mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。 + + Raises: + ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。 + """ + + def __init__(self, mode=None): + if mode not in ['train', 'eval', 'test', 'quant']: + raise ValueError( + "mode must be in ['train', 'eval', 'test', 'quant']!") + self.mode = mode + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + label (int): 每张图像所对应的类别序号。 + + Returns: + tuple: 当mode为'train'或'eval'时,返回(im, label),分别对应图像np.ndarray数据、 + 图像类别id;当mode为'test'或'quant'时,返回(im, ),对应图像np.ndarray数据。 + """ + im = permute(im, False).astype('float32') + if self.mode == 'train' or self.mode == 'eval': + outputs = (im, label) + else: + outputs = (im, ) + return outputs + + +class ComposedClsTransforms(Compose): + """ 分类模型的基础Transforms流程,具体如下 + 训练阶段: + 1. 随机从图像中crop一块子图,并resize成crop_size大小 + 2. 将1的输出按0.5的概率随机进行水平翻转 + 3. 将图像进行归一化 + 验证/预测阶段: + 1. 将图像按比例Resize,使得最小边长度为crop_size[0] * 1.14 + 2. 从图像中心crop出一个大小为crop_size的图像 + 3. 
将图像进行归一化 + + Args: + mode(str): 图像处理流程所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test' + crop_size(int|list): 输入模型里的图像大小 + mean(list): 图像均值 + std(list): 图像方差 + """ + + def __init__(self, + mode, + crop_size=[224, 224], + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]): + width = crop_size + if isinstance(crop_size, list): + if crop_size[0] != crop_size[1]: + raise Exception( + "In classifier model, width and height should be equal, please modify your parameter `crop_size`" + ) + width = crop_size[0] + if width % 32 != 0: + raise Exception( + "In classifier model, width and height should be multiple of 32, e.g 224、256、320...., please modify your parameter `crop_size`" + ) + + if mode == 'train': + pass + else: + # 验证/预测时的transforms + transforms = [ + ResizeByShort(short_size=int(width * 1.14)), + CenterCrop(crop_size=width), Normalize( + mean=mean, std=std) + ] + + super(ComposedClsTransforms, self).__init__(transforms) diff --git a/deploy/raspberry/python/transforms/det_transforms.py b/deploy/raspberry/python/transforms/det_transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..0e2d1dc30c0d0bb768839709da9cd74f2140d84a --- /dev/null +++ b/deploy/raspberry/python/transforms/det_transforms.py @@ -0,0 +1,540 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
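A usage sketch of the eval/test-time classification pipeline defined above. `ComposedClsTransforms(mode='test', crop_size=[224, 224])` composes exactly these three ops; the import path (relative to `deploy/raspberry/python`) and the image path are assumptions:

```python
from transforms.cls_transforms import Compose, ResizeByShort, CenterCrop, Normalize

eval_transforms = Compose([
    ResizeByShort(short_size=int(224 * 1.14)),  # short side -> 255
    CenterCrop(crop_size=224),                  # central 224x224 patch
    Normalize(),                                # to [0,1], then (x - mean) / std
])
im, = eval_transforms("test.jpg")  # returns a 1-tuple when no label is given
```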
+ +try: + from collections.abc import Sequence +except Exception: + from collections import Sequence + +import random +import os.path as osp +import numpy as np + +import cv2 +from PIL import Image, ImageEnhance + +from .ops import * + + +class DetTransform: + """检测数据处理基类 + """ + + def __init__(self): + pass + + +class Compose(DetTransform): + """根据数据预处理/增强列表对输入数据进行操作。 + 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。 + + Args: + transforms (list): 数据预处理/增强列表。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + + def __init__(self, transforms): + if not isinstance(transforms, list): + raise TypeError('The transforms must be a list!') + if len(transforms) < 1: + raise ValueError('The length of transforms ' + \ + 'must be equal or larger than 1!') + self.transforms = transforms + self.use_mixup = False + for t in self.transforms: + if type(t).__name__ == 'MixupImage': + self.use_mixup = True + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (str/np.ndarray): 图像路径/图像np.ndarray数据。 + im_info (dict): 存储与图像相关的信息,dict中的字段如下: + - im_id (np.ndarray): 图像序列号,形状为(1,)。 + - image_shape (np.ndarray): 图像原始大小,形状为(2,), + image_shape[0]为高,image_shape[1]为宽。 + - mixup (list): list为[im, im_info, label_info],分别对应 + 与当前图像进行mixup的图像np.ndarray数据、图像相关信息、标注框相关信息; + 注意,当前epoch若无需进行mixup,则无该字段。 + label_info (dict): 存储与标注框相关的信息,dict中的字段如下: + - gt_bbox (np.ndarray): 真实标注框坐标[x1, y1, x2, y2],形状为(n, 4), + 其中n代表真实标注框的个数。 + - gt_class (np.ndarray): 每个真实标注框对应的类别序号,形状为(n, 1), + 其中n代表真实标注框的个数。 + - gt_score (np.ndarray): 每个真实标注框对应的混合得分,形状为(n, 1), + 其中n代表真实标注框的个数。 + - gt_poly (list): 每个真实标注框内的多边形分割区域,每个分割区域由点的x、y坐标组成, + 长度为n,其中n代表真实标注框的个数。 + - is_crowd (np.ndarray): 每个真实标注框中是否是一组对象,形状为(n, 1), + 其中n代表真实标注框的个数。 + - difficult (np.ndarray): 每个真实标注框中的对象是否为难识别对象,形状为(n, 1), + 其中n代表真实标注框的个数。 + Returns: + tuple: 根据网络所需字段所组成的tuple; + 字段由transforms中的最后一个数据预处理操作决定。 + """ + + def decode_image(im_file, im_info, label_info): + if im_info is None: + im_info = dict() + if isinstance(im_file, np.ndarray): + if len(im_file.shape) != 3: + raise Exception( + "im should be 3-dimensions, but now is {}-dimensions". + format(len(im_file.shape))) + im = im_file + else: + try: + im = cv2.imread(im_file).astype('float32') + except: + raise TypeError('Can\'t read The image file {}!'.format( + im_file)) + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + # make default im_info with [h, w, 1] + im_info['im_resize_info'] = np.array( + [im.shape[0], im.shape[1], 1.], dtype=np.float32) + im_info['image_shape'] = np.array([im.shape[0], + im.shape[1]]).astype('int32') + if not self.use_mixup: + if 'mixup' in im_info: + del im_info['mixup'] + # decode mixup image + if 'mixup' in im_info: + im_info['mixup'] = \ + decode_image(im_info['mixup'][0], + im_info['mixup'][1], + im_info['mixup'][2]) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + outputs = decode_image(im, im_info, label_info) + im = outputs[0] + im_info = outputs[1] + if len(outputs) == 3: + label_info = outputs[2] + for op in self.transforms: + if im is None: + return None + outputs = op(im, im_info, label_info) + im = outputs[0] + return outputs + + def add_augmenters(self, augmenters): + if not isinstance(augmenters, list): + raise Exception( + "augmenters should be list type in func add_augmenters()") + transform_names = [type(x).__name__ for x in self.transforms] + for aug in augmenters: + if type(aug).__name__ in transform_names: + print( + "{} is already in ComposedTransforms, need to remove it from add_augmenters().". 
+ format(type(aug).__name__)) + self.transforms = augmenters + self.transforms + + +class ResizeByShort(DetTransform): + """根据图像的短边调整图像大小(resize)。 + + 1. 获取图像的长边和短边长度。 + 2. 根据短边与short_size的比例,计算长边的目标长度, + 此时高、宽的resize比例为short_size/原图短边长度。 + 3. 如果max_size>0,调整resize比例: + 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。 + 4. 根据调整大小的比例对图像进行resize。 + + Args: + target_size (int): 短边目标长度。默认为800。 + max_size (int): 长边目标长度的最大限制。默认为1333。 + + Raises: + TypeError: 形参数据类型不满足需求。 + """ + + def __init__(self, short_size=800, max_size=1333): + self.max_size = int(max_size) + if not isinstance(short_size, int): + raise TypeError( + "Type of short_size is invalid. Must be Integer, now is {}". + format(type(short_size))) + self.short_size = short_size + if not (isinstance(self.max_size, int)): + raise TypeError("max_size: input type is invalid.") + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (numnp.ndarraypy): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + 其中,im_info更新字段为: + - im_resize_info (np.ndarray): resize后的图像高、resize后的图像宽、resize后的图像相对原始图的缩放比例 + 三者组成的np.ndarray,形状为(3,)。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + if im_info is None: + im_info = dict() + if not isinstance(im, np.ndarray): + raise TypeError("ResizeByShort: image type is not numpy.") + if len(im.shape) != 3: + raise ValueError('ResizeByShort: image is not 3-dimensional.') + im_short_size = min(im.shape[0], im.shape[1]) + im_long_size = max(im.shape[0], im.shape[1]) + scale = float(self.short_size) / im_short_size + if self.max_size > 0 and np.round(scale * + im_long_size) > self.max_size: + scale = float(self.max_size) / float(im_long_size) + resized_width = int(round(im.shape[1] * scale)) + resized_height = int(round(im.shape[0] * scale)) + im_resize_info = [resized_height, resized_width, scale] + im = cv2.resize( + im, (resized_width, resized_height), + interpolation=cv2.INTER_LINEAR) + im_info['im_resize_info'] = np.array(im_resize_info).astype(np.float32) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + +class Padding(DetTransform): + """1.将图像的长和宽padding至coarsest_stride的倍数。如输入图像为[300, 640], + `coarest_stride`为32,则由于300不为32的倍数,因此在图像最右和最下使用0值 + 进行padding,最终输出图像为[320, 640]。 + 2.或者,将图像的长和宽padding到target_size指定的shape,如输入的图像为[300,640], + a. `target_size` = 960,在图像最右和最下使用0值进行padding,最终输出 + 图像为[960, 960]。 + b. `target_size` = [640, 960],在图像最右和最下使用0值进行padding,最终 + 输出图像为[640, 960]。 + + 1. 如果coarsest_stride为1,target_size为None则直接返回。 + 2. 获取图像的高H、宽W。 + 3. 计算填充后图像的高H_new、宽W_new。 + 4. 构建大小为(H_new, W_new, 3)像素值为0的np.ndarray, + 并将原图的np.ndarray粘贴于左上角。 + + Args: + coarsest_stride (int): 填充后的图像长、宽为该参数的倍数,默认为1。 + target_size (int|list|tuple): 填充后的图像长、宽,默认为None,coarset_stride优先级更高。 + + Raises: + TypeError: 形参`target_size`数据类型不满足需求。 + ValueError: 形参`target_size`为(list|tuple)时,长度不满足需求。 + """ + + def __init__(self, coarsest_stride=1, target_size=None): + self.coarsest_stride = coarsest_stride + if target_size is not None: + if not isinstance(target_size, int): + if not isinstance(target_size, tuple) and not isinstance( + target_size, list): + raise TypeError( + "Padding: Type of target_size must in (int|list|tuple)." 
+ ) + elif len(target_size) != 2: + raise ValueError( + "Padding: Length of target_size must equal 2.") + self.target_size = target_size + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (numnp.ndarraypy): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + ValueError: coarsest_stride,target_size需有且只有一个被指定。 + ValueError: target_size小于原图的大小。 + """ + if im_info is None: + im_info = dict() + if not isinstance(im, np.ndarray): + raise TypeError("Padding: image type is not numpy.") + if len(im.shape) != 3: + raise ValueError('Padding: image is not 3-dimensional.') + im_h, im_w, im_c = im.shape[:] + + if isinstance(self.target_size, int): + padding_im_h = self.target_size + padding_im_w = self.target_size + elif isinstance(self.target_size, list) or isinstance(self.target_size, + tuple): + padding_im_w = self.target_size[0] + padding_im_h = self.target_size[1] + elif self.coarsest_stride > 0: + padding_im_h = int( + np.ceil(im_h / self.coarsest_stride) * self.coarsest_stride) + padding_im_w = int( + np.ceil(im_w / self.coarsest_stride) * self.coarsest_stride) + else: + raise ValueError( + "coarsest_stridei(>1) or target_size(list|int) need setting in Padding transform" + ) + pad_height = padding_im_h - im_h + pad_width = padding_im_w - im_w + if pad_height < 0 or pad_width < 0: + raise ValueError( + 'the size of image should be less than target_size, but the size of image ({}, {}), is larger than target_size ({}, {})' + .format(im_w, im_h, padding_im_w, padding_im_h)) + padding_im = np.zeros( + (padding_im_h, padding_im_w, im_c), dtype=np.float32) + padding_im[:im_h, :im_w, :] = im + if label_info is None: + return (padding_im, im_info) + else: + return (padding_im, im_info, label_info) + + +class Resize(DetTransform): + """调整图像大小(resize)。 + + - 当目标大小(target_size)类型为int时,根据插值方式, + 将图像resize为[target_size, target_size]。 + - 当目标大小(target_size)类型为list或tuple时,根据插值方式, + 将图像resize为target_size。 + 注意:当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize。 + + Args: + target_size (int/list/tuple): 短边目标长度。默认为608。 + interp (str): resize的插值方式,与opencv的插值方式对应,取值范围为 + ['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"LINEAR"。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 插值方式不在['NEAREST', 'LINEAR', 'CUBIC', + 'AREA', 'LANCZOS4', 'RANDOM']中。 + """ + + # The interpolation mode + interp_dict = { + 'NEAREST': cv2.INTER_NEAREST, + 'LINEAR': cv2.INTER_LINEAR, + 'CUBIC': cv2.INTER_CUBIC, + 'AREA': cv2.INTER_AREA, + 'LANCZOS4': cv2.INTER_LANCZOS4 + } + + def __init__(self, target_size=608, interp='LINEAR'): + self.interp = interp + if not (interp == "RANDOM" or interp in self.interp_dict): + raise ValueError("interp should be one of {}".format( + self.interp_dict.keys())) + if isinstance(target_size, list) or isinstance(target_size, tuple): + if len(target_size) != 2: + raise TypeError( + 'when target is list or tuple, it should include 2 elements, but it is {}' + .format(target_size)) + elif not isinstance(target_size, int): + raise TypeError( + "Type of target_size is invalid. 
Must be Integer or List or tuple, now is {}" + .format(type(target_size))) + + self.target_size = target_size + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + if im_info is None: + im_info = dict() + if not isinstance(im, np.ndarray): + raise TypeError("Resize: image type is not numpy.") + if len(im.shape) != 3: + raise ValueError('Resize: image is not 3-dimensional.') + if self.interp == "RANDOM": + interp = random.choice(list(self.interp_dict.keys())) + else: + interp = self.interp + im = resize(im, self.target_size, self.interp_dict[interp]) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + +class Normalize(DetTransform): + """对图像进行标准化。 + + 1. 归一化图像到到区间[0.0, 1.0]。 + 2. 对图像进行减均值除以标准差操作。 + + Args: + mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。 + std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。 + + Raises: + TypeError: 形参数据类型不满足需求。 + """ + + def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): + self.mean = mean + self.std = std + if not (isinstance(self.mean, list) and isinstance(self.std, list)): + raise TypeError("NormalizeImage: input type is invalid.") + from functools import reduce + if reduce(lambda x, y: x * y, self.std) == 0: + raise TypeError('NormalizeImage: std is invalid!') + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (numnp.ndarraypy): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、 + 存储与标注框相关信息的字典。 + """ + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im = normalize(im, mean, std) + if label_info is None: + return (im, im_info) + else: + return (im, im_info, label_info) + + +class ArrangeYOLOv3(DetTransform): + """获取YOLOv3模型训练/验证/预测所需信息。 + + Args: + mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。 + + Raises: + ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。 + """ + + def __init__(self, mode=None): + if mode not in ['train', 'eval', 'test', 'quant']: + raise ValueError( + "mode must be in ['train', 'eval', 'test', 'quant']!") + self.mode = mode + + def __call__(self, im, im_info=None, label_info=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (dict, 可选): 存储与图像相关的信息。 + label_info (dict, 可选): 存储与标注框相关的信息。 + + Returns: + tuple: 当mode为'train'时,返回(im, gt_bbox, gt_class, gt_score, im_shape),分别对应 + 图像np.ndarray数据、真实标注框、真实标注框对应的类别、真实标注框混合得分、图像大小信息; + 当mode为'eval'时,返回(im, im_shape, im_id, gt_bbox, gt_class, difficult), + 分别对应图像np.ndarray数据、图像大小信息、图像id、真实标注框、真实标注框对应的类别、 + 真实标注框是否为难识别对象;当mode为'test'或'quant'时,返回(im, im_shape), + 分别对应图像np.ndarray数据、图像大小信息。 + + Raises: + TypeError: 形参数据类型不满足需求。 + ValueError: 数据长度不匹配。 + """ + im = permute(im, False) + if self.mode == 'train': + pass + elif self.mode == 'eval': + pass + else: + if im_info is None: + raise TypeError('Cannot do ArrangeYolov3! 
' + + 'Becasuse the im_info can not be None!') + im_shape = im_info['image_shape'] + outputs = (im, im_shape) + return outputs + + +class ComposedYOLOv3Transforms(Compose): + """YOLOv3模型的图像预处理流程,具体如下, + 训练阶段: + 1. 在前mixup_epoch轮迭代中,使用MixupImage策略,见https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#mixupimage + 2. 对图像进行随机扰动,包括亮度,对比度,饱和度和色调 + 3. 随机扩充图像,见https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#randomexpand + 4. 随机裁剪图像 + 5. 将4步骤的输出图像Resize成shape参数的大小 + 6. 随机0.5的概率水平翻转图像 + 7. 图像归一化 + 验证/预测阶段: + 1. 将图像Resize成shape参数大小 + 2. 图像归一化 + + Args: + mode(str): 图像处理流程所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test' + shape(list): 输入模型中图像的大小,输入模型的图像会被Resize成此大小 + mixup_epoch(int): 模型训练过程中,前mixup_epoch会使用mixup策略 + mean(list): 图像均值 + std(list): 图像方差 + """ + + def __init__(self, + mode, + shape=[608, 608], + mixup_epoch=250, + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]): + width = shape + if isinstance(shape, list): + if shape[0] != shape[1]: + raise Exception( + "In YOLOv3 model, width and height should be equal") + width = shape[0] + if width % 32 != 0: + raise Exception( + "In YOLOv3 model, width and height should be multiple of 32, e.g 224、256、320...." + ) + + if mode == 'train': + # 训练时的transforms,包含数据增强 + pass + else: + # 验证/预测时的transforms + transforms = [ + Resize( + target_size=width, interp='CUBIC'), Normalize( + mean=mean, std=std) + ] + super(ComposedYOLOv3Transforms, self).__init__(transforms) diff --git a/deploy/raspberry/python/transforms/ops.py b/deploy/raspberry/python/transforms/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..3f298d7824be48355b69973a1e14486172efcb08 --- /dev/null +++ b/deploy/raspberry/python/transforms/ops.py @@ -0,0 +1,186 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import math +import numpy as np +from PIL import Image, ImageEnhance + + +def normalize(im, mean, std): + im = im / 255.0 + im -= mean + im /= std + return im + + +def permute(im, to_bgr=False): + im = np.swapaxes(im, 1, 2) + im = np.swapaxes(im, 1, 0) + if to_bgr: + im = im[[2, 1, 0], :, :] + return im + + +def resize_long(im, long_size=224, interpolation=cv2.INTER_LINEAR): + value = max(im.shape[0], im.shape[1]) + scale = float(long_size) / float(value) + resized_width = int(round(im.shape[1] * scale)) + resized_height = int(round(im.shape[0] * scale)) + + im = cv2.resize( + im, (resized_width, resized_height), interpolation=interpolation) + return im + + +def resize(im, target_size=608, interp=cv2.INTER_LINEAR): + if isinstance(target_size, list) or isinstance(target_size, tuple): + w = target_size[0] + h = target_size[1] + else: + w = target_size + h = target_size + im = cv2.resize(im, (w, h), interpolation=interp) + return im + + +def random_crop(im, + crop_size=224, + lower_scale=0.08, + lower_ratio=3. / 4, + upper_ratio=4. 
/ 3): + scale = [lower_scale, 1.0] + ratio = [lower_ratio, upper_ratio] + aspect_ratio = math.sqrt(np.random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. / aspect_ratio + bound = min((float(im.shape[0]) / im.shape[1]) / (h**2), + (float(im.shape[1]) / im.shape[0]) / (w**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + target_area = im.shape[0] * im.shape[1] * np.random.uniform( + scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + i = np.random.randint(0, im.shape[0] - h + 1) + j = np.random.randint(0, im.shape[1] - w + 1) + im = im[i:i + h, j:j + w, :] + im = cv2.resize(im, (crop_size, crop_size)) + return im + + +def center_crop(im, crop_size=224): + height, width = im.shape[:2] + w_start = (width - crop_size) // 2 + h_start = (height - crop_size) // 2 + w_end = w_start + crop_size + h_end = h_start + crop_size + im = im[h_start:h_end, w_start:w_end, :] + return im + + +def horizontal_flip(im): + if len(im.shape) == 3: + im = im[:, ::-1, :] + elif len(im.shape) == 2: + im = im[:, ::-1] + return im + + +def vertical_flip(im): + if len(im.shape) == 3: + im = im[::-1, :, :] + elif len(im.shape) == 2: + im = im[::-1, :] + return im + + +def bgr2rgb(im): + return im[:, :, ::-1] + + +def hue(im, hue_lower, hue_upper): + delta = np.random.uniform(hue_lower, hue_upper) + u = np.cos(delta * np.pi) + w = np.sin(delta * np.pi) + bt = np.array([[1.0, 0.0, 0.0], [0.0, u, -w], [0.0, w, u]]) + tyiq = np.array([[0.299, 0.587, 0.114], [0.596, -0.274, -0.321], + [0.211, -0.523, 0.311]]) + ityiq = np.array([[1.0, 0.956, 0.621], [1.0, -0.272, -0.647], + [1.0, -1.107, 1.705]]) + t = np.dot(np.dot(ityiq, bt), tyiq).T + im = np.dot(im, t) + return im + + +def saturation(im, saturation_lower, saturation_upper): + delta = np.random.uniform(saturation_lower, saturation_upper) + gray = im * np.array([[[0.299, 0.587, 0.114]]], dtype=np.float32) + gray = gray.sum(axis=2, keepdims=True) + gray *= (1.0 - delta) + im *= delta + im += gray + return im + + +def contrast(im, contrast_lower, contrast_upper): + delta = np.random.uniform(contrast_lower, contrast_upper) + im *= delta + return im + + +def brightness(im, brightness_lower, brightness_upper): + delta = np.random.uniform(brightness_lower, brightness_upper) + im += delta + return im + +def rotate(im, rotate_lower, rotate_upper): + rotate_delta = np.random.uniform(rotate_lower, rotate_upper) + im = im.rotate(int(rotate_delta)) + return im + + +def resize_padding(im, max_side_len=2400): + ''' + resize image to a size multiple of 32 which is required by the network + :param im: the resized image + :param max_side_len: limit of max image size to avoid out of memory in gpu + :return: the resized image and the resize ratio + ''' + h, w, _ = im.shape + + resize_w = w + resize_h = h + + # limit the max side + if max(resize_h, resize_w) > max_side_len: + ratio = float( + max_side_len) / resize_h if resize_h > resize_w else float( + max_side_len) / resize_w + else: + ratio = 1. 
+ resize_h = int(resize_h * ratio) + resize_w = int(resize_w * ratio) + + resize_h = resize_h if resize_h % 32 == 0 else (resize_h // 32 - 1) * 32 + resize_w = resize_w if resize_w % 32 == 0 else (resize_w // 32 - 1) * 32 + resize_h = max(32, resize_h) + resize_w = max(32, resize_w) + im = cv2.resize(im, (int(resize_w), int(resize_h))) + #im = cv2.resize(im, (512, 512)) + ratio_h = resize_h / float(h) + ratio_w = resize_w / float(w) + _ratio = np.array([ratio_h, ratio_w]).reshape(-1, 2) + return im, _ratio diff --git a/deploy/raspberry/python/transforms/seg_transforms.py b/deploy/raspberry/python/transforms/seg_transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..a3fb6241d415939a33f73a29b843f9ed45976463 --- /dev/null +++ b/deploy/raspberry/python/transforms/seg_transforms.py @@ -0,0 +1,1054 @@ +# coding: utf8 +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .ops import * +import random +import os.path as osp +import numpy as np +from PIL import Image +import cv2 +from collections import OrderedDict + + +class SegTransform: + """ 分割transform基类 + """ + + def __init__(self): + pass + + +class Compose(SegTransform): + """根据数据预处理/增强算子对输入数据进行操作。 + 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。 + + Args: + transforms (list): 数据预处理/增强算子。 + + Raises: + TypeError: transforms不是list对象 + ValueError: transforms元素个数小于1。 + + """ + + def __init__(self, transforms): + if not isinstance(transforms, list): + raise TypeError('The transforms must be a list!') + if len(transforms) < 1: + raise ValueError('The length of transforms ' + \ + 'must be equal or larger than 1!') + self.transforms = transforms + self.to_rgb = False + + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (str/np.ndarray): 图像路径/图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (str/np.ndarray): 标注图像路径/标注图像np.ndarray数据。 + + Returns: + tuple: 根据网络所需字段所组成的tuple;字段由transforms中的最后一个数据预处理操作决定。 + """ + + if im_info is None: + im_info = list() + if isinstance(im, np.ndarray): + if len(im.shape) != 3: + raise Exception( + "im should be 3-dimensions, but now is {}-dimensions". 
+ format(len(im.shape))) + else: + try: + im = cv2.imread(im).astype('float32') + except: + raise ValueError('Can\'t read The image file {}!'.format(im)) + if self.to_rgb: + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + if label is not None: + if not isinstance(label, np.ndarray): + label = np.asarray(Image.open(label)) + for op in self.transforms: + if isinstance(op, SegTransform): + outputs = op(im, im_info, label) + im = outputs[0] + if len(outputs) >= 2: + im_info = outputs[1] + if len(outputs) == 3: + label = outputs[2] + else: + im = execute_imgaug(op, im) + if label is not None: + outputs = (im, im_info, label) + else: + outputs = (im, im_info) + return outputs + + def add_augmenters(self, augmenters): + if not isinstance(augmenters, list): + raise Exception( + "augmenters should be list type in func add_augmenters()") + transform_names = [type(x).__name__ for x in self.transforms] + for aug in augmenters: + if type(aug).__name__ in transform_names: + print("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__)) + self.transforms = augmenters + self.transforms + + +class RandomHorizontalFlip(SegTransform): + """以一定的概率对图像进行水平翻转。当存在标注图像时,则同步进行翻转。 + + Args: + prob (float): 随机水平翻转的概率。默认值为0.5。 + + """ + + def __init__(self, prob=0.5): + self.prob = prob + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if random.random() < self.prob: + im = horizontal_flip(im) + if label is not None: + label = horizontal_flip(label) + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class RandomVerticalFlip(SegTransform): + """以一定的概率对图像进行垂直翻转。当存在标注图像时,则同步进行翻转。 + + Args: + prob (float): 随机垂直翻转的概率。默认值为0.1。 + """ + + def __init__(self, prob=0.1): + self.prob = prob + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if random.random() < self.prob: + im = vertical_flip(im) + if label is not None: + label = vertical_flip(label) + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class Resize(SegTransform): + """调整图像大小(resize),当存在标注图像时,则同步进行处理。 + + - 当目标大小(target_size)类型为int时,根据插值方式, + 将图像resize为[target_size, target_size]。 + - 当目标大小(target_size)类型为list或tuple时,根据插值方式, + 将图像resize为target_size, target_size的输入应为[w, h]或(w, h)。 + + Args: + target_size (int|list|tuple): 目标大小。 + interp (str): resize的插值方式,与opencv的插值方式对应, + 可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。 + + Raises: + TypeError: target_size不是int/list/tuple。 + ValueError: target_size为list/tuple时元素个数不等于2。 + AssertionError: interp的取值不在['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4']之内。 + """ + + # The interpolation mode + interp_dict = { + 
'NEAREST': cv2.INTER_NEAREST,
+        'LINEAR': cv2.INTER_LINEAR,
+        'CUBIC': cv2.INTER_CUBIC,
+        'AREA': cv2.INTER_AREA,
+        'LANCZOS4': cv2.INTER_LANCZOS4
+    }
+
+    def __init__(self, target_size, interp='LINEAR'):
+        self.interp = interp
+        assert interp in self.interp_dict, "interp should be one of {}".format(
+            self.interp_dict.keys())
+        if isinstance(target_size, list) or isinstance(target_size, tuple):
+            if len(target_size) != 2:
+                raise ValueError(
+                    'when target is list or tuple, it should include 2 elements, but it is {}'
+                    .format(target_size))
+        elif not isinstance(target_size, int):
+            raise TypeError(
+                "Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
+                .format(type(target_size)))
+
+        self.target_size = target_size
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的列表;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的列表和标注图像np.ndarray数据。
+                其中,im_info更新字段为:
+                    -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+
+        Raises:
+            ZeroDivisionError: im的短边为0。
+            TypeError: im不是np.ndarray数据。
+            ValueError: im不是3维np.ndarray。
+        """
+        if im_info is None:
+            im_info = list()
+        im_info.append(('resize', im.shape[:2]))
+
+        if not isinstance(im, np.ndarray):
+            raise TypeError("ResizeImage: image type is not np.ndarray.")
+        if len(im.shape) != 3:
+            raise ValueError('ResizeImage: image is not 3-dimensional.')
+        im_shape = im.shape
+        im_size_min = np.min(im_shape[0:2])
+        im_size_max = np.max(im_shape[0:2])
+        if float(im_size_min) == 0:
+            raise ZeroDivisionError('ResizeImage: min size of image is 0')
+
+        if isinstance(self.target_size, int):
+            resize_w = self.target_size
+            resize_h = self.target_size
+        else:
+            resize_w = self.target_size[0]
+            resize_h = self.target_size[1]
+        im_scale_x = float(resize_w) / float(im_shape[1])
+        im_scale_y = float(resize_h) / float(im_shape[0])
+
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=self.interp_dict[self.interp])
+        if label is not None:
+            label = cv2.resize(
+                label,
+                None,
+                None,
+                fx=im_scale_x,
+                fy=im_scale_y,
+                interpolation=self.interp_dict['NEAREST'])
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class ResizeByLong(SegTransform):
+    """对图像长边resize到固定值,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
+
+    Args:
+        long_size (int): resize后图像的长边大小。
+    """
+
+    def __init__(self, long_size):
+        self.long_size = long_size
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的列表;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的列表和标注图像np.ndarray数据。
+                其中,im_info新增字段为:
+                    -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+        """
+        if im_info is None:
+            im_info = list()
+
+        im_info.append(('resize', im.shape[:2]))
+        im = resize_long(im, self.long_size)
+        if label is not None:
+            label = resize_long(label, self.long_size, cv2.INTER_NEAREST)
+
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class ResizeByShort(SegTransform):
+    """根据图像的短边调整图像大小(resize)。
+
+    1. 获取图像的长边和短边长度。
+    2. 根据短边与short_size的比例,计算长边的目标长度,
+       此时高、宽的resize比例为short_size/原图短边长度。
+    3. 如果max_size>0,调整resize比例:
+       如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。
+    4. 根据调整大小的比例对图像进行resize。
+
+    Args:
+        short_size (int): 短边目标长度。默认为800。
+        max_size (int): 长边目标长度的最大限制。默认为1333。
+
+    Raises:
+        TypeError: 形参数据类型不满足需求。
+    """
+
+    def __init__(self, short_size=800, max_size=1333):
+        self.max_size = int(max_size)
+        if not isinstance(short_size, int):
+            raise TypeError(
+                "Type of short_size is invalid. Must be Integer, now is {}".
+                format(type(short_size)))
+        self.short_size = short_size
+        if not isinstance(self.max_size, int):
+            raise TypeError("max_size: input type is invalid.")
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的列表;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的列表和标注图像np.ndarray数据。
+                其中,im_info更新字段为:
+                    -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+
+        Raises:
+            TypeError: 形参数据类型不满足需求。
+            ValueError: 数据长度不匹配。
+        """
+        if im_info is None:
+            im_info = list()
+        if not isinstance(im, np.ndarray):
+            raise TypeError("ResizeByShort: image type is not np.ndarray.")
+        if len(im.shape) != 3:
+            raise ValueError('ResizeByShort: image is not 3-dimensional.')
+        im_info.append(('resize', im.shape[:2]))
+        im_short_size = min(im.shape[0], im.shape[1])
+        im_long_size = max(im.shape[0], im.shape[1])
+        scale = float(self.short_size) / im_short_size
+        if self.max_size > 0 and np.round(scale *
+                                          im_long_size) > self.max_size:
+            scale = float(self.max_size) / float(im_long_size)
+        resized_width = int(round(im.shape[1] * scale))
+        resized_height = int(round(im.shape[0] * scale))
+        im = cv2.resize(
+            im, (resized_width, resized_height),
+            interpolation=cv2.INTER_NEAREST)
+        if label is not None:
+            label = cv2.resize(
+                label, (resized_width, resized_height),
+                interpolation=cv2.INTER_NEAREST)
+        if label is None:
+            return (im, im_info)
+        else:
+            return (im, im_info, label)
+
+
+class ResizeRangeScaling(SegTransform):
+    """对图像长边随机resize到指定范围内,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
+
+    Args:
+        min_value (int): 图像长边resize后的最小值。默认值400。
+        max_value (int): 图像长边resize后的最大值。默认值600。
+
+    Raises:
+        ValueError: min_value大于max_value
+    """
+
+    def __init__(self, min_value=400, max_value=600):
+        if min_value > max_value:
+            raise ValueError('min_value must be less than max_value, '
+                             'but they are {} and {}.'.format(min_value,
+                                                              max_value))
+        self.min_value = min_value
+        self.max_value = max_value
+
+    def __call__(self, im, im_info=None, label=None):
+        """
+        Args:
+            im (np.ndarray): 图像np.ndarray数据。
+            im_info (list): 存储图像resize或padding前的shape信息,如
+                [('resize', [200, 300]), ('padding', [400, 600])]表示
+                图像在过resize前shape为(200, 300), 过padding前shape为
+                (400, 600)
+            label (np.ndarray): 标注图像np.ndarray数据。
+
+        Returns:
+            tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的列表;
+                当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+                存储与图像相关信息的列表和标注图像np.ndarray数据。
+        """
+        if self.min_value == self.max_value:
+            random_size = self.max_value
+        else:
+            random_size = int(
+                np.random.uniform(self.min_value, self.max_value) +
0.5) + im = resize_long(im, random_size, cv2.INTER_LINEAR) + if label is not None: + label = resize_long(label, random_size, cv2.INTER_NEAREST) + + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class ResizeStepScaling(SegTransform): + """对图像按照某一个比例resize,这个比例以scale_step_size为步长 + 在[min_scale_factor, max_scale_factor]随机变动。当存在标注图像时,则同步进行处理。 + + Args: + min_scale_factor(float), resize最小尺度。默认值0.75。 + max_scale_factor (float), resize最大尺度。默认值1.25。 + scale_step_size (float), resize尺度范围间隔。默认值0.25。 + + Raises: + ValueError: min_scale_factor大于max_scale_factor + """ + + def __init__(self, + min_scale_factor=0.75, + max_scale_factor=1.25, + scale_step_size=0.25): + if min_scale_factor > max_scale_factor: + raise ValueError( + 'min_scale_factor must be less than max_scale_factor, ' + 'but they are {} and {}.'.format(min_scale_factor, + max_scale_factor)) + self.min_scale_factor = min_scale_factor + self.max_scale_factor = max_scale_factor + self.scale_step_size = scale_step_size + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if self.min_scale_factor == self.max_scale_factor: + scale_factor = self.min_scale_factor + + elif self.scale_step_size == 0: + scale_factor = np.random.uniform(self.min_scale_factor, + self.max_scale_factor) + + else: + num_steps = int((self.max_scale_factor - self.min_scale_factor) / + self.scale_step_size + 1) + scale_factors = np.linspace(self.min_scale_factor, + self.max_scale_factor, + num_steps).tolist() + np.random.shuffle(scale_factors) + scale_factor = scale_factors[0] + + im = cv2.resize( + im, (0, 0), + fx=scale_factor, + fy=scale_factor, + interpolation=cv2.INTER_LINEAR) + if label is not None: + label = cv2.resize( + label, (0, 0), + fx=scale_factor, + fy=scale_factor, + interpolation=cv2.INTER_NEAREST) + + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class Normalize(SegTransform): + """对图像进行标准化。 + 1.尺度缩放到 [0,1]。 + 2.对图像进行减均值除以标准差操作。 + + Args: + mean (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。 + std (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。 + + Raises: + ValueError: mean或std不是list对象。std包含0。 + """ + + def __init__(self, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]): + self.mean = mean + self.std = std + if not (isinstance(self.mean, list) and isinstance(self.std, list)): + raise ValueError("{}: input type is invalid.".format(self)) + from functools import reduce + if reduce(lambda x, y: x * y, self.std) == 0: + raise ValueError('{}: std is invalid!'.format(self)) + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im = 
normalize(im, mean, std) + + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class Padding(SegTransform): + """对图像或标注图像进行padding,padding方向为右和下。 + 根据提供的值对图像或标注图像进行padding操作。 + + Args: + target_size (int|list|tuple): padding后图像的大小。 + im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。 + label_padding_value (int): 标注图像padding的值。默认值为255。 + + Raises: + TypeError: target_size不是int|list|tuple。 + ValueError: target_size为list|tuple时元素个数不等于2。 + """ + + def __init__(self, + target_size, + im_padding_value=[127.5, 127.5, 127.5], + label_padding_value=255): + if isinstance(target_size, list) or isinstance(target_size, tuple): + if len(target_size) != 2: + raise ValueError( + 'when target is list or tuple, it should include 2 elements, but it is {}' + .format(target_size)) + elif not isinstance(target_size, int): + raise TypeError( + "Type of target_size is invalid. Must be Integer or List or tuple, now is {}" + .format(type(target_size))) + self.target_size = target_size + self.im_padding_value = im_padding_value + self.label_padding_value = label_padding_value + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + 其中,im_info新增字段为: + -shape_before_padding (tuple): 保存padding之前图像的形状(h, w)。 + + Raises: + ValueError: 输入图像im或label的形状大于目标值 + """ + if im_info is None: + im_info = OrderedDict() + im_info.append(('padding', im.shape[:2])) + + im_height, im_width = im.shape[0], im.shape[1] + if isinstance(self.target_size, int): + target_height = self.target_size + target_width = self.target_size + else: + target_height = self.target_size[1] + target_width = self.target_size[0] + pad_height = target_height - im_height + pad_width = target_width - im_width + if pad_height < 0 or pad_width < 0: + raise ValueError( + 'the size of image should be less than target_size, but the size of image ({}, {}), is larger than target_size ({}, {})' + .format(im_width, im_height, target_width, target_height)) + else: + im = cv2.copyMakeBorder( + im, + 0, + pad_height, + 0, + pad_width, + cv2.BORDER_CONSTANT, + value=self.im_padding_value) + if label is not None: + label = cv2.copyMakeBorder( + label, + 0, + pad_height, + 0, + pad_width, + cv2.BORDER_CONSTANT, + value=self.label_padding_value) + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class RandomPaddingCrop(SegTransform): + """对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作。 + + Args: + crop_size (int|list|tuple): 裁剪图像大小。默认为512。 + im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。 + label_padding_value (int): 标注图像padding的值。默认值为255。 + + Raises: + TypeError: crop_size不是int/list/tuple。 + ValueError: target_size为list/tuple时元素个数不等于2。 + """ + + def __init__(self, + crop_size=512, + im_padding_value=[127.5, 127.5, 127.5], + label_padding_value=255): + if isinstance(crop_size, list) or isinstance(crop_size, tuple): + if len(crop_size) != 2: + raise ValueError( + 'when crop_size is list or tuple, it should include 2 elements, but it is {}' + .format(crop_size)) + elif not isinstance(crop_size, int): + raise TypeError( + "Type of crop_size is invalid. 
Must be Integer or List or tuple, now is {}" + .format(type(crop_size))) + self.crop_size = crop_size + self.im_padding_value = im_padding_value + self.label_padding_value = label_padding_value + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if isinstance(self.crop_size, int): + crop_width = self.crop_size + crop_height = self.crop_size + else: + crop_width = self.crop_size[0] + crop_height = self.crop_size[1] + + img_height = im.shape[0] + img_width = im.shape[1] + + if img_height == crop_height and img_width == crop_width: + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + else: + pad_height = max(crop_height - img_height, 0) + pad_width = max(crop_width - img_width, 0) + if (pad_height > 0 or pad_width > 0): + im = cv2.copyMakeBorder( + im, + 0, + pad_height, + 0, + pad_width, + cv2.BORDER_CONSTANT, + value=self.im_padding_value) + if label is not None: + label = cv2.copyMakeBorder( + label, + 0, + pad_height, + 0, + pad_width, + cv2.BORDER_CONSTANT, + value=self.label_padding_value) + img_height = im.shape[0] + img_width = im.shape[1] + + if crop_height > 0 and crop_width > 0: + h_off = np.random.randint(img_height - crop_height + 1) + w_off = np.random.randint(img_width - crop_width + 1) + + im = im[h_off:(crop_height + h_off), w_off:(w_off + crop_width + ), :] + if label is not None: + label = label[h_off:(crop_height + h_off), w_off:( + w_off + crop_width)] + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class RandomBlur(SegTransform): + """以一定的概率对图像进行高斯模糊。 + + Args: + prob (float): 图像模糊概率。默认为0.1。 + """ + + def __init__(self, prob=0.1): + self.prob = prob + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if self.prob <= 0: + n = 0 + elif self.prob >= 1: + n = 1 + else: + n = int(1.0 / self.prob) + if n > 0: + if np.random.randint(0, n) == 0: + radius = np.random.randint(3, 10) + if radius % 2 != 1: + radius = radius + 1 + if radius > 9: + radius = 9 + im = cv2.GaussianBlur(im, (radius, radius), 0, 0) + + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + + + +class RandomScaleAspect(SegTransform): + """裁剪并resize回原始尺寸的图像和标注图像。 + 按照一定的面积比和宽高比对图像进行裁剪,并reszie回原始图像的图像,当存在标注图时,同步进行。 + + Args: + min_scale (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。 + aspect_ratio (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。 + """ + + def __init__(self, min_scale=0.5, aspect_ratio=0.33): + self.min_scale = min_scale + self.aspect_ratio = aspect_ratio + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), 
('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + if self.min_scale != 0 and self.aspect_ratio != 0: + img_height = im.shape[0] + img_width = im.shape[1] + for i in range(0, 10): + area = img_height * img_width + target_area = area * np.random.uniform(self.min_scale, 1.0) + aspectRatio = np.random.uniform(self.aspect_ratio, + 1.0 / self.aspect_ratio) + + dw = int(np.sqrt(target_area * 1.0 * aspectRatio)) + dh = int(np.sqrt(target_area * 1.0 / aspectRatio)) + if (np.random.randint(10) < 5): + tmp = dw + dw = dh + dh = tmp + + if (dh < img_height and dw < img_width): + h1 = np.random.randint(0, img_height - dh) + w1 = np.random.randint(0, img_width - dw) + + im = im[h1:(h1 + dh), w1:(w1 + dw), :] + label = label[h1:(h1 + dh), w1:(w1 + dw)] + im = cv2.resize( + im, (img_width, img_height), + interpolation=cv2.INTER_LINEAR) + label = cv2.resize( + label, (img_width, img_height), + interpolation=cv2.INTER_NEAREST) + break + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class RandomDistort(SegTransform): + """对图像进行随机失真。 + + 1. 对变换的操作顺序进行随机化操作。 + 2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。 + + Args: + brightness_range (float): 明亮度因子的范围。默认为0.5。 + brightness_prob (float): 随机调整明亮度的概率。默认为0.5。 + contrast_range (float): 对比度因子的范围。默认为0.5。 + contrast_prob (float): 随机调整对比度的概率。默认为0.5。 + saturation_range (float): 饱和度因子的范围。默认为0.5。 + saturation_prob (float): 随机调整饱和度的概率。默认为0.5。 + hue_range (int): 色调因子的范围。默认为18。 + hue_prob (float): 随机调整色调的概率。默认为0.5。 + """ + + def __init__(self, + brightness_range=0.5, + brightness_prob=0.5, + contrast_range=0.5, + contrast_prob=0.5, + saturation_range=0.5, + saturation_prob=0.5, + hue_range=18, + hue_prob=0.5): + self.brightness_range = brightness_range + self.brightness_prob = brightness_prob + self.contrast_range = contrast_range + self.contrast_prob = contrast_prob + self.saturation_range = saturation_range + self.saturation_prob = saturation_prob + self.hue_range = hue_range + self.hue_prob = hue_prob + + def __call__(self, im, im_info=None, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、 + 存储与图像相关信息的字典和标注图像np.ndarray数据。 + """ + brightness_lower = 1 - self.brightness_range + brightness_upper = 1 + self.brightness_range + contrast_lower = 1 - self.contrast_range + contrast_upper = 1 + self.contrast_range + saturation_lower = 1 - self.saturation_range + saturation_upper = 1 + self.saturation_range + hue_lower = -self.hue_range + hue_upper = self.hue_range + ops = [brightness, contrast, saturation, hue] + random.shuffle(ops) + params_dict = { + 'brightness': { + 'brightness_lower': brightness_lower, + 'brightness_upper': brightness_upper + }, + 'contrast': { + 'contrast_lower': contrast_lower, + 'contrast_upper': contrast_upper + }, + 'saturation': { + 'saturation_lower': saturation_lower, + 'saturation_upper': saturation_upper + }, + 'hue': { + 'hue_lower': hue_lower, + 'hue_upper': hue_upper + } + } + prob_dict = { + 
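+            # Apply-probabilities for each distortion op, keyed by the op's
+            # __name__ so they can be looked up together with params_dict
+            # in the loop below.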
'brightness': self.brightness_prob, + 'contrast': self.contrast_prob, + 'saturation': self.saturation_prob, + 'hue': self.hue_prob + } + for id in range(4): + params = params_dict[ops[id].__name__] + prob = prob_dict[ops[id].__name__] + params['im'] = im + if np.random.uniform(0, 1) < prob: + im = ops[id](**params) + if label is None: + return (im, im_info) + else: + return (im, im_info, label) + + +class ArrangeSegmenter(SegTransform): + """获取训练/验证/预测所需的信息。 + + Args: + mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。 + + Raises: + ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内 + """ + + def __init__(self, mode): + if mode not in ['train', 'eval', 'test', 'quant']: + raise ValueError( + "mode should be defined as one of ['train', 'eval', 'test', 'quant']!" + ) + self.mode = mode + + def __call__(self, im, im_info, label=None): + """ + Args: + im (np.ndarray): 图像np.ndarray数据。 + im_info (list): 存储图像reisze或padding前的shape信息,如 + [('resize', [200, 300]), ('padding', [400, 600])]表示 + 图像在过resize前shape为(200, 300), 过padding前shape为 + (400, 600) + label (np.ndarray): 标注图像np.ndarray数据。 + + Returns: + tuple: 当mode为'train'或'eval'时,返回的tuple为(im, label),分别对应图像np.ndarray数据、存储与图像相关信息的字典; + 当mode为'test'时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;当mode为 + 'quant'时,返回的tuple为(im,),为图像np.ndarray数据。 + """ + im = permute(im, False) + if self.mode == 'train' or self.mode == 'eval': + label = label[np.newaxis, :, :] + return (im, label) + elif self.mode == 'test': + return (im, im_info) + else: + return (im, ) + + +class ComposedSegTransforms(Compose): + """ 语义分割模型(UNet/DeepLabv3p)的图像处理流程,具体如下 + 训练阶段: + 1. 随机对图像以0.5的概率水平翻转 + 2. 按不同的比例随机Resize原图 + 3. 从原图中随机crop出大小为train_crop_size大小的子图,如若crop出来的图小于train_crop_size,则会将图padding到对应大小 + 4. 图像归一化 + 预测阶段: + 1. 图像归一化 + + Args: + mode(str): 图像处理所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test' + train_crop_size(list): 模型训练阶段,随机从原图crop的大小 + mean(list): 图像均值 + std(list): 图像方差 + """ + + def __init__(self, + mode, + train_crop_size=[769, 769], + mean=[0.5, 0.5, 0.5], + std=[0.5, 0.5, 0.5]): + if mode == 'train': + # 训练时的transforms,包含数据增强 + pass + else: + # 验证/预测时的transforms + transforms = [Normalize(mean=mean, std=std)] + + super(ComposedSegTransforms, self).__init__(transforms) diff --git a/deploy/raspberry/scripts/build.sh b/deploy/raspberry/scripts/build.sh new file mode 100755 index 0000000000000000000000000000000000000000..ef268a1b72abd0fee258769764840efe13447058 --- /dev/null +++ b/deploy/raspberry/scripts/build.sh @@ -0,0 +1,22 @@ +# Paddle-Lite预编译库的路径 +LITE_DIR=/path/to/Paddle-Lite/inference/lib + +# gflags预编译库的路径 +GFLAGS_DIR=$(pwd)/deps/gflags +# glog预编译库的路径 +GLOG_DIR=$(pwd)/deps/glog + +# opencv预编译库的路径, 如果使用自带预编译版本可不修改 +OPENCV_DIR=$(pwd)/deps/opencv +# 下载自带预编译版本 +exec $(pwd)/scripts/install_third-party.sh + +rm -rf build +mkdir -p build +cd build +cmake .. \ + -DOPENCV_DIR=${OPENCV_DIR} \ + -DGFLAGS_DIR=${GFLAGS_DIR} \ + -DLITE_DIR=${LITE_DIR} \ + -DCMAKE_CXX_FLAGS="-march=armv7-a" +make diff --git a/deploy/raspberry/scripts/install_third-party.sh b/deploy/raspberry/scripts/install_third-party.sh new file mode 100755 index 0000000000000000000000000000000000000000..decc380d4d2c24b99d785ddd3c1a21d217388539 --- /dev/null +++ b/deploy/raspberry/scripts/install_third-party.sh @@ -0,0 +1,32 @@ +# download third-part lib +if [ ! -d "./deps" ]; then + mkdir deps +fi +if [ ! -d "./deps/gflag" ]; then + cd deps + git clone https://github.com/gflags/gflags + cd gflags + cmake . + make -j 4 + cd .. + cd .. +fi +if [ ! 
-d "./deps/glog" ]; then + cd deps + git clone https://github.com/google/glog + sudo apt-get install autoconf automake libtool + cd glog + ./autogen.sh + ./configure + make -j 4 + cd .. + cd .. +fi +OPENCV_URL=https://bj.bcebos.com/paddlex/deploy/armopencv/opencv.tar.bz2 +if [ ! -d "./deps/opencv" ]; then + cd deps + wget -c ${OPENCV_URL} + tar xvfj opencv.tar.bz2 + rm -rf opencv.tar.bz2 + cd .. +fi diff --git a/deploy/raspberry/src/paddlex.cpp b/deploy/raspberry/src/paddlex.cpp new file mode 100755 index 0000000000000000000000000000000000000000..081a1ffb7acc56a5efb22e1e92264cad1d807f4d --- /dev/null +++ b/deploy/raspberry/src/paddlex.cpp @@ -0,0 +1,256 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "include/paddlex/paddlex.h" +#include +#include + + +namespace PaddleX { + +void Model::create_predictor(const std::string& model_dir, + const std::string& cfg_file, + int thread_num) { + paddle::lite_api::MobileConfig config; + config.set_model_from_file(model_dir); + config.set_threads(thread_num); + load_config(cfg_file); + predictor_ = + paddle::lite_api::CreatePaddlePredictor( + config); +} + +bool Model::load_config(const std::string& cfg_file) { + YAML::Node config = YAML::LoadFile(cfg_file); + type = config["_Attributes"]["model_type"].as(); + name = config["Model"].as(); + bool to_rgb = true; + if (config["TransformsMode"].IsDefined()) { + std::string mode = config["TransformsMode"].as(); + if (mode == "BGR") { + to_rgb = false; + } else if (mode != "RGB") { + std::cerr << "[Init] Only 'RGB' or 'BGR' is supported for TransformsMode" + << std::endl; + return false; + } + } + // init preprocess ops + transforms_.Init(config["Transforms"], to_rgb); + // read label list + for (const auto& item : config["_Attributes"]["labels"]) { + int index = labels.size(); + labels[index] = item.as(); + } + + return true; +} + +bool Model::preprocess(cv::Mat* input_im, ImageBlob* inputs) { + if (!transforms_.Run(input_im, inputs)) { + return false; + } + return true; +} + +bool Model::predict(const cv::Mat& im, ClsResult* result) { + inputs_.clear(); + if (type == "detector") { + std::cerr << "Loading model is a 'detector', DetResult should be passed to " + "function predict()!" + << std::endl; + return false; + } else if (type == "segmenter") { + std::cerr << "Loading model is a 'segmenter', SegResult should be passed " + "to function predict()!" + << std::endl; + return false; + } + // preprocess + inputs_.input_tensor_ = std::move(predictor_->GetInput(0)); + cv::Mat im_clone = im.clone(); + if (!preprocess(&im_clone, &inputs_)) { + std::cerr << "Preprocess failed!" 
<< std::endl; + return false; + } + // predict + predictor_->Run(); + + std::unique_ptr output_tensor( + std::move(predictor_->GetOutput(0))); + const float *outputs_data = output_tensor->mutable_data(); + + + // postprocess + auto ptr = std::max_element(outputs_data, outputs_data+sizeof(outputs_data)); + result->category_id = std::distance(outputs_data, ptr); + result->score = *ptr; + result->category = labels[result->category_id]; +} + +bool Model::predict(const cv::Mat& im, DetResult* result) { + inputs_.clear(); + result->clear(); + if (type == "classifier") { + std::cerr << "Loading model is a 'classifier', ClsResult should be passed " + "to function predict()!" << std::endl; + return false; + } else if (type == "segmenter") { + std::cerr << "Loading model is a 'segmenter', SegResult should be passed " + "to function predict()!" << std::endl; + return false; + } + inputs_.input_tensor_ = std::move(predictor_->GetInput(0)); + + cv::Mat im_clone = im.clone(); + if (!preprocess(&im_clone, &inputs_)) { + std::cerr << "Preprocess failed!" << std::endl; + return false; + } + int h = inputs_.new_im_size_[0]; + int w = inputs_.new_im_size_[1]; + if (name == "YOLOv3") { + std::unique_ptr im_size_tensor( + std::move(predictor_->GetInput(1))); + const std::vector IM_SIZE_SHAPE = {1, 2}; + im_size_tensor->Resize(IM_SIZE_SHAPE); + auto *im_size_data = im_size_tensor->mutable_data(); + memcpy(im_size_data, inputs_.ori_im_size_.data(), 1*2*sizeof(int)); + } + predictor_->Run(); + auto output_names = predictor_->GetOutputNames(); + auto output_box_tensor = predictor_->GetTensor(output_names[0]); + const float *output_box = output_box_tensor->mutable_data(); + std::vector output_box_shape = output_box_tensor->shape(); + int size = 1; + for (const auto& i : output_box_shape) { + size *= i; + } + int num_boxes = size / 6; + for (int i = 0; i < num_boxes; ++i) { + Box box; + box.category_id = static_cast(round(output_box[i * 6])); + box.category = labels[box.category_id]; + box.score = output_box[i * 6 + 1]; + float xmin = output_box[i * 6 + 2]; + float ymin = output_box[i * 6 + 3]; + float xmax = output_box[i * 6 + 4]; + float ymax = output_box[i * 6 + 5]; + float w = xmax - xmin + 1; + float h = ymax - ymin + 1; + box.coordinate = {xmin, ymin, w, h}; + result->boxes.push_back(std::move(box)); + } + return true; +} + + +bool Model::predict(const cv::Mat& im, SegResult* result) { + result->clear(); + inputs_.clear(); + if (type == "classifier") { + std::cerr << "Loading model is a 'classifier', ClsResult should be passed " + "to function predict()!" << std::endl; + return false; + } else if (type == "detector") { + std::cerr << "Loading model is a 'detector', DetResult should be passed to " + "function predict()!" << std::endl; + return false; + } + inputs_.input_tensor_ = std::move(predictor_->GetInput(0)); + cv::Mat im_clone = im.clone(); + if (!preprocess(&im_clone, &inputs_)) { + std::cerr << "Preprocess failed!" 
<< std::endl; + return false; + } + std::cout << "Preprocess is done" << std::endl; + predictor_->Run(); + auto output_names = predictor_->GetOutputNames(); + + auto output_label_tensor = predictor_->GetTensor(output_names[0]); + const int64_t *label_data = output_label_tensor->mutable_data(); + std::vector output_label_shape = output_label_tensor->shape(); + int size = 1; + for (const auto& i : output_label_shape) { + size *= i; + result->label_map.shape.push_back(i); + } + result->label_map.data.resize(size); + memcpy(result->label_map.data.data(), label_data, size*sizeof(int64_t)); + + auto output_score_tensor = predictor_->GetTensor(output_names[1]); + const float *score_data = output_score_tensor->mutable_data(); + std::vector output_score_shape = output_score_tensor->shape(); + size = 1; + for (const auto& i : output_score_shape) { + size *= i; + result->score_map.shape.push_back(i); + } + result->score_map.data.resize(size); + memcpy(result->score_map.data.data(), score_data, size*sizeof(float)); + + + std::vector label_map(result->label_map.data.begin(), + result->label_map.data.end()); + cv::Mat mask_label(result->label_map.shape[1], + result->label_map.shape[2], + CV_8UC1, + label_map.data()); + + cv::Mat mask_score(result->score_map.shape[2], + result->score_map.shape[3], + CV_32FC1, + result->score_map.data.data()); + int idx = 1; + int len_postprocess = inputs_.im_size_before_resize_.size(); + for (std::vector::reverse_iterator iter = + inputs_.reshape_order_.rbegin(); + iter != inputs_.reshape_order_.rend(); + ++iter) { + if (*iter == "padding") { + auto before_shape = inputs_.im_size_before_resize_[len_postprocess - idx]; + inputs_.im_size_before_resize_.pop_back(); + auto padding_w = before_shape[0]; + auto padding_h = before_shape[1]; + mask_label = mask_label(cv::Rect(0, 0, padding_h, padding_w)); + mask_score = mask_score(cv::Rect(0, 0, padding_h, padding_w)); + } else if (*iter == "resize") { + auto before_shape = inputs_.im_size_before_resize_[len_postprocess - idx]; + inputs_.im_size_before_resize_.pop_back(); + auto resize_w = before_shape[0]; + auto resize_h = before_shape[1]; + cv::resize(mask_label, + mask_label, + cv::Size(resize_h, resize_w), + 0, + 0, + cv::INTER_NEAREST); + cv::resize(mask_score, + mask_score, + cv::Size(resize_h, resize_w), + 0, + 0, + cv::INTER_LINEAR); + } + ++idx; + } + result->label_map.data.assign(mask_label.begin(), + mask_label.end()); + result->label_map.shape = {mask_label.rows, mask_label.cols}; + result->score_map.data.assign(mask_score.begin(), + mask_score.end()); + result->score_map.shape = {mask_score.rows, mask_score.cols}; + return true; +} +} // namespace PaddleX diff --git a/deploy/raspberry/src/transforms.cpp b/deploy/raspberry/src/transforms.cpp new file mode 100755 index 0000000000000000000000000000000000000000..026c0e02e155e32224a79ef66ab203fd0afa40b3 --- /dev/null +++ b/deploy/raspberry/src/transforms.cpp @@ -0,0 +1,239 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. + + +#include "include/paddlex/transforms.h" + +#include + +#include +#include +#include + + + +namespace PaddleX { + +std::map interpolations = {{"LINEAR", cv::INTER_LINEAR}, + {"NEAREST", cv::INTER_NEAREST}, + {"AREA", cv::INTER_AREA}, + {"CUBIC", cv::INTER_CUBIC}, + {"LANCZOS4", cv::INTER_LANCZOS4}}; + +bool Normalize::Run(cv::Mat* im, ImageBlob* data) { + for (int h = 0; h < im->rows; h++) { + for (int w = 0; w < im->cols; w++) { + im->at(h, w)[0] = + (im->at(h, w)[0] / 255.0 - mean_[0]) / std_[0]; + im->at(h, w)[1] = + (im->at(h, w)[1] / 255.0 - mean_[1]) / std_[1]; + im->at(h, w)[2] = + (im->at(h, w)[2] / 255.0 - mean_[2]) / std_[2]; + } + } + return true; +} + + + +float ResizeByShort::GenerateScale(const cv::Mat& im) { + int origin_w = im.cols; + int origin_h = im.rows; + int im_size_max = std::max(origin_w, origin_h); + int im_size_min = std::min(origin_w, origin_h); + float scale = + static_cast(short_size_) / static_cast(im_size_min); + if (max_size_ > 0) { + if (round(scale * im_size_max) > max_size_) { + scale = static_cast(max_size_) / static_cast(im_size_max); + } + } + return scale; +} + +bool ResizeByShort::Run(cv::Mat* im, ImageBlob* data) { + data->im_size_before_resize_.push_back({im->rows, im->cols}); + data->reshape_order_.push_back("resize"); + + float scale = GenerateScale(*im); + int width = static_cast(round(scale * im->cols)); + int height = static_cast(round(scale * im->rows)); + cv::resize(*im, *im, cv::Size(width, height), 0, 0, cv::INTER_LINEAR); + data->new_im_size_[0] = im->rows; + data->new_im_size_[1] = im->cols; + data->scale = scale; + return true; +} + +bool CenterCrop::Run(cv::Mat* im, ImageBlob* data) { + int height = static_cast(im->rows); + int width = static_cast(im->cols); + if (height < height_ || width < width_) { + std::cerr << "[CenterCrop] Image size less than crop size" << std::endl; + return false; + } + int offset_x = static_cast((width - width_) / 2); + int offset_y = static_cast((height - height_) / 2); + cv::Rect crop_roi(offset_x, offset_y, width_, height_); + *im = (*im)(crop_roi); + data->new_im_size_[0] = im->rows; + data->new_im_size_[1] = im->cols; + return true; +} + + +bool Padding::Run(cv::Mat* im, ImageBlob* data) { + data->im_size_before_resize_.push_back({im->rows, im->cols}); + data->reshape_order_.push_back("padding"); + + int padding_w = 0; + int padding_h = 0; + if (width_ > 1 & height_ > 1) { + padding_w = width_ - im->cols; + padding_h = height_ - im->rows; + } else if (coarsest_stride_ >= 1) { + int h = im->rows; + int w = im->cols; + padding_h = + ceil(h * 1.0 / coarsest_stride_) * coarsest_stride_ - im->rows; + padding_w = + ceil(w * 1.0 / coarsest_stride_) * coarsest_stride_ - im->cols; + } + + if (padding_h < 0 || padding_w < 0) { + std::cerr << "[Padding] Computed padding_h=" << padding_h + << ", padding_w=" << padding_w + << ", but they should be greater than 0." 
<< std::endl; + return false; + } + cv::Scalar value = cv::Scalar(im_value_[0], im_value_[1], im_value_[2]); + cv::copyMakeBorder( + *im, *im, 0, padding_h, 0, padding_w, cv::BORDER_CONSTANT, value); + data->new_im_size_[0] = im->rows; + data->new_im_size_[1] = im->cols; + return true; +} + +bool ResizeByLong::Run(cv::Mat* im, ImageBlob* data) { + if (long_size_ <= 0) { + std::cerr << "[ResizeByLong] long_size should be greater than 0" + << std::endl; + return false; + } + data->im_size_before_resize_.push_back({im->rows, im->cols}); + data->reshape_order_.push_back("resize"); + int origin_w = im->cols; + int origin_h = im->rows; + + int im_size_max = std::max(origin_w, origin_h); + float scale = + static_cast(long_size_) / static_cast(im_size_max); + cv::resize(*im, *im, cv::Size(), scale, scale, cv::INTER_NEAREST); + data->new_im_size_[0] = im->rows; + data->new_im_size_[1] = im->cols; + data->scale = scale; + return true; +} + +bool Resize::Run(cv::Mat* im, ImageBlob* data) { + if (width_ <= 0 || height_ <= 0) { + std::cerr << "[Resize] width and height should be greater than 0" + << std::endl; + return false; + } + if (interpolations.count(interp_) <= 0) { + std::cerr << "[Resize] Invalid interpolation method: '" << interp_ << "'" + << std::endl; + return false; + } + data->im_size_before_resize_.push_back({im->rows, im->cols}); + data->reshape_order_.push_back("resize"); + + cv::resize( + *im, *im, cv::Size(width_, height_), 0, 0, interpolations[interp_]); + data->new_im_size_[0] = im->rows; + data->new_im_size_[1] = im->cols; + return true; +} + +void Transforms::Init(const YAML::Node& transforms_node, bool to_rgb) { + transforms_.clear(); + to_rgb_ = to_rgb; + for (const auto& item : transforms_node) { + std::string name = item.begin()->first.as(); + std::cout << "trans name: " << name << std::endl; + std::shared_ptr transform = CreateTransform(name); + transform->Init(item.begin()->second); + transforms_.push_back(transform); + } +} + +std::shared_ptr Transforms::CreateTransform( + const std::string& transform_name) { + if (transform_name == "Normalize") { + return std::make_shared(); + } else if (transform_name == "ResizeByShort") { + return std::make_shared(); + } else if (transform_name == "CenterCrop") { + return std::make_shared(); + } else if (transform_name == "Resize") { + return std::make_shared(); + } else if (transform_name == "Padding") { + return std::make_shared(); + } else if (transform_name == "ResizeByLong") { + return std::make_shared(); + } else { + std::cerr << "There's unexpected transform(name='" << transform_name + << "')." << std::endl; + exit(-1); + } +} + +bool Transforms::Run(cv::Mat* im, ImageBlob* data) { + // preprocess by order + if (to_rgb_) { + cv::cvtColor(*im, *im, cv::COLOR_BGR2RGB); + } + (*im).convertTo(*im, CV_32FC3); + data->ori_im_size_[0] = im->rows; + data->ori_im_size_[1] = im->cols; + data->new_im_size_[0] = im->rows; + data->new_im_size_[1] = im->cols; + + for (int i = 0; i < transforms_.size(); ++i) { + if (!transforms_[i]->Run(im, data)) { + std::cerr << "Apply transforms to image failed!" 
<< std::endl; + return false; + } + } + + // image format NHWC to NCHW + // img data save to ImageBlob + int height = im->rows; + int width = im->cols; + int channels = im->channels(); + const std::vector INPUT_SHAPE = {1, channels, height, width}; + data->input_tensor_->Resize(INPUT_SHAPE); + auto *input_data = data->input_tensor_->mutable_data(); + for (size_t c = 0; c < channels; c++) { + for (size_t h = 0; h < height; h++) { + for (size_t w = 0; w < width; w++) { + input_data[c * width * height + h * width + w] = + im->at(h, w)[c]; + } + } + } + return true; +} +} // namespace PaddleX diff --git a/deploy/raspberry/src/visualize.cpp b/deploy/raspberry/src/visualize.cpp new file mode 100755 index 0000000000000000000000000000000000000000..df2cd768495ea8638acf020ad53437bd827cb95e --- /dev/null +++ b/deploy/raspberry/src/visualize.cpp @@ -0,0 +1,148 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "include/paddlex/visualize.h" + +namespace PaddleX { +std::vector GenerateColorMap(int num_class) { + auto colormap = std::vector(3 * num_class, 0); + for (int i = 0; i < num_class; ++i) { + int j = 0; + int lab = i; + while (lab) { + colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j)); + colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)); + colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)); + ++j; + lab >>= 3; + } + } + return colormap; +} + +cv::Mat Visualize(const cv::Mat& img, + const DetResult& result, + const std::map& labels, + const std::vector& colormap, + float threshold) { + cv::Mat vis_img = img.clone(); + auto boxes = result.boxes; + for (int i = 0; i < boxes.size(); ++i) { + if (boxes[i].score < threshold) { + continue; + } + cv::Rect roi = cv::Rect(boxes[i].coordinate[0], + boxes[i].coordinate[1], + boxes[i].coordinate[2], + boxes[i].coordinate[3]); + + // 生成预测框和标题 + std::string text = boxes[i].category; + int c1 = colormap[3 * boxes[i].category_id + 0]; + int c2 = colormap[3 * boxes[i].category_id + 1]; + int c3 = colormap[3 * boxes[i].category_id + 2]; + cv::Scalar roi_color = cv::Scalar(c1, c2, c3); + text += std::to_string(static_cast(boxes[i].score * 100)) + "%"; + int font_face = cv::FONT_HERSHEY_SIMPLEX; + double font_scale = 0.5f; + float thickness = 0.5; + cv::Size text_size = + cv::getTextSize(text, font_face, font_scale, thickness, nullptr); + cv::Point origin; + origin.x = roi.x; + origin.y = roi.y; + + // 生成预测框标题的背景 + cv::Rect text_back = cv::Rect(boxes[i].coordinate[0], + boxes[i].coordinate[1] - text_size.height, + text_size.width, + text_size.height); + + // 绘图和文字 + cv::rectangle(vis_img, roi, roi_color, 2); + cv::rectangle(vis_img, text_back, roi_color, -1); + cv::putText(vis_img, + text, + origin, + font_face, + font_scale, + cv::Scalar(255, 255, 255), + thickness); + + // 生成实例分割mask + if (boxes[i].mask.data.size() == 0) { + continue; + } + cv::Mat bin_mask(result.mask_resolution, + result.mask_resolution, + CV_32FC1, + boxes[i].mask.data.data()); + cv::resize(bin_mask, + 
bin_mask, + cv::Size(boxes[i].mask.shape[0], boxes[i].mask.shape[1])); + cv::threshold(bin_mask, bin_mask, 0.5, 1, cv::THRESH_BINARY); + cv::Mat full_mask = cv::Mat::zeros(vis_img.size(), CV_8UC1); + bin_mask.copyTo(full_mask(roi)); + cv::Mat mask_ch[3]; + mask_ch[0] = full_mask * c1; + mask_ch[1] = full_mask * c2; + mask_ch[2] = full_mask * c3; + cv::Mat mask; + cv::merge(mask_ch, 3, mask); + cv::addWeighted(vis_img, 1, mask, 0.5, 0, vis_img); + } + return vis_img; +} + +cv::Mat Visualize(const cv::Mat& img, + const SegResult& result, + const std::map& labels, + const std::vector& colormap) { + std::vector label_map(result.label_map.data.begin(), + result.label_map.data.end()); + cv::Mat mask(result.label_map.shape[0], + result.label_map.shape[1], + CV_8UC1, + label_map.data()); + cv::Mat color_mask = cv::Mat::zeros( + result.label_map.shape[0], result.label_map.shape[1], CV_8UC3); + int rows = img.rows; + int cols = img.cols; + for (int i = 0; i < rows; i++) { + for (int j = 0; j < cols; j++) { + int category_id = static_cast(mask.at(i, j)); + color_mask.at(i, j)[0] = colormap[3 * category_id + 0]; + color_mask.at(i, j)[1] = colormap[3 * category_id + 1]; + color_mask.at(i, j)[2] = colormap[3 * category_id + 2]; + } + } + return color_mask; +} + +std::string generate_save_path(const std::string& save_dir, + const std::string& file_path) { + if (access(save_dir.c_str(), 0) < 0) { +#ifdef _WIN32 + mkdir(save_dir.c_str()); +#else + if (mkdir(save_dir.c_str(), S_IRWXU) < 0) { + std::cerr << "Fail to create " << save_dir << "directory." << std::endl; + } +#endif + } + int pos = file_path.find_last_of(OS_PATH_SEP); + std::string image_name(file_path.substr(pos + 1)); + return save_dir + OS_PATH_SEP + image_name; +} +} // namespace PaddleX diff --git a/docs/.DS_Store b/docs/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5735e5119eff09185cce840471f450be898479ec Binary files /dev/null and b/docs/.DS_Store differ diff --git a/docs/README.md b/docs/README.md index df45f14400cac1d6816e9a55ad92dc59ba141650..4c1c6a00b7866487b4d9c53c0a9139f4e38de730 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,6 +1,6 @@ # PaddleX文档 -PaddleX的使用文档均在本目录结构下。文档采用Read the Docs方式组织,您可以直接访问[在线文档](https://paddlex.readthedocs.io/zh_CN/latest/index.html)进行查阅。 +PaddleX的使用文档均在本目录结构下。文档采用Read the Docs方式组织,您可以直接访问[在线文档](https://paddlex.readthedocs.io/zh_CN/develop/index.html)进行查阅。 ## 编译文档 在本目录下按如下步骤进行文档编译 diff --git a/docs/apis/analysis.md b/docs/apis/analysis.md new file mode 100644 index 0000000000000000000000000000000000000000..81b1c87cf5d78f104d2ce5e1fccae64a40c888ec --- /dev/null +++ b/docs/apis/analysis.md @@ -0,0 +1,48 @@ +# 数据集分析 + +## paddlex.datasets.analysis.Seg +```python +paddlex.datasets.analysis.Seg(data_dir, file_list, label_list) +``` + +构建统计分析语义分类数据集的分析器。 + +> **参数** +> > * **data_dir** (str): 数据集所在的目录路径。 +> > * **file_list** (str): 描述数据集图片文件和类别id的文件路径(文本内每行路径为相对`data_dir`的相对路径)。 +> > * **label_list** (str): 描述数据集包含的类别信息文件路径。 + +### analysis +```python +analysis(self) +``` + +Seg分析器的分析接口,完成以下信息的分析统计: + +> * 图像数量 +> * 图像最大和最小的尺寸 +> * 图像通道数量 +> * 图像各通道的最小值和最大值 +> * 图像各通道的像素值分布 +> * 图像各通道归一化后的均值和方差 +> * 标注图中各类别的数量及比重 + +[代码示例](https://github.com/PaddlePaddle/PaddleX/blob/develop/examples/multi-channel_remote_sensing/tools/analysis.py) + +[统计信息示例](../../examples/multi-channel_remote_sensing/analysis.html#id2) + +### cal_clipped_mean_std +```python +cal_clipped_mean_std(self, clip_min_value, clip_max_value, data_info_file) +``` + +Seg分析器用于计算图像截断后的均值和方差的接口。 + +> **参数** +> > * 
**clip_min_value** (list): 截断的下限,小于min_val的数值均设为min_val。 +> > * **clip_max_value** (list): 截断的上限,大于max_val的数值均设为max_val。 +> > * **data_info_file** (str): 在analysis()接口中保存的分析结果文件(名为`train_information.pkl`)的路径。 + +[代码示例](https://github.com/PaddlePaddle/PaddleX/blob/develop/examples/multi-channel_remote_sensing/tools/cal_clipped_mean_std.py) + +[计算结果示例](../../examples/multi-channel_remote_sensing/analysis.html#id4) diff --git a/docs/apis/deploy.md b/docs/apis/deploy.md index 4edaace7e4681fd92a3f352ba8a26989b767635d..5b906239b9fb45f92e9bca1ba450c028817885ff 100644 --- a/docs/apis/deploy.md +++ b/docs/apis/deploy.md @@ -7,7 +7,7 @@ 图像分类、目标检测、实例分割、语义分割统一的预测器,实现高性能预测。 ``` -paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, use_trt=False, use_glog=False, memory_optimize=True) +paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, mkl_thread_num=4, use_trt=False, use_glog=False, memory_optimize=True) ``` **参数** @@ -16,6 +16,7 @@ paddlex.deploy.Predictor(model_dir, use_gpu=False, gpu_id=0, use_mkl=False, use_ > * **use_gpu** (bool): 是否使用GPU进行预测。 > * **gpu_id** (int): 使用的GPU序列号。 > * **use_mkl** (bool): 是否使用mkldnn加速库。 +> * **mkl_thread_num** (int): 使用mkldnn加速库时的线程数,默认为4 > * **use_trt** (boll): 是否使用TensorRT预测引擎。 > * **use_glog** (bool): 是否打印中间日志。 > * **memory_optimize** (bool): 是否优化内存使用。 @@ -44,7 +45,7 @@ predict(image, topk=1) ### batch_predict 接口 ``` -batch_predict(image_list, topk=1, thread_num=2) +batch_predict(image_list, topk=1) ``` 批量图片预测接口。 @@ -52,4 +53,3 @@ batch_predict(image_list, topk=1, thread_num=2) > > > * **image_list** (list|tuple): 对列表(或元组)中的图像同时进行预测,列表中的元素可以是图像路径或numpy数组(HWC排列,BGR格式)。 > > * **topk** (int): 图像分类时使用的参数,表示预测前topk个可能的分类。 -> > * **thread_num** (int): 并发执行各图像预处理时的线程数。 diff --git a/docs/apis/index.rst b/docs/apis/index.rst index 57a035122717982bb4ce77d1073eacf51d5e380a..5ee5a6071a2a9a212db5ba5dc1867a45649f8225 100755 --- a/docs/apis/index.rst +++ b/docs/apis/index.rst @@ -6,6 +6,7 @@ API接口说明 transforms/index.rst datasets.md + analysis.md models/index.rst slim.md visualize.md diff --git a/docs/apis/interpret.md b/docs/apis/interpret.md index 60dfb9c6c11dcecf3d2da912e2b5dd68dad1de91..36e65da28c4611d50eff98c873a288c9cf473a5e 100644 --- a/docs/apis/interpret.md +++ b/docs/apis/interpret.md @@ -23,8 +23,12 @@ LIME表示与模型无关的局部可解释性,可以解释任何模型。LIME >* **batch_size** (int): 预测数据batch大小,默认为50。 >* **save_dir** (str): 可解释性可视化结果(保存为png格式文件)和中间文件存储路径。 +### 可视化效果 + +![](./docs/gui/images/LIME.png) ### 使用示例 + > 对预测可解释性结果可视化的过程可参见[代码](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/interpret/lime.py)。 diff --git a/docs/apis/models/classification.md b/docs/apis/models/classification.md index 793a889568f8cb597fdea650310acada6512a1e9..96f76b8b5d49d800ad12439eefdb46530fc4c834 100755 --- a/docs/apis/models/classification.md +++ b/docs/apis/models/classification.md @@ -62,7 +62,7 @@ evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False) ### predict ```python -predict(self, img_file, transforms=None, topk=5) +predict(self, img_file, transforms=None, topk=1) ``` > 分类模型预测接口。需要注意的是,只有在训练过程中定义了eval_dataset,模型在保存时才会将预测时的图像处理流程保存在`ResNet50.test_transforms`和`ResNet50.eval_transforms`中。如未在训练时定义eval_dataset,那在调用预测`predict`接口时,用户需要再重新定义test_transforms传入给`predict`接口。 @@ -81,7 +81,7 @@ predict(self, img_file, transforms=None, topk=5) ### batch_predict ```python -batch_predict(self, img_file_list, transforms=None, topk=5, thread_num=2) +batch_predict(self, img_file_list, transforms=None, topk=1) ``` > 
分类模型批量预测接口。需要注意的是,只有在训练过程中定义了eval_dataset,模型在保存时才会将预测时的图像处理流程保存在`ResNet50.test_transforms`和`ResNet50.eval_transforms`中。如未在训练时定义eval_dataset,那在调用预测`batch_predict`接口时,用户需要再重新定义test_transforms传入给`batch_predict`接口。 @@ -91,7 +91,6 @@ batch_predict(self, img_file_list, transforms=None, topk=5, thread_num=2) > > - **img_file_list** (list|tuple): 对列表(或元组)中的图像同时进行预测,列表中的元素可以是图像路径或numpy数组(HWC排列,BGR格式)。 > > - **transforms** (paddlex.cls.transforms): 数据预处理操作。 > > - **topk** (int): 预测时前k个最大值。 -> > - **thread_num** (int): 并发执行各图像预处理时的线程数。 > **返回值** > diff --git a/docs/apis/models/detection.md b/docs/apis/models/detection.md index b3873ce5eba6516c4296d13d2d99510d5e3e6e45..3cc911377bc45d3831faacfb160a760bcc1cd8e2 100755 --- a/docs/apis/models/detection.md +++ b/docs/apis/models/detection.md @@ -108,7 +108,7 @@ predict(self, img_file, transforms=None) ### batch_predict ```python -batch_predict(self, img_file_list, transforms=None, thread_num=2) +batch_predict(self, img_file_list, transforms=None) ``` > PPYOLO模型批量预测接口。需要注意的是,只有在训练过程中定义了eval_dataset,模型在保存时才会将预测时的图像处理流程保存在`YOLOv3.test_transforms`和`YOLOv3.eval_transforms`中。如未在训练时定义eval_dataset,那在调用预测`batch_predict`接口时,用户需要再重新定义`test_transforms`传入给`batch_predict`接口 @@ -117,7 +117,6 @@ batch_predict(self, img_file_list, transforms=None, thread_num=2) > > > - **img_file_list** (str|np.ndarray): 对列表(或元组)中的图像同时进行预测,列表中的元素是预测图像路径或numpy数组(HWC排列,BGR格式)。 > > - **transforms** (paddlex.det.transforms): 数据预处理操作。 -> > - **thread_num** (int): 并发执行各图像预处理时的线程数。 > > **返回值** > @@ -222,7 +221,7 @@ predict(self, img_file, transforms=None) ### batch_predict ```python -batch_predict(self, img_file_list, transforms=None, thread_num=2) +batch_predict(self, img_file_list, transforms=None) ``` > YOLOv3模型批量预测接口。需要注意的是,只有在训练过程中定义了eval_dataset,模型在保存时才会将预测时的图像处理流程保存在`YOLOv3.test_transforms`和`YOLOv3.eval_transforms`中。如未在训练时定义eval_dataset,那在调用预测`batch_predict`接口时,用户需要再重新定义`test_transforms`传入给`batch_predict`接口 @@ -231,7 +230,6 @@ batch_predict(self, img_file_list, transforms=None, thread_num=2) > > > - **img_file_list** (str|np.ndarray): 对列表(或元组)中的图像同时进行预测,列表中的元素是预测图像路径或numpy数组(HWC排列,BGR格式)。 > > - **transforms** (paddlex.det.transforms): 数据预处理操作。 -> > - **thread_num** (int): 并发执行各图像预处理时的线程数。 > > **返回值** > @@ -327,7 +325,7 @@ predict(self, img_file, transforms=None) ### batch_predict ```python -batch_predict(self, img_file_list, transforms=None, thread_num=2) +batch_predict(self, img_file_list, transforms=None) ``` > FasterRCNN模型批量预测接口。需要注意的是,只有在训练过程中定义了eval_dataset,模型在保存时才会将预测时的图像处理流程保存在`FasterRCNN.test_transforms`和`FasterRCNN.eval_transforms`中。如未在训练时定义eval_dataset,那在调用预测`batch_predict`接口时,用户需要再重新定义test_transforms传入给`batch_predict`接口。 @@ -336,7 +334,6 @@ batch_predict(self, img_file_list, transforms=None, thread_num=2) > > > - **img_file_list** (list|tuple): 对列表(或元组)中的图像同时进行预测,列表中的元素是预测图像路径或numpy数组(HWC排列,BGR格式)。 > > - **transforms** (paddlex.det.transforms): 数据预处理操作。 -> > - **thread_num** (int): 并发执行各图像预处理时的线程数。 > > **返回值** > diff --git a/docs/apis/models/instance_segmentation.md b/docs/apis/models/instance_segmentation.md index 494cde32a1888897b5771e6d94d8691d6ff79ce8..3c86096cccef3d66dcad15a2a496023102750386 100755 --- a/docs/apis/models/instance_segmentation.md +++ b/docs/apis/models/instance_segmentation.md @@ -88,7 +88,7 @@ predict(self, img_file, transforms=None) #### batch_predict ```python -batch_predict(self, img_file_list, transforms=None, thread_num=2) +batch_predict(self, img_file_list, transforms=None) ``` > 
diff --git a/docs/apis/models/instance_segmentation.md b/docs/apis/models/instance_segmentation.md
index 494cde32a1888897b5771e6d94d8691d6ff79ce8..3c86096cccef3d66dcad15a2a496023102750386 100755
--- a/docs/apis/models/instance_segmentation.md
+++ b/docs/apis/models/instance_segmentation.md
@@ -88,7 +88,7 @@ predict(self, img_file, transforms=None)
 #### batch_predict
 
 ```python
-batch_predict(self, img_file_list, transforms=None, thread_num=2)
+batch_predict(self, img_file_list, transforms=None)
 ```
 
 > Batch prediction interface of the MaskRCNN model. Note that the image preprocessing pipeline used at prediction time is saved in `MaskRCNN.test_transforms` and `MaskRCNN.eval_transforms` only if eval_dataset was defined during training. If eval_dataset was not defined at training time, the user must redefine test_transforms and pass it to the `batch_predict` interface.
@@ -97,7 +97,6 @@
 > **Parameters**
 > >
 > > - **img_file_list** (list|tuple): Images in the list (or tuple) are predicted together; each element can be an image path or a numpy array (HWC layout, BGR format).
 > > - **transforms** (paddlex.det.transforms): Data preprocessing operations.
-> > - **thread_num** (int): Number of threads for concurrent image preprocessing.
 >
 > **Returns**
 >
diff --git a/docs/apis/models/semantic_segmentation.md b/docs/apis/models/semantic_segmentation.md
index a78cffb5cb95bf89f548b0bcb62850e5a0feaafe..f5c06c6c8dc71579ca92687f4d6e14ce764a4e3f 100755
--- a/docs/apis/models/semantic_segmentation.md
+++ b/docs/apis/models/semantic_segmentation.md
@@ -3,8 +3,7 @@
 ## paddlex.seg.DeepLabv3p
 
 ```python
-paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, pooling_crop_size=None)
-
+paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, pooling_crop_size=None, input_channel=3)
 ```
 
 > Builds a DeepLabv3p segmenter.
@@ -23,6 +22,7 @@
 > > - **class_weight** (list/str): Per-class weights of the cross-entropy loss. When `class_weight` is a list, its length should be `num_classes`. When `class_weight` is a str, weight.lower() should be 'dynamic', in which case the weights are recomputed each epoch from the per-class pixel proportions, each class being weighted as: class proportion * num_classes. With the default value None, every class is weighted 1, i.e. the ordinary cross-entropy loss.
 > > - **ignore_index** (int): Label value to ignore; pixels whose label equals `ignore_index` do not contribute to the loss. Defaults to 255.
 > > - **pooling_crop_size** (int): When the backbone is `MobileNetV3_large_x1_0_ssld`, this must be set to the model input size used during training, in [W, H] format. For example, if the model input size is [512, 512], `pooling_crop_size` should be set to [512, 512]. It is used when computing the image average in the encoder module: if None, the average is computed directly; if set to the model input size, the average is obtained with the `avg_pool` operator. Defaults to None.
+> > - **input_channel** (int): Number of input image channels. Defaults to 3.
 
 ### train
 
@@ -95,7 +95,7 @@ predict(self, img_file, transforms=None):
 ### batch_predict
 
 ```
-batch_predict(self, img_file_list, transforms=None, thread_num=2):
+batch_predict(self, img_file_list, transforms=None):
 ```
 
 > Batch prediction interface of the DeepLabv3p model. Note that the image preprocessing pipeline used at prediction time is saved in `DeepLabv3p.test_transforms` and `DeepLabv3p.eval_transforms` only if eval_dataset was defined during training. If eval_dataset was not defined at training time, the user must redefine test_transforms and pass it to the `batch_predict` interface.
@@ -104,13 +104,40 @@ batch_predict(self, img_file_list, transforms=None, thread_num=2):
 > **Parameters**
 > >
 > > - **img_file_list** (list|tuple): Images in the list (or tuple) are predicted together; each element can be an image path or a numpy array (HWC layout, BGR format).
 > > - **transforms** (paddlex.seg.transforms): Data preprocessing operations.
-> > - **thread_num** (int): Number of threads for concurrent image preprocessing.
 
 > **Returns**
 > >
 > > - **list**: A list of per-image prediction results. Each image's result is a dict containing the keys 'label_map' and 'score_map': 'label_map' stores the prediction as a grayscale map whose pixel values indicate the class, and 'score_map' stores the per-class probabilities with shape=(h, w, num_classes).
 
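The new `input_channel` argument is what lets the segmentation networks consume inputs with more (or fewer) than three bands. A minimal construction sketch, assuming a hypothetical 10-band remote-sensing task:

```python
import paddlex as pdx

# Build a binary segmenter for 10-band imagery (hypothetical band count);
# input_channel defaults to 3 for ordinary RGB input.
model = pdx.seg.DeepLabv3p(
    num_classes=2,
    backbone='MobileNetV2_x1.0',
    input_channel=10)
```

UNet, HRNet, and FastSCNN, documented below, accept the same keyword.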
+### overlap_tile_predict
+
+```
+overlap_tile_predict(self, img_file, tile_size=[512, 512], pad_size=[64, 64], batch_size=32, transforms=None)
+```
+
+> Sliding-window prediction interface of the DeepLabv3p model, supporting both overlapping and non-overlapping modes.
+
+> **Non-overlapping sliding-window prediction**: A fixed-size window slides over the input image, the image under each window is predicted separately, and the per-window predictions are finally stitched into the prediction for the whole input. **To use this mode, set the `pad_size` parameter to `[0, 0]`**.
+
+> **Overlapping sliding-window prediction**: In the U-Net paper the authors propose an overlapping sliding-window strategy (overlap-tile strategy) to remove the seam artifacts where windows are stitched together. When predicting each sliding window, the window is expanded outward by a margin and the expanded window is predicted, e.g. the blue region in the figure below; at stitching time only the central part of each window's prediction is kept, e.g. the yellow region in the figure below. For windows at the border of the input image, the pixels in the expanded margin are obtained by mirroring the border pixels.
+
+![](../../../examples/remote_sensing/images/overlap_tile.png)
+
+> Note that the image preprocessing pipeline used at prediction time is saved in `DeepLabv3p.test_transforms` and `DeepLabv3p.eval_transforms` only if eval_dataset was defined during training. If eval_dataset was not defined at training time, the user must redefine test_transforms and pass it to the `overlap_tile_predict` interface.
+
+> **Parameters**
+> >
+> > - **img_file** (str|np.ndarray): Path of the image to predict, or a numpy array (HWC layout, BGR format).
+> > - **tile_size** (list|tuple): Size of the sliding window, i.e. the region whose prediction is kept for stitching, in (W, H) format. Defaults to [512, 512].
+> > - **pad_size** (list|tuple): Margin by which each sliding window is expanded on all sides; predictions in the expanded region are not kept for stitching, in (W, H) format. Defaults to [64, 64].
+> > - **batch_size** (int): Batch size used when predicting the windows in batches. Defaults to 32.
+> > - **transforms** (paddlex.seg.transforms): Data preprocessing operations.
+
+> **Returns**
+> >
+> > - **dict**: Contains the keys 'label_map' and 'score_map': 'label_map' stores the prediction as a grayscale map whose pixel values indicate the class, and 'score_map' stores the per-class probabilities with shape=(h, w, num_classes).
+
 
 ### tile_predict
 
@@ -155,7 +182,7 @@ batch_predict(self, img_file_list, transforms=None, thread_num=2):
 ## paddlex.seg.UNet
 
 ```python
-paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
+paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, input_channel=3)
 ```
 
 > Builds a UNet segmenter.
@@ -168,18 +195,18 @@ paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, us
 > > - **use_dice_loss** (bool): Whether to use dice loss as the network's loss function; only applicable to two-class segmentation, and may be combined with bce loss. When both use_bce_loss and use_dice_loss are False, the cross-entropy loss is used. Defaults to False.
 > > - **class_weight** (list/str): Per-class weights of the cross-entropy loss. When `class_weight` is a list, its length should be `num_classes`. When `class_weight` is a str, weight.lower() should be 'dynamic', in which case the weights are recomputed each epoch from the per-class pixel proportions, each class being weighted as: class proportion * num_classes. With the default value None, every class is weighted 1, i.e. the ordinary cross-entropy loss.
 > > - **ignore_index** (int): Label value to ignore; pixels whose label equals `ignore_index` do not contribute to the loss. Defaults to 255.
+> > - **input_channel** (int): Number of input image channels. Defaults to 3.
 > - train: same as the [DeepLabv3p train interface](#train)
 > - evaluate: same as the [DeepLabv3p evaluate interface](#evaluate)
 > - predict: same as the [DeepLabv3p predict interface](#predict)
 > - batch_predict: same as the [DeepLabv3p batch_predict interface](#batch-predict)
-> - tile_predict: non-overlapping tiled prediction, same as the [DeepLabv3p tile_predict interface](#tile-predict)
-> - overlap_tile_predict: overlapping tiled prediction, same as the [DeepLabv3p overlap_tile_predict interface](#overlap-tile-predict)
+> - overlap_tile_predict: sliding-window prediction, same as the [DeepLabv3p overlap_tile_predict interface](#overlap-tile-predict)
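As a usage sketch of the overlap-tile interface added above (the model directory and image name are hypothetical):

```python
import paddlex as pdx

# Load a trained segmentation model (hypothetical save directory).
model = pdx.load_model('output/deeplabv3p/best_model')

# Overlapping sliding-window prediction over a large image: each 512x512
# tile is expanded by 64 pixels on every side before prediction, and only
# the central tile_size region is kept when stitching.
result = model.overlap_tile_predict(
    img_file='large_scene.png',
    tile_size=[512, 512],
    pad_size=[64, 64],
    batch_size=32)

label_map = result['label_map']    # (h, w) class indices
score_map = result['score_map']    # (h, w, num_classes) probabilities

# Setting pad_size=[0, 0] instead gives plain non-overlapping tiling.
```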
## paddlex.seg.HRNet
 
 ```python
-paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
+paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, input_channel=3)
 ```
 
 > Builds an HRNet segmenter.
@@ -192,18 +219,18 @@ paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=Fal
 > > - **use_dice_loss** (bool): Whether to use dice loss as the network's loss function; only applicable to two-class segmentation, and may be combined with bce loss. When both use_bce_loss and use_dice_loss are False, the cross-entropy loss is used. Defaults to False.
 > > - **class_weight** (list|str): Per-class weights of the cross-entropy loss. When `class_weight` is a list, its length should be `num_classes`. When `class_weight` is a str, weight.lower() should be 'dynamic', in which case the weights are recomputed each epoch from the per-class pixel proportions, each class being weighted as: class proportion * num_classes. With the default value None, every class is weighted 1, i.e. the ordinary cross-entropy loss.
 > > - **ignore_index** (int): Label value to ignore; pixels whose label equals `ignore_index` do not contribute to the loss. Defaults to 255.
+> > - **input_channel** (int): Number of input image channels. Defaults to 3.
 > - train: same as the [DeepLabv3p train interface](#train)
 > - evaluate: same as the [DeepLabv3p evaluate interface](#evaluate)
 > - predict: same as the [DeepLabv3p predict interface](#predict)
 > - batch_predict: same as the [DeepLabv3p batch_predict interface](#batch-predict)
-> - tile_predict: non-overlapping tiled prediction, same as the [DeepLabv3p tile_predict interface](#tile-predict)
-> - overlap_tile_predict: overlapping tiled prediction, same as the [DeepLabv3p overlap_tile_predict interface](#overlap-tile-predict)
+> - overlap_tile_predict: sliding-window prediction, same as the [DeepLabv3p overlap_tile_predict interface](#overlap-tile-predict)
 
 ## paddlex.seg.FastSCNN
 
 ```python
-paddlex.seg.FastSCNN(num_classes=2, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, multi_loss_weight=[1.0])
+paddlex.seg.FastSCNN(num_classes=2, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, multi_loss_weight=[1.0], input_channel=3)
 ```
 
 > Builds a FastSCNN segmenter.
@@ -216,10 +243,10 @@ paddlex.seg.FastSCNN(num_classes=2, use_bce_loss=False, use_dice_loss=False, cla
 > > - **class_weight** (list/str): Per-class weights of the cross-entropy loss. When `class_weight` is a list, its length should be `num_classes`. When `class_weight` is a str, weight.lower() should be 'dynamic', in which case the weights are recomputed each epoch from the per-class pixel proportions, each class being weighted as: class proportion * num_classes. With the default value None, every class is weighted 1, i.e. the ordinary cross-entropy loss.
 > > - **ignore_index** (int): Label value to ignore; pixels whose label equals `ignore_index` do not contribute to the loss. Defaults to 255.
 > > - **multi_loss_weight** (list): Loss weights for the branches. By default the loss is computed on a single branch, i.e. the default value is [1.0]. Computing the loss on two or three branches is also supported, with weights ordered as [fusion_branch_weight, higher_branch_weight, lower_branch_weight]: fusion_branch_weight weights the loss on the branch obtained by fusing the spatial-detail and global-context branches, higher_branch_weight weights the loss on the spatial-detail branch, and lower_branch_weight weights the loss on the global-context branch. If higher_branch_weight and lower_branch_weight are not set, the losses on those two branches are not computed.
+> > - **input_channel** (int): Number of input image channels. Defaults to 3.
 > - train: same as the [DeepLabv3p train interface](#train)
 > - evaluate: same as the [DeepLabv3p evaluate interface](#evaluate)
 > - predict: same as the [DeepLabv3p predict interface](#predict)
 > - batch_predict: same as the [DeepLabv3p batch_predict interface](#batch-predict)
-> - tile_predict: non-overlapping tiled prediction, same as the [DeepLabv3p tile_predict interface](#tile-predict)
-> - overlap_tile_predict: overlapping tiled prediction, same as the [DeepLabv3p overlap_tile_predict interface](#overlap-tile-predict)
+> - overlap_tile_predict: sliding-window prediction, same as the [DeepLabv3p overlap_tile_predict interface](#overlap-tile-predict)
diff --git a/docs/apis/transforms/seg_transforms.md b/docs/apis/transforms/seg_transforms.md
index f353a8f4436e2793cb4cc7a4c9a086ad4883a87f..4195867a4ad6f40a8d31de7cb518e1d1c0a1094b 100755
--- a/docs/apis/transforms/seg_transforms.md
+++ b/docs/apis/transforms/seg_transforms.md
@@ -78,16 +78,19 @@ paddlex.seg.transforms.ResizeStepScaling(min_scale_factor=0.75, max_scale_factor
 ## Normalize
 ```python
-paddlex.seg.transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
+paddlex.seg.transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0])
 ```
 Standardizes an image:
-1. Normalize the pixel values to the range [0.0, 1.0].
-2. Subtract the mean, then divide by the standard deviation.
+1. Subtract min_val from the pixel values.
+2. Divide the pixel values by (max_val - min_val), normalizing them to the range [0.0, 1.0].
+3. Subtract the mean, then divide by the standard deviation.
+
 ### Parameters
 * **mean** (list): Mean of the image dataset. Defaults to [0.5, 0.5, 0.5].
 * **std** (list): Standard deviation of the image dataset. Defaults to [0.5, 0.5, 0.5].
-
+* **min_val** (list): Minimum values of the image dataset. Defaults to [0, 0, 0].
+* **max_val** (list): Maximum values of the image dataset. Defaults to [255.0, 255.0, 255.0].
 ## Padding
 
 ```python
@@ -167,6 +170,16 @@ paddlex.seg.transforms.RandomDistort(brightness_range=0.5, brightness_prob=0.5,
 * **hue_range** (int): Range of the hue factor. Defaults to 18.
 * **hue_prob** (float): Probability of randomly adjusting the hue. Defaults to 0.5.
 
+## Clip
+```python
+paddlex.seg.transforms.Clip(min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0])
+```
+Clips pixel values that fall outside a given range.
+
+### Parameters
+* **min_val** (list): Lower clipping bound; values below min_val are set to min_val. Defaults to [0, 0, 0].
+* **max_val** (list): Upper clipping bound; values above max_val are set to max_val. Defaults to [255.0, 255.0, 255.0].
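Clip is typically paired with the extended Normalize when raw pixel values span a wider range than 8-bit RGB. A minimal sketch under that assumption (the [0, 1000] range below is illustrative, e.g. for imagery whose useful value range was found with the analysis tool):

```python
from paddlex.seg import transforms

train_transforms = transforms.Compose([
    # Truncate outliers into the assumed [0, 1000] range per channel.
    transforms.Clip(min_val=[0, 0, 0], max_val=[1000.0, 1000.0, 1000.0]),
    # Rescale the same range to [0, 1], then standardize.
    transforms.Normalize(
        mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5],
        min_val=[0, 0, 0], max_val=[1000.0, 1000.0, 1000.0]),
])
```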