diff --git a/PPOCRLabel/README.md b/PPOCRLabel/README.md index 250d1fc0c989e6db50deffc06d1c9db47ba4c2f0..e67f69e9b3ae0279b65901afae4ae937dfc84610 100644 --- a/PPOCRLabel/README.md +++ b/PPOCRLabel/README.md @@ -8,7 +8,7 @@ PPOCRLabelv2 is a semi-automatic graphic annotation tool suitable for OCR field, | :-------------------------------------------------: | :--------------------------------------------: | | | | | **irregular text annotation** | **key information annotation** | -| | | +| | | ### Recent Update diff --git a/PPOCRLabel/README_ch.md b/PPOCRLabel/README_ch.md index 1901e8f3c184497414723fef9ad69829b79cb218..b3c283b4069a925ba92ed4217c962051d8008c27 100644 --- a/PPOCRLabel/README_ch.md +++ b/PPOCRLabel/README_ch.md @@ -8,7 +8,7 @@ PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,内置P | :---------------------------------------------------: | :----------------------------------------------: | | | | | **不规则文本标注** | **关键信息标注** | -| | | +| | | #### 近期更新 diff --git a/README.md b/README.md index 87be91f1ea046b05b601385500369da1e5027ebc..b66b8c1cc0d5cf8d6d9bcdf5efdd8336daafb0d1 100644 --- a/README.md +++ b/README.md @@ -47,7 +47,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel ![](./doc/features_en.png) -> It is recommended to start with the “quick experience” in the document tutorial +> It is recommended to start with the “quick start” in the document tutorial ## Quick Experience @@ -63,10 +63,11 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel -## Community +## Community👬 -- **Join us**👬: Scan the QR code below with your Wechat, you can join the official technical discussion group. Looking forward to your participation. +- For international developers, we regard [PaddleOCR Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions) as our international community platform. All ideas and questions can be discussed here in English. +- For Chinese developers, scan the QR code below with your WeChat to join the official technical discussion group. For richer community content, please refer to [中文README](README_ch.md); we look forward to your participation.
diff --git a/README_ch.md b/README_ch.md index a705e46d52d70a4ba07c4952d0392d466fa8b380..6ee37e51ca6e4b52dce12db9d50b652aa6d2cd26 100755 --- a/README_ch.md +++ b/README_ch.md @@ -29,15 +29,9 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - **🔥2022.5.25~26 OCR产业应用两日直播课** - - 25日:车牌识别产业应用实战 - - 26日:一招搞定工业常见数码管、PCB字符识别 + - 25日:车牌识别产业应用实战([AI Studio项目链接](https://aistudio.baidu.com/aistudio/projectdetail/3919091?contributionType=1)) + - 26日:一招搞定工业常见数码管、PCB字符识别(AI Studio项目链接:[数码管识别](https://aistudio.baidu.com/aistudio/projectdetail/4049044?contributionType=1),[PCB字符识别](https://aistudio.baidu.com/aistudio/projectdetail/4008973)) - 扫描下方二维码填写问卷后进入群聊,获取直播链接! -
- -
- - - **🔥2022.5.9 发布PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)** - 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上; - 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能; @@ -75,11 +69,10 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 开源社区 - +- **项目合作📑:** 如果您是企业开发者且有明确的OCR垂类应用需求,填写[问卷](https://paddle.wjx.cn/vj/QwF7GKw.aspx)后可免费与官方团队展开不同层次的合作。 - **加入社区👬:** 微信扫描二维码并填写问卷之后,加入交流群领取福利 - **获取PaddleOCR最新发版解说《OCR超强技术详解与产业应用实战》系列直播课回放链接** - **10G重磅OCR学习大礼包:**《动手学OCR》电子书,配套讲解视频和notebook项目;66篇OCR相关顶会前沿论文打包放送,包括CVPR、AAAI、IJCAI、ICCV等;PaddleOCR历次发版直播课视频;OCR社区优秀开发者项目分享视频。 - - **社区贡献**🏅️:[社区贡献](./doc/doc_ch/thirdparty.md)文档中包含了社区用户**使用PaddleOCR开发的各种工具、应用**以及**为PaddleOCR贡献的功能、优化的文档与代码**等,是官方为社区开发者打造的荣誉墙,也是帮助优质项目宣传的广播站。 - **社区常规赛**🎁:社区常规赛是面向OCR开发者的积分赛事,覆盖文档、代码、模型和应用四大类型,以季度为单位评选并发放奖励,赛题详情与报名方法可参考[链接](https://github.com/PaddlePaddle/PaddleOCR/issues/4982)。 diff --git a/configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_dml.yml b/configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_dml.yml index f3ad966488dbc2f6f7ca12033bc4a3d35e1b3bd7..8b160f63538d51dc57b08ba83f7ebf019e3c9dbb 100644 --- a/configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_dml.yml +++ b/configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_dml.yml @@ -65,7 +65,7 @@ Loss: - ["Student", "Teacher"] maps_name: "thrink_maps" weight: 1.0 - act: "softmax" + # act: None model_name_pairs: ["Student", "Teacher"] key: maps - DistillationDBLoss: diff --git a/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml b/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml index 3f30ada13f2f1c3c01ba8886bbfba006da516f17..c85fc4b781c2c1aeadf92e0f02685386116a7c3e 100644 --- a/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml +++ b/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml @@ -60,7 +60,7 @@ Loss: - ["Student", "Student2"] maps_name: "thrink_maps" weight: 1.0 - act: "softmax" + # act: None model_name_pairs: ["Student", "Student2"] key: maps - DistillationDBLoss: diff --git a/deploy/android_demo/app/src/main/cpp/native.cpp b/deploy/android_demo/app/src/main/cpp/native.cpp index ced932556f09244d1e9e962e7b75461203a7cc3a..4961e5ecf141bb50701ecf9c3654a54f062937ce 100644 --- a/deploy/android_demo/app/src/main/cpp/native.cpp +++ b/deploy/android_demo/app/src/main/cpp/native.cpp @@ -47,7 +47,7 @@ str_to_cpu_mode(const std::string &cpu_mode) { std::string upper_key; std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(), ::toupper); - auto index = cpu_mode_map.find(upper_key); + auto index = cpu_mode_map.find(upper_key.c_str()); if (index == cpu_mode_map.end()) { LOGE("cpu_mode not found %s", upper_key.c_str()); return paddle::lite_api::LITE_POWER_HIGH; @@ -116,4 +116,4 @@ Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release( ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *)java_pointer; delete ppredictor; -} \ No newline at end of file +} diff --git a/deploy/cpp_infer/CMakeLists.txt b/deploy/cpp_infer/CMakeLists.txt index 6d3ecb6ac2e9e6993814f077ca772d0d94f5d008..7deacfadc585e8905870e7fd218c6b115dcd8256 100644 --- a/deploy/cpp_infer/CMakeLists.txt +++ b/deploy/cpp_infer/CMakeLists.txt @@ -92,6 +92,8 @@ include_directories("${PADDLE_LIB}/third_party/install/glog/include") include_directories("${PADDLE_LIB}/third_party/install/gflags/include") include_directories("${PADDLE_LIB}/third_party/install/xxhash/include") include_directories("${PADDLE_LIB}/third_party/install/zlib/include") +include_directories("${PADDLE_LIB}/third_party/install/onnxruntime/include") 
+include_directories("${PADDLE_LIB}/third_party/install/paddle2onnx/include") include_directories("${PADDLE_LIB}/third_party/boost") include_directories("${PADDLE_LIB}/third_party/eigen3") @@ -110,6 +112,8 @@ link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib") link_directories("${PADDLE_LIB}/third_party/install/glog/lib") link_directories("${PADDLE_LIB}/third_party/install/gflags/lib") link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib") +link_directories("${PADDLE_LIB}/third_party/install/onnxruntime/lib") +link_directories("${PADDLE_LIB}/third_party/install/paddle2onnx/lib") link_directories("${PADDLE_LIB}/paddle/lib") diff --git a/deploy/cpp_infer/docs/windows_vs2019_build.md b/deploy/cpp_infer/docs/windows_vs2019_build.md index 4f391d925008b4bffcbd123e937eb608f502c646..bcaefa46f83a30a4c232add78dc2e9f521b9f84f 100644 --- a/deploy/cpp_infer/docs/windows_vs2019_build.md +++ b/deploy/cpp_infer/docs/windows_vs2019_build.md @@ -109,8 +109,10 @@ CUDA_LIB、CUDNN_LIB、TENSORRT_DIR、WITH_GPU、WITH_TENSORRT 运行之前,将下面文件拷贝到`build/Release/`文件夹下 1. `paddle_inference/paddle/lib/paddle_inference.dll` -2. `opencv/build/x64/vc15/bin/opencv_world455.dll` -3. 如果使用openblas版本的预测库还需要拷贝 `paddle_inference/third_party/install/openblas/lib/openblas.dll` +2. `paddle_inference/third_party/install/onnxruntime/lib/onnxruntime.dll` +3. `paddle_inference/third_party/install/paddle2onnx/lib/paddle2onnx.dll` +4. `opencv/build/x64/vc15/bin/opencv_world455.dll` +5. 如果使用openblas版本的预测库还需要拷贝 `paddle_inference/third_party/install/openblas/lib/openblas.dll` ### Step4: 预测 diff --git a/deploy/cpp_infer/readme.md b/deploy/cpp_infer/readme.md index ddd15d49558454a5ffb0731665b118c929e607f0..a87db7e6596bc2528bfb4a93c3170ebf0482ccad 100644 --- a/deploy/cpp_infer/readme.md +++ b/deploy/cpp_infer/readme.md @@ -208,7 +208,7 @@ Execute the built executable file: ./build/ppocr [--param1] [--param2] [...] ``` -**Note**:ppocr uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3, 48, 320`, so if you use the recognition function, you need to add the parameter `--rec_img_h=48`, if you do not use the default `PP-OCRv3` model, you do not need to set this parameter. +**Note**:ppocr uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3, 48, 320`, if you want to use the old version model, you should add the parameter `--rec_img_h=32`. Specifically, @@ -222,7 +222,6 @@ Specifically, --det=true \ --rec=true \ --cls=true \ - --rec_img_h=48\ ``` ##### 2. det+rec: @@ -234,7 +233,6 @@ Specifically, --det=true \ --rec=true \ --cls=false \ - --rec_img_h=48\ ``` ##### 3. det @@ -254,7 +252,6 @@ Specifically, --det=false \ --rec=true \ --cls=true \ - --rec_img_h=48\ ``` ##### 5. rec @@ -265,7 +262,6 @@ Specifically, --det=false \ --rec=true \ --cls=false \ - --rec_img_h=48\ ``` ##### 6. cls @@ -330,7 +326,7 @@ More parameters are as follows, |rec_model_dir|string|-|Address of recognition inference model| |rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file| |rec_batch_num|int|6|batch size of recognition| -|rec_img_h|int|32|image height of recognition| +|rec_img_h|int|48|image height of recognition| |rec_img_w|int|320|image width of recognition| * Multi-language inference is also supported in PaddleOCR, you can refer to [recognition tutorial](../../doc/doc_en/recognition_en.md) for more supported languages and models in PaddleOCR. 
Specifically, if you want to infer using multi-language models, you just need to modify values of `rec_char_dict_path` and `rec_model_dir`. diff --git a/deploy/cpp_infer/readme_ch.md b/deploy/cpp_infer/readme_ch.md index e5a4869eca1d35765013e63011c680e59b33ac00..8c334851c0d44acd393c6daa79edf25dc9e6fa24 100644 --- a/deploy/cpp_infer/readme_ch.md +++ b/deploy/cpp_infer/readme_ch.md @@ -213,7 +213,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir 本demo支持系统串联调用,也支持单个功能的调用,如,只使用检测或识别功能。 -**注意** ppocr默认使用`PP-OCRv3`模型,识别模型使用的输入shape为`3,48,320`, 因此如果使用识别功能,需要添加参数`--rec_img_h=48`,如果不使用默认的`PP-OCRv3`模型,则无需设置该参数。 +**注意** ppocr默认使用`PP-OCRv3`模型,识别模型使用的输入shape为`3,48,320`, 如需使用旧版本的PP-OCR模型,则需要设置参数`--rec_img_h=32`。 运行方式: @@ -232,7 +232,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir --det=true \ --rec=true \ --cls=true \ - --rec_img_h=48\ ``` ##### 2. 检测+识别: @@ -244,7 +243,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir --det=true \ --rec=true \ --cls=false \ - --rec_img_h=48\ ``` ##### 3. 检测: @@ -264,7 +262,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir --det=false \ --rec=true \ --cls=true \ - --rec_img_h=48\ ``` ##### 5. 识别: @@ -275,7 +272,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir --det=false \ --rec=true \ --cls=false \ - --rec_img_h=48\ ``` ##### 6. 分类: @@ -339,7 +335,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir |rec_model_dir|string|-|识别模型inference model地址| |rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|字典文件| |rec_batch_num|int|6|识别模型batchsize| -|rec_img_h|int|32|识别模型输入图像高度| +|rec_img_h|int|48|识别模型输入图像高度| |rec_img_w|int|320|识别模型输入图像宽度| diff --git a/deploy/cpp_infer/src/args.cpp b/deploy/cpp_infer/src/args.cpp index fe58236734568035dfb26570df39f21154f4e9ef..93d0f5ea5fd07bdc3eb44537bc1c0d4e131736d3 100644 --- a/deploy/cpp_infer/src/args.cpp +++ b/deploy/cpp_infer/src/args.cpp @@ -47,7 +47,7 @@ DEFINE_string(rec_model_dir, "", "Path of rec inference model."); DEFINE_int32(rec_batch_num, 6, "rec_batch_num."); DEFINE_string(rec_char_dict_path, "../../ppocr/utils/ppocr_keys_v1.txt", "Path of dictionary."); -DEFINE_int32(rec_img_h, 32, "rec image height"); +DEFINE_int32(rec_img_h, 48, "rec image height"); DEFINE_int32(rec_img_w, 320, "rec image width"); // ocr forward related diff --git a/deploy/cpp_infer/src/ocr_rec.cpp b/deploy/cpp_infer/src/ocr_rec.cpp index f69f37b8f51ecec5925d556f2b3e169bb0e80715..31a1a884a1aa25134d19e80f9ddac9bc35637fba 100644 --- a/deploy/cpp_infer/src/ocr_rec.cpp +++ b/deploy/cpp_infer/src/ocr_rec.cpp @@ -132,7 +132,9 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) { paddle_infer::Config config; config.SetModel(model_dir + "/inference.pdmodel", model_dir + "/inference.pdiparams"); - + std::cout << "In PP-OCRv3, default rec_img_h is 48, " + << "if you use another model, you should set the param rec_img_h=32" + << std::endl; if (this->use_gpu_) { config.EnableUseGpu(this->gpu_mem_, this->gpu_id_); if (this->use_tensorrt_) { diff --git a/deploy/lite/config.txt b/deploy/lite/config.txt index 4c68105d39031830a8222b3d88163aebc8cac257..dda0d2b0320544d3a82f59b0672c086c64d83d3d 100644 --- a/deploy/lite/config.txt +++ b/deploy/lite/config.txt @@ -4,4 +4,5 @@ det_db_box_thresh 0.5 det_db_unclip_ratio 1.6 det_db_use_dilate 0 det_use_polygon_score 1 -use_direction_classify 1 \ No newline at end of file +use_direction_classify 1 +rec_image_height 32 \ No newline at end of file diff --git a/deploy/lite/crnn_process.cc b/deploy/lite/crnn_process.cc index 7528f36fe6316c84724891a4421c047fbdd33fa2..6d5fc1504e7b1b3faa35a80662442f60d2e30499 100644 --- a/deploy/lite/crnn_process.cc +++ b/deploy/lite/crnn_process.cc @@ 
-19,25 +19,27 @@ const std::vector<int> rec_image_shape{3, 32, 320}; -cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio) { +cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio, int rec_image_height) { int imgC, imgH, imgW; imgC = rec_image_shape[0]; + imgH = rec_image_height; imgW = rec_image_shape[2]; - imgH = rec_image_shape[1]; - imgW = int(32 * wh_ratio); + imgW = int(imgH * wh_ratio); - float ratio = static_cast<float>(img.cols) / static_cast<float>(img.rows); + float ratio = float(img.cols) / float(img.rows); int resize_w, resize_h; + if (ceilf(imgH * ratio) > imgW) resize_w = imgW; else - resize_w = static_cast<int>(ceilf(imgH * ratio)); + resize_w = int(ceilf(imgH * ratio)); cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f, cv::INTER_LINEAR); - - return resize_img; + cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, + int(imgW - resize_img.cols), cv::BORDER_CONSTANT, + {127, 127, 127}); + return resize_img; } std::vector<std::string> ReadDict(std::string path) { diff --git a/deploy/lite/crnn_process.h b/deploy/lite/crnn_process.h index 29e67906976198210394c4960786105bf884dce8..ed7a3167069538a0c40d1bc01f0073c36cb7e461 100644 --- a/deploy/lite/crnn_process.h +++ b/deploy/lite/crnn_process.h @@ -26,7 +26,7 @@ #include "opencv2/imgcodecs.hpp" #include "opencv2/imgproc.hpp" -cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio); +cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio, int rec_image_height); std::vector<std::string> ReadDict(std::string path); diff --git a/deploy/lite/ocr_db_crnn.cc b/deploy/lite/ocr_db_crnn.cc index 1ffbbacb74545b0bbea4957e25b6235225bad02b..cb2bf7791a4307d4e8d2167197d41d903410e0b4 100644 --- a/deploy/lite/ocr_db_crnn.cc +++ b/deploy/lite/ocr_db_crnn.cc @@ -162,7 +162,8 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img, std::vector<std::string> charactor_dict, std::shared_ptr<PaddlePredictor> predictor_cls, int use_direction_classify, - std::vector<double> *times) { + std::vector<double> *times, + int rec_image_height) { std::vector<float> mean = {0.5f, 0.5f, 0.5f}; std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; @@ -183,7 +184,7 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img, float wh_ratio = static_cast<float>(crop_img.cols) / static_cast<float>(crop_img.rows); - resize_img = CrnnResizeImg(crop_img, wh_ratio); + resize_img = CrnnResizeImg(crop_img, wh_ratio, rec_image_height); resize_img.convertTo(resize_img, CV_32FC3, 1 / 255.f); const float *dimg = reinterpret_cast<const float *>(resize_img.data); @@ -444,7 +445,7 @@ void system(char **argv){ //// load config from txt file auto Config = LoadConfigTxt(det_config_path); int use_direction_classify = int(Config["use_direction_classify"]); - + int rec_image_height = int(Config["rec_image_height"]); auto charactor_dict = ReadDict(dict_path); charactor_dict.insert(charactor_dict.begin(), "#"); // blank char for ctc charactor_dict.push_back(" "); @@ -590,12 +591,16 @@ void rec(int argc, char **argv) { std::string batchsize = argv[6]; std::string img_dir = argv[7]; std::string dict_path = argv[8]; + std::string config_path = argv[9]; if (strcmp(argv[4], "FP32") != 0 && strcmp(argv[4], "INT8") != 0) { std::cerr << "Only support FP32 or INT8."
<< std::endl; exit(1); } + auto Config = LoadConfigTxt(config_path); + int rec_image_height = int(Config["rec_image_height"]); + std::vector<cv::String> cv_all_img_names; cv::glob(img_dir, cv_all_img_names); @@ -630,7 +635,7 @@ void rec(int argc, char **argv) { std::vector<float> rec_text_score; std::vector<double> times; RunRecModel(boxes, srcimg, rec_predictor, rec_text, rec_text_score, - charactor_dict, cls_predictor, 0, &times); + charactor_dict, cls_predictor, 0, &times, rec_image_height); //// print recognized text for (int i = 0; i < rec_text.size(); i++) { diff --git a/deploy/lite/readme.md b/deploy/lite/readme.md index 9926e2dd8c973b25b5397fd5825f790528ede279..a1bef8120e52dd91db0fda4ac2a4d91cc2800818 100644 --- a/deploy/lite/readme.md +++ b/deploy/lite/readme.md @@ -34,7 +34,7 @@ For the compilation process of different development environments, please refer ### 1.2 Prepare Paddle-Lite library There are two ways to obtain the Paddle-Lite library: -- 1. Download directly, the download link of the Paddle-Lite library is as follows: +- 1. [Recommended] Download directly, the download link of the Paddle-Lite library is as follows: | Platform | Paddle-Lite library download link | |---|---| @@ -43,7 +43,9 @@ There are two ways to obtain the Paddle-Lite library: Note: 1. The above Paddle-Lite library is compiled from the Paddle-Lite 2.10 branch. For more information about Paddle-Lite 2.10, please refer to [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10). -- 2. [Recommended] Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows: + **Note: The paddlelite>=2.10 version of the prediction library is recommended; other prediction library versions are available at the [download link](https://github.com/PaddlePaddle/Paddle-Lite/tags)** + +- 2. Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows: ``` git clone https://github.com/PaddlePaddle/Paddle-Lite.git cd Paddle-Lite @@ -104,21 +106,17 @@ If you directly use the model in the above table for deployment, you can skip th If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model. -The `opt` tool can be obtained by compiling Paddle Lite. +- Step 1: Refer to [document](https://www.paddlepaddle.org.cn/lite/v2.10/user_guides/opt/opt_python.html) to install paddlelite, which is used to convert the paddle inference model into the nb model required by paddlelite at runtime ``` -git clone https://github.com/PaddlePaddle/Paddle-Lite.git -cd Paddle-Lite -git checkout release/v2.10 -./lite/tools/build.sh build_optimize_tool +pip install paddlelite==2.10 # The paddlelite version should be the same as the prediction library version ``` - -After the compilation is complete, the opt file is located under build.opt/lite/api/, You can view the operating options and usage of opt in the following ways: - +After installation, the following command shows the help information ``` -cd build.opt/lite/api/ -./opt +paddle_lite_opt ``` +Introduction to paddle_lite_opt parameters: + |Options|Description| |---|---| |--model_dir|The path of the PaddlePaddle model to be optimized (non-combined form)| @@ -131,6 +129,8 @@ cd build.opt/lite/api/ `--model_dir` is suitable for the non-combined mode of the model to be optimized, and the inference model of PaddleOCR is the combined mode, that is, the model structure and model parameters are stored in a single file. +- Step 2: Use paddle_lite_opt to convert the inference model to the mobile model format. 
+ The following takes the ultra-lightweight Chinese model of PaddleOCR as an example to introduce the use of the compiled opt file to complete the conversion of the inference model to the Paddle-Lite optimized model ``` @@ -240,6 +240,7 @@ det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, det_db_box_thresh 0.5 # DDB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate det_db_unclip_ratio 1.6 # Indicates the compactness of the text box, the smaller the value, the closer the text box to the text use_direction_classify 0 # Whether to use the direction classifier, 0 means not to use, 1 means to use +rec_image_height 32 # The height of the input image of the recognition model, the PP-OCRv3 model needs to be set to 48, and the PP-OCRv2 model needs to be set to 32 ``` 5. Run Model on phone @@ -258,8 +259,15 @@ After the above steps are completed, you can use adb to push the file to the pho cd /data/local/tmp/debug export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH # The use of ocr_db_crnn is: - # ./ocr_db_crnn Detection model file Orientation classifier model file Recognition model file Test image path Dictionary file path - ./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_opt.nb ./11.jpg ppocr_keys_v1.txt + # ./ocr_db_crnn Mode Detection model file Orientation classifier model file Recognition model file Hardware Precision Threads Batchsize Test image path Dictionary file path + ./ocr_db_crnn system ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt ppocr_keys_v1.txt True +# precision can be INT8 for a quantized model or FP32 for a normal model. + +# Only using detection model +./ocr_db_crnn det ch_PP-OCRv2_det_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt + +# Only using recognition model +./ocr_db_crnn rec ch_PP-OCRv2_rec_slim_opt.nb arm8 INT8 10 1 word_1.jpg ppocr_keys_v1.txt config.txt ``` If you modify the code, you need to recompile and push to the phone. @@ -283,3 +291,7 @@ A2: Replace the .jpg test image under ./debug with the image you want to test, a Q3: How to package it into the mobile APP? A3: This demo aims to provide the core algorithm part that can run OCR on mobile phones. Further, PaddleOCR/deploy/android_demo is an example of encapsulating this demo into a mobile app for reference. + +Q4: When running the demo, an error is reported `Error: This model is not supported, because kernel for 'io_copy' is not supported by Paddle-Lite.` + +A4: The problem is that the installed paddlelite version does not match the downloaded prediction library version. Make sure that the paddle_lite_opt tool matches your prediction library version, and try converting the nb model again. diff --git a/deploy/lite/readme_ch.md b/deploy/lite/readme_ch.md index 99a543d0d60455443dd872c56a5832c8ca0ff4e9..0793827fe647c470944fc36e2b243c8f7e704e99 100644 --- a/deploy/lite/readme_ch.md +++ b/deploy/lite/readme_ch.md @@ -8,7 +8,7 @@ - [2.1 模型优化](#21-模型优化) - [2.2 与手机联调](#22-与手机联调) - [FAQ](#faq) - + 本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleOCR超轻量中文检测、识别模型的详细步骤。 @@ -32,7 +32,7 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理 ### 1.2 准备预测库 预测库有两种获取方式: -- 1. 直接下载,预测库下载链接如下: +- 1. [推荐]直接下载,预测库下载链接如下: | 平台 | 预测库下载链接 | |---|---| @@ -41,7 +41,9 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理 注:1. 
上述预测库为PaddleLite 2.10分支编译得到,有关PaddleLite 2.10 详细信息可参考 [链接](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10) 。 -- 2. [推荐]编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下: +**注:建议使用paddlelite>=2.10版本的预测库,其他预测库版本[下载链接](https://github.com/PaddlePaddle/Paddle-Lite/tags)** + +- 2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下: ``` git clone https://github.com/PaddlePaddle/Paddle-Lite.git cd Paddle-Lite @@ -102,22 +104,16 @@ Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括 如果要部署的模型不在上述表格中,则需要按照如下步骤获得优化后的模型。 -模型优化需要Paddle-Lite的opt可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下: +- 步骤1:参考[文档](https://www.paddlepaddle.org.cn/lite/v2.10/user_guides/opt/opt_python.html)安装paddlelite,用于转换paddle inference model为paddlelite运行所需的nb模型 ``` -# 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite -git clone https://github.com/PaddlePaddle/Paddle-Lite.git -cd Paddle-Lite -git checkout release/v2.10 -# 启动编译 -./lite/tools/build.sh build_optimize_tool +pip install paddlelite==2.10 # paddlelite版本要与预测库版本一致 ``` - -编译完成后,opt文件位于`build.opt/lite/api/`下,可通过如下方式查看opt的运行选项和使用方式; +安装完后,如下指令可以查看帮助信息 ``` -cd build.opt/lite/api/ -./opt +paddle_lite_opt ``` +paddle_lite_opt 参数介绍: |选项|说明| |---|---| |--model_dir|待优化的PaddlePaddle模型(非combined形式)的路径| @@ -130,6 +126,8 @@ cd build.opt/lite/api/ `--model_dir`适用于待优化的模型是非combined方式,PaddleOCR的inference模型是combined方式,即模型结构和模型参数使用单独一个文件存储。 +- 步骤2:使用paddle_lite_opt将inference模型转换成移动端模型格式。 + 下面以PaddleOCR的超轻量中文模型为例,介绍使用编译好的opt文件完成inference模型到Paddle-Lite优化模型的转换。 ``` @@ -148,7 +146,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls 转换成功后,inference模型目录下会多出`.nb`结尾的文件,即是转换成功的模型文件。 -注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt 工具的输入模型是paddle保存的inference模型 +注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt工具的输入模型是paddle保存的inference模型 ### 2.2 与手机联调 @@ -234,13 +232,14 @@ ppocr_keys_v1.txt # 中文字典 ... ``` -2. `config.txt` 包含了检测器、分类器的超参数,如下: +2. `config.txt` 包含了检测器、分类器、识别器的超参数,如下: ``` max_side_len 960 # 输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960 det_db_thresh 0.3 # 用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显 -det_db_box_thresh 0.5 # DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小 +det_db_box_thresh 0.5 # 检测器后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小 det_db_unclip_ratio 1.6 # 表示文本框的紧致程度,越小则文本框更靠近文本 use_direction_classify 0 # 是否使用方向分类器,0表示不使用,1表示使用 +rec_image_height 32 # 识别模型输入图像的高度,PP-OCRv3模型设置为48,PP-OCRv2模型需要设置为32 ``` 5. 启动调试 @@ -259,8 +258,14 @@ use_direction_classify 0 # 是否使用方向分类器,0表示不使用,1 cd /data/local/tmp/debug export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH # 开始使用,ocr_db_crnn可执行文件的使用方式为: - # ./ocr_db_crnn 检测模型文件 方向分类器模型文件 识别模型文件 测试图像路径 字典文件路径 - ./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./11.jpg ppocr_keys_v1.txt + # ./ocr_db_crnn 预测模式 检测模型文件 方向分类器模型文件 识别模型文件 运行硬件 运行精度 线程数 batchsize 测试图像路径 参数配置路径 字典文件路径 是否使用benchmark参数 + ./ocr_db_crnn system ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt ppocr_keys_v1.txt True + +# 仅使用文本检测模型,使用方式如下: +./ocr_db_crnn det ch_PP-OCRv2_det_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt + +# 仅使用文本识别模型,使用方式如下: +./ocr_db_crnn rec ch_PP-OCRv2_rec_slim_opt.nb arm8 INT8 10 1 word_1.jpg ppocr_keys_v1.txt config.txt ``` 如果对代码做了修改,则需要重新编译并push到手机上。 @@ -284,3 +289,7 @@ A2:替换debug下的.jpg测试图像为你想要测试的图像,adb push 到 Q3:如何封装到手机APP中? 
A3:此demo旨在提供能在手机上运行OCR的核心算法部分,PaddleOCR/deploy/android_demo是将这个demo封装到手机app的示例,供参考 + +Q4:运行demo时遇到报错`Error: This model is not supported, because kernel for 'io_copy' is not supported by Paddle-Lite.` + +A4:问题是安装的paddlelite版本和下载的预测库版本不匹配,确保paddle_lite_opt工具和你的预测库版本匹配,重新转nb模型试试。 diff --git a/deploy/pdserving/ocr_reader.py b/deploy/pdserving/ocr_reader.py index 75f0f3d5c3aea488f82ec01a72e20310663d565b..d488cc0920391eded6c08945597b5c938b7c7a42 100644 --- a/deploy/pdserving/ocr_reader.py +++ b/deploy/pdserving/ocr_reader.py @@ -339,7 +339,7 @@ class CharacterOps(object): class OCRReader(object): def __init__(self, algorithm="CRNN", - image_shape=[3, 32, 320], + image_shape=[3, 48, 320], char_type="ch", batch_num=1, char_dict_path="./ppocr_keys_v1.txt"): @@ -356,7 +356,7 @@ class OCRReader(object): def resize_norm_img(self, img, max_wh_ratio): imgC, imgH, imgW = self.rec_image_shape if self.character_type == "ch": - imgW = int(32 * max_wh_ratio) + imgW = int(imgH * max_wh_ratio) h = img.shape[0] w = img.shape[1] ratio = w / float(h) @@ -377,7 +377,7 @@ class OCRReader(object): def preprocess(self, img_list): img_num = len(img_list) norm_img_batch = [] - max_wh_ratio = 0 + max_wh_ratio = 320/48. for ino in range(img_num): h, w = img_list[ino].shape[0:2] wh_ratio = w * 1.0 / h diff --git a/deploy/pdserving/web_service.py b/deploy/pdserving/web_service.py index f05806ce030238144568a3ca137798a9132027e4..d8491dc572fa2c5c4186a426ce689254d312cb45 100644 --- a/deploy/pdserving/web_service.py +++ b/deploy/pdserving/web_service.py @@ -63,7 +63,6 @@ class DetOp(Op): dt_boxes_list = self.post_func(det_out, [ratio_list]) dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w]) out_dict = {"dt_boxes": dt_boxes, "image": self.raw_im} - return out_dict, None, "" @@ -86,7 +85,7 @@ class RecOp(Op): dt_boxes = copy.deepcopy(self.dt_list) feed_list = [] img_list = [] - max_wh_ratio = 0 + max_wh_ratio = 320/48. ## Many mini-batchs, the type of feed_data is list. max_batch_size = 6 # len(dt_boxes) @@ -150,7 +149,8 @@ class RecOp(Op): for i in range(dt_num): text = rec_list[i] dt_box = self.dt_list[i] - result_list.append([text, dt_box.tolist()]) + if text[1] >= 0.5: + result_list.append([text, dt_box.tolist()]) res = {"result": str(result_list)} return res, None, "" diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md index 33b2c4784afa4be68c8b9db1a02d83013c886655..c6796ae9dc256496308e432023c45ef1026c3d92 100644 --- a/deploy/slim/quantization/README_en.md +++ b/deploy/slim/quantization/README_en.md @@ -73,4 +73,4 @@ python deploy/slim/quantization/export_model.py -c configs/det/ch_ppocr_v2.0/ch_ The numerical range of the quantized model parameters derived from the above steps is still FP32, but the numerical range of the parameters is int8. The derived model can be converted through the `opt tool` of PaddleLite. 
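For orientation, the conversion through the opt tool that the lite readmes above describe can be sketched with the Python interface installed by `pip install paddlelite==2.10` (a minimal sketch assuming the `paddlelite.lite.Opt` API from the linked opt_python document; the model paths and output name are illustrative, not part of this patch):

```python
# Sketch of the inference-model -> .nb conversion via the paddlelite Python API.
from paddlelite.lite import Opt

opt = Opt()
# PaddleOCR inference models are "combined": one structure file plus one params file.
opt.set_model_file("./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdmodel")
opt.set_param_file("./ch_PP-OCRv2_rec_slim_quant_infer/inference.pdiparams")
opt.set_valid_places("arm")                       # target hardware
opt.set_optimize_out("ch_PP-OCRv2_rec_slim_opt")  # writes ch_PP-OCRv2_rec_slim_opt.nb
opt.run()
```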
-For quantitative model deployment, please refer to [Mobile terminal model deployment](../../lite/readme_en.md) +For quantitative model deployment, please refer to [Mobile terminal model deployment](../../lite/readme.md) diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md index 24f8a3e92be1c003ec7c37b74d14f4ae4117086a..fa2a392b7047a5c5d1662673a4c70bd7715d0e9a 100644 --- a/doc/doc_ch/FAQ.md +++ b/doc/doc_ch/FAQ.md @@ -455,6 +455,19 @@ A:以检测中的resnet骨干网络为例,图像输入网络之后,需要 A:可以在命令中加入 --det_db_unclip_ratio ,参数定义位置,这个参数是检测后处理时控制文本框大小的,默认1.6,可以尝试改成2.5或者更大,反之,如果觉得文本框不够紧凑,也可以把该参数调小。 +#### Q:PP-OCR文本检测出现明显漏检,该如何调整参数或者训练。 + +A:以[#issue5851](https://github.com/PaddlePaddle/PaddleOCR/issues/5815)为例,文字出现漏检时,先分析问题原因。 +首先,在[后处理处](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/ppocr/postprocess/db_postprocess.py#L177)添加如下代码,可视化模型对文字的分割区域: +``` +import cv2 +im = np.array(segmentation[0]*255).astype(np.uint8) +cv2.imwrite("db_seg_vis.jpg", im) +``` +如果模型在漏检文字区域有分割区域,但是没有生成框,此类情况多为文字太小,文字弯曲,或者某一行开头的文字过大等等。可减小DB后处理参数[det_db_thresh](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L52)和[det_db_box_thresh](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L53),或者设置[use_dilation](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L56)为True扩大文字分割区域。 +如果模型在漏检文字区域没有分割区域,是模型对此类数据没有训练好,建议用PP-OCR的模型在你的数据场景上finetune。 + + ### 2.7 模型结构 @@ -839,3 +852,15 @@ nvidia-smi --lock-gpu-clocks=1590 -i 0 #### Q: 预测时显存爆炸、内存泄漏问题? **A**: 打开显存/内存优化开关`enable_memory_optim`可以解决该问题,相关代码已合入,[查看详情](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/utility.py#L153)。 + + +#### Q: TRT预测报错:InvalidArgumentError: some trt inputs dynamic shape info not set, check the INFO log above for more details. + +**A**: PP-OCR的模型采用动态shape预测,因为TRT子图划分问题,需要设置额外参数的shape; +首先,屏蔽此行代码[config.disable_glog_info()](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L306)输出日志,重新预测,根据报错信息中的提示设置某些参数的动态shape。 +假设报错信息中: +trt input [lstm_1.tmp_0] dynamic shape info not set, please check and retry. + +可以参考[检测模型里设置动态shape的方式](https://github.com/PaddlePaddle/PaddleOCR/blob/8de06d2370e81e0ee1473d17925fb2e05295a0fe/tools/infer/utility.py#L268-L270)设置lstm_1.tmp_0的动态shape,识别模型的动态shape在[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/8de06d2370e81e0ee1473d17925fb2e05295a0fe/tools/infer/utility.py#L275-L277); + +如果不清楚lstm_1.tmp_0的shape是多少,可以把inference.pdmodel 放在网页 https://netron.app/ 里可视化,搜索lstm_1.tmp_0 查看该参数的shape信息。 \ No newline at end of file diff --git a/doc/doc_ch/dataset/layout_datasets.md b/doc/doc_ch/dataset/layout_datasets.md index e7055b4e607aae358a9ec1e93f3640b2b68ea4a1..728a9be5fdd33a78482adb1e705afea7117a3037 100644 --- a/doc/doc_ch/dataset/layout_datasets.md +++ b/doc/doc_ch/dataset/layout_datasets.md @@ -15,8 +15,8 @@ - **数据简介**:publaynet数据集的训练集合中包含35万张图像,验证集合中包含1.1万张图像。总共包含5个类别,分别是: `text, title, list, table, figure`。部分图像以及标注框可视化如下所示。
- - + +
- **下载地址**:https://developer.ibm.com/exchanges/data/all/publaynet/ @@ -30,8 +30,8 @@ - **数据简介**:CDLA据集的训练集合中包含5000张图像,验证集合中包含1000张图像。总共包含10个类别,分别是: `Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation`。部分图像以及标注框可视化如下所示。
- - + +
- **下载地址**:https://github.com/buptlihang/CDLA @@ -45,8 +45,8 @@ - **数据简介**:TableBank数据集包含Latex(训练集187199张,验证集7265张,测试集5719张)与Word(训练集73383张,验证集2735张,测试集2281张)两种类别的文档。仅包含`Table` 1个类别。部分图像以及标注框可视化如下所示。
- - + +
- **下载地址**:https://doc-analysis.github.io/tablebank-page/index.html diff --git a/doc/doc_ch/knowledge_distillation.md b/doc/doc_ch/knowledge_distillation.md index 2adba3659e101fe214d31b805d0800fd5128595c..ff474797b8a086896df5d886e013e470c716aa87 100644 --- a/doc/doc_ch/knowledge_distillation.md +++ b/doc/doc_ch/knowledge_distillation.md @@ -591,8 +591,9 @@ Metric: #### 2.2.5 检测蒸馏模型finetune PP-OCRv3检测蒸馏有两种方式: -- 采用ch_PP-OCRv3_det_cml.yml,采用cml蒸馏,同样Teacher模型设置为PaddleOCR提供的模型或者您训练好的大模型 +- 采用ch_PP-OCRv3_det_cml.yml,采用CML蒸馏,同样Teacher模型设置为PaddleOCR提供的模型或者您训练好的大模型。 - 采用ch_PP-OCRv3_det_dml.yml,采用DML的蒸馏,两个Student模型互蒸馏的方法,在PaddleOCR采用的数据集上相比单独训练Student模型有1%-2%的提升。 +> 如果您在自己的场景中没有训练过高精度大模型,或原始PP-OCR模型在您的场景中表现不好,则无法使用CML训练以达到更高精度,更应该采用DML训练 在具体fine-tune时,需要在网络结构的`pretrained`参数中设置要加载的预训练模型。 diff --git a/doc/doc_ch/ocr_book.md b/doc/doc_ch/ocr_book.md index 2760494af44e84e8670a36b61fc57ed907d96906..8697e29a55ca41273b86f127a77f7b02c378bda2 100644 --- a/doc/doc_ch/ocr_book.md +++ b/doc/doc_ch/ocr_book.md @@ -1,6 +1,6 @@ # 《动手学OCR》电子书 -《动手学OCR》是PaddleOCR团队携手复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下: +《动手学OCR》是PaddleOCR团队携手华中科技大学博导/教授,IAPR Fellow 白翔、复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉、中国工商银行大数据人工智能实验室研究员等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下: - 覆盖从文本检测识别到文档分析的OCR全栈技术 - 紧密结合理论实践,跨越代码实现鸿沟,并配套教学视频 diff --git a/doc/doc_ch/ppocr_introduction.md b/doc/doc_ch/ppocr_introduction.md index 59de124e2ab855d0b4abb90d0a356aefd6db586d..bd62087c8b212098cba6b0ee1cbaf413ab23015f 100644 --- a/doc/doc_ch/ppocr_introduction.md +++ b/doc/doc_ch/ppocr_introduction.md @@ -30,11 +30,11 @@ PP-OCR系统pipeline如下: PP-OCR系统在持续迭代优化,目前已发布PP-OCR和PP-OCRv2两个版本: -PP-OCR从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941 +PP-OCR从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考[PP-OCR技术报告](https://arxiv.org/abs/2009.09941)。 #### PP-OCRv2 -PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2[技术报告](https://arxiv.org/abs/2109.03144)。 +PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考[PP-OCRv2技术报告](https://arxiv.org/abs/2109.03144)。 #### PP-OCRv3 @@ -48,7 +48,7 @@ PP-OCRv3系统pipeline如下:
-更多细节请参考PP-OCRv3[技术报告](./PP-OCRv3_introduction.md)。 +更多细节请参考[PP-OCRv3技术报告](https://arxiv.org/abs/2206.03001v2) 👉[中文简洁版](./PP-OCRv3_introduction.md) diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md index 29ca48fa838be4a60f08d31d5031180b951e33bc..e425cdd8a87d320554e61c72e05001875d022e43 100644 --- a/doc/doc_ch/quickstart.md +++ b/doc/doc_ch/quickstart.md @@ -101,8 +101,17 @@ cd /path/to/ppocr_img ['韩国小馆', 0.994467] ``` +**版本说明** +paddleocr默认使用PP-OCRv3模型(`--ocr_version PP-OCRv3`),如需使用其他版本可通过设置参数`--ocr_version`,具体版本说明如下: +| 版本名称 | 版本说明 | +| --- | --- | +| PP-OCRv3 | 支持中、英文检测和识别,方向分类器,支持多语种识别 | +| PP-OCRv2 | 支持中英文的检测和识别,方向分类器,多语言暂未更新 | +| PP-OCR | 支持中、英文检测和识别,方向分类器,支持多语种识别 | -如需使用2.0模型,请指定参数`--ocr_version PP-OCR`,paddleocr默认使用PP-OCRv3模型(`--ocr_version PP-OCRv3`)。更多whl包使用可参考[whl包文档](./whl.md) +如需新增自己训练的模型,可以在[paddleocr](../../paddleocr.py)中增加模型链接和字段,重新编译即可。 + +更多whl包使用可参考[whl包文档](./whl.md) diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index 8457df69ff4c09b196b0f0f91271a92344217d75..52cef725d22903742e03c48b6e6972879c8ad2fe 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -550,4 +550,4 @@ inference/en_PP-OCRv3_rec/ Q1: 训练模型转inference 模型之后预测效果不一致? -**A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。可以对比训练使用的配置文件中的预处理、后处理和预测时是否存在差异。 +**A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。可以对比训练使用的配置文件中的预处理、后处理和预测时是否存在差异。更多内容请参考[FAQ](./FAQ.md#210-%E6%A8%A1%E5%9E%8B%E6%95%88%E6%9E%9C%E4%B8%8E%E6%95%88%E6%9E%9C%E4%B8%8D%E4%B8%80%E8%87%B4). diff --git a/doc/doc_ch/thirdparty.md b/doc/doc_ch/thirdparty.md index ff9059cdf698938fcd04de852ecef2419b23ee85..f63e9cf51f0ef517ab1309c91726fd25fe745594 100644 --- a/doc/doc_ch/thirdparty.md +++ b/doc/doc_ch/thirdparty.md @@ -1,4 +1,4 @@ -# 社区贡献 +# 开源社区 感谢大家长久以来对PaddleOCR的支持和关注,与广大开发者共同构建一个专业、和谐、相互帮助的开源社区是PaddleOCR的目标。本文档展示了已有的社区贡献、对于各类贡献说明、新的机会与流程,希望贡献流程更加高效、路径更加清晰。 @@ -119,25 +119,30 @@ PaddleOCR非常欢迎社区贡献以PaddleOCR为核心的各种服务、部署 ## 附录:社区常规赛积分榜 -| 开发者 | 总积分 | 开发者 | 总积分 | -| ------------------------------------------------------- | ------ | ----------------------------------------------------- | ------ | -| [RangeKing](https://github.com/RangeKing) | 220 | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 | -| [hao6699](https://github.com/hao6699) | 145 | [v3fc](https://github.com/v3fc) | 35 | -| [mymagicpower](https://github.com/mymagicpower) | 140 | [imiyu](https://github.com/imiyu) | 30 | -| [raoyutian](https://github.com/raoyutian) | 90 | [haigang1975](https://github.com/haigang1975) | 29 | -| [sdcb](https://github.com/sdcb) | 80 | [daassh](https://github.com/daassh) | 23 | -| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 70 | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 20 | -| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 | -| [livingbody](https://github.com/livingbody) | 70 | [nmusik](https://github.com/nmusik) | 20 | -| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 | -| [bupt906](https://github.com/bupt906) | 60 | [chccc1994](https://github.com/chccc1994) | 20 | -| [edencfc](https://github.com/edencfc) | 57 | [BeyondYourself ](https://github.com/BeyondYourself) | 20 | -| [zhangyingying520](https://github.com/zhangyingying520) | 57 | chenguoqi08161 | 18 | -| [ITerydh](https://github.com/ITerydh) | 55 | [weiwenlan](https://github.com/weiwenlan) | 10 | -| [telppa](https://github.com/telppa) | 40 | [shaoshenchen 
thinc](https://github.com/shaoshenchen) | 10 | -| sosojust1984 | 40 | [jordan2013](https://github.com/jordan2013) | 10 | -| [redearly123](https://github.com/redearly123) | 40 | [JimEverest](https://github.com/JimEverest) | 10 | -| [OneYearIsEnough](https://github.com/OneYearIsEnough) | 40 | [HustBestCat](https://github.com/HustBestCat) | 10 | -| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | | -| [GreatV](https://github.com/GreatV) | 40 | | | -| CLXK294 | 40 | | | +| 开发者 | 总积分(+新增积分) | 开发者 | 总积分(+新增积分) | +| ---------------------------------------------------------- | ---------------- | ---------------------------------------------------- | --------------- | +| [RangeKing](https://github.com/RangeKing) 🥇 | 222(+2) | CLXK294 | 40 | +| [mymagicpower](https://github.com/mymagicpower) 🥈 | 150(+10) | [telppa](https://github.com/telppa) | 40 | +| [hao6699 ](https://github.com/hao6699) 🥉 | 145 | sosojust1984 | 40 | +| [raoyutian](https://github.com/raoyutian) | 135(+45) | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 | +| [OneYearIsEnough](https://github.com/OneYearIsEnough) 🏅 ↑12 | 120(+80) | [v3fc](https://github.com/v3fc) | 35 | +| [edencfc](https://github.com/edencfc) ↑5 | 97(+40) | [imiyu](https://github.com/imiyu) | 30 | +| [zhangyingying520](https://github.com/zhangyingying520) ↑5 | 91(+34) | [haigang1975](https://github.com/haigang1975) | 29 | +| [sdcb](https://github.com/sdcb) | 90(+10) | [daassh](https://github.com/daassh) | 23 | +| [xiaxianlei](https://github.com/xiaxianlei) ✨ 🏆 | +81 | [BeyondYourself](https://github.com/BeyondYourself) | 26(+6) | +| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 80(+10) | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 26(+6) | +| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 | +| [livingbody](https://github.com/livingbody) | 74(+4) | [nmusik](https://github.com/nmusik) | 20 | +| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 | +| [bupt906](https://github.com/bupt906) | 64(+4) | [chccc1994](https://github.com/chccc1994) | 20 | +| [PeterH0323](https://github.com/PeterH0323) ✨ 🎖 | +55 | chenguoqi08161 | 18 | +| [ITerydh](https://github.com/ITerydh) | 55 | [JimEverest](https://github.com/JimEverest) | 14(+4) | +| [d2623587501](https://github.com/d2623587501) | 51(+6) | [weiwenlan](https://github.com/weiwenlan) | 10 | +| [Wei-JL](https://github.com/Wei-JL) | 46(+6) | [HustBestCat](https://github.com/HustBestCat) | 10 | +| [GreatV](https://github.com/GreatV) | 42(+2) | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10 | +| [fuhengwu2021](https://github.com/fuhengwu2021) ✨ | +40 | [jordan2013](https://github.com/jordan2013) | 10 | +| [manangoel99](https://github.com/manangoel99) ✨ | +40 | | | +| [redearly123](https://github.com/redearly123) | 40 | | | +| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | | + + diff --git a/doc/doc_ch/update.md b/doc/doc_ch/update.md index 07591ea126f5b168389657141673142c084e67ad..7851be62ddf22373e5e88b538f4e3d3605ee2461 100644 --- a/doc/doc_ch/update.md +++ b/doc/doc_ch/update.md @@ -4,8 +4,7 @@ - 半自动标注工具[PPOCRLabelv2](../../PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能; - OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求 - 交互式OCR开源电子书[《动手学OCR》](./ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。 -- 2022.5.7 添加对[Weights & Biases](https://docs.wandb.ai/)训练日志记录工具的支持。 -- 2021.12.21 《OCR十讲》课程开讲,12月21日起每晚八点半线上授课! 
【免费】报名地址:https://aistudio.baidu.com/aistudio/course/introduce/25207 +- 2021.12.21 《动手学OCR·十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】[报名地址](https://aistudio.baidu.com/aistudio/course/introduce/25207)。 - 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM)。 - 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。 - 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。 @@ -29,7 +28,7 @@ - 2020.7.9 添加支持空格的识别模型,识别效果,预测及训练方式请参考快速开始和文本识别训练相关文档 - 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./config.md) - 2020.6.8 添加[数据集](dataset/datasets.md),并保持持续更新 -- 2020.6.5 支持 `attetnion` 模型导出 `inference_model` +- 2020.6.5 支持 `attention` 模型导出 `inference_model` - 2020.6.5 支持单独预测识别时,输出结果得分 - 2020.5.30 提供超轻量级中文OCR在线体验 - 2020.5.30 模型预测、训练支持Windows系统 diff --git a/doc/doc_en/PP-OCRv3_introduction_en.md b/doc/doc_en/PP-OCRv3_introduction_en.md index 9ab25653e219c18e1acaaf7c99b050f790bcb1b9..baa6c9beed6136361596076fba0c8d352f88722e 100644 --- a/doc/doc_en/PP-OCRv3_introduction_en.md +++ b/doc/doc_en/PP-OCRv3_introduction_en.md @@ -77,7 +77,7 @@ LK-PAN (Large Kernel PAN) is a lightweight [PAN](https://arxiv.org/pdf/1803.0153 **(2) DML: Deep Mutual Learning Strategy for Teacher Model** -[DML](https://arxiv.org/abs/1706.00384)(Collaborative Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The DML strategy is adopted in the teacher model training, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above-mentioned higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%. +[DML](https://arxiv.org/abs/1706.00384)(Deep Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The DML strategy is adopted in the teacher model training, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above-mentioned higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%.
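The symmetric KL objective behind the DML strategy described above is the same formulation this PR later installs as the `'kl'` mode of `KLJSLoss` in `ppocr/losses/basic_loss.py`; a minimal paddle sketch of that loss (the helper name is ours, not the repo's):

```python
# Minimal sketch of the symmetric-KL ("mutual learning") loss used by DML:
# each model is pushed toward the other's probability map. This mirrors the
# 'kl' branch added to KLJSLoss in this PR, including the epsilon smoothing.
import paddle

def dml_kl_loss(p1, p2, eps=1e-5):
    # KL(p2 || p1) + KL(p1 || p2), halved
    loss = paddle.multiply(p2, paddle.log((p2 + eps) / (p1 + eps) + eps))
    loss += paddle.multiply(p1, paddle.log((p1 + eps) / (p2 + eps) + eps))
    return paddle.mean(0.5 * loss)
```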
@@ -101,7 +101,7 @@ Considering that the features of some channels will be suppressed if the convolu The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability. -The recognition accuracy of SVTR_inty outperforms PP-OCRv2 recognition model by 5.3%, while the prediction speed nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model. +The recognition accuracy of SVTR_tiny outperforms the PP-OCRv2 recognition model by 5.3%, while the prediction speed is nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model. 
diff --git a/doc/doc_en/ppocr_introduction_en.md b/doc/doc_en/ppocr_introduction_en.md index b13d7f9bf1915de4bbbbec7b384d278e1d7ab8b4..5c0f6d2d7e5f82fce9a29b286a7e27b97306833a 100644 --- a/doc/doc_en/ppocr_introduction_en.md +++ b/doc/doc_en/ppocr_introduction_en.md @@ -29,10 +29,10 @@ PP-OCR pipeline is as follows: PP-OCR system is in continuous optimization. At present, PP-OCR and PP-OCRv2 have been released: -PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). +PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the [PP-OCR technical report](https://arxiv.org/abs/2009.09941). #### PP-OCRv2 -On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144). +On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the [PP-OCRv2 technical report](https://arxiv.org/abs/2109.03144). #### PP-OCRv3 @@ -46,7 +46,7 @@ PP-OCRv3 pipeline is as follows:
-For more details, please refer to [PP-OCRv3 technical report](./PP-OCRv3_introduction_en.md). +For more details, please refer to [PP-OCRv3 technical report](https://arxiv.org/abs/2206.03001v2). ## 2. Features diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md index 53f4313579bf39204e085bb0a90d219e3a1c1b5d..8ea32b0f5733139759d9fc5739e37cafa02a1bb2 100644 --- a/doc/doc_en/quickstart_en.md +++ b/doc/doc_en/quickstart_en.md @@ -119,7 +119,18 @@ If you do not use the provided test image, you can replace the following `--imag ['PAIN', 0.9934559464454651] ``` -If you need to use the 2.0 model, please specify the parameter `--ocr_version PP-OCR`, paddleocr uses the PP-OCRv3 model by default(`--ocr_version PP-OCRv3`). More whl package usage can be found in [whl package](./whl_en.md) +**Version** +paddleocr uses the PP-OCRv3 model by default(`--ocr_version PP-OCRv3`). If you want to use other versions, you can set the parameter `--ocr_version`, the specific version description is as follows: +| version name | description | +| --- | --- | +| PP-OCRv3 | support Chinese and English detection and recognition, direction classifier, support multilingual recognition | +| PP-OCRv2 | only supports Chinese and English detection and recognition, direction classifier, multilingual model is not updated | +| PP-OCR | support Chinese and English detection and recognition, direction classifier, support multilingual recognition | + +If you want to add your own trained model, you can add model links and keys in [paddleocr](../../paddleocr.py) and recompile. + +More whl package usage can be found in [whl package](./whl_en.md) + #### 2.1.2 Multi-language Model diff --git a/doc/doc_en/whl_en.md b/doc/doc_en/whl_en.md index d81e5532cf1db0193abf61b972420bdc3bacfd0b..64757ad18bbe422b7e2f60c896f600458e3ce2fd 100644 --- a/doc/doc_en/whl_en.md +++ b/doc/doc_en/whl_en.md @@ -1,4 +1,4 @@ -# Paddleocr Package +# PaddleOCR Package ## 1 Get started quickly ### 1.1 install package diff --git a/paddleocr.py b/paddleocr.py index a1265f79def7018a5586be954127e5b7fdba011e..f6aca07ab5653b563337b000bc5eb2cce892ca6e 100644 --- a/paddleocr.py +++ b/paddleocr.py @@ -154,7 +154,13 @@ MODEL_URLS = { 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar', 'dict_path': './ppocr/utils/ppocr_keys_v1.txt' } - } + }, + 'cls': { + 'ch': { + 'url': + 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar', + } + }, }, 'PP-OCR': { 'det': { @@ -440,7 +446,7 @@ class PaddleOCR(predict_system.TextSystem): """ ocr with paddleocr args: - img: img for ocr, support ndarray, img_path and list or ndarray + img: img for ocr, support ndarray, img_path and list of ndarray det: use text detection or not. If false, only rec will be exec. Default is True rec: use text recognition or not. If false, only det will be exec. Default is True cls: use angle classifier or not. Default is True. If true, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False. 
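A short usage sketch of the whl-package behavior that the quickstart changes and the `paddleocr.py` docstring above describe (the test image name is illustrative; `ocr_version` defaults to `PP-OCRv3`):

```python
# The image argument may be an ndarray, an image path, or a list of ndarrays.
from paddleocr import PaddleOCR

ocr = PaddleOCR(ocr_version='PP-OCRv2', use_angle_cls=True, lang='ch')
result = ocr.ocr('./11.jpg', det=True, rec=True, cls=True)
for line in result:
    print(line)  # [box points, (text, confidence)]
```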
diff --git a/ppocr/data/imaug/copy_paste.py b/ppocr/data/imaug/copy_paste.py index 0b3386c896792bd670cd2bfc757eb3b80f22bac4..79343da60fd40f8dc0ffe8927398b70cb751b532 100644 --- a/ppocr/data/imaug/copy_paste.py +++ b/ppocr/data/imaug/copy_paste.py @@ -35,10 +35,12 @@ class CopyPaste(object): point_num = data['polys'].shape[1] src_img = data['image'] src_polys = data['polys'].tolist() + src_texts = data['texts'] src_ignores = data['ignore_tags'].tolist() ext_data = data['ext_data'][0] ext_image = ext_data['image'] ext_polys = ext_data['polys'] + ext_texts = ext_data['texts'] ext_ignores = ext_data['ignore_tags'] indexs = [i for i in range(len(ext_ignores)) if not ext_ignores[i]] @@ -53,7 +55,7 @@ class CopyPaste(object): src_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB) ext_image = cv2.cvtColor(ext_image, cv2.COLOR_BGR2RGB) src_img = Image.fromarray(src_img).convert('RGBA') - for poly, tag in zip(select_polys, select_ignores): + for idx, poly, tag in zip(select_idxs, select_polys, select_ignores): box_img = get_rotate_crop_image(ext_image, poly) src_img, box = self.paste_img(src_img, box_img, src_polys) @@ -62,6 +64,7 @@ for _ in range(len(box), point_num): box.append(box[-1]) src_polys.append(box) + src_texts.append(ext_texts[idx]) src_ignores.append(tag) src_img = cv2.cvtColor(np.array(src_img), cv2.COLOR_RGB2BGR) h, w = src_img.shape[:2] @@ -70,6 +73,7 @@ src_polys[:, :, 1] = np.clip(src_polys[:, :, 1], 0, h) data['image'] = src_img data['polys'] = src_polys + data['texts'] = src_texts data['ignore_tags'] = np.array(src_ignores) return data diff --git a/ppocr/data/imaug/label_ops.py b/ppocr/data/imaug/label_ops.py index 8b017b3219993328287d91a047e598eebaded198..c20eef2c8c4481d46fae3f9006946b7a1b5c6bda 100644 --- a/ppocr/data/imaug/label_ops.py +++ b/ppocr/data/imaug/label_ops.py @@ -23,7 +23,7 @@ import string from shapely.geometry import LineString, Point, Polygon import json import copy - +from scipy.spatial import distance as dist from ppocr.utils.logging import get_logger @@ -74,9 +74,10 @@ class DetLabelEncode(object): s = pts.sum(axis=1) rect[0] = pts[np.argmin(s)] rect[2] = pts[np.argmax(s)] - diff = np.diff(pts, axis=1) - rect[1] = pts[np.argmin(diff)] - rect[3] = pts[np.argmax(diff)] + tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0) + diff = np.diff(np.array(tmp), axis=1) + rect[1] = tmp[np.argmin(diff)] + rect[3] = tmp[np.argmax(diff)] return rect def expand_points_num(self, boxes): @@ -443,7 +444,9 @@ class KieLabelEncode(object): elif 'key_cls' in ann.keys(): labels.append(ann['key_cls']) else: - raise ValueError("Cannot found 'key_cls' in ann.keys(), please check your training annotation.") + raise ValueError( + "Cannot find 'key_cls' in ann.keys(), please check your training annotation."
+ ) edges.append(ann.get('edge', 0)) ann_infos = dict( image=data['image'], diff --git a/ppocr/data/simple_dataset.py b/ppocr/data/simple_dataset.py index b5da9b8898423facf888839f941dff01caa03643..402f1e38fed9e32722e2dd160f10f779028807a3 100644 --- a/ppocr/data/simple_dataset.py +++ b/ppocr/data/simple_dataset.py @@ -33,7 +33,7 @@ class SimpleDataSet(Dataset): self.delimiter = dataset_config.get('delimiter', '\t') label_file_list = dataset_config.pop('label_file_list') data_source_num = len(label_file_list) - ratio_list = dataset_config.get("ratio_list", [1.0]) + ratio_list = dataset_config.get("ratio_list", 1.0) if isinstance(ratio_list, (float, int)): ratio_list = [float(ratio_list)] * int(data_source_num) diff --git a/ppocr/losses/basic_loss.py b/ppocr/losses/basic_loss.py index 2df96ea2642d10a50eb892d738f89318dc5e0f4c..a0ab10fbbaccaf6781598d5e788813d3febe07e4 100644 --- a/ppocr/losses/basic_loss.py +++ b/ppocr/losses/basic_loss.py @@ -57,17 +57,27 @@ class CELoss(nn.Layer): class KLJSLoss(object): def __init__(self, mode='kl'): assert mode in ['kl', 'js', 'KL', 'JS' - ], "mode can only be one of ['kl', 'js', 'KL', 'JS']" + ], "mode can only be one of ['kl', 'KL', 'js', 'JS']" self.mode = mode def __call__(self, p1, p2, reduction="mean"): - loss = paddle.multiply(p2, paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5)) - - if self.mode.lower() == "js": + if self.mode.lower() == 'kl': + loss = paddle.multiply(p2, + paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5)) loss += paddle.multiply( p1, paddle.log((p1 + 1e-5) / (p2 + 1e-5) + 1e-5)) loss *= 0.5 + elif self.mode.lower() == "js": + loss = paddle.multiply( + p2, paddle.log((2 * p2 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5)) + loss += paddle.multiply( + p1, paddle.log((2 * p1 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5)) + loss *= 0.5 + else: + raise ValueError( + "The mode.lower() of KLJSLoss should be one of ['kl', 'js']") + if reduction == "mean": loss = paddle.mean(loss, axis=[1, 2]) elif reduction == "none" or reduction is None: @@ -95,7 +105,7 @@ class DMLLoss(nn.Layer): self.act = None self.use_log = use_log - self.jskl_loss = KLJSLoss(mode="js") + self.jskl_loss = KLJSLoss(mode="kl") def _kldiv(self, x, target): eps = 1.0e-10 diff --git a/ppocr/losses/rec_aster_loss.py b/ppocr/losses/rec_aster_loss.py index fbb99d29a638540b02649a8912051339c08b22dd..52605e46db35339cc22f7f1e6642456bfaf02f11 100644 --- a/ppocr/losses/rec_aster_loss.py +++ b/ppocr/losses/rec_aster_loss.py @@ -27,12 +27,12 @@ class CosineEmbeddingLoss(nn.Layer): self.epsilon = 1e-12 def forward(self, x1, x2, target): - similarity = paddle.fluid.layers.reduce_sum( - x1 * x2, dim=-1) / (paddle.norm( + similarity = paddle.sum( + x1 * x2, axis=-1) / (paddle.norm( x1, axis=-1) * paddle.norm( x2, axis=-1) + self.epsilon) one_list = paddle.full_like(target, fill_value=1) - out = paddle.fluid.layers.reduce_mean( + out = paddle.mean( paddle.where( paddle.equal(target, one_list), 1. 
                 paddle.maximum(
diff --git a/ppocr/losses/table_att_loss.py b/ppocr/losses/table_att_loss.py
index d7fd99e6952aacc0182a482ca5ae5ddaf959a026..51377efa2b5e802fe9f9fc1973c74deb00fc4816 100644
--- a/ppocr/losses/table_att_loss.py
+++ b/ppocr/losses/table_att_loss.py
@@ -19,7 +19,6 @@ from __future__ import print_function
 import paddle
 from paddle import nn
 from paddle.nn import functional as F
-from paddle import fluid
 
 class TableAttentionLoss(nn.Layer):
     def __init__(self, structure_weight, loc_weight, use_giou=False, giou_weight=1.0, **kwargs):
@@ -36,13 +35,13 @@ class TableAttentionLoss(nn.Layer):
         :param bbox:[[x1,y1,x2,y2], [x1,y1,x2,y2],,,]
         :return: loss
         '''
-        ix1 = fluid.layers.elementwise_max(preds[:, 0], bbox[:, 0])
-        iy1 = fluid.layers.elementwise_max(preds[:, 1], bbox[:, 1])
-        ix2 = fluid.layers.elementwise_min(preds[:, 2], bbox[:, 2])
-        iy2 = fluid.layers.elementwise_min(preds[:, 3], bbox[:, 3])
+        ix1 = paddle.maximum(preds[:, 0], bbox[:, 0])
+        iy1 = paddle.maximum(preds[:, 1], bbox[:, 1])
+        ix2 = paddle.minimum(preds[:, 2], bbox[:, 2])
+        iy2 = paddle.minimum(preds[:, 3], bbox[:, 3])
 
-        iw = fluid.layers.clip(ix2 - ix1 + 1e-3, 0., 1e10)
-        ih = fluid.layers.clip(iy2 - iy1 + 1e-3, 0., 1e10)
+        iw = paddle.clip(ix2 - ix1 + 1e-3, 0., 1e10)
+        ih = paddle.clip(iy2 - iy1 + 1e-3, 0., 1e10)
 
         # overlap
         inters = iw * ih
@@ -55,12 +54,12 @@ class TableAttentionLoss(nn.Layer):
         # ious
         ious = inters / uni
 
-        ex1 = fluid.layers.elementwise_min(preds[:, 0], bbox[:, 0])
-        ey1 = fluid.layers.elementwise_min(preds[:, 1], bbox[:, 1])
-        ex2 = fluid.layers.elementwise_max(preds[:, 2], bbox[:, 2])
-        ey2 = fluid.layers.elementwise_max(preds[:, 3], bbox[:, 3])
-        ew = fluid.layers.clip(ex2 - ex1 + 1e-3, 0., 1e10)
-        eh = fluid.layers.clip(ey2 - ey1 + 1e-3, 0., 1e10)
+        ex1 = paddle.minimum(preds[:, 0], bbox[:, 0])
+        ey1 = paddle.minimum(preds[:, 1], bbox[:, 1])
+        ex2 = paddle.maximum(preds[:, 2], bbox[:, 2])
+        ey2 = paddle.maximum(preds[:, 3], bbox[:, 3])
+        ew = paddle.clip(ex2 - ex1 + 1e-3, 0., 1e10)
+        eh = paddle.clip(ey2 - ey1 + 1e-3, 0., 1e10)
 
         # enclose erea
         enclose = ew * eh + eps
diff --git a/ppocr/modeling/backbones/kie_unet_sdmgr.py b/ppocr/modeling/backbones/kie_unet_sdmgr.py
index 545e4e7511e58c3d8220e9ec0be35474deba8806..4b1bd8030060b26acb9e60bd671a5b23d936347b 100644
--- a/ppocr/modeling/backbones/kie_unet_sdmgr.py
+++ b/ppocr/modeling/backbones/kie_unet_sdmgr.py
@@ -175,12 +175,7 @@ class Kie_backbone(nn.Layer):
             img, relations, texts, gt_bboxes, tag, img_size)
         x = self.img_feat(img)
         boxes, rois_num = self.bbox2roi(gt_bboxes)
-        feats = paddle.fluid.layers.roi_align(
-            x,
-            boxes,
-            spatial_scale=1.0,
-            pooled_height=7,
-            pooled_width=7,
-            rois_num=rois_num)
+        feats = paddle.vision.ops.roi_align(
+            x, boxes, spatial_scale=1.0, output_size=7, boxes_num=rois_num)
         feats = self.maxpool(feats).squeeze(-1).squeeze(-1)
         return [relations, texts, feats]
diff --git a/ppocr/modeling/backbones/rec_resnet_fpn.py b/ppocr/modeling/backbones/rec_resnet_fpn.py
index a7e876a2bd52a0ea70479c2009a291e4e2f8ce1f..79efd6e41e231ecad99aa4d01a8226a8550bd1ef 100644
--- a/ppocr/modeling/backbones/rec_resnet_fpn.py
+++ b/ppocr/modeling/backbones/rec_resnet_fpn.py
@@ -18,7 +18,6 @@ from __future__ import print_function
 
 from paddle import nn, ParamAttr
 from paddle.nn import functional as F
-import paddle.fluid as fluid
 import paddle
 import numpy as np
 
diff --git a/ppocr/modeling/heads/rec_srn_head.py b/ppocr/modeling/heads/rec_srn_head.py
index 8d59e4711a043afd9234f430a62c9876c0a8f6f4..1070d8cd648eb686c0a2e66df092b7dc6de29c42 100644
--- a/ppocr/modeling/heads/rec_srn_head.py
+++ b/ppocr/modeling/heads/rec_srn_head.py
@@ -20,13 +20,11 @@ import math
 import paddle
 from paddle import nn, ParamAttr
 from paddle.nn import functional as F
-import paddle.fluid as fluid
 import numpy as np
 from .self_attention import WrapEncoderForFeature
 from .self_attention import WrapEncoder
 from paddle.static import Program
 from ppocr.modeling.backbones.rec_resnet_fpn import ResNetFPN
-import paddle.fluid.framework as framework
 from collections import OrderedDict
 
 gradient_clip = 10
diff --git a/ppocr/modeling/heads/self_attention.py b/ppocr/modeling/heads/self_attention.py
index 6c27fdbe434166e9277cc8d695bce2743cbd8ec6..6e4c65e3931ae74a0fde2a16694a69fdfa69b5ed 100644
--- a/ppocr/modeling/heads/self_attention.py
+++ b/ppocr/modeling/heads/self_attention.py
@@ -22,7 +22,6 @@ import paddle
 from paddle import ParamAttr, nn
 from paddle import nn, ParamAttr
 from paddle.nn import functional as F
-import paddle.fluid as fluid
 import numpy as np
 
 gradient_clip = 10
@@ -288,10 +287,10 @@ class PrePostProcessLayer(nn.Layer):
                         "layer_norm_%d" % len(self.sublayers()),
                         paddle.nn.LayerNorm(
                             normalized_shape=d_model,
-                            weight_attr=fluid.ParamAttr(
-                                initializer=fluid.initializer.Constant(1.)),
-                            bias_attr=fluid.ParamAttr(
-                                initializer=fluid.initializer.Constant(0.)))))
+                            weight_attr=paddle.ParamAttr(
+                                initializer=paddle.nn.initializer.Constant(1.)),
+                            bias_attr=paddle.ParamAttr(
+                                initializer=paddle.nn.initializer.Constant(0.)))))
             elif cmd == "d":  # add dropout
                 self.functors.append(lambda x: F.dropout(
                     x, p=dropout_rate, mode="downscale_in_infer")
@@ -324,7 +323,7 @@ class PrepareEncoder(nn.Layer):
 
     def forward(self, src_word, src_pos):
         src_word_emb = src_word
-        src_word_emb = fluid.layers.cast(src_word_emb, 'float32')
+        src_word_emb = paddle.cast(src_word_emb, 'float32')
         src_word_emb = paddle.scale(x=src_word_emb, scale=self.src_emb_dim**0.5)
         src_pos = paddle.squeeze(src_pos, axis=-1)
         src_pos_enc = self.emb(src_pos)
@@ -367,7 +366,7 @@ class PrepareDecoder(nn.Layer):
         self.dropout_rate = dropout_rate
 
     def forward(self, src_word, src_pos):
-        src_word = fluid.layers.cast(src_word, 'int64')
+        src_word = paddle.cast(src_word, 'int64')
         src_word = paddle.squeeze(src_word, axis=-1)
         src_word_emb = self.emb0(src_word)
         src_word_emb = paddle.scale(x=src_word_emb, scale=self.src_emb_dim**0.5)
diff --git a/ppocr/postprocess/db_postprocess.py b/ppocr/postprocess/db_postprocess.py
index 27b428ef2e73c9abf81d3881b23979343c8595b2..1c42cd55cd8f85dff3df90e2f5365ccde8a725f3 100755
--- a/ppocr/postprocess/db_postprocess.py
+++ b/ppocr/postprocess/db_postprocess.py
@@ -38,6 +38,7 @@ class DBPostProcess(object):
                  unclip_ratio=2.0,
                  use_dilation=False,
                  score_mode="fast",
+                 visual_output=False,
                  **kwargs):
         self.thresh = thresh
         self.box_thresh = box_thresh
@@ -51,6 +52,7 @@ class DBPostProcess(object):
 
         self.dilation_kernel = None if not use_dilation else np.array(
             [[1, 1], [1, 1]])
+        self.visual = visual_output
 
     def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
         '''
@@ -169,12 +171,19 @@ class DBPostProcess(object):
             cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1)
         return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
 
+    def visual_output(self, pred):
+        im = np.array(pred[0] * 255).astype(np.uint8)
+        cv2.imwrite("db_probability_map.png", im)
+        print("The probability map is visualized in db_probability_map.png")
+
     def __call__(self, outs_dict, shape_list):
         pred = outs_dict['maps']
         if isinstance(pred, paddle.Tensor):
             pred = pred.numpy()
         pred = pred[:, 0, :, :]
         segmentation = pred > self.thresh
+        if self.visual:
+            self.visual_output(pred)
 
         boxes_batch = []
         for batch_index in range(pred.shape[0]):
diff --git a/ppocr/utils/save_load.py b/ppocr/utils/save_load.py
index b09f1db6e938e8eb99148d69efce016f1cbe8628..3647111fddaa848a75873ab689559c63dd6d4814 100644
--- a/ppocr/utils/save_load.py
+++ b/ppocr/utils/save_load.py
@@ -177,9 +177,9 @@ def save_model(model,
             model.backbone.model.save_pretrained(model_prefix)
         metric_prefix = os.path.join(model_prefix, 'metric')
     # save metric and config
+    with open(metric_prefix + '.states', 'wb') as f:
+        pickle.dump(kwargs, f, protocol=2)
     if is_best:
-        with open(metric_prefix + '.states', 'wb') as f:
-            pickle.dump(kwargs, f, protocol=2)
         logger.info('save best model is to {}'.format(model_prefix))
     else:
         logger.info("save model in {}".format(model_prefix))
diff --git a/ppstructure/docs/kie.md b/ppstructure/docs/kie.md
index 35498b33478d1010fd2548dfcb8586b4710723a1..8fd5a7921e67922b69c9da1f72f7bb514c95323a 100644
--- a/ppstructure/docs/kie.md
+++ b/ppstructure/docs/kie.md
@@ -19,6 +19,24 @@ SDMGR是一个关键信息提取算法,将每个检测到的文本区域分类
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
 ```
 
+数据集格式:
+```
+./wildreceipt
+├── class_list.txt         # box内的文本类别,比如金额、时间、日期等。
+├── dict.txt               # 识别的字典文件,数据集中包含的字符列表
+├── wildreceipt_train.txt  # 训练数据标签文件
+├── wildreceipt_test.txt   # 评估数据标签文件
+└── image_files/           # 图像数据文件夹
+```
+
+其中标签文件里的格式为:
+```
+" 图像文件名 json.dumps编码的图像标注信息"
+image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg [{"label": 1, "transcription": "SAFEWAY", "points": [[550.0, 190.0], [937.0, 190.0], [937.0, 104.0], [550.0, 104.0]]}, {"label": 25, "transcription": "TM", "points": [[1048.0, 211.0], [1074.0, 211.0], [1074.0, 196.0], [1048.0, 196.0]]}, {"label": 25, "transcription": "ATOREMGRTOMMILAZZO", "points": [[535.0, 239.0], [833.0, 239.0], [833.0, 200.0], [535.0, 200.0]]}, {"label": 5, "transcription": "703-777-5833", "points": [[907.0, 256.0], [1081.0, 256.0], [1081.0, 223.0], [907.0, 223.0]]}......
+```
+
+**注:如果您希望在自己的数据集上训练,建议按照上述数据格式准备数据集。**
+
 执行预测:
 
 ```
diff --git a/ppstructure/docs/kie_en.md b/ppstructure/docs/kie_en.md
index 1fe38b0b399e9290526dafa5409673dc87026db7..e895ee88d65911f4151096f56c17c9c13af3277c 100644
--- a/ppstructure/docs/kie_en.md
+++ b/ppstructure/docs/kie_en.md
@@ -18,6 +18,22 @@ This section provides a tutorial example on how to quickly use, train, and evalu
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
 ```
 
+The dataset format is as follows:
+```
+./wildreceipt
+├── class_list.txt         # The text category inside the box, such as amount, time, date, etc.
+├── dict.txt               # The recognition dictionary file, a list of characters contained in the dataset
+├── wildreceipt_train.txt  # training data label file
+├── wildreceipt_test.txt   # testing data label file
+└── image_files/           # image data folder
+```
+
+The format of the label file is:
+```
+" The image file path Image annotation information encoded by json.dumps"
+image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg [{"label": 1, "transcription": "SAFEWAY", "points": [[550.0, 190.0], [937.0, 190.0], [937.0, 104.0], [550.0, 104.0]]}, {"label": 25, "transcription": "TM", "points": [[1048.0, 211.0], [1074.0, 211.0], [1074.0, 196.0], [1048.0, 196.0]]}, {"label": 25, "transcription": "ATOREMGRTOMMILAZZO", "points": [[535.0, 239.0], [833.0, 239.0], [833.0, 200.0], [535.0, 200.0]]}, {"label": 5, "transcription": "703-777-5833", "points": [[907.0, 256.0], [1081.0, 256.0], [1081.0, 223.0], [907.0, 223.0]]}......
+```
+
 Download the pretrained model and predict the result:
 
 ```shell
diff --git a/ppstructure/vqa/README.md b/ppstructure/vqa/README.md
index e3a10671ddb6494eb15073e7ac007aa1e8e6a32a..3bfca3049731534aaa6799d79ec29af7f4219078 100644
--- a/ppstructure/vqa/README.md
+++ b/ppstructure/vqa/README.md
@@ -192,7 +192,7 @@ Finally, `precision`, `recall`, `hmean` and other indicators will be printed
 Use the following command to complete the series prediction of `OCR engine + SER`, taking the pretrained SER model as an example:
 
 ```shell
-CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/Global.infer_img=doc/vqa/input/zh_val_42.jpg
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
 ````
 
 Finally, the prediction result visualization image and the prediction result text file will be saved in the directory configured by the `config.Global.save_res_path` field. The prediction result text file is named `infer_results.txt`.
 
@@ -203,7 +203,7 @@ First use the `tools/infer_vqa_token_ser.py` script to complete the prediction o
 
 ```shell
 export CUDA_VISIBLE_DEVICES=0
-python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
+python3 ppstructure/vqa/tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
 ````
 
 
@@ -247,7 +247,7 @@ Finally, `precision`, `recall`, `hmean` and other indicators will be printed
 Use the following command to complete the series prediction of `OCR engine + SER + RE`, taking the pretrained SER and RE models as an example:
 ```shell
 export CUDA_VISIBLE_DEVICES=0
-python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm. yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
+python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
 ````
 
 Finally, the prediction result visualization image and the prediction result text file will be saved in the directory configured by the `config.Global.save_res_path` field. The prediction result text file is named `infer_results.txt`.
diff --git a/ppstructure/vqa/README_ch.md b/ppstructure/vqa/README_ch.md
index b677dc07bce6c1a752d753b6a1c538b4d3f99271..abf8e4883c25d3092baa5d1fcc86d1571d04ac93 100644
--- a/ppstructure/vqa/README_ch.md
+++ b/ppstructure/vqa/README_ch.md
@@ -198,7 +198,7 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/l
 
 ```shell
 export CUDA_VISIBLE_DEVICES=0
-python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
+python3 ppstructure/vqa/tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
 ```
 
 ### 5.3 RE
 
diff --git a/test_tipc/test_train_inference_python.sh b/test_tipc/test_train_inference_python.sh
index fe98cb00f6cc428995d7f91db55895e0f1cd9bfd..2c9a7e73e6843921b0aba176a725aed4629c5476 100644
--- a/test_tipc/test_train_inference_python.sh
+++ b/test_tipc/test_train_inference_python.sh
@@ -329,6 +329,7 @@ else
             set_save_model=$(func_set_params "${save_model_key}" "${save_log}")
             if [ ${#gpu} -le 2 ];then  # train with cpu or single gpu
+                eval ${env}
                 cmd="${python} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} ${set_amp_config} "
             elif [ ${#ips} -le 26 ];then  # train with multi-gpu
                 cmd="${python} -m paddle.distributed.launch --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} ${set_amp_config}"
diff --git a/tools/export_model.py b/tools/export_model.py
index c0cbcd361cec31c51616a7154836c234f076a86e..3ea0228f857a2fadb36678ecd3b91bc865e56e46 100755
--- a/tools/export_model.py
+++ b/tools/export_model.py
@@ -76,7 +76,7 @@ def export_single_model(model, arch_config, save_path, logger, quanter=None):
     else:
         infer_shape = [3, -1, -1]
         if arch_config["model_type"] == "rec":
-            infer_shape = [3, 32, -1]  # for rec model, H must be 32
+            infer_shape = [3, 48, -1]  # for rec model, H must be 48
         if "Transform" in arch_config and arch_config[
                 "Transform"] is not None and arch_config["Transform"][
                     "name"] == "TPS":
diff --git a/tools/infer/predict_det.py b/tools/infer/predict_det.py
index 5f2675d667c2aab8186886a60d8d447f43419954..cf495c59c25cfe24ed0987b56cbe810579f1d542 100755
--- a/tools/infer/predict_det.py
+++ b/tools/infer/predict_det.py
@@ -24,6 +24,7 @@ import cv2
 import numpy as np
 import time
 import sys
+from scipy.spatial import distance as dist
 
 import tools.infer.utility as utility
 from ppocr.utils.logging import get_logger
@@ -154,9 +155,10 @@ class TextDetector(object):
         s = pts.sum(axis=1)
         rect[0] = pts[np.argmin(s)]
         rect[2] = pts[np.argmax(s)]
-        diff = np.diff(pts, axis=1)
-        rect[1] = pts[np.argmin(diff)]
-        rect[3] = pts[np.argmax(diff)]
+        tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0)
+        diff = np.diff(np.array(tmp), axis=1)
+        rect[1] = tmp[np.argmin(diff)]
+        rect[3] = tmp[np.argmax(diff)]
         return rect
 
     def clip_det_res(self, points, img_height, img_width):
diff --git a/tools/infer/predict_system.py b/tools/infer/predict_system.py
index 625d365f45c578d051974d7174e26246e9bc2442..1fac2918a15c8b1b858b83d032ecfd889679e6f9 100755
--- a/tools/infer/predict_system.py
+++ b/tools/infer/predict_system.py
@@ -114,11 +114,14 @@ def sorted_boxes(dt_boxes):
     _boxes = list(sorted_boxes)
 
     for i in range(num_boxes - 1):
-        if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
-                (_boxes[i + 1][0][0] < _boxes[i][0][0]):
-            tmp = _boxes[i]
-            _boxes[i] = _boxes[i + 1]
-            _boxes[i + 1] = tmp
+        for j in range(i, -1, -1):
+            if abs(_boxes[j + 1][0][1] - _boxes[j][0][1]) < 10 and \
+                    (_boxes[j + 1][0][0] < _boxes[j][0][0]):
+                tmp = _boxes[j]
+                _boxes[j] = _boxes[j + 1]
+                _boxes[j + 1] = tmp
+            else:
+                break
     return _boxes
 
 
@@ -135,7 +138,7 @@ def main(args):
         logger.info("In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', "
                     "if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320")
-
+    # warm up 10 times
     if args.warmup:
         img = np.random.uniform(0, 255, [640, 640, 3]).astype(np.uint8)
@@ -198,7 +201,12 @@ def main(args):
             text_sys.text_detector.autolog.report()
             text_sys.text_recognizer.autolog.report()
 
-    with open(os.path.join(draw_img_save_dir, "system_results.txt"), 'w', encoding='utf-8') as f:
+    if args.total_process_num > 1:
+        save_results_path = os.path.join(draw_img_save_dir, f"system_results_{args.process_id}.txt")
+    else:
+        save_results_path = os.path.join(draw_img_save_dir, "system_results.txt")
+
+    with open(save_results_path, 'w', encoding='utf-8') as f:
         f.writelines(save_results)
diff --git a/tools/infer/utility.py b/tools/infer/utility.py
index d27aec63edd2fb5c0240ff0254ce1057b62162b0..6d9935a70e79bb20c5f6380783911ef141b0be17 100644
--- a/tools/infer/utility.py
+++ b/tools/infer/utility.py
@@ -55,6 +55,7 @@ def init_args():
     parser.add_argument("--max_batch_size", type=int, default=10)
     parser.add_argument("--use_dilation", type=str2bool, default=False)
     parser.add_argument("--det_db_score_mode", type=str, default="fast")
+    parser.add_argument("--vis_seg_map", type=str2bool, default=False)
     # EAST parmas
     parser.add_argument("--det_east_score_thresh", type=float, default=0.8)
    parser.add_argument("--det_east_cover_thresh", type=float, default=0.1)
@@ -276,6 +277,7 @@ def create_predictor(args, mode, logger):
                 min_input_shape = {"x": [1, 3, imgH, 10]}
                 max_input_shape = {"x": [args.rec_batch_num, 3, imgH, 2304]}
                 opt_input_shape = {"x": [args.rec_batch_num, 3, imgH, 320]}
+                config.exp_disable_tensorrt_ops(["transpose2"])
             elif mode == "cls":
                 min_input_shape = {"x": [1, 3, 48, 10]}
                 max_input_shape = {"x": [args.rec_batch_num, 3, 48, 1024]}
@@ -587,7 +589,7 @@ def text_visual(texts,
 def base64_to_cv2(b64str):
     import base64
     data = base64.b64decode(b64str.encode('utf8'))
-    data = np.fromstring(data, np.uint8)
+    data = np.frombuffer(data, np.uint8)
     data = cv2.imdecode(data, cv2.IMREAD_COLOR)
     return data
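Several hunks above change numerical behavior rather than just APIs; the sketches below restate them in isolation so they can be sanity-checked. All are illustrative reconstructions, not part of the patch. First, the `order_points_clockwise` rewrite in `ppocr/data/imaug/label_ops.py` and `tools/infer/predict_det.py`: for some rotated quadrilaterals, taking `argmin`/`argmax` of `y - x` over all four points could pick a corner already assigned as top-left or bottom-right, producing a degenerate box. Removing those two corners first leaves exactly two candidates. A standalone sketch, runnable without PaddleOCR:

```python
import numpy as np

def order_points_clockwise(pts):
    """Order 4 corners as [top-left, top-right, bottom-right, bottom-left],
    mirroring the patched DetLabelEncode.order_points_clockwise."""
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]  # bottom-right: largest x + y
    # drop TL/BR so neither can be selected a second time
    tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0)
    diff = np.diff(np.array(tmp), axis=1)  # y - x for the two leftover corners
    rect[1] = tmp[np.argmin(diff)]  # top-right: smallest y - x
    rect[3] = tmp[np.argmax(diff)]  # bottom-left: largest y - x
    return rect

pts = np.array([[10., 0.], [0., 0.], [0., 10.], [10., 10.]])
print(order_points_clockwise(pts))  # TL (0,0), TR (10,0), BR (10,10), BL (0,10)
```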
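The `KLJSLoss` hunk in `ppocr/losses/basic_loss.py` separates two divergences the old code conflated: previously `'kl'` computed a one-sided KL term and `'js'` a symmetric KL, while the patch makes `'kl'` the symmetric KL and gives `'js'` a true Jensen-Shannon form against the mixture `(p1 + p2) / 2`. That is also why `DMLLoss` now constructs `KLJSLoss(mode="kl")`: it preserves the numbers the old `'js'` branch produced. A minimal NumPy sketch of the two modes, with the reduction simplified to a plain mean:

```python
import numpy as np

EPS = 1e-5  # same smoothing constant the patch uses

def kljs_loss(p1, p2, mode="kl"):
    if mode.lower() == "kl":
        # symmetric KL: 0.5 * (KL(p2 || p1) + KL(p1 || p2))
        loss = p2 * np.log((p2 + EPS) / (p1 + EPS) + EPS)
        loss += p1 * np.log((p1 + EPS) / (p2 + EPS) + EPS)
        loss *= 0.5
    elif mode.lower() == "js":
        # Jensen-Shannon: each distribution against the mixture (p1 + p2) / 2
        loss = p2 * np.log((2 * p2 + EPS) / (p1 + p2 + EPS) + EPS)
        loss += p1 * np.log((2 * p1 + EPS) / (p1 + p2 + EPS) + EPS)
        loss *= 0.5
    else:
        raise ValueError("mode should be one of ['kl', 'js']")
    return loss.mean()

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.6, 0.3, 0.1])
print(kljs_loss(p1, p2, "kl"), kljs_loss(p1, p2, "js"))  # JS is the smaller value
```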
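The `kie_unet_sdmgr.py` hunk is a mechanical migration from the removed `paddle.fluid.layers.roi_align` to `paddle.vision.ops.roi_align`: `pooled_height`/`pooled_width` collapse into a single `output_size`, and `rois_num` becomes `boxes_num`. A sketch of the new call with made-up shapes, assuming Paddle 2.x is installed:

```python
import paddle

x = paddle.rand([2, 16, 32, 32])  # dummy feature map: batch 2, 16 channels
# two RoIs per image, each [x1, y1, x2, y2] in feature-map coordinates
boxes = paddle.to_tensor([[0., 0., 16., 16.],
                          [8., 8., 24., 24.],
                          [0., 0., 32., 32.],
                          [4., 4., 12., 20.]], dtype='float32')
boxes_num = paddle.to_tensor([2, 2], dtype='int32')  # replaces rois_num

feats = paddle.vision.ops.roi_align(
    x, boxes, boxes_num=boxes_num, output_size=7, spatial_scale=1.0)
print(feats.shape)  # [4, 16, 7, 7], i.e. pooled_height = pooled_width = 7
```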
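The wildreceipt label format documented in `ppstructure/docs/kie.md` and `kie_en.md` above stores one sample per line: an image path, a separator, and a `json.dumps`-encoded list of annotated boxes. A hypothetical loader sketch; the tab separator and the helper name `load_wildreceipt_labels` are assumptions for illustration, not part of the dataset tooling:

```python
import json

def load_wildreceipt_labels(label_path):
    """Parse lines of the form '<image path>\t<json.dumps list of boxes>'
    (separator assumed to be a tab, as in PaddleOCR's usual label files)."""
    samples = []
    with open(label_path, "r", encoding="utf-8") as f:
        for line in f:
            img_path, anno = line.strip().split("\t", 1)
            boxes = json.loads(anno)  # each box: {"label", "transcription", "points"}
            samples.append((img_path, boxes))
    return samples

samples = load_wildreceipt_labels("wildreceipt/wildreceipt_train.txt")
img_path, boxes = samples[0]
print(img_path, boxes[0]["transcription"], boxes[0]["points"])
```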
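The `sorted_boxes` rewrite in `tools/infer/predict_system.py` replaces a single adjacent-swap pass, which could only move a box one position, with a bubble-back pass so a box can travel left across several neighbors on the same text line (within 10 px vertically). A self-contained sketch; the initial y-then-x sort is reconstructed from the surrounding function, and the inner bound is written `range(i, -1, -1)` so that index 0 also participates in swaps:

```python
import numpy as np

def sorted_boxes(dt_boxes):
    """Sort quads top-to-bottom, then left-to-right within a text line."""
    num_boxes = dt_boxes.shape[0]
    _boxes = list(sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0])))
    for i in range(num_boxes - 1):
        # bubble a box leftwards while it is on the same line as its
        # predecessor (|dy| < 10) but starts further left
        for j in range(i, -1, -1):
            if abs(_boxes[j + 1][0][1] - _boxes[j][0][1]) < 10 and \
                    _boxes[j + 1][0][0] < _boxes[j][0][0]:
                _boxes[j], _boxes[j + 1] = _boxes[j + 1], _boxes[j]
            else:
                break
    return _boxes

# two boxes on one line that the y-then-x sort puts in the wrong reading order
boxes = np.array([
    [[100, 10], [160, 10], [160, 30], [100, 30]],
    [[20, 12], [80, 12], [80, 32], [20, 32]],
], dtype=np.float32)
print([b[0].tolist() for b in sorted_boxes(boxes)])  # left box comes first
```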
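Finally, `np.fromstring` has emitted a `DeprecationWarning` for binary input since NumPy 1.14, which is why the `tools/infer/utility.py` hunk switches `base64_to_cv2` to `np.frombuffer`. A round-trip check of the patched helper, assuming NumPy and `opencv-python` are available:

```python
import base64

import cv2
import numpy as np

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)  # zero-copy view, unlike fromstring
    return cv2.imdecode(data, cv2.IMREAD_COLOR)

# encode a dummy image to PNG, then feed it back through base64
img = np.zeros((8, 8, 3), np.uint8)
ok, png = cv2.imencode(".png", img)
restored = base64_to_cv2(base64.b64encode(png.tobytes()).decode('utf8'))
print(restored.shape)  # (8, 8, 3)
```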