提交 4c6b03ad 编写于 作者: qq_25193841's avatar qq_25193841

Merge remote-tracking branch 'origin/release/2.5' into release2.5

......@@ -8,7 +8,7 @@ PPOCRLabelv2 is a semi-automatic graphic annotation tool suitable for OCR field,
| :-------------------------------------------------: | :--------------------------------------------: |
| <img src="./data/gif/steps_en.gif" width="80%"/> | <img src="./data/gif/table.gif" width="100%"/> |
| **irregular text annotation** | **key information annotation** |
| <img src="./data/gif/multi-point.gif" width="80%"/> | <img src="./data/gif/kie.gif" width="300%"/> |
| <img src="./data/gif/multi-point.gif" width="80%"/> | <img src="./data/gif/kie.gif" width="100%"/> |
### Recent Update
......
......@@ -8,7 +8,7 @@ PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,内置P
| :---------------------------------------------------: | :----------------------------------------------: |
| <img src="./data/gif/steps_en.gif" width="80%"/> | <img src="./data/gif/table.gif" width="100%"/> |
| **不规则文本标注** | **关键信息标注** |
| <img src="./data/gif/multi-point.gif" width="80%"/> | <img src="./data/gif/kie.gif" width="300%"/> |
| <img src="./data/gif/multi-point.gif" width="80%"/> | <img src="./data/gif/kie.gif" width="100%"/> |
#### 近期更新
......
......@@ -47,7 +47,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
![](./doc/features_en.png)
> It is recommended to start with the “quick experience” in the document tutorial
> It is recommended to start with the “quick start” in the document tutorial
## Quick Experience
......@@ -63,10 +63,11 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
<a name="Community"></a>
## Community
## Community👬
- **Join us**👬: Scan the QR code below with your Wechat, you can join the official technical discussion group. Looking forward to your participation.
- For international developers, we regard [PaddleOCR Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions) as our international community platform. All ideas and questions can be discussed here in English.
- For Chinese develops, Scan the QR code below with your Wechat, you can join the official technical discussion group. For richer community content, please refer to [中文README](README_ch.md), looking forward to your participation.
<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
......
......@@ -29,15 +29,9 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- **🔥2022.5.25~26 OCR产业应用两日直播课**
- 25日:车牌识别产业应用实战
- 26日:一招搞定工业常见数码管、PCB字符识别
- 25日:车牌识别产业应用实战[AI Studio项目链接](https://aistudio.baidu.com/aistudio/projectdetail/3919091?contributionType=1)
- 26日:一招搞定工业常见数码管、PCB字符识别(AI Studio项目链接:[数码管识别](https://aistudio.baidu.com/aistudio/projectdetail/4049044?contributionType=1)[PCB字符识别](https://aistudio.baidu.com/aistudio/projectdetail/4008973)
扫描下方二维码填写问卷后进入群聊,获取直播链接!
<div align="center">
<img src="https://user-images.githubusercontent.com/50011306/170023861-38814d84-b35a-4102-94d9-28482f9a39f8.png" width = "150" height = "150" />
</div>
- **🔥2022.5.9 发布PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
- 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
......@@ -75,11 +69,10 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<a name="开源社区"></a>
## 开源社区
- **项目合作📑:** 如果您是企业开发者且有明确的OCR垂类应用需求,填写[问卷](https://paddle.wjx.cn/vj/QwF7GKw.aspx)后可免费与官方团队展开不同层次的合作。
- **加入社区👬:** 微信扫描二维码并填写问卷之后,加入交流群领取福利
- **获取PaddleOCR最新发版解说《OCR超强技术详解与产业应用实战》系列直播课回放链接**
- **10G重磅OCR学习大礼包:**《动手学OCR》电子书,配套讲解视频和notebook项目;66篇OCR相关顶会前沿论文打包放送,包括CVPR、AAAI、IJCAI、ICCV等;PaddleOCR历次发版直播课视频;OCR社区优秀开发者项目分享视频。
- **社区贡献**🏅️:[社区贡献](./doc/doc_ch/thirdparty.md)文档中包含了社区用户**使用PaddleOCR开发的各种工具、应用**以及**为PaddleOCR贡献的功能、优化的文档与代码**等,是官方为社区开发者打造的荣誉墙,也是帮助优质项目宣传的广播站。
- **社区常规赛**🎁:社区常规赛是面向OCR开发者的积分赛事,覆盖文档、代码、模型和应用四大类型,以季度为单位评选并发放奖励,赛题详情与报名方法可参考[链接](https://github.com/PaddlePaddle/PaddleOCR/issues/4982)
......
......@@ -65,7 +65,7 @@ Loss:
- ["Student", "Teacher"]
maps_name: "thrink_maps"
weight: 1.0
act: "softmax"
# act: None
model_name_pairs: ["Student", "Teacher"]
key: maps
- DistillationDBLoss:
......
......@@ -60,7 +60,7 @@ Loss:
- ["Student", "Student2"]
maps_name: "thrink_maps"
weight: 1.0
act: "softmax"
# act: None
model_name_pairs: ["Student", "Student2"]
key: maps
- DistillationDBLoss:
......
......@@ -47,7 +47,7 @@ str_to_cpu_mode(const std::string &cpu_mode) {
std::string upper_key;
std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(),
::toupper);
auto index = cpu_mode_map.find(upper_key);
auto index = cpu_mode_map.find(upper_key.c_str());
if (index == cpu_mode_map.end()) {
LOGE("cpu_mode not found %s", upper_key.c_str());
return paddle::lite_api::LITE_POWER_HIGH;
......@@ -116,4 +116,4 @@ Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(
ppredictor::OCR_PPredictor *ppredictor =
(ppredictor::OCR_PPredictor *)java_pointer;
delete ppredictor;
}
\ No newline at end of file
}
......@@ -92,6 +92,8 @@ include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
include_directories("${PADDLE_LIB}/third_party/install/xxhash/include")
include_directories("${PADDLE_LIB}/third_party/install/zlib/include")
include_directories("${PADDLE_LIB}/third_party/install/onnxruntime/include")
include_directories("${PADDLE_LIB}/third_party/install/paddle2onnx/include")
include_directories("${PADDLE_LIB}/third_party/boost")
include_directories("${PADDLE_LIB}/third_party/eigen3")
......@@ -110,6 +112,8 @@ link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib")
link_directories("${PADDLE_LIB}/third_party/install/glog/lib")
link_directories("${PADDLE_LIB}/third_party/install/gflags/lib")
link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/third_party/install/onnxruntime/lib")
link_directories("${PADDLE_LIB}/third_party/install/paddle2onnx/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
......
......@@ -109,8 +109,10 @@ CUDA_LIB、CUDNN_LIB、TENSORRT_DIR、WITH_GPU、WITH_TENSORRT
运行之前,将下面文件拷贝到`build/Release/`文件夹下
1. `paddle_inference/paddle/lib/paddle_inference.dll`
2. `opencv/build/x64/vc15/bin/opencv_world455.dll`
3. 如果使用openblas版本的预测库还需要拷贝 `paddle_inference/third_party/install/openblas/lib/openblas.dll`
2. `paddle_inference/third_party/install/onnxruntime/lib/onnxruntime.dll`
3. `paddle_inference/third_party/install/paddle2onnx/lib/paddle2onnx.dll`
4. `opencv/build/x64/vc15/bin/opencv_world455.dll`
5. 如果使用openblas版本的预测库还需要拷贝 `paddle_inference/third_party/install/openblas/lib/openblas.dll`
### Step4: 预测
......
......@@ -208,7 +208,7 @@ Execute the built executable file:
./build/ppocr [--param1] [--param2] [...]
```
**Note**:ppocr uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3, 48, 320`, so if you use the recognition function, you need to add the parameter `--rec_img_h=48`, if you do not use the default `PP-OCRv3` model, you do not need to set this parameter.
**Note**:ppocr uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3, 48, 320`, if you want to use the old version model, you should add the parameter `--rec_img_h=32`.
Specifically,
......@@ -222,7 +222,6 @@ Specifically,
--det=true \
--rec=true \
--cls=true \
--rec_img_h=48\
```
##### 2. det+rec:
......@@ -234,7 +233,6 @@ Specifically,
--det=true \
--rec=true \
--cls=false \
--rec_img_h=48\
```
##### 3. det
......@@ -254,7 +252,6 @@ Specifically,
--det=false \
--rec=true \
--cls=true \
--rec_img_h=48\
```
##### 5. rec
......@@ -265,7 +262,6 @@ Specifically,
--det=false \
--rec=true \
--cls=false \
--rec_img_h=48\
```
##### 6. cls
......@@ -330,7 +326,7 @@ More parameters are as follows,
|rec_model_dir|string|-|Address of recognition inference model|
|rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file|
|rec_batch_num|int|6|batch size of recognition|
|rec_img_h|int|32|image height of recognition|
|rec_img_h|int|48|image height of recognition|
|rec_img_w|int|320|image width of recognition|
* Multi-language inference is also supported in PaddleOCR, you can refer to [recognition tutorial](../../doc/doc_en/recognition_en.md) for more supported languages and models in PaddleOCR. Specifically, if you want to infer using multi-language models, you just need to modify values of `rec_char_dict_path` and `rec_model_dir`.
......
......@@ -213,7 +213,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
本demo支持系统串联调用,也支持单个功能的调用,如,只使用检测或识别功能。
**注意** ppocr默认使用`PP-OCRv3`模型,识别模型使用的输入shape为`3,48,320`, 因此如果使用识别功能,需要添加参数`--rec_img_h=48`,如果不使用默认的`PP-OCRv3`模型,则无需设置该参数
**注意** ppocr默认使用`PP-OCRv3`模型,识别模型使用的输入shape为`3,48,320`, 如需使用旧版本的PP-OCR模型,则需要设置参数`--rec_img_h=32`
运行方式:
......@@ -232,7 +232,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--det=true \
--rec=true \
--cls=true \
--rec_img_h=48\
```
##### 2. 检测+识别:
......@@ -244,7 +243,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--det=true \
--rec=true \
--cls=false \
--rec_img_h=48\
```
##### 3. 检测:
......@@ -264,7 +262,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--det=false \
--rec=true \
--cls=true \
--rec_img_h=48\
```
##### 5. 识别:
......@@ -275,7 +272,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
--det=false \
--rec=true \
--cls=false \
--rec_img_h=48\
```
##### 6. 分类:
......@@ -339,7 +335,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
|rec_model_dir|string|-|识别模型inference model地址|
|rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|字典文件|
|rec_batch_num|int|6|识别模型batchsize|
|rec_img_h|int|32|识别模型输入图像高度|
|rec_img_h|int|48|识别模型输入图像高度|
|rec_img_w|int|320|识别模型输入图像宽度|
......
......@@ -47,7 +47,7 @@ DEFINE_string(rec_model_dir, "", "Path of rec inference model.");
DEFINE_int32(rec_batch_num, 6, "rec_batch_num.");
DEFINE_string(rec_char_dict_path, "../../ppocr/utils/ppocr_keys_v1.txt",
"Path of dictionary.");
DEFINE_int32(rec_img_h, 32, "rec image height");
DEFINE_int32(rec_img_h, 48, "rec image height");
DEFINE_int32(rec_img_w, 320, "rec image width");
// ocr forward related
......
......@@ -132,7 +132,9 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) {
paddle_infer::Config config;
config.SetModel(model_dir + "/inference.pdmodel",
model_dir + "/inference.pdiparams");
std::cout << "In PP-OCRv3, default rec_img_h is 48,"
<< "if you use other model, you should set the param rec_img_h=32"
<< std::endl;
if (this->use_gpu_) {
config.EnableUseGpu(this->gpu_mem_, this->gpu_id_);
if (this->use_tensorrt_) {
......
......@@ -4,4 +4,5 @@ det_db_box_thresh 0.5
det_db_unclip_ratio 1.6
det_db_use_dilate 0
det_use_polygon_score 1
use_direction_classify 1
\ No newline at end of file
use_direction_classify 1
rec_image_height 32
\ No newline at end of file
......@@ -19,25 +19,27 @@
const std::vector<int> rec_image_shape{3, 32, 320};
cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio) {
cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio, int rec_image_height) {
int imgC, imgH, imgW;
imgC = rec_image_shape[0];
imgH = rec_image_height;
imgW = rec_image_shape[2];
imgH = rec_image_shape[1];
imgW = int(32 * wh_ratio);
imgW = int(imgH * wh_ratio);
float ratio = static_cast<float>(img.cols) / static_cast<float>(img.rows);
float ratio = float(img.cols) / float(img.rows);
int resize_w, resize_h;
if (ceilf(imgH * ratio) > imgW)
resize_w = imgW;
else
resize_w = static_cast<int>(ceilf(imgH * ratio));
cv::Mat resize_img;
resize_w = int(ceilf(imgH * ratio));
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
cv::INTER_LINEAR);
return resize_img;
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0,
int(imgW - resize_img.cols), cv::BORDER_CONSTANT,
{127, 127, 127});
}
std::vector<std::string> ReadDict(std::string path) {
......
......@@ -26,7 +26,7 @@
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio);
cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio, int rec_image_height);
std::vector<std::string> ReadDict(std::string path);
......
......@@ -162,7 +162,8 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img,
std::vector<std::string> charactor_dict,
std::shared_ptr<PaddlePredictor> predictor_cls,
int use_direction_classify,
std::vector<double> *times) {
std::vector<double> *times,
int rec_image_height) {
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
......@@ -183,7 +184,7 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img,
float wh_ratio =
static_cast<float>(crop_img.cols) / static_cast<float>(crop_img.rows);
resize_img = CrnnResizeImg(crop_img, wh_ratio);
resize_img = CrnnResizeImg(crop_img, wh_ratio, rec_image_height);
resize_img.convertTo(resize_img, CV_32FC3, 1 / 255.f);
const float *dimg = reinterpret_cast<const float *>(resize_img.data);
......@@ -444,7 +445,7 @@ void system(char **argv){
//// load config from txt file
auto Config = LoadConfigTxt(det_config_path);
int use_direction_classify = int(Config["use_direction_classify"]);
int rec_image_height = int(Config["rec_image_height"]);
auto charactor_dict = ReadDict(dict_path);
charactor_dict.insert(charactor_dict.begin(), "#"); // blank char for ctc
charactor_dict.push_back(" ");
......@@ -590,12 +591,16 @@ void rec(int argc, char **argv) {
std::string batchsize = argv[6];
std::string img_dir = argv[7];
std::string dict_path = argv[8];
std::string config_path = argv[9];
if (strcmp(argv[4], "FP32") != 0 && strcmp(argv[4], "INT8") != 0) {
std::cerr << "Only support FP32 or INT8." << std::endl;
exit(1);
}
auto Config = LoadConfigTxt(config_path);
int rec_image_height = int(Config["rec_image_height"]);
std::vector<cv::String> cv_all_img_names;
cv::glob(img_dir, cv_all_img_names);
......@@ -630,7 +635,7 @@ void rec(int argc, char **argv) {
std::vector<float> rec_text_score;
std::vector<double> times;
RunRecModel(boxes, srcimg, rec_predictor, rec_text, rec_text_score,
charactor_dict, cls_predictor, 0, &times);
charactor_dict, cls_predictor, 0, &times, rec_image_height);
//// print recognized text
for (int i = 0; i < rec_text.size(); i++) {
......
......@@ -34,7 +34,7 @@ For the compilation process of different development environments, please refer
### 1.2 Prepare Paddle-Lite library
There are two ways to obtain the Paddle-Lite library:
- 1. Download directly, the download link of the Paddle-Lite library is as follows:
- 1. [Recommended] Download directly, the download link of the Paddle-Lite library is as follows:
| Platform | Paddle-Lite library download link |
|---|---|
......@@ -43,7 +43,9 @@ There are two ways to obtain the Paddle-Lite library:
Note: 1. The above Paddle-Lite library is compiled from the Paddle-Lite 2.10 branch. For more information about Paddle-Lite 2.10, please refer to [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10).
- 2. [Recommended] Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows:
**Note: It is recommended to use paddlelite>=2.10 version of the prediction library, other prediction library versions [download link](https://github.com/PaddlePaddle/Paddle-Lite/tags)**
- 2. Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows:
```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
......@@ -104,21 +106,17 @@ If you directly use the model in the above table for deployment, you can skip th
If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model.
The `opt` tool can be obtained by compiling Paddle Lite.
- Step 1: Refer to [document](https://www.paddlepaddle.org.cn/lite/v2.10/user_guides/opt/opt_python.html) to install paddlelite, which is used to convert paddle inference model to paddlelite required for running nb model
```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout release/v2.10
./lite/tools/build.sh build_optimize_tool
pip install paddlelite==2.10 # The paddlelite version should be the same as the prediction library version
```
After the compilation is complete, the opt file is located under build.opt/lite/api/, You can view the operating options and usage of opt in the following ways:
After installation, the following commands can view the help information
```
cd build.opt/lite/api/
./opt
paddle_lite_opt
```
Introduction to paddle_lite_opt parameters:
|Options|Description|
|---|---|
|--model_dir|The path of the PaddlePaddle model to be optimized (non-combined form)|
......@@ -131,6 +129,8 @@ cd build.opt/lite/api/
`--model_dir` is suitable for the non-combined mode of the model to be optimized, and the inference model of PaddleOCR is the combined mode, that is, the model structure and model parameters are stored in a single file.
- Step 2: Use paddle_lite_opt to convert the inference model to the mobile model format.
The following takes the ultra-lightweight Chinese model of PaddleOCR as an example to introduce the use of the compiled opt file to complete the conversion of the inference model to the Paddle-Lite optimized model
```
......@@ -240,6 +240,7 @@ det_db_thresh 0.3 # Used to filter the binarized image of DB prediction,
det_db_box_thresh 0.5 # DDB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate
det_db_unclip_ratio 1.6 # Indicates the compactness of the text box, the smaller the value, the closer the text box to the text
use_direction_classify 0 # Whether to use the direction classifier, 0 means not to use, 1 means to use
rec_image_height 32 # The height of the input image of the recognition model, the PP-OCRv3 model needs to be set to 48, and the PP-OCRv2 model needs to be set to 32
```
5. Run Model on phone
......@@ -258,8 +259,15 @@ After the above steps are completed, you can use adb to push the file to the pho
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
# The use of ocr_db_crnn is:
# ./ocr_db_crnn Detection model file Orientation classifier model file Recognition model file Test image path Dictionary file path
./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_opt.nb ./11.jpg ppocr_keys_v1.txt
# ./ocr_db_crnn Mode Detection model file Orientation classifier model file Recognition model file Hardware Precision Threads Batchsize Test image path Dictionary file path
./ocr_db_crnn system ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt ppocr_keys_v1.txt True
# precision can be INT8 for quantitative model or FP32 for normal model.
# Only using detection model
./ocr_db_crnn det ch_PP-OCRv2_det_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt
# Only using recognition model
./ocr_db_crnn rec ch_PP-OCRv2_rec_slim_opt.nb arm8 INT8 10 1 word_1.jpg ppocr_keys_v1.txt config.txt
```
If you modify the code, you need to recompile and push to the phone.
......@@ -283,3 +291,7 @@ A2: Replace the .jpg test image under ./debug with the image you want to test, a
Q3: How to package it into the mobile APP?
A3: This demo aims to provide the core algorithm part that can run OCR on mobile phones. Further, PaddleOCR/deploy/android_demo is an example of encapsulating this demo into a mobile app for reference.
Q4: When running the demo, an error is reported `Error: This model is not supported, because kernel for 'io_copy' is not supported by Paddle-Lite.`
A4: The problem is that the installed paddlelite version does not match the downloaded prediction library version. Make sure that the paddleliteopt tool matches your prediction library version, and try to switch to the nb model again.
......@@ -8,7 +8,7 @@
- [2.1 模型优化](#21-模型优化)
- [2.2 与手机联调](#22-与手机联调)
- [FAQ](#faq)
本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleOCR超轻量中文检测、识别模型的详细步骤。
......@@ -32,7 +32,7 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理
### 1.2 准备预测库
预测库有两种获取方式:
- 1. 直接下载,预测库下载链接如下:
- 1. [推荐]直接下载,预测库下载链接如下:
| 平台 | 预测库下载链接 |
|---|---|
......@@ -41,7 +41,9 @@ Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理
注:1. 上述预测库为PaddleLite 2.10分支编译得到,有关PaddleLite 2.10 详细信息可参考 [链接](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10) 。
- 2. [推荐]编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下:
**注:建议使用paddlelite>=2.10版本的预测库,其他预测库版本[下载链接](https://github.com/PaddlePaddle/Paddle-Lite/tags)**
- 2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下:
```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
......@@ -102,22 +104,16 @@ Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括
如果要部署的模型不在上述表格中,则需要按照如下步骤获得优化后的模型。
模型优化需要Paddle-Lite的opt可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下:
- 步骤1:参考[文档](https://www.paddlepaddle.org.cn/lite/v2.10/user_guides/opt/opt_python.html)安装paddlelite,用于转换paddle inference model为paddlelite运行所需的nb模型
```
# 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout release/v2.10
# 启动编译
./lite/tools/build.sh build_optimize_tool
pip install paddlelite==2.10 # paddlelite版本要与预测库版本一致
```
编译完成后,opt文件位于`build.opt/lite/api/`下,可通过如下方式查看opt的运行选项和使用方式;
安装完后,如下指令可以查看帮助信息
```
cd build.opt/lite/api/
./opt
paddle_lite_opt
```
paddle_lite_opt 参数介绍:
|选项|说明|
|---|---|
|--model_dir|待优化的PaddlePaddle模型(非combined形式)的路径|
......@@ -130,6 +126,8 @@ cd build.opt/lite/api/
`--model_dir`适用于待优化的模型是非combined方式,PaddleOCR的inference模型是combined方式,即模型结构和模型参数使用单独一个文件存储。
- 步骤2:使用paddle_lite_opt将inference模型转换成移动端模型格式。
下面以PaddleOCR的超轻量中文模型为例,介绍使用编译好的opt文件完成inference模型到Paddle-Lite优化模型的转换。
```
......@@ -148,7 +146,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls
转换成功后,inference模型目录下会多出`.nb`结尾的文件,即是转换成功的模型文件。
注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt 工具的输入模型是paddle保存的inference模型
注意:使用paddle-lite部署时,需要使用opt工具优化后的模型。 opt工具的输入模型是paddle保存的inference模型
<a name="2.2与手机联调"></a>
### 2.2 与手机联调
......@@ -234,13 +232,14 @@ ppocr_keys_v1.txt # 中文字典
...
```
2. `config.txt` 包含了检测器、分类器的超参数,如下:
2. `config.txt` 包含了检测器、分类器、识别器的超参数,如下:
```
max_side_len 960 # 输入图像长宽大于960时,等比例缩放图像,使得图像最长边为960
det_db_thresh 0.3 # 用于过滤DB预测的二值化图像,设置为0.-0.3对结果影响不明显
det_db_box_thresh 0.5 # DB后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小
det_db_box_thresh 0.5 # 检测器后处理过滤box的阈值,如果检测存在漏框情况,可酌情减小
det_db_unclip_ratio 1.6 # 表示文本框的紧致程度,越小则文本框更靠近文本
use_direction_classify 0 # 是否使用方向分类器,0表示不使用,1表示使用
rec_image_height 32 # 识别模型输入图像的高度,PP-OCRv3模型设置为48,PP-OCRv2模型需要设置为32
```
5. 启动调试
......@@ -259,8 +258,14 @@ use_direction_classify 0 # 是否使用方向分类器,0表示不使用,1
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
# 开始使用,ocr_db_crnn可执行文件的使用方式为:
# ./ocr_db_crnn 检测模型文件 方向分类器模型文件 识别模型文件 测试图像路径 字典文件路径
./ocr_db_crnn ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./11.jpg ppocr_keys_v1.txt
# ./ocr_db_crnn 预测模式 检测模型文件 方向分类器模型文件 识别模型文件 运行硬件 运行精度 线程数 batchsize 测试图像路径 参数配置路径 字典文件路径 是否使用benchmark参数
./ocr_db_crnn system ch_PP-OCRv2_det_slim_opt.nb ch_PP-OCRv2_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt ppocr_keys_v1.txt True
# 仅使用文本检测模型,使用方式如下:
./ocr_db_crnn det ch_PP-OCRv2_det_slim_opt.nb arm8 INT8 10 1 ./11.jpg config.txt
# 仅使用文本识别模型,使用方式如下:
./ocr_db_crnn rec ch_PP-OCRv2_rec_slim_opt.nb arm8 INT8 10 1 word_1.jpg ppocr_keys_v1.txt config.txt
```
如果对代码做了修改,则需要重新编译并push到手机上。
......@@ -284,3 +289,7 @@ A2:替换debug下的.jpg测试图像为你想要测试的图像,adb push 到
Q3:如何封装到手机APP中?
A3:此demo旨在提供能在手机上运行OCR的核心算法部分,PaddleOCR/deploy/android_demo是将这个demo封装到手机app的示例,供参考
Q4:运行demo时遇到报错`Error: This model is not supported, because kernel for 'io_copy' is not supported by Paddle-Lite.`
A4:问题是安装的paddlelite版本和下载的预测库版本不匹配,确保paddleliteopt工具和你的预测库版本匹配,重新转nb模型试试。
......@@ -339,7 +339,7 @@ class CharacterOps(object):
class OCRReader(object):
def __init__(self,
algorithm="CRNN",
image_shape=[3, 32, 320],
image_shape=[3, 48, 320],
char_type="ch",
batch_num=1,
char_dict_path="./ppocr_keys_v1.txt"):
......@@ -356,7 +356,7 @@ class OCRReader(object):
def resize_norm_img(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape
if self.character_type == "ch":
imgW = int(32 * max_wh_ratio)
imgW = int(imgH * max_wh_ratio)
h = img.shape[0]
w = img.shape[1]
ratio = w / float(h)
......@@ -377,7 +377,7 @@ class OCRReader(object):
def preprocess(self, img_list):
img_num = len(img_list)
norm_img_batch = []
max_wh_ratio = 0
max_wh_ratio = 320/48.
for ino in range(img_num):
h, w = img_list[ino].shape[0:2]
wh_ratio = w * 1.0 / h
......
......@@ -63,7 +63,6 @@ class DetOp(Op):
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
out_dict = {"dt_boxes": dt_boxes, "image": self.raw_im}
return out_dict, None, ""
......@@ -86,7 +85,7 @@ class RecOp(Op):
dt_boxes = copy.deepcopy(self.dt_list)
feed_list = []
img_list = []
max_wh_ratio = 0
max_wh_ratio = 320/48.
## Many mini-batchs, the type of feed_data is list.
max_batch_size = 6 # len(dt_boxes)
......@@ -150,7 +149,8 @@ class RecOp(Op):
for i in range(dt_num):
text = rec_list[i]
dt_box = self.dt_list[i]
result_list.append([text, dt_box.tolist()])
if text[1] >= 0.5:
result_list.append([text, dt_box.tolist()])
res = {"result": str(result_list)}
return res, None, ""
......
......@@ -73,4 +73,4 @@ python deploy/slim/quantization/export_model.py -c configs/det/ch_ppocr_v2.0/ch_
The numerical range of the quantized model parameters derived from the above steps is still FP32, but the numerical range of the parameters is int8.
The derived model can be converted through the `opt tool` of PaddleLite.
For quantitative model deployment, please refer to [Mobile terminal model deployment](../../lite/readme_en.md)
For quantitative model deployment, please refer to [Mobile terminal model deployment](../../lite/readme.md)
......@@ -455,6 +455,19 @@ A:以检测中的resnet骨干网络为例,图像输入网络之后,需要
A:可以在命令中加入 --det_db_unclip_ratio ,参数定义位置,这个参数是检测后处理时控制文本框大小的,默认1.6,可以尝试改成2.5或者更大,反之,如果觉得文本框不够紧凑,也可以把该参数调小。
#### Q:PP-OCR文本检测出现明显漏检,该如何调整参数或者训练。
A:以[#issue5851](https://github.com/PaddlePaddle/PaddleOCR/issues/5815)为例,文字出现漏检时,先分析问题原因。
首先,在[后处理处](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/ppocr/postprocess/db_postprocess.py#L177)添加如下代码,可视化模型对文字的分割区域:
```
import cv2
im = np.array(segmentation[0]*255).astype(np.uint8)
cv2.imwrite("db_seg_vis.jpg", im)
```
如果模型在漏检文字区域有分割区域,但是没有生成框,此类情况多为文字太小,文字弯曲,或者某一行开头的文字过大等等。可减小DB后处理参数[det_db_thresh](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L52)[det_db_box_thresh](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L53),或者设置[use_dilation](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L56)为True扩大文字分割区域。
如果模型在漏检文字区域没有分割区域,是模型对此类数据没有训练好,建议用PP-OCR的模型在你的数据场景上finetune。
<a name="27"></a>
### 2.7 模型结构
......@@ -839,3 +852,15 @@ nvidia-smi --lock-gpu-clocks=1590 -i 0
#### Q: 预测时显存爆炸、内存泄漏问题?
**A**: 打开显存/内存优化开关`enable_memory_optim`可以解决该问题,相关代码已合入,[查看详情](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/utility.py#L153)
#### Q: TRT预测报错:InvalidArgumentError: some trt inputs dynamic shape info not set, check the INFO log above for more details.
**A**: PP-OCR的模型采用动态shape预测,因为TRT子图划分问题,需要设置额外参数的shape;
首先,屏蔽此行代码[config.disable_glog_info()](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L306)输出日志,重新预测,根据报错信息中的提示设置某些参数的动态shape。
假设报错信息中:
trt input [lstm_1.tmp_0] dynamic shape info not set, please check and retry.
可以参考[检测模型里设置动态shape的方式](https://github.com/PaddlePaddle/PaddleOCR/blob/8de06d2370e81e0ee1473d17925fb2e05295a0fe/tools/infer/utility.py#L268-L270)设置lstm_1.tmp_0的动态shape,识别模型的动态shape在[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/8de06d2370e81e0ee1473d17925fb2e05295a0fe/tools/infer/utility.py#L275-L277)
如果不清楚lstm_1.tmp_0的shape是多少,可以把inference.pdmodel 放在网页 https://netron.app/ 里可视化,搜索lstm_1.tmp_0 查看该参数的shape信息。
\ No newline at end of file
......@@ -15,8 +15,8 @@
- **数据简介**:publaynet数据集的训练集合中包含35万张图像,验证集合中包含1.1万张图像。总共包含5个类别,分别是: `text, title, list, table, figure`。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/publaynet_demo/gt_PMC3724501_00006.jpg" width="500">
<img src="../datasets/publaynet_demo/gt_PMC5086060_00002.jpg" width="500">
<img src="../../datasets/publaynet_demo/gt_PMC3724501_00006.jpg" width="500">
<img src="../../datasets/publaynet_demo/gt_PMC5086060_00002.jpg" width="500">
</div>
- **下载地址**:https://developer.ibm.com/exchanges/data/all/publaynet/
......@@ -30,8 +30,8 @@
- **数据简介**:CDLA据集的训练集合中包含5000张图像,验证集合中包含1000张图像。总共包含10个类别,分别是: `Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation`。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/CDLA_demo/val_0633.jpg" width="500">
<img src="../datasets/CDLA_demo/val_0941.jpg" width="500">
<img src="../../datasets/CDLA_demo/val_0633.jpg" width="500">
<img src="../../datasets/CDLA_demo/val_0941.jpg" width="500">
</div>
- **下载地址**:https://github.com/buptlihang/CDLA
......@@ -45,8 +45,8 @@
- **数据简介**:TableBank数据集包含Latex(训练集187199张,验证集7265张,测试集5719张)与Word(训练集73383张,验证集2735张,测试集2281张)两种类别的文档。仅包含`Table` 1个类别。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/tablebank_demo/004.png" height="700">
<img src="../datasets/tablebank_demo/005.png" height="700">
<img src="../../datasets/tablebank_demo/004.png" height="700">
<img src="../../datasets/tablebank_demo/005.png" height="700">
</div>
- **下载地址**:https://doc-analysis.github.io/tablebank-page/index.html
......
......@@ -591,8 +591,9 @@ Metric:
#### 2.2.5 检测蒸馏模型finetune
PP-OCRv3检测蒸馏有两种方式:
- 采用ch_PP-OCRv3_det_cml.yml,采用cml蒸馏,同样Teacher模型设置为PaddleOCR提供的模型或者您训练好的大模型
- 采用ch_PP-OCRv3_det_cml.yml,采用CML蒸馏,同样Teacher模型设置为PaddleOCR提供的模型或者您训练好的大模型。
- 采用ch_PP-OCRv3_det_dml.yml,采用DML的蒸馏,两个Student模型互蒸馏的方法,在PaddleOCR采用的数据集上相比单独训练Student模型有1%-2%的提升。
> 如果您在自己的场景中没有训练过高精度大模型,或原始PP-OCR模型在您的场景中表现不好,则无法使用CML训练以达到更高精度,更应该采用DML训练
在具体fine-tune时,需要在网络结构的`pretrained`参数中设置要加载的预训练模型。
......
# 《动手学OCR》电子书
《动手学OCR》是PaddleOCR团队携手复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下:
《动手学OCR》是PaddleOCR团队携手华中科技大学博导/教授,IAPR Fellow 白翔、复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉、中国工商银行大数据人工智能实验室研究员等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下:
- 覆盖从文本检测识别到文档分析的OCR全栈技术
- 紧密结合理论实践,跨越代码实现鸿沟,并配套教学视频
......
......@@ -30,11 +30,11 @@ PP-OCR系统pipeline如下:
PP-OCR系统在持续迭代优化,目前已发布PP-OCR和PP-OCRv2两个版本:
PP-OCR从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941
PP-OCR从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考[PP-OCR技术报告](https://arxiv.org/abs/2009.09941)
#### PP-OCRv2
PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2[技术报告](https://arxiv.org/abs/2109.03144)
PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考[PP-OCRv2技术报告](https://arxiv.org/abs/2109.03144)
#### PP-OCRv3
......@@ -48,7 +48,7 @@ PP-OCRv3系统pipeline如下:
<img src="../ppocrv3_framework.png" width="800">
</div>
更多细节请参考PP-OCRv3[技术报告](./PP-OCRv3_introduction.md)
更多细节请参考[PP-OCRv3技术报告](https://arxiv.org/abs/2206.03001v2) 👉[中文简洁版](./PP-OCRv3_introduction.md)
<a name="2"></a>
......
......@@ -101,8 +101,17 @@ cd /path/to/ppocr_img
['韩国小馆', 0.994467]
```
**版本说明**
paddleocr默认使用PP-OCRv3模型(`--ocr_version PP-OCRv3`),如需使用其他版本可通过设置参数`--ocr_version`,具体版本说明如下:
| 版本名称 | 版本说明 |
| --- | --- |
| PP-OCRv3 | 支持中、英文检测和识别,方向分类器,支持多语种识别 |
| PP-OCRv2 | 支持中英文的检测和识别,方向分类器,多语言暂未更新 |
| PP-OCR | 支持中、英文检测和识别,方向分类器,支持多语种识别 |
如需使用2.0模型,请指定参数`--ocr_version PP-OCR`,paddleocr默认使用PP-OCRv3模型(`--ocr_version PP-OCRv3`)。更多whl包使用可参考[whl包文档](./whl.md)
如需新增自己训练的模型,可以在[paddleocr](../../paddleocr.py)中增加模型链接和字段,重新编译即可。
更多whl包使用可参考[whl包文档](./whl.md)
<a name="212"></a>
......
......@@ -550,4 +550,4 @@ inference/en_PP-OCRv3_rec/
Q1: 训练模型转inference 模型之后预测效果不一致?
**A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。可以对比训练使用的配置文件中的预处理、后处理和预测时是否存在差异。
**A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。可以对比训练使用的配置文件中的预处理、后处理和预测时是否存在差异。更多内容请参考[FAQ](./FAQ.md#210-%E6%A8%A1%E5%9E%8B%E6%95%88%E6%9E%9C%E4%B8%8E%E6%95%88%E6%9E%9C%E4%B8%8D%E4%B8%80%E8%87%B4).
# 社区贡献
# 开源社区
感谢大家长久以来对PaddleOCR的支持和关注,与广大开发者共同构建一个专业、和谐、相互帮助的开源社区是PaddleOCR的目标。本文档展示了已有的社区贡献、对于各类贡献说明、新的机会与流程,希望贡献流程更加高效、路径更加清晰。
......@@ -119,25 +119,30 @@ PaddleOCR非常欢迎社区贡献以PaddleOCR为核心的各种服务、部署
## 附录:社区常规赛积分榜
| 开发者 | 总积分 | 开发者 | 总积分 |
| ------------------------------------------------------- | ------ | ----------------------------------------------------- | ------ |
| [RangeKing](https://github.com/RangeKing) | 220 | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 |
| [hao6699](https://github.com/hao6699) | 145 | [v3fc](https://github.com/v3fc) | 35 |
| [mymagicpower](https://github.com/mymagicpower) | 140 | [imiyu](https://github.com/imiyu) | 30 |
| [raoyutian](https://github.com/raoyutian) | 90 | [haigang1975](https://github.com/haigang1975) | 29 |
| [sdcb](https://github.com/sdcb) | 80 | [daassh](https://github.com/daassh) | 23 |
| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 70 | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 20 |
| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 |
| [livingbody](https://github.com/livingbody) | 70 | [nmusik](https://github.com/nmusik) | 20 |
| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 |
| [bupt906](https://github.com/bupt906) | 60 | [chccc1994](https://github.com/chccc1994) | 20 |
| [edencfc](https://github.com/edencfc) | 57 | [BeyondYourself ](https://github.com/BeyondYourself) | 20 |
| [zhangyingying520](https://github.com/zhangyingying520) | 57 | chenguoqi08161 | 18 |
| [ITerydh](https://github.com/ITerydh) | 55 | [weiwenlan](https://github.com/weiwenlan) | 10 |
| [telppa](https://github.com/telppa) | 40 | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10 |
| sosojust1984 | 40 | [jordan2013](https://github.com/jordan2013) | 10 |
| [redearly123](https://github.com/redearly123) | 40 | [JimEverest](https://github.com/JimEverest) | 10 |
| [OneYearIsEnough](https://github.com/OneYearIsEnough) | 40 | [HustBestCat](https://github.com/HustBestCat) | 10 |
| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | |
| [GreatV](https://github.com/GreatV) | 40 | | |
| CLXK294 | 40 | | |
| 开发者 | 总积分(+新增积分) | 开发者 | 总积分(+新增积分) |
| ---------------------------------------------------------- | ---------------- | ---------------------------------------------------- | --------------- |
| [RangeKing](https://github.com/RangeKing) 🥇 | 222(+2) | CLXK294 | 40 |
| [mymagicpower](https://github.com/mymagicpower) 🥈 | 150(+10) | [telppa](https://github.com/telppa) | 40 |
| [hao6699 ](https://github.com/hao6699) 🥉 | 145 | sosojust1984 | 40 |
| [raoyutian](https://github.com/raoyutian) | 135(+45) | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 |
| [OneYearIsEnough](https://github.com/OneYearIsEnough) 🏅 ↑12 | 120(+80) | [v3fc](https://github.com/v3fc) | 35 |
| [edencfc](https://github.com/edencfc) ↑5 | 97(+40) | [imiyu](https://github.com/imiyu) | 30 |
| [zhangyingying520](https://github.com/zhangyingying520) ↑5 | 91(+34) | [haigang1975](https://github.com/haigang1975) | 29 |
| [sdcb](https://github.com/sdcb) | 90(+10) | [daassh](https://github.com/daassh) | 23 |
| [xiaxianlei](https://github.com/xiaxianlei) ✨ 🏆 | +81 | [BeyondYourself](https://github.com/BeyondYourself) | 26(+6) |
| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 80(+10) | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 26(+6) |
| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 |
| [livingbody](https://github.com/livingbody) | 74(+4) | [nmusik](https://github.com/nmusik) | 20 |
| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 |
| [bupt906](https://github.com/bupt906) | 64(+4) | [chccc1994](https://github.com/chccc1994) | 20 |
| [PeterH0323](https://github.com/PeterH0323) ✨ 🎖 | +55 | chenguoqi08161 | 18 |
| [ITerydh](https://github.com/ITerydh) | 55 | [JimEverest](https://github.com/JimEverest) | 14(+4) |
| [d2623587501](https://github.com/d2623587501) | 51(+6) | [weiwenlan](https://github.com/weiwenlan) | 10 |
| [Wei-JL](https://github.com/Wei-JL) | 46(+6) | [HustBestCat](https://github.com/HustBestCat) | 10 |
| [GreatV](https://github.com/GreatV) | 42(+2) | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10 |
| [fuhengwu2021](https://github.com/fuhengwu2021) ✨ | +40 | [jordan2013](https://github.com/jordan2013) | 10 |
| [manangoel99](https://github.com/manangoel99) ✨ | +40 | | |
| [redearly123](https://github.com/redearly123) | 40 | | |
| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | |
......@@ -4,8 +4,7 @@
- 半自动标注工具[PPOCRLabelv2](../../PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
- OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求
- 交互式OCR开源电子书[《动手学OCR》](./ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。
- 2022.5.7 添加对[Weights & Biases](https://docs.wandb.ai/)训练日志记录工具的支持。
- 2021.12.21 《OCR十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】报名地址:https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 《动手学OCR·十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】[报名地址](https://aistudio.baidu.com/aistudio/course/introduce/25207)
- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM)。
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
......@@ -29,7 +28,7 @@
- 2020.7.9 添加支持空格的识别模型,识别效果,预测及训练方式请参考快速开始和文本识别训练相关文档
- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./config.md)
- 2020.6.8 添加[数据集](dataset/datasets.md),并保持持续更新
- 2020.6.5 支持 `attetnion` 模型导出 `inference_model`
- 2020.6.5 支持 `attention` 模型导出 `inference_model`
- 2020.6.5 支持单独预测识别时,输出结果得分
- 2020.5.30 提供超轻量级中文OCR在线体验
- 2020.5.30 模型预测、训练支持Windows系统
......
......@@ -77,7 +77,7 @@ LK-PAN (Large Kernel PAN) is a lightweight [PAN](https://arxiv.org/pdf/1803.0153
**(2) DML: Deep Mutual Learning Strategy for Teacher Model**
[DML](https://arxiv.org/abs/1706.00384)(Collaborative Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The DML strategy is adopted in the teacher model training, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above-mentioned higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%.
[DML](https://arxiv.org/abs/1706.00384)(Deep Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The DML strategy is adopted in the teacher model training, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above-mentioned higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%.
<div align="center">
......@@ -101,7 +101,7 @@ Considering that the features of some channels will be suppressed if the convolu
The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability.
The recognition accuracy of SVTR_inty outperforms PP-OCRv2 recognition model by 5.3%, while the prediction speed nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
The recognition accuracy of SVTR_tiny outperforms PP-OCRv2 recognition model by 5.3%, while the prediction speed nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
<div align="center">
<img src="../ppocr_v3/v3_rec_pipeline.png" width=800>
......
......@@ -29,10 +29,10 @@ PP-OCR pipeline is as follows:
PP-OCR system is in continuous optimization. At present, PP-OCR and PP-OCRv2 have been released:
PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the [PP-OCR technical report](https://arxiv.org/abs/2009.09941).
#### PP-OCRv2
On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144).
On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the [PP-OCRv2 technical report](https://arxiv.org/abs/2109.03144).
#### PP-OCRv3
......@@ -46,7 +46,7 @@ PP-OCRv3 pipeline is as follows:
<img src="../ppocrv3_framework.png" width="800">
</div>
For more details, please refer to [PP-OCRv3 technical report](./PP-OCRv3_introduction_en.md).
For more details, please refer to [PP-OCRv3 technical report](https://arxiv.org/abs/2206.03001v2).
<a name="2"></a>
## 2. Features
......
......@@ -119,7 +119,18 @@ If you do not use the provided test image, you can replace the following `--imag
['PAIN', 0.9934559464454651]
```
If you need to use the 2.0 model, please specify the parameter `--ocr_version PP-OCR`, paddleocr uses the PP-OCRv3 model by default(`--ocr_version PP-OCRv3`). More whl package usage can be found in [whl package](./whl_en.md)
**Version**
paddleocr uses the PP-OCRv3 model by default(`--ocr_version PP-OCRv3`). If you want to use other versions, you can set the parameter `--ocr_version`, the specific version description is as follows:
| version name | description |
| --- | --- |
| PP-OCRv3 | support Chinese and English detection and recognition, direction classifier, support multilingual recognition |
| PP-OCRv2 | only supports Chinese and English detection and recognition, direction classifier, multilingual model is not updated |
| PP-OCR | support Chinese and English detection and recognition, direction classifier, support multilingual recognition |
If you want to add your own trained model, you can add model links and keys in [paddleocr](../../paddleocr.py) and recompile.
More whl package usage can be found in [whl package](./whl_en.md)
<a name="212-multi-language-model"></a>
#### 2.1.2 Multi-language Model
......
# Paddleocr Package
# PaddleOCR Package
## 1 Get started quickly
### 1.1 install package
......
......@@ -154,7 +154,13 @@ MODEL_URLS = {
'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar',
'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
}
}
},
'cls': {
'ch': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar',
}
},
},
'PP-OCR': {
'det': {
......@@ -440,7 +446,7 @@ class PaddleOCR(predict_system.TextSystem):
"""
ocr with paddleocr
args:
img: img for ocr, support ndarray, img_path and list or ndarray
img: img for ocr, support ndarray, img_path and list of ndarray
det: use text detection or not. If false, only rec will be exec. Default is True
rec: use text recognition or not. If false, only det will be exec. Default is True
cls: use angle classifier or not. Default is True. If true, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
......
......@@ -35,10 +35,12 @@ class CopyPaste(object):
point_num = data['polys'].shape[1]
src_img = data['image']
src_polys = data['polys'].tolist()
src_texts = data['texts']
src_ignores = data['ignore_tags'].tolist()
ext_data = data['ext_data'][0]
ext_image = ext_data['image']
ext_polys = ext_data['polys']
ext_texts = ext_data['texts']
ext_ignores = ext_data['ignore_tags']
indexs = [i for i in range(len(ext_ignores)) if not ext_ignores[i]]
......@@ -53,7 +55,7 @@ class CopyPaste(object):
src_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)
ext_image = cv2.cvtColor(ext_image, cv2.COLOR_BGR2RGB)
src_img = Image.fromarray(src_img).convert('RGBA')
for poly, tag in zip(select_polys, select_ignores):
for idx, poly, tag in zip(select_idxs, select_polys, select_ignores):
box_img = get_rotate_crop_image(ext_image, poly)
src_img, box = self.paste_img(src_img, box_img, src_polys)
......@@ -62,6 +64,7 @@ class CopyPaste(object):
for _ in range(len(box), point_num):
box.append(box[-1])
src_polys.append(box)
src_texts.append(ext_texts[idx])
src_ignores.append(tag)
src_img = cv2.cvtColor(np.array(src_img), cv2.COLOR_RGB2BGR)
h, w = src_img.shape[:2]
......@@ -70,6 +73,7 @@ class CopyPaste(object):
src_polys[:, :, 1] = np.clip(src_polys[:, :, 1], 0, h)
data['image'] = src_img
data['polys'] = src_polys
data['texts'] = src_texts
data['ignore_tags'] = np.array(src_ignores)
return data
......
......@@ -23,7 +23,7 @@ import string
from shapely.geometry import LineString, Point, Polygon
import json
import copy
from scipy.spatial import distance as dist
from ppocr.utils.logging import get_logger
......@@ -74,9 +74,10 @@ class DetLabelEncode(object):
s = pts.sum(axis=1)
rect[0] = pts[np.argmin(s)]
rect[2] = pts[np.argmax(s)]
diff = np.diff(pts, axis=1)
rect[1] = pts[np.argmin(diff)]
rect[3] = pts[np.argmax(diff)]
tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0)
diff = np.diff(np.array(tmp), axis=1)
rect[1] = tmp[np.argmin(diff)]
rect[3] = tmp[np.argmax(diff)]
return rect
def expand_points_num(self, boxes):
......@@ -443,7 +444,9 @@ class KieLabelEncode(object):
elif 'key_cls' in ann.keys():
labels.append(ann['key_cls'])
else:
raise ValueError("Cannot found 'key_cls' in ann.keys(), please check your training annotation.")
raise ValueError(
"Cannot found 'key_cls' in ann.keys(), please check your training annotation."
)
edges.append(ann.get('edge', 0))
ann_infos = dict(
image=data['image'],
......
......@@ -33,7 +33,7 @@ class SimpleDataSet(Dataset):
self.delimiter = dataset_config.get('delimiter', '\t')
label_file_list = dataset_config.pop('label_file_list')
data_source_num = len(label_file_list)
ratio_list = dataset_config.get("ratio_list", [1.0])
ratio_list = dataset_config.get("ratio_list", 1.0)
if isinstance(ratio_list, (float, int)):
ratio_list = [float(ratio_list)] * int(data_source_num)
......
......@@ -57,17 +57,27 @@ class CELoss(nn.Layer):
class KLJSLoss(object):
def __init__(self, mode='kl'):
assert mode in ['kl', 'js', 'KL', 'JS'
], "mode can only be one of ['kl', 'js', 'KL', 'JS']"
], "mode can only be one of ['kl', 'KL', 'js', 'JS']"
self.mode = mode
def __call__(self, p1, p2, reduction="mean"):
loss = paddle.multiply(p2, paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5))
if self.mode.lower() == "js":
if self.mode.lower() == 'kl':
loss = paddle.multiply(p2,
paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5))
loss += paddle.multiply(
p1, paddle.log((p1 + 1e-5) / (p2 + 1e-5) + 1e-5))
loss *= 0.5
elif self.mode.lower() == "js":
loss = paddle.multiply(
p2, paddle.log((2 * p2 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5))
loss += paddle.multiply(
p1, paddle.log((2 * p1 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5))
loss *= 0.5
else:
raise ValueError(
"The mode.lower() if KLJSLoss should be one of ['kl', 'js']")
if reduction == "mean":
loss = paddle.mean(loss, axis=[1, 2])
elif reduction == "none" or reduction is None:
......@@ -95,7 +105,7 @@ class DMLLoss(nn.Layer):
self.act = None
self.use_log = use_log
self.jskl_loss = KLJSLoss(mode="js")
self.jskl_loss = KLJSLoss(mode="kl")
def _kldiv(self, x, target):
eps = 1.0e-10
......
......@@ -27,12 +27,12 @@ class CosineEmbeddingLoss(nn.Layer):
self.epsilon = 1e-12
def forward(self, x1, x2, target):
similarity = paddle.fluid.layers.reduce_sum(
similarity = paddle.sum(
x1 * x2, dim=-1) / (paddle.norm(
x1, axis=-1) * paddle.norm(
x2, axis=-1) + self.epsilon)
one_list = paddle.full_like(target, fill_value=1)
out = paddle.fluid.layers.reduce_mean(
out = paddle.mean(
paddle.where(
paddle.equal(target, one_list), 1. - similarity,
paddle.maximum(
......
......@@ -19,7 +19,6 @@ from __future__ import print_function
import paddle
from paddle import nn
from paddle.nn import functional as F
from paddle import fluid
class TableAttentionLoss(nn.Layer):
def __init__(self, structure_weight, loc_weight, use_giou=False, giou_weight=1.0, **kwargs):
......@@ -36,13 +35,13 @@ class TableAttentionLoss(nn.Layer):
:param bbox:[[x1,y1,x2,y2], [x1,y1,x2,y2],,,]
:return: loss
'''
ix1 = fluid.layers.elementwise_max(preds[:, 0], bbox[:, 0])
iy1 = fluid.layers.elementwise_max(preds[:, 1], bbox[:, 1])
ix2 = fluid.layers.elementwise_min(preds[:, 2], bbox[:, 2])
iy2 = fluid.layers.elementwise_min(preds[:, 3], bbox[:, 3])
ix1 = paddle.maximum(preds[:, 0], bbox[:, 0])
iy1 = paddle.maximum(preds[:, 1], bbox[:, 1])
ix2 = paddle.minimum(preds[:, 2], bbox[:, 2])
iy2 = paddle.minimum(preds[:, 3], bbox[:, 3])
iw = fluid.layers.clip(ix2 - ix1 + 1e-3, 0., 1e10)
ih = fluid.layers.clip(iy2 - iy1 + 1e-3, 0., 1e10)
iw = paddle.clip(ix2 - ix1 + 1e-3, 0., 1e10)
ih = paddle.clip(iy2 - iy1 + 1e-3, 0., 1e10)
# overlap
inters = iw * ih
......@@ -55,12 +54,12 @@ class TableAttentionLoss(nn.Layer):
# ious
ious = inters / uni
ex1 = fluid.layers.elementwise_min(preds[:, 0], bbox[:, 0])
ey1 = fluid.layers.elementwise_min(preds[:, 1], bbox[:, 1])
ex2 = fluid.layers.elementwise_max(preds[:, 2], bbox[:, 2])
ey2 = fluid.layers.elementwise_max(preds[:, 3], bbox[:, 3])
ew = fluid.layers.clip(ex2 - ex1 + 1e-3, 0., 1e10)
eh = fluid.layers.clip(ey2 - ey1 + 1e-3, 0., 1e10)
ex1 = paddle.minimum(preds[:, 0], bbox[:, 0])
ey1 = paddle.minimum(preds[:, 1], bbox[:, 1])
ex2 = paddle.maximum(preds[:, 2], bbox[:, 2])
ey2 = paddle.maximum(preds[:, 3], bbox[:, 3])
ew = paddle.clip(ex2 - ex1 + 1e-3, 0., 1e10)
eh = paddle.clip(ey2 - ey1 + 1e-3, 0., 1e10)
# enclose erea
enclose = ew * eh + eps
......
......@@ -175,12 +175,7 @@ class Kie_backbone(nn.Layer):
img, relations, texts, gt_bboxes, tag, img_size)
x = self.img_feat(img)
boxes, rois_num = self.bbox2roi(gt_bboxes)
feats = paddle.fluid.layers.roi_align(
x,
boxes,
spatial_scale=1.0,
pooled_height=7,
pooled_width=7,
rois_num=rois_num)
feats = paddle.vision.ops.roi_align(
x, boxes, spatial_scale=1.0, output_size=7, boxes_num=rois_num)
feats = self.maxpool(feats).squeeze(-1).squeeze(-1)
return [relations, texts, feats]
......@@ -18,7 +18,6 @@ from __future__ import print_function
from paddle import nn, ParamAttr
from paddle.nn import functional as F
import paddle.fluid as fluid
import paddle
import numpy as np
......
......@@ -20,13 +20,11 @@ import math
import paddle
from paddle import nn, ParamAttr
from paddle.nn import functional as F
import paddle.fluid as fluid
import numpy as np
from .self_attention import WrapEncoderForFeature
from .self_attention import WrapEncoder
from paddle.static import Program
from ppocr.modeling.backbones.rec_resnet_fpn import ResNetFPN
import paddle.fluid.framework as framework
from collections import OrderedDict
gradient_clip = 10
......
......@@ -22,7 +22,6 @@ import paddle
from paddle import ParamAttr, nn
from paddle import nn, ParamAttr
from paddle.nn import functional as F
import paddle.fluid as fluid
import numpy as np
gradient_clip = 10
......@@ -288,10 +287,10 @@ class PrePostProcessLayer(nn.Layer):
"layer_norm_%d" % len(self.sublayers()),
paddle.nn.LayerNorm(
normalized_shape=d_model,
weight_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(0.)))))
weight_attr=paddle.ParamAttr(
initializer=paddle.nn.initializer.Constant(1.)),
bias_attr=paddle.ParamAttr(
initializer=paddle.nn.initializer.Constant(0.)))))
elif cmd == "d": # add dropout
self.functors.append(lambda x: F.dropout(
x, p=dropout_rate, mode="downscale_in_infer")
......@@ -324,7 +323,7 @@ class PrepareEncoder(nn.Layer):
def forward(self, src_word, src_pos):
src_word_emb = src_word
src_word_emb = fluid.layers.cast(src_word_emb, 'float32')
src_word_emb = paddle.cast(src_word_emb, 'float32')
src_word_emb = paddle.scale(x=src_word_emb, scale=self.src_emb_dim**0.5)
src_pos = paddle.squeeze(src_pos, axis=-1)
src_pos_enc = self.emb(src_pos)
......@@ -367,7 +366,7 @@ class PrepareDecoder(nn.Layer):
self.dropout_rate = dropout_rate
def forward(self, src_word, src_pos):
src_word = fluid.layers.cast(src_word, 'int64')
src_word = paddle.cast(src_word, 'int64')
src_word = paddle.squeeze(src_word, axis=-1)
src_word_emb = self.emb0(src_word)
src_word_emb = paddle.scale(x=src_word_emb, scale=self.src_emb_dim**0.5)
......
......@@ -38,6 +38,7 @@ class DBPostProcess(object):
unclip_ratio=2.0,
use_dilation=False,
score_mode="fast",
visual_output=False,
**kwargs):
self.thresh = thresh
self.box_thresh = box_thresh
......@@ -51,6 +52,7 @@ class DBPostProcess(object):
self.dilation_kernel = None if not use_dilation else np.array(
[[1, 1], [1, 1]])
self.visual = visual_output
def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
'''
......@@ -169,12 +171,19 @@ class DBPostProcess(object):
cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
def visual_output(self, pred):
im = np.array(pred[0] * 255).astype(np.uint8)
cv2.imwrite("db_probability_map.png", im)
print("The probalibity map is visualized in db_probability_map.png")
def __call__(self, outs_dict, shape_list):
pred = outs_dict['maps']
if isinstance(pred, paddle.Tensor):
pred = pred.numpy()
pred = pred[:, 0, :, :]
segmentation = pred > self.thresh
if self.visual:
self.visual_output(pred)
boxes_batch = []
for batch_index in range(pred.shape[0]):
......
......@@ -177,9 +177,9 @@ def save_model(model,
model.backbone.model.save_pretrained(model_prefix)
metric_prefix = os.path.join(model_prefix, 'metric')
# save metric and config
with open(metric_prefix + '.states', 'wb') as f:
pickle.dump(kwargs, f, protocol=2)
if is_best:
with open(metric_prefix + '.states', 'wb') as f:
pickle.dump(kwargs, f, protocol=2)
logger.info('save best model is to {}'.format(model_prefix))
else:
logger.info("save model in {}".format(model_prefix))
......@@ -19,6 +19,24 @@ SDMGR是一个关键信息提取算法,将每个检测到的文本区域分类
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
数据集格式:
```
./wildreceipt
├── class_list.txt # box内的文本类别,比如金额、时间、日期等。
├── dict.txt # 识别的字典文件,数据集中包含的字符列表
├── wildreceipt_train.txt # 训练数据标签文件
└── wildreceipt_test.txt # 评估数据标签文件
└── image_files/ # 图像数据文件夹
```
其中标签文件里的格式为:
```
" 图像文件名 json.dumps编码的图像标注信息"
image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg [{"label": 1, "transcription": "SAFEWAY", "points": [[550.0, 190.0], [937.0, 190.0], [937.0, 104.0], [550.0, 104.0]]}, {"label": 25, "transcription": "TM", "points": [[1048.0, 211.0], [1074.0, 211.0], [1074.0, 196.0], [1048.0, 196.0]]}, {"label": 25, "transcription": "ATOREMGRTOMMILAZZO", "points": [[535.0, 239.0], [833.0, 239.0], [833.0, 200.0], [535.0, 200.0]]}, {"label": 5, "transcription": "703-777-5833", "points": [[907.0, 256.0], [1081.0, 256.0], [1081.0, 223.0], [907.0, 223.0]]}......
```
**注:如果您希望在自己的数据集上训练,建议按照上述数据个数准备数据集。**
执行预测:
```
......
......@@ -18,6 +18,22 @@ This section provides a tutorial example on how to quickly use, train, and evalu
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
The dataset format are as follows:
```
./wildreceipt
├── class_list.txt # The text category inside the box, such as amount, time, date, etc.
├── dict.txt # A recognized dictionary file, a list of characters contained in the dataset
├── wildreceipt_train.txt # training data label file
└── wildreceipt_test.txt # testing data label file
└── image_files/ # image dataset file
```
The format in the label file is:
```
" The image file path Image annotation information encoded by json.dumps"
image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg [{"label": 1, "transcription": "SAFEWAY", "points": [[550.0, 190.0], [937.0, 190.0], [937.0, 104.0], [550.0, 104.0]]}, {"label": 25, "transcription": "TM", "points": [[1048.0, 211.0], [1074.0, 211.0], [1074.0, 196.0], [1048.0, 196.0]]}, {"label": 25, "transcription": "ATOREMGRTOMMILAZZO", "points": [[535.0, 239.0], [833.0, 239.0], [833.0, 200.0], [535.0, 200.0]]}, {"label": 5, "transcription": "703-777-5833", "points": [[907.0, 256.0], [1081.0, 256.0], [1081.0, 223.0], [907.0, 223.0]]}......
```
Download the pretrained model and predict the result:
```shell
......
......@@ -192,7 +192,7 @@ Finally, `precision`, `recall`, `hmean` and other indicators will be printed
Use the following command to complete the series prediction of `OCR engine + SER`, taking the pretrained SER model as an example:
```shell
CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/Global.infer_img=doc/vqa/input/zh_val_42.jpg
CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_42.jpg
````
Finally, the prediction result visualization image and the prediction result text file will be saved in the directory configured by the `config.Global.save_res_path` field. The prediction result text file is named `infer_results.txt`.
......@@ -203,7 +203,7 @@ First use the `tools/infer_vqa_token_ser.py` script to complete the prediction o
```shell
export CUDA_VISIBLE_DEVICES=0
python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
python3 ppstructure/vqa/tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
````
<a name="53"></a>
......@@ -247,7 +247,7 @@ Finally, `precision`, `recall`, `hmean` and other indicators will be printed
Use the following command to complete the series prediction of `OCR engine + SER + RE`, taking the pretrained SER and RE models as an example:
```shell
export CUDA_VISIBLE_DEVICES=0
python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm. yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm. yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
````
Finally, the prediction result visualization image and the prediction result text file will be saved in the directory configured by the `config.Global.save_res_path` field. The prediction result text file is named `infer_results.txt`.
......
......@@ -198,7 +198,7 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser.py -c configs/vqa/ser/l
```shell
export CUDA_VISIBLE_DEVICES=0
python3 tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
python3 ppstructure/vqa/tools/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
```
### 5.3 RE
......
......@@ -329,6 +329,7 @@ else
set_save_model=$(func_set_params "${save_model_key}" "${save_log}")
if [ ${#gpu} -le 2 ];then # train with cpu or single gpu
eval ${env}
cmd="${python} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} ${set_amp_config} "
elif [ ${#ips} -le 26 ];then # train with multi-gpu
cmd="${python} -m paddle.distributed.launch --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} ${set_amp_config}"
......
......@@ -76,7 +76,7 @@ def export_single_model(model, arch_config, save_path, logger, quanter=None):
else:
infer_shape = [3, -1, -1]
if arch_config["model_type"] == "rec":
infer_shape = [3, 32, -1] # for rec model, H must be 32
infer_shape = [3, 48, -1] # for rec model, H must be 32
if "Transform" in arch_config and arch_config[
"Transform"] is not None and arch_config["Transform"][
"name"] == "TPS":
......
......@@ -24,6 +24,7 @@ import cv2
import numpy as np
import time
import sys
from scipy.spatial import distance as dist
import tools.infer.utility as utility
from ppocr.utils.logging import get_logger
......@@ -154,9 +155,10 @@ class TextDetector(object):
s = pts.sum(axis=1)
rect[0] = pts[np.argmin(s)]
rect[2] = pts[np.argmax(s)]
diff = np.diff(pts, axis=1)
rect[1] = pts[np.argmin(diff)]
rect[3] = pts[np.argmax(diff)]
tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0)
diff = np.diff(np.array(tmp), axis=1)
rect[1] = tmp[np.argmin(diff)]
rect[3] = tmp[np.argmax(diff)]
return rect
def clip_det_res(self, points, img_height, img_width):
......
......@@ -114,11 +114,14 @@ def sorted_boxes(dt_boxes):
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
for j in range(i, 0, -1):
if abs(_boxes[j + 1][0][1] - _boxes[j][0][1]) < 10 and \
(_boxes[j + 1][0][0] < _boxes[j][0][0]):
tmp = _boxes[j]
_boxes[j] = _boxes[j + 1]
_boxes[j + 1] = tmp
else:
break
return _boxes
......@@ -135,7 +138,7 @@ def main(args):
logger.info("In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', "
"if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320")
# warm up 10 times
if args.warmup:
img = np.random.uniform(0, 255, [640, 640, 3]).astype(np.uint8)
......@@ -198,7 +201,12 @@ def main(args):
text_sys.text_detector.autolog.report()
text_sys.text_recognizer.autolog.report()
with open(os.path.join(draw_img_save_dir, "system_results.txt"), 'w', encoding='utf-8') as f:
if args.total_process_num > 1:
save_results_path = os.path.join(draw_img_save_dir, f"system_results_{args.process_id}.txt")
else:
save_results_path = os.path.join(draw_img_save_dir, "system_results.txt")
with open(save_results_path, 'w', encoding='utf-8') as f:
f.writelines(save_results)
......
......@@ -55,6 +55,7 @@ def init_args():
parser.add_argument("--max_batch_size", type=int, default=10)
parser.add_argument("--use_dilation", type=str2bool, default=False)
parser.add_argument("--det_db_score_mode", type=str, default="fast")
parser.add_argument("--vis_seg_map", type=str2bool, default=False)
# EAST parmas
parser.add_argument("--det_east_score_thresh", type=float, default=0.8)
parser.add_argument("--det_east_cover_thresh", type=float, default=0.1)
......@@ -276,6 +277,7 @@ def create_predictor(args, mode, logger):
min_input_shape = {"x": [1, 3, imgH, 10]}
max_input_shape = {"x": [args.rec_batch_num, 3, imgH, 2304]}
opt_input_shape = {"x": [args.rec_batch_num, 3, imgH, 320]}
config.exp_disable_tensorrt_ops(["transpose2"])
elif mode == "cls":
min_input_shape = {"x": [1, 3, 48, 10]}
max_input_shape = {"x": [args.rec_batch_num, 3, 48, 1024]}
......@@ -587,7 +589,7 @@ def text_visual(texts,
def base64_to_cv2(b64str):
import base64
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册