diff --git a/README_ch.md b/README_ch.md old mode 100644 new mode 100755 index d4383eb4989d746ba4fbf124324f45abfb06302a..afdd053613866c3b683881ca50119fa231a4ec85 --- a/README_ch.md +++ b/README_ch.md @@ -9,7 +9,7 @@ PaddleOCR同时支持动态图与静态图两种编程范式 **近期更新** - 2020.12.15 更新数据合成工具[Style-Text](./StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。 -- 2020.12.07 [FAQ](./doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。 +- 2020.12.14 [FAQ](./doc/doc_ch/FAQ.md)新增5个高频问题,总数127个,每周一都会更新,欢迎大家持续关注。 - 2020.11.25 更新半自动标注工具[PPOCRLabel](./PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。 - 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941 - [More](./doc/doc_ch/update.md) @@ -39,6 +39,14 @@ PaddleOCR同时支持动态图与静态图两种编程范式 上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 + +## 欢迎加入PaddleOCR技术交流群 +- 微信扫描二维码加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。 + +
+ +
+ ## 快速体验 - PC端:超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr @@ -121,7 +129,7 @@ PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框 - 英文模型
- +
- 其他语言模型 @@ -130,13 +138,6 @@ PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框 - -## 欢迎加入PaddleOCR技术交流群 -请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍 - -
- -
## 许可证书 diff --git a/StyleText/README.md b/StyleText/README.md index bbce7c5eba280e95452933f713524b19ff5778e2..648b12674d23a9f413317644cc198fd7fda24bc8 100644 --- a/StyleText/README.md +++ b/StyleText/README.md @@ -72,7 +72,10 @@ fusion_generator: python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en ``` -* Note: The language options is correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean. +* Note 1: The language option should correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean. +* Note 2: Style-Text is mainly used to generate images for OCR recognition models, + so the height of style images should be around 32 pixels. Images of other sizes may give poor results. + For example, enter the following image and corpus `PaddleOCR`. @@ -116,8 +119,16 @@ In actual application scenarios, it is often necessary to synthesize pictures in * `CorpusGenerator`: * `method`:Method of CorpusGenerator,supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used,No other configuration is needed,otherwise you need to set `corpus_file` and `language`. * `language`:Language of the corpus. - * `corpus_file`: Filepath of the corpus. + * `corpus_file`: Filepath of the corpus. The corpus file should be a plain text file that is split on line endings ('\n'); the corpus generator samples one line at a time (see the sketch below). + +Example of corpus file: +``` +PaddleOCR +飞桨文字识别 +StyleText +风格文本图像数据合成 +``` We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)), some examples are given below : @@ -130,7 +141,18 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag ``` bash python -m tools.synth_dataset.py -c configs/dataset_config.yml ``` - +We also provide an example corpus and images in the `examples` folder.
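The `corpus_file` behavior described above (split the file on '\n' and sample one line per synthesized image) can be pictured with a minimal, self-contained sketch. This is not the actual StyleText `CorpusGenerator` code; the class name, constructor arguments, and `generate()` method below are illustrative assumptions only.

```python
import random

class SimpleFileCorpus:
    """Illustrative stand-in for a FileCorpus-style generator: split the
    corpus file on '\n' and return one randomly chosen line per call."""

    def __init__(self, corpus_file, language="en"):
        self.language = language
        with open(corpus_file, "r", encoding="utf-8") as f:
            # Keep only non-empty lines so blank rows are never sampled.
            self.lines = [line.strip() for line in f if line.strip()]
        if not self.lines:
            raise ValueError("corpus file {} is empty".format(corpus_file))

    def generate(self):
        # One sample corresponds to exactly one line of the corpus file.
        return self.language, random.choice(self.lines)

# Hypothetical usage with the example corpus shipped under StyleText/examples:
# corpus = SimpleFileCorpus("examples/corpus/example.txt", language="en")
# print(corpus.generate())
```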
+ + +
+If you run the code above directly, you will get example output data in the `output_data` folder. You will get synthesized images and labels as shown below:
+ +
+There will be some cache files under the `label` folder. If the program exits unexpectedly, you can find the cached labels there. +When the program finishes normally, you will find all the labels in `label.txt`, which gives the final results. ### Applications diff --git a/StyleText/README_ch.md b/StyleText/README_ch.md index eb557ff24547f228610ffa2cbbaf993e2b4569c3..0dd5822b1eac488099477d289dff83a99577b8c9 100644 --- a/StyleText/README_ch.md +++ b/StyleText/README_ch.md @@ -63,7 +63,10 @@ fusion_generator: ```python python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en ``` -* 注意:语言选项和语料相对应,目前该工具只支持英文、简体中文和韩语。 +* 注1:语言选项和语料相对应,目前该工具只支持英文、简体中文和韩语。 +* 注2:Style-Text生成的数据主要应用于OCR识别场景。基于当前PaddleOCR识别模型的设计,我们主要支持高度在32左右的风格图像。 + 如果输入图像尺寸相差过多,效果可能不佳。 + 例如,输入如下图片和语料"PaddleOCR": @@ -102,7 +105,16 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_ * `CorpusGenerator`: * `method`:语料生成方法,目前有`FileCorpus`和`EnNumCorpus`可选。如果使用`EnNumCorpus`,则不需要填写其他配置,否则需要修改`corpus_file`和`language`; * `language`:语料的语种; - * `corpus_file`: 语料文件路径。 + * `corpus_file`: 语料文件路径。语料文件应为纯文本文件。语料生成器首先会将语料按行切分,之后每次随机选取一行。 + + 语料文件格式示例: + ``` + PaddleOCR + 飞桨文字识别 + StyleText + 风格文本图像数据合成 + ... + ``` Style-Text也提供了一批中英韩5万张通用场景数据用作文本风格图像,便于合成场景丰富的文本图像,下图给出了一些示例。 @@ -117,6 +129,19 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_ ``` bash python -m tools.synth_dataset -c configs/dataset_config.yml ``` + 我们在examples目录下提供了样例图片和语料。 +
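For a rough picture of how the `label.txt` written by `tools.synth_dataset` might be consumed afterwards, the sketch below parses it into (image path, text) pairs. It assumes the common PaddleOCR recognition label convention of one tab-separated `image_path\ttext` entry per line; this is an assumption, so check it against the file actually generated in `output_data`.

```python
import os

def load_synth_labels(output_dir="output_data", label_name="label.txt"):
    """Parse a StyleText-style label file into (image_path, text) pairs.

    Assumes one tab-separated `relative_image_path\ttext` entry per line
    (the usual PaddleOCR recognition label layout); adjust the parsing if
    the generated file uses a different separator.
    """
    samples = []
    with open(os.path.join(output_dir, label_name), "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_rel_path, text = line.split("\t", 1)
            samples.append((os.path.join(output_dir, image_rel_path), text))
    return samples

# Hypothetical usage: count how many synthesized samples were produced.
# print(len(load_synth_labels()))
```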
+ + +
+ + 直接运行上述命令,可以在output_data中产生样例输出,包括图片和用于训练识别模型的标注文件: +
+ +
+ + 其中label目录下的标注文件为程序运行过程中产生的缓存,如果程序在中途异常终止,可以使用缓存的标注文件。 + 如果程序正常运行完毕,则会在output_data下生成label.txt,为最终的标注结果。 ### 四、应用案例 diff --git a/StyleText/configs/dataset_config.yml b/StyleText/configs/dataset_config.yml index e047489e5d82e4c561a835ccf4de1b385e4f5c08..aa4ec69b8ce383d16d505ec70ea6f75488a6fd17 100644 --- a/StyleText/configs/dataset_config.yml +++ b/StyleText/configs/dataset_config.yml @@ -33,7 +33,7 @@ Predictor: - 0.5 expand_result: false bg_generator: - pretrain: models/style_text_rec/bg_generator + pretrain: style_text_models/bg_generator module_name: bg_generator generator_type: BgGeneratorWithMask encode_dim: 64 @@ -43,7 +43,7 @@ Predictor: conv_block_dilation: true output_factor: 1.05 text_generator: - pretrain: models/style_text_rec/text_generator + pretrain: style_text_models/text_generator module_name: text_generator generator_type: TextGenerator encode_dim: 64 @@ -52,7 +52,7 @@ Predictor: conv_block_dropout: false conv_block_dilation: true fusion_generator: - pretrain: models/style_text_rec/fusion_generator + pretrain: style_text_models/fusion_generator module_name: fusion_generator generator_type: FusionGeneratorSimple encode_dim: 64 diff --git a/StyleText/doc/images/12.png b/StyleText/doc/images/12.png new file mode 100644 index 0000000000000000000000000000000000000000..74ba4a07ec73240e71a8a97b9b313556a92babcb Binary files /dev/null and b/StyleText/doc/images/12.png differ diff --git a/StyleText/examples/corpus/example.txt b/StyleText/examples/corpus/example.txt index 78451cc3d92a3353f5de0c74c2cb0a06e6197653..93ba35af3bae1c5a81166a22a6d69b460b9ff5e5 100644 --- a/StyleText/examples/corpus/example.txt +++ b/StyleText/examples/corpus/example.txt @@ -1,2 +1,2 @@ -PaddleOCR +Paddle 飞桨文字识别 diff --git a/configs/det/det_mv3_db.yml b/configs/det/det_mv3_db.yml index 36a6f755383e525a8a496b060465cf027f3f31f8..5c8a0923427bc96c10f0a1275c3639cea735f1f4 100644 --- a/configs/det/det_mv3_db.yml +++ b/configs/det/det_mv3_db.yml @@ -2,11 +2,11 @@ Global: use_gpu: true epoch_num: 1200 log_smooth_window: 20 - print_batch_step: 2 + print_batch_step: 10 save_model_dir: ./output/db_mv3/ save_epoch_step: 1200 - # evaluation is run every 5000 iterations after the 4000th iteration - eval_batch_step: [4000, 5000] + # evaluation is run every 2000 iterations + eval_batch_step: [0, 2000] # if pretrained_model is saved in static mode, load_static_weights must set to True load_static_weights: True cal_metric_during_train: False @@ -39,7 +39,7 @@ Loss: alpha: 5 beta: 10 ohem_ratio: 3 - + Optimizer: name: Adam beta1: 0.9 @@ -100,7 +100,7 @@ Train: loader: shuffle: True drop_last: False - batch_size_per_card: 4 + batch_size_per_card: 16 num_workers: 8 Eval: @@ -128,4 +128,4 @@ Eval: shuffle: False drop_last: False batch_size_per_card: 1 # must be 1 - num_workers: 2 \ No newline at end of file + num_workers: 8 \ No newline at end of file diff --git a/configs/det/det_r50_vd_db.yml b/configs/det/det_r50_vd_db.yml index b70ab7505a4c1970f084b6c02233526d7f7188b9..f1188fe357ea5c02f8839239e788a629221bf118 100644 --- a/configs/det/det_r50_vd_db.yml +++ b/configs/det/det_r50_vd_db.yml @@ -5,8 +5,8 @@ Global: print_batch_step: 10 save_model_dir: ./output/det_r50_vd/ save_epoch_step: 1200 - # evaluation is run every 5000 iterations after the 4000th iteration - eval_batch_step: [5000,4000] + # evaluation is run every 2000 iterations + eval_batch_step: [0,2000] # if pretrained_model is saved in static mode, load_static_weights must set to True load_static_weights: True cal_metric_during_train: False diff --git 
a/configs/det/det_r50_vd_sast_totaltext.yml b/configs/det/det_r50_vd_sast_totaltext.yml index 257ecf2490bdde6280cf4b20bb66f2457b4b833b..a92f1b6e539b9f78d2edc705cd9cda0fb6522c28 100755 --- a/configs/det/det_r50_vd_sast_totaltext.yml +++ b/configs/det/det_r50_vd_sast_totaltext.yml @@ -60,7 +60,8 @@ Metric: Train: dataset: name: SimpleDataSet - label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt] + data_dir: ./train_data/ + label_file_list: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt] data_ratio_list: [0.5, 0.5] transforms: - DecodeImage: # load image diff --git a/deploy/cpp_infer/readme.md b/deploy/cpp_infer/readme.md index 7296936643af87817da40592753550fed9a7c8b5..66302a0114186306fde0572fca23aabd27620f95 100644 --- a/deploy/cpp_infer/readme.md +++ b/deploy/cpp_infer/readme.md @@ -103,17 +103,17 @@ make inference_lib_dist 更多编译参数选项可以参考Paddle C++预测库官网:[https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)。 -* 编译完成之后,可以在`build/fluid_inference_install_dir/`文件下看到生成了以下文件及文件夹。 +* 编译完成之后,可以在`build/paddle_inference_install_dir/`文件下看到生成了以下文件及文件夹。 ``` -build/fluid_inference_install_dir/ +build/paddle_inference_install_dir/ |-- CMakeCache.txt |-- paddle |-- third_party |-- version.txt ``` -其中`paddle`就是之后进行C++预测时所需的Paddle库,`version.txt`中包含当前预测库的版本信息。 +其中`paddle`就是C++预测所需的Paddle库,`version.txt`中包含当前预测库的版本信息。 #### 1.2.2 直接下载安装 diff --git a/deploy/cpp_infer/tools/config.txt b/deploy/cpp_infer/tools/config.txt index 40beea3a2e6f0260a42202d6411ffb10907bf871..95d7989bfc06a9e061874a824d070ca60bc3848d 100644 --- a/deploy/cpp_infer/tools/config.txt +++ b/deploy/cpp_infer/tools/config.txt @@ -11,10 +11,15 @@ max_side_len 960 det_db_thresh 0.3 det_db_box_thresh 0.5 det_db_unclip_ratio 2.0 -det_model_dir ./inference/det_db +det_model_dir ./inference/ch__ppocr_mobile_v2.0_det_infer/ + +# cls config +use_angle_cls 0 +cls_model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer/ +cls_thresh 0.9 # rec config -rec_model_dir ./inference/rec_crnn +rec_model_dir ./inference/ch_ppocr_mobile_v2.0_rec_infer/ char_list_file ../../ppocr/utils/ppocr_keys_v1.txt # show the detection results diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md old mode 100644 new mode 100755 index a44969f850701a68d3c66f681685ade7f786bb4b..27b3126cd45d2fc7043d6b55f9192984cb08e3ec --- a/doc/doc_ch/FAQ.md +++ b/doc/doc_ch/FAQ.md @@ -9,44 +9,42 @@ ## PaddleOCR常见问题汇总(持续更新) -* [近期更新(2020.12.07)](#近期更新) +* [近期更新(2020.12.14)](#近期更新) * [【精选】OCR精选10个问题](#OCR精选10个问题) * [【理论篇】OCR通用30个问题](#OCR通用问题) * [基础知识7题](#基础知识) * [数据集7题](#数据集2) * [模型训练调优7题](#模型训练调优2) * [预测部署9题](#预测部署2) -* [【实战篇】PaddleOCR实战84个问题](#PaddleOCR实战问题) - * [使用咨询20题](#使用咨询) +* [【实战篇】PaddleOCR实战87个问题](#PaddleOCR实战问题) + * [使用咨询21题](#使用咨询) * [数据集17题](#数据集3) - * [模型训练调优24题](#模型训练调优3) - * [预测部署23题](#预测部署3) + * [模型训练调优25题](#模型训练调优3) + * [预测部署24题](#预测部署3) -## 近期更新(2020.12.07) +## 近期更新(2020.12.14) -#### Q2.4.9:弯曲文本有试过opencv的TPS进行弯曲校正吗? +#### Q3.1.21:PaddleOCR支持动态图吗? -**A**:opencv的tps需要标出上下边界对应的点,这些点很难通过传统方法或者深度学习方法获取。PaddleOCR里StarNet网络中的tps模块实现了自动学点,自动校正,可以直接尝试这个。 +**A**:动态图版本正在紧锣密鼓开发中,将于2020年12月16日发布,敬请关注。 -#### Q3.3.20: 文字检测时怎么模糊的数据增强? 
+#### Q3.3.23:检测模型训练或预测时出现elementwise_add报错 -**A**: 模糊的数据增强需要修改代码进行添加,以DB为例,参考[Normalize](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/operators.py#L60) ,添加模糊的增强就行 +**A**:设置的输入尺寸必须是32的倍数,否则在网络多次下采样和上采样后,feature map会产生1个像素的diff,从而导致elementwise_add时报shape不匹配的错误。 -#### Q3.3.21: 文字检测时怎么更改图片旋转的角度,实现360度任意旋转? +#### Q3.3.24: DB检测训练输入尺寸640,可以改大一些吗? -**A**: 将[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/iaa_augment.py#L64) 的(-10,10) 改为(-180,180)即可 +**A**: 不建议改大。检测模型训练输入尺寸是预处理中random crop后的尺寸,并非直接将原图进行resize,多数场景下这个尺寸并不小了,改大后可能反而并不合适,而且训练会变慢。另外,代码里可能有的地方参数按照预设输入尺寸适配的,改大后可能有隐藏风险。 -#### Q3.3.22: 训练数据的长宽比过大怎么修改shape +#### Q3.3.25: 识别模型训练时,loss能正常下降,但acc一直为0 -**A**: 识别修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yaml#L75) , -检测修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml#L85) +**A**: 识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。 +#### Q3.4.24:DB模型能正确推理预测,但换成EAST或SAST模型时报错或结果不正确 -#### Q3.4.23:安装paddleocr后,提示没有paddle - -**A**:这是因为paddlepaddle gpu版本和cpu版本的名称不一致,现在已经在[whl的文档](./whl.md)里做了安装说明。 +**A**:使用EAST或SAST模型进行推理预测时,需要在命令中指定参数--det_algorithm="EAST" 或 --det_algorithm="SAST",使用DB时不用指定是因为该参数默认值是"DB":https://github.com/PaddlePaddle/PaddleOCR/blob/e7a708e9fdaf413ed7a14da8e4a7b4ac0b211e42/tools/infer/utility.py#L43 ## 【精选】OCR精选10个问题 @@ -390,6 +388,10 @@ **A**:PaddleOCR主要聚焦通用ocr,如果有垂类需求,您可以用PaddleOCR+垂类数据自己训练; 如果缺少带标注的数据,或者不想投入研发成本,建议直接调用开放的API,开放的API覆盖了目前比较常见的一些垂类。 +#### Q3.1.21:PaddleOCR支持动态图吗? + +**A**:动态图版本正在紧锣密鼓开发中,将于2020年12月16日发布,敬请关注。 + ### 数据集 @@ -603,6 +605,18 @@ ps -axu | grep train.py | awk '{print $2}' | xargs kill -9 **A**: 识别修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yaml#L75) , 检测修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml#L85) +#### Q3.3.23:检测模型训练或预测时出现elementwise_add报错 + +**A**:设置的输入尺寸必须是32的倍数,否则在网络多次下采样和上采样后,feature map会产生1个像素的diff,从而导致elementwise_add时报shape不匹配的错误。 + +#### Q3.3.24: DB检测训练输入尺寸640,可以改大一些吗? 
+ +**A**: 不建议改大。检测模型训练输入尺寸是预处理中random crop后的尺寸,并非直接将原图进行resize,多数场景下这个尺寸并不小了,改大后可能反而并不合适,而且训练会变慢。另外,代码里可能有的地方参数按照预设输入尺寸适配的,改大后可能有隐藏风险。 + +#### Q3.3.25: 识别模型训练时,loss能正常下降,但acc一直为0 + +**A**: 识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。 + ### 预测部署 @@ -710,4 +724,8 @@ ps -axu | grep train.py | awk '{print $2}' | xargs kill -9 #### Q3.4.23:安装paddleocr后,提示没有paddle -**A**:这是因为paddlepaddle gpu版本和cpu版本的名称不一致,现在已经在[whl的文档](./whl.md)里做了安装说明。 \ No newline at end of file +**A**:这是因为paddlepaddle gpu版本和cpu版本的名称不一致,现在已经在[whl的文档](./whl.md)里做了安装说明。 + +#### Q3.4.24:DB模型能正确推理预测,但换成EAST或SAST模型时报错或结果不正确 + +**A**:使用EAST或SAST模型进行推理预测时,需要在命令中指定参数--det_algorithm="EAST" 或 --det_algorithm="SAST",使用DB时不用指定是因为该参数默认值是"DB":https://github.com/PaddlePaddle/PaddleOCR/blob/e7a708e9fdaf413ed7a14da8e4a7b4ac0b211e42/tools/infer/utility.py#L43 \ No newline at end of file diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md index aea7ff1de242dec75cae26a2bf3d6838d7559882..09303e93534b091ee50cb6d62045b95faee6cfe5 100755 --- a/doc/doc_ch/inference.md +++ b/doc/doc_ch/inference.md @@ -131,12 +131,12 @@ python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.pretrained_mo # 下载超轻量中文检测模型: wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar tar xf ch_ppocr_mobile_v2.0_det_infer.tar -python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/" +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/" ``` 可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: -![](../imgs_results/det_res_22.jpg) +![](../imgs_results/det_res_00018069.jpg) 通过参数`limit_type`和`det_limit_side_len`来对图片的尺寸进行限制, `litmit_type`可选参数为[`max`, `min`], diff --git a/doc/doc_ch/installation.md b/doc/doc_ch/installation.md index 0dddfec0a6e17e26a73d284ac98c9c95e449c378..36565cd4197a9b8b8404f57b378aa49637cdc58b 100644 --- a/doc/doc_ch/installation.md +++ b/doc/doc_ch/installation.md @@ -58,7 +58,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR **4. 安装第三方库** ``` cd PaddleOCR -pip3 install -r requirments.txt +pip3 install -r requirements.txt ``` 注意,windows环境下,建议从[这里](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely)下载shapely安装包完成安装, diff --git a/doc/doc_ch/tree.md b/doc/doc_ch/tree.md index 511860869188020ccfe7531db55ecc5008ccb56b..5f048db022dbe422a78f87b0236d04e00ccc4d48 100644 --- a/doc/doc_ch/tree.md +++ b/doc/doc_ch/tree.md @@ -115,7 +115,7 @@ PaddleOCR │ │ │ ├── text_image_aug // 文本识别的 tia 数据扩充 │ │ │ │ ├── __init__.py │ │ │ │ ├── augment.py // tia_distort,tia_stretch 和 tia_perspective 的代码 -│ │ │ │ ├── warp_mls.py +│ │ │ │ ├── warp_mls.py │ │ │ ├── __init__.py │ │ │ ├── east_process.py // EAST 算法的数据处理步骤 │ │ │ ├── make_border_map.py // 生成边界图 @@ -167,7 +167,7 @@ PaddleOCR │ │ │ ├── det_east_head.py // EAST 检测头 │ │ │ ├── det_sast_head.py // SAST 检测头 │ │ │ ├── rec_ctc_head.py // 识别 ctc -│ │ │ ├── rec_att_head.py // 识别 attention +│ │ │ ├── rec_att_head.py // 识别 attention │ │ ├── transforms // 图像变换 │ │ │ ├── __init__.py // 构造 transform 相关代码 │ │ │ └── tps.py // TPS 变换 @@ -185,7 +185,7 @@ PaddleOCR │ │ └── sast_postprocess.py // SAST 后处理 │ └── utils // 工具 │ ├── dict // 小语种字典 -│ .... +│ .... 
│ ├── ic15_dict.txt // 英文数字字典,区分大小写 │ ├── ppocr_keys_v1.txt // 中文字典,用于训练中文模型 │ ├── logging.py // logger @@ -207,10 +207,10 @@ PaddleOCR │ ├── program.py // 整体流程 │ ├── test_hubserving.py │ └── train.py // 启动训练 -├── paddleocr.py +├── paddleocr.py ├── README_ch.md // 中文说明文档 ├── README_en.md // 英文说明文档 ├── README.md // 主页说明文档 -├── requirments.txt // 安装依赖 +├── requirements.txt // 安装依赖 ├── setup.py // whl包打包脚本 -├── train.sh // 启动训练脚本 \ No newline at end of file +├── train.sh // 启动训练脚本 diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md index db86b109d1a13d00aab833aa31d0279622e7c7f8..3fcd36c076c7d969420818484034ed610e76bc27 100755 --- a/doc/doc_en/inference_en.md +++ b/doc/doc_en/inference_en.md @@ -138,12 +138,12 @@ For lightweight Chinese detection model inference, you can execute the following wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar tar xf ch_ppocr_mobile_v2.0_det_infer.tar # predict -python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/" +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" ``` The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows: -![](../imgs_results/det_res_22.jpg) +![](../imgs_results/det_res_00018069.jpg) You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image, The optional parameters of `litmit_type` are [`max`, `min`], and diff --git a/doc/doc_en/installation_en.md b/doc/doc_en/installation_en.md index 073b67b04d10cc2ae4b20f0ca38b604ab95bc09f..7f1f0e83c94e4d4a18b99d620b4b192c47ffde7c 100644 --- a/doc/doc_en/installation_en.md +++ b/doc/doc_en/installation_en.md @@ -61,7 +61,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR **4. Install third-party libraries** ``` cd PaddleOCR -pip3 install -r requirments.txt +pip3 install -r requirements.txt ``` If you getting this error `OSError: [WinError 126] The specified module could not be found` when you install shapely on windows. diff --git a/doc/doc_en/tree_en.md b/doc/doc_en/tree_en.md index be2f5733d3c30af8c83f1072cd0aea07865bacc3..cf9ccb38dcc0e5f70577f18933a5373d1ffa992a 100644 --- a/doc/doc_en/tree_en.md +++ b/doc/doc_en/tree_en.md @@ -116,7 +116,7 @@ PaddleOCR │ │ │ ├── text_image_aug // Tia data augment for text recognition │ │ │ │ ├── __init__.py │ │ │ │ ├── augment.py // Tia_distort,tia_stretch and tia_perspective -│ │ │ │ ├── warp_mls.py +│ │ │ │ ├── warp_mls.py │ │ │ ├── __init__.py │ │ │ ├── east_process.py // Data processing steps of EAST algorithm │ │ │ ├── iaa_augment.py // Data augmentation operations @@ -188,7 +188,7 @@ PaddleOCR │ │ └── sast_postprocess.py // SAST post-processing │ └── utils // utils │ ├── dict // Minor language dictionary -│ .... +│ .... 
│ ├── ic15_dict.txt // English number dictionary, case sensitive │ ├── ppocr_keys_v1.txt // Chinese dictionary for training Chinese models │ ├── logging.py // logger @@ -210,10 +210,10 @@ PaddleOCR │ ├── program.py // Inference system │ ├── test_hubserving.py │ └── train.py // Start training script -├── paddleocr.py +├── paddleocr.py ├── README_ch.md // Chinese documentation ├── README_en.md // English documentation ├── README.md // Home page documentation -├── requirments.txt // Requirments +├── requirements.txt // Requirements ├── setup.py // Whl package packaging script -├── train.sh // Start training bash script \ No newline at end of file +├── train.sh // Start training bash script diff --git a/doc/imgs/00006737.jpg b/doc/imgs/00006737.jpg new file mode 100755 index 0000000000000000000000000000000000000000..5c3329a8a0e64cb251cb5f6719d177146e75b866 Binary files /dev/null and b/doc/imgs/00006737.jpg differ diff --git a/doc/imgs/00009282.jpg b/doc/imgs/00009282.jpg new file mode 100755 index 0000000000000000000000000000000000000000..448d0f11209f4328e4097777007fdff97efbee7b Binary files /dev/null and b/doc/imgs/00009282.jpg differ diff --git a/doc/imgs/00015504.jpg b/doc/imgs/00015504.jpg new file mode 100755 index 0000000000000000000000000000000000000000..9d6aaee181913ef0d6c82b1728bc71de3130a804 Binary files /dev/null and b/doc/imgs/00015504.jpg differ diff --git a/doc/imgs/00018069.jpg b/doc/imgs/00018069.jpg new file mode 100755 index 0000000000000000000000000000000000000000..e768d8adfc5882ffac6e332ce858a175dd4a5bb7 Binary files /dev/null and b/doc/imgs/00018069.jpg differ diff --git a/doc/imgs/00056221.jpg b/doc/imgs/00056221.jpg new file mode 100755 index 0000000000000000000000000000000000000000..698e0dfc5334d64536bf9211fb2edcb5ed429186 Binary files /dev/null and b/doc/imgs/00056221.jpg differ diff --git a/doc/imgs/00057937.jpg b/doc/imgs/00057937.jpg new file mode 100755 index 0000000000000000000000000000000000000000..82a45c47bb87934ecdf599c4339de103ab965277 Binary files /dev/null and b/doc/imgs/00057937.jpg differ diff --git a/doc/imgs/00059985.jpg b/doc/imgs/00059985.jpg new file mode 100755 index 0000000000000000000000000000000000000000..0b5b656a160dfebe355e5ad58639de1170845e3f Binary files /dev/null and b/doc/imgs/00059985.jpg differ diff --git a/doc/imgs/00077949.jpg b/doc/imgs/00077949.jpg new file mode 100755 index 0000000000000000000000000000000000000000..1f832d7903e26a89148e18724d0561b926508cc7 Binary files /dev/null and b/doc/imgs/00077949.jpg differ diff --git a/doc/imgs/00111002.jpg b/doc/imgs/00111002.jpg new file mode 100755 index 0000000000000000000000000000000000000000..2aae5f7cb8829429d1564a345aa726ad426b6ec0 Binary files /dev/null and b/doc/imgs/00111002.jpg differ diff --git a/doc/imgs/00207393.jpg b/doc/imgs/00207393.jpg new file mode 100755 index 0000000000000000000000000000000000000000..e278adfb5973e44a437689a92ffac3ea882a5164 Binary files /dev/null and b/doc/imgs/00207393.jpg differ diff --git a/doc/imgs/10.jpg b/doc/imgs/10.jpg deleted file mode 100755 index a73e25dbdcdc30135804104934f5aae491d43ca8..0000000000000000000000000000000000000000 Binary files a/doc/imgs/10.jpg and /dev/null differ diff --git a/doc/imgs/13.png b/doc/imgs/13.png deleted file mode 100644 index 16e931ac69c7c99b813947c444b4a0be648b46d8..0000000000000000000000000000000000000000 Binary files a/doc/imgs/13.png and /dev/null differ diff --git a/doc/imgs/15.jpg b/doc/imgs/15.jpg deleted file mode 100644 index 180dcac005f28edae5b91fe9d99f05bd8471856a..0000000000000000000000000000000000000000 
Binary files a/doc/imgs/15.jpg and /dev/null differ diff --git a/doc/imgs/16.png b/doc/imgs/16.png deleted file mode 100644 index a1adf38cd185b8a650c18226556ec0ba7a2a17f6..0000000000000000000000000000000000000000 Binary files a/doc/imgs/16.png and /dev/null differ diff --git a/doc/imgs/17.png b/doc/imgs/17.png deleted file mode 100644 index 0415adf59f9a2a65aa8ddb9c781ce01b643f5224..0000000000000000000000000000000000000000 Binary files a/doc/imgs/17.png and /dev/null differ diff --git a/doc/imgs/2.jpg b/doc/imgs/2.jpg deleted file mode 100644 index 811d2450e0a98bcfcef6980b08e6082d4f468bcf..0000000000000000000000000000000000000000 Binary files a/doc/imgs/2.jpg and /dev/null differ diff --git a/doc/imgs/22.jpg b/doc/imgs/22.jpg deleted file mode 100644 index 9bd1129d6724abc2fb541189df148c5c14721c64..0000000000000000000000000000000000000000 Binary files a/doc/imgs/22.jpg and /dev/null differ diff --git a/doc/imgs/3.jpg b/doc/imgs/3.jpg deleted file mode 100644 index 3bd6ca2bdacbdf63cf151ae4b1bab41688379510..0000000000000000000000000000000000000000 Binary files a/doc/imgs/3.jpg and /dev/null differ diff --git a/doc/imgs/4.jpg b/doc/imgs/4.jpg deleted file mode 100755 index 9669f7780d7eb9072b0befa2d043e55be9dcce1f..0000000000000000000000000000000000000000 Binary files a/doc/imgs/4.jpg and /dev/null differ diff --git a/doc/imgs/5.jpg b/doc/imgs/5.jpg deleted file mode 100644 index 8517e125c5e660f1bc603a844f946c68086e4616..0000000000000000000000000000000000000000 Binary files a/doc/imgs/5.jpg and /dev/null differ diff --git a/doc/imgs/6.jpg b/doc/imgs/6.jpg deleted file mode 100644 index 2705e262b337e565f0cbdde0914c6ef41275ebba..0000000000000000000000000000000000000000 Binary files a/doc/imgs/6.jpg and /dev/null differ diff --git a/doc/imgs/7.jpg b/doc/imgs/7.jpg deleted file mode 100644 index a9483bb74f66d88699b09545366c32a4fe108e54..0000000000000000000000000000000000000000 Binary files a/doc/imgs/7.jpg and /dev/null differ diff --git a/doc/imgs/8.jpg b/doc/imgs/8.jpg deleted file mode 100755 index 8dabd03f577ba7a510647304ca1c60e51fb6e65c..0000000000000000000000000000000000000000 Binary files a/doc/imgs/8.jpg and /dev/null differ diff --git a/doc/imgs/9.jpg b/doc/imgs/9.jpg deleted file mode 100644 index 8633c6d2dc25a73c31ec1164fe6c651497295977..0000000000000000000000000000000000000000 Binary files a/doc/imgs/9.jpg and /dev/null differ diff --git a/doc/imgs_results/2.jpg b/doc/imgs_results/2.jpg deleted file mode 100644 index 99f7e63b02556506dadf8d838eee22534d21d82c..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/2.jpg and /dev/null differ diff --git a/doc/imgs_results/det_res_00018069.jpg b/doc/imgs_results/det_res_00018069.jpg new file mode 100644 index 0000000000000000000000000000000000000000..02f35de332cb15594404c6a7df100f7b80e7c410 Binary files /dev/null and b/doc/imgs_results/det_res_00018069.jpg differ diff --git a/doc/imgs_results/det_res_2.jpg b/doc/imgs_results/det_res_2.jpg deleted file mode 100644 index c0ae501a7aff7807f53b743745005653775b0d03..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/det_res_2.jpg and /dev/null differ diff --git a/doc/imgs_results/det_res_22.jpg b/doc/imgs_results/det_res_22.jpg deleted file mode 100644 index d1255f49d9d371d4b91d98c6750a10a01f56b629..0000000000000000000000000000000000000000 Binary files a/doc/imgs_results/det_res_22.jpg and /dev/null differ diff --git a/ppocr/losses/det_db_loss.py b/ppocr/losses/det_db_loss.py index 
f170f6734a19289305a41d9e3f17a51d5ad10ec7..b079aabff7c7deccc7e365b91c9407f7e894bcb9 100755 --- a/ppocr/losses/det_db_loss.py +++ b/ppocr/losses/det_db_loss.py @@ -47,11 +47,12 @@ class DBLoss(nn.Layer): negative_ratio=ohem_ratio) def forward(self, predicts, labels): + predict_maps = predicts['maps'] label_threshold_map, label_threshold_mask, label_shrink_map, label_shrink_mask = labels[ 1:] - shrink_maps = predicts[:, 0, :, :] - threshold_maps = predicts[:, 1, :, :] - binary_maps = predicts[:, 2, :, :] + shrink_maps = predict_maps[:, 0, :, :] + threshold_maps = predict_maps[:, 1, :, :] + binary_maps = predict_maps[:, 2, :, :] loss_shrink_maps = self.bce_loss(shrink_maps, label_shrink_map, label_shrink_mask) diff --git a/ppocr/modeling/heads/det_db_head.py b/ppocr/modeling/heads/det_db_head.py index 49c50ffdba89f557e71ac5c7abe4e07e52bc9119..ca18d74a68f7b17ee6383d4a0c995a4c46a16187 100644 --- a/ppocr/modeling/heads/det_db_head.py +++ b/ppocr/modeling/heads/det_db_head.py @@ -120,9 +120,9 @@ class DBHead(nn.Layer): def forward(self, x): shrink_maps = self.binarize(x) if not self.training: - return shrink_maps + return {'maps': shrink_maps} threshold_maps = self.thresh(x) binary_maps = self.step_function(shrink_maps, threshold_maps) y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1) - return y + return {'maps': y} diff --git a/ppocr/postprocess/db_postprocess.py b/ppocr/postprocess/db_postprocess.py index 16c789dcd7e9740ca8ddf613d0f2567c9af22820..91729e0a93c8b013cd734abc37cb6ac2b4960312 100755 --- a/ppocr/postprocess/db_postprocess.py +++ b/ppocr/postprocess/db_postprocess.py @@ -40,7 +40,8 @@ class DBPostProcess(object): self.max_candidates = max_candidates self.unclip_ratio = unclip_ratio self.min_size = 3 - self.dilation_kernel = None if not use_dilation else np.array([[1, 1], [1, 1]]) + self.dilation_kernel = None if not use_dilation else np.array( + [[1, 1], [1, 1]]) def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height): ''' @@ -132,7 +133,8 @@ class DBPostProcess(object): cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1) return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0] - def __call__(self, pred, shape_list): + def __call__(self, outs_dict, shape_list): + pred = outs_dict['maps'] if isinstance(pred, paddle.Tensor): pred = pred.numpy() pred = pred[:, 0, :, :] diff --git a/ppocr/utils/save_load.py b/ppocr/utils/save_load.py index af2de054de3656638fee8d4328765c21b4deaea4..02814d6208aba7ddfa6eac338229502b18b535da 100644 --- a/ppocr/utils/save_load.py +++ b/ppocr/utils/save_load.py @@ -102,7 +102,6 @@ def init_model(config, model, logger, optimizer=None, lr_scheduler=None): best_model_dict = states_dict.get('best_model_dict', {}) if 'epoch' in states_dict: best_model_dict['start_epoch'] = states_dict['epoch'] + 1 - best_model_dict['start_epoch'] = best_model_dict['best_epoch'] + 1 logger.info("resume from {}".format(checkpoints)) elif pretrained_model: diff --git a/tools/infer/predict_det.py b/tools/infer/predict_det.py index d389ca393dc94a7ece69e6f59f999073ae4b1773..ba0adaee258096ea9970425cc05ca7a8f1cf08c4 100755 --- a/tools/infer/predict_det.py +++ b/tools/infer/predict_det.py @@ -65,12 +65,12 @@ class TextDetector(object): postprocess_params["unclip_ratio"] = args.det_db_unclip_ratio postprocess_params["use_dilation"] = True elif self.det_algorithm == "EAST": - postprocess_params['name'] = 'EASTPostProcess' + postprocess_params['name'] = 'EASTPostProcess' postprocess_params["score_thresh"] = args.det_east_score_thresh 
postprocess_params["cover_thresh"] = args.det_east_cover_thresh postprocess_params["nms_thresh"] = args.det_east_nms_thresh elif self.det_algorithm == "SAST": - postprocess_params['name'] = 'SASTPostProcess' + postprocess_params['name'] = 'SASTPostProcess' postprocess_params["score_thresh"] = args.det_sast_score_thresh postprocess_params["nms_thresh"] = args.det_sast_nms_thresh self.det_sast_polygon = args.det_sast_polygon @@ -177,8 +177,10 @@ class TextDetector(object): preds['f_score'] = outputs[1] preds['f_tco'] = outputs[2] preds['f_tvo'] = outputs[3] + elif self.det_algorithm == 'DB': + preds['maps'] = outputs[0] else: - preds = outputs[0] + raise NotImplementedError post_result = self.postprocess_op(preds, shape_list) dt_boxes = post_result[0]['points']