diff --git a/README.md b/README.md index 4ebbf2f0067aa6faff3304c97b12afa7274ca554..099a43b52ba8e00838257569507467797ae06bfb 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 **近期更新** +- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载) - 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md) - 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md) - 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) @@ -14,51 +15,69 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 特性 -- 超轻量级中文OCR模型,总模型仅8.6M - - 单模型支持中英文数字组合识别、竖排文本识别、长文本识别 - - 检测模型DB(4.1M)+识别模型CRNN(4.5M) -- 实用通用中文OCR模型 -- 多种预测推理部署方案,包括服务部署和端侧部署 -- 多种文本检测训练算法,EAST、DB、SAST -- 多种文本识别训练算法,Rosetta、CRNN、STAR-Net、RARE、SRN + +- PPOCR系列高质量预训练模型,媲美商业效果 + - 超轻量ppocr_mobile系列:检测(2.6M)+ 方向分类器(0.9M)+ 识别(4.6M)= 8.1M + - 通用ppocr_server系列:检测(47.2M)+方向分类器(0.9M)+ 识别(107M)= 155.1M + - 超轻量压缩ppocr_mobile_slim系列:(coming soon) +- 支持中英文数字组合识别、竖排文本识别、长文本识别 +- 支持多语言识别:韩语、日语、德语、法语 (coming soon) +- 支持用户自定义训练,提供丰富的预测推理部署方案 +- 支持PIP快速安装使用 - 可运行于Linux、Windows、MacOS等多种系统 -## 快速体验 +## 效果展示
- + +
-上图是超轻量级中文OCR模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 +上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 + +## 快速体验 +- PC端:超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr -- 超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr -- 移动端DEMO体验(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统):[安装包二维码获取地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) +- 移动端:[安装包DEMO下载地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统),Android手机也可以直接扫描下面二维码安装体验。 - Android手机也可以扫描下面二维码安装体验。
+- 代码体验:可以直接进入[快速安装](./doc/doc_ch/installation.md) + + +## PP-OCR 1.1系列模型列表(9月17日更新) + +| 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | | +| ------------ | --------------- | ----------------|---- | ---------- | -------- | ---- | +| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v1.1_xx |移动端&服务器端|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_train.tar) |[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | | +| 中英文通用OCR模型(155.1M) |ch_ppocr_server_v1.1_xx|服务器端 |[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_train.tar) |[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | | +| 中英文超轻量压缩OCR模型 | ch_ppocr_mobile_slim_v1.1_xx| 移动端 |即将开源 |即将开源|即将开源| | || + +更多V1.1版本模型下载,可以参考[OCR1.1模型列表](./doc/doc_ch/models_list.md) + +## PP-OCR 1.0系列模型列表(7月16日更新) -## 中文OCR模型列表 +| 模型简介 | 模型名称 | 检测模型 | 识别模型 | 支持空格的识别模型 | | +| ------------ | ---------------------- | -------- | ---------- | -------- | ---- | +| 超轻量中英文OCR模型(8.6M) | chinese_db_crnn_mobile_xx |[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / 
[预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) | | +|通用中文OCR模型(212M)|chinese_db_crnn_server_xx|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)| | -|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| -|-|-|-|-|-| -|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) -|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / 
[预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) ## 文档教程 - [快速安装](./doc/doc_ch/installation.md) - [中文OCR模型快速使用](./doc/doc_ch/quickstart.md) - 算法介绍 - - [文本检测](#文本检测算法) - - [文本识别](#文本识别算法) + - [文本检测](./doc/doc_ch/algorithm_overview.md) + - [文本识别](./doc/doc_ch/algorithm_overview.md) + - PP-OCR (coming soon) - 模型训练/评估 - [文本检测](./doc/doc_ch/detection.md) - [文本识别](./doc/doc_ch/recognition.md) - [yml参数配置文件介绍](./doc/doc_ch/config.md) - - [中文OCR训练预测技巧](./doc/doc_ch/tricks.md) - 预测部署 - [基于Python预测引擎推理](./doc/doc_ch/inference.md) - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) @@ -72,10 +91,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md) - [常用数据标注工具](./doc/doc_ch/data_annotation.md) - [常用数据合成工具](./doc/doc_ch/data_synthesis.md) -- 效果展示 - - [超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示) - - [通用中文OCR效果展示](#通用中文OCR效果展示) - - [支持空格的中文OCR效果展示](#支持空格的中文OCR效果展示) +- [效果展示](#效果展示) - FAQ - [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md) - [【理论篇】OCR通用21个问题](./doc/doc_ch/FAQ.md) @@ -85,104 +101,20 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - [许可证书](#许可证书) - [贡献代码](#贡献代码) - -## 算法介绍 - -### 1.文本检测算法 - -PaddleOCR开源的文本检测算法列表: -- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) -- [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(百度自研) - -在ICDAR2015文本检测公开数据集上,算法效果如下: - -|模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)| -|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)| - -在Total-text文本检测公开数据集上,算法效果如下: - 
-|模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| - -**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi) - - -使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下: - -|模型|骨干网络|配置文件|预训练模型| -|-|-|-|-| -|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| -|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| - -* 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 - -PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./doc/doc_ch/detection.md)。 - - -### 2.文本识别算法 - -PaddleOCR开源的文本识别算法列表: -- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) -- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) -- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) -- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) -- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(百度自研) - -参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: - -|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接| -|-|-|-|-|-| -|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| -|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| -|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| -|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| 
-|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| -|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| -|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| -|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| -|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| - -**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。 -原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。 - -使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下: - -|模型|骨干网络|配置文件|预训练模型| -|-|-|-|-| -|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| -|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| - -PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./doc/doc_ch/recognition.md)。 - -## 效果展示 - - -### 1.超轻量级中文OCR效果展示 [more](./doc/doc_ch/visualization.md) + +## 效果展示 [more](./doc/doc_ch/visualization.md)
- + + + + +
- -### 2.通用中文OCR效果展示 [more](./doc/doc_ch/visualization.md) -
- -
- -### 3.支持空格的中文OCR效果展示 [more](./doc/doc_ch/visualization.md) -
- -
## 欢迎加入PaddleOCR技术交流群 diff --git a/configs/det/det_mv3_db.yml b/configs/det/det_mv3_db.yml index 91a8e86f8bba440df83c1d9f7da0e6523d5907bb..5f67ca1db758069bb6d19276339895302604fd62 100755 --- a/configs/det/det_mv3_db.yml +++ b/configs/det/det_mv3_db.yml @@ -24,6 +24,7 @@ Backbone: function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3 scale: 0.5 model_name: large + disable_se: true Head: function: ppocr.modeling.heads.det_db_head,DBHead diff --git a/configs/det/det_mv3_db_v1.1.yml b/configs/det/det_mv3_db_v1.1.yml deleted file mode 100755 index 5f67ca1db758069bb6d19276339895302604fd62..0000000000000000000000000000000000000000 --- a/configs/det/det_mv3_db_v1.1.yml +++ /dev/null @@ -1,55 +0,0 @@ -Global: - algorithm: DB - use_gpu: true - epoch_num: 1200 - log_smooth_window: 20 - print_batch_step: 2 - save_model_dir: ./output/det_db/ - save_epoch_step: 200 - # evaluation is run every 5000 iterations after the 4000th iteration - eval_batch_step: [4000, 5000] - train_batch_size_per_card: 16 - test_batch_size_per_card: 16 - image_shape: [3, 640, 640] - reader_yml: ./configs/det/det_db_icdar15_reader.yml - pretrain_weights: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/ - checkpoints: - save_res_path: ./output/det_db/predicts_db.txt - save_inference_dir: - -Architecture: - function: ppocr.modeling.architectures.det_model,DetModel - -Backbone: - function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3 - scale: 0.5 - model_name: large - disable_se: true - -Head: - function: ppocr.modeling.heads.det_db_head,DBHead - model_name: large - k: 50 - inner_channels: 96 - out_channels: 2 - -Loss: - function: ppocr.modeling.losses.det_db_loss,DBLoss - balance_loss: true - main_loss_type: DiceLoss - alpha: 5 - beta: 10 - ohem_ratio: 3 - -Optimizer: - function: ppocr.optimizer,AdamDecay - base_lr: 0.001 - beta1: 0.9 - beta2: 0.999 - -PostProcess: - function: ppocr.postprocess.db_postprocess,DBPostProcess - thresh: 0.3 - box_thresh: 0.6 - max_candidates: 1000 - 
unclip_ratio: 1.5 diff --git a/configs/rec/multi_languages/rec_en_lite_train.yml b/configs/rec/multi_languages/rec_en_lite_train.yml new file mode 100644 index 0000000000000000000000000000000000000000..128424b4d3a5631f8237f6cd596c901990ff2277 --- /dev/null +++ b/configs/rec/multi_languages/rec_en_lite_train.yml @@ -0,0 +1,53 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/en_number + save_epoch_step: 3 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 30 + character_type: ch + character_dict_path: ./ppocr/utils/ic15_dict.txt + loss_type: ctc + distort: false + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_en_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay_warmup + warmup_minibatch: 1000 + step_each_epoch: 6530 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_en_reader.yml b/configs/rec/multi_languages/rec_en_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..558e2c9b653642f919b5a1e15211b934dc39ad13 --- /dev/null +++ b/configs/rec/multi_languages/rec_en_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/en_train.txt + 
+EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/en_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_french_lite_train.yml b/configs/rec/multi_languages/rec_french_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..2cf54c427eb6a7c64f4b54b021c44013a1dc1d6a --- /dev/null +++ b/configs/rec/multi_languages/rec_french_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_french + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: french + character_dict_path: ./ppocr/utils/french_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_french_reader.yml b/configs/rec/multi_languages/rec_french_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..e456de1dc8800822cc9af496e825c45cdbebe081 --- /dev/null +++ 
b/configs/rec/multi_languages/rec_french_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/french_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/french_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_ger_lite_train.yml b/configs/rec/multi_languages/rec_ger_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..beb1755b105fea9cbade9f35ceac15d380651f37 --- /dev/null +++ b/configs/rec/multi_languages/rec_ger_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_german + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: german + character_dict_path: ./ppocr/utils/german_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_ger_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git 
a/configs/rec/multi_languages/rec_ger_reader.yml b/configs/rec/multi_languages/rec_ger_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..edd78d4f115dc7e1376556ee0c93f655ac891e47 --- /dev/null +++ b/configs/rec/multi_languages/rec_ger_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/de_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/de_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_japan_lite_train.yml b/configs/rec/multi_languages/rec_japan_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..fbbab33eadd2901d9eac93f49e737e92d9441270 --- /dev/null +++ b/configs/rec/multi_languages/rec_japan_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_japan + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: japan + character_dict_path: ./ppocr/utils/japan_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_japan_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: [1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + 
function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_japan_reader.yml b/configs/rec/multi_languages/rec_japan_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..348590920a131843a6ab7d8c76498a486d4ed709 --- /dev/null +++ b/configs/rec/multi_languages/rec_japan_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/japan_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/japan_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/configs/rec/multi_languages/rec_korean_lite_train.yml b/configs/rec/multi_languages/rec_korean_lite_train.yml new file mode 100755 index 0000000000000000000000000000000000000000..29cc08aaefb017c690551e030a57e85ebb21e2dd --- /dev/null +++ b/configs/rec/multi_languages/rec_korean_lite_train.yml @@ -0,0 +1,52 @@ +Global: + algorithm: CRNN + use_gpu: true + epoch_num: 500 + log_smooth_window: 20 + print_batch_step: 10 + save_model_dir: ./output/rec_korean + save_epoch_step: 1 + eval_batch_step: 2000 + train_batch_size_per_card: 256 + test_batch_size_per_card: 256 + image_shape: [3, 32, 320] + max_text_length: 25 + character_type: korean + character_dict_path: ./ppocr/utils/korean_dict.txt + loss_type: ctc + distort: true + use_space_char: false + reader_yml: ./configs/rec/multi_languages/rec_korean_reader.yml + pretrain_weights: + checkpoints: + save_inference_dir: + infer_img: + +Architecture: + function: ppocr.modeling.architectures.rec_model,RecModel + +Backbone: + function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3 + scale: 0.5 + model_name: small + small_stride: 
[1, 2, 2, 2] + +Head: + function: ppocr.modeling.heads.rec_ctc_head,CTCPredict + encoder_type: rnn + SeqRNN: + hidden_size: 48 + +Loss: + function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss + +Optimizer: + function: ppocr.optimizer,AdamDecay + l2_decay: 0.00001 + base_lr: 0.001 + beta1: 0.9 + beta2: 0.999 + decay: + function: cosine_decay + step_each_epoch: 254 + total_epoch: 500 diff --git a/configs/rec/multi_languages/rec_korean_reader.yml b/configs/rec/multi_languages/rec_korean_reader.yml new file mode 100755 index 0000000000000000000000000000000000000000..58ebf6cf8d340a06c0b3e2883be8839112980123 --- /dev/null +++ b/configs/rec/multi_languages/rec_korean_reader.yml @@ -0,0 +1,13 @@ +TrainReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + num_workers: 8 + img_set_dir: ./train_data + label_file_path: ./train_data/korean_train.txt + +EvalReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader + img_set_dir: ./train_data + label_file_path: ./train_data/korean_eval.txt + +TestReader: + reader_function: ppocr.data.rec.dataset_traversal,SimpleReader diff --git a/deploy/cpp_infer/readme.md b/deploy/cpp_infer/readme.md index 0b2441097fbdd0c0ea3acb7ce5a696837645443f..571ed2eb2b071574aec3cabdff01b6c9d7f17440 100644 --- a/deploy/cpp_infer/readme.md +++ b/deploy/cpp_infer/readme.md @@ -193,6 +193,9 @@ make -j sh tools/run.sh ``` +* 若需要使用方向分类器,则需要将`tools/config.txt`中的`use_angle_cls`参数修改为1,表示开启方向分类器的预测。 + + 最终屏幕上会输出检测结果如下。
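As background for the `use_angle_cls` switch added in the readme hunk above: `tools/config.txt` uses a simple whitespace-separated key/value format, so flipping the flag can also be scripted rather than edited by hand. A minimal sketch, assuming that format — the `set_config_key` helper below is illustrative and not part of the repo:

```python
# Illustrative helper (not part of PaddleOCR): rewrite one key in the
# whitespace-separated key/value format used by tools/config.txt,
# e.g. turning "use_angle_cls 0" into "use_angle_cls 1".
def set_config_key(lines, key, value):
    out = []
    for line in lines:
        stripped = line.strip()
        # keep comments and blank lines untouched
        if stripped.startswith("#") or not stripped:
            out.append(line)
            continue
        name, _, _ = stripped.partition(" ")
        out.append(f"{name} {value}" if name == key else line)
    return out

config = ["# cls config", "use_angle_cls 0", "cls_thresh 0.9"]
print(set_config_key(config, "use_angle_cls", "1"))
# -> ['# cls config', 'use_angle_cls 1', 'cls_thresh 0.9']
```

Only the matching key is rewritten; comments and unrelated entries such as `cls_thresh` pass through unchanged.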
diff --git a/deploy/cpp_infer/readme_en.md b/deploy/cpp_infer/readme_en.md index ecb29f9b9673446c86b2b561440b57d29ea457f4..a545b8606cda0b476b439543382d997065721892 100644 --- a/deploy/cpp_infer/readme_en.md +++ b/deploy/cpp_infer/readme_en.md @@ -162,7 +162,7 @@ inference/ sh tools/build.sh ``` -具体地,`tools/build.sh`中内容如下。 +Specifically, the content in `tools/build.sh` is as follows. ```shell OPENCV_DIR=your_opencv_dir @@ -201,6 +201,8 @@ make -j sh tools/run.sh ``` +* If you want the orientation classifier to correct the detected boxes, you can set `use_angle_cls` in the file `tools/config.txt` to 1 to enable the function. + The detection results will be shown on the screen, which is as follows.
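On the consuming side, the demo has to read the `use_angle_cls` flag back before prediction. As a rough illustration of parsing the `tools/config.txt` key/value format — the `load_config` helper is hypothetical; the actual cpp_infer demo parses the file in C++:

```python
# Hypothetical sketch: parse the whitespace-separated key/value format of
# tools/config.txt into a dict (the real cpp_infer demo does this in C++).
def load_config(lines):
    cfg = {}
    for line in lines:
        s = line.strip()
        if not s or s.startswith("#"):  # skip blanks and comments
            continue
        key, _, value = s.partition(" ")
        cfg[key] = value.strip()
    return cfg

cfg = load_config(["# cls config", "use_angle_cls 1", "cls_thresh 0.9"])
print(cfg["use_angle_cls"] == "1")  # -> True, so the classifier would run
```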
diff --git a/deploy/cpp_infer/tools/config.txt b/deploy/cpp_infer/tools/config.txt index 28bacba60d4a599ad951c9820938b38e55b07283..9fa770bb04dd4991e6844ad67aa1bbcfa7788318 100644 --- a/deploy/cpp_infer/tools/config.txt +++ b/deploy/cpp_infer/tools/config.txt @@ -15,7 +15,7 @@ det_model_dir ./inference/det_db # cls config use_angle_cls 0 -cls_model_dir ../inference/cls +cls_model_dir ./inference/cls cls_thresh 0.9 # rec config diff --git a/deploy/hubserving/ocr_det/params.py b/deploy/hubserving/ocr_det/params.py index e88ab45c7bb548ef971465d4aaefb30d247ab17f..f37993a10b85097b11e38bbb2efe25c649bec8d0 100644 --- a/deploy/hubserving/ocr_det/params.py +++ b/deploy/hubserving/ocr_det/params.py @@ -13,7 +13,7 @@ def read_params(): #params for text detector cfg.det_algorithm = "DB" - cfg.det_model_dir = "./inference/ch_det_mv3_db/" + cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/" cfg.det_max_side_len = 960 #DB parmas diff --git a/deploy/hubserving/ocr_rec/params.py b/deploy/hubserving/ocr_rec/params.py index 59772e2163d1d5f8279dee85432b5bf93502914e..58a8bc119e2a54ad78446bd616eeb7a9089a6084 100644 --- a/deploy/hubserving/ocr_rec/params.py +++ b/deploy/hubserving/ocr_rec/params.py @@ -28,7 +28,7 @@ def read_params(): #params for text recognizer cfg.rec_algorithm = "CRNN" - cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/" + cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/" cfg.rec_image_shape = "3, 32, 320" cfg.rec_char_type = 'ch' diff --git a/deploy/hubserving/ocr_system/params.py b/deploy/hubserving/ocr_system/params.py index 21e8cca4a0990ecb5963280100db1a0a3fb62151..d83fe692dca7c94c7225a1aa26e782765e665bdd 100644 --- a/deploy/hubserving/ocr_system/params.py +++ b/deploy/hubserving/ocr_system/params.py @@ -13,7 +13,7 @@ def read_params(): #params for text detector cfg.det_algorithm = "DB" - cfg.det_model_dir = "./inference/ch_det_mv3_db/" + cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/" cfg.det_max_side_len = 960 #DB 
parmas @@ -28,7 +28,7 @@ def read_params(): #params for text recognizer cfg.rec_algorithm = "CRNN" - cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/" + cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/" cfg.rec_image_shape = "3, 32, 320" cfg.rec_char_type = 'ch' diff --git a/doc/doc_ch/serving.md b/deploy/hubserving/readme.md similarity index 83% rename from doc/doc_ch/serving.md rename to deploy/hubserving/readme.md index 99fe3006fde8762930ef9a168da81cce9069f8e0..5d29b432ba3d4c098872431c9b5fde13f553eee0 100644 --- a/doc/doc_ch/serving.md +++ b/deploy/hubserving/readme.md @@ -1,10 +1,12 @@ -# 服务部署 +[English](readme_en.md) | 简体中文 PaddleOCR提供2种服务部署方式: -- 基于HubServing的部署:已集成到PaddleOCR中([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)),按照本教程使用; -- 基于PaddleServing的部署:详见PaddleServing官网[demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr),后续也将集成到PaddleOCR。 +- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",按照本教程使用; +- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",使用方法参考[文档](../pdserving/readme.md)。 -服务部署目录下包括检测、识别、2阶段串联三种服务包,根据需求选择相应的服务包进行安装和启动。目录如下: +# 基于PaddleHub Serving的服务部署 + +hubserving服务部署目录下包括检测、识别、2阶段串联三种服务包,请根据需求选择相应的服务包进行安装和启动。目录结构如下: ``` deploy/hubserving/ └─ ocr_det 检测模块服务包 @@ -30,11 +32,18 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple # 在Linux下设置环境变量 export PYTHONPATH=. -# 在Windows下设置环境变量 + +# 或者,在Windows下设置环境变量 SET PYTHONPATH=. ``` -### 2. 安装服务模块 +### 2. 下载推理模型 +安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是v1.1版的超轻量模型,默认检测模型路径为: +`./inference/ch_ppocr_mobile_v1.1_det_infer/`,识别模型路径为:`./inference/ch_ppocr_mobile_v1.1_rec_infer/`。 + +**模型路径可在`params.py`中查看和修改。** 更多模型可以从PaddleOCR提供的[模型库](../../doc/doc_ch/models_list.md)下载,也可以替换成自己训练转换好的模型。 + +### 3. 
安装服务模块 PaddleOCR提供3种服务模块,根据需要安装所需模块。 * 在Linux环境下,安装示例如下: @@ -61,15 +70,7 @@ hub install deploy\hubserving\ocr_rec\ hub install deploy\hubserving\ocr_system\ ``` -#### 安装模型 -安装服务模块前,需要将训练好的模型放到对应的文件夹内。默认使用的是: -./inference/ch_det_mv3_db/ -和 -./inference/ch_rec_mv3_crnn/ -这两个模型可以在https://github.com/PaddlePaddle/PaddleOCR 下载 -可以在./deploy/hubserving/ocr_system/params.py 里面修改成自己的模型 - -### 3. 启动服务 +### 4. 启动服务 #### 方式1. 命令行命令启动(仅支持CPU) **启动命令:** ```shell @@ -172,7 +173,7 @@ hub serving start -c deploy/hubserving/ocr_system/config.json ```hub serving stop --port/-p XXXX``` - 2、 到相应的`module.py`和`params.py`等文件中根据实际需求修改代码。 -例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。 +例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 **强烈建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。** - 3、 卸载旧服务包 ```hub uninstall ocr_system``` diff --git a/doc/doc_en/serving_en.md b/deploy/hubserving/readme_en.md similarity index 84% rename from doc/doc_en/serving_en.md rename to deploy/hubserving/readme_en.md index 7439cc84abb58f091febc3acda169816d34a836b..efef1cda6dd5a91d6ad2f7db27061418fa24e105 100644 --- a/doc/doc_en/serving_en.md +++ b/deploy/hubserving/readme_en.md @@ -1,10 +1,12 @@ -# Service deployment +English | [简体中文](readme.md) -PaddleOCR provides 2 service deployment methods:: -- Based on **HubServing**:Has been integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)). Please follow this tutorial. -- Based on **PaddleServing**:See PaddleServing official website for details ([demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr)). Follow-up will also be integrated into PaddleOCR. +PaddleOCR provides 2 service deployment methods: +- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial. 
+- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../pdserving/readme_en.md) for usage. -The service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Select the corresponding service package to install and start service according to your needs. The directory is as follows: +# Service deployment based on PaddleHub Serving + +The hubserving service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Please select the corresponding service package to install and start service according to your needs. The directory is as follows: ``` deploy/hubserving/ └─ ocr_det detection module service package @@ -31,11 +33,17 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple # Set environment variables on Linux export PYTHONPATH=. + # Set environment variables on Windows SET PYTHONPATH=. ``` -### 2. Install Service Module +### 2. Download inference model +Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v1.1 is used, and the default detection model path is: `./inference/ch_ppocr_mobile_v1.1_det_infer/`, the default recognition model path is: `./inference/ch_ppocr_mobile_v1.1_rec_infer/`. + +**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself. + +### 3. Install Service Module PaddleOCR provides 3 kinds of service modules, install the required modules according to your needs. * On Linux platform, the examples are as follows. @@ -62,7 +70,7 @@ hub install deploy\hubserving\ocr_rec\ hub install deploy\hubserving\ocr_system\ ``` -### 3. Start service +### 4. Start service #### Way 1. 
Start with command line parameters (CPU only) **start command:** diff --git a/deploy/pdserving/readme.md b/deploy/pdserving/readme.md index a6a88c20517c6ca01db1004c9e634d1adeafaa3a..af12d508ba9c04e6032f2a392701e72b41462395 100644 --- a/deploy/pdserving/readme.md +++ b/deploy/pdserving/readme.md @@ -1,5 +1,10 @@ -# Paddle Serving 服务部署 +[English](readme_en.md) | 简体中文 + +PaddleOCR提供2种服务部署方式: +- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",使用方法参考[文档](../hubserving/readme.md)。 +- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",按照本教程使用。 +# Paddle Serving 服务部署 本教程将介绍基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)部署PaddleOCR在线预测服务的详细步骤。 ## 快速启动服务 diff --git a/deploy/pdserving/readme_en.md b/deploy/pdserving/readme_en.md new file mode 100644 index 0000000000000000000000000000000000000000..9a0c684fb6fb4f0eeff2552af70f62053d3351fb --- /dev/null +++ b/deploy/pdserving/readme_en.md @@ -0,0 +1,123 @@ +English | [简体中文](readme.md) + +PaddleOCR provides 2 service deployment methods: +- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../hubserving/readme_en.md) for usage. +- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please follow this tutorial. + +# Service deployment based on Paddle Serving + +This tutorial will introduce the detail steps of deploying PaddleOCR online prediction service based on [Paddle Serving](https://github.com/PaddlePaddle/Serving). + +## Quick start service + +### 1. Prepare the environment +Let's first install the relevant components of Paddle Serving. GPU is recommended for service deployment with Paddle Serving. 
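Before running the pip commands in this section, a quick interpreter check can save a failed install, since Paddle Serving at this point only supports Python 2.7/3.6/3.7. A minimal stdlib sketch (the supported-version set simply mirrors the requirements listed below; it is an illustration, not part of Paddle Serving itself):

```python
# Check the running interpreter against the Python versions that
# Paddle Serving supports (2.7 / 3.6 / 3.7 at the time of writing).
import sys

SUPPORTED = {(2, 7), (3, 6), (3, 7)}

def is_supported(version_info=None):
    """Return True if the (major, minor, ...) tuple is a supported version."""
    if version_info is None:
        version_info = sys.version_info
    return (version_info[0], version_info[1]) in SUPPORTED

if __name__ == "__main__":
    tag = "supported" if is_supported() else "NOT supported"
    print("Python %d.%d is %s by Paddle Serving"
          % (sys.version_info[0], sys.version_info[1], tag))
```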
+
+**Requirements:**
+- **CUDA version: 9.0**
+- **CUDNN version: 7.0**
+- **Operating system version: >= CentOS 6**
+- **Python version: 2.7/3.6/3.7**
+
+**Installation:**
+```
+# install GPU server
+python -m pip install paddle_serving_server_gpu
+
+# or, install CPU server
+python -m pip install paddle_serving_server
+
+# install client and App package (CPU/GPU)
+python -m pip install paddle_serving_app paddle_serving_client
+```
+
+### 2. Model transformation
+For convenience, you can directly use the converted models provided by `paddle_serving_app`. Execute the following commands to obtain them:
+```
+python -m paddle_serving_app.package --get_model ocr_rec
+tar -xzvf ocr_rec.tar.gz
+python -m paddle_serving_app.package --get_model ocr_det
+tar -xzvf ocr_det.tar.gz
+```
+Executing the above commands will download the `db_crnn_mobile` model, which is in a different format from the inference model. If you want to deploy other models, you can refer to this [tutorial](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md) to convert your inference model into a model deployable with Paddle Serving.
+
+We take the `ch_rec_r34_vd_crnn` model as an example. Download the inference model by executing the following command:
+```
+wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
+tar xf ch_rec_r34_vd_crnn_infer.tar
+```
+
+Convert the downloaded model by executing the following Python script:
+```
+from paddle_serving_client.io import inference_model_to_serving
+inference_model_dir = "ch_rec_r34_vd_crnn"
+serving_client_dir = "serving_client_dir"
+serving_server_dir = "serving_server_dir"
+feed_var_names, fetch_var_names = inference_model_to_serving(
+        inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params")
+```
+
+Finally, the model configurations for the client and server will be generated in `serving_client_dir` and `serving_server_dir` respectively.
+
+### 3.
Start service
+Start the standard version or the fast version service according to your actual needs. The comparison of the two versions is shown in the table below:
+
+|version|characteristics|recommended scenarios|
+|-|-|-|
+|standard version|High stability, suitable for distributed deployment|Large throughput and cross-regional deployment|
+|fast version|Easy to deploy and fast to predict|Suitable for scenarios which require high prediction speed and fast iteration|
+
+#### Mode 1. Start the standard mode service
+
+```
+# start with CPU
+python -m paddle_serving_server.serve --model ocr_det_model --port 9293
+python ocr_web_server.py cpu
+
+# or, with GPU
+python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0
+python ocr_web_server.py gpu
+```
+
+#### Mode 2. Start the fast mode service
+
+```
+# start with CPU
+python ocr_local_server.py cpu
+
+# or, with GPU
+python ocr_local_server.py gpu
+```
+
+## Send prediction requests
+
+```
+python ocr_web_client.py
+```
+
+## Returned result format
+
+The returned result is a JSON string, e.g.:
+```
+{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}}
+```
+
+You can also print the readable result in `res`:
+```
+土地整治与土壤修复研究中心
+华南农业大学1素图
+```
+
+## User-defined service module modification
+
+The pre-processing and post-processing logic can be found in the `preprocess` and `postprocess` functions in `ocr_web_server.py` or `ocr_local_server.py`; both call the pre-processing/post-processing library for common CV models provided by `paddle_serving_app`.
+You can modify the corresponding code as needed.
+
+If you only want to start the detection service or the recognition service, execute the corresponding script referring to the following table, and indicate whether CPU or GPU is used in the start command parameters.
+ +| task | standard | fast | +| ---- | ----------------- | ------------------- | +| detection | det_web_server.py | det_local_server.py | +| recognition | rec_web_server.py | rec_local_server.py | + +More info can be found in [Paddle Serving](https://github.com/PaddlePaddle/Serving). diff --git a/deploy/slim/quantization/README.md b/deploy/slim/quantization/README.md index f7d87c83602f69ada46b35e7d63260fe8bc6e055..d1aa3d71e5254cf6b5b2be7fdf6943903d42fafd 100755 --- a/deploy/slim/quantization/README.md +++ b/deploy/slim/quantization/README.md @@ -1,21 +1,148 @@ > 运行示例前请先安装1.2.0或更高版本PaddleSlim + # 模型量化压缩教程 +压缩结果: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 序号 | 任务 | 模型 | 压缩策略 | 精度(自建中文数据集) | 耗时(ms) | 整体耗时(ms) | 加速比 | 整体模型大小(M) | 压缩比例 | 下载链接 |
| ---- | ---- | ---- | -------- | -------------------- | -------- | ------------ | ------ | --------------- | -------- | -------- |
| 0 | 检测 | MobileNetV3_DB | 无 | 61.7 | 224 | 375 | - | 8.6 | - | |
|   | 识别 | MobileNetV3_CRNN | 无 | 62.0 | 9.52 | | | | | |
| 1 | 检测 | SlimTextDet | PACT量化训练 | 62.1 | 195 | 348 | 8% | 2.8 | 67.82% | |
|   | 识别 | SlimTextRec | PACT量化训练 | 61.48 | 8.6 | | | | | |
| 2 | 检测 | SlimTextDet_quat_pruning | 剪裁+PACT量化训练 | 60.86 | 142 | 288 | 30% | 2.8 | 67.82% | |
|   | 识别 | SlimTextRec | PACT量化训练 | 61.48 | 8.6 | | | | | |
| 3 | 检测 | SlimTextDet_pruning | 剪裁 | 61.57 | 138 | 295 | 27% | 2.9 | 66.28% | |
|   | 识别 | SlimTextRec | PACT量化训练 | 61.48 | 8.6 | | | | | |
+
+
+
## 概述
+复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余。模型量化将全精度参数缩减为定点数来减少这种冗余,达到降低模型计算复杂度、提升模型推理性能的目的。
+
该示例使用PaddleSlim提供的[量化压缩API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)对OCR模型进行压缩。
在阅读该示例前,建议您先了解以下内容:

- [OCR模型的常规训练方法](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
-- [PaddleSlim使用文档](https://paddlepaddle.github.io/PaddleSlim/)
+- [PaddleSlim使用文档](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
+
+
## 安装PaddleSlim
-可按照[PaddleSlim使用文档](https://paddlepaddle.github.io/PaddleSlim/)中的步骤安装PaddleSlim。
+```bash
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+
+cd PaddleSlim
+
+python setup.py install
+```
+
+
+
+## 获取预训练模型
+
+[识别预训练模型下载地址]()
+
+[检测预训练模型下载地址]()

## 量化训练
+加载预训练模型并定义好量化策略后,即可对模型进行量化。量化相关功能的使用具体细节见:[模型量化](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/quantization_api.html)

进入PaddleOCR根目录,通过以下命令对模型进行量化:

@@ -25,10 +152,11 @@ python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global

+
## 导出模型
在得到量化训练保存的模型后,我们可以将其导出为inference_model,用于预测部署:
```bash
-python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_model
+python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
```
diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md
new file mode 100755
index 0000000000000000000000000000000000000000..4b8a2b23a254b143cd230c81a7e433d251e10ff2
--- /dev/null
+++ b/deploy/slim/quantization/README_en.md
@@ -0,0 +1,123 @@
+> PaddleSlim 1.2.0 or higher version should be installed before running this example.
+
+
+
+# Model compression tutorial (Quantization)
+
+Compress results:
+
| ID | Task | Model | Compress Strategy | Criterion(Chinese dataset) | Inference Time(ms) | Inference Time(Total model)(ms) | Acceleration Ratio | Model Size(MB) | Compress Ratio | Download Link |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| 0 | Detection | MobileNetV3_DB | None | 61.7 | 224 | 375 | - | 8.6 | - | |
|   | Recognition | MobileNetV3_CRNN | None | 62.0 | 9.52 | | | | | |
| 1 | Detection | SlimTextDet | PACT Quant Aware Training | 62.1 | 195 | 348 | 8% | 2.8 | 67.82% | |
|   | Recognition | SlimTextRec | PACT Quant Aware Training | 61.48 | 8.6 | | | | | |
| 2 | Detection | SlimTextDet_quat_pruning | Pruning+PACT Quant Aware Training | 60.86 | 142 | 288 | 30% | 2.8 | 67.82% | |
|   | Recognition | SlimTextRec | PACT Quant Aware Training | 61.48 | 8.6 | | | | | |
| 3 | Detection | SlimTextDet_pruning | Pruning | 61.57 | 138 | 295 | 27% | 2.9 | 66.28% | |
|   | Recognition | SlimTextRec | PACT Quant Aware Training | 61.48 | 8.6 | | | | | |
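The strategies in the table above map full-precision weights onto low-bit integers. As a toy, self-contained illustration of the core idea — symmetric 8-bit quantization with a single shared scale, not PaddleSlim's actual PACT quantization-aware training algorithm — consider:

```python
# Toy symmetric int8 quantization: round floats onto integer levels
# in [-128, 127] that share one scale factor. Illustration only; real
# quantization-aware training is far more involved.

def quantize(values, num_bits=8):
    """Map floats to integers with a shared scale derived from the max value."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [int(round(v / scale)) for v in values], scale

def dequantize(q_values, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in q_values]

weights = [0.91, -0.42, 0.003, -0.88]
q, scale = quantize(weights)
print(q)                      # small integers instead of 32-bit floats
print(dequantize(q, scale))   # close to, but not exactly, the originals
```

Each dequantized value differs from the original by at most half a quantization step (`scale / 2`); quantization-aware training learns to compensate for exactly this kind of rounding error, which is roughly why the compressed models above lose little accuracy.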
+
+
+
+## Overview
+
+Generally, a more complex model achieves better performance in the task, but it also leads to some redundancy in the model. Quantization is a technique that reduces this redundancy by reducing the full-precision data to fixed-point numbers, so as to reduce model calculation complexity and improve model inference performance.
+
+This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress the OCR model.
+
+It is recommended that you understand the following pages before reading this example:
+
+- [The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
+
+- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)
+
+
+
+## Install PaddleSlim
+
+```bash
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+
+cd PaddleSlim
+
+python setup.py install
+```
+
+
+## Download Pretrain Model
+
+[Download link of Detection pretrain model]()
+
+[Download link of recognition pretrain model]()
+
+
+## Quant-Aware Training
+
+After loading the pre-trained model, the model can be quantized once the quantization strategy is defined.
For specific details of the quantization method, see: [Model Quantization](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/quantization_api.html)
+
+Enter the PaddleOCR root directory, then perform model quantization with the following command:
+
+```bash
+python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
+```
+
+
+
+## Export inference model
+
+After getting the model saved by quantization training and fine-tuning, we can export it as an inference model for predictive deployment:
+
+```bash
+python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
+```
diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md
new file mode 100644
index 0000000000000000000000000000000000000000..9c2499f3d11a82c5246ace8dc96eef6dcc32e857
--- /dev/null
+++ b/doc/doc_ch/algorithm_overview.md
@@ -0,0 +1,78 @@
+
+## 算法介绍
+- [1.文本检测算法](#文本检测算法)
+- [2.文本识别算法](#文本识别算法)
+
+
+### 1.文本检测算法
+
+PaddleOCR开源的文本检测算法列表:
+- [x] DB([paper](https://arxiv.org/abs/1911.08947))(ppocr推荐)
+- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
+- [x] SAST([paper](https://arxiv.org/abs/1908.05498))
+
+在ICDAR2015文本检测公开数据集上,算法效果如下:
+
+|模型|骨干网络|precision|recall|Hmean|下载链接|
+|-|-|-|-|-|-|
+|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
+|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
+|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
+|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
+|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
+
+在Total-text文本检测公开数据集上,算法效果如下:
+|模型|骨干网络|precision|recall|Hmean|下载链接| +|-|-|-|-|-|-| +|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)| + +**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi) + + +使用[LSVT](./datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下: + +|模型|骨干网络|配置文件|预训练模型| +|-|-|-|-| +|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| +|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| + +* 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 + +PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./detection.md)。 + + +### 2.文本识别算法 + +PaddleOCR开源的文本识别算法列表: +- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))(ppocr推荐) +- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) +- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) +- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) +- [x] SRN([paper](https://arxiv.org/abs/2003.12294)) + +参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下: + +|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接| +|-|-|-|-|-| +|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)| +|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)| +|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)| +|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)| +|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)| 
+|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)| +|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)| +|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)| +|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)| + +**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。 +原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。 + +使用[LSVT](./datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下: + +|模型|骨干网络|配置文件|预训练模型| +|-|-|-|-| +|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| +|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| + +PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。 diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md index 84ffeb5d7f1008bfdb1eef269f050fbf4e6fb72e..c2b62edbee7ae855cd32b03cc0019027fb05f669 100644 --- a/doc/doc_ch/detection.md +++ b/doc/doc_ch/detection.md @@ -14,6 +14,15 @@ wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_l wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt ``` +PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `train_data/gen_label.py`, 这里以训练集为例: + +``` +# 将官网下载的标签文件转换为 train_icdar2015_label.txt +python gen_label.py --mode="det" --root_path="icdar_c4_train_imgs/" \ + --input_path="ch4_training_localization_transcription_gt" \ + --output_label="train_icdar2015_label.txt" +``` + 解压数据集和下载标注文件后,PaddleOCR/train_data/ 
有两个文件夹和两个文件,分别是: ``` /PaddleOCR/train_data/icdar2015/text_localization/ diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md index 431cdb5a4a9f24cf5862c159d51be2a07e9d4047..709a07515c316cdfd60b74f0b090d4baeeb290a7 100644 --- a/doc/doc_ch/inference.md +++ b/doc/doc_ch/inference.md @@ -24,6 +24,7 @@ inference 模型(`fluid.io.save_inference_model`保存的模型) - [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理) - [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理) - [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理) + - [5. 多语言模型的推理](#多语言模型的推理) - [四、方向分类模型推理](#方向识别模型推理) - [1. 方向分类模型推理](#方向分类模型推理) @@ -305,6 +306,22 @@ dict_character = list(self.character_str) python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path" ``` + +### 5. 多语言模型的推理 +如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果, +需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别: + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/korean_dict.txt" --vis_font_path="doc/korean.ttf" +``` +![](../imgs_words/korean/1.jpg) + +执行命令后,上图的预测结果为: +``` text +2020-09-19 16:15:05,076-INFO: index: [205 206 38 39] +2020-09-19 16:15:05,077-INFO: word : 바탕으로 +2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535 +``` ## 四、方向分类模型推理 diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index 497140592ea4f4cbfe2000146b6903844f3f9872..ab47db21ef7e31c53d018a9741c08f24eaf83ca2 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -7,22 +7,22 @@ - [3. 
多语言识别模型](#多语言识别模型) - [三、文本方向分类模型](#文本方向分类模型) -PaddleOCR提供的可下载模型包括`预测模型`、`训练模型`、`预训练模型`、`slim模型`,模型区别说明如下: +PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训练模型`、`slim模型`,模型区别说明如下: |模型类型|模型格式|简介| |-|-|-| -|预测模型|model、params|用于python预测引擎推理,[详情](./inference.md)| +|推理模型|model、params|用于python预测引擎推理,[详情](./inference.md)| |训练模型、预训练模型|\*.pdmodel、\*.pdopt、\*.pdparams|训练过程中保存的checkpoints模型,保存的是模型的参数,多用于模型指标评估和恢复训练| |slim模型|-|用于lite部署| ### 一、文本检测模型 -|模型名称|模型简介|预测模型大小|下载地址| +|模型名称|模型简介|推理模型大小|下载地址| |-|-|-|-| -|ch_ppocr_mobile_slim_v1.1_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|-|[预测模型]() / [训练模型]() / [slim模型]()| -|ch_ppocr_mobile_v1.1_det|原始超轻量模型,支持中英文、多语种文本检测|2.6M|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)| -|ch_ppocr_server_v1.1_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|47.2M|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)| +|ch_ppocr_mobile_slim_v1.1_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|1.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)| +|ch_ppocr_mobile_v1.1_det|原始超轻量模型,支持中英文、多语种文本检测|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)| +|ch_ppocr_server_v1.1_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|47.2M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)| @@ -30,41 +30,42 @@ PaddleOCR提供的可下载模型包括`预测模型`、`训练模型`、`预训 #### 1. 
中文识别模型 -|模型名称|模型简介|预测模型大小|下载地址| +|模型名称|模型简介|推理模型大小|下载地址| |-|-|-|-| -|ch_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|-|[预测模型]() / [训练模型]() / [slim模型]()| -|ch_ppocr_mobile_v1.1_rec|原始超轻量模型,支持中英文、数字识别|4.6M|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar)| -|ch_ppocr_server_v1.1_rec|通用模型,支持中英文、数字识别|105M|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar)| +|ch_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|1.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)| +|ch_ppocr_mobile_v1.1_rec|原始超轻量模型,支持中英文、数字识别|4.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar)| +|ch_ppocr_server_v1.1_rec|通用模型,支持中英文、数字识别|105M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar)| **说明:** `训练模型`是基于预训练模型在真实数据与竖排合成文本数据上finetune得到的模型,在真实应用场景中有着更好的表现,`预训练模型`则是直接基于全量真实数据与合成数据训练得到,更适合用于在自己的数据集上finetune。 #### 2. 
英文识别模型 -|模型名称|模型简介|预测模型大小|下载地址| +|模型名称|模型简介|推理模型大小|下载地址| |-|-|-|-| -|en_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|-|[预测模型]() / [训练模型]() / [slim模型]()| -|en_ppocr_mobile_v1.1_rec|原始超轻量模型,支持英文、数字识别|2.0M|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar)| +|en_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|0.9M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb)| +|en_ppocr_mobile_v1.1_rec|原始超轻量模型,支持英文、数字识别|2.0M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar)| #### 3. 多语言识别模型(更多语言持续更新中...) -|模型名称|模型简介|预测模型大小|下载地址| +|模型名称|模型简介|推理模型大小|下载地址| |-|-|-|-| -|-|法文识别|-|[预测模型]() / [训练模型]()| -|-|德文识别|-|[预测模型]() / [训练模型]()| -|-|韩文识别|-|[预测模型]() / [训练模型]()| -|-|日文识别|-|[预测模型]() / [训练模型]()| +| french_ppocr_mobile_v1.1_rec |法文识别|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar)| +| german_ppocr_mobile_v1.1_rec |德文识别|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar)| +| korean_ppocr_mobile_v1.1_rec |韩文识别|3.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar)| +| japan_ppocr_mobile_v1.1_rec |日文识别|3.7M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) 
/ [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar)| + ### 三、文本方向分类模型 -|模型名称|模型简介|预测模型大小|下载地址| +|模型名称|模型简介|推理模型大小|下载地址| |-|-|-|-| -|ch_ppocr_mobile_v1.1_cls_quant|slim量化版模型|-|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim模型]()| -|ch_ppocr_mobile_v1.1_cls|原始模型|850kb|[预测模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar)| +|ch_ppocr_mobile_v1.1_cls_quant|slim量化版模型|0.5M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim模型]()| +|ch_ppocr_mobile_v1.1_cls|原始模型|850kb|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar)| ## OCR模型列表(V1.0,7月16日更新) |模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址| |-|-|-|-|-| -|chinese_db_crnn_mobile|8.6M超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) -|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / 
[预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) +|chinese_db_crnn_mobile|8.6M超轻量级中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar) +|chinese_db_crnn_server|通用中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar) diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index 1920be56d1a05bb2f7ade944fd225e690fb484a4..c8955f7fe1c7022cf68155be330fad307c68fe43 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -44,6 +44,13 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt ``` +PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `train_data/gen_label.py`, 这里以训练集为例: + +``` +# 将官网下载的标签文件转换为 rec_gt_label.txt +python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt" +``` + 最终训练集应有如下文件结构: ``` |-train_data @@ -201,7 +208,19 @@ Optimizer: ``` 
**注意,预测/评估时的配置文件请务必与训练一致。** +- 小语种 + +PaddleOCR也提供了多语言的, `configs/rec/multi_languages` 路径下的提供了多语言的配置文件,目前PaddleOCR支持的多语言算法有: + +| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | +| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 | +| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | +| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | +| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | +| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 | +多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体和测试数据可以在[百度网盘]()上下载。 ### 评估 diff --git a/doc/doc_ch/tree.md b/doc/doc_ch/tree.md new file mode 100644 index 0000000000000000000000000000000000000000..f730d8f01fae467f49a03d68d931eb4fda526626 --- /dev/null +++ b/doc/doc_ch/tree.md @@ -0,0 +1,208 @@ +# 整体目录结构 + +PaddleOCR 的整体目录结构介绍如下: + +``` +PaddleOCR +├── configs // 配置文件,可通过yml文件选择模型结构并修改超参 +│ ├── cls // 方向分类器相关配置文件 +│ │ ├── cls_mv3.yml // 训练配置相关,包括骨干网络、head、loss、优化器 +│ │ └── cls_reader.yml // 数据读取相关,数据读取方式、数据存储路径 +│ ├── det // 检测相关配置文件 +│ │ ├── det_db_icdar15_reader.yml // 数据读取 +│ │ ├── det_mv3_db.yml // 训练配置 +│ │ ... +│ └── rec // 识别相关配置文件 +│ ├── rec_benchmark_reader.yml // LMDB 格式数据读取相关 +│ ├── rec_chinese_common_train.yml // 通用中文训练配置 +│ ├── rec_icdar15_reader.yml // simple 数据读取相关,包括数据读取函数、数据路径、标签文件 +│ ... +├── deploy // 部署相关 +│ ├── android_demo // android_demo +│ │ ... 
+│ ├── cpp_infer // C++ infer +│ │ ├── CMakeLists.txt // Cmake 文件 +│ │ ├── docs // 说明文档 +│ │ │ └── windows_vs2019_build.md +│ │ ├── include // 头文件 +│ │ │ ├── clipper.h // clipper 库 +│ │ │ ├── config.h // 预测配置 +│ │ │ ├── ocr_cls.h // 方向分类器 +│ │ │ ├── ocr_det.h // 文字检测 +│ │ │ ├── ocr_rec.h // 文字识别 +│ │ │ ├── postprocess_op.h // 检测后处理 +│ │ │ ├── preprocess_op.h // 检测预处理 +│ │ │ └── utility.h // 工具 +│ │ ├── readme.md // 说明文档 +│ │ ├── ... +│ │ ├── src // 源文件 +│ │ │ ├── clipper.cpp +│ │ │ ├── config.cpp +│ │ │ ├── main.cpp +│ │ │ ├── ocr_cls.cpp +│ │ │ ├── ocr_det.cpp +│ │ │ ├── ocr_rec.cpp +│ │ │ ├── postprocess_op.cpp +│ │ │ ├── preprocess_op.cpp +│ │ │ └── utility.cpp +│ │ └── tools // 编译、执行脚本 +│ │ ├── build.sh // 编译脚本 +│ │ ├── config.txt // 配置文件 +│ │ └── run.sh // 测试启动脚本 +│ ├── docker +│ │ └── hubserving +│ │ ├── cpu +│ │ │ └── Dockerfile +│ │ ├── gpu +│ │ │ └── Dockerfile +│ │ ├── README_cn.md +│ │ ├── README.md +│ │ └── sample_request.txt +│ ├── hubserving // hubserving +│ │ ├── ocr_det // 文字检测 +│ │ │ ├── config.json // serving 配置 +│ │ │ ├── __init__.py +│ │ │ ├── module.py // 预测模型 +│ │ │ └── params.py // 预测参数 +│ │ ├── ocr_rec // 文字识别 +│ │ │ ├── config.json +│ │ │ ├── __init__.py +│ │ │ ├── module.py +│ │ │ └── params.py +│ │ └── ocr_system // 系统预测 +│ │ ├── config.json +│ │ ├── __init__.py +│ │ ├── module.py +│ │ └── params.py +│ ├── imgs // 预测图片 +│ │ ├── cpp_infer_pred_12.png +│ │ └── demo.png +│ ├── ios_demo // ios demo +│ │ ... +│ ├── lite // lite 部署 +│ │ ├── cls_process.cc // 方向分类器数据处理 +│ │ ├── cls_process.h +│ │ ├── config.txt // 检测配置参数 +│ │ ├── crnn_process.cc // crnn数据处理 +│ │ ├── crnn_process.h +│ │ ├── db_post_process.cc // db数据处理 +│ │ ├── db_post_process.h +│ │ ├── Makefile // 编译文件 +│ │ ├── ocr_db_crnn.cc // 串联预测 +│ │ ├── prepare.sh // 数据准备 +│ │ ├── readme.md // 说明文档 +│ │ ... 
+│ ├── pdserving // pdserving 部署 +│ │ ├── det_local_server.py // 检测 快速版,部署方便预测速度快 +│ │ ├── det_web_server.py // 检测 完整版,稳定性高分布式部署 +│ │ ├── ocr_local_server.py // 检测+识别 快速版 +│ │ ├── ocr_web_client.py // 客户端 +│ │ ├── ocr_web_server.py // 检测+识别 完整版 +│ │ ├── readme.md // 说明文档 +│ │ ├── rec_local_server.py // 识别 快速版 +│ │ └── rec_web_server.py // 识别 完整版 +│ └── slim +│ └── quantization // 量化相关 +│ ├── export_model.py // 导出模型 +│ ├── quant.py // 量化 +│ └── README.md // 说明文档 +├── doc // 文档教程 +│ ... +├── paddleocr.py +├── ppocr // 网络核心代码 +│ ├── data // 数据处理 +│ │ ├── cls // 方向分类器 +│ │ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,读取数据并组成batch +│ │ │ └── randaugment.py // 随机数据增广操作 +│ │ ├── det // 检测 +│ │ │ ├── data_augment.py // 数据增广操作 +│ │ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,读取数据并组成batch +│ │ │ ├── db_process.py // db 数据处理 +│ │ │ ├── east_process.py // east 数据处理 +│ │ │ ├── make_border_map.py // 生成边界图 +│ │ │ ├── make_shrink_map.py // 生成收缩图 +│ │ │ ├── random_crop_data.py // 随机切割 +│ │ │ └── sast_process.py // sast 数据处理 +│ │ ├── reader_main.py // 数据读取器主函数 +│ │ └── rec // 识别 +│ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,包含 LMDB_Reader 和 Simple_Reader +│ │ └── img_tools.py // 数据处理相关,包括数据归一化、扰动 +│ ├── __init__.py +│ ├── modeling // 组网相关 +│ │ ├── architectures // 模型架构,定义模型所需的各个模块 +│ │ │ ├── cls_model.py // 方向分类器 +│ │ │ ├── det_model.py // 检测 +│ │ │ └── rec_model.py // 识别 +│ │ ├── backbones // 骨干网络 +│ │ │ ├── det_mobilenet_v3.py // 检测 mobilenet_v3 +│ │ │ ├── det_resnet_vd.py +│ │ │ ├── det_resnet_vd_sast.py +│ │ │ ├── rec_mobilenet_v3.py // 识别 mobilenet_v3 +│ │ │ ├── rec_resnet_fpn.py +│ │ │ └── rec_resnet_vd.py +│ │ ├── common_functions.py // 公共函数 +│ │ ├── heads // 头函数 +│ │ │ ├── cls_head.py // 分类头 +│ │ │ ├── det_db_head.py // db 检测头 +│ │ │ ├── det_east_head.py // east 检测头 +│ │ │ ├── det_sast_head.py // sast 检测头 +│ │ │ ├── rec_attention_head.py // 识别 attention +│ │ │ ├── rec_ctc_head.py // 识别 ctc +│ │ │ ├── rec_seq_encoder.py // 识别 序列编码 +│ │ │ ├── rec_srn_all_head.py // 识别 srn 相关 +│ │ 
│ └── self_attention // srn attention +│ │ │ └── model.py +│ │ ├── losses // 损失函数 +│ │ │ ├── cls_loss.py // 方向分类器损失函数 +│ │ │ ├── det_basic_loss.py // 检测基础loss +│ │ │ ├── det_db_loss.py // DB loss +│ │ │ ├── det_east_loss.py // EAST loss +│ │ │ ├── det_sast_loss.py // SAST loss +│ │ │ ├── rec_attention_loss.py // attention loss +│ │ │ ├── rec_ctc_loss.py // ctc loss +│ │ │ └── rec_srn_loss.py // srn loss +│ │ └── stns // 空间变换网络 +│ │ └── tps.py // TPS 变换 +│ ├── optimizer.py // 优化器 +│ ├── postprocess // 后处理 +│ │ ├── db_postprocess.py // DB 后处理 +│ │ ├── east_postprocess.py // East 后处理 +│ │ ├── lanms // lanms 相关 +│ │ │ ... +│ │ ├── locality_aware_nms.py // nms +│ │ └── sast_postprocess.py // sast 后处理 +│ └── utils // 工具 +│ ├── character.py // 字符处理,包括对文本的编码和解码,计算预测准确率 +│ ├── check.py // 参数加载检查 +│ ├── ic15_dict.txt // 英文数字字典,区分大小写 +│ ├── ppocr_keys_v1.txt // 中文字典,用于训练中文模型 +│ ├── save_load.py // 模型保存和加载函数 +│ ├── stats.py // 统计 +│ └── utility.py // 工具函数,包含输入参数是否合法等相关检查工具 +├── README_en.md // 说明文档 +├── README.md +├── requirments.txt // 安装依赖 +├── setup.py // whl包打包脚本 +└── tools // 启动工具 + ├── eval.py // 评估函数 + ├── eval_utils // 评估工具 + │ ├── eval_cls_utils.py // 分类相关 + │ ├── eval_det_iou.py // 检测 iou 相关 + │ ├── eval_det_utils.py // 检测相关 + │ ├── eval_rec_utils.py // 识别相关 + │ └── __init__.py + ├── export_model.py // 导出 infer 模型 + ├── infer // 基于预测引擎预测 + │ ├── predict_cls.py + │ ├── predict_det.py + │ ├── predict_rec.py + │ ├── predict_system.py + │ └── utility.py + ├── infer_cls.py // 基于训练引擎 预测分类 + ├── infer_det.py // 基于训练引擎 预测检测 + ├── infer_rec.py // 基于训练引擎 预测识别 + ├── program.py // 整体流程 + ├── test_hubserving.py + └── train.py // 启动训练 + +``` diff --git a/doc/doc_ch/tricks.md b/doc/doc_ch/tricks.md deleted file mode 100644 index b6852bc95aa3a8eefe9597abc0e173f4515fa358..0000000000000000000000000000000000000000 --- a/doc/doc_ch/tricks.md +++ /dev/null @@ -1,68 +0,0 @@ -## 中文OCR训练预测技巧 -这里整理了一些中文OCR训练预测技巧,持续更新中,欢迎各位小伙伴贡献OCR炼丹秘籍~ -- [更换骨干网络](#更换骨干网络) -- [中文长文本识别](#中文长文本识别) -- 
[空格识别](#空格识别) - - -#### 1、更换骨干网络 -- **问题描述** - - 目前PaddleOCR中使用的骨干网络有ResNet_vd系列和MobileNetV3系列,更换骨干网络是否有助于效果提升?更换时需要注意什么? - -- **炼丹建议** - - - 无论是文字检测,还是文字识别,骨干网络的选择是预测效果和预测效率的权衡。一般,选择更大规模的骨干网络,例如ResNet101_vd,则检测或识别更准确,但预测耗时相应也会增加。而选择更小规模的骨干网络,例如MobileNetV3_small_x0_35,则预测更快,但检测或识别的准确率会大打折扣。幸运的是不同骨干网络的检测或识别效果与在ImageNet数据集图像1000分类任务效果正相关。[**飞桨图像分类套件PaddleClas**](https://github.com/PaddlePaddle/PaddleClas)汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构,在上述图像分类任务的top1识别准确率,GPU(V100和T4)和CPU(骁龙855)的预测耗时以及相应的[**117个预训练模型下载地址**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。 - - 文字检测骨干网络的替换,主要是确定类似与ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。 - - 文字识别的骨干网络的替换,需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大,因此高度下降频率少一些,宽度下降频率多一些。可以参考PaddleOCR中[MobileNetV3骨干网络](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py)的改动。 - - -#### 2、中文长文本识别 -- **问题描述** - - 中文识别模型训练时分辨率最大是[3,32,320],如果待识别的文本图像太长,如下图所示,该如何适配? - -
- -
- -- **炼丹建议** - - 在中文识别模型训练时,并不是采用直接将训练样本缩放到[3,32,320]进行训练,而是先等比例缩放图像,保证图像高度为32,宽度不足320的部分补0,宽高比大于10的样本直接丢弃。预测时,如果是单张图像预测,则按上述操作直接对图像缩放,不做宽度320的限制。如果是多张图预测,则采用batch方式预测,每个batch的宽度动态变换,采用这个batch中最长宽度。[参考代码如下](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/predict_rec.py): - - ``` - def resize_norm_img(self, img, max_wh_ratio): - imgC, imgH, imgW = self.rec_image_shape - assert imgC == img.shape[2] - if self.character_type == "ch": - imgW = int((32 * max_wh_ratio)) - h, w = img.shape[:2] - ratio = w / float(h) - if math.ceil(imgH * ratio) > imgW: - resized_w = imgW - else: - resized_w = int(math.ceil(imgH * ratio)) - resized_image = cv2.resize(img, (resized_w, imgH)) - resized_image = resized_image.astype('float32') - resized_image = resized_image.transpose((2, 0, 1)) / 255 - resized_image -= 0.5 - resized_image /= 0.5 - padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) - padding_im[:, :, 0:resized_w] = resized_image - return padding_im - ``` - - -#### 3、空格识别 -- **问题描述** - - 如下图所示,对于中英文混合场景,为了便于阅读和使用识别结果,往往需要将单词之间的空格识别出来,这种情况如何适配? - -
- -
-
-- **炼丹建议**
-    - 空格识别可以考虑以下两种方案:(1)优化文本检测算法。检测结果在空格处将文本断开。这种方案在检测数据标注时,需要将含有空格的文本行分成好多段。(2)优化文本识别算法。在识别字典里面引入空格字符,然后在识别的训练数据中,如果用空行,进行标注。此外,合成数据时,通过拼接训练数据,生成含有空格的文本。PaddleOCR目前采用的是第二种方案。
-    
\ No newline at end of file
diff --git a/doc/doc_ch/update.md b/doc/doc_ch/update.md
index 23a47df580da065af0ab62aca2c50e507f564f05..55442c8dfcaee815d52ef73718aeb0cacf7a4b4a 100644
--- a/doc/doc_ch/update.md
+++ b/doc/doc_ch/update.md
@@ -1,4 +1,6 @@
 # 更新
+- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,效果媲美商业。[模型下载](./models_list.md)
+- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./FAQ.md)
 - 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md)
 - 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519)
 - 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294)
diff --git a/doc/doc_ch/visualization.md b/doc/doc_ch/visualization.md
index 5a711fe93cfd7959731a5ec73cc74120b175347a..fca075914feb6afd159c5ea6355d3c7bb6842233 100644
--- a/doc/doc_ch/visualization.md
+++ b/doc/doc_ch/visualization.md
@@ -1,45 +1,47 @@
 # 效果展示
-- [超轻量级中文OCR效果展示](#超轻量级中文OCR)
-- [通用中文OCR效果展示](#通用中文OCR)
-- [支持空格的中文OCR效果展示](#支持空格的中文OCR)
+- PP-OCR 1.1系列模型效果
+    - [通用ppocr_server_1.1效果展示](#通用ppocr_server_1.1效果展示)
+    - [通用ppocr_mobile_1.1效果展示(待补充)]()
+- PP-OCR 1.0系列模型效果
+    - [超轻量ppocr_mobile_1.0效果展示](#超轻量ppocr_mobile_1.0效果展示)
+    - [通用ppocr_server_1.0效果展示](#通用ppocr_server_1.0效果展示)
 
-
-## 超轻量级中文OCR效果展示
+
+## 通用ppocr_server_1.1效果展示
<div align="center">
- + + + + + +
-
- -
-
- -
+ + +## 超轻量ppocr_mobile_1.0效果展示
- +
- +
- +
-
- -
- -## 通用中文OCR效果展示 + +## 通用ppocr_server_1.0效果展示
@@ -52,16 +54,3 @@
- - -## 支持空格的中文OCR效果展示 - -### 轻量级模型 -
- -
- -### 通用模型 -
- -
diff --git a/doc/doc_ch/whl.md b/doc/doc_ch/whl.md
index 657f9837a768f6753b68b5e937134e10440e382d..46796ce64a60f12db9bbfbdd7b16ff77238c1831 100644
--- a/doc/doc_ch/whl.md
+++ b/doc/doc_ch/whl.md
@@ -19,7 +19,9 @@ pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本
 * 检测+分类+识别全流程
 ```python
 from paddleocr import PaddleOCR, draw_ocr
-ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
+# PaddleOCR目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换
+# 参数依次为`ch`, `en`, `french`, `german`, `korean`, `japan`。
+ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
 img_path = 'PaddleOCR/doc/imgs/11.jpg'
 result = ocr.ocr(img_path, cls=True)
 for line in result:
diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md
index 9f37ca8d24c75ba80a143233cdc0a3321fee6a4f..401d7a9ad479716a6d6694ca1f432a2c934def88 100644
--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
@@ -73,7 +73,7 @@ You can also use `-o` to change the training parameters without modifying the ym
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
 ```
 
-#### load trained model and conntinue training
+#### load trained model and continue training
 If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded. For example:
diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..8d1192bafb7a80a882f471489f8fdeb53e2abc67
--- /dev/null
+++ b/doc/doc_en/models_list_en.md
@@ -0,0 +1,70 @@
+## OCR model list (V1.1, updated on 9.22)
+
+- [1. Text Detection Model](#Detection)
+- [2. Text Recognition Model](#Recognition)
+    - [Chinese Recognition Model](#Chinese)
+    - [English Recognition Model](#English)
+    - [Multilingual Recognition Model](#Multilingual)
+- [3. 
Text Angle Classification Model](#Angle)
+
+The downloadable models provided by PaddleOCR include the `inference model`, `trained model`, `pre-trained model` and `slim model`. The differences between them are as follows:
+
+|model type|model format|description|
+|-|-|-|
+|inference model|model, params|Used for inference with the Python prediction engine. [detail](./inference_en.md)|
+|trained model / pre-trained model|\*.pdmodel, \*.pdopt, \*.pdparams|The checkpoint model saved during training, which stores the model parameters; mostly used for model evaluation and for resuming training.|
+|slim model|-|Generally used for Lite deployment|
+
+<a name="Detection"></a>
+### 1. Text Detection Model
+|model name|description|model size|download|
+|-|-|-|-|
+|ch_ppocr_mobile_slim_v1.1_det|Slim pruned lightweight model, supporting Chinese, English and multilingual text detection|1.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|
+|ch_ppocr_mobile_v1.1_det|Original lightweight model, supporting Chinese, English and multilingual text detection|2.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|
+|ch_ppocr_server_v1.1_det|General model, which is larger than the lightweight model but achieves better performance|47.2M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)|
+
+<a name="Recognition"></a>
+### 2. 
Text Recognition Model
+
+<a name="Chinese"></a>
+#### Chinese Recognition Model
+|model name|description|model size|download|
+|-|-|-|-|
+|ch_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|1.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|
+|ch_ppocr_mobile_v1.1_rec|Original lightweight model, supporting Chinese, English and number recognition|4.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar)|
+|ch_ppocr_server_v1.1_rec|General model, supporting Chinese, English and number recognition|105M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar)|
+
+**Note:** The `trained model` is fine-tuned on the `pre-trained model` with real data and synthesized vertical text data, which achieves better performance in real scenes. The `pre-trained model` is trained directly on the full set of real and synthesized data, which makes it more suitable for fine-tuning on your own dataset. 
+
+<a name="English"></a>
+#### English Recognition Model
+|model name|description|model size|download|
+|-|-|-|-|
+|en_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|0.9M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb)|
+|en_ppocr_mobile_v1.1_rec|Original lightweight model, supporting English and number recognition|2.0M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar)|
+
+<a name="Multilingual"></a>
+#### Multilingual Recognition Model (Updating...)
+|model name|description|model size|download|
+|-|-|-|-|
+| french_ppocr_mobile_v1.1_rec |Lightweight model for French recognition|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar)|
+| german_ppocr_mobile_v1.1_rec |Lightweight model for German recognition|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar)|
+| korean_ppocr_mobile_v1.1_rec |Lightweight model for Korean recognition|3.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar)|
+| japan_ppocr_mobile_v1.1_rec |Lightweight model for Japanese recognition|3.7M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar)|
+
+<a name="Angle"></a>
+### 3. Text Angle Classification Model
+|model name|description|model size|download|
+|-|-|-|-|
+|ch_ppocr_mobile_v1.1_cls_quant|Slim quantized model|0.5M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim model]()|
+|ch_ppocr_mobile_v1.1_cls|Original model|0.85M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar)|
+
+
+## OCR model list (V1.0, updated on 7.16)
+|model name|description|detection model|recognition model|recognition model supporting space recognition|
+|-|-|-|-|-|
+|chinese_db_crnn_mobile|8.6M lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
+|chinese_db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [trained 
model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
diff --git a/doc/doc_en/whl_en.md b/doc/doc_en/whl_en.md
index b62e5454e82a9bf4f8242b94b0d37544d3796c13..4049d9dcb2d52eb5f610d5f02017a9d2d4f14f47 100644
--- a/doc/doc_en/whl_en.md
+++ b/doc/doc_en/whl_en.md
@@ -17,12 +17,16 @@ pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of padd
 * detection classification and recognition
 ```python
 from paddleocr import PaddleOCR,draw_ocr
+# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
+# You can set the parameter `lang` as `ch`, `en`, `french`, `german`, `korean`, `japan`
+# to switch the language model accordingly.
 ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
 img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
 result = ocr.ocr(img_path, cls=True)
 for line in result:
     print(line)
+
 # draw result
 from PIL import Image
 image = Image.open(img_path).convert('RGB')
diff --git a/doc/french.ttf b/doc/french.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..ab68fb197d4479b3b6dec6e85bd5cbaf433a87c5
Binary files /dev/null and b/doc/french.ttf differ
diff --git a/doc/german.ttf b/doc/german.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..ab68fb197d4479b3b6dec6e85bd5cbaf433a87c5
Binary files /dev/null and b/doc/german.ttf differ
diff --git a/doc/imgs_results/1101.jpg b/doc/imgs_results/1101.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fa8d809a9b133ca09e4265355493e5c60e311e44
Binary files /dev/null and b/doc/imgs_results/1101.jpg differ
diff --git a/doc/imgs_results/1102.jpg b/doc/imgs_results/1102.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6988b12c4b836e88b67897a7b7141e12e236e7c0
Binary files /dev/null and b/doc/imgs_results/1102.jpg differ
diff --git a/doc/imgs_results/1103.jpg b/doc/imgs_results/1103.jpg
new file mode 100644
index 
0000000000000000000000000000000000000000..3437f60b8e587b0fda9c88aa37c001a68ace59b4 Binary files /dev/null and b/doc/imgs_results/1103.jpg differ diff --git a/doc/imgs_results/1104.jpg b/doc/imgs_results/1104.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9297be0787ad6cc89c43acfcd1abd010c512c45b Binary files /dev/null and b/doc/imgs_results/1104.jpg differ diff --git a/doc/imgs_results/1105.jpg b/doc/imgs_results/1105.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6280e5eec8c05125bcde2a171d767a3fc3f3ea4d Binary files /dev/null and b/doc/imgs_results/1105.jpg differ diff --git a/doc/imgs_results/1110.jpg b/doc/imgs_results/1110.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ff004c864047ecb1cefcd02e0eea561c415a3a7b Binary files /dev/null and b/doc/imgs_results/1110.jpg differ diff --git a/doc/imgs_results/1112.jpg b/doc/imgs_results/1112.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c2d87fe5936abf2032f125940b5e99ec8d030da7 Binary files /dev/null and b/doc/imgs_results/1112.jpg differ diff --git a/doc/imgs_words/french/1.jpg b/doc/imgs_words/french/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..077ca28e70b74ed07fa637011c80219aecc448d5 Binary files /dev/null and b/doc/imgs_words/french/1.jpg differ diff --git a/doc/imgs_words/french/2.jpg b/doc/imgs_words/french/2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..38a73caa621710a7eb7378603e0152ba9c14dd41 Binary files /dev/null and b/doc/imgs_words/french/2.jpg differ diff --git a/doc/imgs_words/german/1.jpg b/doc/imgs_words/german/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d26ec9ed14de65c2d27e37693ff0da133e774b94 Binary files /dev/null and b/doc/imgs_words/german/1.jpg differ diff --git a/doc/imgs_words/japan/1.jpg b/doc/imgs_words/japan/1.jpg new file mode 100644 index 
0000000000000000000000000000000000000000..684879749764a1b6063da32d7910bff911e855f4 Binary files /dev/null and b/doc/imgs_words/japan/1.jpg differ diff --git a/doc/imgs_words/korean/1.jpg b/doc/imgs_words/korean/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..48a89389ae880783a39a13e9b06a861b88948fba Binary files /dev/null and b/doc/imgs_words/korean/1.jpg differ diff --git a/doc/imgs_words/korean/2.jpg b/doc/imgs_words/korean/2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b24f28914d574be44e147943d906f8634f149ed5 Binary files /dev/null and b/doc/imgs_words/korean/2.jpg differ diff --git a/doc/japan.ttc b/doc/japan.ttc new file mode 100644 index 0000000000000000000000000000000000000000..ad68243b968fc87b207928594c585039859b75a9 Binary files /dev/null and b/doc/japan.ttc differ diff --git a/doc/korean.ttf b/doc/korean.ttf new file mode 100644 index 0000000000000000000000000000000000000000..e638ce37f67ff1cd9babf73387786eaeb5c52968 Binary files /dev/null and b/doc/korean.ttf differ diff --git a/paddleocr.py b/paddleocr.py index 55ca87ac93996311d2760b9e2b63530acc7e5092..7e9b2402ad792b4d690b1147f042203df46872a5 100644 --- a/paddleocr.py +++ b/paddleocr.py @@ -46,6 +46,26 @@ model_urls = { 'url': 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar', 'dict_path': './ppocr/utils/ic15_dict.txt' + }, + 'french': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/french_dict.txt' + }, + 'german': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/german_dict.txt' + }, + 'korean': { + 'url': + 'https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/korean_dict.txt' + }, + 'japan': { + 'url': + 
'https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar', + 'dict_path': './ppocr/utils/japan_dict.txt' } }, 'cls': @@ -165,8 +185,9 @@ class PaddleOCR(predict_system.TextSystem): postprocess_params.__dict__.update(**kwargs) self.use_angle_cls = postprocess_params.use_angle_cls lang = postprocess_params.lang - assert lang in model_urls['rec'], 'param lang must in {}'.format( - model_urls['rec'].keys()) + assert lang in model_urls[ + 'rec'], 'param lang must in {}, but got {}'.format( + model_urls['rec'].keys(), lang) if postprocess_params.rec_char_dict_path is None: postprocess_params.rec_char_dict_path = model_urls['rec'][lang][ 'dict_path'] diff --git a/ppocr/utils/character.py b/ppocr/utils/character.py index b4b2021e02c9905623fd9fad5c9673543569c1c2..97237cfa71a3d3ae0684ecbefbb2511f09bcd3a2 100755 --- a/ppocr/utils/character.py +++ b/ppocr/utils/character.py @@ -29,7 +29,9 @@ class CharacterOps(object): if self.character_type == "en": self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) - elif self.character_type == "ch": + elif self.character_type in [ + "ch", 'japan', 'korean', 'french', 'german' + ]: character_dict_path = config['character_dict_path'] add_space = False if 'use_space_char' in config: @@ -166,7 +168,7 @@ def cal_predicts_accuracy_srn(char_ops, cur_label = [] cur_pred = [] for j in range(max_text_len): - if labels[j + i * max_text_len] != int(char_num-1): #0 + if labels[j + i * max_text_len] != int(char_num - 1): #0 cur_label.append(labels[j + i * max_text_len][0]) else: break @@ -178,7 +180,8 @@ def cal_predicts_accuracy_srn(char_ops, elif j == len(cur_label) and j == max_text_len: acc_num += 1 break - elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(char_num-1): + elif j == len(cur_label) and preds[j + i * max_text_len][0] == int( + char_num - 1): acc_num += 1 break acc = acc_num * 1.0 / img_num diff --git a/ppocr/utils/french_dict.txt 
b/ppocr/utils/french_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..c7cd8ec503b3c70e03b72795761a858bb9a0d34d --- /dev/null +++ b/ppocr/utils/french_dict.txt @@ -0,0 +1,118 @@ +! +" +% +& +' +( +) ++ +, +- +. +/ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; +? +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z +[ +] +a +b +c +d +e +f +g +h +i +j +k +l +m +n +o +p +q +r +s +t +u +v +w +x +y +z +« +³ +µ +º +» +À +Á +Â +Å +É +Ê +Î +Ö +ß +à +á +â +ä +å +æ +ç +è +é +ê +ë +í +î +ï +ñ +ò +ó +ô +ö +ø +ù +ú +û +ü + diff --git a/ppocr/utils/german_dict.txt b/ppocr/utils/german_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..30c4d4218e8a77386db912e24117b1f197466e83 --- /dev/null +++ b/ppocr/utils/german_dict.txt @@ -0,0 +1,131 @@ +! +" +$ +% +& +' +( +) ++ +, +- +. +/ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; +> +? +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z +[ +] +a +b +c +d +e +f +g +h +i +j +k +l +m +n +o +p +q +r +s +t +u +v +w +x +y +z +£ +§ +­ +² +´ +µ +· +º +¼ +½ +¿ +À +Á +Ä +Å +Ç +É +Í +Ï +Ô +Ö +Ø +Ù +Ü +ß +à +á +â +ã +ä +å +æ +ç +è +é +ê +ë +í +ï +ñ +ò +ó +ô +ö +ø +ù +ú +û +ü + diff --git a/ppocr/utils/ic15_dict.txt b/ppocr/utils/ic15_dict.txt index 71043689051fb5a2da516b2e005d1d9b0fdecfb3..6fbd99f46acca8391a5e86ae546c637399204506 100644 --- a/ppocr/utils/ic15_dict.txt +++ b/ppocr/utils/ic15_dict.txt @@ -34,3 +34,30 @@ w x y z +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z + diff --git a/ppocr/utils/japan_dict.txt b/ppocr/utils/japan_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..ddcc89f7c3ef011274862b1d573d079b76129ff2 --- /dev/null +++ b/ppocr/utils/japan_dict.txt @@ -0,0 +1,4399 @@ +誰 +が +一 +番 +に +着 +く +か +私 +は +分 +り +ま +せ +ん +。 +多 +の +動 +物 +人 +間 +よ +っ +て +滅 +ぼ +さ +れ +た +テ +ニ +ス +部 +員 +で +す +エ +ミ +幸 +そ +う +見 +え +こ +事 +実 +を +心 +留 +め +お +い +下 +彼 +女 +ち +世 +話 +し +る +達 +国 +際 +な +と +思 +約 +束 +破 +べ +き +あ +道 
+路 +横 +切 +車 +注 +意 +生 +甲 +斐 +父 +外 +へ +行 +承 +知 +弁 +当 +食 +ょ +小 +説 +1 +つ +も +読 +ど +ら +、 +母 +親 +少 +似 +卑 +屈 +奴 +曇 +日 +音 +楽 +好 +本 +ず +仕 +引 +受 +け +サ +ッ +カ +ー +以 +前 +今 +気 +混 +ば +問 +題 +2 +時 +待 +ボ +ブ +友 +だ +ぞ +午 +後 +家 +来 +子 +供 +ろ +申 +告 +何 +夕 +済 +み +雪 +降 +陰 +口 +言 +的 +年 +馬 +鹿 +ね +大 +変 +忙 +危 +険 +遅 +刻 +度 +学 +校 +入 +電 +々 +酒 +飲 +む +顔 +奇 +妙 +聞 +自 +慢 +声 +ク +ラ +男 +数 +3 +0 +歴 +史 +試 +験 +計 +画 +反 +対 +づ +先 +渡 +連 +恐 +羽 +振 +ロ +ン +ド +合 +由 +舞 +靴 +向 +手 +紙 +週 +休 +釣 +ひ +わ +? +頼 +ア +メ +リ +婦 +結 +婚 +猫 +木 +登 +ぶ +ジ +ョ +駅 +方 +歩 +怒 +必 +要 +折 +返 +ケ +キ +召 +上 +成 +功 +努 +力 +選 +び +屋 +坂 +東 +京 +育 +月 +曜 +終 +買 +戦 +争 +起 +目 +覚 +病 +院 +元 +無 +セ +フ +阪 +や +格 +祝 +ゆ +十 +中 +八 +九 +勘 +定 +我 +ほ +叫 +耳 +通 +書 +帽 +5 +ル +朝 +君 +兄 +交 +故 +亡 +単 +純 +列 +止 +老 +全 +新 +忠 +尊 +重 +解 +決 +欲 +ざ +僕 +浮 +件 +裁 +持 +イ +ギ +味 +夢 +ぎ +続 +ぜ +直 +接 +考 +頭 +別 +住 +辞 +役 +立 +明 +違 +指 +摘 +勇 +答 +適 +冷 +売 +旅 +疲 +辺 +鄙 +村 +訪 +水 +回 +両 +昨 +映 +空 +太 +陽 +輝 +正 +賛 +町 +案 +内 +助 +会 +次 +延 +期 +チ +ム +最 +価 +値 +タ +シ +歌 +他 +劣 +勉 +強 +ェ +喜 +伝 +職 +業 +関 +係 +誉 +犬 +近 +座 +転 +使 +妹 +建 +奥 +損 +野 +球 +緒 +繊 +細 +出 +教 +マ +駄 +石 +油 +不 +足 +震 +ト +同 +じ +ガ +ツ +発 +表 +然 +失 +敗 +滞 +在 +バ +コ +吸 +平 +和 +泳 +寒 +秋 +社 +台 +短 +死 +情 +報 +民 +政 +府 +作 +帰 +宅 +順 +調 +真 +仮 +命 +用 +箱 +階 +運 +長 +ご +腕 +放 +乗 +利 +程 +腹 +悪 +念 +怖 +形 +抗 +議 +ゲ +面 +白 +記 +憶 +姉 +都 +劇 +残 +早 +信 +懸 +ャ +品 +パ +初 +開 +理 +誤 +急 +ぐ +簡 +易 +英 +語 +娘 +寝 +赤 +ゃ +願 +障 +金 +支 +払 +冒 +論 +ぱ +確 +ヒ +産 +火 +散 +守 +有 +名 +医 +者 +毎 +渋 +レ +ビ +消 +服 +宿 +署 +齢 +ポ +突 +げ +良 +徒 +貧 +戻 +牛 +二 +夫 +脱 +暑 +湖 +深 +普 +段 +謝 +優 +甘 +ソ +非 +常 +場 +警 +察 +呼 +誘 +惑 +坊 +字 +可 +能 +料 +涙 +落 +量 +妻 +愛 +界 +温 +授 +船 +酔 +万 +仲 +付 +捜 +静 +若 +ダ +招 +追 +夜 +雨 +述 +山 +獲 +訳 +居 +異 +熱 +息 +点 +主 +質 +始 +花 +飾 +幼 +寂 +興 +プ +触 +活 +オ +青 +春 +尋 +盲 +燃 +傷 +科 +晩 +騒 +機 +限 +予 +想 +望 +代 +薬 +効 +「 +風 +共 +去 +ぬ +」 +孤 +独 +店 +径 +ュ +態 +銀 +認 +至 +驚 +美 +添 +光 +ピ +ノ +伴 +奏 +嬉 +7 +除 +席 +吹 +創 +設 +市 +! 
+髪 +悔 +秘 +密 +高 +地 +丸 +悲 +極 +暇 +葉 +速 +走 +三 +企 +天 +茶 +働 +演 +8 +泣 +公 +園 +原 +因 +勝 +標 +進 +ヘ +郵 +便 +局 +農 +遊 +到 +笑 +冬 +ィ +参 +加 +版 +暗 +絶 +誕 +6 +歳 +鍵 +絵 +栄 +将 +張 +過 +弾 +ふ +等 +具 +雇 +賞 +得 +絞 +取 +健 +康 +身 +頂 +客 +迎 +所 +夏 +海 +草 +ヨ +配 +古 +鳴 +庭 +探 +偶 +眠 +昼 +禁 +煙 +提 +閉 +飛 +魚 +捕 +断 +武 +装 +背 +街 +丘 +ホ +泊 +快 +求 +怠 +惰 +グ +欠 +片 +視 +特 +難 +締 +半 +壁 +困 +冗 +談 +族 +神 +戸 +遣 +状 +握 +第 +ザ +師 +倍 +級 +逆 +為 +化 +恩 +諸 +貸 +卒 +諦 +互 +依 +存 +円 +嫌 +紳 +士 +凍 +誇 +例 +迷 +否 +準 +備 +専 +晴 +満 +邪 +腰 +痛 +菜 +商 +離 +疑 +資 +送 +忘 +暖 +素 +敵 +窓 +色 +写 +途 +文 +防 +識 +4 +側 +叱 +裕 +福 +果 +移 +停 +百 +魔 +性 +郎 +感 +耐 +治 +恋 +敬 +様 +診 +橋 +安 +慰 +貴 +眺 +討 +処 +逃 +符 +許 +狂 +審 +軽 +率 +恵 +規 +則 +猶 +借 +歯 +録 +責 +任 +絡 +爆 +撃 +趣 +替 +芝 +捨 +抜 +費 +ペ +黙 +床 +個 +裏 +暮 +叔 +ベ +尽 +迫 +材 +田 +舎 +没 +侮 +辱 +施 +歓 +援 +滑 +恥 +飯 +置 +徹 +廃 +机 +退 +拝 +詰 +ぁ +蝙 +蝠 +従 +王 +札 +幌 +氏 +随 +縁 +整 +頓 +ぽ +概 +六 +川 +岸 +博 +館 +図 +慣 +儀 +像 +緻 +昇 +土 +伏 +悩 +敷 +包 +囲 +善 +肉 +担 +偽 +呂 +盛 +噂 +希 +序 +焼 +狭 +掌 +苦 +避 +積 +港 +復 +荷 +御 +嘘 +徳 +ゴ +再 +粧 +卵 +繰 +習 +畑 +匹 +是 +星 +景 +余 +永 +久 +デ +盗 +ヤ +巨 +遠 +械 +愚 +洗 +濯 +珍 +溶 +込 +履 +昔 +千 +泥 +棒 +号 +乏 +偉 +継 +ウ +崩 +勤 +務 +術 +克 +聴 +権 +惚 +弟 +体 +飼 +軒 +犯 +課 +修 +四 +鮮 +汽 +鳥 +現 +旧 +塔 +冊 +五 +塩 +経 +批 +判 +簿 +棚 +才 +研 +究 +ぷ +類 +覆 +祈 +往 +妨 +柄 +財 +捧 +衛 +距 +闇 +著 +区 +隣 +相 +比 +頃 +就 +矛 +盾 +広 +掛 +典 +型 +制 +憎 +殺 +モ +嘆 +雄 +鼓 +負 +右 +窮 +法 +掃 +繕 +篠 +流 +章 +看 +杯 +植 +評 +枚 +叶 +抑 +滴 +斎 +森 +額 +蛮 +ナ +攻 +雅 +米 +雑 +編 +換 +構 +詳 +帳 +厳 +ワ +預 +室 +更 +銘 +与 +濃 +臭 +布 +衆 +撮 +舌 +容 +貌 +乳 +喫 +固 +巣 +懇 +奈 +群 +集 +皆 +影 +響 +ネ +悟 +弱 +ハ +嗅 +飽 +完 +了 +浴 +昆 +虫 +ヌ +乱 +描 +俺 +首 +嵐 +給 +低 +派 +衝 +団 +投 +函 +礼 +島 +委 +官 +周 +鋭 +宝 +契 +採 +致 +漏 +翻 +洋 +恒 +保 +証 +筆 +潜 +ォ +枯 +打 +憩 +護 +尾 +埋 +紹 +介 +城 +谷 +沈 +季 +節 +巻 +倒 +巡 +姿 +踏 +黒 +己 +沿 +ぇ +懐 +扮 +詩 +労 +左 +底 +占 +差 +架 +壊 +ゼ +欺 +検 +造 +寄 +庫 +眼 +鏡 +慮 +郊 +購 +営 +駐 +血 +模 +ズ +収 +越 +板 +癌 +飢 +井 +罵 +忍 +増 +賢 +涼 +荒 +踊 +些 +並 +省 +銃 +州 +症 +麻 +雀 +濡 +般 +展 +覧 +紅 +統 +領 +ぴ +松 +江 +c +d +講 +義 +熟 +扶 +養 +属 +9 +鹸 +遇 +寿 +司 +憂 +乾 +唱 +割 +皿 +拭 +貯 +箇 +殴 +鉛 +狙 +蒸 +雲 +椅 +未 +練 +卓 +ぺ +淑 +壇 +憲 +末 +沙 +汰 +操 +匙 +抱 +候 +鼻 +ヶ +蔑 +毛 +勢 +償 +浜 +激 +倹 +製 +宛 +ゅ +翌 +稿 +鞄 +届 +慌 +扱 +式 +組 +瓶 +渉 +句 +技 +陸 +器 +河 +衰 +納 +律 +罰 +譲 +旨 +補 +傘 +贈 +請 +駆 +腐 +ァ +線 +丁 +骨 +筋 +伺 +丈 
+祖 +孫 +犠 +牲 +遭 +肌 +綺 +麗 +魂 +種 +減 +唯 +婆 +推 +薦 +訓 +曲 +睡 +頑 +s +f +勧 +印 +刷 +錠 +励 +胆 +糧 +績 +排 +剣 +岳 +涯 +競 +精 +敏 +衣 +赦 +志 +位 +胸 +堅 +販 +査 +税 +壷 +暴 +露 +益 +敢 +撒 +喧 +嘩 +蹴 +沢 +妊 +娠 +芸 +航 +催 +射 +超 +改 +戒 +n +a +泉 +奪 +零 +咲 +隠 +遺 +憾 +漱 +肥 +輩 +房 +寺 +奨 +脚 +汚 +煩 +弥 +怪 +免 +氷 +灯 +総 +ユ +戚 +掘 +維 +釈 +拾 +凝 +漫 +兵 +痔 +馳 +粋 +微 +訴 +浅 +緩 +崖 +覗 +塞 +虚 +北 +湘 +南 +賭 +腎 +臓 +仰 +仙 +筈 +砂 +糖 +干 +唾 +観 +娯 +臆 +門 +宇 +宙 +複 +毒 +奮 +患 +撲 +控 +液 +貫 +禄 +辛 +郷 +稼 +餓 +痙 +攣 +秀 +澄 +遂 +挨 +拶 +慈 +富 +豪 +溺 +県 +緑 +籠 +刑 +根 +脅 +誌 +訂 +揺 +築 +罪 +喋 +陥 +姫 +髭 +剃 +害 +疎 +銭 +墓 +賦 +押 +穴 +淡 +噛 +賃 +導 +域 +肩 +尻 +伯 +牧 +傾 +基 +又 +咳 +邦 +貨 +豊 +挑 +偏 +溜 +傲 +樹 +含 +滝 +魅 +嫉 +妬 +脇 +謎 +磨 +括 +佐 +猛 +烈 +玄 +吉 +執 +応 +及 +拒 +顎 +鬚 +既 +狐 +浣 +腸 +隅 +拡 +吠 +璧 +ヴ +顧 +睦 +湯 +幾 +輪 +七 +絹 +湿 +疹 +池 +袋 +灰 +摂 +即 +紛 +刈 +況 +染 +矢 +聖 +塗 +伸 +浪 +岩 +餌 +戴 +鎖 +宣 +測 +工 +被 +象 +痩 +搭 +妥 +協 +汗 +救 +跳 +裂 +林 +檎 +棲 +帝 +潮 +侵 +略 +柔 +票 +蝶 +肯 +筒 +呆 +沼 +厚 +宗 +梨 +軍 +蔵 +較 +羨 +粛 +痢 +愉 +儲 +癒 +鬱 +幹 +掴 +鎮 +縫 +炎 +示 +諾 +寛 +虜 +瀬 +鉄 +祭 +醜 +菓 +項 +岡 +胎 +拠 +択 +網 +拳 +党 +繁 +熊 +爪 +慎 +墜 +穏 +募 +縦 +伊 +藤 +胃 +惜 +芽 +誠 +薄 +嫁 +譜 +寮 +薔 +薇 +賜 +1 +2 +l +i +y +潔 +充 +据 +舟 +遮 +寸 +猿 +・ +抵 +暢 +錆 +脈 +挙 +瞬 +萎 +聡 +埠 +琵 +琶 +黄 +策 +宜 +梅 +各 +匂 +清 +撥 +載 +境 +吐 +怯 +唸 +却 +拍 +端 +吻 +惨 +剤 +甥 +核 +緊 +香 +層 +系 +躍 +嬢 +縛 +酸 +t +〆 +鱗 +堂 +算 +貢 +献 +威 +監 +督 +針 +襲 +銅 +姪 +幽 +霊 +癖 +綾 +扉 +雹 +崎 +条 +療 +封 +癇 +癪 +揮 +碁 +瓜 +泰 +嘲 +錯 +凡 +碗 +豚 +哀 +児 +童 +虐 +蕩 +刺 +波 +貰 +凪 +炭 +嚢 +索 +圧 +均 +帯 +u +o +峠 +西 +騙 +肘 +砕 +黍 +革 +棄 +俳 +秩 +如 +宵 +竜 +姓 +噴 +閑 +幅 +虎 +塀 +堪 +鈴 +双 +照 +淋 +葬 +悠 +蝿 +鳩 +獄 +晒 +j +仏 +某 +享 +尿 +慶 +裸 +丹 +( +) +杖 +逮 +徴 +災 +〔 +〕 +酷 +角 +炉 +僚 +揚 +馴 +珠 +霧 +詞 +潟 +陣 +鍋 +拘 +焦 +h +k +蜜 +蜂 +穂 +湾 +弄 +跡 +麓 +蔭 +讐 +弊 +董 +〜 +綴 +ゾ +膳 +称 +痒 +倉 +怨 +掻 +蓄 +茨 +摩 +厄 +陳 +詫 +贔 +屓 +桃 +赴 +墟 +湧 +逢 +隻 +― +伎 +潰 +鯔 +鑑 +鯨 +炊 +腑 +獣 +勿 +禎 +沖 +縄 +蕾 +股 +娩 +枝 +殆 +氾 +濫 +乞 +恨 +豆 +禿 +釧 +扇 +誓 +躊 +躇 +徐 +貿 +雷 +鋳 +飴 +洞 +窟 +粗 +鎌 +鈍 +刊 +狼 +煎 +幻 +旗 +狩 +耕 +範 +掲 +源 +漢 +枕 +嬌 +莫 +券 +崇 +隔 +袈 +裟 +里 +暫 +虹 +櫛 +硬 +此 +縮 +m +兆 +轢 +帆 +這 +央 +俗 +瞼 +頻 +需 +餐 +琴 +羊 +令 +薫 +勃 +朽 +虻 +賑 +刀 +籍 +漂 +煽 +斉 +株 +褒 +膝 +, +C +D +叩 +鶏 +N +A +S +糸 +. 
+挟 + +胡 +椒 +玩 +祉 +" +0 +— +併 +蛾 +ゥ +郡 +` +' +・ +9 +6 +8 +3 +- +拿 +爵 +准 +幕 +5 +~ +副 +鞭 +7 +兼 +: +á +ň +宮 +廷 +磁 +4 +ó +菌 +卿 +皇 +峰 +% +貝 +軟 +, +把 +携 +/ +析 +ž +盤 +斑 +輸 +託 +隊 +蓋 +『 +』 +彩 +& +詠 +篇 +騎 +_ +晋 +釜 +尚 +欧 +紀 +管 +渓 +韓 +李 +栽 +培 +尉 +骸 +ă +ş +剖 +翼 +亜 +羅 +奉 +畔 +拓 +環 +礁 +枢 +斜 +漕 +艇 +稀 +臣 +勲 +棘 +艦 +盟 +粒 +闘 +å +戯 +∇ +柵 +醸 +礎 +旬 +聘 +矮 +棟 +碑 +殿 +億 +! +惧 +抽 +迭 +% +  +垂 +還 +澤 +輔 +粉 +齊 +秦 +砲 +屯 +織 +胞 +諮 +殊 +媒 +嫡 +綱 +搬 +該 +透 +禽 +弦 +瞭 +坦 +浸 +韻 +竪 +墳 +隷 +撤 +哲 +叙 +é +庶 +紡 +禍 +肺 +婉 +$ +沃 +鬼 +棋 +揃 +楊 +綿 +訟 +遁 +妄 +玉 +軌 +榴 +蘇 +臨 +疇 +披 +顕 +圏 +Ș +融 +擦 +Č +č +埃 +曖 +昧 +旋 +瞳 +謡 +衡 +槍 +茎 +唐 +轄 +郴 +捉 +覇 +嘉 +陵 +嘴 +蔓 +嘱 +閲 +征 +謄 +胚 +陶 +浦 +勅 +芻 +疾 +昏 +; +耗 +践 +禅 +襟 +曹 +瞑 +ș +偵 +酬 +駿 +蔡 +諷 +瑁 +í +è +: +ø +呈 +笠 +岬 +洛 +聾 +唖 +溝 +堀 +雌 +牝 +仔 +尼 +庁 +穫 +妖 +曽 += += +嗜 +珊 +瑚 +軸 +# +紋 +劉 +璿 +胤 +墉 +彫 +盆 +饗 +宴 +挿 +蔽 +脳 +暦 +ä +õ +廊 +讃 +ë +促 +峻 +壌 +訛 +鉱 +姦 +唆 +舗 +迂 +ñ +弘 +昌 +舶 +箔 +冠 +溢 +鶴 +肛 +脊 +柱 +傑 +智 +彦 +朋 +昪 +靖 +姻 +哨 +尺 +冥 +​ +剪 +“ +” +L +P +- +瀕 +ö +津 +汐 +泌 +皮 +膚 +肢 +只 +鍮 +斧 +壮 +倫 +幣 +儒 +遷 +殻 +惹 +累 +ß +珪 +弛 +曝 +浙 +華 +柿 +哺 +ü +& +W +Z +X +I +薪 +E +M +ę +雰 +媚 +艶 +蹄 +拐 +ř +â +塊 +箋 +漠 +呪 +Ł +ą +ł +挽 +灌 +漑 +煉 +瓦 +G +μ +迅 ++ +猥 +褻 +頬 +逐 +廠 +ć +邸 +疼 +伐 +燥 +凌 +駕 +錐 +尖 +û +呉 +翔 +憤 +慨 +琥 +珀 +漸 +堆 +ā +亀 +肖 +T +R +à +枠 +桁 +剰 +匿 +秤 +厩 +褐 +Ž +đ +Ä +趙 +š +餃 +擁 +脆 +脂 +肪 +漿 +× +晶 +岐 +遍 +謙 +殉 +弓 +Ü +昭 +Å +* +澎 +擬 +債 +秒 +猟 +歪 +阻 +砦 +凸 +諜 +ı +… +腫 +晃 +也 +龍 +燕 +閣 +ê +眉 +牡 +旺 +ç +ō +恣 +疆 +坐 +孵 +搾 +傍 +■ +削 +唇 +釉 +凹 +囚 +魏 +腱 +謀 +ţ +堤 +# +笛 +靭 +V +B +崗 +О +с +т +р +о +в +Г +а +л +я +膜 +椎 +帥 +剛 +梢 +俊 +蟹 +腿 +牽 +粘 +葦 +ń +劾 +祥 +紺 +ヵ +芳 +須 +賀 +填 +殖 +痺 +浚 +渫 +H +F +ī +匯 +Š +寡 +閃 +É +疫 +庇 +而 +頁 +侯 +挺 +畳 +浄 +淘 +杭 +K +縞 +牙 +循 +髄 +Á +屑 +朴 +p +隆 +傭 +紫 +峡 +謬 +ã +膠 +瘍 +瞞 +鋸 +塁 +鋼 +雛 +弧 +ğ +桂 +½ +唄 +扁 +α +酵 +’ +; +肝 +Ö +孔 +彙 +φ +梁 +栖 +妃 +蛹 +勾 +欄 +茂 +漁 +晦 +遼 +寧 +吊 +刃 +彰 +之 +濁 +喪 +僧 +萬 +膣 +那 +蛍 +鍛 +麦 +腺 +ô +Ó +λ +尤 +z +Δ +ż +ò +℃ +肋 +臍 +丼 +´ +踵 +宏 +朱 +燻 +漬 +霜 ++ +巧 +鐘 +冶 +膿 +疱 +寓 +蚊 +匠 +檻 +桟 +洪 +后 +ū +楕 +垣 +孝 +e +r +O +耽 +© +鴨 +杉 +烏 +啓 +Ç +痴 +祀 +贅 +荘 +濾 +ú +瞰 +U +埼 +窒 +沸 +騰 +閾 +È +樽 +→ +陪 +Ş +酢 +ė +漆 +喰 +汎 +< +æ +乙 +² +倣 +− +葛 +墨 +腔 +坑 +緋 +稚 
+潤 +侶 +喚 +踪 +穀 +膨 +畜 +陛 +巾 +鉢 +彗 +臼 +杵 +Í +罹 +狡 +猾 +凱 +塑 +頸 +梱 +矯 +竹 +焙 +窄 +剥 +捗 +憧 +袖 +ð +榮 +ț +閥 +窩 +沌 +抄 +遡 +> +鳳 +凰 +痕 +蛇 +矩 +罠 +詐 +ý +楼 +庵 +ē +° +賊 +ồ +爬 +柑 +橘 +曾 +郭 +措 +栗 +桐 +粥 +C +O +E +卯 +詮 +忌 + +倭 +禰 +菖 +蒲 +條 +祓 +幡 +A +B +L +G +T +M +S +u +( +) +a +. +W +i +V +b +c +f +e +N +K +R +U +D +g +P +醍 +醐 +F +Z +I +H +Q +y +o +t +J +ヂ +J +槙 +嵯 +峨 +畿 +塚 +Y +X +淀 +伽 +s +ヅ +餅 +蒡 +穣 +ゞ +絲 +p +鯖 +n +琳 +柳 +髷 +閤 +稲 +菊 +巌 +迦 +抹 +曳 +叡 +壺 +苑 +羌 +狗 +ヰ +醤 +ぉ +硝 +袴 +倶 +汁 +但 +杮 +葺 +煮 +爺 +夙 +桜 +亭 +ゑ +苗 +m +曼 +荼 +簪 +☆ +辻 +鑢 +ゝ +稗 +蹊 +貼 +獅 +廟 +阿 +陀 +蘭 +妓 +翠 +柚 +賓 +芦 +拉 +麺 +帷 +或 +槐 +屎 +j +惟 +撫 +瑞 +侍 +巴 +廉 +峯 +菩 +薩 +吽 +弖 +彌 +佛 +耨 +閇 +貞 +闍 +閦 +洲 +妾 +仁 +宕 +媛 +隧 +笥 +葵 +茜 +譚 +渥 +旭 +綬 +霰 +楓 +雁 +朗 +渕 +梓 +巫 +姐 +鉾 +囃 +藩 +藺 +鮎 +粟 +袷 +篤 +杏 +遵 +徽 +宍 +瓊 +堵 +猷 +馨 +與 +麿 +冨 +彷 +徨 +湊 +菅 +按 +渠 +龗 +鞍 +采 +琢 +枳 +詣 +祇 +稙 +祐 +毅 +冲 +坡 +阯 +堯 +庄 +掾 +牟 +豫 +尹 +弉 +牌 +鑒 +夷 +俘 +喬 +暁 +允 +亮 +緯 +繋 +偈 +誡 +諡 +瑠 +璃 +弼 +岑 +亥 +郁 +媞 +磯 +佳 +翁 +蹟 +揆 +槻 +嗣 +恭 +熈 +畝 +噌 +燈 +脩 +佩 +閻 +壱 +逸 +眷 +誼 +籌 +芋 +鰯 +璽 +旛 +鑰 +摺 +鉤 +淫 +祠 +凉 +牒 +款 +蟄 +丞 +鋒 +檗 +帖 +菟 +荻 +邨 +厨 +佑 +乃 +鷺 +屏 +柴 +於 +箒 +祷 +蓮 +鵜 +丑 +寅 +碓 +渦 +蔚 +鰻 +姥 +毘 +閏 +涌 +庸 +樂 +祚 +邵 +虞 +邇 +悦 +栃 +怡 +斯 +榎 +厭 +爾 +圓 +應 +吏 +并 +堰 +奄 +掩 +壕 +稔 +焔 +w +猴 +@ +薗 +諏 +窯 +甚 +麹 +竈 +无 +穢 +窠 +廻 +寇 +鈞 +菴 +鍍 +珉 +慕 +詢 +肇 +羲 +莽 +襖 +鴎 +錦 +紗 +胴 +輿 +玲 +畷 +窪 +徂 +徠 +對 +桶 +螺 +鈿 +麝 +巳 +卸 +寵 +狛 +裳 +剋 +喩 +樋 +噺 +藍 +婢 +梵 +樫 +鷲 +嶽 +憐 +宰 +塾 +蔬 +涅 +槃 +址 +耆 +穎 +糠 +鰭 +俣 +咒 +鼠 +裘 +筯 +繍 +宸 +翰 +魁 +隈 +匡 +熙 +翫 +畠 +瓢 +壽 +卉 +筐 +僑 +蝦 +蹉 +k +v +跋 +釐 +堕 +h +r +d +哩 +l +樓 +霞 +韶 +碩 +皓 +臥 +鷹 +淵 +篭 +收 +桑 +誅 +國 +竄 +煕 +苔 +晏 +韋 +芥 +墾 +闔 +梆 +拵 +舅 +鎧 +蛙 +播 +楯 +廓 +暹 +惠 +瑜 +鑁 +舘 +恂 +衞 +嶋 +駒 +箏 +悼 +橿 +梶 +箸 +烹 +喝 +稽 +餡 +鰹 +樺 +㈱ +兜 +竃 +炒 +盒 +茅 +萱 +嶺 +藉 +苅 +坤 +闥 +懲 +湛 +藁 +衙 +饉 +戈 +桓 +衫 +聚 +潅 +藷 +糟 +妍 +竿 +絃 +罷 +擾 +疏 +鈔 +銕 +亟 +瀧 +勒 +躰 +佶 +錬 +慧 +檀 +聨 +頴 +亘 +尭 +愿 +贋 +證 +撰 +附 +阜 +毫 +漉 +惣 +蘂 +爐 +賎 +祢 +刹 +叉 +饅 +茲 +菱 +筮 +澳 +纂 +楚 +辰 +詔 +遐 +蟻 +吾 +萩 +鞠 +謹 +叢 +伍 +卜 +吃 +桔 +梗 +砧 +敦 +仇 +宥 +飫 +粂 +廿 +鼎 +逕 +嬪 +箭 +恤 +杣 +舖 +汲 +竟 +邃 +糾 +邑 +哇 +〈 +〉 +圀 +盡 +儼 +椋 +籃 +芹 +滋 +蛤 +淳 +駝 +猪 +沂 +稜 +莵 +藏 +經 +筍 +茗 +侠 +凶 +蓆 +紐 +蕎 +魯 +朔 +澗 +藻 +甫 +琮 +鬘 +欣 +欽 +笙 +舜 +闕 +煇 +鈎 +騨 +蒔 
+鰐 +埵 +幢 +鑽 +嵌 +楷 +榛 +錍 +鈷 +笈 +鐸 +磬 +碧 +熨 +斗 +翅 +襴 +鑚 +鵄 +吟 +垢 +掟 +卦 +筑 +茄 +葱 +竴 +廼 +玖 +珂 +跏 +蝉 +誄 +串 +沓 +游 +蕃 +蕪 +鍬 +粮 +諭 +盃 +葩 +迪 +圭 +廬 +諶 +德 +祕 +裃 +荊 +洒 +蟷 +螂 +腋 +袍 +髮 +禮 +趺 +堺 +嘗 +甞 +帛 +蝕 +芿 +讀 +褌 +坪 +簒 +鋤 +硯 +翺 +棺 +胝 +篩 +磐 +隋 +諫 +亨 +旦 +孚 +叟 +曉 +盈 +澪 +懿 +爲 +琦 +愔 +圃 +濱 +奘 +諺 +藪 +註 +蜘 +蛛 +鞏 +篷 +閨 +裡 +糊 +賁 +跨 +劫 +壬 +絁 +釘 +譬 +聯 +傳 +芒 +體 +髻 +悉 +荏 +綸 +柏 +珣 +撹 +芬 +裔 +焚 +廂 +饌 +嵩 +簾 +匣 +禊 +籤 +奠 +鯉 +幟 +脛 +巷 +楳 +胖 +庚 +浩 +諒 +溥 +丙 +楠 +冑 +班 +學 +麞 +緬 +肱 +砥 +縢 +耶 +舂 +靈 +砌 +樗 +暉 +蛸 +鞆 +芙 +蓉 +雙 +鴻 +臚 +褄 +濠 +奢 +槌 +紘 +框 +蓑 +甑 +忽 +淆 +艮 +樵 +竭 +羯 +牢 +櫃 +鸞 +拙 +椿 +榊 +肴 +萠 +綜 +鮭 +笹 +苞 +硫 +奸 +徭 +躯 +戟 +襷 +閘 +櫓 +嘯 +臂 +實 +椏 +潴 +藐 +麒 +麟 +烝 +杜 +籐 +槇 +曰 +筰 +懺 +縣 +褥 +輯 +蚕 +斬 +庖 +謌 +璞 +屍 +團 +哉 +畏 +塵 +什 +鳶 +鴉 +濤 +縒 +趾 +櫻 +麩 +曠 +愍 +彊 +驕 +姶 +兎 +鴫 +竺 +僊 +雫 +彭 +灘 +餝 +棗 +蔀 +侑 +弗 +婬 +牘 +訶 +衍 +錫 +惺 +熹 +顛 +呑 +粕 +楞 +咀 +詛 +釋 +瑋 +曄 +筧 +誾 +徧 +虔 +蒐 +酋 +會 +頌 +齋 +誦 +戎 +袿 +繹 +榱 +酥 +碕 +汪 +奔 +曙 +鶯 +囀 +裾 +楮 +歎 +嬬 +婿 +升 +晧 +娼 +祟 +楢 +蓬 +杢 +篁 +柯 +弐 +几 +渤 +憙 +蜀 +芭 +蕉 +恕 +谿 +樟 +訢 +蒋 +鉦 +鍾 +馗 +鞘 +殷 +臈 +檄 +滿 +憑 +埴 +劔 +寶 +鐵 +姨 +耀 +僭 +襄 +疋 +蘆 +靺 +鞨 +悌 +仍 +枡 +鱈 +籬 +芯 +酉 +姜 +陞 +睿 +逗 +頚 +迹 +掬 +巒 +槽 +滸 +魄 +錘 +饋 +椙 +彬 +狄 +躬 +瀋 +奎 +悍 +總 +瑛 +禧 +廣 +塘 +蓼 +兌 +碾 +桝 +瞿 +醒 +苧 +嶂 +韮 +薙 +皺 +莞 +膏 +贄 +咋 +啄 +鎚 +汀 +鏃 +龕 +衷 +諱 +駈 +笄 +酌 +觀 +礙 +杓 +决 +覲 +甕 +栴 +絅 +晟 +銑 +珈 +琲 +膩 +愷 +蕭 +戮 +租 +戔 +嗚 +盞 +鵞 +軾 +昉 +爽 +宋 +匝 +瑳 +逝 +蕨 +欅 +黌 +蒼 +鎗 +惇 +其 +攘 +杲 +斥 +傅 +鞁 +毬 +璋 +賈 +蹲 +踞 +黛 +鯛 +鉋 +姞 +葡 +萄 +訥 +輌 +閬 +鬯 +靜 +瑩 +孁 +洹 +闡 +盧 +猩 +岫 +套 +巖 +篳 +篥 +舩 +覺 +沅 +衒 +凞 +祺 +袱 +托 +蟇 +巽 +藹 +狸 +衾 +ぢ +蘊 +顗 +鮒 +遥 +邊 +箆 +簀 +雍 +筌 +漣 +筅 +鈦 +夾 +紵 +梧 +賣 +凋 +弔 +霖 +劭 +餉 +ぃ +篋 +諚 +朕 +茸 +栂 +佃 +柘 +蔦 +鍔 +逍 +綏 +碇 +逓 +鄭 +鏑 +簺 +棹 +卍 +痘 +闢 +籟 +饂 +飩 +澱 +汝 +邉 +儛 +暾 +屠 +祁 +砺 +俵 +蒙 +藝 +熾 +洽 +榜 +莱 +璵 +蕊 +髙 +鄰 +z +穆 +姚 +忻 +竝 +苡 +諟 +媓 +嫄 +忯 +鐙 +撞 +綽 +璨 +鑼 +苫 +煌 +皋 +當 +捺 +邁 +瞻 +舍 +[ +] +糜 +輦 +啼 +捻 +襠 +涛 +瀾 +娑 +諧 +毀 +簫 +溪 +煤 +賠 +奕 +蜷 +雉 +咫 +暲 +艘 +拏 +筏 +塙 +蜊 +隼 +纏 +叛 +彈 +枇 +杷 +柊 +畢 +逼 +桧 +鴛 +鴦 +蝋 +燭 +箪 +豹 +鋲 +蛭 +囉 +羂 +羈 +逞 +單 +蛎 +萍 +糞 +站 +騏 +鮫 +昂 +袒 +且 +鎬 +戊 +瓔 +珞 +俸 +檜 +萌 +萊 +俔 +潭 +鵬 +翡 +柾 +亦 +玅 +箕 +咸 +獏 +瞋 +聊 +礬 +孟 +氈 +銚 +葭 +橇 +籾 +澂 +匁 +嬾 +淇 +薮 +愈 +茹 +揖 +僮 +渾 +蜻 +蛉 +羹 +酪 +洸 +嶠 +癡 +畺 +謫 +琉 +瀑 
+湫 +賤 +摸 +濟 +淄 +伶 +聲 +莬 +禖 +韜 +彝 +珎 +賄 +賂 +亙 +彎 +椀 +丿 +舒 +仗 +佚 +估 +侏 +侘 +俯 +偃 +偕 +偐 +傀 +儡 +遜 +儺 +兀 +冤 +菫 +刎 +畸 +剽 +窃 +辨 +號 +匏 +厠 +吝 +嗇 +咄 +哭 +唳 +嗔 +嚆 +譯 +乘 +圜 +埒 +壹 +夥 +夬 +夭 +妲 +沆 +娟 +媽 +嫗 +岷 +帚 +幄 +幔 +幇 +淨 +繼 +徑 +忿 +恬 +懽 +戌 +截 +拇 +挂 +掖 +掣 +揉 +揶 +揄 +搦 +攝 +斟 +旁 +旡 +旻 +昵 +暈 +朏 +朧 +杁 +杞 +枅 +矧 +梟 +梔 +梛 +桴 +桾 +椥 +楫 +椹 +楡 +楪 +槿 +檐 +檣 +檸 +檬 +櫟 +殯 +麾 +沐 +沽 +涵 +淤 +滄 +滕 +滌 +澁 +眞 +瀟 +灑 +炮 +烙 +煬 +燔 +犂 +狷 +猊 +祗 +瑾 +瑪 +瑙 +甍 +瘡 +瘧 +盂 +鉉 +睨 +矍 +鑠 +矜 +碌 +碣 +磔 +礒 +礫 +禹 +稠 +稱 +笏 +笞 +筥 +筬 +箜 +篌 +筝 +箙 +篆 +籀 +篝 +簧 +粳 +糯 +糺 +絖 +絽 +綟 +縅 +繦 +緥 +縹 +繧 +繝 +纐 +纈 +纛 +罔 +罧 +羇 +聰 +肄 +膀 +胱 +膵 +膾 +臘 +舳 +范 +鷄 +苻 +苴 +擔 +莪 +蒟 +蒻 +薨 +薛 +茘 +蠣 +蛟 +蜆 +蜃 +雖 +蟠 +蠢 +衵 +衽 +袙 +袰 +裙 +裹 +褂 +裲 +褪 +褶 +襞 +襦 +袢 +襪 +誥 +誣 +諌 +謚 +謗 +譛 +譴 +讒 +豐 +貪 +賽 +贖 +扈 +跪 +踐 +躑 +躅 +躙 +躪 +軋 +軻 +輜 +辟 +檮 +邂 +逅 +邀 +邯 +鄲 +郢 +鄂 +醪 +醵 +釿 +銜 +鋏 +鋺 +錵 +鍼 +灸 +鎰 +鎹 +鐔 +鐃 +鈸 +鐇 +鑷 +鑿 +閔 +閼 +崛 +阮 +陬 +雊 +霍 +靫 +靱 +乎 +顆 +餘 +饒 +騁 +驛 +驢 +髢 +鬢 +鬨 +鮑 +鯱 +鰒 +鰰 +鱧 +鳰 +鴟 +鶉 +鵺 +鷙 +鸚 +鵡 +麁 +黠 +鼈 +齟 +齬 +棠 +遙 +瑤 +銈 +禔 +禛 +鈐 +儇 +匲 +媄 +尪 +巀 +辥 +忉 +掄 +枓 +栻 +梲 +檥 +滹 +沱 +潙 +炷 +猨 +璜 +穜 +竽 +筇 +翛 +薭 +螣 +/ +豅 +辦 +鉇 +鍑 +鑊 +鼉 +磧 +寔 +拈 +轍 +泯 +諍 +? 
+錣 +爼 +纒 +鑵 +櫨 +酎 +泡 +俄 +燗 +鞋 +鵲 +茵 +缶 +紬 +絣 +衿 +鴈 +盥 +凛 +燎 +袞 +淹 +瀉 +聟 +嫐 +俤 +薊 +衢 +醗 +斂 +懌 +袁 +渟 +杼 +鱒 +瀞 +鐚 +苛 +陌 +侈 +旌 +筵 +泗 +槊 +稷 +鐐 +頒 +斤 +勺 +嶼 +篦 +埔 +假 +墺 +刪 +于 +鯰 +穗 +渚 +崑 +轟 +皐 +關 +晁 +迢 +崋 +榕 +楨 +菘 +呰 +蒿 +憬 +雋 +珥 +羆 +弌 +墻 +鮪 +陂 +裴 +顯 +鐡 +臺 +煥 +稻 +肆 +遯 +鹽 +暘 +栲 +洩 +抓 +覈 +豎 +禦 + diff --git a/ppocr/utils/korean_dict.txt b/ppocr/utils/korean_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..0edec5fe5635cefe094a303f3c3038153e1b8e05 --- /dev/null +++ b/ppocr/utils/korean_dict.txt @@ -0,0 +1,3636 @@ +저 +자 +명 +: +신 +효 +필 +< +국 +문 +초 +록 +2 +5 +한 +어 +관 +계 +구 +의 +통 +사 +와 +미 +조 +- +합 +법 +적 +접 +근 +본 +논 +은 +형 +성 +일 +종 +으 +로 +오 +래 +전 +부 +터 +되 +온 +인 +특 +을 +살 +피 +고 +다 +시 +이 +를 +정 +보 +기 +반 +머 +리 +중 +심 +하 +여 +가 +상 +호 +작 +용 +는 +모 +안 +에 +서 +련 +된 +러 +현 +들 +술 +해 +것 +목 +표 +삼 +론 +과 +두 +함 +께 +복 +면 +더 +나 +아 +화 +황 +까 +지 +요 +측 +므 +재 +느 +른 +및 +포 +괄 +할 +수 +있 +잘 +착 +장 +뒤 +식 +절 +차 +위 +범 +주 +그 +유 +6 +3 +동 +격 +설 +징 +찰 +존 +9 +라 +분 +류 +양 +였 +출 +발 +개 +념 +공 +백 +대 +귀 +등 +펴 +략 +연 +도 +울 +핀 +많 +영 +역 +니 +제 +능 +내 +만 +충 +첨 +점 +핵 +' +않 +높 +체 +낮 +섬 +약 +드 +난 +또 +순 +진 +언 +타 +소 +편 +르 +데 +7 +별 +립 +야 +외 +밀 +맺 +방 +속 +행 +배 +경 +건 +려 +운 +원 +따 +후 +규 +짓 +바 +탕 +우 +선 +달 +활 +질 +채 +택 +임 +단 +히 +벗 +될 +색 +았 +간 +극 +루 +세 +파 +악 +게 +1 +말 +었 +집 +생 +입 +밝 +혀 +졌 +맥 +락 +쪽 +왔 +검 +토 +던 +확 +새 +란 +음 +치 +마 +못 +했 +맞 +춘 +며 +급 +거 +석 +남 +8 +누 +든 +완 +갖 +추 +앞 +쓰 +익 +섭 +홍 +빈 +같 +눈 +{ +0 +런 +낸 +열 +람 +네 +떤 +렵 +때 +닌 +} +학 +당 +혼 +준 +즉 +불 +없 +취 +비 +강 +변 +결 +렇 +겨 +키 +무 +받 +4 +항 +흔 +처 +직 +뿌 +엄 +축 +휘 +담 +컴 +퓨 +향 +몇 +둔 +박 +병 +참 +잡 +율 +금 +긴 +태 +각 +값 +렬 +예 ++ +| +[ +] +큰 +갈 +칙 +됨 +산 +매 +크 +증 +막 +뿐 +럼 +청 +층 +롯 +랜 +떻 +독 +력 +응 +감 +틀 +롭 +낼 +최 +희 +돈 +겹 +친 +쉽 +삭 +킨 +놓 +실 +" +폭 +넓 +료 +허 +메 +교 +* +ㄴ +붙 +스 +싸 +환 +찬 += +흐 +름 +물 +켰 +뀌 +삽 +# +첫 +번 +째 +억 +너 +멀 +떨 +져 +밑 +줄 +냥 +움 +볼 +둘 +깊 +탈 +낳 +왜 +벽 +족 +책 +읽 +겠 +찾 +큼 +투 +곳 +판 +끼 +철 +쉬 +칭 +; +견 +빠 +섯 +린 +습 +흥 +객 +묘 +꼴 +쉼 +쓸 +끝 +올 +령 +풀 +? 
+몰 +냐 +년 +권 +씩 +길 +밖 +알 +떠 +옆 +슷 +룬 +윤 +_ +랑 +났 +침 +먹 +찌 +꺼 +곰 +죽 +풍 +탄 +냄 +듯 +엇 +꾼 +회 +트 +날 +빼 +닐 +승 +맏 +딸 +버 +> +켜 +덕 +총 +꾸 +ㄹ +혹 +김 +균 +밥 +폐 +쇄 +평 +깝 +쉘 +옛 +\ +품 +ㄸ +얻 +돌 +셨 +킬 +득 +뜻 +갔 +봉 +넘 +뺏 +민 +워 +렸 +써 +림 +찍 +척 +잃 +답 +앗 +널 +송 +혜 +얼 +천 +셈 +녀 +골 +옮 +겼 +씨 +놀 +좌 +쳐 +좁 +님 +옷 +멋 +업 +월 +디 +늘 +창 +닭 +랐 +봄 +손 +왼 +코 +끌 +잉 +펄 +뛰 +낚 +對 +象 +化 +훈 +퍽 +쌍 +몸 +쯤 +걸 +! +쓴 +샀 +노 +좋 +컬 +쥐 +쫓 +혔 +잠 +깐 +좀 +깨 +웠 +군 +찔 +렀 +딕 +암 +룰 +맛 +카 +훨 +씬 +꼭 +럽 +촘 +광 +눌 +뒷 +팔 +망 +꺾 +먼 +뀐 +짐 +넣 +짜 +킴 +슴 +슨 +걷 +뉜 +` +숙 +글 +例 +同 +名 +異 +人 +럿 +퍼 +뜨 +험 +북 +끄 +짝 +칼 +닮 +짧 +쁜 +앉 +춥 +픈 +밉 +프 +둥 +싫 +애 +힌 +깎 +융 +앤 +똑 +깥 +껴 +싼 +잊 +낡 +봐 +욱 +케 +커 +곤 +낌 +헐 +긋 +테 +& +윈 +닥 +슬 +셋 +맨 +럴 +흡 +홀 +잖 +힘 +닫 +뮤 +션 +칠 +쉐 +량 +획 +혁 +협 +웨 +샹 +즘 +쏟 +쟁 +컨 +띠 +례 +플 +농 +낙 +탐 +육 +뇌 +팽 +궁 +늦 +춰 +탁 +패 +긍 +텔 +레 +젼 +뉴 +高 +빨 +퇴 +맡 +컫 +욕 +곽 +염 +~ +팩 +베 +곧 +職 +뚜 +렷 +닦 +겪 +냉 +헌 +죄 +쳤 +젊 +엘 +냈 +맑 +쿠 +푸 +믿 +뎨 +웬 +멸 +츠 +끊 +윌 +릴 +밟 +브 +삶 +끔 +률 +깃 +듦 +딘 +램 +펀 +웅 +훗 +콜 +촉 +즈 +벨 +꾀 +궤 +펜 +쿨 +뢰 +톤 +륙 +젝 +젠 +딪 +묵 +됐 +곡 +빚 +템 +父 +系 +權 +혈 +첩 +압 +괴 +숭 +뽑 +숨 +벼 +즐 +쾌 +륜 +三 +從 +之 +道 +七 +去 +惡 +잔 +쉴 +낱 +흉 +낀 +얽 +납 +볍 +헤 +촌 +뻗 +% +뭐 +홉 +떼 +뻔 +쨌 +걱 +쌓 +튼 +썩 +덮 +굴 +엮 +곁 +델 +쯧 +갑 +괜 +찮 +땅 +랫 +얌 +왠 +껏 +녕 +쑥 +섞 +렴 +풋 +뗀 +벌 +얘 +닉 +횟 +클 +컸 +밤 +싶 +겉 +푼 +꼈 +릇 +쩍 +녁 +쩌 +멈 +눕 +겁 +듣 +낭 +얇 +꿈 +틴 +엷 +젓 +귄 +굉 +옳 +몹 +뚫 +떡 +죠 +훌 +륭 +앓 +팬 +티 +액 +묻 +흘 +텃 +밭 +핏 +엔 +쇠 +페 +댔 +톱 +깍 +땠 +땐 +툭 +멍 +붉 +빛 +띤 +쭐 +댄 +숱 +샤 +툰 +줍 +윽 +딱 +솔 +뭔 +뜬 +덥 +덜 +뜩 +줌 +떳 +십 +팼 +쌀 +꼬 +듬 +꼽 +쁘 +꿔 +몫 +쁨 +엽 +셔 +헛 +꽤 +툴 +숲 +덤 +엿 +쏘 +낄 +팠 +色 +톨 +릭 +랄 +섹 +훑 +띄 +돼 +봤 +홧 +끗 +룻 +到 +達 +度 +推 +論 +變 +革 +樸 +根 +低 +作 +爲 +個 +原 +點 +밈 +賢 +明 +둑 +偏 +見 +者 +룩 +文 +質 +心 +身 +富 +利 +華 +美 +僞 +巧 +困 +惑 +飾 +無 +極 +仁 +萬 +物 +짚 +草 +犬 +不 +而 +不 +魏 +晋 +時 +代 +왕 +王 +弼 +開 +券 +常 +差 +別 +相 +一 +般 +窮 +稱 +大 +言 +辭 +當 +體 +實 +德 +上 +日 +證 +市 +씌 +老 +子 +秦 +漢 +源 +流 +生 +沒 +年 +宇 +宙 +著 +假 +託 +集 +積 +빗 +透 +徹 +前 +中 +期 +司 +馬 +遷 +史 +記 +韓 +非 +列 +傳 +學 +問 +經 +書 +諸 +百 +家 +儒 +思 +想 +武 +帝 +董 +仲 +舒 +朝 +國 +敎 +的 +官 +典 +訓 +枯 +風 +始 +皇 +갱 +焚 +坑 +紀 +獻 +先 +濟 +南 +伏 +故 +老 +新 +今 +舊 +古 +尙 +텍 +룹 +뉘 +易 +五 +專 +門 +墨 +守 +數 +融 +鄭 +玄 +章 +建 +初 +白 +虎 +觀 +議 +奏 +通 +義 +誥 +周 +禮 +儀 +禮 +春 +秋 +鞏 
+羊 +穀 +梁 +佐 +氏 +論 +語 +班 +固 +筍 +悅 +凞 +衡 +太 +談 +憤 +滿 +公 +自 +序 +宣 +室 +令 +天 +星 +歷 +卜 +祝 +丞 +曆 +揚 +何 +黃 +元 +封 +泰 +山 +禪 +地 +治 +平 +閣 +딜 +河 +洛 +虞 +夏 +死 +西 +方 +關 +잇 +操 +縱 +發 +千 +歲 +海 +內 +紬 +君 +士 +載 +修 +事 +業 +淡 +六 +陰 +陽 +刑 +致 +廬 +歸 +法 +省 +下 +本 +四 +季 +多 +面 +臣 +夫 +婦 +長 +幼 +꿀 +節 +儉 +形 +善 +俗 +主 +旨 +功 +述 +点 +短 +卓 +說 +굳 +然 +久 +合 +虛 +聖 +텅 +因 +行 +端 +寬 +正 +肖 +是 +政 +渾 +冥 +統 +循 +消 +綱 +龍 +陝 +城 +縣 +楊 +祖 +來 +蹟 +郎 +小 +聞 +石 +遺 +抽 +出 +룡 +李 +龍 +禍 +匈 +奴 +宮 +옥 +갇 +廣 +卷 +찢 +腸 +땀 +젖 +끓 +任 +安 +悲 +境 +詩 +簡 +略 +屈 +離 +騷 +左 +丘 +意 +鬱 +結 +惟 +逝 +涇 +壺 +遂 +表 +理 +혐 +世 +再 +興 +徑 +川 +溪 +谷 +禽 +獸 +木 +牝 +牡 +雌 +雄 +樂 +和 +잣 +指 +散 +侯 +奔 +走 +里 +照 +夕 +ㄷ +웃 +纂 +弑 +孝 +롤 +빙 +轉 +寫 +版 +註 +釋 +戰 +術 +脚 +맹 +唐 +解 +貞 +索 +隱 +張 +北 +宋 +遽 +뻐 +刊 +校 +訂 +耳 +伯 +뼈 +車 +流 +哲 +愚 +俠 +氣 +得 +雲 +尹 +喜 +萊 +用 +宗 +段 +干 +住 +骸 +앙 +膠 +仰 +傅 +淸 +淨 +口 +譯 +聃 +欄 +外 +交 +所 +在 +鄕 +曲 +膽 +函 +後 +邊 +韶 +銘 +曾 +陳 +敍 +倫 +몽 +蒙 +申 +害 +京 +궐 +闕 +沛 +捌 +志 +廟 +녹 +읍 +鹿 +邑 +江 +펼 +擔 +刻 +疑 +梁 +玉 +繩 +讀 +雜 +念 +孫 +왈 +諡 +曰 +字 +選 +楚 +桓 +덧 +幽 +尼 +曼 +귓 +福 +哀 +齒 +敬 +案 +與 +判 +二 +藝 +畢 +沅 +駒 +禦 +寇 +商 +弟 +嚴 +憺 +音 +澹 +蟬 +欌 +遊 +性 +魯 +叔 +랍 +貴 +辯 +舌 +칫 +執 +峻 +烈 +近 +閻 +若 +據 +昭 +續 +葬 +巷 +黨 +食 +곱 +喪 +孔 +十 +有 +葉 +適 +識 +寓 +崔 +東 +壁 +洙 +泗 +考 +信 +錄 +戴 +朱 +핑 +尊 +崇 +堯 +舜 +設 +類 +驕 +浴 +態 +淫 +盛 +我 +引 +存 +眞 +路 +庫 +굽 +欲 +禹 +立 +篇 +神 +仙 +應 +注 +哮 +景 +吳 +誅 +殺 +資 +鑑 +威 +定 +壽 +箱 +養 +쳇 +퀴 +씻 +私 +贍 +足 +移 +各 +博 +句 +韻 +陶 +冶 +탠 +核 +連 +智 +壯 +荀 +呂 +管 +愼 +策 +鬼 +喩 +末 +乾 +괘 +卦 +告 +界 +藩 +屛 +器 +第 +莫 +終 +也 +比 +庇 +役 +可 +線 +造 +츰 +切 +部 +偈 +頌 +벳 +要 +誦 +曜 +끈 +읊 +씀 +劫 +뾰 +틈 +妄 +챙 +뛸 +샘 +늪 +솟 +늙 +쭙 +苦 +솜 +삐 +꽃 +흩 +맙 +붓 +픔 +빌 +겸 +돋 +뽐 +팁 +돕 +흙 +랴 +坐 +뱀 +뿔 +숫 +댐 +읜 +짊 +깔 +듭 +ㄱ +엉 +붕 +넌 +貪 +瞋 +痔 +脫 +밴 +엎 +큽 +덩 +읠 +姓 +階 +級 +힐 +콩 +묶 +훔 +肉 +넷 +뇨 +갚 +흑 +꽁 +휴 +껌 +씹 +뱉 +랬 +九 +涅 +槃 +入 +廷 +空 +惺 +具 +以 +둠 +求 +菩 +衆 +果 +벅 +짖 +센 +꼼 +똥 +뜸 +믐 +뜯 +털 +낯 +넬 +ㅎ +늑 +캐 +큐 +렌 +텐 +쿵 +흠 +핌 +탓 +턱 +뚤 +멕 +켈 +졸 +쪼 +ㅂ +앳 +탬 +즙 +휩 +폴 +뭉 +뚱 +빅 +슈 +셀 +둬 +캉 +튜 +ㅅ +뭇 +얗 +핍 +썼 +場 +뀔 +숴 +像 +띨 +科 +屬 +種 +괸 +롱 +띈 +횡 +킹 +웰 +닷 +얕 +탱 +팡 +꿨 +펌 +헨 +콰 +링 +벤 +콘 +빔 +둡 +뚝 +헬 +콥 +펠 +쏠 +잦 +탑 +멩 +튀 +뽀 +돔 +꽝 +돗 +빽 +펭 +ㅇ +짹 +렁 +옴 +껍 +옇 +윙 +햇 +닿 +얀 +흰 +윗 +굶 +둣 +깰 +맴 +뺨 +컷 +탔 +렐 +덟 +팥 +맘 +썰 
+샌 +닝 +갯 +쩔 +캬 +춤 +릉 +싱 +캔 +깡 +킷 +뎠 +랭 +릎 +꽉 +첸 +췬 +랩 +옹 +뛴 +쐐 +믹 +찝 +댓 +걀 +쌘 +쉰 +갓 +틱 +폈 +냘 +랗 +늬 +빤 +톰 +맣 +/ +촬 +럭 +깬 +깜 +튕 +틋 +떴 +藻 +類 +잎 +셉 +싹 +캤 +훼 +틔 +놨 +얹 +젯 +캄 +師 +迦 +葉 +쯔 +붐 +僧 +茶 +弓 +醫 +팀 +臨 +曹 +洞 +겐 +昧 +魔 +旋 +씽 +柱 +趙 +州 +껄 +촛 +臥 +딴 +呵 +笑 +護 +位 +ㅌ +漸 +認 +都 +寺 +딛 +콤 +렉 +副 +聰 +持 +阿 +蜀 +佛 +育 +受 +蘊 +慧 +갠 +잿 +렝 +女 +뗑 +慈 +앎 +휼 +겅 +됩 +닙 +힙 +짠 +덴 +블 +맷 +重 +옵 +멜 +봅 +겔 +ㅈ +칩 +렘 +뵈 +삯 +몬 +暑 +싣 +찜 +퉁 +겟 +놋 +創 +컹 +렛 +花 +紅 +엡 +巢 +能 +꼐 +롬 +팍 +섰 +봇 +툼 +폼 +슥 +팎 +舟 +돛 +닻 +뗏 +엣 +칸 +知 +延 +批 +評 +理 +賞 +享 +뤄 +味 +浦 +筆 +漫 +쌩 +엠 +쇼 +흄 +뮈 +왓 +審 +分 +過 +間 +렙 +틸 +뭘 +뮐 +얏 +밋 +헉 +밧 +콧 +듸 +뿜 +앵 +쨍 +쭉 +誤 +덱 +愛 +샅 +밍 +눔 +룸 +엥 +폄 +꿰 +룐 +냇 +쑤 +릿 +圖 +盆 +勢 +坊 +民 +局 +承 +喆 +橋 +土 +保 +水 +濯 +멱 +獵 +頭 +踏 +깅 +李 +岸 +强 +占 +排 +뺀 +渠 +껑 +暗 +力 +銀 +鑛 +鐘 +樓 +共 +涌 +則 +精 +秩 +樣 +式 +聲 +畏 +脈 +絡 +찡 +뜰 +픽 +엌 +誠 +母 +胎 +其 +盤 +伴 +侶 +加 +工 +反 +車 +洋 +輪 +廻 +禾 +乘 +動 +땡 +볕 +캠 +귈 +넉 +感 +視 +覺 +댁 +늠 +戶 +棟 +뷰 +費 +얄 +廳 +往 +倍 +格 +斜 +젤 +客 +顚 +倒 +此 +彼 +步 +릅 +낫 +未 +靴 +샐 +핸 +켤 +줘 +톡 +맬 +넨 +巫 +슭 +兀 +瓦 +骨 +斯 +盟 +劃 +麗 +쿄 +뭍 +辰 +成 +族 +塞 +赤 +峰 +녔 +昔 +波 +角 +杯 +製 +꽂 +헝 +겊 +솥 +銅 +鏡 +줏 +鳥 +社 +陵 +處 +텡 +堆 +秘 +悖 +兒 +罕 +짙 +꿩 +쥬 +酒 +俱 +뭄 +홱 +靑 +鷹 +앴 +뽈 +튿 +卍 +騫 +域 +樺 +漁 +쟉 +八 +寶 +雙 +紋 +싯 +쩐 +욤 +丹 +뒬 +槍 +츨 +뱅 +泡 +疹 +哨 +눠 +톈 +샴 +캘 +쏜 +셰 +켯 +毛 +ㅓ +斷 +層 +푹 +숀 +멧 +鰐 +梨 +늄 +遍 +超 +턴 +옐 +쿼 +랙 +球 +슘 +뷔 +퐁 +윅 +벙 +멘 +産 +줬 +콕 +팅 +잽 +닛 +쌉 +텁 +헙 +乎 +옭 +派 +띌 +꾹 +遠 +챌 +썽 +씁 +훤 +칵 +곬 +딩 +團 +連 +삿 +갸 +잭 +뗄 +쥔 +光 +庭 +漆 +옻 +닯 +寄 +回 +羽 +狀 +複 +燁 +樗 +樹 +땔 +綠 +雖 +危 +最 +好 +啼 +影 +侵 +綠 +衣 +濕 +夢 +賣 +臨 +魚 +月 +軒 +菜 +妊 +雪 +深 +夜 +愁 +귤 +펑 +柑 +橘 +亞 +金 +쌌 +橄 +攬 +欖 +薺 +멎 +腋 +媒 +鹽 +藏 +油 +쐬 +쪄 +桑 +童 +奇 +짇 +뽕 +供 +犧 +섣 +냅 +굵 +찧 +蓮 +詵 +巖 +液 +藥 +盧 +命 +賦 +髮 +香 +囊 +燕 +楓 +歌 +謠 +永 +金 +澤 +霜 +뫼 +勸 +뻑 +굿 +雀 +配 +糖 +松 +障 +幹 +궂 +홈 +꿋 +꺽 +雅 +苕 +云 +矣 +憂 +維 +傷 +如 +웁 +칡 +凌 +女 +紫 +墜 +瘀 +血 +乳 +蔡 +絹 +蠶 +繭 +紙 +蘭 +亭 +竹 +麻 +房 +友 +謝 +箋 +燈 +堂 +薛 +濤 +杜 +甫 +苔 +楮 +蘚 +植 +넋 +錦 +썹 +病 +빳 +阪 +組 +柳 +쬐 +又 +會 +놈 +밸 +홋 +島 +岡 +덫 +폰 +놔 +췄 +찐 +켓 +켄 +텄 +野 +村 +뻘 +쌈 +큘 +쨋 +콱 +座 +쥘 +田 +登 +井 +兵 +鬪 +멤 +黑 +넸 +由 +쳔 +軍 +情 +뿍 +댕 +技 +쩡 +貫 +ㅋ +탤 +偶 +앰 +뷸 +핫 +郞 +店 +햄 +牛 +찼 +넛 +宅 +便 +急 +渡 +播 +磨 +齋 +藤 +忠 +次 +긁 +林 +晴 +띔 +낵 +吉 +祥 +짭 +짤 +隆 +勝 
+茂 +務 +펫 +森 +良 +靖 +팸 +玲 +헹 +굼 +쉭 +륵 +쏙 +磁 +火 +印 +핥 +볐 +뎌 +現 +顯 +딤 +궈 +켠 +恨 +늉 +캇 +롸 +쎄 +헴 +誕 +탯 +夷 +낟 +殷 +슐 +燧 +農 +頊 +괭 +빻 +墟 +湯 +傑 +后 +稷 +戎 +越 +晉 +翟 +셜 +엊 +誌 +利 +賓 +盡 +把 +習 +全 +於 +챠 +뱍 +즌 +셍 +園 +츄 +墳 +엑 +雇 +岳 +퓬 +蕓 +촨 +뻬 +虹 +豫 +蔬 +杭 +蘇 +桂 +林 +秀 +璃 +臺 +潭 +烈 +輸 +特 +區 +鳳 +榮 +池 +魯 +蓮 +溫 +泉 +슝 +膨 +湖 +墾 +丁 +恒 +췌 +進 +옌 +텨 +냔 +ㅊ +팜 +提 +羅 +弘 +益 +輯 +鄒 +牟 +奄 +넜 +랏 +留 +樂 +뼘 +曉 +잤 +諍 +薩 +柏 +逐 +鹿 +惠 +施 +꿴 +댈 +弱 +隨 +뱃 +汎 +兼 +支 +離 +損 +깻 +뭣 +鵲 +醯 +診 +臟 +뭡 +紂 +己 +抱 +烙 +樓 +쿡 +卿 +竅 +箕 +微 +祭 +康 +桀 +右 +땜 +逆 +滑 +釐 +攻 +煬 +辨 +拇 +枝 +目 +刺 +繡 +律 +律 +姑 +磬 +呂 +曠 +蔘 +輿 +衛 +靈 +堅 +居 +畸 +鳧 +脛 +鶴 +앨 +켐 +品 +少 +六 +孤 +齊 +首 +雷 +懸 +財 +貨 +눴 +챈 +參 +鰍 +臾 +盜 +拓 +麗 +縷 +躬 +穆 +調 +放 +至 +泊 +伐 +慾 +素 +朴 +樽 +珪 +璋 +쁠 +赫 +胥 +腹 +醴 +屋 +閭 +壬 +罰 +逢 +諫 +靈 +劣 +伍 +暈 +戮 +勇 +脣 +竭 +寒 +亡 +鄲 +薄 +圍 +起 +淵 +斗 +斛 +璽 +候 +爵 +恩 +斧 +鉞 +示 +絶 +乃 +止 +珠 +芋 +瑟 +琴 +僥 +匠 +拙 +妙 +容 +央 +栗 +陸 +畜 +轅 +盧 +炎 +曦 +跡 +짱 +좽 +沼 +莊 +彿 +舍 +塔 +婆 +摩 +벵 +若 +密 +蜜 +펙 +群 +剛 +趣 +改 +盂 +蘭 +鎭 +卽 +屍 +눗 +컵 +緣 +謙 +姚 +祇 +坵 +秤 +胡 +忍 +鈍 +梵 +뇩 +먁 +等 +直 +幻 +捨 +男 +願 +陸 +默 +寂 +甘 +露 +抄 +他 +肇 +菴 +뵙 +閔 +累 +皆 +奉 +講 +邪 +$ +芳 +듀 +갬 +맵 +뎀 +値 +稼 +價 +輕 +際 +갭 +網 +靜 +依 +互 +癖 +鈴 +蕉 +俳 +滅 +件 +퀘 +話 +皮 +電 +荷 +活 +降 +台 +佈 +彌 +陀 +疏 +唯 +攝 +燮 +跋 +親 +普 +叉 +難 +堤 +順 +儼 +澄 +苑 +昌 +院 +奈 +孺 +蘆 +絲 +茶 +趨 +伊 +列 +災 +厄 +英 +運 +歐 +參 +岩 +倉 +攘 +幕 +府 +潑 +殖 +猩 +條 +約 +諭 +沖 +峽 +休 +培 +艦 +馨 +防 +督 +弁 +桎 +梏 +征 +峙 +쵸 +兆 +梓 +朋 +隣 +搗 +嘗 +薪 +栗 +遼 +半 +沿 +灣 +立 +圈 +瓜 +恐 +熱 +醉 +綸 +答 +豪 +紳 +岐 +菫 +津 +袁 +凱 +純 +鐵 +洲 +企 +針 +隊 +瀋 +暘 +總 +領 +亥 +緖 +丸 +助 +敗 +猥 +獨 +望 +隷 +厦 +澳 +澎 +制 +祺 +瑞 +萍 +毅 +閥 +打 +破 +졍 +웹 +뙤 +튄 +쾰 +쏭 +뤼 +짰 +뭏 +看 +譜 +갛 +첼 +벡 +똘 +뺄 +잴 +잰 +偉 +勳 +寃 +掌 +布 +接 +亨 +甑 +姜 +淳 +報 +彬 +鼎 +奎 +倫 +訣 +吐 +蕃 +帽 +殿 +遡 +橡 +還 +領 +綽 +顔 +譚 +稽 +瑪 +壇 +彛 +꿇 +숯 +ㅆ +녘 +來 +裕 +唱 +媚 +繪 +畵 +崖 +羅 +服 +料 +圓 +煌 +冠 +ㅣ +船 +傾 +耕 +伎 +샬 +妖 +閃 +쩨 +몄 +맸 +晶 +ㅜ +矢 +쓱 +髓 +뺑 +鷄 +揭 +巨 +龜 +햐 +딧 +拜 +겡 +眼 +緯 +契 +鮮 +卑 +落 +蒿 +准 +黎 +댑 +깟 +빕 +툇 +춧 +뼉 +킵 +깼 +숟 +뭅 +낏 +섶 +뱁 +돝 +杖 +왱 +삵 +갉 +烏 +飛 +梨 +뒹 +쇳 +홰 +짢 +擧 +兩 +뺐 +펐 +쩜 +홑 +윳 +允 +좇 +쇤 +룽 +챘 +흣 +裔 +엾 +뒀 +갗 +묽 +넙 +꼿 +뻤 +꿍 +컥 +뎅 +겋 +뢸 +쏴 +쭈 +쾅 +혓 +겻 +쫀 +뗐 +蝕 +臆 +荇 +∼ +쾡 +얍 +곶 +닳 +꿎 +켕 +캥 +탉 +곯 +짬 +뻣 +믈 +빡 +겄 +갤 +횃 +卒 +륨 +껐 +캡 
+肥 +빴 +훅 +材 +翁 +뗍 +枰 +慣 +틉 +켭 +탭 +끽 +웜 +넝 +賊 +均 +米 +稀 +炭 +빵 +찹 +胚 +芽 +멥 +볶 +” +곪 +酸 +沙 +麥 +궜 +貧 +怡 +찻 +肝 +豆 +壓 +疫 +午 +郡 +拾 +療 +滯 +痛 +菊 +症 +崩 +蔓 +葛 +粉 +救 +荒 +떫 +灰 +茵 +癌 +毒 +基 +脂 +授 +機 +滋 +補 +腎 +汗 +疼 +暈 +飮 +랒 +桔 +梗 +肺 +咽 +喉 +痺 +拘 +杞 +’ +菌 +燐 +板 +埴 +壤 +甲 +椒 +썬 +徐 +帶 +咳 +粘 +軟 +裂 +片 +援 +洛 +卵 +抗 +腫 +瘍 +粥 +伸 +將 +趾 +孟 +茹 +瀝 +튤 +苞 +蒲 +쫙 +番 +蠻 +倭 +擘 +煎 +苛 +劑 +符 +檀 +禁 +忌 +蒜 +必 +須 +量 +薑 +咸 +早 +隋 +챗 +棗 +떰 +枾 +飢 +餓 +滄 +옅 +檎 +捿 +秉 +垢 +溶 +整 +焦 +脾 +擒 +栢 +鋼 +潤 +稗 +耐 +晩 +燥 +游 +燔 +珍 +蝶 +裙 +刀 +借 +料 +煮 +胞 +那 +쫄 +佃 +濁 +輻 +貝 +쥴 +丑 +灸 +脯 +脩 +熟 +輓 +鴨 +逵 +凉 +胃 +瘡 +蟲 +髥 +쫑 +蒸 +糞 +屎 +볏 +덖 +豚 +猪 +쌔 +蜂 +餘 +豊 +寅 +獵 +牌 +使 +停 +碍 +狗 +塚 +吠 +飯 +숍 +錢 +雨 +追 +慕 +碑 +폿 +뵐 +쪘 +핼 +깁 +밌 +쩝 +떱 +넥 +짼 +씸 +겆 +휙 +깽 +뜀 +숩 +끙 +젭 +됴 +팝 +앱 +딨 +걔 +꺄 +눅 +쒔 +戀 +吏 +녜 +旱 +뺌 +샜 +꽥 +뻥 +걘 +떵 +뀄 +왁 +菽 +댜 +訊 +戟 +置 +睡 +삘 +샛 +낍 +才 +낑 +퀸 +꼍 +쟤 +待 +寸 +뎃 +浮 +沈 +쑨 +塵 +奮 +惡 +쨀 +떽 +쟈 +貸 +씰 +쒀 +좍 +휭 +뱄 +얜 +썸 +텀 +껀 +곗 +휠 +숄 +괌 +퉜 +꿉 +벚 +샷 +뷴 +웸 +킥 +슛 +챔 +뤘 +셸 +팻 +텝 +퀵 +콸 +뮬 +튈 +윔 +젬 +뮌 +욜 +갰 +휑 +퀭 +퉈 +헷 +탰 +랠 +븐 +퓰 +픕 +끕 +삔 +띵 +뀝 +헥 +휜 +룃 +셌 +흽 +챕 +땝 +톳 +쟀 +띕 +졀 +쨉 +뱐 +윱 +햅 +띱 +꾜 +궝 +늅 +붇 +곕 +횝 +푭 +샙 +벱 +닢 +뀜 +솝 +뜁 +쿤 +듐 +펩 +旗 +手 +患 +凡 +膜 +失 +型 +優 +尿 +襄 +限 +婚 +股 +臼 +細 +織 +卵 +尿 +늡 +^ +헀 +á +ň +ó +ž +“ +ç +ü +í +é +ã +튠 +ä +ć +ă +ş +땄 +넹 +ö +Š +ě +ñ +퀀 +å +ř +ý +캅 +∇ +è +퀼 +쳄 +헵 +ê +ō +ø +뢴 +î +쩄 +롹 +옙 +Č +č +샨 +Ș +쾨 +듈 +벰 +ș +팰 +셴 +쳉 +â +욘 +ë +퓸 +É +먀 +쪾 +​ +Ö +팟 +禅 +퀄 +ß +ę +Ł +ź +ą +ł +Α +û +ā +à +튬 +Ž +đ +浅 +克 +Ä +š +넴 +× +뉩 +쐈 +Ü +Å +ì +왑 +힉 +휄 +ı +ţ +웡 +İ +О +с +т +р +о +в +Г +а +л +я +샵 +ė +ń +Á +딥 +ī +ğ +힝 +½ +Ç +φ +ż +ô +Ó +λ +웍 +Δ +ò +ū +캣 +嶋 +淑 +α +ニ +カ +ラ +グ +ア +ン +© +챤 +ï +ú +Ş +→ +죤 +æ +펨 +² +õ +뇽 +쎈 +° +펍 +Í +콴 +ð +첵 +Î +넵 +ē +쿰 +「 +」 + diff --git a/tools/infer/predict_system.py b/tools/infer/predict_system.py index 3e6be234c68dcd82f0f9e844f3ad2859000cec88..29c4d7e8e35ceda3966dfcadcca5f0ae985d1bb1 100755 --- a/tools/infer/predict_system.py +++ b/tools/infer/predict_system.py @@ -133,6 +133,7 @@ def main(args): image_file_list = get_image_file_list(args.image_dir) text_sys = TextSystem(args) is_visualize = True + font_path = 
args.vis_font_path for image_file in image_file_list: img, flag = check_and_read_gif(image_file) if not flag: @@ -160,7 +161,7 @@ def main(args): scores = [rec_res[i][1] for i in range(len(rec_res))] draw_img = draw_ocr( - image, boxes, txts, scores, drop_score=drop_score) + image, boxes, txts, scores, drop_score=drop_score, font_path=font_path) draw_img_save = "./inference_results/" if not os.path.exists(draw_img_save): os.makedirs(draw_img_save) diff --git a/tools/infer/utility.py b/tools/infer/utility.py index 50d934efe91faa2956e63a2344c8b6b6090e4f7a..98eaee2d2ff6dcfcc2c90af24bdbfadd274f7793 100755 --- a/tools/infer/utility.py +++ b/tools/infer/utility.py @@ -70,7 +70,11 @@ def parse_args(): "--rec_char_dict_path", type=str, default="./ppocr/utils/ppocr_keys_v1.txt") - parser.add_argument("--use_space_char", type=bool, default=True) + parser.add_argument("--use_space_char", type=str2bool, default=True) + parser.add_argument( + "--vis_font_path", + type=str, + default="./doc/simfang.ttf") # params for text classifier parser.add_argument("--use_angle_cls", type=str2bool, default=False) @@ -199,7 +203,7 @@ def draw_ocr(image, return image -def draw_ocr_box_txt(image, boxes, txts): +def draw_ocr_box_txt(image, boxes, txts, font_path="./doc/simfang.ttf"): h, w = image.height, image.width img_left = image.copy() img_right = Image.new('RGB', (w, h), (255, 255, 255)) @@ -226,7 +230,7 @@ def draw_ocr_box_txt(image, boxes, txts): if box_height > 2 * box_width: font_size = max(int(box_width * 0.9), 10) font = ImageFont.truetype( - "./doc/simfang.ttf", font_size, encoding="utf-8") + font_path, font_size, encoding="utf-8") cur_y = box[0][1] for c in txt: char_size = font.getsize(c) @@ -236,7 +240,7 @@ def draw_ocr_box_txt(image, boxes, txts): else: font_size = max(int(box_height * 0.8), 10) font = ImageFont.truetype( - "./doc/simfang.ttf", font_size, encoding="utf-8") + font_path, font_size, encoding="utf-8") draw_right.text( [box[0][0], box[0][1]], txt, fill=(0, 0, 0), 
font=font) img_left = Image.blend(image, img_left, 0.5) diff --git a/tools/program.py b/tools/program.py index 08799d17eb66dd9b97fa9d6a7d509167f5d74c88..2ef203f4cb08231fa04cf2e4c8ee41a40470a0ae 100755 --- a/tools/program.py +++ b/tools/program.py @@ -204,6 +204,15 @@ def build(config, main_prog, startup_prog, mode): def build_export(config, main_prog, startup_prog): """ + Build input and output for exporting a checkpoints model to an inference model + Args: + config(dict): config + main_prog(): main program + startup_prog(): startup program + Returns: + feeded_var_names(list[str]): var names of input for exported inference model + target_vars(list[Variable]): output vars for exported inference model + fetches_var_name: dict of checkpoints model outputs(included loss and measures) """ with fluid.program_guard(main_prog, startup_prog): with fluid.unique_name.guard(): @@ -246,6 +255,9 @@ def train_eval_det_run(config, train_info_dict, eval_info_dict, is_pruning=False): + ''' + main program of evaluation for detection + ''' train_batch_id = 0 log_smooth_window = config['Global']['log_smooth_window'] epoch_num = config['Global']['epoch_num'] @@ -337,6 +349,9 @@ def train_eval_det_run(config, def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict): + ''' + main program of evaluation for recognition + ''' train_batch_id = 0 log_smooth_window = config['Global']['log_smooth_window'] epoch_num = config['Global']['epoch_num'] @@ -513,6 +528,7 @@ def train_eval_cls_run(config, exe, train_info_dict, eval_info_dict): def preprocess(): + # load config from yml file FLAGS = ArgsParser().parse_args() config = load_config(FLAGS.config) merge_config(FLAGS.opt) @@ -522,6 +538,7 @@ def preprocess(): use_gpu = config['Global']['use_gpu'] check_gpu(use_gpu) + # check whether the set algorithm belongs to the supported algorithm list alg = config['Global']['algorithm'] assert alg in [ 'EAST', 'DB', 'SAST', 'Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN', 'CLS' diff --git 
a/tools/train.py b/tools/train.py index 531dd15933ebfd83527f091215c40b85253f7866..cf0171b340f8cebb6251d2ef12efb14d3cdb709e 100755 --- a/tools/train.py +++ b/tools/train.py @@ -46,6 +46,7 @@ from paddle.fluid.contrib.model_stat import summary def main(): + # build train program train_build_outputs = program.build( config, train_program, startup_program, mode='train') train_loader = train_build_outputs[0] @@ -54,6 +55,7 @@ def main(): train_opt_loss_name = train_build_outputs[3] model_average = train_build_outputs[-1] + # build eval program eval_program = fluid.Program() eval_build_outputs = program.build( config, eval_program, startup_program, mode='eval') @@ -61,9 +63,11 @@ def main(): eval_fetch_varname_list = eval_build_outputs[2] eval_program = eval_program.clone(for_test=True) + # initialize train reader train_reader = reader_main(config=config, mode="train") train_loader.set_sample_list_generator(train_reader, places=place) + # initialize eval reader eval_reader = reader_main(config=config, mode="eval") exe = fluid.Executor(place) diff --git a/train_data/gen_label.py b/train_data/gen_label.py new file mode 100644 index 0000000000000000000000000000000000000000..552f279f34efa0be437d404273c510585da12f83 --- /dev/null +++ b/train_data/gen_label.py @@ -0,0 +1,74 @@ +#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
+import os
+import argparse
+import json
+
+
+def gen_rec_label(input_path, out_label):
+    with open(out_label, 'w') as out_file:
+        with open(input_path, 'r') as f:
+            for line in f.readlines():
+                # split on the first comma only, so labels that themselves
+                # contain commas are not truncated
+                tmp = line.strip('\n').replace(" ", "").split(',', 1)
+                img_path, label = tmp[0], tmp[1]
+                label = label.replace("\"", "")
+                out_file.write(img_path + '\t' + label + '\n')
+
+
+def gen_det_label(root_path, input_dir, out_label):
+    with open(out_label, 'w') as out_file:
+        for label_file in os.listdir(input_dir):
+            img_path = root_path + label_file[3:-4] + ".jpg"
+            label = []
+            with open(os.path.join(input_dir, label_file), 'r') as f:
+                for line in f.readlines():
+                    tmp = line.strip("\n\r").replace("\xef\xbb\xbf",
+                                                     "").split(',')
+                    # ICDAR gt lines are x1,y1,...,x4,y4,transcription: keep
+                    # all eight coordinates and rejoin any commas that belong
+                    # to the transcription itself
+                    points = tmp[:8]
+                    s = []
+                    for i in range(0, len(points), 2):
+                        b = points[i:i + 2]
+                        s.append(b)
+                    result = {"transcription": ",".join(tmp[8:]), "points": s}
+                    label.append(result)
+            # write JSON (not the Python repr) so downstream readers can
+            # parse the label field with json.loads()
+            out_file.write(img_path + '\t' + json.dumps(
+                label, ensure_ascii=False) + '\n')
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--mode',
+        type=str,
+        default="rec",
+        help='Generate rec_label or det_label, can be set to rec or det')
+    parser.add_argument(
+        '--root_path',
+        type=str,
+        default=".",
+        help='The root directory of images. Only takes effect when mode=det')
+    parser.add_argument(
+        '--input_path',
+        type=str,
+        default=".",
+        help='Label file (mode=rec) or label directory (mode=det) to convert')
+    parser.add_argument(
+        '--output_label',
+        type=str,
+        default="out_label.txt",
+        help='Output file name')
+
+    args = parser.parse_args()
+    if args.mode == "rec":
+        print("Generate rec label")
+        gen_rec_label(args.input_path, args.output_label)
+    elif args.mode == "det":
+        gen_det_label(args.root_path, args.input_path, args.output_label)
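The `tools/infer/utility.py` hunk changes `--use_space_char` from `type=bool` to `type=str2bool`. This matters because argparse calls `type` on the raw command-line string, and `bool("False")` is `True` for any non-empty string, so a `type=bool` flag can never be switched off from the command line. A minimal sketch of such a helper (the exact `str2bool` defined in `utility.py` may accept a slightly different set of spellings):

```python
import argparse


def str2bool(v):
    # argparse passes the raw command-line string; map common spellings to bool
    return v.lower() in ("true", "t", "1")


parser = argparse.ArgumentParser()
# type=bool would be wrong here: bool("False") is True (non-empty string)
parser.add_argument("--use_space_char", type=str2bool, default=True)

print(parser.parse_args(["--use_space_char", "false"]).use_space_char)  # False
print(parser.parse_args([]).use_space_char)  # True (default)
```

The same pattern is already used for `--use_angle_cls` elsewhere in `parse_args`.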
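`gen_label.py` converts two common annotation layouts into PaddleOCR's tab-separated label format. A standalone sketch of both transformations, assuming the standard ICDAR layout `x1,y1,...,x4,y4,transcription` for detection ground truth (the file names and sample lines below are illustrative, and the detection label is serialized as JSON for readability):

```python
import json

# rec mode: a "image,label" CSV row becomes a tab-separated "image<TAB>label" row
rec_line = "word_001.png,Hello"
img, text = rec_line.strip('\n').split(',', 1)
rec_label = img + '\t' + text.replace("\"", "")

# det mode: an ICDAR-style gt line becomes an image path plus a JSON list of
# {"transcription", "points"} dicts, one dict per text region
gt_line = "377,117,463,117,465,130,378,130,Genaxis Theatre"
tmp = gt_line.split(',')
points = [tmp[i:i + 2] for i in range(0, 8, 2)]  # four [x, y] corners
label = [{"transcription": ",".join(tmp[8:]), "points": points}]
det_label = "img_1.jpg" + '\t' + json.dumps(label, ensure_ascii=False)

print(rec_label)
print(det_label)
```

Each image contributes exactly one output line, so a detection gt file with several regions yields one line whose JSON list has several entries.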