diff --git a/StyleText/README.md b/StyleText/README.md index 65a72ac808f5f875e1f42369e7d588027e9508a2..609c90539e982bc66b044cd6294e9d4a070ab908 100644 --- a/StyleText/README.md +++ b/StyleText/README.md @@ -120,7 +120,7 @@ In actual application scenarios, it is often necessary to synthesize pictures in * `with_label`:Whether the `label_file` is label file list. * `CorpusGenerator`: * `method`:Method of CorpusGenerator,supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used,No other configuration is needed,otherwise you need to set `corpus_file` and `language`. - * `language`:Language of the corpus. Currently, the tool only supports English(en), Simplified Chinese(ch) and Korean(ko). + * `language`:Language of the corpus. Currently, the tool only supports English(en), Simplified Chinese(ch) and Korean(ko). * `corpus_file`: Filepath of the corpus. Corpus file should be a text file which will be split by line-endings('\n'). Corpus generator samples one line each time. @@ -171,9 +171,8 @@ After adding the above synthetic data for training, the accuracy of the recognit | Scenario | Characters | Raw Data | Test Data | Only Use Raw Data
Recognition Accuracy | New Synthetic Data | Simultaneous Use of Synthetic Data
Recognition Accuracy | Index Improvement | | -------- | ---------- | -------- | -------- | -------------------------- | ------------ | ---------------------- | -------- | -| Metal surface | English and numbers | 2203 | 650 | 0.5938 | 20000 | 0.7546 | 16% | -| Random background | Korean | 5631 | 1230 | 0.3012 | 100000 | 0.5057 | 20% | - +| Metal surface | English and numbers | 2203 | 650 | 59.38% | 20000 | 75.46% | 16.00% | +| Random background | Korean | 5,631 | 1230 | 30.12% | 100000 | 50.57% | 20.00% | ### Code Structure diff --git a/StyleText/README_ch.md b/StyleText/README_ch.md index ccd1efaf1afae2c21c746f989e9b86bfed19e74b..b35967f4a7c1f4ddad6053ac0528fe1290f817ff 100644 --- a/StyleText/README_ch.md +++ b/StyleText/README_ch.md @@ -156,8 +156,8 @@ python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_ | 场景 | 字符 | 原始数据 | 测试数据 | 只使用原始数据
识别准确率 | 新增合成数据 | 同时使用合成数据
识别准确率 | 指标提升 | | -------- | ---------- | -------- | -------- | -------------------------- | ------------ | ---------------------- | -------- | -| 金属表面 | 英文和数字 | 2203 | 650 | 0.5938 | 20000 | 0.7546 | 16% | -| 随机背景 | 韩语 | 5631 | 1230 | 0.3012 | 100000 | 0.5057 | 20% | +| 金属表面 | 英文和数字 | 2203 | 650 | 59.38% | 20000 | 75.46% | 16.00% | +| 随机背景 | 韩语 | 5631 | 1,230 | 30.12% | 100000 | 50.57% | 20.00% | diff --git "a/applications/PCB\345\255\227\347\254\246\350\257\206\345\210\253/PCB\345\255\227\347\254\246\350\257\206\345\210\253.md" "b/applications/PCB\345\255\227\347\254\246\350\257\206\345\210\253/PCB\345\255\227\347\254\246\350\257\206\345\210\253.md" index ee13bacffdb65e6300a034531a527fdca4ed29f9..804d57e3b543156b54923db0f1019fb9fc3afaee 100644 --- "a/applications/PCB\345\255\227\347\254\246\350\257\206\345\210\253/PCB\345\255\227\347\254\246\350\257\206\345\210\253.md" +++ "b/applications/PCB\345\255\227\347\254\246\350\257\206\345\210\253/PCB\345\255\227\347\254\246\350\257\206\345\210\253.md" @@ -266,8 +266,8 @@ python3 tools/eval.py \ | 序号 | 方案 | hmean | 效果提升 | 实验分析 | | -------- | -------- | -------- | -------- | -------- | | 1 | PP-OCRv3英文超轻量检测预训练模型 | 64.64% | - | 提供的预训练模型具有泛化能力 | -| 2 | PP-OCRv3英文超轻量检测预训练模型 + 验证集padding | 72.13% |+7.5% | padding可以提升尺寸较小图片的检测效果| -| 3 | PP-OCRv3英文超轻量检测预训练模型 + fine-tune | 100% | +27.9% | fine-tune会提升垂类场景效果 | +| 2 | PP-OCRv3英文超轻量检测预训练模型 + 验证集padding | 72.13% |+7.50% | padding可以提升尺寸较小图片的检测效果| +| 3 | PP-OCRv3英文超轻量检测预训练模型 + fine-tune | 100.00% | +27.90% | fine-tune会提升垂类场景效果 | ``` @@ -420,12 +420,12 @@ python3 tools/eval.py \ | 序号 | 方案 | acc | 效果提升 | 实验分析 | | -------- | -------- | -------- | -------- | -------- | | 1 | PP-OCRv3中英文超轻量识别预训练模型直接评估 | 46.67% | - | 提供的预训练模型具有泛化能力 | -| 2 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune | 42.02% |-4.6% | 在数据量不足的情况,反而比预训练模型效果低(也可以通过调整超参数再试试)| -| 3 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 公开通用识别数据集 | 77% | +30% | 在数据量不足的情况下,可以考虑补充公开数据训练 | -| 4 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量 | 99.99% | +23% | 
如果能获取更多数据量的情况,可以通过增加数据量提升效果 | +| 2 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune | 42.02% |-4.60% | 在数据量不足的情况,反而比预训练模型效果低(也可以通过调整超参数再试试)| +| 3 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 公开通用识别数据集 | 77.00% | +30.00% | 在数据量不足的情况下,可以考虑补充公开数据训练 | +| 4 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量 | 99.99% | +23.00% | 如果能获取更多数据量的情况,可以通过增加数据量提升效果 | ``` -注:上述实验结果均是在1500张图片(1200张训练集,300张测试集)、2W张图片、添加公开通用识别数据集上训练、评估的得到,AIstudio只提供了100张数据,所以指标有所差异属于正常,只要策略有效、规律相同即可。 +注:上述实验结果均是在1,500张图片(1,200张训练集,300张测试集)、2W张图片、添加公开通用识别数据集上训练、评估的得到,AIstudio只提供了100张数据,所以指标有所差异属于正常,只要策略有效、规律相同即可。 ``` # 6. 模型导出 @@ -614,23 +614,23 @@ python3 tools/end2end/eval_end2end.py ./save_gt_label/ ./save_PPOCRV2_infer/ | 序号 | 方案 | hmean | 效果提升 | 实验分析 | | ---- | -------------------------------------------------------- | ------ | -------- | ------------------------------------- | | 1 | PP-OCRv3英文超轻量检测预训练模型直接评估 | 64.64% | - | 提供的预训练模型具有泛化能力 | -| 2 | PP-OCRv3英文超轻量检测预训练模型 + 验证集padding直接评估 | 72.13% | +7.5% | padding可以提升尺寸较小图片的检测效果 | -| 3 | PP-OCRv3英文超轻量检测预训练模型 + fine-tune | 100% | +27.9% | fine-tune会提升垂类场景效果 | +| 2 | PP-OCRv3英文超轻量检测预训练模型 + 验证集padding直接评估 | 72.13% | +7.50% | padding可以提升尺寸较小图片的检测效果 | +| 3 | PP-OCRv3英文超轻量检测预训练模型 + fine-tune | 100.00% | +27.90% | fine-tune会提升垂类场景效果 | * 识别 | 序号 | 方案 | acc | 效果提升 | 实验分析 | | ---- | ------------------------------------------------------------ | ------ | -------- | ------------------------------------------------------------ | | 1 | PP-OCRv3中英文超轻量识别预训练模型直接评估 | 46.67% | - | 提供的预训练模型具有泛化能力 | -| 2 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune | 42.02% | -4.6% | 在数据量不足的情况,反而比预训练模型效果低(也可以通过调整超参数再试试) | -| 3 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 公开通用识别数据集 | 77% | +30% | 在数据量不足的情况下,可以考虑补充公开数据训练 | -| 4 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量 | 99.99% | +23% | 如果能获取更多数据量的情况,可以通过增加数据量提升效果 | +| 2 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune | 42.02% | -4.60% | 在数据量不足的情况,反而比预训练模型效果低(也可以通过调整超参数再试试) | +| 3 | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 公开通用识别数据集 | 77.00% | +30.00% | 在数据量不足的情况下,可以考虑补充公开数据训练 | +| 4 | 
PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量 | 99.99% | +23.00% | 如果能获取更多数据量的情况,可以通过增加数据量提升效果 | * 端到端 | det | rec | fmeasure | | --------------------------------------------- | ------------------------------------------------------------ | -------- | -| PP-OCRv3英文超轻量检测预训练模型 + fine-tune | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量 | 93.3% | +| PP-OCRv3英文超轻量检测预训练模型 + fine-tune | PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量 | 93.30% | *结论* diff --git "a/applications/\344\270\255\346\226\207\350\241\250\346\240\274\350\257\206\345\210\253.md" "b/applications/\344\270\255\346\226\207\350\241\250\346\240\274\350\257\206\345\210\253.md" index af7cc96b70410c614ef39e91c229d705c8bd400a..d61514ff2d65703aadfb81b8dc6f52167ab953fe 100644 --- "a/applications/\344\270\255\346\226\207\350\241\250\346\240\274\350\257\206\345\210\253.md" +++ "b/applications/\344\270\255\346\226\207\350\241\250\346\240\274\350\257\206\345\210\253.md" @@ -34,7 +34,7 @@ ![](https://ai-studio-static-online.cdn.bcebos.com/5ffff2093a144a6993a75eef71634a52276015ee43a04566b9c89d353198c746) -当前的表格识别算法不能很好的处理这些场景下的表格图像。在本例中,我们使用PP-Structurev2最新发布的表格识别模型SLANet来演示如何进行中文表格是识别。同时,为了方便作业流程,我们使用表格属性识别模型对表格图像的属性进行识别,对表格的难易程度进行判断,加快人工进行校对速度。 +当前的表格识别算法不能很好的处理这些场景下的表格图像。在本例中,我们使用PP-StructureV2最新发布的表格识别模型SLANet来演示如何进行中文表格是识别。同时,为了方便作业流程,我们使用表格属性识别模型对表格图像的属性进行识别,对表格的难易程度进行判断,加快人工进行校对速度。 本项目AI Studio链接:https://aistudio.baidu.com/aistudio/projectdetail/4588067 @@ -192,14 +192,14 @@ plt.show() ### 2.3 训练 -这里选用PP-Structurev2中的表格识别模型[SLANet](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/table/SLANet.yml) +这里选用PP-StructureV2中的表格识别模型[SLANet](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/table/SLANet.yml) -SLANet是PP-Structurev2全新推出的表格识别模型,相比PP-Structurev1中TableRec-RARE,在速度不变的情况下精度提升4.7%。TEDS提升2% +SLANet是PP-StructureV2全新推出的表格识别模型,相比PP-StructureV1中TableRec-RARE,在速度不变的情况下精度提升4.7%。TEDS提升2% |算法|Acc|[TEDS(Tree-Edit-Distance-based 
Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|Speed| | --- | --- | --- | ---| -| EDD[2] |x| 88.3% |x| +| EDD[2] |x| 88.30% |x| | TableRec-RARE(ours) | 71.73%| 93.88% |779ms| | SLANet(ours) | 76.31%| 95.89%|766ms| diff --git "a/applications/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253.md" "b/applications/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253.md" index 2a35cb170ef165faa6566e0059cca8364b7a6da6..25e32cfadc1f7d6bebce92bd3b3e65bcc04bd839 100644 --- "a/applications/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253.md" +++ "b/applications/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253.md" @@ -182,15 +182,15 @@ PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,内置P | ID | 策略 | 模型大小 | 精度 | 预测耗时(CPU + MKLDNN)| |-----|-----|--------|----| --- | -| 01 | PP-OCRv2 | 8M | 74.8% | 8.54ms | -| 02 | SVTR_Tiny | 21M | 80.1% | 97ms | -| 03 | SVTR_LCNet(h32) | 12M | 71.9% | 6.6ms | -| 04 | SVTR_LCNet(h48) | 12M | 73.98% | 7.6ms | -| 05 | + GTC | 12M | 75.8% | 7.6ms | -| 06 | + TextConAug | 12M | 76.3% | 7.6ms | -| 07 | + TextRotNet | 12M | 76.9% | 7.6ms | -| 08 | + UDML | 12M | 78.4% | 7.6ms | -| 09 | + UIM | 12M 
| 79.4% | 7.6ms | +| 01 | PP-OCRv2 | 8M | 74.80% | 8.54ms | +| 02 | SVTR_Tiny | 21M | 80.10% | 97.00ms | +| 03 | SVTR_LCNet(h32) | 12M | 71.90% | 6.60ms | +| 04 | SVTR_LCNet(h48) | 12M | 73.98% | 7.60ms | +| 05 | + GTC | 12M | 75.80% | 7.60ms | +| 06 | + TextConAug | 12M | 76.30% | 7.60ms | +| 07 | + TextRotNet | 12M | 76.90% | 7.60ms | +| 08 | + UDML | 12M | 78.40% | 7.60ms | +| 09 | + UIM | 12M | 79.40% | 7.60ms | ### 3.3 开始训练 diff --git "a/applications/\345\215\260\347\253\240\345\274\257\346\233\262\346\226\207\345\255\227\350\257\206\345\210\253.md" "b/applications/\345\215\260\347\253\240\345\274\257\346\233\262\346\226\207\345\255\227\350\257\206\345\210\253.md" index fce9ea772eed6575de10f50c0ff447aa1aee928b..702561cef62434ea98e78659b2eddb373eedef4a 100644 --- "a/applications/\345\215\260\347\253\240\345\274\257\346\233\262\346\226\207\345\255\227\350\257\206\345\210\253.md" +++ "b/applications/\345\215\260\347\253\240\345\274\257\346\233\262\346\226\207\345\255\227\350\257\206\345\210\253.md" @@ -30,9 +30,9 @@ | 任务 | 训练数据数量 | 精度 | | -------- | - | -------- | -| 印章检测 | 1000 | 95% | -| 印章文字识别-端对端OCR方法 | 700 | 47% | -| 印章文字识别-两阶段OCR方法 | 700 | 55% | +| 印章检测 | 1000 | 95.00% | +| 印章文字识别-端对端OCR方法 | 700 | 47.00% | +| 印章文字识别-两阶段OCR方法 | 700 | 55.00% | 点击进入 [AI Studio 项目](https://aistudio.baidu.com/aistudio/projectdetail/4586113) diff --git "a/applications/\345\217\221\347\245\250\345\205\263\351\224\256\344\277\241\346\201\257\346\212\275\345\217\226.md" "b/applications/\345\217\221\347\245\250\345\205\263\351\224\256\344\277\241\346\201\257\346\212\275\345\217\226.md" index 82f5b8d48600c6bebb4d3183ee801305d305d531..b8a8ee2160580828400505206b0e54e819852290 100644 --- "a/applications/\345\217\221\347\245\250\345\205\263\351\224\256\344\277\241\346\201\257\346\212\275\345\217\226.md" +++ "b/applications/\345\217\221\347\245\250\345\205\263\351\224\256\344\277\241\346\201\257\346\212\275\345\217\226.md" @@ -145,8 +145,8 @@ LayoutXLM与VI-LayoutXLM针对该场景的训练结果如下所示。 | 模型 | 
迭代轮数 | Hmean | | :---: | :---: | :---: | -| LayoutXLM | 50 | 100% | -| VI-LayoutXLM | 50 | 100% | +| LayoutXLM | 50 | 100.00% | +| VI-LayoutXLM | 50 | 100.00% | 可以看出,由于当前数据量较少,场景比较简单,因此2个模型的Hmean均达到了100%。 @@ -274,8 +274,8 @@ LayoutXLM与VI-LayoutXLM针对该场景的训练结果如下所示。 | 模型 | 迭代轮数 | Hmean | | :---: | :---: | :---: | -| LayoutXLM | 50 | 98.0% | -| VI-LayoutXLM | 50 | 99.3% | +| LayoutXLM | 50 | 98.00% | +| VI-LayoutXLM | 50 | 99.30% | 可以看出,对于VI-LayoutXLM相比LayoutXLM的Hmean高了1.3%。 diff --git "a/applications/\346\266\262\346\231\266\345\261\217\350\257\273\346\225\260\350\257\206\345\210\253.md" "b/applications/\346\266\262\346\231\266\345\261\217\350\257\273\346\225\260\350\257\206\345\210\253.md" index ff2fb2cb4812f4f8366605b3c26af5b9aaaa290e..f70fa06d839c720bc6fcf4cd6458bcf960ae4a52 100644 --- "a/applications/\346\266\262\346\231\266\345\261\217\350\257\273\346\225\260\350\257\206\345\210\253.md" +++ "b/applications/\346\266\262\346\231\266\345\261\217\350\257\273\346\225\260\350\257\206\345\210\253.md" @@ -110,7 +110,7 @@ python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Globa | | 方案 |hmeans| |---|---------------------------|---| -| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.5%| +| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%| #### 4.3.2 预训练模型直接finetune ##### 修改配置文件 @@ -143,8 +143,8 @@ python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Globa 结果如下: | | 方案 |hmeans| |---|---------------------------|---| -| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.5%| -| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.2%| +| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%| +| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%| #### 4.3.3 基于预训练模型Finetune_student模型 @@ -175,9 +175,9 @@ python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o G 结果如下: | | 方案 |hmeans| |---|---------------------------|---| -| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.5%| -| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.2%| -| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.0%| +| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%| +| 1 
| PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%| +| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.00%| #### 4.3.4 基于预训练模型Finetune_teacher模型 @@ -233,10 +233,10 @@ python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Globa 结果如下: | | 方案 |hmeans| |---|---------------------------|---| -| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.5%| -| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.2%| -| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.0%| -| 3 | PP-OCRv3中英文超轻量检测预训练模型fintune教师模型 |84.8%| +| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%| +| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%| +| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.00%| +| 3 | PP-OCRv3中英文超轻量检测预训练模型fintune教师模型 |84.80%| #### 4.3.5 采用CML蒸馏进一步提升student模型精度 @@ -294,11 +294,11 @@ python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Globa 结果如下: | | 方案 |hmeans| |---|---------------------------|---| -| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.5%| -| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.2%| -| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.0%| -| 3 | PP-OCRv3中英文超轻量检测预训练模型fintune教师模型 |84.8%| -| 4 | 基于2和3训练好的模型fintune |82.7%| +| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%| +| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%| +| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.00%| +| 3 | PP-OCRv3中英文超轻量检测预训练模型fintune教师模型 |84.80%| +| 4 | 基于2和3训练好的模型fintune |82.70%| 如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
@@ -445,7 +445,7 @@ python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o 结果如下: | | 方案 |accuracy| |---|---------------------------|---| -| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 |70.4%| +| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 |70.40%| #### 开始训练 我们使用上面修改好的配置文件configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml,预训练模型,数据集路径,学习率,训练轮数等都已经设置完毕后,可以使用下面命令开始训练。 @@ -465,8 +465,8 @@ python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o 结果如下: | | 方案 |accuracy| |---|---------------------------|---| -| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 |70.4%| -| 1 | PP-OCRv3中英文超轻量识别预训练模型finetune |82.2%| +| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 |70.40%| +| 1 | PP-OCRv3中英文超轻量识别预训练模型finetune |82.20%| 如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
diff --git "a/applications/\350\275\273\351\207\217\347\272\247\350\275\246\347\211\214\350\257\206\345\210\253.md" "b/applications/\350\275\273\351\207\217\347\272\247\350\275\246\347\211\214\350\257\206\345\210\253.md" index 1a63091b9289ba51bd2dd4de6ee51264cdb2bc79..c9b76ee61cf22c68746ba1f9027144df80f1d2ee 100644 --- "a/applications/\350\275\273\351\207\217\347\272\247\350\275\246\347\211\214\350\257\206\345\210\253.md" +++ "b/applications/\350\275\273\351\207\217\347\272\247\350\275\246\347\211\214\350\257\206\345\210\253.md" @@ -329,7 +329,7 @@ python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \ |方案|hmeans| |---|---| |PP-OCRv3中英文超轻量检测预训练模型直接预测|76.12%| -|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99%| +|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99.00%| 可以看到进行fine-tune能显著提升车牌检测的效果。 @@ -357,8 +357,8 @@ python3.7 deploy/slim/quantization/quant.py -c configs/det/ch_PP-OCRv3/ch_PP-OCR |方案|hmeans| 模型大小 | 预测速度(lite) | |---|---|------|------------| -|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99%| 2.5M | 223ms | -|PP-OCRv3中英文超轻量检测预训练模型 fine-tune+量化|98.91%| 1M | 189ms | +|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99.00%| 2.5M | 223ms | +|PP-OCRv3中英文超轻量检测预训练模型 fine-tune+量化|98.91%| 1.0M | 189ms | 可以看到通过量化训练在精度几乎无损的情况下,降低模型体积60%并且推理速度提升15%。 @@ -492,7 +492,7 @@ text = text.replace('·','') |方案|acc| |---|---| -|PP-OCRv3中英文超轻量识别预训练模型直接预测|0.2%| +|PP-OCRv3中英文超轻量识别预训练模型直接预测|0.20%| |PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的`·`|90.97%| 可以看到,去掉多余的`·`能大幅提高精度。 @@ -547,7 +547,7 @@ python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \ |方案| acc | |---|--------| -|PP-OCRv3中英文超轻量识别预训练模型直接预测| 0% | +|PP-OCRv3中英文超轻量识别预训练模型直接预测| 0.00% | |PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的`·`| 90.97% | |PP-OCRv3中英文超轻量识别预训练模型 fine-tune| 94.54% | @@ -578,7 +578,7 @@ python3.7 deploy/slim/quantization/quant.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_ |方案| acc | 模型大小 | 预测速度(lite) | |---|--------|-------|------------| |PP-OCRv3中英文超轻量识别预训练模型 fine-tune| 94.54% | 10.3M | 4.2ms | -|PP-OCRv3中英文超轻量识别预训练模型 fine-tune + 量化| 93.4% | 
4.8M | 1.8ms | +|PP-OCRv3中英文超轻量识别预训练模型 fine-tune + 量化| 93.40% | 4.8M | 1.8ms | 可以看到量化后能降低模型体积53%并且推理速度提升57%,但是由于识别数据过少,量化带来了1%的精度下降。 @@ -738,7 +738,7 @@ fmeasure: 87.36% |PP-OCRv3中英文超轻量检测预训练模型
PP-OCRv3中英文超轻量识别预训练模型| 0.04% | |PP-OCRv3中英文超轻量检测预训练模型
PP-OCRv3中英文超轻量识别预训练模型 + 后处理去掉多识别的`·`| 78.27% | |PP-OCRv3中英文超轻量检测预训练模型+fine-tune
PP-OCRv3中英文超轻量识别预训练模型+fine-tune| 87.14% | -|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化
PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化| 88% | +|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化
PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化| 88.00% | 从结果中可以看到对预训练模型不做修改,只根据场景下的具体情况进行后处理的修改就能大幅提升端到端指标到78.27%,在CCPD数据集上进行 fine-tune 后指标进一步提升到87.14%, 在经过量化训练之后,由于检测模型的recall变高,指标进一步提升到88%。但是这个结果仍旧不符合检测模型+识别模型的真实性能(99%*94%=93%),因此我们需要对 base case 进行具体分析。 @@ -763,8 +763,8 @@ if len(txt) != 8: # 车牌字符串长度为8 |---|---|---|---|---|---| |PP-OCRv3中英文超轻量检测预训练模型
PP-OCRv3中英文超轻量识别预训练模型|0.04%|0.08%|0.02%|0.05%|0.00%(A)| |PP-OCRv3中英文超轻量检测预训练模型
PP-OCRv3中英文超轻量识别预训练模型 + 后处理去掉多识别的`·`|78.27%|90.84%|78.61%|79.43%|91.66%(A+B+C)| -|PP-OCRv3中英文超轻量检测预训练模型+fine-tune
PP-OCRv3中英文超轻量识别预训练模型+fine-tune|87.14%|90.40%|87.66%|89.98|92.5%(A+B+C)| -|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化
PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化|88%|90.54%|88.5%|89.46%|92.02%(A+B+C)| +|PP-OCRv3中英文超轻量检测预训练模型+fine-tune
PP-OCRv3中英文超轻量识别预训练模型+fine-tune|87.14%|90.40%|87.66%|89.98%|92.50%(A+B+C)| +|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化
PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化|88.00%|90.54%|88.50%|89.46%|92.02%(A+B+C)| 从结果中可以看到对预训练模型不做修改,只根据场景下的具体情况进行后处理的修改就能大幅提升端到端指标到91.66%,在CCPD数据集上进行 fine-tune 后指标进一步提升到92.5%, 在经过量化训练之后,指标变为92.02%。 @@ -800,17 +800,17 @@ python tools/infer/predict_system.py \ |方案|hmeans| 模型大小 | 预测速度(lite) | |---|---|------|------------| |PP-OCRv3中英文超轻量检测预训练模型直接预测|76.12%|2.5M| 233ms | -|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99%| 2.5M | 233ms | -|PP-OCRv3中英文超轻量检测预训练模型 fine-tune + 量化|98.91%| 1M | 189ms |fine-tune +|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99.00%| 2.5M | 233ms | +|PP-OCRv3中英文超轻量检测预训练模型 fine-tune + 量化|98.91%| 1.0M | 189ms |fine-tune - 识别 |方案| acc | 模型大小 | 预测速度(lite) | |---|--------|-------|------------| -|PP-OCRv3中英文超轻量识别预训练模型直接预测| 0% |10.3M| 4.2ms | +|PP-OCRv3中英文超轻量识别预训练模型直接预测| 0.00% |10.3M| 4.2ms | |PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的`·`| 90.97% |10.3M| 4.2ms | -|PP-OCRv3中英文超轻量识别预训练模型 fine-tune| 94.54% | 10.3M | 4,2ms | -|PP-OCRv3中英文超轻量识别预训练模型 fine-tune + 量化| 93.4% | 4.8M | 1.8ms | +|PP-OCRv3中英文超轻量识别预训练模型 fine-tune| 94.54% | 10.3M | 4.2ms | +|PP-OCRv3中英文超轻量识别预训练模型 fine-tune + 量化| 93.40% | 4.8M | 1.8ms | - 端到端指标如下: @@ -819,8 +819,8 @@ python tools/infer/predict_system.py \ |---|---|---|---| |PP-OCRv3中英文超轻量检测预训练模型
PP-OCRv3中英文超轻量识别预训练模型|0.08%|12.8M|298ms| |PP-OCRv3中英文超轻量检测预训练模型
PP-OCRv3中英文超轻量识别预训练模型 + 后处理去掉多识别的`·`|91.66%|12.8M|298ms| -|PP-OCRv3中英文超轻量检测预训练模型+fine-tune
PP-OCRv3中英文超轻量识别预训练模型+fine-tune|92.5%|12.8M|298ms| -|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化
PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化|92.02%|5.8M|224ms| +|PP-OCRv3中英文超轻量检测预训练模型+fine-tune
PP-OCRv3中英文超轻量识别预训练模型+fine-tune|92.50%|12.8M|298ms| +|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化
PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化|92.02%|5.80M|224ms| **结论** diff --git "a/applications/\351\253\230\347\262\276\345\272\246\344\270\255\346\226\207\350\257\206\345\210\253\346\250\241\345\236\213.md" "b/applications/\351\253\230\347\262\276\345\272\246\344\270\255\346\226\207\350\257\206\345\210\253\346\250\241\345\236\213.md" index 4e71e23300ccc14d24627458c0852776e0adeae3..b233855f4c7a78e384b1248f89b7f7b5e9f66650 100644 --- "a/applications/\351\253\230\347\262\276\345\272\246\344\270\255\346\226\207\350\257\206\345\210\253\346\250\241\345\236\213.md" +++ "b/applications/\351\253\230\347\262\276\345\272\246\344\270\255\346\226\207\350\257\206\345\210\253\346\250\241\345\236\213.md" @@ -13,9 +13,9 @@ PP-OCRv3是百度开源的超轻量级场景文本检测识别模型库,其中 |中文识别算法|模型|UIM|精度| | --- | --- | --- |--- | -|PP-OCRv3|SVTR_LCNet| w/o |78.4%| -|PP-OCRv3|SVTR_LCNet| w |79.4%| -|SVTR|SVTR-Tiny|-|82.5%| +|PP-OCRv3|SVTR_LCNet| w/o |78.40%| +|PP-OCRv3|SVTR_LCNet| w |79.40%| +|SVTR|SVTR-Tiny|-|82.50%| aistudio项目链接: [高精度中文场景文本识别模型SVTR](https://aistudio.baidu.com/aistudio/projectdetail/4263032) diff --git a/doc/doc_ch/PP-OCRv3_introduction.md b/doc/doc_ch/PP-OCRv3_introduction.md index ddeb78d74fb92991b9a8da752fb62850ae41102d..446af23e4ee36754cc81b7f096ce75ef7160de68 100644 --- a/doc/doc_ch/PP-OCRv3_introduction.md +++ b/doc/doc_ch/PP-OCRv3_introduction.md @@ -53,13 +53,13 @@ PP-OCRv3检测模型是对PP-OCRv2中的[CML](https://arxiv.org/pdf/2109.03144.p |序号|策略|模型大小|hmean|速度(cpu + mkldnn)| |-|-|-|-|-| -|baseline teacher|PP-OCR server|49M|83.2%|171ms| -|teacher1|DB-R50-LK-PAN|124M|85.0%|396ms| -|teacher2|DB-R50-LK-PAN-DML|124M|86.0%|396ms| -|baseline student|PP-OCRv2|3M|83.2%|117ms| -|student0|DB-MV3-RSE-FPN|3.6M|84.5%|124ms| -|student1|DB-MV3-CML(teacher2)|3M|84.3%|117ms| -|student2|DB-MV3-RSE-FPN-CML(teacher2)|3.6M|85.4%|124ms| +|baseline teacher|PP-OCR server|49.0M|83.20%|171ms| +|teacher1|DB-R50-LK-PAN|124.0M|85.00%|396ms| +|teacher2|DB-R50-LK-PAN-DML|124.0M|86.00%|396ms| +|baseline 
student|PP-OCRv2|3.0M|83.20%|117ms| +|student0|DB-MV3-RSE-FPN|3.6M|84.50%|124ms| +|student1|DB-MV3-CML(teacher2)|3.0M|84.30%|117ms| +|student2|DB-MV3-RSE-FPN-CML(teacher2)|3.60M|85.40%|124ms| 测试环境: Intel Gold 6148 CPU,预测时开启MKLDNN加速。 @@ -101,15 +101,15 @@ PP-OCRv3的识别模块是基于文本识别算法[SVTR](https://arxiv.org/abs/2 | ID | 策略 | 模型大小 | 精度 | 预测耗时(CPU + MKLDNN)| |-----|-----|--------|----| --- | -| 01 | PP-OCRv2 | 8M | 74.8% | 8.54ms | -| 02 | SVTR_Tiny | 21M | 80.1% | 97ms | -| 03 | SVTR_LCNet(h32) | 12M | 71.9% | 6.6ms | -| 04 | SVTR_LCNet(h48) | 12M | 73.98% | 7.6ms | -| 05 | + GTC | 12M | 75.8% | 7.6ms | -| 06 | + TextConAug | 12M | 76.3% | 7.6ms | -| 07 | + TextRotNet | 12M | 76.9% | 7.6ms | -| 08 | + UDML | 12M | 78.4% | 7.6ms | -| 09 | + UIM | 12M | 79.4% | 7.6ms | +| 01 | PP-OCRv2 | 8.0M | 74.80% | 8.54ms | +| 02 | SVTR_Tiny | 21.0M | 80.10% | 97.00ms | +| 03 | SVTR_LCNet(h32) | 12.0M | 71.90% | 6.60ms | +| 04 | SVTR_LCNet(h48) | 12.0M | 73.98% | 7.60ms | +| 05 | + GTC | 12.0M | 75.80% | 7.60ms | +| 06 | + TextConAug | 12.0M | 76.30% | 7.60ms | +| 07 | + TextRotNet | 12.0M | 76.90% | 7.60ms | +| 08 | + UDML | 12.0M | 78.40% | 7.60ms | +| 09 | + UIM | 12.0M | 79.40% | 7.60ms | 注: 测试速度时,实验01-03输入图片尺寸均为(3,32,320),04-08输入图片尺寸均为(3,48,320)。在实际预测时,图像为变长输入,速度会有所变化。测试环境: Intel Gold 6148 CPU,预测时开启MKLDNN加速。 @@ -144,12 +144,12 @@ SVTR_Tiny 网络结构如下所示: | ID | 策略 | 模型大小 | 精度 | 速度(CPU + MKLDNN)| |-----|-----|--------|----| --- | -| 01 | PP-OCRv2-baseline | 8M | 69.3% | 8.54ms | -| 02 | SVTR_Tiny | 21M | 80.1% | 97ms | -| 03 | SVTR_LCNet(G4) | 9.2M | 76% | 30ms | -| 04 | SVTR_LCNet(G2) | 13M | 72.98% | 9.37ms | -| 05 | SVTR_LCNet(h32) | 12M | 71.9% | 6.6ms | -| 06 | SVTR_LCNet(h48) | 12M | 73.98% | 7.6ms | +| 01 | PP-OCRv2-baseline | 8.0M | 69.30% | 8.54ms | +| 02 | SVTR_Tiny | 21.0M | 80.10% | 97.00ms | +| 03 | SVTR_LCNet(G4) | 9.2M | 76.00% | 30.00ms | +| 04 | SVTR_LCNet(G2) | 13.0M | 72.98% | 9.37ms | +| 05 | SVTR_LCNet(h32) | 12.0M | 71.90% | 6.60ms | +| 06 | SVTR_LCNet(h48) | 12.0M 
| 73.98% | 7.60ms | 注: 测试速度时,01-05输入图片尺寸均为(3,32,320); PP-OCRv2-baseline 代表没有借助蒸馏方法训练得到的模型 @@ -199,10 +199,10 @@ UIM(Unlabeled Images Mining)是一种非常简单的无标注数据挖掘方 | Model | Hmean | Model Size (M) | Time Cost (CPU, ms) | Time Cost (T4 GPU, ms) | |-----|-----|--------|----| --- | -| PP-OCR mobile | 50.3% | 8.1 | 356 | 116 | -| PP-OCR server | 57.0% | 155.1 | 1056 | 200 | -| PP-OCRv2 | 57.6% | 11.6 | 330 | 111 | -| PP-OCRv3 | 62.9% | 15.6 | 331 | 86.64 | +| PP-OCR mobile | 50.30% | 8.1 | 356.00 | 116.00 | +| PP-OCR server | 57.00% | 155.1 | 1056.00 | 200.00 | +| PP-OCRv2 | 57.60% | 11.6 | 330.00 | 111.00 | +| PP-OCRv3 | 62.90% | 15.6 | 331.00 | 86.64 | 测试环境:CPU型号为Intel Gold 6148,CPU预测时开启MKLDNN加速。 @@ -218,5 +218,5 @@ UIM(Unlabeled Images Mining)是一种非常简单的无标注数据挖掘方 | Model | 拉丁语系 | 阿拉伯语系 | 日语 | 韩语 | |-----|-----|--------|----| --- | -| PP-OCR_mul | 69.6% | 40.5% | 38.5% | 55.4% | -| PP-OCRv3_mul | 75.2%| 45.37% | 45.8% | 60.1% | +| PP-OCR_mul | 69.60% | 40.50% | 38.50% | 55.40% | +| PP-OCRv3_mul | 75.20%| 45.37% | 45.80% | 60.10% | diff --git a/doc/doc_ch/algorithm_det_east.md b/doc/doc_ch/algorithm_det_east.md index b89018e3468aa7772af69da469e81c16e9d43dc9..94a0d097d803cf5a74461be8faaadcabbd28938d 100644 --- a/doc/doc_ch/algorithm_det_east.md +++ b/doc/doc_ch/algorithm_det_east.md @@ -27,7 +27,7 @@ |模型|骨干网络|配置文件|precision|recall|Hmean|下载链接| | --- | --- | --- | --- | --- | --- | --- | |EAST|ResNet50_vd|88.71%| 81.36%| 84.88%| [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| -|EAST| MobileNetV3| 78.2%| 79.1%| 78.65%| [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| +|EAST| MobileNetV3| 78.20%| 79.10%| 78.65%| [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| diff --git a/doc/doc_ch/algorithm_e2e_pgnet.md b/doc/doc_ch/algorithm_e2e_pgnet.md index 83c1114e58a69355dadfa91902e576b552e8dcab..934328106c988e494637f91b5ea933f7cd56a025 100644 --- 
a/doc/doc_ch/algorithm_e2e_pgnet.md +++ b/doc/doc_ch/algorithm_e2e_pgnet.md @@ -34,7 +34,7 @@ PGNet算法细节详见[论文](https://www.aaai.org/AAAI21Papers/AAAI-2885.Wang |PGNetA|det_precision|det_recall|det_f_score|e2e_precision|e2e_recall|e2e_f_score|FPS|下载| | --- | --- | --- | --- | --- | --- | --- | --- | --- | -|Paper|85.30|86.80|86.1|-|-|61.7|38.20 (size=640)|-| +|Paper|85.30|86.80|86.10|-|-|61.70|38.20 (size=640)|-| |Ours|87.03|82.48|84.69|61.71|58.43|60.03|48.73 (size=768)|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/en_server_pgnetA.tar)| *note:PaddleOCR里的PGNet实现针对预测速度做了优化,在精度下降可接受范围内,可以显著提升端对端预测速度* diff --git a/doc/doc_ch/algorithm_kie_sdmgr.md b/doc/doc_ch/algorithm_kie_sdmgr.md index 10f3ca063596942618466723ed69a9047e9c828d..86b44f6d4a27fef0e684a5fb899a1a60cc62bc5d 100644 --- a/doc/doc_ch/algorithm_kie_sdmgr.md +++ b/doc/doc_ch/algorithm_kie_sdmgr.md @@ -33,7 +33,7 @@ |模型|骨干网络|配置文件|hmean|下载链接| | --- | --- | --- | --- | --- | -|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[训练模型]( https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)/[推理模型(coming soon)]()| +|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.70%|[训练模型]( https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)/[推理模型(coming soon)]()| diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md index 7de581c27e616ebba5a33bdad125f7ed3df9f489..7f6919c13aad833d8e3fda960bdc172c5fec6c7b 100755 --- a/doc/doc_ch/algorithm_overview.md +++ b/doc/doc_ch/algorithm_overview.md @@ -36,7 +36,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广 |模型|骨干网络|precision|recall|Hmean|下载链接| | --- | --- | --- | --- | --- | --- | |EAST|ResNet50_vd|88.71%|81.36%|84.88%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| -|EAST|MobileNetV3|78.2%|79.1%|78.65%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)| 
+|EAST|MobileNetV3|78.20%|79.10%|78.65%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)| |DB|ResNet50_vd|86.41%|78.72%|82.38%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)| |DB|MobileNetV3|77.29%|73.08%|75.12%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)| |SAST|ResNet50_vd|91.39%|83.77%|87.42%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)| @@ -143,7 +143,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广 |模型|骨干网络|配置文件|hmean|下载链接| | --- | --- | --- | --- | --- | -|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)| +|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.70%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)| 在XFUND_zh公开数据集上,算法效果如下: diff --git a/doc/doc_ch/algorithm_rec_can.md b/doc/doc_ch/algorithm_rec_can.md index e4f4ba6f3a7e13d5d8baf1ce6b38a6f98681fb53..816a255efa5a5ba310df9196834f1715d98f2ca2 100644 --- a/doc/doc_ch/algorithm_rec_can.md +++ b/doc/doc_ch/algorithm_rec_can.md @@ -27,7 +27,7 @@ |模型 |骨干网络|配置文件|ExpRate|下载链接| | ----- | ----- | ----- | ----- | ----- | -|CAN|DenseNet|[rec_d28_can.yml](../../configs/rec/rec_d28_can.yml)|51.72|[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_d28_can_train.tar)| +|CAN|DenseNet|[rec_d28_can.yml](../../configs/rec/rec_d28_can.yml)|51.72%|[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_d28_can_train.tar)| ## 2. 
环境配置 @@ -60,7 +60,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs python3 tools/train.py -c configs/rec/rec_d28_can.yml -o Train.dataset.transforms.GrayImageChannelFormat.inverse=False ``` -- 默认每训练1个epoch(1105次iteration)进行1次评估,若您更改训练的batch_size,或更换数据集,请在训练时作出如下修改 +- 默认每训练1个epoch(1,105次iteration)进行1次评估,若您更改训练的batch_size,或更换数据集,请在训练时作出如下修改 ``` python3 tools/train.py -c configs/rec/rec_d28_can.yml -o Global.eval_batch_step=[0, {length_of_dataset//batch_size}] diff --git a/doc/doc_ch/algorithm_rec_rare.md b/doc/doc_ch/algorithm_rec_rare.md index dddd27ef9880fc7f8735f32fa85a406a57a77c8e..9476c2e69a847c494ff82a9bb398d80f8bdeb444 100644 --- a/doc/doc_ch/algorithm_rec_rare.md +++ b/doc/doc_ch/algorithm_rec_rare.md @@ -25,8 +25,8 @@ |模型|骨干网络|配置文件|Avg Accuracy|下载链接| | --- | --- | --- | --- | --- | -|RARE|Resnet34_vd|[configs/rec/rec_r34_vd_tps_bilstm_att.yml](../../configs/rec/rec_r34_vd_tps_bilstm_att.yml)|83.6%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)| -|RARE|MobileNetV3|[configs/rec/rec_mv3_tps_bilstm_att.yml](../../configs/rec/rec_mv3_tps_bilstm_att.yml)|82.5%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_att_v2.0_train.tar)| +|RARE|Resnet34_vd|[configs/rec/rec_r34_vd_tps_bilstm_att.yml](../../configs/rec/rec_r34_vd_tps_bilstm_att.yml)|83.60%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)| +|RARE|MobileNetV3|[configs/rec/rec_mv3_tps_bilstm_att.yml](../../configs/rec/rec_mv3_tps_bilstm_att.yml)|82.50%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_att_v2.0_train.tar)| diff --git a/doc/doc_ch/algorithm_rec_seed.md b/doc/doc_ch/algorithm_rec_seed.md index 710e92272dc3169bf373d273534441a15c6be01c..6d59c9fee27cc45afa329241458d597b2eae9f7e 100644 --- a/doc/doc_ch/algorithm_rec_seed.md +++ b/doc/doc_ch/algorithm_rec_seed.md @@ -27,7 +27,7 @@ |模型|骨干网络|Avg Accuracy|配置文件|下载链接| 
|---|---|---|---|---| -|SEED|Aster_Resnet| 85.2% | [configs/rec/rec_resnet_stn_bilstm_att.yml](../../configs/rec/rec_resnet_stn_bilstm_att.yml) | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) | +|SEED|Aster_Resnet| 85.20% | [configs/rec/rec_resnet_stn_bilstm_att.yml](../../configs/rec/rec_resnet_stn_bilstm_att.yml) | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) | ## 2. 环境配置 diff --git a/doc/doc_ch/algorithm_rec_spin.md b/doc/doc_ch/algorithm_rec_spin.md index 908a85a417c4070b95630b37b0830e08aae3ff4f..2b9c04abca3f02cc0fc695388a7dafdb30482257 100644 --- a/doc/doc_ch/algorithm_rec_spin.md +++ b/doc/doc_ch/algorithm_rec_spin.md @@ -26,7 +26,7 @@ SPIN收录于AAAI2020。主要用于OCR识别任务。在任意形状文本识 |模型|骨干网络|配置文件|Acc|下载链接| | --- | --- | --- | --- | --- | -|SPIN|ResNet32|[rec_r32_gaspin_bilstm_att.yml](../../configs/rec/rec_r32_gaspin_bilstm_att.yml)|90.0%|[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_r32_gaspin_bilstm_att.tar)| +|SPIN|ResNet32|[rec_r32_gaspin_bilstm_att.yml](../../configs/rec/rec_r32_gaspin_bilstm_att.yml)|90.00%|[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_r32_gaspin_bilstm_att.tar)| diff --git a/doc/doc_ch/algorithm_rec_visionlan.md b/doc/doc_ch/algorithm_rec_visionlan.md index b4474c29f8596197fb536f07fa96b9926e5b20f4..eb58942c24e7233f937735eefd82728e90cd3211 100644 --- a/doc/doc_ch/algorithm_rec_visionlan.md +++ b/doc/doc_ch/algorithm_rec_visionlan.md @@ -27,7 +27,7 @@ |模型|骨干网络|配置文件|Acc|下载链接| | --- | --- | --- | --- | --- | -|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.3%|[预训练、训练模型](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)| +|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.30%|[预训练、训练模型](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)| ## 2. 
环境配置 diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index c36cac037e1f90a41da24bc64cacbbb860e04c6b..c6cbd6873f776c2b8eab49be496fa847929d85a0 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -42,12 +42,12 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | |ch_PP-OCRv3_det_slim|【最新】slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 1.1M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_distill_train.tar) / [nb模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.nb)| -|ch_PP-OCRv3_det| 【最新】原始超轻量模型,支持中英文、多语种文本检测 |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.8M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)| -|ch_PP-OCRv2_det_slim| slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)| 3M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| -|ch_PP-OCRv2_det| 原始超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| -|ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 2.6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| 
-|ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| -|ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| +|ch_PP-OCRv3_det| 【最新】原始超轻量模型,支持中英文、多语种文本检测 |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.80M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)| +|ch_PP-OCRv2_det_slim| slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)| 3.0M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| +|ch_PP-OCRv2_det| 原始超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)|3.0M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| +|ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 2.60M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| +|ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3.0M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / 
[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| +|ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47.0M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| @@ -83,10 +83,10 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 | --- | --- | --- | --- | --- | |ch_PP-OCRv3_rec_slim |【最新】slim量化版超轻量模型,支持中英文、数字识别|[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 4.9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) | |ch_PP-OCRv3_rec|【最新】原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 12.4M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) | -|ch_PP-OCRv2_rec_slim| slim量化版超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | -|ch_PP-OCRv2_rec| 原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml)|8.5M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | 
-|ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | -|ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | +|ch_PP-OCRv2_rec_slim| slim量化版超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9.0M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | +|ch_PP-OCRv2_rec| 原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml)|8.50M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | +|ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6.0M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | 
+|ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.20M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | **说明:** `训练模型`是基于预训练模型在真实数据与竖排合成文本数据上finetune得到的模型,在真实应用场景中有着更好的表现,`预训练模型`则是直接基于全量真实数据与合成数据训练得到,更适合用于在自己的数据集上finetune。 @@ -107,9 +107,9 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|字典文件|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- |--- | --- | -| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |韩文识别|[korean_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/korean_PP-OCRv3_rec.yml)|11M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) | -| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |日文识别|[japan_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/japan_PP-OCRv3_rec.yml)|11M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) | -| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | 
中文繁体识别|[chinese_cht_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/chinese_cht_PP-OCRv3_rec.yml)|12M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) | +| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |韩文识别|[korean_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/korean_PP-OCRv3_rec.yml)|11.0M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) | +| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |日文识别|[japan_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/japan_PP-OCRv3_rec.yml)|11.0M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) | +| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | 中文繁体识别|[chinese_cht_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/chinese_cht_PP-OCRv3_rec.yml)|12.0M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) | | te_PP-OCRv3_rec | ppocr/utils/dict/te_dict.txt | 泰卢固文识别|[te_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/te_PP-OCRv3_rec.yml)|9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_train.tar) | | ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt |卡纳达文识别|[ka_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ka_PP-OCRv3_rec.yml)|9.9M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / 
[训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) | | ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt |泰米尔文识别|[ta_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ta_PP-OCRv3_rec.yml)|9.6M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) | @@ -140,9 +140,9 @@ Paddle-Lite 是一个高性能、轻量级、灵活性强且易于扩展的深 |模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本| |---|---|---|---|---|---|---| -|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10| +|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11.0M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10| |PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.6M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_slim_opt.nb)|v2.10| -|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9| 
+|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11.0M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9| |PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9| |V2.0|ppocr_v2.0超轻量中文OCR移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| |V2.0(slim)|ppocr_v2.0超轻量中文OCR移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| diff --git a/doc/doc_en/PP-OCRv3_introduction_en.md b/doc/doc_en/PP-OCRv3_introduction_en.md index 815ad9b0e5a7ff2dec36ceaef995212d122a9f89..8d5a36edf784b847973499e7631c04459555ae77 100644 --- a/doc/doc_en/PP-OCRv3_introduction_en.md +++ b/doc/doc_en/PP-OCRv3_introduction_en.md @@ -55,13 +55,13 @@ The ablation experiments are as follows: |ID|Strategy|Model Size|Hmean|The Inference Time(cpu + mkldnn)| |-|-|-|-|-| -|baseline teacher|PP-OCR server|49M|83.2%|171ms| -|teacher1|DB-R50-LK-PAN|124M|85.0%|396ms| -|teacher2|DB-R50-LK-PAN-DML|124M|86.0%|396ms| -|baseline student|PP-OCRv2|3M|83.2%|117ms| -|student0|DB-MV3-RSE-FPN|3.6M|84.5%|124ms| -|student1|DB-MV3-CML(teacher2)|3M|84.3%|117ms| 
-|student2|DB-MV3-RSE-FPN-CML(teacher2)|3.6M|85.4%|124ms|
+|baseline teacher|PP-OCR server|49.0M|83.20%|171ms|
+|teacher1|DB-R50-LK-PAN|124.0M|85.00%|396ms|
+|teacher2|DB-R50-LK-PAN-DML|124.0M|86.00%|396ms|
+|baseline student|PP-OCRv2|3.0M|83.20%|117ms|
+|student0|DB-MV3-RSE-FPN|3.6M|84.50%|124ms|
+|student1|DB-MV3-CML(teacher2)|3.0M|84.30%|117ms|
+|student2|DB-MV3-RSE-FPN-CML(teacher2)|3.6M|85.40%|124ms|
 Testing environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.
@@ -111,15 +111,15 @@ Based on the above strategy, compared with PP-OCRv2, the PP-OCRv3 recognition mo
 | ID | strategy | Model size | accuracy | prediction speed(CPU + MKLDNN)|
 |-----|-----|--------|----| --- |
-| 01 | PP-OCRv2 | 8M | 74.8% | 8.54ms |
-| 02 | SVTR_Tiny | 21M | 80.1% | 97ms |
-| 03 | SVTR_LCNet(h32) | 12M | 71.9% | 6.6ms |
-| 04 | SVTR_LCNet(h48) | 12M | 73.98% | 7.6ms |
-| 05 | + GTC | 12M | 75.8% | 7.6ms |
-| 06 | + TextConAug | 12M | 76.3% | 7.6ms |
-| 07 | + TextRotNet | 12M | 76.9% | 7.6ms |
-| 08 | + UDML | 12M | 78.4% | 7.6ms |
-| 09 | + UIM | 12M | 79.4% | 7.6ms |
+| 01 | PP-OCRv2 | 8.0M | 74.80% | 8.54ms |
+| 02 | SVTR_Tiny | 21.0M | 80.10% | 97.00ms |
+| 03 | SVTR_LCNet(h32) | 12.0M | 71.90% | 6.60ms |
+| 04 | SVTR_LCNet(h48) | 12.0M | 73.98% | 7.60ms |
+| 05 | + GTC | 12.0M | 75.80% | 7.60ms |
+| 06 | + TextConAug | 12.0M | 76.30% | 7.60ms |
+| 07 | + TextRotNet | 12.0M | 76.90% | 7.60ms |
+| 08 | + UDML | 12.0M | 78.40% | 7.60ms |
+| 09 | + UIM | 12.0M | 79.40% | 7.60ms |
 Note: When testing the speed, the input image shape of Experiment 01-03 is (3, 32, 320), and the input image shape of 04-08 is (3, 48, 320). In the actual prediction, the image is a variable-length input, and the speed will vary. Testing environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during prediction.
@@ -158,12 +158,12 @@ The ablation experiments are as follows:
 | ID | strategy | Model size | accuracy | prediction speed(CPU + MKLDNN)|
 |-----|-----|--------|----| --- |
-| 01 | PP-OCRv2-baseline | 8M | 69.3% | 8.54ms |
-| 02 | SVTR_Tiny | 21M | 80.1% | 97ms |
-| 03 | SVTR_LCNet(G4) | 9.2M | 76% | 30ms |
-| 04 | SVTR_LCNet(G2) | 13M | 72.98% | 9.37ms |
-| 05 | SVTR_LCNet(h32) | 12M | 71.9% | 6.6ms |
-| 06 | SVTR_LCNet(h48) | 12M | 73.98% | 7.6ms |
+| 01 | PP-OCRv2-baseline | 8.0M | 69.30% | 8.54ms |
+| 02 | SVTR_Tiny | 21.0M | 80.10% | 97.00ms |
+| 03 | SVTR_LCNet(G4) | 9.2M | 76.00% | 30.00ms |
+| 04 | SVTR_LCNet(G2) | 13.0M | 72.98% | 9.37ms |
+| 05 | SVTR_LCNet(h32) | 12.0M | 71.90% | 6.60ms |
+| 06 | SVTR_LCNet(h48) | 12.0M | 73.98% | 7.60ms |
 Note: When testing the speed, the input image shape of 01-05 are all (3, 32, 320); PP-OCRv2-baseline represents the model trained without distillation method
@@ -210,21 +210,21 @@ UIM (Unlabeled Images Mining) is a very simple unlabeled data mining strategy. T
 ## 4. End-to-end Evaluation
-With the optimization strategies mentioned above, PP-OCRv3 outperforms PP-OCRv2 by 5% in terms of end-to-end Hmean for Chinese scenarios with comparable speed. The specific metrics are shown as follows.
+With the optimization strategies mentioned above, PP-OCRv3 outperforms PP-OCRv2 by 5.00% in terms of end-to-end Hmean for Chinese scenarios with comparable speed. The specific metrics are shown as follows.
 | Model | Hmean | Model Size (M) | Time Cost (CPU, ms) | Time Cost (T4 GPU, ms) |
 |-----|-----|--------|----| --- |
-| PP-OCR mobile | 50.3% | 8.1 | 356 | 116 |
-| PP-OCR server | 57.0% | 155.1 | 1056 | 200 |
-| PP-OCRv2 | 57.6% | 11.6 | 330 | 111 |
-| PP-OCRv3 | 62.9% | 15.6 | 331 | 86.64 |
+| PP-OCR mobile | 50.30% | 8.1 | 356.00 | 116.00 |
+| PP-OCR server | 57.00% | 155.1 | 1056.00 | 200.00 |
+| PP-OCRv2 | 57.60% | 11.6 | 330.00 | 111.00 |
+| PP-OCRv3 | 62.90% | 15.6 | 331.00 | 86.64 |
 Testing environment:
 - CPU: Intel Gold 6148, and MKLDNN acceleration is enabled during CPU inference.
-In addition to Chinese scenarios, the recognition model for English is also optimized with an increasement of 11% for end-to-end Hmean, which is shown as follows.
+In addition to Chinese scenarios, the recognition model for English is also optimized with an increasement of 11.00% for end-to-end Hmean, which is shown as follows.
 | Model | Recall | Precision | Hmean |
 |-----|-----|--------|----|
@@ -235,5 +235,5 @@ At the same time, recognition models for more than 80 language are also upgraded
 | Model | Latin | Arabic | Japanese | Korean |
 |-----|-----|--------|----| --- |
-| PP-OCR_mul | 69.6% | 40.5% | 38.5% | 55.4% |
-| PP-OCRv3_mul | 75.2% | 45.37% | 45.8% | 60.1% |
+| PP-OCR_mul | 69.60% | 40.50% | 38.50% | 55.40% |
+| PP-OCRv3_mul | 75.20% | 45.37% | 45.80% | 60.10% |
diff --git a/doc/doc_en/algorithm_det_east_en.md b/doc/doc_en/algorithm_det_east_en.md
index 07c434a9b162d9d373f5f357522cbd752be1afc1..3848464abfd275fd319a24b0d3f6b3522c06c4a2 100644
--- a/doc/doc_en/algorithm_det_east_en.md
+++ b/doc/doc_en/algorithm_det_east_en.md
@@ -27,7 +27,7 @@ On the ICDAR2015 dataset, the text detection result is as follows:
 |Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
 | --- | --- | --- | --- | --- | --- | --- |
 |EAST|ResNet50_vd|88.71%| 81.36%| 84.88%| [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
-|EAST| MobileNetV3| 78.2%|
79.1%| 78.65%| [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
+|EAST| MobileNetV3| 78.20%| 79.10%| 78.65%| [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
diff --git a/doc/doc_en/algorithm_e2e_pgnet_en.md b/doc/doc_en/algorithm_e2e_pgnet_en.md
index ab74c57bc3d4d97852641cd708a2dceea5732ba7..ccb5e6c070032eed258325749f8fe405b58ab86e 100644
--- a/doc/doc_en/algorithm_e2e_pgnet_en.md
+++ b/doc/doc_en/algorithm_e2e_pgnet_en.md
@@ -29,7 +29,7 @@ The results of detection and recognition are as follows:
 #### Test environment: NVIDIA Tesla V100-SXM2-16GB
 |PGNetA|det_precision|det_recall|det_f_score|e2e_precision|e2e_recall|e2e_f_score|FPS|download|
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-|Paper|85.30|86.80|86.1|-|-|61.7|38.20 (size=640)|-|
+|Paper|85.30|86.80|86.10|-|-|61.70|38.20 (size=640)|-|
 |Ours|87.03|82.48|84.69|61.71|58.43|60.03|48.73 (size=768)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/en_server_pgnetA.tar)|
 *note:PGNet in PaddleOCR optimizes the prediction speed, and can significantly improve the end-to-end prediction speed within the acceptable range of accuracy reduction*
diff --git a/doc/doc_en/algorithm_kie_sdmgr_en.md b/doc/doc_en/algorithm_kie_sdmgr_en.md
index 5b12b8c959e830015ffb173626ac5752ee9ecee0..ce52ef135e1aaaf033f57cb368ebf169601d3c3e 100644
--- a/doc/doc_en/algorithm_kie_sdmgr_en.md
+++ b/doc/doc_en/algorithm_kie_sdmgr_en.md
@@ -26,7 +26,7 @@ On wildreceipt dataset, the algorithm reproduction Hmean is as follows.
|Model|Backbone |Cnnfig|Hmean|Download link| | --- | --- | --- | --- | --- | -|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model]( https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)/[inference model(coming soon)]()| +|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.70%|[trained model]( https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)/[inference model(coming soon)]()| diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md index 09ff407916751cfa52fb72b14bacf763afbda3a7..309d074ed4fc3cb39e53134d51a07fa07e1be621 100755 --- a/doc/doc_en/algorithm_overview_en.md +++ b/doc/doc_en/algorithm_overview_en.md @@ -34,7 +34,7 @@ On the ICDAR2015 dataset, the text detection result is as follows: |Model|Backbone|Precision|Recall|Hmean|Download link| | --- | --- | --- | --- | --- | --- | |EAST|ResNet50_vd|88.71%|81.36%|84.88%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| -|EAST|MobileNetV3|78.2%|79.1%|78.65%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)| +|EAST|MobileNetV3|78.20%|79.10%|78.65%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)| |DB|ResNet50_vd|86.41%|78.72%|82.38%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)| |DB|MobileNetV3|77.29%|73.08%|75.12%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)| |SAST|ResNet50_vd|91.39%|83.77%|87.42%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)| @@ -141,7 +141,7 @@ On wildreceipt dataset, the algorithm result is as follows: |Model|Backbone|Config|Hmean|Download link| | --- | --- | --- | --- | --- | 
-|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)| +|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.70%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)| On XFUND_zh dataset, the algorithm result is as follows: diff --git a/doc/doc_en/algorithm_rec_can_en.md b/doc/doc_en/algorithm_rec_can_en.md index e65bb2aa8d37b316b005c2bd9dbeffa4b7124dcf..5cc7038f668e394c48169e46a83a3f0e1a62a0e1 100644 --- a/doc/doc_en/algorithm_rec_can_en.md +++ b/doc/doc_en/algorithm_rec_can_en.md @@ -25,7 +25,7 @@ Using CROHME handwrittem mathematical expression recognition datasets for traini |Model|Backbone|config|exprate|Download link| | --- | --- | --- | --- | --- | -|CAN|DenseNet|[rec_d28_can.yml](../../configs/rec/rec_d28_can.yml)|51.72|[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_d28_can_train.tar)| +|CAN|DenseNet|[rec_d28_can.yml](../../configs/rec/rec_d28_can.yml)|51.72%|[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_d28_can_train.tar)| ## 2. 
Environment diff --git a/doc/doc_en/algorithm_rec_rare_en.md b/doc/doc_en/algorithm_rec_rare_en.md index 3aeb1e3adfdfaaeaf493d8d40c967973f0805cd1..a756ac75b390009a8f99dc7ae4d926922d64788d 100644 --- a/doc/doc_en/algorithm_rec_rare_en.md +++ b/doc/doc_en/algorithm_rec_rare_en.md @@ -25,8 +25,8 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval |Models|Backbone Networks|Configuration Files|Avg Accuracy|Download Links| | --- | --- | --- | --- | --- | -|RARE|Resnet34_vd|[configs/rec/rec_r34_vd_tps_bilstm_att.yml](../../configs/rec/rec_r34_vd_tps_bilstm_att.yml)|83.6%|[training model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)| -|RARE|MobileNetV3|[configs/rec/rec_mv3_tps_bilstm_att.yml](../../configs/rec/rec_mv3_tps_bilstm_att.yml)|82.5%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_att_v2.0_train.tar)| +|RARE|Resnet34_vd|[configs/rec/rec_r34_vd_tps_bilstm_att.yml](../../configs/rec/rec_r34_vd_tps_bilstm_att.yml)|83.60%|[training model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)| +|RARE|MobileNetV3|[configs/rec/rec_mv3_tps_bilstm_att.yml](../../configs/rec/rec_mv3_tps_bilstm_att.yml)|82.50%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_att_v2.0_train.tar)| diff --git a/doc/doc_en/algorithm_rec_seed_en.md b/doc/doc_en/algorithm_rec_seed_en.md index f8d7ae6d3f34ab8a4f510c88002b22dbce7a10e8..83cadfceac76877cc12cefd598db673b73a81d43 100644 --- a/doc/doc_en/algorithm_rec_seed_en.md +++ b/doc/doc_en/algorithm_rec_seed_en.md @@ -27,7 +27,7 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval |Model|Backbone|ACC|config|Download link| | --- | --- | --- | --- | --- | -|SEED|Aster_Resnet| 85.2% | [configs/rec/rec_resnet_stn_bilstm_att.yml](../../configs/rec/rec_resnet_stn_bilstm_att.yml) | 
[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) | +|SEED|Aster_Resnet| 85.20% | [configs/rec/rec_resnet_stn_bilstm_att.yml](../../configs/rec/rec_resnet_stn_bilstm_att.yml) | [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) | ## 2. Environment diff --git a/doc/doc_en/algorithm_rec_spin_en.md b/doc/doc_en/algorithm_rec_spin_en.md index 03f8d8f69986fc5eb14cfdf294fc25fafb06e269..3aea5809762141061b1117982346c21317079e42 100644 --- a/doc/doc_en/algorithm_rec_spin_en.md +++ b/doc/doc_en/algorithm_rec_spin_en.md @@ -25,7 +25,7 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval |Model|Backbone|config|Acc|Download link| | --- | --- | --- | --- | --- | -|SPIN|ResNet32|[rec_r32_gaspin_bilstm_att.yml](../../configs/rec/rec_r32_gaspin_bilstm_att.yml)|90.0%|[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_r32_gaspin_bilstm_att.tar) | +|SPIN|ResNet32|[rec_r32_gaspin_bilstm_att.yml](../../configs/rec/rec_r32_gaspin_bilstm_att.yml)|90.00%|[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_r32_gaspin_bilstm_att.tar) | diff --git a/doc/doc_en/algorithm_rec_visionlan_en.md b/doc/doc_en/algorithm_rec_visionlan_en.md index f67aa3c622d706a387075b37bd9e493740574cdd..585e8539132df8eecfb8c02d42410eb955ea74bb 100644 --- a/doc/doc_en/algorithm_rec_visionlan_en.md +++ b/doc/doc_en/algorithm_rec_visionlan_en.md @@ -25,7 +25,7 @@ Using MJSynth and SynthText two text recognition datasets for training, and eval |Model|Backbone|config|Acc|Download link| | --- | --- | --- | --- | --- | -|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.3%|[预训练、训练模型](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)| +|VisionLAN|ResNet45|[rec_r45_visionlan.yml](../../configs/rec/rec_r45_visionlan.yml)|90.30%|[预训练、训练模型](https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar)| ## 2. 
Environment
diff --git a/doc/doc_en/distributed_training_en.md b/doc/doc_en/distributed_training_en.md
index a9db354ad46751dc1320b48d68fe8025edb651d3..947fb139bc734a31bcf5df144414ac528cbf3255 100644
--- a/doc/doc_en/distributed_training_en.md
+++ b/doc/doc_en/distributed_training_en.md
@@ -47,14 +47,14 @@ python3 -m paddle.distributed.launch \
 | Model | Configuration | Configuration | 8 GPU training time / Accuracy | 3x8 GPU training time / Accuracy | Acceleration ratio |
 |:------:|:-----:|:--------:|:--------:|:--------:|:-----:|
-| CRNN | [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) | 260k Chinese dataset | 2.50d/66.7% | 1.67d/67.0% | **1.5** |
+| CRNN | [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) | 260k Chinese dataset | 2.50d/66.70% | 1.67d/67.00% | **1.5** |
 * We conducted model training on 3x8 V100 GPUs. Accuracy, training time, and multi machine acceleration ratio of different models are shown below.
 | Model | Configuration | Configuration | 8 GPU training time / Accuracy | 3x8 GPU training time / Accuracy | Acceleration ratio |
 |:------:|:-----:|:--------:|:--------:|:--------:|:-----:|
-| SLANet | [SLANet.yml](../../configs/table/SLANet.yml) | PubTabNet | 49.8h/76.2% | 19.75h/74.77% | **2.52** |
+| SLANet | [SLANet.yml](../../configs/table/SLANet.yml) | PubTabNet | 49.80h/76.20% | 19.75h/74.77% | **2.52** |
 > Note: when training with 3x8 GPUs, the single card batch size is unchanged compared with the 1x8 GPUs' training process, and the learning rate is multiplied by 2 (if it is multiplied by 3 by default, the accuracy is only 73.42%).
@@ -65,4 +65,4 @@ python3 -m paddle.distributed.launch \ | Model | Configuration | Configuration | 8 GPU training time / Accuracy | 4x8 GPU training time / Accuracy | Acceleration ratio | |:------:|:-----:|:--------:|:--------:|:--------:|:-----:| -| SVTR | [ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) | PP-OCRv3_rec data | 10d/- | 2.84d/74.0% | **3.5** | +| SVTR | [ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) | PP-OCRv3_rec data | 10d/- | 2.84d/74.00% | **3.5** | diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md index c52f71dfe4124302b8cb308980a6228a89589bd6..3ec5013cfe425c168d73c7e894b6698e47cbdc5c 100644 --- a/doc/doc_en/models_list_en.md +++ b/doc/doc_en/models_list_en.md @@ -39,11 +39,11 @@ Relationship of the above models is as follows. | --- | --- | --- | --- | --- | |ch_PP-OCRv3_det_slim| [New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.nb)| |ch_PP-OCRv3_det| [New] Original lightweight model, supporting Chinese, English, multilingual text detection |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.8M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)| -|ch_PP-OCRv2_det_slim| [New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text 
detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| -|ch_PP-OCRv2_det| [New] Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| +|ch_PP-OCRv2_det_slim| [New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)| 3.0M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| +|ch_PP-OCRv2_det| [New] Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)|3.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| |ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| -|ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained 
model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| -|ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| +|ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3.0M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| +|ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47.0M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| @@ -77,9 +77,9 @@ Relationship of the above models is as follows. 
| --- | --- | --- | --- | --- | |ch_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting Chinese, English text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 4.9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) | |ch_PP-OCRv3_rec| [New] Original lightweight model, supporting Chinese, English, multilingual text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 12.4M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) | -|ch_PP-OCRv2_rec_slim| Slim qunatization with distillation lightweight model, supporting Chinese, English text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | +|ch_PP-OCRv2_rec_slim| Slim qunatization with distillation lightweight model, supporting Chinese, English text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9.0M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | |ch_PP-OCRv2_rec| Original lightweight model, supporting Chinese, English, multilingual text recognition 
|[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | -|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | +|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6.0M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference 
model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -101,9 +101,9 @@ Relationship of the above models is as follows. |model name| dict file | description|config|model size|download| | --- | --- | --- |--- | --- | --- | -| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |Lightweight model for Korean recognition|[korean_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/korean_PP-OCRv3_rec.yml)|11M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) | -| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |Lightweight model for Japanese recognition|[japan_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/japan_PP-OCRv3_rec.yml)|11M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) | -| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | Lightweight model for chinese cht|[chinese_cht_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/chinese_cht_PP-OCRv3_rec.yml)|12M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) | +| korean_PP-OCRv3_rec | ppocr/utils/dict/korean_dict.txt |Lightweight model for Korean recognition|[korean_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/korean_PP-OCRv3_rec.yml)|11.0M|[inference 
model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_train.tar) | +| japan_PP-OCRv3_rec | ppocr/utils/dict/japan_dict.txt |Lightweight model for Japanese recognition|[japan_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/japan_PP-OCRv3_rec.yml)|11.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_train.tar) | +| chinese_cht_PP-OCRv3_rec | ppocr/utils/dict/chinese_cht_dict.txt | Lightweight model for chinese cht|[chinese_cht_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/chinese_cht_PP-OCRv3_rec.yml)|12.0M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_train.tar) | | te_PP-OCRv3_rec | ppocr/utils/dict/te_dict.txt | Lightweight model for Telugu recognition |[te_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/te_PP-OCRv3_rec.yml)|9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_train.tar) | | ka_PP-OCRv3_rec | ppocr/utils/dict/ka_dict.txt | Lightweight model for Kannada recognition |[ka_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ka_PP-OCRv3_rec.yml)|9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_train.tar) | | ta_PP-OCRv3_rec | ppocr/utils/dict/ta_dict.txt |Lightweight model for Tamil recognition|[ta_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/ta_PP-OCRv3_rec.yml)|9.6M|[inference 
model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_train.tar) | @@ -131,9 +131,9 @@ This chapter lists OCR nb models with PP-OCRv2 or earlier versions. You can acce |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch| |---|---|---|---|---|---|---| -|PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10| +|PP-OCRv2|extra-lightweight chinese OCR optimized model|11.0M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_infer_opt.nb)|v2.10| |PP-OCRv2(slim)|extra-lightweight chinese OCR optimized model|4.6M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/lite/ch_PP-OCRv2_rec_slim_opt.nb)|v2.10| -|PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9| +|PP-OCRv2|extra-lightweight chinese OCR optimized model|11.0M|[download 
link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9| |PP-OCRv2(slim)|extra-lightweight chinese OCR optimized model|4.9M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9| |V2.0|ppocr_v2.0 extra-lightweight chinese OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| |V2.0(slim)|ppovr_v2.0 extra-lightweight chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| diff --git a/ppstructure/docs/PP-StructureV2_introduction.md b/ppstructure/docs/PP-StructureV2_introduction.md index efaf35f2b5f8299180a7b1c1c7e4eb887323fe63..555fc4560ec79e157f0518a7afbf1ddbf585aee6 100644 --- a/ppstructure/docs/PP-StructureV2_introduction.md +++ b/ppstructure/docs/PP-StructureV2_introduction.md @@ -16,11 +16,11 @@ 现实场景中包含大量的文档图像,它们以图片等非结构化形式存储。基于文档图像的结构化分析与信息抽取对于数据的数字化存储以及产业的数字化转型至关重要。基于该考虑,PaddleOCR自研并发布了PP-Structure智能文档分析系统,旨在帮助开发者更好的完成版面分析、表格识别、关键信息抽取等文档理解相关任务。 
-近期,PaddleOCR团队针对PP-Structurev1的版面分析、表格识别、关键信息抽取模块,进行了共计8个方面的升级,同时新增整图方向矫正、文档复原等功能,打造出一个全新的、效果更优的文档分析系统:PP-StructureV2。 +近期,PaddleOCR团队针对PP-StructureV1的版面分析、表格识别、关键信息抽取模块,进行了共计8个方面的升级,同时新增整图方向矫正、文档复原等功能,打造出一个全新的、效果更优的文档分析系统:PP-StructureV2。 ## 2. 简介 -PP-StructureV2在PP-Structurev1的基础上进一步改进,主要有以下3个方面升级: +PP-StructureV2在PP-StructureV1的基础上进一步改进,主要有以下3个方面升级: * **系统功能升级** :新增图像矫正和版面复原模块,图像转word/pdf、关键信息抽取能力全覆盖! * **系统性能优化** : @@ -52,7 +52,7 @@ PP-StructureV2系统流程图如下所示,文档图像首先经过图像矫正 * TB-YX:考虑阅读顺序的文本行排序逻辑 * UDML:联合互学习知识蒸馏策略 -最终,与PP-Structurev1相比: +最终,与PP-StructureV1相比: - 版面分析模型参数量减少95.6%,推理速度提升11倍,精度提升0.4%; - 表格识别预测耗时不变,模型精度提升6%,端到端TEDS提升2%; @@ -74,17 +74,17 @@ PP-StructureV2系统流程图如下所示,文档图像首先经过图像矫正 ### 4.1 版面分析 -版面分析指的是对图片形式的文档进行区域划分,定位其中的关键区域,如文字、标题、表格、图片等,PP-Structurev1使用了PaddleDetection中开源的高效检测算法PP-YOLOv2完成版面分析的任务。 +版面分析指的是对图片形式的文档进行区域划分,定位其中的关键区域,如文字、标题、表格、图片等,PP-StructureV1使用了PaddleDetection中开源的高效检测算法PP-YOLOv2完成版面分析的任务。 在PP-StructureV2中,我们发布基于PP-PicoDet的轻量级版面分析模型,并针对版面分析场景定制图像尺度,同时使用FGD知识蒸馏算法,进一步提升模型精度。最终CPU上`41ms`即可完成版面分析过程(仅包含模型推理时间,数据预处理耗时大约50ms左右)。在公开数据集PubLayNet 上,消融实验如下: | 实验序号 | 策略 | 模型存储(M) | mAP | CPU预测耗时(ms) | |:------:|:------:|:------:|:------:|:------:| -| 1 | PP-YOLOv2(640*640) | 221 | 93.6% | 512 | -| 2 | PP-PicoDet-LCNet2.5x(640*640) | 29.7 | 92.5% |53.2| -| 3 | PP-PicoDet-LCNet2.5x(800*608) | 29.7 | 94.2% |83.1 | -| 4 | PP-PicoDet-LCNet1.0x(800*608) | 9.7 | 93.5% | 41.2| -| 5 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 9.7 | 94% |41.2| +| 1 | PP-YOLOv2(640*640) | 221.0 | 93.60% | 512.00 | +| 2 | PP-PicoDet-LCNet2.5x(640*640) | 29.7 | 92.50% |53.20| +| 3 | PP-PicoDet-LCNet2.5x(800*608) | 29.7 | 94.20% |83.10 | +| 4 | PP-PicoDet-LCNet1.0x(800*608) | 9.7 | 93.50% | 41.20| +| 5 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 9.7 | 94.00% |41.20| * 测试条件 * paddle版本:2.3.0 @@ -94,8 +94,8 @@ PP-StructureV2系统流程图如下所示,文档图像首先经过图像矫正 | 模型 | mAP | CPU预测耗时 | |-------------------|-----------|------------| -| layoutparser (Detectron2) | 88.98% | 2.9s | -| PP-StructureV2 (PP-PicoDet) | 
**94%** | 41.2ms | +| layoutparser (Detectron2) | 88.98% | 2.90s | +| PP-StructureV2 (PP-PicoDet) | **94.00%** | 41.20ms | [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)数据集是一个大型的文档图像数据集,包含Text、Title、Tale、Figure、List,共5个类别。数据集中包含335,703张训练集、11,245张验证集和11,405张测试集。训练数据与标注示例图如下所示: @@ -108,7 +108,7 @@ PP-StructureV2系统流程图如下所示,文档图像首先经过图像矫正 **(1)轻量级版面分析模型PP-PicoDet** -`PP-PicoDet`是PaddleDetection中提出的轻量级目标检测模型,通过使用PP-LCNet骨干网络、CSP-PAN特征融合模块、SimOTA标签分配方法等优化策略,最终在CPU与移动端具有卓越的性能。我们将PP-Structurev1中采用的PP-YOLOv2模型替换为`PP-PicoDet`,同时针对版面分析场景优化预测尺度,从针对目标检测设计的`640*640`调整为更适配文档图像的`800*608`,在`1.0x`配置下,模型精度与PP-YOLOv2相当,CPU平均预测速度可提升11倍。 +`PP-PicoDet`是PaddleDetection中提出的轻量级目标检测模型,通过使用PP-LCNet骨干网络、CSP-PAN特征融合模块、SimOTA标签分配方法等优化策略,最终在CPU与移动端具有卓越的性能。我们将PP-StructureV1中采用的PP-YOLOv2模型替换为`PP-PicoDet`,同时针对版面分析场景优化预测尺度,从针对目标检测设计的`640*640`调整为更适配文档图像的`800*608`,在`1.0x`配置下,模型精度与PP-YOLOv2相当,CPU平均预测速度可提升11倍。 **(1)FGD知识蒸馏** @@ -130,10 +130,10 @@ FGD(Focal and Global Knowledge Distillation for Detectors),是一种兼顾 | 实验序号 | 策略 | mAP | |:------:|:------:|:------:| -| 1 | PP-YOLOv2 | 84.7% | -| 2 | PP-PicoDet-LCNet2.5x(800*608) | 87.8% | -| 3 | PP-PicoDet-LCNet1.0x(800*608) | 84.5% | -| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 86.8% | +| 1 | PP-YOLOv2 | 84.70% | +| 2 | PP-PicoDet-LCNet2.5x(800*608) | 87.80% | +| 3 | PP-PicoDet-LCNet1.0x(800*608) | 84.50% | +| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 86.80% | **(2)表格版面分析** @@ -144,10 +144,10 @@ FGD(Focal and Global Knowledge Distillation for Detectors),是一种兼顾 | 实验序号 | 策略 | mAP | |:------:|:------:|:------:| -| 1 | PP-YOLOv2 |91.3% | -| 2 | PP-PicoDet-LCNet2.5x(800*608) | 95.9% | -| 3 | PP-PicoDet-LCNet1.0x(800*608) | 95.2% | -| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 95.7% | +| 1 | PP-YOLOv2 |91.30% | +| 2 | PP-PicoDet-LCNet2.5x(800*608) | 95.90% | +| 3 | PP-PicoDet-LCNet1.0x(800*608) | 95.20% | +| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 95.70% | 表格检测效果示意图如下: @@ -157,7 +157,7 @@ FGD(Focal and Global Knowledge Distillation for Detectors),是一种兼顾 ### 4.2 
表格识别 -基于深度学习的表格识别算法种类丰富,PP-Structurev1中,我们基于文本识别算法RARE研发了端到端表格识别算法TableRec-RARE,模型输出为表格结构的HTML表示,进而可以方便地转化为Excel文件。PP-StructureV2中,我们对模型结构和损失函数等5个方面进行升级,提出了 SLANet (Structure Location Alignment Network) ,模型结构如下图所示: +基于深度学习的表格识别算法种类丰富,PP-StructureV1中,我们基于文本识别算法RARE研发了端到端表格识别算法TableRec-RARE,模型输出为表格结构的HTML表示,进而可以方便地转化为Excel文件。PP-StructureV2中,我们对模型结构和损失函数等5个方面进行升级,提出了 SLANet (Structure Location Alignment Network) ,模型结构如下图所示:
@@ -170,7 +170,7 @@ FGD(Focal and Global Knowledge Distillation for Detectors),是一种兼顾 |TableRec-RARE| 71.73% | 93.88% |779ms |6.8M| |+PP-LCNet| 74.71% |94.37% |778ms| 8.7M| |+CSP-PAN| 75.68%| 94.72% |708ms| 9.3M| -|+SLAHead| 77.7%|94.85%| 766ms| 9.2M| +|+SLAHead| 77.70%|94.85%| 766ms| 9.2M| |+MergeToken| 76.31%| 95.89%|766ms| 9.2M| * 测试环境 @@ -181,7 +181,7 @@ FGD(Focal and Global Knowledge Distillation for Detectors),是一种兼顾 |策略|Acc|TEDS|推理速度(CPU+MKLDNN)|模型大小| |---|---|---|---|---| -|TableMaster|77.9%|96.12%|2144ms|253M| +|TableMaster|77.90%|96.12%|2144ms|253.0M| |TableRec-RARE| 71.73% | 93.88% |779ms |6.8M| |SLANet|76.31%| 95.89%|766ms|9.2M| @@ -218,7 +218,7 @@ PP-StructureV2中,我们参考TableMaster中的token处理方法,将`` 除了上述模型策略的升级外,本次升级还开源了中文表格识别模型。在实际应用场景中,表格图像存在着各种各样的倾斜角度(PubTabNet数据集不存在该问题),因此在中文模型中,我们将单元格坐标回归的点数从2个(左上,右下)增加到4个(左上,右上,右下,左下)。在内部测试集上,模型升级前后指标如下: |模型|acc| |---|---| -|TableRec-RARE|44.3%| +|TableRec-RARE|44.30%| |SLANet|59.35%| 可视化结果如下,左为输入图像,右为识别的html表格 @@ -307,8 +307,8 @@ LayoutLMv2以及LayoutXLM中引入视觉骨干网络,用于提取视觉特征 |-----------------|----------|---------|--------| | LayoutLMv2 | 0.76 | 84.20% | - | | VI-LayoutLMv2 | 0.42 | 82.10% | -2.10% | -| LayoutXLM | 1.4 | 89.50% | - | -| VI-LayouXLM | 1.1 | 90.46% | +0.96% | +| LayoutXLM | 1.40 | 89.50% | - | +| VI-LayouXLM | 1.10 | 90.46% | +0.96% | 同时,基于XFUND数据集,VI-LayoutXLM在RE任务上的精度也进一步提升了`1.06%`。 diff --git a/ppstructure/docs/models_list.md b/ppstructure/docs/models_list.md index afed95600f0858b1423a105c4f5bcd3e092211ab..a5b9549a7fe31541fbf40c3a237bbcdaf8171e10 100644 --- a/ppstructure/docs/models_list.md +++ b/ppstructure/docs/models_list.md @@ -13,11 +13,11 @@ |模型名称|模型简介|推理模型大小|下载地址|dict path| | --- | --- | --- | --- | --- | | picodet_lcnet_x1_0_fgd_layout | 基于PicoDet LCNet_x1_0和FGD蒸馏在PubLayNet 数据集训练的英文版面分析模型,可以划分**文字、标题、表格、图片以及列表**5类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / 
[训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) | -| ppyolov2_r50vd_dcn_365e_publaynet | 基于PP-YOLOv2在PubLayNet数据集上训练的英文版面分析模型 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | 同上 | +| ppyolov2_r50vd_dcn_365e_publaynet | 基于PP-YOLOv2在PubLayNet数据集上训练的英文版面分析模型 | 221.0M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | 同上 | | picodet_lcnet_x1_0_fgd_layout_cdla | CDLA数据集训练的中文版面分析模型,可以划分为**表格、图片、图片标题、表格、表格标题、页眉、脚本、引用、公式**10类区域 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) | | picodet_lcnet_x1_0_fgd_layout_table | 表格数据集训练的版面分析模型,支持中英文文档表格区域的检测 | 9.7M | [推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) | -| ppyolov2_r50vd_dcn_365e_tableBank_word | 基于PP-YOLOv2在TableBank Word 数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | 同上 | -| ppyolov2_r50vd_dcn_365e_tableBank_latex | 基于PP-YOLOv2在TableBank Latex数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221M | 
[推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | 同上 | +| ppyolov2_r50vd_dcn_365e_tableBank_word | 基于PP-YOLOv2在TableBank Word 数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221.0M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | 同上 | +| ppyolov2_r50vd_dcn_365e_tableBank_latex | 基于PP-YOLOv2在TableBank Latex数据集训练的版面分析模型,支持英文文档表格区域的检测 | 221.0M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | 同上 | @@ -54,9 +54,9 @@ |re_VI-LayoutXLM_xfund_zh|基于VI-LayoutXLM在xfund中文数据集上训练的RE模型|1.1G| 83.92% | 15.49 |[推理模型](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar) | |ser_LayoutXLM_xfund_zh|基于LayoutXLM在xfund中文数据集上训练的SER模型|1.4G| 90.38% | 19.49 |[推理模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) | |re_LayoutXLM_xfund_zh|基于LayoutXLM在xfund中文数据集上训练的RE模型|1.4G| 74.83% | 19.49 |[推理模型](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) | -|ser_LayoutLMv2_xfund_zh|基于LayoutLMv2在xfund中文数据集上训练的SER模型|778M| 85.44% | 31.46 |[推理模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) | -|re_LayoutLMv2_xfund_zh|基于LayoutLMv2在xfun中文数据集上训练的RE模型|765M| 67.77% | 31.46 |[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) | -|ser_LayoutLM_xfund_zh|基于LayoutLM在xfund中文数据集上训练的SER模型|430M| 77.31% | - |[推理模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / 
[训练模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) | +|ser_LayoutLMv2_xfund_zh|基于LayoutLMv2在xfund中文数据集上训练的SER模型|778.0M| 85.44% | 31.46 |[推理模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) | +|re_LayoutLMv2_xfund_zh|基于LayoutLMv2在xfun中文数据集上训练的RE模型|765.0M| 67.77% | 31.46 |[推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) | +|ser_LayoutLM_xfund_zh|基于LayoutLM在xfund中文数据集上训练的SER模型|430.0M| 77.31% | - |[推理模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) | * 注:上述预测耗时信息仅包含了inference模型的推理耗时,没有统计预处理与后处理耗时,测试环境为`V100 GPU + CUDA 10.2 + CUDNN 8.1.1 + TRT 7.2.3.4`。 @@ -65,4 +65,4 @@ |模型名称|模型简介|模型大小|精度|下载地址| | --- | --- | --- |--- | --- | -|SDMGR|关键信息提取模型|78M| 86.70% | [推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)| +|SDMGR|关键信息提取模型|78.0M| 86.70% | [推理模型 coming soon]() / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)| diff --git a/ppstructure/docs/models_list_en.md b/ppstructure/docs/models_list_en.md index 291d42f995fdd7fabc293a0e4df35c2249945fd2..5908f45e82c3536f7c513f0f42959f664e8c4e8e 100644 --- a/ppstructure/docs/models_list_en.md +++ b/ppstructure/docs/models_list_en.md @@ -13,11 +13,11 @@ |model name| description | inference model size |download|dict path| | --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- | --- | | picodet_lcnet_x1_0_fgd_layout | The layout analysis English model trained on the PubLayNet dataset based on PicoDet LCNet_x1_0 and FGD . 
the model can recognition 5 types of areas such as **Text, Title, Table, Picture and List** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) |
-| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis English model trained on the PubLayNet dataset based on PP-YOLOv2 | 221M | [inference_moel](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
-| picodet_lcnet_x1_0_fgd_layout_cdla | The layout analysis Chinese model trained on the CDLA dataset, the model can recognition 10 types of areas such as **Table、Figure、Figure caption、Table、Table caption、Header、Footer、Reference、Equation** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) |
-| picodet_lcnet_x1_0_fgd_layout_table | The layout analysis model trained on the table dataset, the model can detect tables in Chinese and English documents | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) |
-| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2, the model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |
-| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2, the model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above |
+| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis English model trained on the PubLayNet dataset based on PP-YOLOv2 | 221.0M | [inference_moel](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
+| picodet_lcnet_x1_0_fgd_layout_cdla | The layout analysis Chinese model trained on the CDLA dataset, the model can recognition 10 types of areas such as **Table、Figure、Figure caption、Table、Table caption、Header、Footer、Reference、Equation** | 9.70M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) |
+| picodet_lcnet_x1_0_fgd_layout_table | The layout analysis model trained on the table dataset, the model can detect tables in Chinese and English documents | 9.70M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) |
+| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2, the model can detect tables in English documents | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |
+| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2, the model can detect tables in English documents | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above |
 ## 2. OCR and Table Recognition
@@ -63,4 +63,4 @@ On wildreceipt dataset, the algorithm result is as follows:
 |Model|Backbone|Config|Hmean|Download link|
 | --- | --- | --- | --- | --- |
-|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
+|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.70%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
diff --git a/ppstructure/table/README.md b/ppstructure/table/README.md
index cebbd1ccafbde0aee7fa9f50398682a86cb1c8dd..bacd9ff5d2d7be384bcfaf26ee37f15e213bbecf 100644
--- a/ppstructure/table/README.md
+++ b/ppstructure/table/README.md
@@ -32,7 +32,7 @@ We evaluated the algorithm on the PubTabNet[1] eval dataset, and the
 |Method|Acc|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|Speed|
 | --- | --- | --- | ---|
-| EDD[2] |x| 88.3 |x|
+| EDD[2] |x| 88.30 |x|
 | TableRec-RARE(ours) | 71.73%| 93.88% |779ms|
 | SLANet(ours) | 76.31%| 95.89%|766ms|
diff --git a/ppstructure/table/README_ch.md b/ppstructure/table/README_ch.md
index 72b7f5cbeb176cd28102c2f4da576f7af3f0c275..b8817523c67821e49fc258d1e71c8eae3f48435a 100644
--- a/ppstructure/table/README_ch.md
+++ b/ppstructure/table/README_ch.md
@@ -38,7 +38,7 @@
 |算法|Acc|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|Speed|
 | --- | --- | --- | ---|
-| EDD[2] |x| 88.3% |x|
+| EDD[2] |x| 88.30% |x|
 | TableRec-RARE(ours) | 71.73%| 93.88% |779ms|
 | SLANet(ours) |76.31%| 95.89%|766ms|