diff --git a/ppstructure/docs/PP-StructureV2_introduction.md b/ppstructure/docs/PP-StructureV2_introduction.md
index efaf35f2b5f8299180a7b1c1c7e4eb887323fe63..555fc4560ec79e157f0518a7afbf1ddbf585aee6 100644
--- a/ppstructure/docs/PP-StructureV2_introduction.md
+++ b/ppstructure/docs/PP-StructureV2_introduction.md
@@ -16,11 +16,11 @@
Real-world scenarios contain large numbers of document images stored in unstructured form, such as pictures. Structured analysis and information extraction of document images are critical to the digital storage of data and to the digital transformation of industries. With this in mind, PaddleOCR developed and released PP-Structure, an intelligent document analysis system designed to help developers better complete document-understanding tasks such as layout analysis, table recognition, and key information extraction.
-Recently, the PaddleOCR team upgraded the layout analysis, table recognition, and key information extraction modules of PP-Structurev1 in a total of 8 aspects, and added whole-image orientation correction, document recovery, and other new capabilities, producing a brand-new, more effective document analysis system: PP-StructureV2.
+Recently, the PaddleOCR team upgraded the layout analysis, table recognition, and key information extraction modules of PP-StructureV1 in a total of 8 aspects, and added whole-image orientation correction, document recovery, and other new capabilities, producing a brand-new, more effective document analysis system: PP-StructureV2.
## 2. Introduction
-PP-StructureV2 further improves on PP-Structurev1, mainly through the following 3 upgrades:
+PP-StructureV2 further improves on PP-StructureV1, mainly through the following 3 upgrades:
* **System function upgrade**: added image rectification and layout recovery modules, achieving full coverage of image-to-Word/PDF conversion and key information extraction!
* **System performance optimization**:
@@ -52,7 +52,7 @@ The PP-StructureV2 pipeline is shown below: a document image first passes through image rectification
* TB-YX: a text-line sorting logic that takes reading order into account
* UDML: a unified deep mutual learning knowledge distillation strategy
-Finally, compared with PP-Structurev1:
+Finally, compared with PP-StructureV1:
- the layout analysis model's parameter count is reduced by 95.6%, inference is 11x faster, and accuracy is improved by 0.4%;
- table recognition inference time is unchanged, model accuracy is improved by 6%, and end-to-end TEDS is improved by 2%;
@@ -74,17 +74,17 @@ The PP-StructureV2 pipeline is shown below: a document image first passes through image rectification
### 4.1 Layout Analysis
-Layout analysis divides a document image into regions and locates the key areas within it, such as text, titles, tables, and figures. PP-Structurev1 used PP-YOLOv2, an efficient detection algorithm open-sourced in PaddleDetection, for the layout analysis task.
+Layout analysis divides a document image into regions and locates the key areas within it, such as text, titles, tables, and figures. PP-StructureV1 used PP-YOLOv2, an efficient detection algorithm open-sourced in PaddleDetection, for the layout analysis task.
In PP-StructureV2, we release a lightweight layout analysis model based on PP-PicoDet, customize the image scale for the layout analysis scenario, and apply the FGD knowledge distillation algorithm to further improve accuracy. The final model completes layout analysis in `41ms` on CPU (model inference only; data preprocessing takes roughly another 50ms). Ablation experiments on the public PubLayNet dataset are as follows:
| No. | Strategy | Model Size (M) | mAP | CPU Inference Time (ms) |
|:------:|:------:|:------:|:------:|:------:|
-| 1 | PP-YOLOv2(640*640) | 221 | 93.6% | 512 |
-| 2 | PP-PicoDet-LCNet2.5x(640*640) | 29.7 | 92.5% |53.2|
-| 3 | PP-PicoDet-LCNet2.5x(800*608) | 29.7 | 94.2% |83.1 |
-| 4 | PP-PicoDet-LCNet1.0x(800*608) | 9.7 | 93.5% | 41.2|
-| 5 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 9.7 | 94% |41.2|
+| 1 | PP-YOLOv2(640*640) | 221.0 | 93.60% | 512.00 |
+| 2 | PP-PicoDet-LCNet2.5x(640*640) | 29.7 | 92.50% |53.20|
+| 3 | PP-PicoDet-LCNet2.5x(800*608) | 29.7 | 94.20% |83.10 |
+| 4 | PP-PicoDet-LCNet1.0x(800*608) | 9.7 | 93.50% | 41.20|
+| 5 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 9.7 | 94.00% |41.20|
* Test conditions
  * paddle version: 2.3.0
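For reference, the released layout model can be exercised through the `paddleocr` Python API; a minimal sketch is shown below (the image path is a placeholder; `PPStructure` is the pipeline class shipped with paddleocr >= 2.6):

```python
import cv2
from paddleocr import PPStructure

# Layout-only configuration: disable table recognition and OCR so that
# only the PP-PicoDet layout model runs.
layout_engine = PPStructure(table=False, ocr=False, show_log=True)

img = cv2.imread("doc_page.png")  # placeholder document image
for region in layout_engine(img):
    # each detected region carries a category label and a bounding box
    print(region["type"], region["bbox"])
```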
@@ -94,8 +94,8 @@ The PP-StructureV2 pipeline is shown below: a document image first passes through image rectification
| Model | mAP | CPU Inference Time |
|-------------------|-----------|------------|
-| layoutparser (Detectron2) | 88.98% | 2.9s |
-| PP-StructureV2 (PP-PicoDet) | **94%** | 41.2ms |
+| layoutparser (Detectron2) | 88.98% | 2.90s |
+| PP-StructureV2 (PP-PicoDet) | **94.00%** | 41.20ms |
The [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) dataset is a large document image dataset covering 5 categories: Text, Title, Table, Figure, and List. It contains 335,703 training images, 11,245 validation images, and 11,405 test images. Training samples with annotations are shown below:
@@ -108,7 +108,7 @@ The PP-StructureV2 pipeline is shown below: a document image first passes through image rectification
**(1) Lightweight layout analysis model PP-PicoDet**
-`PP-PicoDet` is a lightweight object detection model proposed in PaddleDetection. Using optimization strategies such as the PP-LCNet backbone, the CSP-PAN feature fusion module, and the SimOTA label assignment method, it achieves excellent performance on CPU and mobile devices. We replaced the PP-YOLOv2 model used in PP-Structurev1 with `PP-PicoDet`, and optimized the prediction scale for the layout analysis scenario, moving from `640*640`, designed for general object detection, to `800*608`, which better fits document images. Under the `1.0x` configuration, accuracy is comparable to PP-YOLOv2 while average CPU inference is 11x faster.
+`PP-PicoDet` is a lightweight object detection model proposed in PaddleDetection. Using optimization strategies such as the PP-LCNet backbone, the CSP-PAN feature fusion module, and the SimOTA label assignment method, it achieves excellent performance on CPU and mobile devices. We replaced the PP-YOLOv2 model used in PP-StructureV1 with `PP-PicoDet`, and optimized the prediction scale for the layout analysis scenario, moving from `640*640`, designed for general object detection, to `800*608`, which better fits document images. Under the `1.0x` configuration, accuracy is comparable to PP-YOLOv2 while average CPU inference is 11x faster.
**(2) FGD knowledge distillation**
@@ -130,10 +130,10 @@ FGD (Focal and Global Knowledge Distillation for Detectors) is a model distillation method that balances local and global feature information
| No. | Strategy | mAP |
|:------:|:------:|:------:|
-| 1 | PP-YOLOv2 | 84.7% |
-| 2 | PP-PicoDet-LCNet2.5x(800*608) | 87.8% |
-| 3 | PP-PicoDet-LCNet1.0x(800*608) | 84.5% |
-| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 86.8% |
+| 1 | PP-YOLOv2 | 84.70% |
+| 2 | PP-PicoDet-LCNet2.5x(800*608) | 87.80% |
+| 3 | PP-PicoDet-LCNet1.0x(800*608) | 84.50% |
+| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 86.80% |
**(2) Table layout analysis**
@@ -144,10 +144,10 @@ FGD (Focal and Global Knowledge Distillation for Detectors) is a model distillation method that balances local and global feature information
| No. | Strategy | mAP |
|:------:|:------:|:------:|
-| 1 | PP-YOLOv2 |91.3% |
-| 2 | PP-PicoDet-LCNet2.5x(800*608) | 95.9% |
-| 3 | PP-PicoDet-LCNet1.0x(800*608) | 95.2% |
-| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 95.7% |
+| 1 | PP-YOLOv2 |91.30% |
+| 2 | PP-PicoDet-LCNet2.5x(800*608) | 95.90% |
+| 3 | PP-PicoDet-LCNet1.0x(800*608) | 95.20% |
+| 4 | PP-PicoDet-LCNet1.0x(800*608) + FGD | 95.70% |
Table detection results are visualized below:
@@ -157,7 +157,7 @@ FGD (Focal and Global Knowledge Distillation for Detectors) is a model distillation method that balances local and global feature information
### 4.2 Table Recognition
-Deep-learning-based table recognition algorithms come in many varieties. In PP-Structurev1, we developed the end-to-end table recognition algorithm TableRec-RARE on top of the text recognition algorithm RARE; the model outputs the table structure as HTML, which can then be conveniently converted into an Excel file. In PP-StructureV2, we upgraded 5 aspects of the model, including its structure and loss function, and proposed SLANet (Structure Location Alignment Network), whose architecture is shown below:
+Deep-learning-based table recognition algorithms come in many varieties. In PP-StructureV1, we developed the end-to-end table recognition algorithm TableRec-RARE on top of the text recognition algorithm RARE; the model outputs the table structure as HTML, which can then be conveniently converted into an Excel file. In PP-StructureV2, we upgraded 5 aspects of the model, including its structure and loss function, and proposed SLANet (Structure Location Alignment Network), whose architecture is shown below:

@@ -170,7 +170,7 @@ FGD (Focal and Global Knowledge Distillation for Detectors) is a model distillation method that balances local and global feature information
|TableRec-RARE| 71.73% | 93.88% |779ms |6.8M|
|+PP-LCNet| 74.71% |94.37% |778ms| 8.7M|
|+CSP-PAN| 75.68%| 94.72% |708ms| 9.3M|
-|+SLAHead| 77.7%|94.85%| 766ms| 9.2M|
+|+SLAHead| 77.70%|94.85%| 766ms| 9.2M|
|+MergeToken| 76.31%| 95.89%|766ms| 9.2M|
* Test environment
@@ -181,7 +181,7 @@ FGD (Focal and Global Knowledge Distillation for Detectors) is a model distillation method that balances local and global feature information
| Strategy | Acc | TEDS | Inference Speed (CPU+MKLDNN) | Model Size |
|---|---|---|---|---|
-|TableMaster|77.9%|96.12%|2144ms|253M|
+|TableMaster|77.90%|96.12%|2144ms|253.0M|
|TableRec-RARE| 71.73% | 93.88% |779ms |6.8M|
|SLANet|76.31%| 95.89%|766ms|9.2M|
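Since TableRec-RARE and SLANet both emit the table structure as HTML, a prediction can be exported to Excel with standard tooling; a rough sketch with pandas (which needs `lxml` or `html5lib` for HTML parsing and `openpyxl` for Excel export; the HTML string is a made-up stand-in for a real prediction):

```python
import pandas as pd

# stand-in for the HTML structure string predicted by TableRec-RARE / SLANet
html = "<table><tr><td>Model</td><td>TEDS</td></tr><tr><td>SLANet</td><td>95.89%</td></tr></table>"

df = pd.read_html(html)[0]              # parse the first <table> into a DataFrame
df.to_excel("table.xlsx", index=False)  # export the recovered table to Excel
```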
@@ -218,7 +218,7 @@ In PP-StructureV2, we follow the token handling method in TableMaster and merge the `<td>` and `</td>` tokens into one
Beyond the model strategy upgrades above, this release also open-sources a Chinese table recognition model. In real application scenarios, table images appear at all kinds of tilt angles (a problem the PubTabNet dataset does not exhibit), so in the Chinese model we increased the number of regressed cell-coordinate points from 2 (top-left, bottom-right) to 4 (top-left, top-right, bottom-right, bottom-left); a conversion sketch follows this hunk. On an internal test set, metrics before and after the upgrade are as follows:
|Model|Acc|
|---|---|
-|TableRec-RARE|44.3%|
+|TableRec-RARE|44.30%|
|SLANet|59.35%|
Visualization results are shown below: the input image is on the left, the recognized HTML table on the right.
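To illustrate the 2-point to 4-point change mentioned above: for an axis-aligned cell the two encodings carry the same information, but only the 4-point form lets the corners move independently on tilted tables. A minimal conversion sketch (the function name is illustrative):

```python
import numpy as np

def box2_to_box4(box2):
    """Expand a (top-left, bottom-right) cell box into the
    (top-left, top-right, bottom-right, bottom-left) form regressed by
    the upgraded Chinese model."""
    (x1, y1), (x2, y2) = box2
    return np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)

print(box2_to_box4([(10, 20), (110, 60)]))  # four corners of the same cell
```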
@@ -307,8 +307,8 @@ LayoutLMv2 and LayoutXLM introduce a visual backbone network to extract visual features
|-----------------|----------|---------|--------|
| LayoutLMv2 | 0.76 | 84.20% | - |
| VI-LayoutLMv2 | 0.42 | 82.10% | -2.10% |
-| LayoutXLM | 1.4 | 89.50% | - |
-| VI-LayouXLM | 1.1 | 90.46% | +0.96% |
+| LayoutXLM | 1.40 | 89.50% | - |
+| VI-LayoutXLM | 1.10 | 90.46% | +0.96% |
Meanwhile, on the XFUND dataset, VI-LayoutXLM further improves accuracy on the RE task by `1.06%`.
diff --git a/ppstructure/docs/models_list.md b/ppstructure/docs/models_list.md
index afed95600f0858b1423a105c4f5bcd3e092211ab..a5b9549a7fe31541fbf40c3a237bbcdaf8171e10 100644
--- a/ppstructure/docs/models_list.md
+++ b/ppstructure/docs/models_list.md
@@ -13,11 +13,11 @@
|model name|description|inference model size|download|dict path|
| --- | --- | --- | --- | --- |
| picodet_lcnet_x1_0_fgd_layout | English layout analysis model trained on the PubLayNet dataset based on PicoDet LCNet_x1_0 and FGD distillation; it can detect 5 types of regions: **Text, Title, Table, Figure and List** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) |
-| ppyolov2_r50vd_dcn_365e_publaynet | English layout analysis model trained on the PubLayNet dataset based on PP-YOLOv2 | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
+| ppyolov2_r50vd_dcn_365e_publaynet | English layout analysis model trained on the PubLayNet dataset based on PP-YOLOv2 | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
| picodet_lcnet_x1_0_fgd_layout_cdla | Chinese layout analysis model trained on the CDLA dataset; it can detect 10 types of regions: **Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) |
| picodet_lcnet_x1_0_fgd_layout_table | Layout analysis model trained on a table dataset; supports detecting table regions in Chinese and English documents | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) |
-| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2; supports detecting table regions in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |
-| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2; supports detecting table regions in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above |
+| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2; supports detecting table regions in English documents | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |
+| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2; supports detecting table regions in English documents | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above |
@@ -54,9 +54,9 @@
|re_VI-LayoutXLM_xfund_zh|RE model trained on the XFUND Chinese dataset based on VI-LayoutXLM|1.1G| 83.92% | 15.49 |[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar) |
|ser_LayoutXLM_xfund_zh|SER model trained on the XFUND Chinese dataset based on LayoutXLM|1.4G| 90.38% | 19.49 |[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
|re_LayoutXLM_xfund_zh|RE model trained on the XFUND Chinese dataset based on LayoutXLM|1.4G| 74.83% | 19.49 |[inference model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
-|ser_LayoutLMv2_xfund_zh|SER model trained on the XFUND Chinese dataset based on LayoutLMv2|778M| 85.44% | 31.46 |[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
-|re_LayoutLMv2_xfund_zh|RE model trained on the XFUND Chinese dataset based on LayoutLMv2|765M| 67.77% | 31.46 |[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
-|ser_LayoutLM_xfund_zh|SER model trained on the XFUND Chinese dataset based on LayoutLM|430M| 77.31% | - |[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
+|ser_LayoutLMv2_xfund_zh|SER model trained on the XFUND Chinese dataset based on LayoutLMv2|778.0M| 85.44% | 31.46 |[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
+|re_LayoutLMv2_xfund_zh|RE model trained on the XFUND Chinese dataset based on LayoutLMv2|765.0M| 67.77% | 31.46 |[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
+|ser_LayoutLM_xfund_zh|SER model trained on the XFUND Chinese dataset based on LayoutLM|430.0M| 77.31% | - |[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
* Note: the inference times above include only model inference, excluding pre- and post-processing; the test environment is `V100 GPU + CUDA 10.2 + CUDNN 8.1.1 + TRT 7.2.3.4`.
@@ -65,4 +65,4 @@
|model name|description|model size|accuracy|download|
| --- | --- | --- |--- | --- |
-|SDMGR|Key information extraction model|78M| 86.70% | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
+|SDMGR|Key information extraction model|78.0M| 86.70% | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
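Each download link in these tables points to a tar archive. For scripted environments, a sketch of fetching and unpacking one model with the repo's own `download_with_progressbar` helper (the same helper `ppstructure/pdf2word/pdf2word.py` imports; run from the repo root so `ppocr` is importable, and the target directory name is an assumption):

```python
import os
import tarfile

from ppocr.utils.network import download_with_progressbar  # PaddleOCR helper

url = ("https://paddleocr.bj.bcebos.com/ppstructure/models/layout/"
       "picodet_lcnet_x1_0_fgd_layout_infer.tar")
model_dir = "inference"  # assumed target directory
os.makedirs(model_dir, exist_ok=True)
tar_path = os.path.join(model_dir, os.path.basename(url))

download_with_progressbar(url, tar_path)  # streams the archive with a progress bar
with tarfile.open(tar_path, "r") as tar:
    tar.extractall(model_dir)  # unpacks inference.pdmodel / inference.pdiparams
```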
diff --git a/ppstructure/docs/models_list_en.md b/ppstructure/docs/models_list_en.md
index 291d42f995fdd7fabc293a0e4df35c2249945fd2..889ad09708cebff7d6b14fdd7fb58d9e95a09dcc 100644
--- a/ppstructure/docs/models_list_en.md
+++ b/ppstructure/docs/models_list_en.md
@@ -13,11 +13,11 @@
|model name| description | inference model size |download|dict path|
| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- | --- |
| picodet_lcnet_x1_0_fgd_layout | The layout analysis English model trained on the PubLayNet dataset based on PicoDet LCNet_x1_0 and FGD. The model can recognize 5 types of areas: **Text, Title, Table, Picture and List** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) |
-| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis English model trained on the PubLayNet dataset based on PP-YOLOv2 | 221M | [inference_moel](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
+| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis English model trained on the PubLayNet dataset based on PP-YOLOv2 | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
| picodet_lcnet_x1_0_fgd_layout_cdla | The layout analysis Chinese model trained on the CDLA dataset. The model can recognize 10 types of areas: **Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) |
| picodet_lcnet_x1_0_fgd_layout_table | The layout analysis model trained on the table dataset; the model can detect tables in Chinese and English documents | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) |
-| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2, the model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |
-| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2, the model can detect tables in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above |
+| ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2; the model can detect tables in English documents | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |
+| ppyolov2_r50vd_dcn_365e_tableBank_latex | The layout analysis model trained on the TableBank Latex dataset based on PP-YOLOv2; the model can detect tables in English documents | 221.0M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | same as above |
## 2. OCR and Table Recognition
@@ -63,4 +63,4 @@ On wildreceipt dataset, the algorithm result is as follows:
|Model|Backbone|Config|Hmean|Download link|
| --- | --- | --- | --- | --- |
-|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
+|SDMGR|VGG16|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.70%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
diff --git a/ppstructure/docs/quickstart.md b/ppstructure/docs/quickstart.md
index 6fbd31c3c19b9d5bb8d6045efaac76628c18a3d9..9909f7950c9810bd99ee0b0504ccdf3d0a9b06ff 100644
--- a/ppstructure/docs/quickstart.md
+++ b/ppstructure/docs/quickstart.md
@@ -45,16 +45,10 @@
```bash
# Install paddleocr; version 2.6 or later is recommended
-pip3 install "paddleocr>=2.6"
+pip3 install "paddleocr>=2.6.0.3"
# Install paddleclas for image orientation classification (skip if you do not need this feature)
pip3 install paddleclas>=2.4.3
-
-# Install the key information extraction (KIE) dependencies (skip if you do not need KIE)
-pip3 install -r ppstructure/kie/requirements.txt
-
-# Install the layout recovery dependencies (skip if you do not need layout recovery)
-pip3 install -r ppstructure/recovery/requirements.txt
```
diff --git a/ppstructure/docs/quickstart_en.md b/ppstructure/docs/quickstart_en.md
index 446f9d2ee387a169cbfeb067de9d1a0aa0ff7584..c990088a28a6849d53f05a42fc7e0d14adc1ca51 100644
--- a/ppstructure/docs/quickstart_en.md
+++ b/ppstructure/docs/quickstart_en.md
@@ -47,16 +47,10 @@ For more software version requirements, please refer to the instructions in [Ins
```bash
# Install paddleocr, version 2.6 is recommended
-pip3 install "paddleocr>=2.6"
+pip3 install "paddleocr>=2.6.0.3"
# Install the image direction classification dependency package paddleclas (if you do not use the image direction classification, you can skip it)
pip3 install paddleclas>=2.4.3
-
-# Install the KIE dependency packages (if you do not use the KIE, you can skip it)
-pip3 install -r kie/requirements.txt
-
-# Install the layout recovery dependency packages (if you do not use the layout recovery, you can skip it)
-pip3 install -r recovery/requirements.txt
```
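After installation, a quick end-to-end smoke test through the Python API might look like the sketch below (the sample image path follows the repo layout and is otherwise a placeholder):

```python
import cv2
from paddleocr import PPStructure, save_structure_res

table_engine = PPStructure(show_log=True)  # full pipeline: layout + table + OCR

img = cv2.imread("ppstructure/docs/table/1.png")  # sample image from the repo
result = table_engine(img)
save_structure_res(result, "./output", "demo")  # region crops + an .xlsx per table

for region in result:
    region.pop("img", None)  # drop raw pixels for readable printing
    print(region["type"], region["bbox"])
```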
diff --git a/ppstructure/pdf2word/pdf2word.py b/ppstructure/pdf2word/pdf2word.py
index 735fa5350a8f4f3bdc4ac62f3772083705ea3589..5c8f8f2bd3ec2035934b9226d8bbccfa82d55f0f 100644
--- a/ppstructure/pdf2word/pdf2word.py
+++ b/ppstructure/pdf2word/pdf2word.py
@@ -1,9 +1,23 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
import sys
import tarfile
import os
import time
import datetime
-import functools
+import functools
import cv2
import platform
import numpy as np
@@ -20,7 +34,6 @@ root = os.path.abspath(os.path.join(file, '../../'))
sys.path.append(file)
sys.path.insert(0, root)
-
from ppstructure.predict_system import StructureSystem, save_structure_res
from ppstructure.utility import parse_args, draw_structure_result
from ppocr.utils.network import download_with_progressbar
@@ -32,13 +45,17 @@ __VERSION__ = "0.2.2"
URLs_EN = {
    # Download and unpack the detection model of the ultra-lightweight English PP-OCRv3
- "en_PP-OCRv3_det_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar",
+ "en_PP-OCRv3_det_infer":
+ "https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar",
    # Download and unpack the recognition model of the lightweight English PP-OCRv3
- "en_PP-OCRv3_rec_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar",
+ "en_PP-OCRv3_rec_infer":
+ "https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar",
    # Download and unpack the ultra-lightweight English table recognition model
- "en_ppstructure_mobile_v2.0_SLANet_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar",
+ "en_ppstructure_mobile_v2.0_SLANet_infer":
+ "https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar",
    # English layout analysis model
- "picodet_lcnet_x1_0_fgd_layout_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar",
+ "picodet_lcnet_x1_0_fgd_layout_infer":
+ "https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar",
}
DICT_EN = {
"rec_char_dict_path": "en_dict.txt",
@@ -47,21 +64,24 @@ DICT_EN = {
URLs_CN = {
    # Download and unpack the detection model of the ultra-lightweight Chinese PP-OCRv3
- "cn_PP-OCRv3_det_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar",
+ "cn_PP-OCRv3_det_infer":
+ "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar",
    # Download and unpack the recognition model of the lightweight Chinese PP-OCRv3
- "cn_PP-OCRv3_rec_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar",
+ "cn_PP-OCRv3_rec_infer":
+ "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar",
    # Download and unpack the ultra-lightweight English table recognition model
- "cn_ppstructure_mobile_v2.0_SLANet_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar",
+ "cn_ppstructure_mobile_v2.0_SLANet_infer":
+ "https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar",
    # Chinese layout analysis model
- "picodet_lcnet_x1_0_fgd_layout_cdla_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar",
+ "picodet_lcnet_x1_0_fgd_layout_cdla_infer":
+ "https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar",
}
DICT_CN = {
- "rec_char_dict_path": "ppocr_keys_v1.txt",
+ "rec_char_dict_path": "ppocr_keys_v1.txt",
"layout_dict_path": "layout_cdla_dict.txt",
}
-
def QImageToCvMat(incomingImage) -> np.array:
'''
Converts a QImage into an opencv MAT format
@@ -98,7 +118,7 @@ def readImage(image_file) -> list:
img = cv2.imread(image_file, cv2.IMREAD_COLOR)
if img is not None:
imgs = [img]
-
+
return imgs
@@ -106,7 +126,7 @@ class Worker(QThread):
progressBarValue = Signal(int)
progressBarRange = Signal(int)
endsignal = Signal()
-    exceptedsignal = Signal(str) # emits an exception message
+    exceptedsignal = Signal(str)  # emits an exception message
loopFlag = True
def __init__(self, predictors, save_pdf, vis_font_path, use_pdf2docx_api):
@@ -120,7 +140,7 @@ class Worker(QThread):
self.outputDir = None
self.totalPageCnt = 0
self.pageCnt = 0
- self.setStackSize(1024*1024)
+ self.setStackSize(1024 * 1024)
def setImagePath(self, imagePaths):
self.imagePaths = imagePaths
@@ -130,7 +150,7 @@ class Worker(QThread):
def setOutputDir(self, outputDir):
self.outputDir = outputDir
-
+
def setPDFParser(self, enabled):
self.use_pdf2docx_api = enabled
@@ -167,10 +187,10 @@ class Worker(QThread):
try:
convert_info_docx(imgs, all_res, self.outputDir, img_name)
except Exception as ex:
- print("error in layout recovery image:{}, err msg: {}".
- format(img_name, ex))
+ print("error in layout recovery image:{}, err msg: {}".format(
+ img_name, ex))
print("Predict time : {:.3f}s".format(time_dict['all']))
- print('result save to {}'.format(self.outputDir))
+ print('result save to {}'.format(self.outputDir))
def run(self):
self.resetPageCnt()
@@ -185,10 +205,11 @@ class Worker(QThread):
and os.path.basename(image_file)[-3:] == 'pdf':
self.totalPageCnt += 1
self.progressBarRange.emit(self.totalPageCnt)
- print('===============using use_pdf2docx_api===============')
+ print(
+ '===============using use_pdf2docx_api===============')
img_name = os.path.basename(image_file).split('.')[0]
- docx_file = os.path.join(
- self.outputDir, '{}.docx'.format(img_name))
+ docx_file = os.path.join(self.outputDir,
+ '{}.docx'.format(img_name))
cv = Converter(image_file)
cv.convert(docx_file)
cv.close()
@@ -201,13 +222,14 @@ class Worker(QThread):
if len(imgs) == 0:
continue
img_name = os.path.basename(image_file).split('.')[0]
- os.makedirs(os.path.join(self.outputDir, img_name), exist_ok=True)
+ os.makedirs(
+ os.path.join(self.outputDir, img_name), exist_ok=True)
self.ppocrPrecitor(imgs, img_name)
# file processed
self.endsignal.emit()
# self.exec()
except Exception as e:
-            self.exceptedsignal.emit(str(e)) # send the exception to the UI process
+            self.exceptedsignal.emit(str(e))  # send the exception to the UI process
class APP_Image2Doc(QWidget):
@@ -222,8 +244,7 @@ class APP_Image2Doc(QWidget):
self.screenShot = None
self.save_pdf = False
self.output_dir = None
- self.vis_font_path = os.path.join(root,
- "doc", "fonts", "simfang.ttf")
+ self.vis_font_path = os.path.join(root, "doc", "fonts", "simfang.ttf")
self.use_pdf2docx_api = False
# ProgressBar
@@ -239,14 +260,16 @@ class APP_Image2Doc(QWidget):
self.downloadModels(URLs_CN)
# 初始化模型
- predictors = {
+ predictors = {
'EN': self.initPredictor('EN'),
'CN': self.initPredictor('CN'),
}
# 设置工作进程
- self._thread = Worker(predictors, self.save_pdf, self.vis_font_path, self.use_pdf2docx_api)
- self._thread.progressBarValue.connect(self.handleProgressBarUpdateSingal)
+ self._thread = Worker(predictors, self.save_pdf, self.vis_font_path,
+ self.use_pdf2docx_api)
+ self._thread.progressBarValue.connect(
+ self.handleProgressBarUpdateSingal)
self._thread.endsignal.connect(self.handleEndsignalSignal)
# self._thread.finished.connect(QObject.deleteLater)
self._thread.progressBarRange.connect(self.handleProgressBarRangeSingal)
@@ -285,7 +308,7 @@ class APP_Image2Doc(QWidget):
layout.addWidget(self.PDFParserButton, 0, 3, 1, 1)
self.PDFParserButton.clicked.connect(
functools.partial(self.handleStartSignal, 'CN', True))
-
+
self.showResultButton = QPushButton("显示结果")
self.showResultButton.setIcon(QIcon(QPixmap("./icons/folder-open.png")))
layout.addWidget(self.showResultButton, 0, 4, 1, 1)
@@ -294,8 +317,7 @@ class APP_Image2Doc(QWidget):
# ProgressBar
layout.addWidget(self.pb, 2, 0, 1, 5)
# time estimate label
- self.timeEstLabel = QLabel(
- ("Time Left: --"))
+        self.timeEstLabel = QLabel("Time Left: --")
layout.addWidget(self.timeEstLabel, 3, 0, 1, 5)
self.setLayout(layout)
@@ -303,11 +325,8 @@ class APP_Image2Doc(QWidget):
def downloadModels(self, URLs):
# using custom model
tar_file_name_list = [
- 'inference.pdiparams',
- 'inference.pdiparams.info',
- 'inference.pdmodel',
- 'model.pdiparams',
- 'model.pdiparams.info',
+ 'inference.pdiparams', 'inference.pdiparams.info',
+ 'inference.pdmodel', 'model.pdiparams', 'model.pdiparams.info',
'model.pdmodel'
]
model_path = os.path.join(root, 'inference')
@@ -325,9 +344,10 @@ class APP_Image2Doc(QWidget):
try:
download_with_progressbar(url, tarpath)
except Exception as e:
- print("Error occurred when downloading file, error message:")
+ print(
+ "Error occurred when downloading file, error message:")
print(e)
-
+
# unzip model tar
try:
with tarfile.open(tarpath, 'r') as tarObj:
@@ -341,13 +361,12 @@ class APP_Image2Doc(QWidget):
if filename is None:
continue
file = tarObj.extractfile(member)
- with open(
- os.path.join(storage_dir, filename),
- 'wb') as f:
+ with open(os.path.join(storage_dir, filename),
+ 'wb') as f:
f.write(file.read())
except Exception as e:
- print("Error occurred when unziping file, error message:")
- print(e)
+            print("Error occurred when unzipping file, error message:")
+ print(e)
def initPredictor(self, lang='EN'):
# init predictor args
@@ -356,50 +375,53 @@ class APP_Image2Doc(QWidget):
args.ocr = True
args.recovery = True
args.save_pdf = self.save_pdf
- args.table_char_dict_path = os.path.join(root,
- "ppocr", "utils", "dict", "table_structure_dict.txt")
+ args.table_char_dict_path = os.path.join(root, "ppocr", "utils", "dict",
+ "table_structure_dict.txt")
if lang == 'EN':
-            args.det_model_dir = os.path.join(root, # models are stored under this directory
- "inference", "en_PP-OCRv3_det_infer")
- args.rec_model_dir = os.path.join(root,
- "inference", "en_PP-OCRv3_rec_infer")
- args.table_model_dir = os.path.join(root,
- "inference", "en_ppstructure_mobile_v2.0_SLANet_infer")
-            args.output = os.path.join(root, "output") # result save path
- args.layout_model_dir = os.path.join(root,
- "inference", "picodet_lcnet_x1_0_fgd_layout_infer")
+ args.det_model_dir = os.path.join(
+                root,  # models are stored under this directory
+ "inference",
+ "en_PP-OCRv3_det_infer")
+ args.rec_model_dir = os.path.join(root, "inference",
+ "en_PP-OCRv3_rec_infer")
+ args.table_model_dir = os.path.join(
+ root, "inference", "en_ppstructure_mobile_v2.0_SLANet_infer")
+            args.output = os.path.join(root, "output")  # result save path
+ args.layout_model_dir = os.path.join(
+ root, "inference", "picodet_lcnet_x1_0_fgd_layout_infer")
lang_dict = DICT_EN
elif lang == 'CN':
-            args.det_model_dir = os.path.join(root, # models are stored under this directory
- "inference", "cn_PP-OCRv3_det_infer")
- args.rec_model_dir = os.path.join(root,
- "inference", "cn_PP-OCRv3_rec_infer")
- args.table_model_dir = os.path.join(root,
- "inference", "cn_ppstructure_mobile_v2.0_SLANet_infer")
-            args.output = os.path.join(root, "output") # result save path
- args.layout_model_dir = os.path.join(root,
- "inference", "picodet_lcnet_x1_0_fgd_layout_cdla_infer")
+ args.det_model_dir = os.path.join(
+                root,  # models are stored under this directory
+ "inference",
+ "cn_PP-OCRv3_det_infer")
+ args.rec_model_dir = os.path.join(root, "inference",
+ "cn_PP-OCRv3_rec_infer")
+ args.table_model_dir = os.path.join(
+ root, "inference", "cn_ppstructure_mobile_v2.0_SLANet_infer")
+            args.output = os.path.join(root, "output")  # result save path
+ args.layout_model_dir = os.path.join(
+ root, "inference", "picodet_lcnet_x1_0_fgd_layout_cdla_infer")
lang_dict = DICT_CN
else:
raise ValueError("Unsupported language")
- args.rec_char_dict_path = os.path.join(root,
- "ppocr", "utils",
- lang_dict['rec_char_dict_path'])
- args.layout_dict_path = os.path.join(root,
- "ppocr", "utils", "dict", "layout_dict",
- lang_dict['layout_dict_path'])
+ args.rec_char_dict_path = os.path.join(root, "ppocr", "utils",
+ lang_dict['rec_char_dict_path'])
+ args.layout_dict_path = os.path.join(root, "ppocr", "utils", "dict",
+ "layout_dict",
+ lang_dict['layout_dict_path'])
# init predictor
return StructureSystem(args)
-
+
def handleOpenFileSignal(self):
'''
        Multiple image files can be selected
'''
- selectedFiles = QFileDialog.getOpenFileNames(self,
- "多文件选择", "/", "图片文件 (*.png *.jpeg *.jpg *.bmp *.pdf)")[0]
+ selectedFiles = QFileDialog.getOpenFileNames(
+ self, "多文件选择", "/", "图片文件 (*.png *.jpeg *.jpg *.bmp *.pdf)")[0]
if len(selectedFiles) > 0:
self.imagePaths = selectedFiles
- self.screenShot = None # discard screenshot temp image
+ self.screenShot = None # discard screenshot temp image
self.pb.setValue(0)
# def screenShotSlot(self):
@@ -415,18 +437,19 @@ class APP_Image2Doc(QWidget):
# self.pb.setValue(0)
def handleStartSignal(self, lang='EN', pdfParser=False):
- if self.screenShot: # for screenShot
- img_name = 'screenshot_' + time.strftime("%Y%m%d%H%M%S", time.localtime())
+ if self.screenShot: # for screenShot
+ img_name = 'screenshot_' + time.strftime("%Y%m%d%H%M%S",
+ time.localtime())
image = QImageToCvMat(self.screenShot)
self.predictAndSave(image, img_name, lang)
# update Progress Bar
self.pb.setValue(1)
- QMessageBox.information(self,
- u'Information', "文档提取完成")
- elif len(self.imagePaths) > 0 : # for image file selection
+ QMessageBox.information(self, u'Information', "文档提取完成")
+ elif len(self.imagePaths) > 0: # for image file selection
# Must set image path list and language before start
self.output_dir = os.path.join(
- os.path.dirname(self.imagePaths[0]), "output") # output_dir shold be same as imagepath
+ os.path.dirname(self.imagePaths[0]),
+                "output")  # output_dir should be the same as the image path
self._thread.setOutputDir(self.output_dir)
self._thread.setImagePath(self.imagePaths)
self._thread.setLang(lang)
@@ -438,12 +461,10 @@ class APP_Image2Doc(QWidget):
self.PDFParserButton.setEnabled(False)
# 启动工作进程
self._thread.start()
- self.time_start = time.time() # log start time
- QMessageBox.information(self,
- u'Information', "开始转换")
+ self.time_start = time.time() # log start time
+ QMessageBox.information(self, u'Information', "开始转换")
else:
- QMessageBox.warning(self,
- u'Information', "请选择要识别的文件或截图")
+ QMessageBox.warning(self, u'Information', "请选择要识别的文件或截图")
def handleShowResultSignal(self):
if self.output_dir is None:
@@ -454,15 +475,16 @@ class APP_Image2Doc(QWidget):
else:
os.system('open ' + os.path.normpath(self.output_dir))
else:
- QMessageBox.information(self,
- u'Information', "输出文件不存在")
+ QMessageBox.information(self, u'Information', "输出文件不存在")
def handleProgressBarUpdateSingal(self, i):
self.pb.setValue(i)
# calculate time left of recognition
lenbar = self.pb.maximum()
- avg_time = (time.time() - self.time_start) / i # Use average time to prevent time fluctuations
- time_left = str(datetime.timedelta(seconds=avg_time * (lenbar - i))).split(".")[0] # Remove microseconds
+ avg_time = (time.time() - self.time_start
+ ) / i # Use average time to prevent time fluctuations
+ time_left = str(datetime.timedelta(seconds=avg_time * (
+ lenbar - i))).split(".")[0] # Remove microseconds
self.timeEstLabel.setText(f"Time Left: {time_left}") # show time left
def handleProgressBarRangeSingal(self, max):
diff --git a/ppstructure/recovery/requirements.txt b/ppstructure/recovery/requirements.txt
index 4e4239a14af9b6f95aca1171f25d50da5eac37cf..ec08f9d0a28b54e3e082db4d32799f8384250c1d 100644
--- a/ppstructure/recovery/requirements.txt
+++ b/ppstructure/recovery/requirements.txt
@@ -2,4 +2,5 @@ python-docx
PyMuPDF==1.19.0
beautifulsoup4
fonttools>=4.24.0
-fire>=0.3.0
\ No newline at end of file
+fire>=0.3.0
+pdf2docx
\ No newline at end of file
diff --git a/ppstructure/table/README.md b/ppstructure/table/README.md
index cebbd1ccafbde0aee7fa9f50398682a86cb1c8dd..17f0488791512f56b766951b67fff4e480dbfdda 100644
--- a/ppstructure/table/README.md
+++ b/ppstructure/table/README.md
@@ -32,7 +32,7 @@ We evaluated the algorithm on the PubTabNet[1] eval dataset, and the
|Method|Acc|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|Speed|
| --- | --- | --- | ---|
-| EDD[2] |x| 88.3 |x|
+| EDD[2] |x| 88.30% |x|
| TableRec-RARE(ours) | 71.73%| 93.88% |779ms|
| SLANet(ours) | 76.31%| 95.89%|766ms|
diff --git a/ppstructure/table/README_ch.md b/ppstructure/table/README_ch.md
index 72b7f5cbeb176cd28102c2f4da576f7af3f0c275..b8817523c67821e49fc258d1e71c8eae3f48435a 100644
--- a/ppstructure/table/README_ch.md
+++ b/ppstructure/table/README_ch.md
@@ -38,7 +38,7 @@
|Method|Acc|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|Speed|
| --- | --- | --- | ---|
-| EDD[2] |x| 88.3% |x|
+| EDD[2] |x| 88.30% |x|
| TableRec-RARE(ours) | 71.73%| 93.88% |779ms|
| SLANet(ours) |76.31%| 95.89%|766ms|
diff --git a/ppstructure/table/predict_table.py b/ppstructure/table/predict_table.py
index 8f9c7174904ab3818f62544aeadc97c410070b07..354baf6ddf5e73b2e933a9b9e8a568bda80340e5 100644
--- a/ppstructure/table/predict_table.py
+++ b/ppstructure/table/predict_table.py
@@ -60,12 +60,16 @@ class TableSystem(object):
self.args = args
if not args.show_log:
logger.setLevel(logging.INFO)
- args.benchmark = False
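+        # stash the caller's benchmark setting and disable profiling while the
+        # text detector and recognizer are built; it is restored below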
+ benchmark_tmp = False
+ if args.benchmark:
+ benchmark_tmp = args.benchmark
+ args.benchmark = False
self.text_detector = predict_det.TextDetector(copy.deepcopy(
args)) if text_detector is None else text_detector
self.text_recognizer = predict_rec.TextRecognizer(copy.deepcopy(
args)) if text_recognizer is None else text_recognizer
- args.benchmark = True
+ if benchmark_tmp:
+ args.benchmark = True
self.table_structurer = predict_strture.TableStructurer(args)
if args.table_algorithm in ['TableMaster']:
self.match = TableMasterMatcher()
diff --git a/setup.py b/setup.py
index 7d4d871d89defcf832910c60f18b094f10ba11db..3aa0a1701c23d4d122495a4f8fd11b76e714114e 100644
--- a/setup.py
+++ b/setup.py
@@ -16,9 +16,16 @@ from setuptools import setup
from io import open
from paddleocr import VERSION
-with open('requirements.txt', encoding="utf-8-sig") as f:
- requirements = f.readlines()
- requirements.append('tqdm')
+def load_requirements(file_list=None):
+ if file_list is None:
+ file_list = ['requirements.txt']
+    if isinstance(file_list, str):
+ file_list = [file_list]
+ requirements = []
+    for req_file in file_list:
+        with open(req_file, encoding="utf-8-sig") as f:
+ requirements.extend(f.readlines())
+ return requirements
def readme():
@@ -34,7 +41,7 @@ setup(
include_package_data=True,
entry_points={"console_scripts": ["paddleocr= paddleocr.paddleocr:main"]},
version=VERSION,
- install_requires=requirements,
+ install_requires=load_requirements(['requirements.txt', 'ppstructure/recovery/requirements.txt']),
license='Apache License 2.0',
    description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embedded and IoT devices)',
long_description=readme(),
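The `load_requirements` helper above is what folds the layout-recovery dependencies into `install_requires`, which is why the quickstart no longer asks users to install `ppstructure/recovery/requirements.txt` by hand. A quick sanity-check sketch of the merged list:

```python
# Sketch: verify that the merged requirement list carries the recovery deps
# (pdf2docx is added to ppstructure/recovery/requirements.txt in this patch).
reqs = load_requirements(['requirements.txt', 'ppstructure/recovery/requirements.txt'])
assert any(r.strip().startswith('pdf2docx') for r in reqs)
print(len(reqs), 'requirement lines collected')
```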
diff --git a/test_tipc/configs/sr_telescope/sr_telescope.yml b/test_tipc/configs/sr_telescope/sr_telescope.yml
index d3c10448e423ff0305950ea39664379e60f8a113..c78a42d0efb7bbcdd182861a6474d87c9f68b3d4 100644
--- a/test_tipc/configs/sr_telescope/sr_telescope.yml
+++ b/test_tipc/configs/sr_telescope/sr_telescope.yml
@@ -51,7 +51,7 @@ Metric:
Train:
dataset:
name: LMDBDataSetSR
- data_dir: ./train_data/TextZoom/train
+ data_dir: ./train_data/TextZoom/test
transforms:
- SRResize:
imgH: 32
diff --git a/test_tipc/configs/sr_telescope/train_infer_python.txt b/test_tipc/configs/sr_telescope/train_infer_python.txt
index 4dcfa29ee146b3b2662122966d859142bb0ed0c5..7235f07e8c72411f6ae979e666e624c32de935b9 100644
--- a/test_tipc/configs/sr_telescope/train_infer_python.txt
+++ b/test_tipc/configs/sr_telescope/train_infer_python.txt
@@ -4,12 +4,12 @@ python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:null
-Global.epoch_num:lite_train_lite_infer=2|whole_train_whole_infer=300
+Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=16|whole_train_whole_infer=16
Global.pretrained_model:null
train_model_name:latest
-train_infer_img_dir:./inference/sr_inference
+train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:norm_train
@@ -21,7 +21,7 @@ null:null
null:null
##
===========================eval_params===========================
-eval:tools/eval.py -c test_tipc/configs/sr_telescope/sr_telescope.yml -o
+eval:null
null:null
##
===========================infer_params===========================
@@ -44,8 +44,8 @@ inference:tools/infer/predict_sr.py --sr_image_shape="1,32,128" --rec_algorithm=
--rec_batch_num:1
--use_tensorrt:False
--precision:fp32
---rec_model_dir:
---image_dir:./inference/sr_inference
+--sr_model_dir:
+--image_dir:./inference/rec_inference
--save_log_path:./test/output/
--benchmark:True
null:null
diff --git a/test_tipc/prepare.sh b/test_tipc/prepare.sh
index b76332af931c5c4c071c34e70d32f2b5c7d8ebbc..02ee8a24d241195d1330ea42fc05ed35dd7a87b7 100644
--- a/test_tipc/prepare.sh
+++ b/test_tipc/prepare.sh
@@ -150,6 +150,7 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
# pretrain lite train data
wget -nc -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams --no-check-certificate
wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar --no-check-certificate
+ cd ./pretrain_models/ && tar xf det_mv3_db_v2.0_train.tar && cd ../
if [[ ${model_name} =~ "ch_PP-OCRv2_det" ]];then
wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar --no-check-certificate
cd ./pretrain_models/ && tar xf ch_PP-OCRv2_det_distill_train.tar && cd ../
@@ -179,7 +180,6 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_train.tar --no-check-certificate
cd ./pretrain_models/ && tar xf table_structure_tablemaster_train.tar && cd ../
fi
- cd ./pretrain_models/ && tar xf det_mv3_db_v2.0_train.tar && cd ../
rm -rf ./train_data/icdar2015
rm -rf ./train_data/ic15_data
rm -rf ./train_data/pubtabnet
@@ -290,6 +290,7 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
if [ ${model_name} == "sr_telescope" ]; then
wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/TextZoom.tar --no-check-certificate
cd ./train_data/ && tar xf TextZoom.tar && cd ../
+ fi
if [ ${model_name} == "rec_d28_can" ]; then
wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/CROHME_lite.tar --no-check-certificate
cd ./train_data/ && tar xf CROHME_lite.tar && cd ../
diff --git a/tools/program.py b/tools/program.py
index a0594e950d969c39eb1cb363435897c5f219f0e4..afb8a47254b9847e4a4d432b7f17902c3ee78725 100755
--- a/tools/program.py
+++ b/tools/program.py
@@ -642,7 +642,8 @@ def preprocess(is_train=False):
'CLS', 'PGNet', 'Distillation', 'NRTR', 'TableAttn', 'SAR', 'PSE',
'SEED', 'SDMGR', 'LayoutXLM', 'LayoutLM', 'LayoutLMv2', 'PREN', 'FCE',
'SVTR', 'ViTSTR', 'ABINet', 'DB++', 'TableMaster', 'SPIN', 'VisionLAN',
- 'Gestalt', 'SLANet', 'RobustScanner', 'CT', 'RFL', 'DRRG', 'CAN'
+ 'Gestalt', 'SLANet', 'RobustScanner', 'CT', 'RFL', 'DRRG', 'CAN',
+ 'Telescope'
]
if use_xpu:
|