Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into table_pr

f050b700 · 文幕地方 · a8ac0a13 · 37479c01 · f050b700 · f050b700
21 changed files
--- a/configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml
+++ b/configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml
@@ -12,7 +12,7 @@ Global:
  checkpoints:
  save_inference_dir:
  use_visualdl: false
-  infer_img: doc/imgs_words/ch/word_1.jpg
+  infer_img: ./doc/imgs_words/arabic/ar_2.jpg
  character_dict_path: ppocr/utils/dict/arabic_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false

--- a/doc/doc_ch/algorithm_overview.md
+++ b/doc/doc_ch/algorithm_overview.md
@@ -24,7 +24,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型，**欢迎广
 ### 1.1 文本检测算法
 已支持的文本检测算法列表（戳链接获取使用教程）：
- [x]  [DB](./algorithm_det_db.md)
+- [x]  [DB与DB++](./algorithm_det_db.md)
 - [x]  [EAST](./algorithm_det_east.md)
 - [x]  [SAST](./algorithm_det_sast.md)
 - [x]  [PSENet](./algorithm_det_psenet.md)
@@ -41,6 +41,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型，**欢迎广
 |SAST|ResNet50_vd|91.39%|83.77%|87.42%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
 |PSE|ResNet50_vd|85.81%|79.53%|82.55%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
 |PSE|MobileNetV3|82.20%|70.48%|75.89%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
+|DB++|ResNet50|90.89%|82.66%|86.58%|[合成数据预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
 在Total-text文本检测公开数据集上，算法效果如下：
@@ -129,10 +130,10 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型，**欢迎广
 已支持的关键信息抽取算法列表（戳链接获取使用教程）：
- [x]  [VI-LayoutXLM](./algorithm_kie_vi_laoutxlm.md)
+- [x]  [VI-LayoutXLM](./algorithm_kie_vi_layoutxlm.md)
- [x]  [LayoutLM](./algorithm_kie_laoutxlm.md)
+- [x]  [LayoutLM](./algorithm_kie_layoutxlm.md)
- [x]  [LayoutLMv2](./algorithm_kie_laoutxlm.md)
+- [x]  [LayoutLMv2](./algorithm_kie_layoutxlm.md)
- [x]  [LayoutXLM](./algorithm_kie_laoutxlm.md)
+- [x]  [LayoutXLM](./algorithm_kie_layoutxlm.md)
 - [x]  [SDMGR](././algorithm_kie_sdmgr.md)
 在wildreceipt发票公开数据集上，算法复现效果如下：

--- a/doc/doc_en/algorithm_det_db_en.md
+++ b/doc/doc_en/algorithm_det_db_en.md
-# DB
+# DB && DB++
 - [1. Introduction](#1)
 - [2. Environment](#2)
@@ -21,13 +21,23 @@ Paper:
 > Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang
 > AAAI, 2020
+> [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
+> Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang
+> TPAMI, 2022
 On the ICDAR2015 dataset, the text detection result is as follows:
 |Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
 | --- | --- | --- | --- | --- | --- | --- |
 |DB|ResNet50_vd|[configs/det/det_r50_vd_db.yml](../../configs/det/det_r50_vd_db.yml)|86.41%|78.72%|82.38%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
 |DB|MobileNetV3|[configs/det/det_mv3_db.yml](../../configs/det/det_mv3_db.yml)|77.29%|73.08%|75.12%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
+|DB++|ResNet50|[configs/det/det_r50_db++_ic15.yml](../../configs/det/det_r50_db++_ic15.yml)|90.89%|82.66%|86.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
+On the TD_TR dataset, the text detection result is as follows:
+|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
+| --- | --- | --- | --- | --- | --- | --- |
+|DB++|ResNet50|[configs/det/det_r50_db++_td_tr.yml](../../configs/det/det_r50_db++_td_tr.yml)|92.92%|86.48%|89.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_td_tr_train.tar)|
 <a name="2"></a>
 ## 2. Environment
@@ -96,4 +106,12 @@ More deployment schemes supported for DB:
  pages={11474--11481},
  year={2020}
 }
-```
\ No newline at end of file
+@article{liao2022real,
+  title={Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion},
+  author={Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  year={2022},
+  publisher={IEEE}
+}
+```
--- a/doc/doc_en/algorithm_overview_en.md
+++ b/doc/doc_en/algorithm_overview_en.md
@@ -22,7 +22,7 @@ Developers are welcome to contribute more algorithms! Please refer to [add new a
 ### 1.1 Text Detection Algorithms
 Supported text detection algorithms (Click the link to get the tutorial):
- [x]  [DB](./algorithm_det_db_en.md)
+- [x]  [DB && DB++](./algorithm_det_db_en.md)
 - [x]  [EAST](./algorithm_det_east_en.md)
 - [x]  [SAST](./algorithm_det_sast_en.md)
 - [x]  [PSENet](./algorithm_det_psenet_en.md)
@@ -39,6 +39,7 @@ On the ICDAR2015 dataset, the text detection result is as follows:
 |SAST|ResNet50_vd|91.39%|83.77%|87.42%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
 |PSE|ResNet50_vd|85.81%|79.53%|82.55%|[trianed model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
 |PSE|MobileNetV3|82.20%|70.48%|75.89%|[trianed model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
+|DB++|ResNet50|90.89%|82.66%|86.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
 On Total-Text dataset, the text detection result is as follows:
@@ -127,10 +128,10 @@ On the PubTabNet dataset, the algorithm result is as follows:
 Supported KIE algorithms (Click the link to get the tutorial):
- [x]  [VI-LayoutXLM](./algorithm_kie_vi_laoutxlm_en.md)
+- [x]  [VI-LayoutXLM](./algorithm_kie_vi_layoutxlm_en.md)
- [x]  [LayoutLM](./algorithm_kie_laoutxlm_en.md)
+- [x]  [LayoutLM](./algorithm_kie_layoutxlm_en.md)
- [x]  [LayoutLMv2](./algorithm_kie_laoutxlm_en.md)
+- [x]  [LayoutLMv2](./algorithm_kie_layoutxlm_en.md)
- [x]  [LayoutXLM](./algorithm_kie_laoutxlm_en.md)
+- [x]  [LayoutXLM](./algorithm_kie_layoutxlm_en.md)
 - [x]  [SDMGR](./algorithm_kie_sdmgr_en.md)
 On wildreceipt dataset, the algorithm result is as follows:

--- a/doc/overview_en.png
+++ b/doc/overview_en.png
--- a/doc/ppocr_v3/svtr_tiny.jpg
+++ b/doc/ppocr_v3/svtr_tiny.jpg
--- a/ppocr/postprocess/rec_postprocess.py
+++ b/ppocr/postprocess/rec_postprocess.py
@@ -45,6 +45,27 @@ class BaseRecLabelDecode(object):
            self.dict[char] = i
        self.character = dict_character
+        if 'arabic' in character_dict_path:
+            self.reverse = True
+        else:
+            self.reverse = False
+    def pred_reverse(self, pred):
+        pred_re = []
+        c_current = ''
+        for c in pred:
+            if not bool(re.search('[a-zA-Z0-9 :*./%+-]', c)):
+                if c_current != '':
+                    pred_re.append(c_current)
+                pred_re.append(c)
+                c_current = ''
+            else:
+                c_current += c
+        if c_current != '':
+            pred_re.append(c_current)
+        return ''.join(pred_re[::-1])
    def add_special_char(self, dict_character):
        return dict_character
@@ -73,6 +94,10 @@ class BaseRecLabelDecode(object):
                conf_list = [0]
            text = ''.join(char_list)
+            if self.reverse:  # for arabic rec
+                text = self.pred_reverse(text)
            result_list.append((text, np.mean(conf_list).tolist()))
        return result_list
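
To make the effect of `pred_reverse` concrete, here is a standalone sketch of the same loop outside the class. The function body mirrors the hunk above; the free-function form, the `LATIN_RUN` name, and the sample strings are assumptions made for illustration only.

```python
import re

# Characters that are kept together as a left-to-right run (same class as in the hunk).
LATIN_RUN = re.compile('[a-zA-Z0-9 :*./%+-]')


def pred_reverse(pred: str) -> str:
    """Reverse the token order; runs of Latin/digit characters stay intact."""
    tokens = []
    run = ''
    for c in pred:
        if not LATIN_RUN.search(c):
            # A non-Latin character ends the current run and is its own token.
            if run != '':
                tokens.append(run)
            tokens.append(c)
            run = ''
        else:
            run += c
    if run != '':
        tokens.append(run)
    return ''.join(tokens[::-1])


if __name__ == '__main__':
    # Two Arabic letters followed by the digits "12": the letters swap places,
    # while "12" moves to the front without being turned into "21".
    print(pred_reverse('\u062f\u062c12'))
```

Because the space character is part of the character class, a phrase such as `ab cd` is treated as a single run and comes back unchanged.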

--- a/ppocr/utils/dict/arabic_dict.txt
+++ b/ppocr/utils/dict/arabic_dict.txt
 !
 #
 $

--- a/ppstructure/docs/models_list_en.md
+++ b/ppstructure/docs/models_list_en.md
@@ -13,7 +13,7 @@
 |model name| description                                                                                                                                             | inference model size                                                                                                                         |download|dict path|
 | --- |---------------------------------------------------------------------------------------------------------------------------------------------------------| --- | --- | --- |
 | picodet_lcnet_x1_0_fgd_layout | The layout analysis English model trained on the PubLayNet dataset based on PicoDet LCNet_x1_0 and FGD . the model can recognition 5 types of areas such as **Text, Title, Table, Picture and List** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams) | [PubLayNet dict](../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt) |
-| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis English model trained on the PubLayNet dataset based on PP-YOLOv2 | 221M | [inference_moel]](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | sme as above |
+| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis English model trained on the PubLayNet dataset based on PP-YOLOv2 | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | same as above |
 | picodet_lcnet_x1_0_fgd_layout_cdla | The layout analysis Chinese model trained on the CDLA dataset, the model can recognition 10 types of areas such as **Table、Figure、Figure caption、Table、Table caption、Header、Footer、Reference、Equation** | 9.7M | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams) | [CDLA dict](../../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt) |
 | picodet_lcnet_x1_0_fgd_layout_table | The layout analysis model trained on the table dataset, the model can detect tables in Chinese and English documents                     | 9.7M                                                  | [inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_table.pdparams) | [Table dict](../../ppocr/utils/dict/layout_dict/layout_table_dict.txt) |
 | ppyolov2_r50vd_dcn_365e_tableBank_word | The layout analysis model trained on the TableBank Word dataset based on PP-YOLOv2, the model can detect  tables  in English documents | 221M | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | same as above |

--- a/ppstructure/kie/README.md
+++ b/ppstructure/kie/README.md
@@ -242,9 +242,7 @@ For training, evaluation and inference tutorial for KIE models, please refer to
 For training, evaluation and inference tutorial for text detection models, please refer to [text detection doc](../../doc/doc_en/detection_en.md).
-For training, evaluation and inference tutorial for text recognition models, please refer to [text recognition doc](../../doc/doc_en/recognition.md).
+For training, evaluation and inference tutorial for text recognition models, please refer to [text recognition doc](../../doc/doc_en/recognition_en.md).
-If you want to finish the KIE tasks in your scene, and don't know what to prepare,  please refer to [End cdoc](../../doc/doc_en/recognition.md).
 To complete the key information extraction task in your own scenario from data preparation to model selection, please refer to: [Guide to End-to-end KIE](./how_to_do_kie_en.md)。

--- a/ppstructure/layout/README.md
+++ b/ppstructure/layout/README.md
--- a/ppstructure/layout/README_ch.md
+++ b/ppstructure/layout/README_ch.md
+简体中文 | [English](README.md)
+# 版面分析
 - [1. 简介](#1-简介)
 - [2. 安装](#2-安装)
  - [2.1 安装PaddlePaddle](#21-安装paddlepaddle)
@@ -15,8 +19,6 @@
  - [6.1 模型导出](#61-模型导出)
  - [6.2 模型推理](#62-模型推理)
-# 版面分析
 ## 1. 简介
 版面分析指的是对图片形式的文档进行区域划分，定位其中的关键区域，如文字、标题、表格、图片等。版面分析算法基于[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)的轻量模型PP-PicoDet进行开发。
@@ -37,10 +39,10 @@
 python3 -m pip install --upgrade pip
 # GPU安装
-python3 -m pip install "paddlepaddle-gpu>=2.2" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple
 # CPU安装
-python3 -m pip install "paddlepaddle>=2.2" -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
 ```
 更多需求，请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。

--- a/ppstructure/pdf2word/icons/chinese.png
+++ b/ppstructure/pdf2word/icons/chinese.png
--- a/ppstructure/pdf2word/icons/english.png
+++ b/ppstructure/pdf2word/icons/english.png
--- a/ppstructure/pdf2word/icons/folder-open.png
+++ b/ppstructure/pdf2word/icons/folder-open.png
--- a/ppstructure/pdf2word/icons/folder-plus.png
+++ b/ppstructure/pdf2word/icons/folder-plus.png
--- a/ppstructure/pdf2word/pdf2word.md
+++ b/ppstructure/pdf2word/pdf2word.md
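+# PDF2WORD
+PDF2WORD is a PDF-to-Word conversion application built by PaddleOCR community developer [whjdark](https://github.com/whjdark) on top of the PP-Structure intelligent document analysis models. It ships as a directly installable exe so that Windows users can run it easily.
+## 1. Usage
+### Application
+1. Download and install: Windows users can download the software as described in [Software download]() and then run `pdf2word.exe`. If you downloaded the lite version, the installer fetches the environment dependencies, models, and other required resources online, so installation takes longer; make sure your network connection is stable. The serve version bundles the dependencies and installs faster. Choose whichever suits your needs.
+2. Convert: PP-Structure is adapted separately for Chinese and English data, so when converting a file you can **choose the option that matches the document language**.
+### Running the script
+For the first run, change to the `/ppstructure/pdf2word` directory and then run: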
+```
+python pdf2word.py
+```
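+## 2. Software download
+To get the packaged program, scan the QR code below, follow the official account, and fill in the questionnaire. You can then join the official PaddleOCR discussion group and receive a free 20G OCR learning pack, which contains a collection of OCR scenario applications (7 vertical-domain models, including digital tube, LCD screen, license plate, and high-precision SVTR models), the e-book 《动手学OCR》, course replay videos, recent papers, and other materials.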
+<div align="center">
+<img src="https://user-images.githubusercontent.com/50011306/186369636-35f2008b-df5a-4784-b1f5-cebebcb2b7a5.jpg"  width = "150" height = "150" />
+</div>
--- a/ppstructure/pdf2word/pdf2word.py
+++ b/ppstructure/pdf2word/pdf2word.py
+import sys
+import tarfile
+import os
+import time
+import datetime
+import functools 
+import cv2
+import platform
+import numpy as np
+from qtpy.QtWidgets import QApplication, QWidget, QPushButton, QProgressBar, \
+                           QGridLayout, QMessageBox, QLabel, QFileDialog
+from qtpy.QtCore import Signal, QThread, QObject
+from qtpy.QtGui import QImage, QPixmap, QIcon
+file = os.path.dirname(os.path.abspath(__file__))
+root = os.path.abspath(os.path.join(file, '../../'))
+sys.path.append(file)
+sys.path.insert(0, root)
+from ppstructure.predict_system import StructureSystem, save_structure_res
+from ppstructure.utility import parse_args, draw_structure_result
+from ppocr.utils.network import download_with_progressbar
+from ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx
+# from ScreenShotWidget import ScreenShotWidget
+__APPNAME__ = "pdf2word"
+__VERSION__ = "0.1.1"
+URLs_EN = {
+    # Download and extract the detection model of the ultra-lightweight English PP-OCRv3 model
+    "en_PP-OCRv3_det_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar",
+    # Download and extract the recognition model of the lightweight English PP-OCRv3 model
+    "en_PP-OCRv3_rec_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar",
+    # Download and extract the ultra-lightweight English table model
+    "en_ppstructure_mobile_v2.0_SLANet_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar",
+    # English layout analysis model
+    "picodet_lcnet_x1_0_fgd_layout_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar",
+}
+DICT_EN = {
+    "rec_char_dict_path": "en_dict.txt",
+    "layout_dict_path": "layout_publaynet_dict.txt",
+}
+URLs_CN = {
+    # Download and extract the detection model of the ultra-lightweight Chinese PP-OCRv3 model
+    "cn_PP-OCRv3_det_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar",
+    # Download and extract the recognition model of the lightweight Chinese PP-OCRv3 model
+    "cn_PP-OCRv3_rec_infer": "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar",
+    # Download and extract the ultra-lightweight English table model (this entry reuses the English SLANet archive)
+    "cn_ppstructure_mobile_v2.0_SLANet_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar",
+    # Chinese layout analysis model
+    "picodet_lcnet_x1_0_fgd_layout_cdla_infer": "https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar",
+}
+DICT_CN = {
+    "rec_char_dict_path":  "ppocr_keys_v1.txt",
+    "layout_dict_path": "layout_cdla_dict.txt",
+}
+def QImageToCvMat(incomingImage) -> np.array:
+    '''  
+    Converts a QImage into an opencv MAT format  
+    '''
+    incomingImage = incomingImage.convertToFormat(QImage.Format.Format_RGBA8888)
+    width = incomingImage.width()
+    height = incomingImage.height()
+    ptr = incomingImage.bits()
+    ptr.setsize(height * width * 4)
+    arr = np.frombuffer(ptr, np.uint8).reshape((height, width, 4))
+    return arr
+def readImage(image_file) -> list:
+    imgs = []  # stays empty if the file cannot be read, so the caller can skip it
+    if os.path.basename(image_file)[-3:] in ['pdf']:
+        import fitz
+        from PIL import Image
+        with fitz.open(image_file) as pdf:
+            for pg in range(0, pdf.pageCount):
+                page = pdf[pg]
+                mat = fitz.Matrix(2, 2)
+                pm = page.getPixmap(matrix=mat, alpha=False)
+                # if width or height > 2000 pixels, don't enlarge the image
+                if pm.width > 2000 or pm.height > 2000:
+                    pm = page.getPixmap(matrix=fitz.Matrix(1, 1), alpha=False)
+                img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
+                img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
+                imgs.append(img)
+    else:
+        img = cv2.imread(image_file, cv2.IMREAD_COLOR)
+        if img is not None:
+            imgs = [img]
+    return imgs
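
The PDF branch of `readImage` renders each page at 2x and falls back to 1x when the rendered pixmap is wider or taller than 2000 pixels. A minimal sketch of that decision as a pure function follows. The name `choose_zoom` and its parameters are hypothetical, and the sketch assumes the rendered size is roughly the page size multiplied by the zoom factor, so it decides before rendering instead of rendering twice as the diff does.

```python
def choose_zoom(page_width: float, page_height: float,
                preferred: float = 2.0, limit: int = 2000) -> float:
    """Return `preferred` unless the scaled page would exceed `limit` pixels.

    Assumption: rendered pixel size ~= page size * zoom. The diff checks the
    rendered pixmap itself, so results may differ by a pixel near the limit.
    """
    if page_width * preferred > limit or page_height * preferred > limit:
        return 1.0
    return preferred


if __name__ == '__main__':
    print(choose_zoom(595, 842))    # A4-sized page in points: enlarged
    print(choose_zoom(1200, 900))   # already large: rendered at 1x
```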
+class Worker(QThread):
+    progressBarValue = Signal(int)
+    endsignal = Signal()
+    loopFlag = True
+    def __init__(self, predictors, save_pdf, vis_font_path):
+        super(Worker, self).__init__()
+        self.predictors = predictors
+        self.save_pdf = save_pdf
+        self.vis_font_path = vis_font_path
+        self.lang = 'EN'
+        self.imagePaths = []
+        self.outputDir = None
+        self.setStackSize(1024*1024)
+    def setImagePath(self, imagePaths):
+        self.imagePaths = imagePaths
+    def setLang(self, lang):
+        self.lang = lang
+    def setOutputDir(self, outputDir):
+        self.outputDir = outputDir
+    def predictAndSave(self, imgs, img_name):
+        all_res = []
+        for index, img in enumerate(imgs):
+            res, time_dict = self.predictors[self.lang](img)
+            # save output
+            save_structure_res(res, self.outputDir, img_name)
+            draw_img = draw_structure_result(img, res, self.vis_font_path)
+            img_save_path = os.path.join(self.outputDir, img_name, 'show_{}.jpg'.format(index))
+            if res != []:
+                cv2.imwrite(img_save_path, draw_img)
+            # recovery
+            h, w, _ = img.shape
+            res = sorted_layout_boxes(res, w)
+            all_res += res
+        try:
+            convert_info_docx(img, all_res, self.outputDir, img_name, self.save_pdf)
+        except Exception as ex:
+            print(self,
+                "error in layout recovery image:{}, err msg: {}".format(
+                img_name, ex))
+        print('result save to {}'.format(self.outputDir)) 
+    def run(self):
+        try:
+            findex = 0
+            os.makedirs(self.outputDir, exist_ok=True)
+            for i, image_file in enumerate(self.imagePaths):
+                if self.loopFlag:
+                    imgs = readImage(image_file)
+                    if len(imgs) == 0:
+                        continue
+                    img_name = os.path.basename(image_file).split('.')[0]
+                    os.makedirs(os.path.join(self.outputDir, img_name), exist_ok=True)
+                    self.predictAndSave(imgs, img_name)
+                    findex += 1
+                    self.progressBarValue.emit(findex)
+                else:
+                    break
+            self.endsignal.emit()
+            self.exec()
+        except Exception as e:
+            print(e)
+            raise
+class APP_Image2Doc(QWidget):
+    def __init__(self):
+        super().__init__()
+        self.setFixedHeight(90)
+        self.setFixedWidth(400)
+        # settings
+        self.imagePaths = []
+#         self.screenShotWg = ScreenShotWidget()
+        self.screenShot = None
+        self.save_pdf = False
+        self.output_dir = None
+        self.vis_font_path = os.path.join(root,
+                "doc", "fonts", "simfang.ttf")
+        # ProgressBar
+        self.pb = QProgressBar()
+        self.pb.setRange(0, 100)
+        self.pb.setValue(0)
+        # initialize the UI
+        self.setupUi()
+        # download the models
+        self.downloadModels(URLs_EN)
+        self.downloadModels(URLs_CN)
+        # initialize the predictors
+        predictors = {
+            'EN': self.initPredictor('EN'),
+            'CN': self.initPredictor('CN'),
+        }
+        # set up the worker thread
+        self._thread = Worker(predictors, self.save_pdf, self.vis_font_path)
+        self._thread.progressBarValue.connect(self.handleProgressBarSingal)
+        self._thread.endsignal.connect(self.handleEndsignalSignal)
+        self._thread.finished.connect(QObject.deleteLater)
+        self.time_start = 0  # save start time
+    def setupUi(self):
+        self.setObjectName("MainWindow")
+        self.setWindowTitle(__APPNAME__ + " " + __VERSION__)
+        layout = QGridLayout()
+        self.openFileButton = QPushButton("打开文件")
+        self.openFileButton.setIcon(QIcon(QPixmap("./icons/folder-plus.png")))
+        layout.addWidget(self.openFileButton, 0, 0, 1, 1)
+        self.openFileButton.clicked.connect(self.handleOpenFileSignal)
+        # screenShotButton = QPushButton("截图识别")
+        # layout.addWidget(screenShotButton, 0, 1, 1, 1)
+        # screenShotButton.clicked.connect(self.screenShotSlot)
+        # screenShotButton.setEnabled(False) # temporarily disabled
+        self.startCNButton = QPushButton("中文转换")
+        self.startCNButton.setIcon(QIcon(QPixmap("./icons/chinese.png")))
+        layout.addWidget(self.startCNButton, 0, 1, 1, 1)
+        self.startCNButton.clicked.connect(
+            functools.partial(self.handleStartSignal, 'CN'))
+        self.startENButton = QPushButton("英文转换")
+        self.startENButton.setIcon(QIcon(QPixmap("./icons/english.png")))
+        layout.addWidget(self.startENButton, 0, 2, 1, 1)
+        self.startENButton.clicked.connect(
+            functools.partial(self.handleStartSignal, 'EN'))
+        self.showResultButton = QPushButton("显示结果")
+        self.showResultButton.setIcon(QIcon(QPixmap("./icons/folder-open.png")))
+        layout.addWidget(self.showResultButton, 0, 3, 1, 1)
+        self.showResultButton.clicked.connect(self.handleShowResultSignal)
+        # ProgressBar
+        layout.addWidget(self.pb, 2, 0, 1, 4)
+        # time estimate label
+        self.timeEstLabel = QLabel(
+            ("Time Left: --"))
+        layout.addWidget(self.timeEstLabel, 3, 0, 1, 4)
+        self.setLayout(layout)
+    def downloadModels(self, URLs):
+        # using custom model
+        tar_file_name_list = [
+            'inference.pdiparams', 
+            'inference.pdiparams.info', 
+            'inference.pdmodel',
+            'model.pdiparams', 
+            'model.pdiparams.info', 
+            'model.pdmodel'
+        ]
+        model_path = os.path.join(root, 'inference')
+        os.makedirs(model_path, exist_ok=True)
+        # download and unzip models
+        for name in URLs.keys():
+            url = URLs[name]
+            print("Try downloading file: {}".format(url))
+            tarname = url.split('/')[-1]
+            tarpath = os.path.join(model_path, tarname)
+            if os.path.exists(tarpath):
+                print("File already exists, skipping download.")
+            else:
+                try:
+                    download_with_progressbar(url, tarpath)
+                except Exception as e:
+                    print("Error occurred when downloading file, error message:")
+                    print(e)
+            # unzip model tar
+            try:
+                with tarfile.open(tarpath, 'r') as tarObj:
+                    storage_dir = os.path.join(model_path, name)
+                    os.makedirs(storage_dir, exist_ok=True)
+                    for member in tarObj.getmembers():
+                        filename = None
+                        for tar_file_name in tar_file_name_list:
+                            if tar_file_name in member.name:
+                                filename = tar_file_name
+                        if filename is None:
+                            continue
+                        file = tarObj.extractfile(member)
+                        with open(
+                                os.path.join(storage_dir, filename),
+                                'wb') as f:
+                            f.write(file.read())
+            except Exception as e:
+                print("Error occurred when unzipping file, error message:")
+                print(e)
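
The extraction loop in `downloadModels` keeps only members whose path contains one of the expected file names and writes them flat into the model directory. The sketch below reproduces that filter against an in-memory archive so it can be run without downloading anything. The helper names (`target_name`, `extract_wanted`, `make_demo_tar`) and the demo archive contents are assumptions for illustration, not part of the PR.

```python
import io
import tarfile

WANTED = [
    'inference.pdiparams', 'inference.pdiparams.info', 'inference.pdmodel',
    'model.pdiparams', 'model.pdiparams.info', 'model.pdmodel',
]


def target_name(member_name):
    """Return the flat file name to store a member under, or None to skip it."""
    filename = None
    for wanted in WANTED:
        # As in the diff, the last match wins, so "...pdiparams.info"
        # is not mistaken for "...pdiparams".
        if wanted in member_name:
            filename = wanted
    return filename


def extract_wanted(tar_bytes):
    """Map flat file name -> content for every wanted regular file."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode='r') as tar:
        for member in tar.getmembers():
            name = target_name(member.name)
            if name is None or not member.isfile():
                continue
            out[name] = tar.extractfile(member).read()
    return out


def make_demo_tar():
    """Build a small tar archive in memory (hypothetical contents)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        for name, data in [
            ('demo_infer/inference.pdmodel', b'graph'),
            ('demo_infer/inference.pdiparams.info', b'info'),
            ('demo_infer/readme.txt', b'skip me'),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == '__main__':
    print(sorted(extract_wanted(make_demo_tar())))
```

Writing members under a fixed set of names, rather than calling `extractall`, also means a member path inside the archive never decides where a file lands on disk.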
+    def initPredictor(self, lang='EN'):
+        # init predictor args
+        args = parse_args()
+        args.table_max_len = 488
+        args.ocr = True
+        args.recovery = True
+        args.save_pdf = self.save_pdf
+        args.table_char_dict_path = os.path.join(root, 
+                "ppocr", "utils", "dict", "table_structure_dict.txt")
+        if lang == 'EN':
+            args.det_model_dir = os.path.join(root,  # model storage location
+                "inference", "en_PP-OCRv3_det_infer")
+            args.rec_model_dir = os.path.join(root, 
+                "inference", "en_PP-OCRv3_rec_infer")
+            args.table_model_dir = os.path.join(root, 
+                "inference", "en_ppstructure_mobile_v2.0_SLANet_infer")
+            args.output = os.path.join(root, "output")  # where results are saved
+            args.layout_model_dir = os.path.join(root,
+                "inference", "picodet_lcnet_x1_0_fgd_layout_infer")
+            lang_dict = DICT_EN
+        elif lang == 'CN':
+            args.det_model_dir = os.path.join(root,  # model storage location
+                "inference", "cn_PP-OCRv3_det_infer")
+            args.rec_model_dir = os.path.join(root, 
+                "inference", "cn_PP-OCRv3_rec_infer")
+            args.table_model_dir = os.path.join(root, 
+                "inference", "cn_ppstructure_mobile_v2.0_SLANet_infer")
+            args.output = os.path.join(root, "output")  # where results are saved
+            args.layout_model_dir = os.path.join(root,
+                "inference", "picodet_lcnet_x1_0_fgd_layout_cdla_infer")
+            lang_dict = DICT_CN
+        else:
+            raise ValueError("Unsupported language")
+        args.rec_char_dict_path = os.path.join(root, 
+                "ppocr", "utils", 
+                lang_dict['rec_char_dict_path'])
+        args.layout_dict_path = os.path.join(root,
+                "ppocr", "utils", "dict", "layout_dict", 
+                lang_dict['layout_dict_path'])
+        # init predictor
+        return StructureSystem(args)
+    def handleOpenFileSignal(self):
+        '''
+        Multiple image files can be selected
+        '''
+        selectedFiles = QFileDialog.getOpenFileNames(self, 
+            "多文件选择", "/", "图片文件 (*.png *.jpeg *.jpg *.bmp *.pdf)")[0]
+        if len(selectedFiles) > 0:
+            self.imagePaths = selectedFiles
+            self.screenShot = None # discard screenshot temp image
+            self.pb.setRange(0, len(self.imagePaths))
+            self.pb.setValue(0)
+#     def screenShotSlot(self):
+#         '''
+#         File conversion and screenshot conversion cannot run at the same time
+#         Only one screenshot can be converted at a time
+#         '''
+#         self.screenShotWg.start()
+#         if self.screenShotWg.captureImage:
+#             self.screenShot = self.screenShotWg.captureImage
+#             self.imagePaths.clear() # discard openfile temp list
+#             self.pb.setRange(0, 1)
+#             self.pb.setValue(0)
+    def handleStartSignal(self, lang):
+        if self.screenShot: # for screenShot
+            img_name = 'screenshot_' + time.strftime("%Y%m%d%H%M%S", time.localtime())
+            image = QImageToCvMat(self.screenShot)
+            self.predictAndSave(image, img_name, lang)
+            # update Progress Bar
+            self.pb.setValue(1)
+            QMessageBox.information(self, 
+                u'Information', "文档提取完成")
+        elif len(self.imagePaths) > 0 : # for image file selection
+            # Must set image path list and language before start
+            self.output_dir = os.path.join(
+                os.path.dirname(self.imagePaths[0]), "output")  # output_dir should sit next to the input files
+            self._thread.setOutputDir(self.output_dir)
+            self._thread.setImagePath(self.imagePaths)
+            self._thread.setLang(lang)
+            # disable buttons
+            self.openFileButton.setEnabled(False)
+            self.startCNButton.setEnabled(False)
+            self.startENButton.setEnabled(False)
+            # start the worker thread
+            self._thread.start()
+            self.time_start = time.time() # log start time
+            QMessageBox.information(self, 
+                u'Information', "开始转换")
+        else:
+            QMessageBox.warning(self, 
+                u'Information', "请选择要识别的文件或截图")
+    def handleShowResultSignal(self):
+        if self.output_dir is None:
+            return
+        if os.path.exists(self.output_dir):
+            if platform.system() == 'Windows':
+                os.startfile(self.output_dir)
+            else:
+                os.system('open ' + os.path.normpath(self.output_dir))
+        else:
+            QMessageBox.information(self, 
+                u'Information', "输出文件不存在")
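
`handleShowResultSignal` uses `os.startfile` on Windows and the `open` command everywhere else. `open` is the macOS launcher, so the sketch below shows one possible way to pick a command per platform. The function name `open_dir_command` is hypothetical, and the choice of `xdg-open` for other systems is an assumption about typical Linux desktops, not something the PR specifies.

```python
import platform


def open_dir_command(path, system=None):
    """Return an argv list that opens `path` in the platform's file manager.

    Assumptions: `explorer` on Windows, `open` on macOS, `xdg-open` elsewhere.
    """
    system = system or platform.system()
    if system == 'Windows':
        return ['explorer', path]
    if system == 'Darwin':
        return ['open', path]
    return ['xdg-open', path]


if __name__ == '__main__':
    # The list can be handed to subprocess.run(...); it is only printed here.
    print(open_dir_command('/tmp/my output'))
```

Passing the path as its own list element avoids the quoting problems that string concatenation has when the output directory contains spaces.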
+    def handleProgressBarSingal(self, i):
+        self.pb.setValue(i)
+        # calculate time left of recognition
+        lenbar = self.pb.maximum()
+        avg_time = (time.time() - self.time_start) / i  # Use average time to prevent time fluctuations
+        time_left = str(datetime.timedelta(seconds=avg_time * (lenbar - i))).split(".")[0]  # Remove microseconds
+        self.timeEstLabel.setText(f"Time Left: {time_left}")  # show time left
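
The time-left label is computed from the average time per finished file multiplied by the number of files remaining. The same arithmetic as a standalone function, for illustration: the name `format_time_left` and the guard for `done <= 0` are additions of this sketch (the handler above is only called after at least one file has finished).

```python
import datetime


def format_time_left(elapsed_seconds, done, total):
    """Estimate the remaining time as H:MM:SS from the average per-file time."""
    if done <= 0:
        return '--'  # nothing finished yet, no average available
    avg = elapsed_seconds / done
    remaining = datetime.timedelta(seconds=avg * (total - done))
    return str(remaining).split('.')[0]  # drop microseconds


if __name__ == '__main__':
    # 2 of 5 files took 10 s in total: 5 s each, 3 files left.
    print(format_time_left(10.0, 2, 5))
```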
+    def handleEndsignalSignal(self):
+        # enable buttons
+        self.openFileButton.setEnabled(True)
+        self.startCNButton.setEnabled(True)
+        self.startENButton.setEnabled(True)
+        QMessageBox.information(self, u'Information', "转换结束")
+def main():
+    app = QApplication(sys.argv)
+    window = APP_Image2Doc()  # create the main window
+    window.show()  # show the window
+    QApplication.processEvents()
+    sys.exit(app.exec())
+if __name__ == "__main__":
+    main()
--- a/ppstructure/recovery/README.md
+++ b/ppstructure/recovery/README.md
@@ -66,7 +66,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
 - **(2) Install recovery's `requirements`**
-The layout restoration is exported as docx and PDF files, so python-docx and docx2pdf API need to be installed, and fitz and PyMuPDF apis need to be installed to process the input files in pdf format.
+The layout restoration is exported as docx and PDF files, so the python-docx and docx2pdf APIs need to be installed; to process input files in PDF format, the PyMuPDF API ([requires Python >= 3.7](https://pypi.org/project/PyMuPDF/)) also needs to be installed.
 ```bash
 python3 -m pip install -r ppstructure/recovery/requirements.txt

--- a/ppstructure/recovery/README_ch.md
+++ b/ppstructure/recovery/README_ch.md
@@ -68,7 +68,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
 - **（2）安装recovery的`requirements`**
-版面恢复导出为docx、pdf文件，所以需要安装python-docx、docx2pdf API，同时处理pdf格式的输入文件，需要安装fitz、PyMuPDF API。
+版面恢复导出为docx、pdf文件，所以需要安装python-docx、docx2pdf API，同时处理pdf格式的输入文件，需要安装PyMuPDF API([要求Python >= 3.7](https://pypi.org/project/PyMuPDF/))。
 ```bash
 python3 -m pip install -r ppstructure/recovery/requirements.txt

--- a/ppstructure/recovery/requirements.txt
+++ b/ppstructure/recovery/requirements.txt
 python-docx
 docx2pdf
-fitz
+PyMuPDF
-PyMuPDF==1.16.14
 beautifulsoup4
\ No newline at end of file