提交 f492047b 编写于 作者: qq_25193841's avatar qq_25193841

Merge remote-tracking branch 'origin/release/2.6' into release2.6

......@@ -32,10 +32,15 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
- [Table Recognition](./ppstructure/table) optimization: 3 optimization strategies are designed, and the model accuracy is improved by 6% under comparable time consumption;
- [Key Information Extraction](./ppstructure/kie) optimization:a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%.
- **🔥2022.7 Release [OCR scene application collection](./applications/README_en.md)**
- **🔥2022.8 Release [OCR scene application collection](./applications/README_en.md)**
- Release **9 vertical models** such as digital tube, LCD screen, license plate, handwriting recognition model, high-precision SVTR model, etc, covering the main OCR vertical applications in general, manufacturing, finance, and transportation industries.
- **🔥2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- **2022.8 Add implementation of [8 cutting-edge algorithms](doc/doc_en/algorithm_overview_en.md)**
- Text Detection: [FCENet](doc/doc_en/algorithm_det_fcenet_en.md), [DB++](doc/doc_en/algorithm_det_db_en.md)
- Text Recognition: [ViTSTR](doc/doc_en/algorithm_rec_vitstr_en.md), [ABINet](doc/doc_en/algorithm_rec_abinet_en.md), [VisionLAN](doc/doc_en/algorithm_rec_visionlan_en.md), [SPIN](doc/doc_en/algorithm_rec_spin_en.md), [RobustScanner](doc/doc_en/algorithm_rec_robustscanner_en.md)
- Table Recognition: [TableMaster](doc/doc_en/algorithm_table_master_en.md)
- **2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- Release [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
- Release [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
- Release interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covers the cutting-edge theory and code practice of OCR full stack technology.
......
......@@ -28,14 +28,19 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 近期更新
- **🔥2022.8.24 发布 PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
- 发布[PP-Structurev2](./ppstructure/),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery),支持**一行命令完成PDF转Word**
- [版面分析](./ppstructure/layout)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms;
- [表格识别](./ppstructure/table)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%;
- [关键信息抽取](./ppstructure/kie)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。
- 发布[PP-Structurev2](./ppstructure/README_ch.md),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery/README_ch.md),支持**一行命令完成PDF转Word**
- [版面分析](./ppstructure/layout/README_ch.md)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms;
- [表格识别](./ppstructure/table/README_ch.md)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%;
- [关键信息抽取](./ppstructure/kie/README_ch.md)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。
- **🔥2022.8 发布 [OCR场景应用集合](./applications)**
- 包含数码管、液晶屏、车牌、高精度SVTR模型、手写体识别等**9个垂类模型**,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。
- **2022.8 新增实现[8种前沿算法](doc/doc_ch/algorithm_overview.md)**
- 文本检测:[FCENet](doc/doc_ch/algorithm_det_fcenet.md), [DB++](doc/doc_ch/algorithm_det_db.md)
- 文本识别:[ViTSTR](doc/doc_ch/algorithm_rec_vitstr.md), [ABINet](doc/doc_ch/algorithm_rec_abinet.md), [VisionLAN](doc/doc_ch/algorithm_rec_visionlan.md), [SPIN](doc/doc_ch/algorithm_rec_spin.md), [RobustScanner](doc/doc_ch/algorithm_rec_robustscanner.md)
- 表格识别:[TableMaster](doc/doc_ch/algorithm_table_master.md)
- **2022.5.9 发布 PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
- 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
......
......@@ -24,7 +24,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
### 1.1 文本检测算法
已支持的文本检测算法列表(戳链接获取使用教程):
- [x] [DB](./algorithm_det_db.md)
- [x] [DB与DB++](./algorithm_det_db.md)
- [x] [EAST](./algorithm_det_east.md)
- [x] [SAST](./algorithm_det_sast.md)
- [x] [PSENet](./algorithm_det_psenet.md)
......@@ -41,6 +41,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
|SAST|ResNet50_vd|91.39%|83.77%|87.42%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|PSE|MobileNetV3|82.20%|70.48%|75.89%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
|DB++|ResNet50|90.89%|82.66%|86.58%|[合成数据预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
在Total-text文本检测公开数据集上,算法效果如下:
......@@ -129,10 +130,10 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
已支持的关键信息抽取算法列表(戳链接获取使用教程):
- [x] [VI-LayoutXLM](./algorithm_kie_vi_laoutxlm.md)
- [x] [LayoutLM](./algorithm_kie_laoutxlm.md)
- [x] [LayoutLMv2](./algorithm_kie_laoutxlm.md)
- [x] [LayoutXLM](./algorithm_kie_laoutxlm.md)
- [x] [VI-LayoutXLM](./algorithm_kie_vi_layoutxlm.md)
- [x] [LayoutLM](./algorithm_kie_layoutxlm.md)
- [x] [LayoutLMv2](./algorithm_kie_layoutxlm.md)
- [x] [LayoutXLM](./algorithm_kie_layoutxlm.md)
- [x] [SDMGR](././algorithm_kie_sdmgr.md)
在wildreceipt发票公开数据集上,算法复现效果如下:
......
# DB
# DB && DB++
- [1. Introduction](#1)
- [2. Environment](#2)
......@@ -21,13 +21,23 @@ Paper:
> Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang
> AAAI, 2020
> [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
> Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang
> TPAMI, 2022
On the ICDAR2015 dataset, the text detection result is as follows:
|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
| --- | --- | --- | --- | --- | --- | --- |
|DB|ResNet50_vd|[configs/det/det_r50_vd_db.yml](../../configs/det/det_r50_vd_db.yml)|86.41%|78.72%|82.38%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|DB|MobileNetV3|[configs/det/det_mv3_db.yml](../../configs/det/det_mv3_db.yml)|77.29%|73.08%|75.12%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
|DB++|ResNet50|[configs/det/det_r50_db++_ic15.yml](../../configs/det/det_r50_db++_ic15.yml)|90.89%|82.66%|86.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
On the TD_TR dataset, the text detection result is as follows:
|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
| --- | --- | --- | --- | --- | --- | --- |
|DB++|ResNet50|[configs/det/det_r50_db++_td_tr.yml](../../configs/det/det_r50_db++_td_tr.yml)|92.92%|86.48%|89.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_td_tr_train.tar)|
<a name="2"></a>
## 2. Environment
......@@ -96,4 +106,12 @@ More deployment schemes supported for DB:
pages={11474--11481},
year={2020}
}
```
\ No newline at end of file
@article{liao2022real,
title={Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion},
author={Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2022},
publisher={IEEE}
}
```
......@@ -22,7 +22,7 @@ Developers are welcome to contribute more algorithms! Please refer to [add new a
### 1.1 Text Detection Algorithms
Supported text detection algorithms (Click the link to get the tutorial):
- [x] [DB](./algorithm_det_db_en.md)
- [x] [DB && DB++](./algorithm_det_db_en.md)
- [x] [EAST](./algorithm_det_east_en.md)
- [x] [SAST](./algorithm_det_sast_en.md)
- [x] [PSENet](./algorithm_det_psenet_en.md)
......@@ -39,6 +39,7 @@ On the ICDAR2015 dataset, the text detection result is as follows:
|SAST|ResNet50_vd|91.39%|83.77%|87.42%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[trianed model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|PSE|MobileNetV3|82.20%|70.48%|75.89%|[trianed model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
|DB++|ResNet50|90.89%|82.66%|86.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
On Total-Text dataset, the text detection result is as follows:
......@@ -127,10 +128,10 @@ On the PubTabNet dataset, the algorithm result is as follows:
Supported KIE algorithms (Click the link to get the tutorial):
- [x] [VI-LayoutXLM](./algorithm_kie_vi_laoutxlm_en.md)
- [x] [LayoutLM](./algorithm_kie_laoutxlm_en.md)
- [x] [LayoutLMv2](./algorithm_kie_laoutxlm_en.md)
- [x] [LayoutXLM](./algorithm_kie_laoutxlm_en.md)
- [x] [VI-LayoutXLM](./algorithm_kie_vi_layoutxlm_en.md)
- [x] [LayoutLM](./algorithm_kie_layoutxlm_en.md)
- [x] [LayoutLMv2](./algorithm_kie_layoutxlm_en.md)
- [x] [LayoutXLM](./algorithm_kie_layoutxlm_en.md)
- [x] [SDMGR](./algorithm_kie_sdmgr_en.md)
On wildreceipt dataset, the algorithm result is as follows:
......
......@@ -24,7 +24,7 @@ class BaseRecLabelDecode(object):
def __init__(self, character_dict_path=None, use_space_char=False):
self.beg_str = "sos"
self.end_str = "eos"
self.reverse = False
self.character_str = []
if character_dict_path is None:
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
......@@ -38,6 +38,8 @@ class BaseRecLabelDecode(object):
if use_space_char:
self.character_str.append(" ")
dict_character = list(self.character_str)
if 'arabic' in character_dict_path:
self.reverse = True
dict_character = self.add_special_char(dict_character)
self.dict = {}
......@@ -45,11 +47,6 @@ class BaseRecLabelDecode(object):
self.dict[char] = i
self.character = dict_character
if 'arabic' in character_dict_path:
self.reverse = True
else:
self.reverse = False
def pred_reverse(self, pred):
pred_re = []
c_current = ''
......
......@@ -3,21 +3,22 @@ English | [简体中文](README_ch.md)
# Layout analysis
- [1. Introduction](#1-Introduction)
- [2. Install](#2-Install)
- [2.1 Install PaddlePaddle](#21-Install-paddlepaddle)
- [2.2 Install PaddleDetection](#22-Install-paddledetection)
- [3. Data preparation](#3-Data-preparation)
- [3.1 English data set](#31-English-data-set)
- [3.2 More datasets](#32-More-datasets)
- [4. Start training](#4-Start-training)
- [4.1 Train](#41-Train)
- [4.2 FGD Distillation training](#42-FGD-Distillation-training)
- [5. Model evaluation and prediction](#5-Model-evaluation-and-prediction)
- [5.1 Indicator evaluation](#51-Indicator-evaluation)
- [5.2 Test layout analysis results](#52-Test-layout-analysis-results)
- [6 Model export and inference](#6-Model-export-and-inference)
- [6.1 Model export](#61-Model-export)
- [6.2 Model inference](#62-Model-inference)
- [2. Quick start](#3-Quick-start)
- [3. Install](#3-Install)
- [3.1 Install PaddlePaddle](#31-Install-paddlepaddle)
- [3.2 Install PaddleDetection](#32-Install-paddledetection)
- [4. Data preparation](#4-Data-preparation)
- [4.1 English data set](#41-English-data-set)
- [4.2 More datasets](#42-More-datasets)
- [5. Start training](#5-Start-training)
- [5.1 Train](#51-Train)
- [5.2 FGD Distillation training](#52-FGD-Distillation-training)
- [6. Model evaluation and prediction](#6-Model-evaluation-and-prediction)
- [6.1 Indicator evaluation](#61-Indicator-evaluation)
- [6.2 Test layout analysis results](#62-Test-layout-analysis-results)
- [7 Model export and inference](#7-Model-export-and-inference)
- [7.1 Model export](#71-Model-export)
- [7.2 Model inference](#72-Model-inference)
## 1. Introduction
......@@ -28,11 +29,12 @@ Layout analysis refers to the regional division of documents in the form of pict
<img src="../docs/layout/layout.png" width="800">
</div>
## 2. Quick start
PP-Structure currently provides layout analysis models in Chinese, English and table documents. For the model link, see [models_list](../docs/models_list_en.md). The whl package is also provided for quick use, see [quickstart](../docs/quickstart_en.md) for details.
## 3. Install
## 2. Install
### 2.1. Install PaddlePaddle
### 3.1. Install PaddlePaddle
- **(1) Install PaddlePaddle**
......@@ -47,7 +49,7 @@ python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simp
```
For more requirements, please refer to the instructions in the [Install file](https://www.paddlepaddle.org.cn/install/quick)
### 2.2. Install PaddleDetection
### 3.2. Install PaddleDetection
- **(1)Download PaddleDetection Source code**
......@@ -62,11 +64,11 @@ cd PaddleDetection
python3 -m pip install -r requirements.txt
```
## 3. Data preparation
## 4. Data preparation
If you want to experience the prediction process directly, you can skip data preparation and download the pre-training model.
### 3.1. English data set
### 4.1. English data set
Download document analysis data set [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/)(Dataset 96G),contains 5 classes:`{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}`
......@@ -141,7 +143,7 @@ The JSON file contains the annotations of all images, and the data is stored in
}
```
### 3.2. More datasets
### 4.2. More datasets
We provide CDLA(Chinese layout analysis), TableBank(Table layout analysis)etc. data set download links,process to the JSON format of the above annotation file,that is, the training can be conducted in the same way。
......@@ -154,7 +156,7 @@ We provide CDLA(Chinese layout analysis), TableBank(Table layout analysis)etc. d
| [DocBank](https://github.com/doc-analysis/DocBank) | Large-scale dataset (500K document pages) constructed using weakly supervised methods for document layout analysis, containing 12 categories:Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title |
## 4. Start training
## 5. Start training
Training scripts, evaluation scripts, and prediction scripts are provided, and the PubLayNet pre-training model is used as an example in this section.
......@@ -171,7 +173,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_
If the test image is Chinese, the pre-trained model of Chinese CDLA dataset can be downloaded to identify 10 types of document regions:Table, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation,Download the training model and inference model of Model 'picodet_lcnet_x1_0_fgd_layout_cdla' in [layout analysis model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md)。If only the table area in the image is detected, you can download the pre-trained model of the table dataset, and download the training model and inference model of the 'picodet_LCnet_x1_0_FGd_layout_table' model in [Layout Analysis model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md)
### 4.1. Train
### 5.1. Train
Train:
......@@ -247,7 +249,7 @@ After starting training normally, you will see the following log output:
**Note that the configuration file for prediction / evaluation must be consistent with the training.**
### 4.2. FGD Distillation Training
### 5.2. FGD Distillation Training
PaddleDetection supports FGD-based [Focal and Global Knowledge Distillation for Detectors]( https://arxiv.org/abs/2111.11837v1) The training process of the target detection model of distillation, FGD distillation is divided into two parts `Focal` and `Global`. `Focal` Distillation separates the foreground and background of the image, allowing the student model to focus on the key pixels of the foreground and background features of the teacher model respectively;` Global`Distillation section reconstructs the relationships between different pixels and transfers them from the teacher to the student to compensate for the global information lost in `Focal`Distillation.
......@@ -265,9 +267,9 @@ python3 tools/train.py \
- `-c`: Specify the model configuration file.
- `--slim_config`: Specify the compression policy profile.
## 5. Model evaluation and prediction
## 6. Model evaluation and prediction
### 5.1. Indicator evaluation
### 6.1. Indicator evaluation
Model parameters in training are saved by default in `output/picodet_ Lcnet_ X1_ 0_ Under the layout` directory. When evaluating indicators, you need to set `weights` to point to the saved parameter file.Assessment datasets can be accessed via `configs/picodet/legacy_ Model/application/layout_ Analysis/picodet_ Lcnet_ X1_ 0_ Layout. Yml` . Modify `EvalDataset` : `img_dir`,`anno_ Path`and`dataset_dir` setting.
......@@ -310,7 +312,7 @@ python3 tools/eval.py \
- `--slim_config`: Specify the distillation policy profile.
- `-o weights`: Specify the model path trained by the distillation algorithm.
### 5.2. Test Layout Analysis Results
### 6.2. Test Layout Analysis Results
The profile predicted to be used must be consistent with the training, for example, if you pass `python3 tools/train'. Py-c configs/picodet/legacy_ Model/application/layout_ Analysis/picodet_ Lcnet_ X1_ 0_ Layout. Yml` completed the training process for the model.
......@@ -343,10 +345,10 @@ python3 tools/infer.py \
```
## 6. Model Export and Inference
## 7. Model Export and Inference
### 6.1 Model Export
### 7.1 Model Export
The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
......@@ -385,7 +387,7 @@ python3 tools/export_model.py \
--output_dir=output_inference/
```
### 6.2 Model inference
### 7.2 Model inference
Replace model_with the provided inference training model for inference or the FGD distillation training `model_dir`Inference model path, execute the following commands for inference:
......
......@@ -3,21 +3,22 @@
# 版面分析
- [1. 简介](#1-简介)
- [2. 安装](#2-安装)
- [2.1 安装PaddlePaddle](#21-安装paddlepaddle)
- [2.2 安装PaddleDetection](#22-安装paddledetection)
- [3. 数据准备](#3-数据准备)
- [3.1 英文数据集](#31-英文数据集)
- [3.2 更多数据集](#32-更多数据集)
- [4. 开始训练](#4-开始训练)
- [4.1 启动训练](#41-启动训练)
- [4.2 FGD蒸馏训练](#42-FGD蒸馏训练)
- [5. 模型评估与预测](#5-模型评估与预测)
- [5.1 指标评估](#51-指标评估)
- [5.2 测试版面分析结果](#52-测试版面分析结果)
- [6 模型导出与预测](#6-模型导出与预测)
- [6.1 模型导出](#61-模型导出)
- [6.2 模型推理](#62-模型推理)
- [2. 快速开始](#2-快速开始)
- [3. 安装](#3-安装)
- [3.1 安装PaddlePaddle](#31-安装paddlepaddle)
- [3.2 安装PaddleDetection](#32-安装paddledetection)
- [4. 数据准备](#4-数据准备)
- [4.1 英文数据集](#41-英文数据集)
- [4.2 更多数据集](#42-更多数据集)
- [5. 开始训练](#5-开始训练)
- [5.1 启动训练](#51-启动训练)
- [5.2 FGD蒸馏训练](#52-FGD蒸馏训练)
- [6. 模型评估与预测](#6-模型评估与预测)
- [6.1 指标评估](#61-指标评估)
- [6.2 测试版面分析结果](#62-测试版面分析结果)
- [7 模型导出与预测](#7-模型导出与预测)
- [7.1 模型导出](#71-模型导出)
- [7.2 模型推理](#72-模型推理)
## 1. 简介
......@@ -26,12 +27,14 @@
<div align="center">
<img src="../docs/layout/layout.png" width="800">
</div>
## 2. 快速开始
PP-Structure目前提供了中文、英文、表格三类文档版面分析模型,模型链接见 [models_list](../docs/models_list.md#1-版面分析模型)。也提供了whl包的形式方便快速使用,详见 [quickstart](../docs/quickstart.md)
## 2. 安装依赖
## 3. 安装依赖
### 2.1. 安装PaddlePaddle
### 3.1. 安装PaddlePaddle
- **(1) 安装PaddlePaddle**
......@@ -46,7 +49,7 @@ python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simp
```
更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
### 2.2. 安装PaddleDetection
### 3.2. 安装PaddleDetection
- **(1)下载PaddleDetection源码**
......@@ -61,11 +64,11 @@ cd PaddleDetection
python3 -m pip install -r requirements.txt
```
## 3. 数据准备
## 4. 数据准备
如果希望直接体验预测过程,可以跳过数据准备,下载我们提供的预训练模型。
### 3.1. 英文数据集
### 4.1. 英文数据集
下载文档分析数据集[PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/)(数据集96G),包含5个类:`{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}`
......@@ -140,7 +143,7 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放,
}
```
### 3.2. 更多数据集
### 4.2. 更多数据集
我们提供了CDLA(中文版面分析)、TableBank(表格版面分析)等数据集的下连接,处理为上述标注文件json格式,即可以按相同方式进行训练。
......@@ -153,7 +156,7 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放,
| [DocBank](https://github.com/doc-analysis/DocBank) | 使用弱监督方法构建的大规模数据集(500K文档页面),用于文档布局分析,包含12类:Author、Caption、Date、Equation、Figure、Footer、List、Paragraph、Reference、Section、Table、Title |
## 4. 开始训练
## 5. 开始训练
提供了训练脚本、评估脚本和预测脚本,本节将以PubLayNet预训练模型为例进行讲解。
......@@ -170,7 +173,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_
如果测试图片为中文,可以下载中文CDLA数据集的预训练模型,识别10类文档区域:Table、Figure、Figure caption、Table、Table caption、Header、Footer、Reference、Equation,在[版面分析模型](../docs/models_list.md)中下载`picodet_lcnet_x1_0_fgd_layout_cdla`模型的训练模型和推理模型。如果只检测图片中的表格区域,可以下载表格数据集的预训练模型,在[版面分析模型](../docs/models_list.md)中下载`picodet_lcnet_x1_0_fgd_layout_table`模型的训练模型和推理模型。
### 4.1. 启动训练
### 5.1. 启动训练
开始训练:
......@@ -246,7 +249,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
**注意,预测/评估时的配置文件请务必与训练一致。**
### 4.2. FGD蒸馏训练
### 5.2. FGD蒸馏训练
PaddleDetection支持了基于FGD([Focal and Global Knowledge Distillation for Detectors](https://arxiv.org/abs/2111.11837v1))蒸馏的目标检测模型训练过程,FGD蒸馏分为两个部分`Focal``Global``Focal`蒸馏分离图像的前景和背景,让学生模型分别关注教师模型的前景和背景部分特征的关键像素;`Global`蒸馏部分重建不同像素之间的关系并将其从教师转移到学生,以补偿`Focal`蒸馏中丢失的全局信息。
......@@ -264,9 +267,9 @@ python3 tools/train.py \
- `-c`: 指定模型配置文件。
- `--slim_config`: 指定压缩策略配置文件。
## 5. 模型评估与预测
## 6. 模型评估与预测
### 5.1. 指标评估
### 6.1. 指标评估
训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时,需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 修改`EvalDataset`中的 `image_dir``anno_path``dataset_dir` 设置。
......@@ -309,7 +312,7 @@ python3 tools/eval.py \
- `--slim_config`: 指定蒸馏策略配置文件。
- `-o weights`: 指定蒸馏算法训好的模型路径。
### 5.2. 测试版面分析结果
### 6.2 测试版面分析结果
预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。
......@@ -342,10 +345,10 @@ python3 tools/infer.py \
```
## 6. 模型导出与预测
## 7. 模型导出与预测
### 6.1 模型导出
### 7.1 模型导出
inference 模型(`paddle.jit.save`保存的模型) 一般是模型训练,把模型结构和模型参数保存在文件中的固化模型,多用于预测部署场景。 训练过程中保存的模型是checkpoints模型,保存的只有模型的参数,多用于恢复训练等。 与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。
......@@ -382,7 +385,7 @@ python3 tools/export_model.py \
### 6.2 模型推理
### 7.2 模型推理
若使用**提供的推理训练模型推理**,或使用**FGD蒸馏训练的模型**,更换`model_dir`推理模型路径,执行如下命令进行推理:
......
# PDF2WORD
# PDF2Word
PDF2WORD是PaddleOCR社区开发者[whjdark](https://github.com/whjdark) 基于PP-Structure智能文档分析模型实现的PDF转换Word应用程序,提供可直接安装的exe,方便windows用户运行
PDF2Word是PaddleOCR社区开发者[whjdark](https://github.com/whjdark) 基于PP-Structure智能文档分析模型实现的PDF转换Word应用程序,提供可直接安装的exe,方便windows用户运行
## 1.使用
### 应用程序
1. 下载与安装:针对Windows用户,根据[软件下载]()一节下载软件后,运行 `pdf2word.exe` 。若您下载的是lite版本,安装过程中会在线下载环境依赖、模型等必要资源,安装时间较长,请确保网络畅通。serve版本打包了相关依赖,安装时间较短,可按需下载。
1. 下载与安装:针对Windows用户,根据[软件下载]()一节下载软件后,运行 `启动程序.exe` 。若您下载的是lite版本,安装过程中会在线下载环境依赖、模型等必要资源,安装时间较长,请确保网络畅通。serve版本打包了相关依赖,安装时间较短,可按需下载。
2. 转换:由于PP-Structure根据中英文数据分别进行适配,在转换相应文件时可**根据文档语言进行相应选择**
......
......@@ -25,7 +25,6 @@ Layout recovery combines [layout analysis](../layout/README.md)、[table recogni
<div align="center">
<img src="../docs/recovery/recovery_ch.jpg" width = "800" />
</div>
<a name="2"></a>
## 2. Install
......@@ -44,7 +43,6 @@ python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simpl
# CPU installation
python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
````
For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/macos-pip_en.html).
......@@ -85,6 +83,8 @@ Through layout analysis, we divided the image/PDF documents into regions, locate
We can restore the test picture through the layout information, OCR detection and recognition structure, table information, and saved pictures.
The whl package is also provided for quick use, see [quickstart](../docs/quickstart_en.md) for details.
<a name="3.1"></a>
### 3.1 Download models
......@@ -151,10 +151,10 @@ Field:
## 4. More
For training, evaluation and inference tutorial for text detection models, please refer to [text detection doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/detection.md).
For training, evaluation and inference tutorial for text detection models, please refer to [text detection doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/detection_en.md).
For training, evaluation and inference tutorial for text recognition models, please refer to [text recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/recognition.md).
For training, evaluation and inference tutorial for text recognition models, please refer to [text recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/recognition_en.md).
For training, evaluation and inference tutorial for layout analysis models, please refer to [layout analysis doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README_ch.md)
For training, evaluation and inference tutorial for layout analysis models, please refer to [layout analysis doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README.md)
For training, evaluation and inference tutorial for table recognition models, please refer to [table recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/README_ch.md)
For training, evaluation and inference tutorial for table recognition models, please refer to [table recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/README.md)
......@@ -6,7 +6,6 @@
- [2. 安装](#2)
- [2.1 安装依赖](#2.1)
- [2.2 安装PaddleOCR](#2.2)
- [3. 使用](#3)
- [3.1 下载模型](#3.1)
- [3.2 版面恢复](#3.2)
......@@ -27,7 +26,6 @@
<div align="center">
<img src="../docs/recovery/recovery_ch.jpg" width = "800" />
</div>
<a name="2"></a>
## 2. 安装
......@@ -87,6 +85,8 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
我们通过版面信息、OCR检测和识别结构、表格信息、保存的图片,对测试图片进行恢复即可。
提供如下代码实现版面恢复,也提供了whl包的形式方便快速使用,详见 [quickstart](../docs/quickstart.md)
<a name="3.1"></a>
### 3.1 下载模型
......
......@@ -51,7 +51,9 @@ The performance indicators are explained as follows:
### 4.1 Quick start
PP-Structure currently provides table recognition models in both Chinese and English. For the model link, see [models_list](../docs/models_list.md). The following takes the Chinese table recognition model as an example to introduce how to recognize a table.
PP-Structure currently provides table recognition models in both Chinese and English. For the model link, see [models_list](../docs/models_list.md). The whl package is also provided for quick use, see [quickstart](../docs/quickstart_en.md) for details.
The following takes the Chinese table recognition model as an example to introduce how to recognize a table.
Use the following commands to quickly complete the identification of a table.
......
......@@ -57,7 +57,9 @@
### 4.1 快速开始
PP-Structure目前提供了中英文两种语言的表格识别模型,模型链接见 [models_list](../docs/models_list.md)。下面以中文表格识别模型为例,介绍如何识别一张表格。
PP-Structure目前提供了中英文两种语言的表格识别模型,模型链接见 [models_list](../docs/models_list.md)。也提供了whl包的形式方便快速使用,详见 [quickstart](../docs/quickstart.md)
下面以中文表格识别模型为例,介绍如何识别一张表格。
使用如下命令即可快速完成一张表格的识别。
```python
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册