未验证 提交 b4367920 编写于 作者: X xiaoting 提交者: GitHub

Merge branch 'release/2.4' into add_serving_c++

......@@ -164,10 +164,10 @@ python PPOCRLabel.py
- Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, supports Chinese, English and number recognition, and multiple language detection.
- Model language switching: Changing the built-in model language is supportable by clicking "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languagesinclude French, German, Korean, and Japanese.
- Model language switching: Changing the built-in model language is supportable by clicking "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese.
For specific model download links, please refer to [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating)
- **Custom Model**: If users want to replace the built-in model with their own inference model, they can follow the [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/whl_en.md#31-use-by-code) by modifying PPOCRLabel.py for [Instantiation of PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/release/ 2.3/PPOCRLabel/PPOCRLabel.py#L116) :
- **Custom Model**: If users want to replace the built-in model with their own inference model, they can follow the [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/whl_en.md#31-use-by-code) by modifying PPOCRLabel.py for [Instantiation of PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/PPOCRLabel/PPOCRLabel.py#L116) :
add parameter `det_model_dir` in `self.ocr = PaddleOCR(use_pdserving=False, use_angle_cls=True, det=True, cls=True, use_gpu=gpu, lang=lang) `
......
......@@ -73,25 +73,26 @@ PPOCRLabel --lang ch # 启动
> 如果上述安装出现问题,可以参考3.6节 错误提示
#### 1.2.2 本地构建whl包并安装
#### 1.2.2 通过Python脚本运行PPOCRLabel
如果您对PPOCRLabel文件有所更改(例如指定新的内置模型),通过Python脚本运行会更加方面的看到更改的结果。如果仍然需要通过whl包启动,则需要参考下节重新编译whl包。
```bash
cd PaddleOCR/PPOCRLabel
python3 setup.py bdist_wheel
pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple
cd ./PPOCRLabel # 切换到PPOCRLabel目录
python PPOCRLabel.py --lang ch
```
#### 1.2.3 通过Python脚本运行PPOCRLabel
#### 1.2.3 本地构建whl包并安装
如果您对PPOCRLabel文件有所更改,通过Python脚本运行会更加方面的看到更改的结果
编译与安装新的whl包,其中1.0.2为版本号,可在 `setup.py` 中指定新版本。
```bash
cd ./PPOCRLabel # 切换到PPOCRLabel目录
python PPOCRLabel.py --lang ch
cd PaddleOCR/PPOCRLabel
python3 setup.py bdist_wheel
pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple
```
## 2. 使用
### 2.1 操作步骤
......
......@@ -104,7 +104,7 @@ For a new language request, please refer to [Guideline for new language_requests
- [Quick Start](./doc/doc_en/quickstart_en.md)
- [PaddleOCR Overview and Installation](./doc/doc_en/paddleOCR_overview_en.md)
- PP-OCR Industry Landing: from Training to Deployment
- [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md)
- [PP-OCR Model Zoo](./doc/doc_en/models_en.md)
- [PP-OCR Model Download](./doc/doc_en/models_list_en.md)
- [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md)
- [PP-OCR Training](./doc/doc_en/training_en.md)
......@@ -112,6 +112,10 @@ For a new language request, please refer to [Guideline for new language_requests
- [Text Recognition](./doc/doc_en/recognition_en.md)
- [Text Direction Classification](./doc/doc_en/angle_class_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
- PP-OCR Models Compression
- [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md)
- [Model Quantization](./deploy/slim/quantization/README_en.md)
- [Model Pruning](./deploy/slim/prune/README_en.md)
- Inference and Deployment
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
- [Serving](./deploy/pdserving/README.md)
......
......@@ -32,7 +32,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- PP-OCR系列高质量预训练模型,准确的识别效果
- 超轻量PP-OCRv2系列:检测(3.1M)+ 方向分类器(1.4M)+ 识别(8.5M)= 13.0M
- 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
- 通用PPOCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
- 通用PP-OCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
- 支持中英文数字组合识别、竖排文本识别、长文本识别
- 支持多语言识别:韩语、日语、德语、法语等约80种语言
- PP-Structure文档结构化系统
......@@ -53,8 +53,8 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 社区、社区贡献与社区常规赛
- 加入社区:微信扫描下方二维码加入官方交流群,与各行各业开发者充分交流,期待您的加入。
- 社区贡献:[社区贡献](./doc/doc_ch/thirdparty.md)文档中包含了社区用户**使用PaddleOCR开发的各种工具、应用**以及**为PaddleOCR贡献的功能、优化的文档与代码**等,是官方为社区开发者打造的荣誉墙、也是帮助优质项目宣传的广播站。如果您的OCR项目未被收集在文档中,可根据文档说明与我们联系。最新社区贡献可查看[此处](#社区贡献)
- 社区常规赛:作为社区贡献的具体承载形式,社区常规赛是面向OCR开发者的积分赛事。首届社区常规赛与[《动手学OCR · 十讲》课程](https://aistudio.baidu.com/aistudio/course/introduce/25207)联合推广。社区常规赛的赛题详情与报名方法可参考[链接](https://github.com/PaddlePaddle/PaddleOCR/issues/4982)
- 社区贡献:[社区贡献](./doc/doc_ch/thirdparty.md)文档中包含了社区用户**使用PaddleOCR开发的各种工具、应用**以及**为PaddleOCR贡献的功能、优化的文档与代码**等,是官方为社区开发者打造的荣誉墙、也是帮助优质项目宣传的广播站。如果您的OCR项目未被收集在文档中,可根据文档说明与我们联系。
- 社区常规赛:社区常规赛是面向OCR开发者的积分赛事,覆盖文档、代码、模型和应用四大类型,以季度为单位评选并发放奖励,赛题详情与报名方法可参考[链接](https://github.com/PaddlePaddle/PaddleOCR/issues/4982)
<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
......@@ -79,30 +79,35 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 文档教程
- [运行环境准备](./doc/doc_ch/environment.md)
- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md)
- [快速开始(中英文/多语言/版面分析)](./doc/doc_ch/quickstart.md)
- [PaddleOCR全景图与项目克隆](./doc/doc_ch/paddleOCR_overview.md)
- PP-OCR产业落地:从训练到部署
- [PP-OCR模型与配置文件](./doc/doc_ch/models_and_config.md)
- [PP-OCR模型](./doc/doc_ch/models.md)
- [PP-OCR模型下载](./doc/doc_ch/models_list.md)
- [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md)
- [Python引擎的PP-OCR模型库推理](./doc/doc_ch/inference_ppocr.md)
- [PP-OCR模型训练](./doc/doc_ch/training.md)
- [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md)
- [文本方向分类器](./doc/doc_ch/angle_class.md)
- [知识蒸馏](./doc/doc_ch/knowledge_distillation.md)
- [配置文件内容与生成](./doc/doc_ch/config.md)
- PP-OCR模型压缩
- [知识蒸馏](./doc/doc_ch/knowledge_distillation.md)
- [模型量化](./deploy/slim/quantization/README.md)
- [模型裁剪](./deploy/slim/prune/README.md)
- PP-OCR模型推理部署
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
- [服务化部署](./deploy/pdserving/README_CN.md)
- [端侧部署](./deploy/lite/readme.md)
- [Paddle2ONNX模型转化与预测](./deploy/paddle2onnx/readme.md)
- [Benchmark](./doc/doc_ch/benchmark.md)
- [PP-Structure信息提取](./ppstructure/README_ch.md)
- [版面分析](./ppstructure/layout/README_ch.md)
- [表格识别](./ppstructure/table/README_ch.md)
- [DocVQA](./ppstructure/vqa/README_ch.md)
- [关键信息提取](./ppstructure/docs/kie.md)
- OCR学术圈
- [两阶段模型介绍与下载](./doc/doc_ch/algorithm_overview.md)
- [关键信息提取](./ppstructure/docs/kie_ch.md)
- OCR学术前沿模型介绍与下载
- [文本检测算法](./doc/doc_ch/algorithm_overview.md#11-%E6%96%87%E6%9C%AC%E6%A3%80%E6%B5%8B%E7%AE%97%E6%B3%95)
- [文本识别算法](./doc/doc_ch/algorithm_overview.md#12-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [端到端PGNet算法](./doc/doc_ch/pgnet.md)
- [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md)
- [使用PaddleOCR架构添加新算法](./doc/doc_ch/add_new_algorithm.md)
......@@ -157,17 +162,6 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
</div>
<a name="社区贡献"></a>
## 最新社区贡献
- 基于PaddleOCR的社区项目: [FastOCRLabel](https://gitee.com/BaoJianQiang/FastOCRLabel):完整的C#版本标注工具 (@ [包建强](https://gitee.com/BaoJianQiang) )
- 为PaddleOCR新增功能:非常感谢 [Evezerest](https://github.com/Evezerest)[ninetailskim](https://github.com/ninetailskim)[edencfc](https://github.com/edencfc)[BeyondYourself](https://github.com/BeyondYourself)[1084667371](https://github.com/1084667371) 贡献了[PPOCRLabel](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/PPOCRLabel/README_ch.md) 的完整代码。
- 代码与文档优化:非常感谢 [BeyondYourself](https://github.com/BeyondYourself) 给PaddleOCR提了很多非常棒的建议,并简化了PaddleOCR的部分代码风格。
- 多语言语料:非常感谢 [Mejans](https://github.com/Mejans) 给PaddleOCR增加新语言奥克西坦语Occitan的字典和语料([#954](https://github.com/PaddlePaddle/PaddleOCR/pull/954))。
完整社区贡献列表可查看[社区贡献文档](./doc/doc_ch/thirdparty.md)
<a name="许可证书"></a>
## 许可证书
......
# paddle2onnx 模型转化与预测
# Paddle2ONNX模型转化与预测
本章节介绍 PaddleOCR 模型如何转化为 ONNX 模型,并基于 ONNXRuntime 引擎预测。
......
......@@ -30,7 +30,8 @@ The introduction and tutorial of Paddle Serving service deployment framework ref
PaddleOCR operating environment and Paddle Serving operating environment are needed.
1. Please prepare PaddleOCR operating environment reference [link](../../doc/doc_ch/installation.md).
Download the corresponding paddle whl package according to the environment, it is recommended to install version 2.2.1.
Download the corresponding paddle whl package according to the environment, it is recommended to install version 2.2.2
2. The steps of PaddleServing operating environment prepare are as follows:
......@@ -52,6 +53,7 @@ wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.8.3-
pip3 install paddle_serving_app-0.8.3-py3-none-any.whl
```
**note:** If you want to install the latest version of PaddleServing, refer to [link](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md).
......
......@@ -8,8 +8,7 @@ PaddleOCR提供2种服务部署方式:
# 基于PaddleServing的服务部署
本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PPOCR
动态图模型的pipeline在线服务。
本文档将介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PP-OCR动态图模型的pipeline在线服务。
相比较于hubserving部署,PaddleServing具备以下优点:
- 支持客户端和服务端之间高并发和高效通信
......@@ -32,7 +31,8 @@ PaddleOCR提供2种服务部署方式:
需要准备PaddleOCR的运行环境和Paddle Serving的运行环境。
- 准备PaddleOCR的运行环境[链接](../../doc/doc_ch/installation.md)
根据环境下载对应的paddle whl包,推荐安装2.2.1版本
根据环境下载对应的paddlepaddle whl包,推荐安装2.2.2版本
- 准备PaddleServing的运行环境,步骤如下
......@@ -61,7 +61,7 @@ pip3 install paddle_serving_app-0.8.3-py3-none-any.whl
使用PaddleServing做服务化部署时,需要将保存的inference模型转换为serving易于部署的模型。
首先,下载PPOCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th)
首先,下载PP-OCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th)
```bash
# 下载并解压 OCR 文本检测模型
......
## 介绍
# PP-OCR模型裁剪
复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型裁剪通过移出网络模型中的子模型来减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。
本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
......@@ -7,13 +7,13 @@
在开始本教程之前,建议先了解:
1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)
1. [PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)
2. [模型裁剪教程](https://github.com/PaddlePaddle/PaddleSlim/blob/release%2F2.0.0/docs/zh_cn/tutorials/pruning/dygraph/filter_pruning.md)
## 快速开始
模型裁剪主要包括四个步骤:
1. 安装 PaddleSlim
2. 准备训练好的模型
3. 敏感度分析、裁剪训练
......@@ -35,16 +35,19 @@ python3 setup.py install
加载预训练模型后,通过对现有模型的每个网络层进行敏感度分析,得到敏感度文件:sen.pickle,可以通过PaddleSlim提供的[接口](https://github.com/PaddlePaddle/PaddleSlim/blob/9b01b195f0c4bc34a1ab434751cb260e13d64d9e/paddleslim/dygraph/prune/filter_pruner.py#L75)加载文件,获得各网络层在不同裁剪比例下的精度损失。从而了解各网络层冗余度,决定每个网络层的裁剪比例。
敏感度文件内容格式:
sen.pickle(Dict){
```
sen.pickle(Dict){
'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
}
例子:
例子:
{
'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
}
```
加载敏感度文件后会返回一个字典,字典中的keys为网络模型参数模型的名字,values为一个字典,里面保存了相应网络层的裁剪敏感度信息。例如在例子中,conv10_expand_weights所对应的网络层在裁掉10%的卷积核后模型性能相较原模型会下降0.65%,详细信息可见[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
进入PaddleOCR根目录,通过以下命令对模型进行敏感度分析训练:
......
## Introduction
# PP-OCR Models Pruning
Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.
This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
This example uses PaddleSlim provided [APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry.
It is recommended that you could understand following pages before reading this example:
......@@ -37,25 +37,26 @@ PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en.
After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sen.pickle. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
The data format of sensitivity file:
sen.pickle(Dict){
```
sen.pickle(Dict){
'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
}
example:
example:
{
'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
}
```
The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of corresponding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command:
```bash
python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model" Global.save_model_dir=./output/prune_model/
```
......
## 介绍
# PP-OCR模型量化
复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。
模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。
本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) 集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。
在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/training.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
## 快速开始
......
## Introduction
# PP-OCR Models Quantization
Generally, a more complex model would achieve better performance in the task, but it also leads to some redundancy in the model.
Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number,
......
# 运行环境准备
Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建PyThon环境。
Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用docker搭建Python环境。
推荐环境:
- PaddlePaddle >= 2.0.0 (2.1.2)
- python3.7
- PaddlePaddle >= 2.1.2
- Python 3.7
- CUDA10.1 / CUDA10.2
- CUDNN 7.6
如果对于Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。
> 如果您已经安装Python环境,可以直接参考[PaddleOCR快速开始](./quickstart.md)
* [1. Python环境搭建](#1)
+ [1.1 Windows](#1.1)
+ [1.2 Mac](#1.2)
+ [1.3 Linux](#1.3)
* [2. 安装PaddlePaddle](#2)
<a name="1"></a>
......@@ -311,21 +310,3 @@ sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=hos
# ctrl+P+Q可退出docker 容器,重新进入docker 容器使用如下命令
sudo docker container exec -it ppocr /bin/bash
```
<a name="2"></a>
## 2. 安装PaddlePaddle
- 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- 如果您的机器是CPU,请运行以下命令安装
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
# PP-OCR模型与配置文件
PP-OCR模型与配置文件一章主要补充一些OCR模型的基本概念、配置文件的内容与作用以便对模型后续的参数调整和训练中拥有更好的体验。
本章包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference_ppocr.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。
------
下面我们首先了解一些OCR相关的基本概念:
- [1. OCR 简要介绍](#1-ocr-----)
* [1.1 OCR 检测模型基本概念](#11-ocr---------)
* [1.2 OCR 识别模型基本概念](#12-ocr---------)
* [1.3 PP-OCR模型](#13-pp-ocr--)
<a name="1-ocr-----"></a>
## 1. OCR 简要介绍
本节简要介绍OCR检测模型、识别模型的基本概念,并介绍PaddleOCR的PP-OCR模型。
OCR(Optical Character Recognition,光学字符识别)目前是文字识别的统称,已不限于文档或书本文字识别,更包括识别自然场景下的文字,又可以称为STR(Scene Text Recognition)。
OCR文字识别一般包括两个部分,文本检测和文本识别;文本检测首先利用检测算法检测到图像中的文本行;然后检测到的文本行用识别算法去识别到具体文字。
<a name="11-ocr---------"></a>
### 1.1 OCR 检测模型基本概念
文本检测就是要定位图像中的文字区域,然后通常以边界框的形式将单词或文本行标记出来。传统的文字检测算法多是通过手工提取特征的方式,特点是速度快,简单场景效果好,但是面对自然场景,效果会大打折扣。当前多是采用深度学习方法来做。
基于深度学习的文本检测算法可以大致分为以下几类:
1. 基于目标检测的方法;一般是预测得到文本框后,通过NMS筛选得到最终文本框,多是四点文本框,对弯曲文本场景效果不理想。典型算法为EAST、Text Box等方法。
2. 基于分割的方法;将文本行当成分割目标,然后通过分割结果构建外接文本框,可以处理弯曲文本,对于文本交叉场景问题效果不理想。典型算法为DB、PSENet等方法。
3. 混合目标检测和分割的方法;
<a name="12-ocr---------"></a>
### 1.2 OCR 识别模型基本概念
OCR识别算法的输入数据一般是文本行,背景信息不多,文字占据主要部分,识别算法目前可以分为两类算法:
1. 基于CTC的方法;即识别算法的文字预测模块是基于CTC的,常用的算法组合为CNN+RNN+CTC。目前也有一些算法尝试在网络中加入transformer模块等等。
2. 基于Attention的方法;即识别算法的文字预测模块是基于Attention的,常用算法组合是CNN+RNN+Attention。
<a name="13-pp-ocr--"></a>
### 1.3 PP-OCR模型
PaddleOCR 中集成了很多OCR算法,文本检测算法有DB、EAST、SAST等等,文本识别算法有CRNN、RARE、StarNet、Rosetta、SRN等算法。
其中PaddleOCR针对中英文自然场景通用OCR,推出了PP-OCR系列模型,PP-OCR模型由DB+CRNN算法组成,利用海量中文数据训练加上模型调优方法,在中文场景上具备较高的文本检测识别能力。并且PaddleOCR推出了高精度超轻量PP-OCRv2模型,检测模型仅3M,识别模型仅8.5M,利用[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)的模型量化方法,可以在保持精度不降低的情况下,将检测模型压缩到0.8M,识别压缩到3M,更加适用于移动端部署场景。
......@@ -225,7 +225,7 @@ ppocr 支持使用自己的数据进行自定义训练或finetune, 其中识别
|波兰文|Polish |pl| | 比尔哈文|Bihari |bh|
| 罗马尼亚文|Romanian |ro| | 迈蒂利文|Maithili |mai|
| 斯洛伐克文|Slovak |sk| | 昂加文|Angika |ang|
| 斯洛文尼亚文|Slovenian |sl| | 孟加拉文|Bhojpuri |bho|
| 斯洛文尼亚文|Slovenian |sl| | 博杰普爾文|Bhojpuri |bho|
| 阿尔巴尼亚文|Albanian |sq| | 摩揭陀文 |Magahi |mah|
| 瑞典文|Swedish |sv| | 那格浦尔文|Nagpur |sck|
| 西瓦希里文|Swahili |sw| | 尼瓦尔文|Newari |new|
......
......@@ -97,7 +97,7 @@ train.txt标注文件格式如下,文件名和标注信息中间用"\t"分隔
" 图像文件名 json.dumps编码的图像标注信息"
rgb/img11.jpg [{"transcription": "ASRAMA", "points": [[214.0, 325.0], [235.0, 308.0], [259.0, 296.0], [286.0, 291.0], [313.0, 295.0], [338.0, 305.0], [362.0, 320.0], [349.0, 347.0], [330.0, 337.0], [310.0, 329.0], [290.0, 324.0], [269.0, 328.0], [249.0, 336.0], [231.0, 346.0]]}, {...}]
```
json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。
json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 `points` 表示文本框的多点坐标(如:4点、8点以及14点等),从左上角的点开始顺时针排列。
`transcription` 表示当前文本框的文字,**当其内容为“###”时,表示该文本框无效,在训练时会跳过。**
如果您想在其他数据集上训练,可以按照上述形式构建标注文件。
......
# PaddleOCR快速开始
- [1. 安装PaddleOCR whl包](#1)
- [1. 安装](#1)
- [1.1 安装PaddlePaddle](#11)
- [1.2 安装PaddleOCR whl包](#12)
- [2. 便捷使用](#2)
- [2.1 命令行使用](#21)
- [2.1.1 中英文模型](#211)
......@@ -9,10 +12,35 @@
- [2.2 Python脚本使用](#22)
- [2.2.1 中英文与多语言使用](#221)
- [2.2.2 版面分析](#222)
- [3.小结](#3)
<a name="1"></a>
## 1. 安装PaddleOCR whl包
## 1. 安装
<a name="11"></a>
### 1.1 安装PaddlePaddle
> 如果您没有基础的Python运行环境,请参考[运行环境准备](./environment.md)。
- 您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- 您的机器是CPU,请运行以下命令安装
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
<a name="12"></a>
### 1.2 安装PaddleOCR whl包
```bash
pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本
......@@ -257,3 +285,11 @@ im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
<a name="3"></a>
## 3. 小结
通过本节内容,相信您已经熟练掌握PaddleOCR whl包的使用方法并获得了初步效果。
PaddleOCR是一套丰富领先实用的OCR工具库,打通数据、模型训练、压缩和推理部署全流程,因此在[下一节](./paddleOCR_overview.md)中我们将首先为您介绍PaddleOCR的全景图,然后克隆PaddleOCR项目,正式开启PaddleOCR的应用之旅。
......@@ -63,9 +63,9 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
| ...
```
- 测试
- 验证
同训练集类似,测试集也需要提供一个包含所有图片的文件夹(test)和一个rec_gt_test.txt,测试集的结构如下所示:
同训练集类似,验证集也需要提供一个包含所有图片的文件夹(test)和一个rec_gt_test.txt,测试集的结构如下所示:
```
|-train_data
......@@ -93,7 +93,7 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
```
# 训练集标签
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# 测试集标签
# 验证集标签
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```
......
# Environment Preparation
Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. If you are familiar with the Python environment, you can skip to step 2 to install PaddlePaddle.
Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment.
Recommended working environment:
- PaddlePaddle >= 2.0.0 (2.1.2)
- PaddlePaddle >= 2.1.2
- Python 3.7
- CUDA 10.1 / CUDA 10.2
- cuDNN 7.6
> If you already have a Python environment installed, you can skip to [PaddleOCR Quick Start](./quickstart_en.md).
* [1. Python Environment Setup](#1)
+ [1.1 Windows](#1.1)
+ [1.2 Mac](#1.2)
+ [1.3 Linux](#1.3)
* [2. Install PaddlePaddle 2.0](#2)
<a name="1"></a>
......@@ -330,21 +331,3 @@ You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags
# ctrl+P+Q to exit docker, to re-enter docker using the following command:
sudo docker container exec -it ppocr /bin/bash
```
<a name="2"></a>
## 2. Install PaddlePaddle 2.0
- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- If you have no available GPU on your machine, please run the following command to install the CPU version
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
# PP-OCR Model and Configuration
The chapter on PP-OCR model and configuration file mainly adds some basic concepts of OCR model and the content and role of configuration file to have a better experience in the subsequent parameter adjustment and training of the model.
This chapter contains three parts. Firstly, [PP-OCR Model Download](./models_list_en.md) explains the concept of PP-OCR model types and provides links to download all models. Then in [Yml Configuration](./config_en.md) details the parameters needed to fine-tune the PP-OCR models. The final [Python Inference for PP-OCR Model Library](./inference_ppocr_en.md) is an introduction to the use of the PP-OCR model library in the first section, which can quickly utilize the rich model library models to obtain test results through the Python inference engine.
------
Let's first understand some basic concepts.
- [INTRODUCTION ABOUT OCR](#introduction-about-ocr)
* [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model)
* [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model)
* [PP-OCR model](#pp-ocr-model)
* [And a table of contents](#and-a-table-of-contents)
* [On the right](#on-the-right)
## 1. INTRODUCTION ABOUT OCR
This section briefly introduces the basic concepts of OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model.
OCR (Optical Character Recognition, Optical Character Recognition) is currently the general term for text recognition. It is not limited to document or book text recognition, but also includes recognizing text in natural scenes. It can also be called STR (Scene Text Recognition).
OCR text recognition generally includes two parts, text detection and text recognition. The text detection module first uses detection algorithms to detect text lines in the image. And then the recognition algorithm to identify the specific text in the text line.
### 1.1 BASIC CONCEPTS OF OCR DETECTION MODEL
Text detection can locate the text area in the image, and then usually mark the word or text line in the form of a bounding box. Traditional text detection algorithms mostly extract features manually, which are characterized by fast speed and good effect in simple scenes, but the effect will be greatly reduced when faced with natural scenes. Currently, deep learning methods are mostly used.
Text detection algorithms based on deep learning can be roughly divided into the following categories:
1. Method based on target detection. Generally, after the text box is predicted, the final text box is filtered through NMS, which is mostly four-point text box, which is not ideal for curved text scenes. Typical algorithms are methods such as EAST and Text Box.
2. Method based on text segmentation. The text line is regarded as the segmentation target, and then the external text box is constructed through the segmentation result, which can handle curved text, and the effect is not ideal for the text cross scene problem. Typical algorithms are DB, PSENet and other methods.
3. Hybrid target detection and segmentation method.
### 1.2 Basic concepts of OCR recognition model
The input of the OCR recognition algorithm is generally text lines images which has less background information, and the text information occupies the main part. The recognition algorithm can be divided into two types of algorithms:
1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used algorithm combination is CNN+RNN+CTC. There are also some algorithms that try to add transformer modules to the network and so on.
2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used algorithm combination is CNN+RNN+Attention.
### 1.3 PP-OCR model
PaddleOCR integrates many OCR algorithms, text detection algorithms include DB, EAST, SAST, etc., text recognition algorithms include CRNN, RARE, StarNet, Rosetta, SRN and other algorithms.
Among them, PaddleOCR has released the PP-OCR series model for the general OCR in Chinese and English natural scenes. The PP-OCR model is composed of the DB+CRNN algorithm. It uses massive Chinese data training and model tuning methods to have high text detection and recognition capabilities in Chinese scenes. And PaddleOCR has launched a high-precision and ultra-lightweight PP-OCRv2 model. The detection model is only 3M, and the recognition model is only 8.5M. Using [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)'s model quantification method, the detection model can be compressed to 0.8M without reducing the accuracy. The recognition is compressed to 3M, which is more suitable for mobile deployment scenarios.
......@@ -92,7 +92,7 @@ rgb/img11.jpg [{"transcription": "ASRAMA", "points": [[214.0, 325.0], [235.0,
```
The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries.
The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
The `points` in the dictionary represent the multi-point coordinates (such as: 4 points, 8 points and 14 points, etc.) of the text box, arranged clockwise from the point at the upper left corner.
`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**
......
# PaddleOCR Quick Start
+ [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
+ [1. Installation](#1installation)
+ [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
+ [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
+ [2.1 Use by Command Line](#21-use-by-command-line)
- [2.1.1 English and Chinese Model](#211-english-and-chinese-model)
......@@ -10,12 +12,35 @@
+ [2.2 Use by Code](#22-use-by-code)
- [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model)
- [2.2.2 Layout Analysis](#222-layoutAnalysis)
* [3. Summary](#3)
<a name="1nstallation"></a>
## 1. Installation
<a name="1-install-paddleocr-whl-package"></a>
<a name="11-install-paddlepaddle"></a>
## 1. Install PaddleOCR Whl Package
### 1.1 Install PaddlePaddle
> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).
- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- If you have no available GPU on your machine, please run the following command to install the CPU version
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
<a name="12-install-paddleocr-whl-package"></a>
### 1.2 Install PaddleOCR Whl Package
```bash
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
......@@ -248,3 +273,11 @@ im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
<a name="3"></a>
## 3. Summary
In this section, you have mastered the use of PaddleOCR whl packages and obtained results.
PaddleOCR is a rich and practical OCR tool library that opens up the whole process of data, model training, compression and inference deployment, so in the [next section](./paddleOCR_overview_en.md) we will first introduce you to the overview of PaddleOCR, and then clone the PaddleOCR project to start the application journey of PaddleOCR.
......@@ -121,9 +121,9 @@ class PSELoss(nn.Layer):
if neg_num == 0:
selected_mask = training_mask
selected_mask = selected_mask.view(
1, selected_mask.shape[0],
selected_mask.shape[1]).astype('float32')
selected_mask = selected_mask.reshape(
[1, selected_mask.shape[0], selected_mask.shape[1]]).astype(
'float32')
return selected_mask
neg_score = paddle.masked_select(score, gt_text <= 0.5)
......
# 关键信息提取(Key Information Extraction)
# Key Information Extraction(KIE)
本节介绍PaddleOCR中关键信息提取SDMGR方法的快速使用和训练方法。
This section provides a tutorial example on how to quickly use, train, and evaluate a key information extraction(KIE) model, [SDMGR](https://arxiv.org/abs/2103.14470), in PaddleOCR.
SDMGR是一个关键信息提取算法,将每个检测到的文本区域分类为预定义的类别,如订单ID、发票号码,金额等。
[SDMGR(Spatial Dual-Modality Graph Reasoning)](https://arxiv.org/abs/2103.14470) is a KIE algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, amount, and etc.
* [1. 快速使用](#1-----)
* [2. 执行训练](#2-----)
* [3. 执行评估](#3-----)
* [1. Quick Use](#1-----)
* [2. Model Training](#2-----)
* [3. Model Evaluation](#3-----)
<a name="1-----"></a>
## 1. 快速使用
训练和测试的数据采用wildreceipt数据集,通过如下指令下载数据集:
## 1. Quick Use
```
[Wildreceipt dataset](https://paperswithcode.com/dataset/wildreceipt) is used for this tutorial. It contains 1765 photos, with 25 classes, and 50000 text boxes, which can be downloaded by wget:
```shell
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
执行预测:
Download the pretrained model and predict the result:
```
```shell
cd PaddleOCR/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar && tar xf kie_vgg16.tar
python3.7 tools/infer_kie.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=../wildreceipt/1.txt
```
执行预测后的结果保存在`./output/sdmgr_kie/predicts_kie.txt`文件中,可视化结果保存在`/output/sdmgr_kie/kie_results/`目录下。
The prediction result is saved as `./output/sdmgr_kie/predicts_kie.txt`, and the visualization results are saved in the folder`/output/sdmgr_kie/kie_results/`.
可视化结果如下图所示:
The visualization results are shown in the figure below:
<div align="center">
<img src="./imgs/0.png" width="800">
</div>
<a name="2-----"></a>
## 2. 执行训练
## 2. Model Training
创建数据集软链到PaddleOCR/train_data目录下:
```
Create a softlink to the folder, `PaddleOCR/train_data`:
```shell
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```
训练采用的配置文件是configs/kie/kie_unet_sdmgr.yml,配置文件中默认训练数据路径是`train_data/wildreceipt`,准备好数据后,可以通过如下指令执行训练:
```
The configuration file used for training is `configs/kie/kie_unet_sdmgr.yml`. The default training data path in the configuration file is `train_data/wildreceipt`. After preparing the data, you can execute the model training with the following command:
```shell
python3.7 tools/train.py -c configs/kie/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```
<a name="3-----"></a>
## 3. 执行评估
```
## 3. Model Evaluation
After training, you can execute the model evaluation with the following command:
```shell
python3.7 tools/eval.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```
**参考文献:**
**Reference:**
<!-- [ALGORITHM] -->
......
# Key Information Extraction(KIE)
# 关键信息提取(Key Information Extraction)
This section provides a tutorial example on how to quickly use, train, and evaluate a key information extraction(KIE) model, [SDMGR](https://arxiv.org/abs/2103.14470), in PaddleOCR.
本节介绍PaddleOCR中关键信息提取SDMGR方法的快速使用和训练方法。
[SDMGR(Spatial Dual-Modality Graph Reasoning)](https://arxiv.org/abs/2103.14470) is a KIE algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, amount, and etc.
SDMGR是一个关键信息提取算法,将每个检测到的文本区域分类为预定义的类别,如订单ID、发票号码,金额等。
* [1. Quick Use](#1-----)
* [2. Model Training](#2-----)
* [3. Model Evaluation](#3-----)
* [1. 快速使用](#1-----)
* [2. 执行训练](#2-----)
* [3. 执行评估](#3-----)
<a name="1-----"></a>
## 1. 快速使用
## 1. Quick Use
训练和测试的数据采用wildreceipt数据集,通过如下指令下载数据集:
[Wildreceipt dataset](https://paperswithcode.com/dataset/wildreceipt) is used for this tutorial. It contains 1765 photos, with 25 classes, and 50000 text boxes, which can be downloaded by wget:
```shell
```
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
Download the pretrained model and predict the result:
执行预测:
```shell
```
cd PaddleOCR/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar && tar xf kie_vgg16.tar
python3.7 tools/infer_kie.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=../wildreceipt/1.txt
```
The prediction result is saved as `./output/sdmgr_kie/predicts_kie.txt`, and the visualization results are saved in the folder`/output/sdmgr_kie/kie_results/`.
执行预测后的结果保存在`./output/sdmgr_kie/predicts_kie.txt`文件中,可视化结果保存在`/output/sdmgr_kie/kie_results/`目录下。
The visualization results are shown in the figure below:
可视化结果如下图所示:
<div align="center">
<img src="./imgs/0.png" width="800">
</div>
<a name="2-----"></a>
## 2. Model Training
## 2. 执行训练
Create a softlink to the folder, `PaddleOCR/train_data`:
```shell
创建数据集软链到PaddleOCR/train_data目录下:
```
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```
The configuration file used for training is `configs/kie/kie_unet_sdmgr.yml`. The default training data path in the configuration file is `train_data/wildreceipt`. After preparing the data, you can execute the model training with the following command:
```shell
训练采用的配置文件是configs/kie/kie_unet_sdmgr.yml,配置文件中默认训练数据路径是`train_data/wildreceipt`,准备好数据后,可以通过如下指令执行训练:
```
python3.7 tools/train.py -c configs/kie/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```
<a name="3-----"></a>
## 3. 执行评估
## 3. Model Evaluation
After training, you can execute the model evaluation with the following command:
```shell
```
python3.7 tools/eval.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```
**Reference:**
**参考文献:**
<!-- [ALGORITHM] -->
......
# 文档视觉问答(DOC-VQA)
# Document Visual Q&A(DOC-VQA)
VQA指视觉问答,主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种,DOC-VQA主要针对文本图像的文字内容提出问题。
Document Visual Q&A, mainly for the image content of the question and answer, DOC-VQA is a type of VQA task, DOC-VQA mainly asks questions about the textual content of text images.
PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进行开发。
The DOC-VQA algorithm in PP-Structure is developed based on PaddleNLP natural language processing algorithm library.
主要特性如下:
The main features are as follows:
- 集成[LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf)模型以及PP-OCR预测引擎。
- 支持基于多模态方法的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair)。
- 支持SER任务和RE任务的自定义训练。
- 支持OCR+SER的端到端系统预测与评估。
- 支持OCR+SER+RE的端到端系统预测。
- Integrated LayoutXLM model and PP-OCR prediction engine.
- Support Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks based on multi-modal methods. Based on SER task, text recognition and classification in images can be completed. Based on THE RE task, we can extract the relation of the text content in the image, such as judge the problem pair.
**Note**:本项目基于 [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) 在Paddle 2.2上的开源实现,同时经过飞桨团队与**中国工商银行**在不动产证场景深入打磨,联合开源。
- Support custom training for SER and RE tasks.
- Support OCR+SER end-to-end system prediction and evaluation.
## 1.性能
- Support OCR+SER+RE end-to-end system prediction.
我们在 [XFUN](https://github.com/doc-analysis/XFUND) 的中文数据集上对算法进行了评估,性能如下
**Note**: This project is based on the open source implementation of [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) on Paddle 2.2, and at the same time, after in-depth polishing by the flying Paddle team and the Industrial and **Commercial Bank of China** in the scene of real estate certificate, jointly open source.
| 模型 | 任务 | f1 | 模型下载地址 |
## 1.Performance
We evaluated the algorithm on [XFUN](https://github.com/doc-analysis/XFUND) 's Chinese data set, and the performance is as follows
| Model | Task | F1 | Model Download Link |
|:---:|:---:|:---:| :---:|
| LayoutXLM | RE | 0.7113 | [链接](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |
| LayoutXLM | SER | 0.9056 | [链接](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
| LayoutLM | SER | 0.78 | [链接](https://paddleocr.bj.bcebos.com/pplayout/LayoutLM_ser_pretrained.tar) |
| LayoutXLM | RE | 0.7113 | [Link](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |
| LayoutXLM | SER | 0.9056 | [Link](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
| LayoutLM | SER | 0.78 | [Link](https://paddleocr.bj.bcebos.com/pplayout/LayoutLM_ser_pretrained.tar) |
## 2.效果演示
## 2.Demonstration
**注意:** 测试图片来源于XFUN数据集。
**Note**: the test images are from the xfun dataset.
### 2.1 SER
![](./images/result_ser/zh_val_0_ser.jpg) | ![](./images/result_ser/zh_val_42_ser.jpg)
---|---
图中不同颜色的框表示不同的类别,对于XFUN数据集,有`QUESTION`, `ANSWER`, `HEADER` 3种类别
Different colored boxes in the figure represent different categories. For xfun dataset, there are three categories: query, answer and header:
* 深紫色:HEADER
* 浅紫色:QUESTION
* 军绿色:ANSWER
* Dark purple: header
* Light purple: query
* Army green: answer
在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
### 2.2 RE
......@@ -51,79 +54,89 @@ PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进
---|---
图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
## 3.安装
## 3. Setup
### 3.1 安装依赖
### 3.1 Installation dependency
- **(1) 安装PaddlePaddle**
- **(1) Install PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
pip3 install --upgrade pip
# GPU安装
# GPU PaddlePaddle Install
python3 -m pip install paddlepaddle-gpu==2.2 -i https://mirror.baidu.com/pypi/simple
# CPU安装
# CPU PaddlePaddle Install
python3 -m pip install paddlepaddle==2.2 -i https://mirror.baidu.com/pypi/simple
```
更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
For more requirements, please refer to the [instructions](https://www.paddlepaddle.org.cn/install/quick) in the installation document.
### 3.2 安装PaddleOCR(包含 PP-OCR 和 VQA )
### 3.2 Install PaddleOCR (including pp-ocr and VQA)
- **(1)pip快速安装PaddleOCR whl包(仅预测)**
- **(1) PIP quick install paddleocr WHL package (forecast only)**
```bash
python3 -m pip install paddleocr
pip install paddleocr
```
- **(2)下载VQA源码(预测+训练)**
- **(2) Download VQA source code (prediction + training)**
```bash
【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
[recommended] git clone https://github.com/PaddlePaddle/PaddleOCR
# 如果因为网络问题无法pull成功,也可选择使用码云上的托管:
# If you cannot pull successfully because of network problems, you can also choose to use the hosting on the code cloud:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# 注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。
# Note: the code cloud hosting code may not be able to synchronize the update of this GitHub project in real time, with a delay of 3 ~ 5 days. Please give priority to the recommended method.
```
- **(3) Install PaddleNLP**
```bash
# You need to use the latest code version of paddlenlp for installation
git clone https://github.com/PaddlePaddle/PaddleNLP -b develop
cd PaddleNLP
pip3 install -e .
```
- **(3)安装VQA的`requirements`**
- **(4) Install requirements for VQA**
```bash
cd ppstructure/vqa
python3 -m pip install -r requirements.txt
pip install -r requirements.txt
```
## 4. 使用
## 4.Usage
### 4.1 数据和预训练模型准备
### 4.1 Data and pre training model preparation
处理好的XFUN中文数据集下载地址:[https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)
Download address of processed xfun Chinese dataset: [https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)
下载并解压该数据集,解压后将数据集放置在当前目录下。
Download and unzip the dataset, and then place the dataset in the current directory.
```shell
wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
```
如果希望转换XFUN中其他语言的数据集,可以参考[XFUN数据转换脚本](helper/trans_xfun_data.py)
If you want to convert data sets in other languages in xfun, you can refer to [xfun data conversion script.](helper/trans_xfun_data.py))
如果希望直接体验预测过程,可以下载我们提供的预训练模型,跳过训练过程,直接预测即可。
If you want to experience the prediction process directly, you can download the pre training model provided by us, skip the training process and predict directly.
### 4.2 SER任务
### 4.2 SER Task
* 启动训练
* Start training
```shell
python3 train_ser.py \
python3.7 train_ser.py \
--model_name_or_path "layoutxlm-base-uncased" \
--ser_model_type "LayoutXLM" \
--train_data_dir "XFUND/zh_train/image" \
......@@ -139,12 +152,12 @@ python3 train_ser.py \
--seed 2048
```
最终会打印出`precision`, `recall`, `f1`等指标,模型和训练日志会保存在`./output/ser/`文件夹中。
Finally, Precision, Recall, F1 and other indicators will be printed, and the model and training log will be saved in/ In the output/Ser/ folder.
* 恢复训练
* Recovery training
```shell
python3 train_ser.py \
python3.7 train_ser.py \
--model_name_or_path "model_path" \
--ser_model_type "LayoutXLM" \
--train_data_dir "XFUND/zh_train/image" \
......@@ -162,7 +175,7 @@ python3 train_ser.py \
--resume
```
* 评估
* Evaluation
```shell
export CUDA_VISIBLE_DEVICES=0
python3 eval_ser.py \
......@@ -175,13 +188,13 @@ python3 eval_ser.py \
--output_dir "output/ser/" \
--seed 2048
```
最终会打印出`precision`, `recall`, `f1`等指标
Finally, Precision, Recall, F1 and other indicators will be printed
* 使用评估集合中提供的OCR识别结果进行预测
* The OCR recognition results provided in the evaluation set are used for prediction
```shell
export CUDA_VISIBLE_DEVICES=0
python3 infer_ser.py \
python3.7 infer_ser.py \
--model_name_or_path "PP-Layout_v1.0_ser_pretrained/" \
--ser_model_type "LayoutXLM" \
--output_dir "output/ser/" \
......@@ -189,13 +202,13 @@ python3 infer_ser.py \
--ocr_json_path "XFUND/zh_val/xfun_normalize_val.json"
```
最终会在`output_res`目录下保存预测结果可视化图像以及预测结果文本文件,文件名为`infer_results.txt`
It will end up in output_res The visual image of the prediction result and the text file of the prediction result are saved in the res directory. The file name is infer_ results.txt.
* 使用`OCR引擎 + SER`串联结果
* Using OCR engine + SER concatenation results
```shell
export CUDA_VISIBLE_DEVICES=0
python3 infer_ser_e2e.py \
python3.7 infer_ser_e2e.py \
--model_name_or_path "PP-Layout_v1.0_ser_pretrained/" \
--ser_model_type "LayoutXLM" \
--max_seq_length 512 \
......@@ -203,17 +216,17 @@ python3 infer_ser_e2e.py \
--infer_imgs "images/input/zh_val_0.jpg"
```
* `OCR引擎 + SER`预测系统进行端到端评估
* End-to-end evaluation of OCR engine + SER prediction system
```shell
export CUDA_VISIBLE_DEVICES=0
python3 helper/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
python3.7 helper/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
```
### 4.3 RE任务
### 4.3 RE Task
* 启动训练
* Start training
```shell
export CUDA_VISIBLE_DEVICES=0
......@@ -237,7 +250,7 @@ python3 train_re.py \
```
* 恢复训练
* Resume training
```shell
export CUDA_VISIBLE_DEVICES=0
......@@ -262,9 +275,9 @@ python3 train_re.py \
```
最终会打印出`precision`, `recall`, `f1`等指标,模型和训练日志会保存在`./output/re/`文件夹中。
Finally, Precision, Recall, F1 and other indicators will be printed, and the model and training log will be saved in the output/RE file folder.
* 评估
* Evaluation
```shell
export CUDA_VISIBLE_DEVICES=0
python3 eval_re.py \
......@@ -278,10 +291,10 @@ python3 eval_re.py \
--num_workers 8 \
--seed 2048
```
最终会打印出`precision`, `recall`, `f1`等指标
Finally, Precision, Recall, F1 and other indicators will be printed
* 使用评估集合中提供的OCR识别结果进行预测
* The OCR recognition results provided in the evaluation set are used for prediction
```shell
export CUDA_VISIBLE_DEVICES=0
......@@ -296,13 +309,13 @@ python3 infer_re.py \
--seed 2048
```
最终会在`output_res`目录下保存预测结果可视化图像以及预测结果文本文件,文件名为`infer_results.txt`
The visual image of the prediction result and the text file of the prediction result are saved in the output_res file folder, the file name is`infer_results.txt`
* 使用`OCR引擎 + SER + RE`串联结果
* Concatenation results using OCR engine + SER+ RE
```shell
export CUDA_VISIBLE_DEVICES=0
python3 infer_ser_re_e2e.py \
python3.7 infer_ser_re_e2e.py \
--model_name_or_path "PP-Layout_v1.0_ser_pretrained/" \
--re_model_name_or_path "PP-Layout_v1.0_re_pretrained/" \
--ser_model_type "LayoutXLM" \
......@@ -311,7 +324,7 @@ python3 infer_ser_re_e2e.py \
--infer_imgs "images/input/zh_val_21.jpg"
```
## 参考链接
## Reference
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
......
# Document Visual Q&A(DOC-VQA)
# 文档视觉问答(DOC-VQA)
Document Visual Q&A, mainly for the image content of the question and answer, DOC-VQA is a type of VQA task, DOC-VQA mainly asks questions about the textual content of text images.
VQA指视觉问答,主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种,DOC-VQA主要针对文本图像的文字内容提出问题。
The DOC-VQA algorithm in PP-Structure is developed based on PaddleNLP natural language processing algorithm library.
PP-Structure 里的 DOC-VQA算法基于PaddleNLP自然语言处理算法库进行开发。
The main features are as follows:
主要特性如下:
- Integrated LayoutXLM model and PP-OCR prediction engine.
- Support Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks based on multi-modal methods. Based on SER task, text recognition and classification in images can be completed. Based on THE RE task, we can extract the relation of the text content in the image, such as judge the problem pair.
- 集成[LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf)模型以及PP-OCR预测引擎。
- 支持基于多模态方法的语义实体识别 (Semantic Entity Recognition, SER) 以及关系抽取 (Relation Extraction, RE) 任务。基于 SER 任务,可以完成对图像中的文本识别与分类;基于 RE 任务,可以完成对图象中的文本内容的关系提取,如判断问题对(pair)。
- 支持SER任务和RE任务的自定义训练。
- 支持OCR+SER的端到端系统预测与评估。
- 支持OCR+SER+RE的端到端系统预测。
- Support custom training for SER and RE tasks.
**Note**:本项目基于 [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) 在Paddle 2.2上的开源实现,同时经过飞桨团队与**中国工商银行**在不动产证场景深入打磨,联合开源。
- Support OCR+SER end-to-end system prediction and evaluation.
- Support OCR+SER+RE end-to-end system prediction.
## 1.性能
**Note**: This project is based on the open source implementation of [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) on Paddle 2.2, and at the same time, after in-depth polishing by the flying Paddle team and the Industrial and **Commercial Bank of China** in the scene of real estate certificate, jointly open source.
我们在 [XFUN](https://github.com/doc-analysis/XFUND) 的中文数据集上对算法进行了评估,性能如下
## 1.Performance
We evaluated the algorithm on [XFUN](https://github.com/doc-analysis/XFUND) 's Chinese data set, and the performance is as follows
| Model | Task | F1 | Model Download Link |
| 模型 | 任务 | f1 | 模型下载地址 |
|:---:|:---:|:---:| :---:|
| LayoutXLM | RE | 0.7113 | [Link](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |
| LayoutXLM | SER | 0.9056 | [Link](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
| LayoutLM | SER | 0.78 | [Link](https://paddleocr.bj.bcebos.com/pplayout/LayoutLM_ser_pretrained.tar) |
| LayoutXLM | RE | 0.7113 | [链接](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |
| LayoutXLM | SER | 0.9056 | [链接](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
| LayoutLM | SER | 0.78 | [链接](https://paddleocr.bj.bcebos.com/pplayout/LayoutLM_ser_pretrained.tar) |
## 2.Demonstration
## 2.效果演示
**Note**: the test images are from the xfun dataset.
**注意:** 测试图片来源于XFUN数据集。
### 2.1 SER
![](./images/result_ser/zh_val_0_ser.jpg) | ![](./images/result_ser/zh_val_42_ser.jpg)
---|---
Different colored boxes in the figure represent different categories. For xfun dataset, there are three categories: query, answer and header:
图中不同颜色的框表示不同的类别,对于XFUN数据集,有`QUESTION`, `ANSWER`, `HEADER` 3种类别
* Dark purple: header
* Light purple: query
* Army green: answer
* 深紫色:HEADER
* 浅紫色:QUESTION
* 军绿色:ANSWER
The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
### 2.2 RE
......@@ -54,89 +51,79 @@ The corresponding category and OCR recognition results are also marked at the to
---|---
In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。
## 3. Setup
## 3.安装
### 3.1 Installation dependency
### 3.1 安装依赖
- **(1) Install PaddlePaddle**
- **(1) 安装PaddlePaddle**
```bash
pip3 install --upgrade pip
python3 -m pip install --upgrade pip
# GPU PaddlePaddle Install
# GPU安装
python3 -m pip install paddlepaddle-gpu==2.2 -i https://mirror.baidu.com/pypi/simple
# CPU PaddlePaddle Install
# CPU安装
python3 -m pip install paddlepaddle==2.2 -i https://mirror.baidu.com/pypi/simple
```
For more requirements, please refer to the [instructions](https://www.paddlepaddle.org.cn/install/quick) in the installation document.
更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
### 3.2 Install PaddleOCR (including pp-ocr and VQA)
### 3.2 安装PaddleOCR(包含 PP-OCR 和 VQA )
- **(1) PIP quick install paddleocr WHL package (forecast only)**
- **(1)pip快速安装PaddleOCR whl包(仅预测)**
```bash
pip install paddleocr
python3 -m pip install paddleocr
```
- **(2) Download VQA source code (prediction + training)**
- **(2)下载VQA源码(预测+训练)**
```bash
[recommended] git clone https://github.com/PaddlePaddle/PaddleOCR
【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
# If you cannot pull successfully because of network problems, you can also choose to use the hosting on the code cloud:
# 如果因为网络问题无法pull成功,也可选择使用码云上的托管:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: the code cloud hosting code may not be able to synchronize the update of this GitHub project in real time, with a delay of 3 ~ 5 days. Please give priority to the recommended method.
```
- **(3) Install PaddleNLP**
```bash
# You need to use the latest code version of paddlenlp for installation
git clone https://github.com/PaddlePaddle/PaddleNLP -b develop
cd PaddleNLP
pip3 install -e .
# 注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。
```
- **(4) Install requirements for VQA**
- **(3)安装VQA的`requirements`**
```bash
cd ppstructure/vqa
pip install -r requirements.txt
python3 -m pip install -r requirements.txt
```
## 4.Usage
## 4. 使用
### 4.1 Data and pre training model preparation
### 4.1 数据和预训练模型准备
Download address of processed xfun Chinese dataset: [https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)
处理好的XFUN中文数据集下载地址:[https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar)
Download and unzip the dataset, and then place the dataset in the current directory.
下载并解压该数据集,解压后将数据集放置在当前目录下。
```shell
wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
```
If you want to convert data sets in other languages in xfun, you can refer to [xfun data conversion script.](helper/trans_xfun_data.py))
如果希望转换XFUN中其他语言的数据集,可以参考[XFUN数据转换脚本](helper/trans_xfun_data.py)
If you want to experience the prediction process directly, you can download the pre training model provided by us, skip the training process and predict directly.
如果希望直接体验预测过程,可以下载我们提供的预训练模型,跳过训练过程,直接预测即可。
### 4.2 SER Task
### 4.2 SER任务
* Start training
* 启动训练
```shell
python3.7 train_ser.py \
python3 train_ser.py \
--model_name_or_path "layoutxlm-base-uncased" \
--ser_model_type "LayoutXLM" \
--train_data_dir "XFUND/zh_train/image" \
......@@ -152,12 +139,12 @@ python3.7 train_ser.py \
--seed 2048
```
Finally, Precision, Recall, F1 and other indicators will be printed, and the model and training log will be saved in/ In the output/Ser/ folder.
最终会打印出`precision`, `recall`, `f1`等指标,模型和训练日志会保存在`./output/ser/`文件夹中。
* Recovery training
* 恢复训练
```shell
python3.7 train_ser.py \
python3 train_ser.py \
--model_name_or_path "model_path" \
--ser_model_type "LayoutXLM" \
--train_data_dir "XFUND/zh_train/image" \
......@@ -175,7 +162,7 @@ python3.7 train_ser.py \
--resume
```
* Evaluation
* 评估
```shell
export CUDA_VISIBLE_DEVICES=0
python3 eval_ser.py \
......@@ -188,13 +175,13 @@ python3 eval_ser.py \
--output_dir "output/ser/" \
--seed 2048
```
Finally, Precision, Recall, F1 and other indicators will be printed
最终会打印出`precision`, `recall`, `f1`等指标
* The OCR recognition results provided in the evaluation set are used for prediction
* 使用评估集合中提供的OCR识别结果进行预测
```shell
export CUDA_VISIBLE_DEVICES=0
python3.7 infer_ser.py \
python3 infer_ser.py \
--model_name_or_path "PP-Layout_v1.0_ser_pretrained/" \
--ser_model_type "LayoutXLM" \
--output_dir "output/ser/" \
......@@ -202,13 +189,13 @@ python3.7 infer_ser.py \
--ocr_json_path "XFUND/zh_val/xfun_normalize_val.json"
```
It will end up in output_res The visual image of the prediction result and the text file of the prediction result are saved in the res directory. The file name is infer_ results.txt.
最终会在`output_res`目录下保存预测结果可视化图像以及预测结果文本文件,文件名为`infer_results.txt`
* Using OCR engine + SER concatenation results
* 使用`OCR引擎 + SER`串联结果
```shell
export CUDA_VISIBLE_DEVICES=0
python3.7 infer_ser_e2e.py \
python3 infer_ser_e2e.py \
--model_name_or_path "PP-Layout_v1.0_ser_pretrained/" \
--ser_model_type "LayoutXLM" \
--max_seq_length 512 \
......@@ -216,17 +203,17 @@ python3.7 infer_ser_e2e.py \
--infer_imgs "images/input/zh_val_0.jpg"
```
* End-to-end evaluation of OCR engine + SER prediction system
* `OCR引擎 + SER`预测系统进行端到端评估
```shell
export CUDA_VISIBLE_DEVICES=0
python3.7 helper/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
python3 helper/eval_with_label_end2end.py --gt_json_path XFUND/zh_val/xfun_normalize_val.json --pred_json_path output_res/infer_results.txt
```
### 4.3 RE Task
### 4.3 RE任务
* Start training
* 启动训练
```shell
export CUDA_VISIBLE_DEVICES=0
......@@ -250,7 +237,7 @@ python3 train_re.py \
```
* Resume training
* 恢复训练
```shell
export CUDA_VISIBLE_DEVICES=0
......@@ -275,9 +262,9 @@ python3 train_re.py \
```
Finally, Precision, Recall, F1 and other indicators will be printed, and the model and training log will be saved in the output/RE file folder.
最终会打印出`precision`, `recall`, `f1`等指标,模型和训练日志会保存在`./output/re/`文件夹中。
* Evaluation
* 评估
```shell
export CUDA_VISIBLE_DEVICES=0
python3 eval_re.py \
......@@ -291,10 +278,10 @@ python3 eval_re.py \
--num_workers 8 \
--seed 2048
```
Finally, Precision, Recall, F1 and other indicators will be printed
最终会打印出`precision`, `recall`, `f1`等指标
* The OCR recognition results provided in the evaluation set are used for prediction
* 使用评估集合中提供的OCR识别结果进行预测
```shell
export CUDA_VISIBLE_DEVICES=0
......@@ -309,13 +296,13 @@ python3 infer_re.py \
--seed 2048
```
The visual image of the prediction result and the text file of the prediction result are saved in the output_res file folder, the file name is`infer_results.txt`
最终会在`output_res`目录下保存预测结果可视化图像以及预测结果文本文件,文件名为`infer_results.txt`
* Concatenation results using OCR engine + SER+ RE
* 使用`OCR引擎 + SER + RE`串联结果
```shell
export CUDA_VISIBLE_DEVICES=0
python3.7 infer_ser_re_e2e.py \
python3 infer_ser_re_e2e.py \
--model_name_or_path "PP-Layout_v1.0_ser_pretrained/" \
--re_model_name_or_path "PP-Layout_v1.0_re_pretrained/" \
--ser_model_type "LayoutXLM" \
......@@ -324,7 +311,7 @@ python3.7 infer_ser_re_e2e.py \
--infer_imgs "images/input/zh_val_21.jpg"
```
## Reference
## 参考链接
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, https://arxiv.org/pdf/2104.08836.pdf
- microsoft/unilm/layoutxlm, https://github.com/microsoft/unilm/tree/master/layoutxlm
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册