This section provides a tutorial example on how to quickly use, train, and evaluate a key information extraction (KIE) model, [SDMGR](https://arxiv.org/abs/2103.14470), in PaddleOCR.
[SDMGR (Spatial Dual-Modality Graph Reasoning)](https://arxiv.org/abs/2103.14470) is a KIE algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, and amount.
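As a purely hypothetical illustration of the task (the region data, labels, and `classify_region` helper below are invented, not PaddleOCR's API; a real model such as SDMGR reasons jointly over the visual and textual features of all regions), KIE maps detected text regions to category labels:

```python
# Hypothetical sketch of the KIE task: each detected text region
# (bounding box + transcription) gets one predefined category label.
# The data and the rule-based classifier are invented for illustration;
# SDMGR itself is a learned dual-modality graph-reasoning model.

detected_regions = [
    {"box": [70, 20, 220, 45], "text": "Receipt No: 10086"},
    {"box": [70, 60, 180, 85], "text": "Total: $42.50"},
]

def classify_region(region):
    """Toy stand-in for a KIE model's per-region classification."""
    text = region["text"].lower()
    if "no:" in text or "number" in text:
        return "order_id"
    if "total" in text or "$" in text:
        return "amount"
    return "other"

labels = [classify_region(r) for r in detected_regions]
print(labels)  # ['order_id', 'amount']
```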
* [1. Quick Use](#1-----)
* [2. Model Training](#2-----)
* [3. Model Evaluation](#3-----)
<a name="1-----"></a>
## 1. Quick Use
[Wildreceipt dataset](https://paperswithcode.com/dataset/wildreceipt) is used for this tutorial. It contains 1765 photos with 25 classes and 50000 text boxes, and can be downloaded with wget:
```shell
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
Download the pretrained model and predict the result:
```shell
cd PaddleOCR/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar && tar xf kie_vgg16.tar
```
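The inference command itself is not included in this excerpt. Based on PaddleOCR's usual tooling, an invocation along the following lines would be expected; the script path, checkpoint name, and input placeholder are assumptions to verify against your PaddleOCR checkout:

```shell
# Assumed invocation (verify against your PaddleOCR version):
# -c selects the model config, Global.checkpoints points at the
# downloaded weights, Global.infer_img at the input annotation file.
python3 tools/infer_kie.py \
    -c configs/kie/kie_unet_sdmgr.yml \
    -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=<path-to-input>
```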
The prediction result is saved as `./output/sdmgr_kie/predicts_kie.txt`, and the visualization results are saved in the folder `./output/sdmgr_kie/kie_results/`.
The visualization results are shown in the figure below:
<div align="center">
<img src="./imgs/0.png" width="800">
</div>
<a name="2-----"></a>
## 2. Model Training
Create a soft link to the dataset folder under `PaddleOCR/train_data`:
```shell
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```
The configuration file used for training is `configs/kie/kie_unet_sdmgr.yml`. The default training data path in the configuration file is `train_data/wildreceipt`. After preparing the data, you can execute the model training with the following command:
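The training command does not appear in this excerpt. PaddleOCR models are typically launched through `tools/train.py` with `-c` pointing at the config file, so a sketch (an assumption to verify against the repository) would be:

```shell
# Assumed standard PaddleOCR training entry point; verify locally.
python3 tools/train.py -c configs/kie/kie_unet_sdmgr.yml
```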
Document Visual Question Answering (DOC-VQA) is a type of VQA task in which questions are asked and answered about the textual content of document images.
- Integrated LayoutXLM model and PP-OCR prediction engine.
- Support Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks based on multi-modal methods. The SER task recognizes and classifies the text in an image; the RE task extracts relations between the text contents in an image, such as matching question-answer pairs.
**Note**: This project is based on the open-source implementation of [LayoutXLM](https://arxiv.org/pdf/2104.08836.pdf) on Paddle 2.2. It was polished in depth by the PaddlePaddle team together with the **Industrial and Commercial Bank of China** in the real-estate certificate scenario and jointly open-sourced.
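To make the SER/RE distinction concrete, here is a hypothetical sketch (field names and values are invented; this is not the project's actual output format): SER labels each text region, and RE links labeled regions into pairs:

```python
# Invented illustration of SER vs. RE outputs.
# SER: assign a semantic label to each detected text region.
ser_result = [
    {"text": "Name:", "label": "QUESTION"},
    {"text": "Zhang San", "label": "ANSWER"},
    {"text": "Date:", "label": "QUESTION"},
    {"text": "2021-09-01", "label": "ANSWER"},
]

# RE: link question regions to their answer regions (here, naively by order;
# the real RE model predicts the links from multi-modal features).
questions = [r["text"] for r in ser_result if r["label"] == "QUESTION"]
answers = [r["text"] for r in ser_result if r["label"] == "ANSWER"]
re_result = list(zip(questions, answers))
print(re_result)  # [('Name:', 'Zhang San'), ('Date:', '2021-09-01')]
```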
## 1. Performance
We evaluated the algorithms on the Chinese subset of the [XFUN](https://github.com/doc-analysis/XFUND) dataset; the performance is as follows:
| Model | Task | F1 | Model Download Link |
|:---:|:---:|:---:|:---:|
| LayoutXLM | RE | 0.7113 | [Link](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |
| LayoutXLM | SER | 0.9056 | [Link](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
| LayoutLM | SER | 0.78 | [Link](https://paddleocr.bj.bcebos.com/pplayout/LayoutLM_ser_pretrained.tar) |
## 2. Demonstration
**Note**: The test images are from the XFUN dataset.
In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
**Note**: The Gitee-hosted code may not sync with this GitHub project in real time; updates can lag by 3 to 5 days. Please use the recommended method first.
**(3) Install PaddleNLP**

```bash
# You need to use the latest code version of PaddleNLP for installation
```
Download address of the processed XFUN Chinese dataset: [https://paddleocr.bj.bcebos.com/dataset/XFUND.tar](https://paddleocr.bj.bcebos.com/dataset/XFUND.tar).
Download and unzip the dataset, then place it in the current directory.
If you want to convert XFUN datasets in other languages, refer to the [XFUN data conversion script](helper/trans_xfun_data.py).
If you want to experience the prediction process directly, you can download the pretrained model we provide, skip the training process, and predict directly.
The visualized images and a text file of the prediction results are saved in the `output_res` folder; the text file is named `infer_results.txt`.
- Support OCR+SER end-to-end system prediction and evaluation.
- Support OCR+SER+RE end-to-end system prediction.
### 2.2 RE
...
### 4.2 SER Task

* Start training
```shell
python3 train_ser.py \
    --model_name_or_path "layoutxlm-base-uncased" \
    --ser_model_type "LayoutXLM" \
    --train_data_dir "XFUND/zh_train/image" \
    ...
    --seed 2048
```
Finally, Precision, Recall, F1 and other metrics will be printed, and the model and training log will be saved in the `./output/ser/` folder.
Finally, `precision`, `recall`, `f1` and other metrics will be printed.
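For reference, the reported `f1` is the harmonic mean of `precision` and `recall`; a minimal sketch of the relationship:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.9056, 0.9056))  # equals precision when precision == recall
```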
* The OCR recognition results provided in the evaluation set are used for prediction
```shell
export CUDA_VISIBLE_DEVICES=0
python3 infer_re.py \
    ...
    --seed 2048
```
The visualized prediction images and the prediction result text file are saved in the `output_res` folder; the file name is `infer_results.txt`.