Unverified commit 63c643cc. Author: Evezerest. Committer: GitHub

Merge pull request #5744 from Evezerest/release2.4

Unified file name
...@@ -100,7 +100,7 @@ PaddleOCR aims to build a rich, leading, and practical OCR tool library, helping
- [Layout Analysis](./ppstructure/layout/README_ch.md)
- [Table Recognition](./ppstructure/table/README_ch.md)
- [DocVQA](./ppstructure/vqa/README_ch.md)
- [Key Information Extraction](./ppstructure/docs/kie.md) → [Key Information Extraction](./ppstructure/docs/kie_ch.md)
- OCR Academic Circle
- [Two-stage Model Introduction and Download](./doc/doc_ch/algorithm_overview.md)
- [End-to-end PGNet Algorithm](./doc/doc_ch/pgnet.md)
......
# Key Information Extraction (KIE)

This section provides a tutorial on how to quickly use, train, and evaluate a key information extraction (KIE) model, [SDMGR](https://arxiv.org/abs/2103.14470), in PaddleOCR.

[SDMGR (Spatial Dual-Modality Graph Reasoning)](https://arxiv.org/abs/2103.14470) is a KIE algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, and amount.

* [1. Quick Use](#1-----)
* [2. Model Training](#2-----)
* [3. Model Evaluation](#3-----)

<a name="1-----"></a>
## 1. Quick Use

The [Wildreceipt dataset](https://paperswithcode.com/dataset/wildreceipt) is used for both training and testing in this tutorial. It contains 1765 photos with 25 classes and 50000 text boxes, and can be downloaded with wget:

```shell
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
Download the pretrained model and run prediction:

```shell
cd PaddleOCR/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar && tar xf kie_vgg16.tar
python3.7 tools/infer_kie.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=../wildreceipt/1.txt
```
The prediction result is saved as `./output/sdmgr_kie/predicts_kie.txt`, and the visualization results are saved in the folder `./output/sdmgr_kie/kie_results/`.

The visualization results are shown in the figure below:

<div align="center">
    <img src="./imgs/0.png" width="800">
</div>
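
To inspect the predictions programmatically, the following is a minimal sketch that counts text regions per predicted label. The file format is an assumption, not something stated in this tutorial: the sketch treats each line as JSON holding either one object or a list of objects, each with a `label` key. `count_labels` is a hypothetical helper, and the demo data is made up; adjust the key names to whatever your `predicts_kie.txt` actually contains.

```python
import json
import os
import tempfile
from collections import Counter


def count_labels(path):
    """Count predicted labels in a results file.

    Assumption: every non-empty line is JSON, either a single object or a
    list of objects, and each object has a "label" key.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parsed = json.loads(line)
            records = parsed if isinstance(parsed, list) else [parsed]
            for record in records:
                counts[record["label"]] += 1
    return counts


# Demo on synthetic data standing in for predicts_kie.txt (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "predicts_kie.txt")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write('[{"label": "1", "transcription": "TOTAL"}, '
                  '{"label": "2", "transcription": "9.99"}]\n')
    print(count_labels(demo_path))
```

For a real run, call `count_labels("./output/sdmgr_kie/predicts_kie.txt")` from the `PaddleOCR/` directory after prediction finishes.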
<a name="2-----"></a>

## 2. Model Training

Create a soft link to the dataset inside the `PaddleOCR/train_data` folder:

```shell
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```
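
The relative target `../../wildreceipt` is resolved from the directory that holds the link (`PaddleOCR/train_data`), so it points at a `wildreceipt` folder sitting next to `PaddleOCR/`. The sketch below reproduces that layout in a temporary directory to check the resolution. The layout is taken from the commands in this tutorial; the temporary paths and the `build_layout` helper are illustrative only, and `os.symlink` is assumed to be available (on Windows it may need extra privileges).

```python
import os
import tempfile


def build_layout(root):
    """Recreate the tutorial's directory layout under `root`; return the link path."""
    dataset = os.path.join(root, "wildreceipt")
    train_data = os.path.join(root, "PaddleOCR", "train_data")
    os.makedirs(dataset)
    os.makedirs(train_data)
    link = os.path.join(train_data, "wildreceipt")
    # Same relative target as `ln -s ../../wildreceipt ./`
    os.symlink(os.path.join("..", "..", "wildreceipt"), link)
    return link


with tempfile.TemporaryDirectory() as root:
    link = build_layout(root)
    resolved = os.path.realpath(link)
    expected = os.path.realpath(os.path.join(root, "wildreceipt"))
    # Both values should be True when the link resolves to the dataset folder.
    print(os.path.islink(link), resolved == expected)
```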
The configuration file used for training is `configs/kie/kie_unet_sdmgr.yml`. The default training data path in the configuration file is `train_data/wildreceipt`. After preparing the data, start training with the following command:

```shell
python3.7 tools/train.py -c configs/kie/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```
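
The `-o Global.save_model_dir=./output/kie/` argument overrides a single key of the YAML configuration from the command line. As a rough illustration of the idea, the sketch below applies a dotted `KEY=VALUE` override to a nested dictionary. It is not PaddleOCR's actual implementation: `apply_override` and the sample configuration values are hypothetical, and values are kept as plain strings.

```python
def apply_override(config, override):
    """Set a nested key from a 'Section.key=value' string (illustrative only)."""
    dotted_key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected KEY=VALUE, got {override!r}")
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Walk into the section, creating it if it does not exist yet.
        node = node.setdefault(key, {})
    node[keys[-1]] = value  # kept as a string in this sketch
    return config


# Hypothetical starting configuration; the real YAML file has many more keys.
config = {"Global": {"save_model_dir": "./output/default/"}}
apply_override(config, "Global.save_model_dir=./output/kie/")
print(config["Global"]["save_model_dir"])
```

Splitting on the first `=` only means a value that itself contains `=` is passed through unchanged.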
<a name="3-----"></a>

## 3. Model Evaluation

After training, evaluate the model with the following command:

```shell
python3.7 tools/eval.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```
**Reference:**

<!-- [ALGORITHM] -->