diff --git a/ppocr/utils/dict/kie_dict/xfund_class_list.txt b/ppocr/utils/dict/kie_dict/xfund_class_list.txt
new file mode 100644
index 0000000000000000000000000000000000000000..faded9f9b8f56bd258909bec9b8f1755aa688367
--- /dev/null
+++ b/ppocr/utils/dict/kie_dict/xfund_class_list.txt
@@ -0,0 +1,4 @@
+OTHER
+QUESTION
+ANSWER
+HEADER
diff --git a/ppstructure/docs/inference.md b/ppstructure/docs/inference.md
index b050900760067402b2b738ed8d0e94d6788aca4f..3f92a6046e94d2eeba1bbea80a9663dabfd4b245 100644
--- a/ppstructure/docs/inference.md
+++ b/ppstructure/docs/inference.md
@@ -4,7 +4,7 @@
- [1.1 版面分析+表格识别](#1.1)
- [1.2 版面分析](#1.2)
- [1.3 表格识别](#1.3)
-- [2. DocVQA](#2)
+- [2. 关键信息抽取](#2)
## 1. Structure
@@ -61,20 +61,22 @@ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_i
运行完成后,每张图片会在`output`字段指定的目录下的`structure`目录下有一个同名目录,表格会存储为一个excel,excel文件名为`[0,0,img_h,img_w]`。
-## 2. DocVQA
+## 2. 关键信息抽取
```bash
cd ppstructure
-# 下载模型
mkdir inference && cd inference
-# 下载SER xfun 模型并解压
-wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar
+# 下载SER XFUND 模型并解压
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar && tar -xf ser_vi_layoutxlm_xfund_infer.tar
cd ..
-
-python3 predict_system.py --model_name_or_path=kie/PP-Layout_v1.0_ser_pretrained/ \
- --mode=kie \
- --image_dir=kie/images/input/zh_val_0.jpg \
- --vis_font_path=../doc/fonts/simfang.ttf
+python3 kie/predict_kie_token_ser.py \
+ --kie_algorithm=LayoutXLM \
+ --ser_model_dir=../inference/ser_vi_layoutxlm_xfund_infer \
+ --image_dir=./docs/kie/input/zh_val_42.jpg \
+ --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
+ --vis_font_path=../doc/fonts/simfang.ttf \
+ --ocr_order_method="tb-yx"
```
+
运行完成后,每张图片会在`output`字段指定的目录下的`kie`目录下存放可视化之后的图片,图片名和输入图片名一致。
diff --git a/ppstructure/docs/inference_en.md b/ppstructure/docs/inference_en.md
index ad16f048e3b08a45d6e6d76e630ba48483f263d4..126878378d54932937054e2aa0503214f876bfbf 100644
--- a/ppstructure/docs/inference_en.md
+++ b/ppstructure/docs/inference_en.md
@@ -4,7 +4,7 @@
- [1.1 layout analysis + table recognition](#1.1)
- [1.2 layout analysis](#1.2)
- [1.3 table recognition](#1.3)
-- [2. DocVQA](#2)
+- [2. KIE](#2)
## 1. Structure
@@ -63,19 +63,22 @@ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_i
After the operation is completed, each image will have a directory with the same name in the `structure` directory under the directory specified by the `output` field. Each table in the image will be stored as an excel. The filename of excel is their coordinates in the image.
-## 2. DocVQA
+## 2. KIE
```bash
cd ppstructure
-# download model
mkdir inference && cd inference
-wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar
+# download model
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_infer.tar && tar -xf ser_vi_layoutxlm_xfund_infer.tar
cd ..
-
-python3 predict_system.py --model_name_or_path=kie/PP-Layout_v1.0_ser_pretrained/ \
- --mode=kie \
- --image_dir=kie/images/input/zh_val_0.jpg \
- --vis_font_path=../doc/fonts/simfang.ttf
+python3 kie/predict_kie_token_ser.py \
+ --kie_algorithm=LayoutXLM \
+ --ser_model_dir=../inference/ser_vi_layoutxlm_xfund_infer \
+ --image_dir=./docs/kie/input/zh_val_42.jpg \
+ --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
+ --vis_font_path=../doc/fonts/simfang.ttf \
+ --ocr_order_method="tb-yx"
```
+
After the operation is completed, each image will store the visualized image in the `kie` directory under the directory specified by the `output` field, and the image name is the same as the input image name.
diff --git a/ppstructure/docs/installation.md b/ppstructure/docs/installation.md
index 3649e729d04ec83ba2d97571af993d75358eec73..0635580234abe2716769441d845df6386fbf5b86 100644
--- a/ppstructure/docs/installation.md
+++ b/ppstructure/docs/installation.md
@@ -1,7 +1,7 @@
- [快速安装](#快速安装)
- [1. PaddlePaddle 和 PaddleOCR](#1-paddlepaddle-和-paddleocr)
- [2. 安装其他依赖](#2-安装其他依赖)
- - [2.1 VQA所需依赖](#21--kie所需依赖)
+ - [2.1 KIE所需依赖](#21-kie所需依赖)
# 快速安装
@@ -11,16 +11,11 @@
## 2. 安装其他依赖
-### 2.1 VQA所需依赖
-* paddleocr
+### 2.1 KIE所需依赖
-```bash
-pip3 install paddleocr
-```
+* paddleocr
-* PaddleNLP
```bash
-git clone https://github.com/PaddlePaddle/PaddleNLP -b develop
-cd PaddleNLP
-pip3 install -e .
+pip install paddleocr -U
+pip install -r ./kie/requirements.txt
```
diff --git a/ppstructure/docs/installation_en.md b/ppstructure/docs/installation_en.md
index 02b02db0c58f60a5296734b93563510732a7286d..de8bb5f6fc06fbd4f21cb0ca00ec80cce109ebf7 100644
--- a/ppstructure/docs/installation_en.md
+++ b/ppstructure/docs/installation_en.md
@@ -2,7 +2,7 @@
- [1. PaddlePaddle 和 PaddleOCR](#1)
- [2. Install other dependencies](#2)
- - [2.1 VQA](#21)
+ - [2.1 KIE](#21)
@@ -14,17 +14,11 @@ Please refer to [PaddleOCR installation documentation](../../doc/doc_en/installa
## 2. Install other dependencies
-### 2.1 VQA
+### 2.1 KIE
* paddleocr
```bash
-pip3 install paddleocr
-```
-
-* PaddleNLP
-```bash
-git clone https://github.com/PaddlePaddle/PaddleNLP -b develop
-cd PaddleNLP
-pip3 install -e .
+pip install paddleocr -U
+pip install -r ./kie/requirements.txt
```
diff --git a/ppstructure/docs/models_list_en.md b/ppstructure/docs/models_list_en.md
index 7ba1d30464287eaf67a0265464fcc261e3b4407f..cb6857f62f54fb7830b8cc77023693849942081a 100644
--- a/ppstructure/docs/models_list_en.md
+++ b/ppstructure/docs/models_list_en.md
@@ -4,8 +4,7 @@
- [2. OCR and Table Recognition](#2-ocr-and-table-recognition)
- [2.1 OCR](#21-ocr)
- [2.2 Table Recognition](#22-table-recognition)
-- [3. VQA](#3-kie)
-- [4. KIE](#4-kie)
+- [3. KIE](#3-kie)
@@ -40,19 +39,25 @@ If you need to use other OCR models, you can download the model in [PP-OCR model
|ch_ppstructure_mobile_v2.0_SLANet|Chinese table recognition model trained on PubTabNet dataset based on SLANet|9.3M|[inference model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar) |
-## 3. VQA
-
-|model| description |inference model size|download|
-| --- |----------------------------------------------------------------| --- | --- |
-|ser_LayoutXLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
-|re_LayoutXLM_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLM |1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
-|ser_LayoutLMv2_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutXLMv2 |778M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
-|re_LayoutLMv2_xfun_zh| Re model trained on xfun Chinese dataset based on LayoutXLMv2 |765M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
-|ser_LayoutLM_xfun_zh| SER model trained on xfun Chinese dataset based on LayoutLM |430M|[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
-
-
-## 4. KIE
-
-|model|description|model size|download|
-| --- | --- | --- | --- |
-|SDMGR|Key Information Extraction Model|78M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
+## 3. KIE
+
+On the XFUND_zh dataset, the accuracy and time cost of different models on a V100 GPU are as follows.
+
+|Model|Backbone|Task|Config|Hmean|Time cost(ms)|Download link|
+| --- | --- | --- | --- | --- | --- |--- |
+|VI-LayoutXLM| VI-LayoutXLM-base | SER | [ser_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml)|**93.19%**| 15.49| [trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar)|
+|LayoutXLM| LayoutXLM-base | SER | [ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%| 19.49 |[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)|
+|LayoutLM| LayoutLM-base | SER | [ser_layoutlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlm_xfund_zh.yml)|77.31%|-|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar)|
+|LayoutLMv2| LayoutLMv2-base | SER | [ser_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutlmv2_xfund_zh.yml)|85.44%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar)|
+|VI-LayoutXLM| VI-LayoutXLM-base | RE | [re_vi_layoutxlm_xfund_zh_udml.yml](../../configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh_udml.yml)|**83.92%**|15.49|[trained model](https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar)|
+|LayoutXLM| LayoutXLM-base | RE | [re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|19.49|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)|
+|LayoutLMv2| LayoutLMv2-base | RE | [re_layoutlmv2_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutlmv2_xfund_zh.yml)|67.77%|31.46|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar)|
+
+* Note: The time cost above covers inference only, excluding preprocessing and postprocessing. Test environment: `V100 GPU + CUDA 10.2 + CUDNN 8.1.1 + TRT 7.2.3.4`
+
+
+On the wildreceipt dataset, the results are as follows:
+
+|Model|Backbone|Config|Hmean|Download link|
+| --- | --- | --- | --- | --- |
+|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
diff --git a/ppstructure/docs/quickstart.md b/ppstructure/docs/quickstart.md
index 9a538a6f11d99e9caa4c3483421aaccc344079de..517703d7b6c172937b45dda15ffc816946a5ef53 100644
--- a/ppstructure/docs/quickstart.md
+++ b/ppstructure/docs/quickstart.md
@@ -7,16 +7,16 @@
- [2.1.2 版面分析+表格识别](#212-版面分析表格识别)
- [2.1.3 版面分析](#213-版面分析)
- [2.1.4 表格识别](#214-表格识别)
- - [2.1.5 DocVQA](#215-dockie)
+ - [2.1.5 关键信息抽取](#215-关键信息抽取)
- [2.2 代码使用](#22-代码使用)
- [2.2.1 图像方向分类版面分析表格识别](#221-图像方向分类版面分析表格识别)
- [2.2.2 版面分析+表格识别](#222-版面分析表格识别)
- [2.2.3 版面分析](#223-版面分析)
- [2.2.4 表格识别](#224-表格识别)
- - [2.2.5 DocVQA](#225-dockie)
+ - [2.2.5 关键信息抽取](#225-关键信息抽取)
- [2.3 返回结果说明](#23-返回结果说明)
- [2.3.1 版面分析+表格识别](#231-版面分析表格识别)
- - [2.3.2 DocVQA](#232-dockie)
+ - [2.3.2 关键信息抽取](#232-关键信息抽取)
- [2.4 参数说明](#24-参数说明)
@@ -26,8 +26,8 @@
```bash
# 安装 paddleocr,推荐使用2.5+版本
pip3 install "paddleocr>=2.5"
-# 安装 DocVQA依赖包paddlenlp(如不需要DocVQA功能,可跳过)
-pip install paddlenlp
+# 安装 关键信息抽取 依赖包(如不需要KIE功能,可跳过)
+pip install -r kie/requirements.txt
```
@@ -62,9 +62,9 @@ paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structur
```
-#### 2.1.5 DocVQA
+#### 2.1.5 关键信息抽取
-请参考:[文档视觉问答](../kie/README.md)。
+请参考:[关键信息抽取教程](../kie/README_ch.md)。
### 2.2 代码使用
@@ -170,9 +170,9 @@ for line in result:
```
-#### 2.2.5 DocVQA
+#### 2.2.5 关键信息抽取
-请参考:[文档视觉问答](../kie/README.md)。
+请参考:[关键信息抽取教程](../kie/README_ch.md)。
### 2.3 返回结果说明
@@ -208,9 +208,9 @@ dict 里各个字段说明如下
```
-#### 2.3.2 DocVQA
+#### 2.3.2 关键信息抽取
-请参考:[文档视觉问答](../kie/README.md)。
+请参考:[关键信息抽取教程](../kie/README_ch.md)。
### 2.4 参数说明
diff --git a/ppstructure/docs/quickstart_en.md b/ppstructure/docs/quickstart_en.md
index cf9d12ff9c1dadef95fedd3a02acb2146607aa96..3a4e7a2d60bb26fd4f453314a0783d4180dc7ad9 100644
--- a/ppstructure/docs/quickstart_en.md
+++ b/ppstructure/docs/quickstart_en.md
@@ -7,16 +7,16 @@
- [2.1.2 layout analysis + table recognition](#212-layout-analysis--table-recognition)
- [2.1.3 layout analysis](#213-layout-analysis)
- [2.1.4 table recognition](#214-table-recognition)
- - [2.1.5 DocVQA](#215-dockie)
+ - [2.1.5 Key Information Extraction](#215-key-information-extraction)
- [2.2 Use by code](#22-use-by-code)
- [2.2.1 image orientation + layout analysis + table recognition](#221-image-orientation--layout-analysis--table-recognition)
- [2.2.2 layout analysis + table recognition](#222-layout-analysis--table-recognition)
- [2.2.3 layout analysis](#223-layout-analysis)
- [2.2.4 table recognition](#224-table-recognition)
- - [2.2.5 DocVQA](#225-dockie)
+ - [2.2.5 Key Information Extraction](#225-key-information-extraction)
- [2.3 Result description](#23-result-description)
- [2.3.1 layout analysis + table recognition](#231-layout-analysis--table-recognition)
- - [2.3.2 DocVQA](#232-dockie)
+ - [2.3.2 Key Information Extraction](#232-key-information-extraction)
- [2.4 Parameter Description](#24-parameter-description)
@@ -26,8 +26,8 @@
```bash
# Install paddleocr, version 2.5+ is recommended
pip3 install "paddleocr>=2.5"
-# Install the DocVQA dependency package paddlenlp (if you do not use the DocVQA, you can skip it)
-pip install paddlenlp
+# Install the KIE dependency packages (skip this step if you do not need KIE)
+pip install -r kie/requirements.txt
```
@@ -62,9 +62,9 @@ paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structur
```
-#### 2.1.5 DocVQA
+#### 2.1.5 Key Information Extraction
-Please refer to: [Documentation Visual Q&A](../kie/README.md) .
+Please refer to: [Key Information Extraction](../kie/README.md).
### 2.2 Use by code
@@ -120,7 +120,7 @@ for line in result:
from PIL import Image
-font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # PaddleOCR下提供字体包
+font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # font provided in PaddleOCR
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
@@ -170,9 +170,9 @@ for line in result:
```
-#### 2.2.5 DocVQA
+#### 2.2.5 Key Information Extraction
-Please refer to: [Documentation Visual Q&A](../kie/README.md) .
+Please refer to: [Key Information Extraction](../kie/README.md).
### 2.3 Result description
@@ -208,9 +208,9 @@ After the recognition is completed, each image will have a directory with the sa
```
-#### 2.3.2 DocVQA
+#### 2.3.2 Key Information Extraction
-Please refer to: [Documentation Visual Q&A](../kie/README.md) .
+Please refer to: [Key Information Extraction](../kie/README.md).
### 2.4 Parameter Description
diff --git a/ppstructure/kie/README.md b/ppstructure/kie/README.md
index 9e1b72e772f03a9dadd202268c39cba11f8f121e..adb19a3ca729821ab16bf8f0f8ec14c2376de1de 100644
--- a/ppstructure/kie/README.md
+++ b/ppstructure/kie/README.md
@@ -246,7 +246,7 @@ For training, evaluation and inference tutorial for text recognition models, ple
If you want to finish the KIE tasks in your scene, and don't know what to prepare, please refer to [End cdoc](../../doc/doc_en/recognition.md).
-关于怎样在自己的场景中完成关键信息抽取任务,请参考:[Guide to End-to-end KIE](./how_to_do_kie_en.md)。
+To complete the key information extraction task in your own scenario, from data preparation to model selection, please refer to: [Guide to End-to-end KIE](./how_to_do_kie_en.md).
## 5. Reference