Commit a3d9e72c by 文幕地方

update en doc

Parent 13138b70
# Python Inference

- [1. Structure](#1)
  - [1.1 Layout analysis + table recognition](#1.1)
  - [1.2 Layout analysis](#1.2)
  - [1.3 Table recognition](#1.3)
- [2. DocVQA](#2)
<a name="1"></a>
## 1. Structure

Go to the `ppstructure` directory:

```bash
cd ppstructure
```

Download the models:
```bash
mkdir inference && cd inference
# Download the PP-OCRv2 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
# Download the PP-OCRv2 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
# Download the ultra-lightweight English table structure model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
```
<a name="1.1"></a>
### 1.1 Layout analysis + table recognition

```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
...
--output=../output \
--vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image gets a directory with the same name under the `structure` directory inside the directory specified by `output`. Each table in the image is stored as an Excel file, and each picture area is cropped and saved; the Excel and picture file names are the region's coordinates in the image. Detailed results are stored in `res.txt`.
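The per-image output location described above can be computed ahead of time. A minimal sketch, assuming only the directory layout described in this section (the helper name is our own):

```python
import os

def structure_result_dir(output_dir, image_path):
    # Results for one image go to <output>/structure/<image name>/,
    # following the layout described above.
    name = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(output_dir, "structure", name)

print(structure_result_dir("../output", "docs/table/1.png"))
```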
<a name="1.2"></a>
### 1.2 Layout analysis

```bash
python3 predict_system.py --image_dir=./docs/table/1.png --table=false --ocr=false --output=../output/
```
After running, each image gets a directory with the same name under the `structure` directory inside the directory specified by `output`. Each picture area in the image is cropped and saved, with its coordinates in the image as the file name. The layout analysis results are stored in `res.txt`.
<a name="1.3"></a>
### 1.3 Table recognition

```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
...
--vis_font_path=../doc/fonts/simfang.ttf \
--layout=false
```
After running, each image gets a directory with the same name under the `structure` directory inside the directory specified by `output`. The table is stored as an Excel file named `[0,0,img_h,img_w]`.
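Since the whole image is treated as one table region here, the Excel name follows directly from the image size. A tiny sketch of the naming rule above (the helper name is our own):

```python
def full_image_excel_name(img_h, img_w):
    # With --layout=false the whole image is one table region, so the
    # Excel file is named after the full-image box, as noted above.
    return f"[0,0,{img_h},{img_w}].xlsx"

print(full_image_excel_name(336, 637))  # -> [0,0,336,637].xlsx
```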
<a name="2"></a>
## 2. DocVQA

```bash
cd ppstructure

# Download the model
mkdir inference && cd inference
# Download the SER xfun model and unzip it
wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar
cd ..

python3 predict_system.py --model_name_or_path=vqa/PP-Layout_v1.0_ser_pretrained \
--image_dir=vqa/images/input/zh_val_0.jpg \
--vis_font_path=../doc/fonts/simfang.ttf
```
After running, the visualized result for each image is saved in the `vqa` directory under the directory specified by `output`, with the same name as the input image.
# PP-Structure Model List

- [1. Layout Analysis](#1)
- [2. OCR and Table Recognition](#2)
  - [2.1 OCR](#21)
  - [2.2 Table Recognition](#22)
- [3. VQA](#3)
- [4. KIE](#4)
<a name="1"></a>
## 1. Layout Analysis

| model name | description | download | label_map |
| --- | --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; it can detect 5 types of areas: **text, title, table, picture and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) | {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"} |
| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset; it can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0: "Table"} |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset; it can only detect tables | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0: "Table"} |
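The `label_map` column maps predicted class ids to region names. A minimal sketch using the PubLayNet mapping from the table above (the lookup helper and the "Unknown" fallback are our own):

```python
# label_map of ppyolov2_r50vd_dcn_365e_publaynet, copied from the table above
label_map = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}

def class_name(cls_id):
    # Map a predicted class id to its region name; the fallback is ours.
    return label_map.get(cls_id, "Unknown")

print(class_name(3))  # -> Table
```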
<a name="2"></a>
## 2. OCR and Table Recognition

<a name="21"></a>
### 2.1 OCR

| model name | description | inference model size | download |
| --- | --- | --- | --- |
| en_ppocr_mobile_v2.0_table_det | Text detection model for English table scenes, trained on the PubTabNet dataset | 4.7M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_det_train.tar) |
| en_ppocr_mobile_v2.0_table_rec | Text recognition model for English table scenes, trained on the PubTabNet dataset | 6.9M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_rec_train.tar) |

To use other OCR models, download a model from the [PP-OCR model_list](../../doc/doc_ch/models_list.md) or use your own trained model, and set its path in the `det_model_dir` and `rec_model_dir` fields.
<a name="22"></a>
### 2.2 Table Recognition

| model | description | inference model size | download |
| --- | --- | --- | --- |
| en_ppocr_mobile_v2.0_table_structure | Table structure model for English table scenes, trained on the PubTabNet dataset | 18.6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
<a name="3"></a>
## 3. VQA

| model | description | inference model size | download |
| --- | --- | --- | --- |
| ser_LayoutXLM_xfun_zh | SER model trained on the xfun Chinese dataset, based on LayoutXLM | 1.4G | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
| re_LayoutXLM_xfun_zh | RE model trained on the xfun Chinese dataset, based on LayoutXLM | 1.4G | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
| ser_LayoutLMv2_xfun_zh | SER model trained on the xfun Chinese dataset, based on LayoutLMv2 | 778M | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) |
| re_LayoutLMv2_xfun_zh | RE model trained on the xfun Chinese dataset, based on LayoutLMv2 | 765M | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) |
| ser_LayoutLM_xfun_zh | SER model trained on the xfun Chinese dataset, based on LayoutLM | 430M | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) |
<a name="4"></a>
## 4. KIE

| model | description | model size | download |
| --- | --- | --- | --- |
| SDMGR | Key information extraction model | 78M | [inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar) |
Each field in the dict is described as follows:

| field | description | default |
| --- | --- | --- |
| model_name_or_path | the model path of the VQA SER model | None |
| max_seq_length | the max token length supported by the VQA SER model | 512 |
| label_map_path | the label file path of the VQA SER model | ./vqa/labels/labels_ser.txt |
| layout | Whether to perform layout analysis in the forward pass | True |
| table | Whether to perform table recognition in the forward pass | True |
| ocr | Whether to perform OCR for non-table areas found by layout analysis; automatically set to False when layout is False | True |
# PP-Structure Quick Start

- [1. Install package](#1)
- [2. Use](#2)
  - [2.1 Use by command line](#21)
    - [2.1.1 Layout analysis + table recognition](#211)
    - [2.1.2 Layout analysis](#212)
    - [2.1.3 Table recognition](#213)
    - [2.1.4 DocVQA](#214)
  - [2.2 Use by code](#22)
    - [2.2.1 Layout analysis + table recognition](#221)
    - [2.2.2 Layout analysis](#222)
    - [2.2.3 Table recognition](#223)
    - [2.2.4 DocVQA](#224)
  - [2.3 Result description](#23)
    - [2.3.1 Layout analysis + table recognition](#231)
    - [2.3.2 DocVQA](#232)
  - [2.4 Parameter description](#24)
<a name="1"></a>
## 1. Install package

```bash
# Install paddleocr, version 2.5+ is recommended
pip3 install "paddleocr>=2.5"
# Install layoutparser, needed for layout analysis (skip it if you do not need layout analysis)
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
# Install paddlenlp, needed for DocVQA (skip it if you do not need DocVQA)
pip3 install paddlenlp
```
<a name="2"></a>
## 2. Use

<a name="21"></a>
### 2.1 Use by command line

<a name="211"></a>
#### 2.1.1 Layout analysis + table recognition

```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure
```

<a name="212"></a>
#### 2.1.2 Layout analysis

```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```

<a name="213"></a>
#### 2.1.3 Table recognition

```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false
```
<a name="214"></a>
#### 2.1.4 DocVQA

Please refer to [Document Visual Q&A](../vqa/README.md).

<a name="22"></a>
### 2.2 Use by code

<a name="221"></a>
#### 2.2.1 Layout analysis + table recognition

```python
import os
...
im_show.save('result.jpg')
```
<a name="222"></a>
#### 2.2.2 Layout analysis

```python
import os
...
for line in result:
    ...
```
<a name="223"></a>
#### 2.2.3 Table recognition

```python
import os
...
for line in result:
    ...
```
<a name="224"></a>
#### 2.2.4 DocVQA

Please refer to [Document Visual Q&A](../vqa/README.md).

<a name="23"></a>
### 2.3 Result description

The return of PP-Structure is a list of dicts, for example:
<a name="231"></a>
#### 2.3.1 Layout analysis + table recognition

```shell
[
  { 'type': 'Text',
    ...
  }
]
```
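Given a result list in this shape, the table regions can be separated from the rest by the `type` field. A minimal sketch (the helper name and the sample data are our own, mimicking the documented return format):

```python
def split_regions(result):
    # Partition a PP-Structure result list by the documented 'type' field.
    tables = [r for r in result if r["type"] == "Table"]
    others = [r for r in result if r["type"] != "Table"]
    return tables, others

# hand-made sample mimicking the documented return format
sample = [
    {"type": "Text", "bbox": [16, 2, 828, 305], "res": ()},
    {"type": "Table", "bbox": [454, 360, 824, 658], "res": {"html": "<table></table>"}},
]
tables, others = split_regions(sample)
print(len(tables), len(others))  # -> 1 1
```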
Each field in the dict is described as follows:

| field | description |
| --- | --- |
| type | Type of the image area. |
| bbox | Coordinates of the image area in the original image: [upper-left x, upper-left y, lower-right x, lower-right y]. |
| res | OCR or table recognition result of the image area. <br> Table: a dict with the following fields: <br>&emsp;`html`: the HTML string of the table. <br>&emsp;In code usage mode, pass `return_ocr_result_in_table=True` in the call to also get the detection and recognition results for each text in the table area, in the following fields: <br>&emsp;`boxes`: text detection boxes. <br>&emsp;`rec_res`: text recognition results. <br> OCR: a tuple containing the detection box and recognition result of each single line of text. |

After running, each image gets a directory with the same name under the directory specified by `output`. Each table in the image is stored as an Excel file, and each picture area is cropped and saved; the Excel and picture file names are the region's coordinates in the image.
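Since `bbox` is `[x1, y1, x2, y2]`, regions can be put into rough reading order by sorting on the top-left corner. A minimal sketch (the helper name and sample data are our own):

```python
def sort_reading_order(result):
    # Sort regions top-to-bottom, then left-to-right, using the documented
    # bbox = [upper-left x, upper-left y, lower-right x, lower-right y].
    return sorted(result, key=lambda r: (r["bbox"][1], r["bbox"][0]))

regions = [
    {"type": "Table", "bbox": [454, 360, 824, 658]},
    {"type": "Text", "bbox": [16, 2, 828, 305]},
]
print([r["type"] for r in sort_reading_order(regions)])  # -> ['Text', 'Table']
```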
```
/output/table/1/
└─ res.txt
└─ [454, 360, 824, 658].xlsx    table recognition result
└─ [16, 2, 828, 305].jpg        cropped picture area
└─ [17, 361, 404, 711].xlsx     table recognition result
```
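The coordinates encoded in these file names can be recovered programmatically. A small sketch (the function name is our own):

```python
import ast
import os

def parse_box_filename(filename):
    # '[454, 360, 824, 658].xlsx' -> [454, 360, 824, 658]
    stem, _ = os.path.splitext(filename)
    # the stem is a Python-style list literal, so literal_eval can parse it
    return ast.literal_eval(stem)

print(parse_box_filename("[454, 360, 824, 658].xlsx"))  # -> [454, 360, 824, 658]
```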
<a name="232"></a>
#### 2.3.2 DocVQA

Please refer to [Document Visual Q&A](../vqa/README.md).

<a name="24"></a>
### 2.4 Parameter description
| field | description | default |
| --- | --- | --- |
| output | The save path of the result | ./output/table |
| table_max_len | The size to which the long side of the image is resized for table structure model prediction | 488 |
| table_model_dir | The path of the table structure inference model | None |
| table_char_dict_path | The dict path of the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
| layout_path_model | The model path of the layout analysis model, an online address or a local path. For a local path, layout_label_map must be set; in command line mode use --layout_label_map='{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}' | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config |
| layout_label_map | Label mapping dictionary of the layout analysis model | None |
| model_name_or_path | The model path of the VQA SER model | None |
| max_seq_length | The max token length supported by the VQA SER model | 512 |
| label_map_path | The label file path of the VQA SER model | ./vqa/labels/labels_ser.txt |
| layout | Whether to perform layout analysis in the forward pass | True |
| table | Whether to perform table recognition in the forward pass | True |
| ocr | Whether to perform OCR for non-table areas found by layout analysis; automatically set to False when layout is False | True |
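The coupling between `layout` and `ocr` noted in the last row can be expressed directly. A minimal sketch of the documented behavior (the helper name is our own):

```python
def effective_flags(layout=True, table=True, ocr=True):
    # Per the parameter table, ocr is forced to False when layout
    # analysis is disabled.
    if not layout:
        ocr = False
    return {"layout": layout, "table": table, "ocr": ocr}

print(effective_flags(layout=False))  # -> {'layout': False, 'table': True, 'ocr': False}
```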
Most of the parameters are consistent with the PaddleOCR whl package; see the [whl package documentation](../../doc/doc_en/whl.md).