| cls | Enable classification when `ppocr.ocr` func exec((Use use_angle_cls in command line mode to control whether to start classification in the forward direction) | FALSE |
| cls | Enable classification when `ppocr.ocr` func exec((Use use_angle_cls in command line mode to control whether to start classification in the forward direction) | FALSE |
| show_log | Whether to print log in det and rec
| show_log | Whether to print log in det and rec | FALSE |
| FALSE |
| type | Perform ocr or table structuring, the value is selected in ['ocr','structure'] | ocr |
PaddleStructure is an OCR toolkit for complex layout analysis. It can divide document data in the form of pictures into **text, table, title, picture and list** 5 types of areas, and extract the table area as excel
PPStructure is an OCR toolkit for complex layout analysis. It can divide document data in the form of pictures into **text, table, title, picture and list** 5 types of areas, and extract the table area as excel
The return result of PaddleStructure is a list composed of a dict, an example is as follows
The return result of PPStructure is a list composed of a dict, an example is as follows
```shell
```shell
[
[
...
@@ -91,12 +82,12 @@ Most of the parameters are consistent with the paddleocr whl package, see [doc o
...
@@ -91,12 +82,12 @@ Most of the parameters are consistent with the paddleocr whl package, see [doc o
After running, each image will have a directory with the same name under the directory specified in the output field. Each table in the picture will be stored as an excel, and the excel file name will be the coordinates of the table in the image.
After running, each image will have a directory with the same name under the directory specified in the output field. Each table in the picture will be stored as an excel, and the excel file name will be the coordinates of the table in the image.
## 2. PaddleStructure Pipeline
## 2. PPStructure Pipeline
the process is as follows
the process is as follows
![pipeline](../doc/table/pipeline_en.jpg)
![pipeline](../doc/table/pipeline_en.jpg)
In PaddleStructure, the image will be analyzed by layoutparser first. In the layout analysis, the area in the image will be classified, including **text, title, image, list and table** 5 categories. For the first 4 types of areas, directly use the PP-OCR to complete the text detection and recognition. The table area will be converted to an excel file of the same table style via Table OCR.
In PPStructure, the image will be analyzed by layoutparser first. In the layout analysis, the area in the image will be classified, including **text, title, image, list and table** 5 categories. For the first 4 types of areas, directly use the PP-OCR to complete the text detection and recognition. The table area will be converted to an excel file of the same table style via Table OCR.