@@ -20,13 +20,14 @@ PaddleOCR provides 2 service deployment methods:
# Service deployment based on PaddleHub Serving
The hubserving service deployment directory includes seven service packages: text detection, text angle class, text recognition, text detection+text angle class+text recognition three-stage series connection, layout analysis, table recognition and PP-Structure. Please select the corresponding service package to install and start the service according to your needs. The directory is as follows:
```
deploy/hubserving/
└─ ocr_det text detection module service package
└─ ocr_cls text angle class module service package
└─ ocr_rec text recognition module service package
└─ ocr_system text detection+text angle class+text recognition three-stage series connection service package
└─ structure_layout layout analysis service package
└─ structure_table table recognition service package
└─ structure_system PP-Structure service package
```
...
...
@@ -43,6 +44,7 @@ deploy/hubserving/ocr_system/
* 2022.08.23 add layout analysis services.
* 2022.05.05 add PP-OCRv3 text detection and recognition models.
* 2022.03.30 add PP-Structure and table recognition services.
## 2. Quick start service
...
...
@@ -61,7 +63,7 @@ Before installing the service module, you need to prepare the inference model an
text detection model: ./inference/ch_PP-OCRv3_det_infer/
text recognition model: ./inference/ch_PP-OCRv3_rec_infer/
text angle classifier: ./inference/ch_ppocr_mobile_v2.0_cls_infer/
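If you use your own trained models instead, point the corresponding fields of the service package's `params.py` at their directories. A minimal sketch of the relevant fields (following the pattern of `deploy/hubserving/ocr_system/params.py`; field names may differ slightly across versions):
```
# Sketch of the model-path fields in params.py; the paths below assume the
# default PP-OCRv3 inference models prepared above.
class Config(object):
    pass

def read_params():
    cfg = Config()
    cfg.det_model_dir = "./inference/ch_PP-OCRv3_det_infer/"           # text detection
    cfg.rec_model_dir = "./inference/ch_PP-OCRv3_rec_infer/"           # text recognition
    cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v2.0_cls_infer/"  # angle classifier
    cfg.use_angle_cls = True
    return cfg
```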
@@ -192,6 +200,7 @@ For example, if using the configuration file to start the text angle classificat
`http://127.0.0.1:8868/predict/ocr_system`
`http://127.0.0.1:8869/predict/structure_table`
`http://127.0.0.1:8870/predict/structure_system`
`http://127.0.0.1:8871/predict/structure_layout`
- **image_dir**: Test image path; can be a single image path or an image directory path
- **visualize**: Whether to visualize the results; the default value is False
- **output**: The folder to save the visualization results; the default value is `./hubserving_result`
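Besides the test script, you can call a service with a plain HTTP request. A minimal client sketch (assuming the `ocr_system` service is running on port 8868 as above, and a hypothetical test image `doc_image.jpg`; PaddleHub serving expects a JSON body of base64-encoded images):
```
import base64
import json

import requests

# Encode the test image as base64, as PaddleHub serving expects.
with open("doc_image.jpg", "rb") as f:  # hypothetical test image
    img_b64 = base64.b64encode(f.read()).decode("utf8")

resp = requests.post(
    "http://127.0.0.1:8868/predict/ocr_system",
    headers={"Content-type": "application/json"},
    data=json.dumps({"images": [img_b64]}),
)
print(resp.json()["results"])  # list of per-image results
```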
...
...
@@ -212,17 +221,19 @@ The returned result is a list. Each item in the list is a dict. The dict may con
|text_region|list|text location coordinates|
|html|str|HTML string of the table|
|regions|list|The result of layout analysis + table recognition + OCR; each item is a list containing `bbox` (region coordinates), `type` (region type) and `res` (region result)|
|layout|list|The result of layout analysis; each item is a dict containing `bbox` (region coordinates) and `label` (region type)|
The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`. The details are as follows:
| field name/module name | ocr_det | ocr_cls | ocr_rec | ocr_system | structure_table | structure_system | structure_layout |
| --- | --- | --- | --- | --- | --- | --- | --- |
| angle       |   | ✔ |   | ✔ |   |   |   |
| text        |   |   | ✔ | ✔ |   | ✔ |   |
| confidence  |   | ✔ | ✔ |   |   | ✔ |   |
| text_region | ✔ |   |   | ✔ |   | ✔ |   |
| html        |   |   |   |   | ✔ | ✔ |   |
| regions     |   |   |   |   | ✔ | ✔ |   |
| layout      |   |   |   |   |   |   | ✔ |
**Note:** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module. For the complete process, refer to the user-defined modification service module in the next section.
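For example, the `structure_layout` service returns the `layout` field described above. A hedged parsing sketch (assuming the service runs on port 8871 and each per-image result exposes `layout` as in the table; the image path is hypothetical):
```
import base64
import json

import requests

with open("doc_image.jpg", "rb") as f:  # hypothetical test image
    payload = {"images": [base64.b64encode(f.read()).decode("utf8")]}

resp = requests.post(
    "http://127.0.0.1:8871/predict/structure_layout",
    headers={"Content-type": "application/json"},
    data=json.dumps(payload),
)
for page in resp.json()["results"]:
    for region in page["layout"]:
        # each region is a dict with "bbox" (coordinates) and "label" (region type)
        print(region["label"], region["bbox"])
```
The excerpt below, from the layout analysis service's `module.py`, shows the internals you would touch when customizing these fields: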
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
cfg.ir_optim=True
cfg.enable_mkldnn=enable_mkldnn
self.layout_predictor=_LayoutPredictor(cfg)
defmerge_configs(self):
# deafult cfg
backup_argv=copy.deepcopy(sys.argv)
sys.argv=sys.argv[:1]
cfg=parse_args()
update_cfg_map=vars(read_params())
forkeyinupdate_cfg_map:
cfg.__setattr__(key,update_cfg_map[key])
sys.argv=copy.deepcopy(backup_argv)
returncfg
defread_images(self,paths=[]):
images=[]
forimg_pathinpaths:
assertos.path.isfile(
img_path),"The {} isn't a valid file.".format(img_path)
img=cv2.imread(img_path)
ifimgisNone:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
returnimages
defpredict(self,images=[],paths=[]):
"""
Get the chinese texts in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
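Once installed, the module can also be exercised directly in Python, without going through the HTTP layer. A hedged sketch (assuming PaddleHub can load the installed module by its registered name `structure_layout`):
```
import paddlehub as hub

# Load the locally installed service module by its registered name.
layout = hub.Module(name="structure_layout")

# predict() accepts either in-memory images or file paths, per the
# docstring above; the image path here is hypothetical.
results = layout.predict(paths=["doc_image.jpg"])
print(results)
```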
Key information extraction is not currently supported by the whl package. For a detailed usage tutorial, please refer to: [Key Information Extraction](../kie/README.md).
@@ -15,13 +16,16 @@ English | [简体中文](README_ch.md)
Layout recovery means that, after OCR, the content is rearranged to match the original document image, with paragraphs output to a Word document in their original order.
Layout recovery combines [layout analysis](../layout/README.md) and [table recognition](../table/README.md) to better recover images, tables, titles, etc. It supports PDF files and document images as input, in both Chinese and English. The following figure shows the effect of restoring the layout of English and Chinese documents:
Layout recovery results are exported as docx and PDF files, so the python-docx and docx2pdf APIs need to be installed; to handle input files in PDF format, PyMuPDF (imported as fitz) is also required.
Through layout analysis, the image/PDF document is divided into regions, key regions such as text, tables and pictures are located, and the position, category and pixel information of each region are recorded. The different region types are then processed separately:
- In text regions, OCR detection and recognition is performed, and the coordinates of the OCR detection boxes and the text content are added to the recorded information
- In table regions, the table is recognized and its html and text information are recorded
- Picture regions are saved directly
The test picture can then be restored from the layout information, the OCR detection and recognition results, the table information and the saved pictures, as the sketch below illustrates.
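A minimal whl-based sketch of this recovery pipeline, following the PaddleOCR quick-start (the `recovery=True` flag and helper imports are taken from that quick-start and may vary between versions; the image path is hypothetical):
```
import os

import cv2
from paddleocr import PPStructure, save_structure_res
from paddleocr.ppstructure.recovery.recovery_to_doc import (
    convert_info_docx, sorted_layout_boxes)

# Layout analysis + table recognition + OCR, with layout recovery enabled.
engine = PPStructure(recovery=True)

img_path = "doc_image.jpg"  # hypothetical input image
save_folder = "./output"
img = cv2.imread(img_path)
result = engine(img)

# Persist per-region results (text, table html, cropped figures).
img_name = os.path.basename(img_path).split(".")[0]
save_structure_res(result, save_folder, img_name)

# Sort regions into reading order, then write the docx file.
# (Some versions of convert_info_docx also take a save_pdf flag.)
h, w, _ = img.shape
res = sorted_layout_boxes(result, w)
convert_info_docx(img, res, save_folder, img_name)
```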
<a name="3.1"></a>
### 3.1 Download models
...
...
@@ -85,9 +101,11 @@ https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && ta
# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight English table recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
If the input is a Chinese document, download the Chinese models:
...
...
@@ -128,3 +146,15 @@ Field:
- recovery: whether to enable layout recovery, default False
- save_pdf: when recovering the layout, whether to also save a PDF file, default False
- output: the path to save the recovery results
<a name="4"></a>
## 4. More
For training, evaluation and inference tutorials for text detection models, please refer to the [text detection doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/detection.md).
For training, evaluation and inference tutorials for text recognition models, please refer to the [text recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/recognition.md).
For training, evaluation and inference tutorials for layout analysis models, please refer to the [layout analysis doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README_ch.md).
For training, evaluation and inference tutorials for table recognition models, please refer to the [table recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/README_ch.md).