diff --git a/README.md b/README.md index b938a2e2ba916384fc65827c4a1c11a2cec4269f..ad1ebd96e91ed963396dcc4afb445298b09f7d61 100644 --- a/README.md +++ b/README.md @@ -3,10 +3,6 @@ English | [简体中文](README_ch.md)

- - ------------------------------------------------------------------------------------------- -

@@ -32,60 +28,42 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools - [more](./doc/doc_en/update_en.md) -## Features -- PP-OCR - A series of high-quality pre-trained models, comparable to commercial products - - Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition 8.5M) = 13.0M - - Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M - - General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M - - Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition - - Support multi-lingual recognition: about 80 languages like Korean, Japanese, German, French, etc -- PP-Structure: a document structurize system - - Support layout analysis and table recognition (support export to Excel) - - Support key information extraction - - Support DocVQA -- Rich OCR toolkit - - Semi-automatic data annotation tool, i.e., PPOCRLabel: support fast and efficient data annotation - - Data synthesis tool, i.e., Style-Text: easy to synthesize a large number of images which are similar to the target scene image -- Support user-defined training, provides rich predictive inference deployment solutions -- Support PIP installation, easy to use -- Support Linux, Windows, MacOS and other systems - -## Visualization -

- - - -
+## Features -The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md). +PaddleOCR supports a variety of cutting-edge OCR-related algorithms, and on this basis develops the industrial-grade featured models/solutions [PP-OCR](./doc/doc_en/ppocr_introduction_en.md) and [PP-Structure](./ppstructure/README.md), covering the whole process of data production, model training, compression, inference and deployment. - -## Community -- Scan the QR code below with your Wechat, you can join the official technical discussion group. Looking forward to your participation. +![](./doc/features_en.png) -
- -
+> It is recommended to start with the “quick experience” in the document tutorial. ## Quick Experience -You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr) +- Web online experience for the ultra-lightweight OCR: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr) +- Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) +- Quick use with one line of code: [Quick Start](./doc/doc_en/quickstart_en.md) + + + +## E-book: *Dive Into OCR* +- [Dive Into OCR 📚](./doc/doc_en/ocr_book_en.md) + + + +## Community -Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) +- **Join us**👬: Scan the QR code below with WeChat to join the official technical discussion group. Looking forward to your participation. +- **Contribution**🏅️: The [Contribution page](./doc/doc_en/thirdparty.md) showcases tools and applications developed by community developers using PaddleOCR, as well as features, documentation improvements and code contributed to PaddleOCR. It is an official honor wall for community developers and a broadcasting station that helps publicize high-quality projects. +- **Regular Season**🎁: The community regular season is a points competition for OCR developers, covering four categories: documents, code, models and applications. Awards are granted on a quarterly basis; please refer to the [link](https://github.com/PaddlePaddle/PaddleOCR/issues/4982) for details. - Also, you can scan the QR code below to install the App (**Android support only**)
- +
-- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md) - - ## PP-OCR Series Model List(Update on September 8th) | Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model | @@ -95,41 +73,48 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr | Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) | -For more model downloads (including multiple languages), please refer to [PP-OCR series model downloads](./doc/doc_en/models_list_en.md). - -For a new language request, please refer to [Guideline for new language_requests](#language_requests). +- For more model downloads (including multiple languages), please refer to [PP-OCR series model downloads](./doc/doc_en/models_list_en.md). +- For a new language request, please refer to [Guideline for new language_requests](#language_requests). +- For structural document analysis models, please refer to [PP-Structure models](./ppstructure/docs/models_list_en.md). 
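The model names in the table above follow a regular pattern: `ch_ppocr_{series}_v2.0_{task}` with `_infer`/`_train` tar archives, and the mobile direction classifier shared by both series. A minimal sketch of that naming pattern, assuming only what the listed links show (`model_url` is a hypothetical helper for illustration, not part of the PaddleOCR toolkit):

```python
# Sketch of the PP-OCR v2.0 model URL naming pattern shown in the table above.
# model_url() is a hypothetical helper, not a PaddleOCR API.
BASE = "https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch"

def model_url(series, task, kind="infer"):
    """series: 'mobile' or 'server'; task: 'det', 'cls', or 'rec';
    kind: 'infer' (inference model) or 'train' (trained model)."""
    if task == "cls":
        series = "mobile"  # only a mobile direction classifier is released
    return f"{BASE}/ch_ppocr_{series}_v2.0_{task}_{kind}.tar"

print(model_url("server", "det"))
# -> https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar
```

Note that even the server rows link the mobile classifier, which the `cls` branch above mirrors.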
## Tutorials - [Environment Preparation](./doc/doc_en/environment_en.md) - [Quick Start](./doc/doc_en/quickstart_en.md) -- [PaddleOCR Overview and Project Clone](./doc/doc_en/paddleOCR_overview_en.md) -- PP-OCR Industry Landing: from Training to Deployment - - [PP-OCR Model Zoo](./doc/doc_en/models_en.md) - - [PP-OCR Model Download](./doc/doc_en/models_list_en.md) - - [Python Inference for PP-OCR Model Zoo](./doc/doc_en/inference_ppocr_en.md) - - [PP-OCR Training](./doc/doc_en/training_en.md) +- [PP-OCR 🔥](./doc/doc_en/ppocr_introduction_en.md) + - [Quick Start](./doc/doc_en/quickstart_en.md) + - [Model Zoo](./doc/doc_en/models_en.md) + - [Model training](./doc/doc_en/training_en.md) - [Text Detection](./doc/doc_en/detection_en.md) - [Text Recognition](./doc/doc_en/recognition_en.md) - [Text Direction Classification](./doc/doc_en/angle_class_en.md) - - [Yml Configuration](./doc/doc_en/config_en.md) - - PP-OCR Models Compression - - [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md) + - Model Compression - [Model Quantization](./deploy/slim/quantization/README_en.md) - [Model Pruning](./deploy/slim/prune/README_en.md) + - [Knowledge Distillation](./doc/doc_en/knowledge_distillation_en.md) - [Inference and Deployment](./deploy/README.md) - - [C++ Inference](./deploy/cpp_infer/readme_en.md) + - [Python Inference](./doc/doc_en/inference_ppocr_en.md) + - [C++ Inference](./deploy/cpp_infer/readme.md) - [Serving](./deploy/pdserving/README.md) - - [Mobile](./deploy/lite/readme_en.md) + - [Mobile](./deploy/lite/readme.md) + - [Paddle2ONNX](./deploy/paddle2onnx/readme.md) - [Benchmark](./doc/doc_en/benchmark_en.md) -- [PP-Structure: Information Extraction](./ppstructure/README.md) - - [Layout Parser](./ppstructure/layout/README.md) - - [Table Recognition](./ppstructure/table/README.md) - - [DocVQA](./ppstructure/vqa/README.md) - - [Key Information Extraction](./ppstructure/docs/kie.md) -- Academic Circles - - [Two-stage 
Algorithm](./doc/doc_en/algorithm_overview_en.md) - - [PGNet Algorithm](./doc/doc_en/pgnet_en.md) - - [Python Inference](./doc/doc_en/inference_en.md) +- [PP-Structure 🔥](./ppstructure/README.md) + - [Quick Start](./ppstructure/docs/quickstart_en.md) + - [Model Zoo](./ppstructure/docs/models_list_en.md) + - [Model training](./doc/doc_en/training_en.md) + - [Layout Parser](./ppstructure/layout/README.md) + - [Table Recognition](./ppstructure/table/README.md) + - [DocVQA](./ppstructure/vqa/README.md) + - [Key Information Extraction](./ppstructure/docs/kie_en.md) + - [Inference and Deployment](./deploy/README.md) + - [Python Inference](./ppstructure/docs/inference_en.md) + - [C++ Inference]() + - [Serving](./deploy/pdserving/README.md) +- [Academic algorithms](./doc/doc_en/algorithms_en.md) + - [Text detection](./doc/doc_en/algorithm_overview_en.md) + - [Text recognition](./doc/doc_en/algorithm_overview_en.md) + - [End-to-end](./doc/doc_en/algorithm_overview_en.md) + - [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md) - Data Annotation and Synthesis - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md) - [Data Synthesis Tool: Style-Text](./StyleText/README.md) @@ -139,28 +124,16 @@ For a new language request, please refer to [Guideline for new language_requests - [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md) - [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md) - [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md) +- [Code Structure](./doc/doc_en/tree_en.md) - [Visualization](#Visualization) +- [Community](#Community) - [New language requests](#language_requests) - [FAQ](./doc/doc_en/FAQ_en.md) -- [Community](#Community) - [References](./doc/doc_en/reference_en.md) - [License](#LICENSE) -- [Contribution](#CONTRIBUTION) - - - -## PP-OCRv2 Pipeline -
- -
- -[1] PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). - -[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144). - - + ## Visualization [more](./doc/doc_en/visualization_en.md) - Chinese OCR model
@@ -197,20 +170,4 @@ More details, please refer to [Multilingual OCR Development Plan](https://github ## License -This project is released under Apache 2.0 license - - -## Contribution -We welcome all the contributions to PaddleOCR and appreciate for your feedback very much. - -- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation. -- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualize function、add .gitignore and discard set PYTHONPATH manually. -- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure. -- Thanks [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets. -- Thanks [authorfu](https://github.com/authorfu) for contributing Android demo and [xiadeye](https://github.com/xiadeye) contributing iOS demo, respectively. -- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style. -- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services. -- Thanks [lijinhan](https://github.com/lijinhan) for contributing a new way, i.e., java SpringBoot, to achieve the request for the Hubserving deployment. -- Thanks [Mejans](https://github.com/Mejans) for contributing the Occitan corpus and character set. -- Thanks [LKKlein](https://github.com/LKKlein) for contributing a new deploying package with the Golang program language. 
-- Thanks [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself) and [1084667371](https://github.com/1084667371) for contributing a new data annotation tool, i.e., PPOCRLabel。 +This project is released under Apache 2.0 license \ No newline at end of file diff --git a/README_ch.md b/README_ch.md index fdf7bc844128bb026cffce8f2dd419cf05545ba8..d3b26ee9d99c839ac9823dd635dabe41f13f5d31 100755 --- a/README_ch.md +++ b/README_ch.md @@ -75,7 +75,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 | 中英文超轻量PP-OCR mobile模型(9.4M) | ch_ppocr_mobile_v2.0_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | | 中英文通用PP-OCR server模型(143.4M) | ch_ppocr_server_v2.0_xx | 服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | -更多模型下载(包括多语言),可以参考[PP-OCR 系列模型下载](./doc/doc_ch/models_list.md),文档分析相关模型参考[PP-Structure 系列模型下载](./doc/ppstructure/models_list.md) 
+更多模型下载(包括多语言),可以参考[PP-OCR 系列模型下载](./doc/doc_ch/models_list.md),文档分析相关模型参考[PP-Structure 系列模型下载](./ppstructure/docs/models_list.md) ## 文档教程 diff --git a/doc/doc_en/ocr_book_en.md b/doc/doc_en/ocr_book_en.md new file mode 100644 index 0000000000000000000000000000000000000000..bbf202cbde31c25ef7da771fa03ad0819f2b7c4e --- /dev/null +++ b/doc/doc_en/ocr_book_en.md @@ -0,0 +1 @@ +# E-book: *Dive Into OCR* \ No newline at end of file diff --git a/doc/features_en.png b/doc/features_en.png new file mode 100644 index 0000000000000000000000000000000000000000..9f0a66299bb5e922257e3327b0c6cf2d3ebfe05b Binary files /dev/null and b/doc/features_en.png differ diff --git a/ppstructure/docs/inference_en.md b/ppstructure/docs/inference_en.md new file mode 100644 index 0000000000000000000000000000000000000000..bfcdbd0c07da6e3a9168c3b7464183ac5dfba536 --- /dev/null +++ b/ppstructure/docs/inference_en.md @@ -0,0 +1,50 @@ +# Python Inference + +- [Layout Analysis + Table Recognition](#1) +- [DocVQA](#2) + + +## 1. Layout Analysis + Table Recognition + +```bash +cd ppstructure + +# download models +mkdir inference && cd inference +# Download the PP-OCRv2 text detection model and unzip it +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar +# Download the PP-OCRv2 text recognition model and unzip it +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar +# Download the ultra-lightweight English table structure model and unzip it +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar +cd ..
+ +python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \ + --rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \ + --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \ + --image_dir=../doc/table/1.png \ + --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \ + --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \ + --output=../output/table \ + --vis_font_path=../doc/fonts/simfang.ttf +``` After running, each image will have a directory with the same name under the `table` directory in the directory specified by the `output` field. Each table in the image will be stored as an Excel file, the image regions will be cropped and saved, and the Excel files and cropped images are named after the table's coordinates in the image. + +## 2. DocVQA + +```bash +cd ppstructure + +# download models +mkdir inference && cd inference +# Download the SER xfun model and unzip it +wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar +cd .. + +python3 predict_system.py --model_name_or_path=vqa/PP-Layout_v1.0_ser_pretrained/ \ + --mode=vqa \ + --image_dir=vqa/images/input/zh_val_0.jpg \ + --vis_font_path=../doc/fonts/simfang.ttf +``` +After running, the visualized images are stored under the `vqa` directory in the directory specified by the `output` field, with the same names as the input images. \ No newline at end of file diff --git a/ppstructure/docs/models_list_en.md b/ppstructure/docs/models_list_en.md new file mode 100644 index 0000000000000000000000000000000000000000..c7dab999ff6e370c56c5495e22e91f117b3d1275 --- /dev/null +++ b/ppstructure/docs/models_list_en.md @@ -0,0 +1,56 @@ +# PP-Structure Model List + +- [1. Layout Analysis Models](#1) +- [2. OCR and Table Recognition Models](#2) + - [2.1 OCR](#21) + - [2.2 Table Recognition Models](#22) +- [3. VQA Models](#3) +- [4. KIE Models](#4) + + + +## 1.
Layout Analysis Models + +|Model name|Description|Download|label_map| +| --- | --- | --- | --- | +| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; can detect 5 types of regions: **text, title, table, figure and list** | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) / [trained model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet_pretrained.pdparams) |{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}| +| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset; detects tables only | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | {0:"Table"}| +| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset; detects tables only | [inference model](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) | {0:"Table"}| + +## 2. OCR and Table Recognition Models + +### 2.1 OCR + +|Model name|Description|Inference model size|Download| +| --- | --- | --- | --- | +|en_ppocr_mobile_v2.0_table_det|Text detection for English table scenes, trained on PubLayNet|4.7M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_det_train.tar) | +|en_ppocr_mobile_v2.0_table_rec|Text recognition for English table scenes, trained on PubLayNet|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_rec_train.tar) | + +To use other OCR models, download them from the [PP-OCR model_list](../../doc/doc_ch/models_list.md), or use your own trained models by setting the `det_model_dir` and `rec_model_dir` fields. + +### 2.2 Table Recognition Models + +|Model name|Description|Inference model size|Download| +| --- | --- | --- | --- | +|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenes, trained on PubLayNet|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) /
[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) | + +## 3. VQA Models + +|Model name|Description|Inference model size|Download| +| --- | --- | --- | --- | +|ser_LayoutXLM_xfun_zh|SER model trained on the xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) | +|re_LayoutXLM_xfun_zh|RE model trained on the xfun Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) | +|ser_LayoutLMv2_xfun_zh|SER model trained on the xfun Chinese dataset based on LayoutLMv2|778M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLMv2_xfun_zh.tar) | +|re_LayoutLMv2_xfun_zh|RE model trained on the xfun Chinese dataset based on LayoutLMv2|765M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutLMv2_xfun_zh.tar) | +|ser_LayoutLM_xfun_zh|SER model trained on the xfun Chinese dataset based on LayoutLM|430M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutLM_xfun_zh.tar) | + +## 4. KIE Models + +|Model name|Description|Model size|Download| +| --- | --- | --- | --- | +|SDMGR|Key information extraction model|78M|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)| diff --git a/ppstructure/docs/quickstart_en.md b/ppstructure/docs/quickstart_en.md new file mode 100644 index 0000000000000000000000000000000000000000..45643de003c3bdf9c22d43dd9c1118026f8ae34f --- /dev/null +++ b/ppstructure/docs/quickstart_en.md @@ -0,0 +1,138 @@ +# PP-Structure Quick Start + +- [1. Install Dependencies](#1) +- [2. Quick Use](#2) + - [2.1 Use by Command Line](#21) + - [2.1.1 Layout Analysis + Table Recognition](#211) + - [2.1.2 DocVQA](#212) + - [2.2 Use by Python Script](#22) + - [2.2.1 Layout Analysis + Table Recognition](#221) + - [2.2.2 DocVQA](#222) + - [2.3 Returned Results](#23) + - [2.3.1 Layout Analysis + Table Recognition](#231) + - [2.3.2 DocVQA](#232) + - [2.4 Parameter Description](#24) + + + +## 1.
Install Dependencies + +```bash +# Install paddleocr; version 2.3.0.2+ is recommended +pip3 install "paddleocr>=2.3.0.2" +# Install the layout analysis dependency layoutparser (skip if layout analysis is not needed) +pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl +# Install the DocVQA dependency paddlenlp (skip if DocVQA is not needed) +pip3 install paddlenlp + +``` + +## 2. Quick Use + +### 2.1 Use by Command Line + +#### 2.1.1 Layout Analysis + Table Recognition +```bash +paddleocr --image_dir=../doc/table/1.png --type=structure +``` + +#### 2.1.2 DocVQA + +Please refer to: [Document Visual Question Answering](../vqa/README.md). + +### 2.2 Use by Python Script + +#### 2.2.1 Layout Analysis + Table Recognition + +```python +import os +import cv2 +from paddleocr import PPStructure, draw_structure_result, save_structure_res + +table_engine = PPStructure(show_log=True) + +save_folder = './output/table' +img_path = '../doc/table/1.png' +img = cv2.imread(img_path) +result = table_engine(img) +save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0]) + +for line in result: + line.pop('img') + print(line) + +from PIL import Image + +font_path = '../doc/fonts/simfang.ttf' # font file provided under PaddleOCR +image = Image.open(img_path).convert('RGB') +im_show = draw_structure_result(image, result, font_path=font_path) +im_show = Image.fromarray(im_show) +im_show.save('result.jpg') +``` + +#### 2.2.2 DocVQA + +Please refer to: [Document Visual Question Answering](../vqa/README.md). + +### 2.3 Returned Results +PP-Structure returns a list of dicts, for example: + +#### 2.3.1 Layout Analysis + Table Recognition +```shell +[ + { 'type': 'Text', + 'bbox': [34, 432, 345, 462], + 'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]], + [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)]) + } +] +``` +The fields in the dict are described below: + +| Field | Description | +| --------------- | -------------| +|type|type of the image region| +|bbox|coordinates of the image region in the original image: [top-left x, top-left y, bottom-right x, bottom-right y]| +|res|OCR or table recognition result of the image region.<br>Table: the HTML string of the table;<br>OCR: a tuple containing the detection coordinates and recognition results of each single line of text| + After running, each image will have a directory with the same name under the directory specified by the `output` field. Each table in the image will be stored as an Excel file, image regions will be cropped and saved, and the Excel files and images are named after the table's coordinates in the image. + ``` + /output/table/1/ + └─ res.txt + └─ [454, 360, 824, 658].xlsx table recognition result + └─ [16, 2, 828, 305].jpg cropped image region + └─ [17, 361, 404, 711].xlsx table recognition result + ``` + +#### 2.3.2 DocVQA + +Please refer to: [Document Visual Question Answering](../vqa/README.md). + +### 2.4 Parameter Description + +| Parameter | Description | Default | +| --------------- | ---------------------------------------- | ------------------------------------------- | +| output | directory where the Excel files and recognition results are saved | ./output/table | +| table_max_len | the long side of the image is resized to this length for table structure model prediction | 488 | +| table_model_dir | path of the table structure inference model | None | +| table_char_dict_path | path of the dictionary used by the table structure model | ../ppocr/utils/dict/table_structure_dict.txt | +| layout_path_model | path of the layout analysis model; can be an online URL or a local path. When it is a local path, layout_label_map needs to be specified; in command-line mode it can be set with --layout_label_map='{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}' | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config | +| layout_label_map | label map dict of the layout analysis model | None | +| model_name_or_path | path of the VQA SER model | None | +| max_seq_length | maximum token length supported by the VQA SER model | 512 | +| label_map_path | path of the VQA SER label file | ./vqa/labels/labels_ser.txt | +| mode | pipeline prediction mode; structure: layout analysis + table recognition; vqa: SER document information extraction | structure | + +Most parameters are consistent with the PaddleOCR whl package; see the [whl package documentation](../../doc/doc_ch/whl.md)
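As a minimal illustration of consuming the result format described in section 2.3.1 above (a list of dicts with `type`, `bbox`, `res`), the sketch below walks the list and extracts the region type, bbox area, and recognized text; `summarize` is a hypothetical helper written for this example, not part of the paddleocr package:

```python
# Post-process the PP-Structure result format from section 2.3.1:
# each region is a dict with 'type', 'bbox' ([x1, y1, x2, y2]) and 'res'.
# summarize() is a hypothetical helper, not part of paddleocr.

def summarize(result):
    """Return (type, bbox area, text) for each detected region."""
    rows = []
    for region in result:
        x1, y1, x2, y2 = region["bbox"]
        area = (x2 - x1) * (y2 - y1)
        if region["type"] == "Table":
            text = region["res"]  # table result is an HTML string
        else:
            _boxes, rec = region["res"]  # (det boxes, [(text, score), ...])
            text = " ".join(t for t, _score in rec)
        rows.append((region["type"], area, text))
    return rows

# Sample region copied from the 2.3.1 example output
sample = [{
    "type": "Text",
    "bbox": [34, 432, 345, 462],
    "res": ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0]],
            [("Tigure-6. The performance of CNN and IPT models", 0.9006)]),
}]
print(summarize(sample))
```

The same loop shape appears in the 2.2.1 script above, which pops the `img` key before printing each region.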