diff --git a/README.md b/README.md
index 259ccb5aa02352ca2a2b81bf81d858cec2b47081..835b1e2509ebca6f6d0dd71a53a7ec02a147efcf 100644
--- a/README.md
+++ b/README.md
@@ -19,12 +19,9 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
 **Recent updates**
-- 2021.12.21 OCR open source online course starts. The lesson starts at 8:30 every night and lasts for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
-- 2021.12.21 release PaddleOCR v2.4, release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR), 1 key information extraction algorithm (SDMGR, [tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/docs/kie.md)) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM, [tutorial](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4/ppstructure/vqa)).
-- PaddleOCR R&D team would like to share the key points of PP-OCRv2, at 20:15 pm on September 8th, [Course Address](https://aistudio.baidu.com/aistudio/education/group/info/6758).
-- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
-- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
-- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized.
+- 2021.12.21 release PaddleOCR v2.4, release 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR、SEED、SAR), 1 key information extraction algorithm (SDMGR, [tutorial](./ppstructure/docs/kie_en.md)) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM, [tutorial](./ppstructure/vqa)).
+- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
+- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](./ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
 - [more](./doc/doc_en/update_en.md)
@@ -81,7 +78,6 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 ## Tutorials
 - [Environment Preparation](./doc/doc_en/environment_en.md)
-- [Quick Start](./doc/doc_en/quickstart_en.md)
 - [PP-OCR 🔥](./doc/doc_en/ppocr_introduction_en.md)
     - [Quick Start](./doc/doc_en/quickstart_en.md)
     - [Model Zoo](./doc/doc_en/models_en.md)
diff --git a/README_ch.md b/README_ch.md
index c040853074b3b6f99895ee984ed9828140fa5713..1988bfb016d2a0d5bd343dd7e93d1e168773a25a 100755
--- a/README_ch.md
+++ b/README_ch.md
@@ -27,10 +27,9 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
 ## 近期更新
-- 2021.12.21《动手学OCR · 十讲》课程开讲,12月21日起每晚八点半线上授课![免费报名地址](https://aistudio.baidu.com/aistudio/course/introduce/25207)。
-- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR,[文档](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/docs/kie.md)),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM,[文档](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.4/ppstructure/vqa))。
-- 2021.9.7 发布PaddleOCR v2.3与[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
-- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
+- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR,[文档](./ppstructure/docs/kie.md)),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM,[文档](./ppstructure/vqa))。
+- 2021.9.7 发布PaddleOCR v2.3与[PP-OCRv2](./doc/doc_ch/ppocr_introduction.md#pp-ocrv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
+- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](./ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
 > [更多](./doc/doc_ch/update.md)
@@ -83,7 +82,6 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
 ## 文档教程
 - [运行环境准备](./doc/doc_ch/environment.md)
-- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md)
 - [PP-OCR文本检测识别🔥](./doc/doc_ch/ppocr_introduction.md)
     - [快速开始](./doc/doc_ch/quickstart.md)
     - [模型库](./doc/doc_ch/models_list.md)
diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md
index 6301755de8e41e497b83d54c897b2b939d758cdc..bc8e5a0a1eb0b0a299c94272142dbf21a5c75d91 100644
--- a/doc/doc_ch/quickstart.md
+++ b/doc/doc_ch/quickstart.md
@@ -110,7 +110,7 @@ cd /path/to/ppocr_img
 #### 2.1.2 多语言模型
-Paddleocr目前支持80个语种,可以通过修改`--lang`参数进行切换,对于英文模型,指定`--lang=en`, PP-OCRv3目前只支持中文和英文模型,其他多语言模型会陆续更新。
+PaddleOCR目前支持80个语种,可以通过修改`--lang`参数进行切换,对于英文模型,指定`--lang=en`, PP-OCRv3目前只支持中文和英文模型,其他多语言模型会陆续更新。
 ``` bash
 paddleocr --image_dir ./imgs_en/254.jpg --lang=en --rec_image_shape 3,48,320
diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md
index bf1ce05cf444937ae1565ed71f11ca82ee4f4f33..4a31924c7993960f1510816033b68af08c79b9a0 100644
--- a/doc/doc_en/quickstart_en.md
+++ b/doc/doc_en/quickstart_en.md
@@ -1,6 +1,6 @@
 # PaddleOCR Quick Start
-**Note:** this tutorial mainly introduces the usage of PP-OCR series models, please refer to [PP-Structure Quick Start](../../ppstructure/docs/quickstart_en.md) for the quick use of document analysis related functions.
+**Note:** This tutorial mainly introduces the usage of PP-OCR series models, please refer to [PP-Structure Quick Start](../../ppstructure/docs/quickstart_en.md) for the quick use of document analysis related functions.
 - [1. Installation](#1-installation)
   - [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
@@ -9,10 +9,8 @@
   - [2.1 Use by Command Line](#21-use-by-command-line)
     - [2.1.1 Chinese and English Model](#211-chinese-and-english-model)
     - [2.1.2 Multi-language Model](#212-multi-language-model)
-    - [2.1.3 Layout Analysis](#213-layout-analysis)
   - [2.2 Use by Code](#22-use-by-code)
     - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese--english-model-and-multilingual-model)
-    - [2.2.2 Layout Analysis](#222-layout-analysis)
 - [3. Summary](#3-summary)
@@ -128,7 +126,7 @@ If you need to use the 2.0 model, please specify the parameter `--version PP-OCR
 #### 2.1.2 Multi-language Model
-Paddleocr currently supports 80 languages, which can be switched by modifying the `--lang` parameter. PP-OCRv3 currently only supports Chinese and English models, and other multilingual models will be updated one after another.
+PaddleOCR currently supports 80 languages, which can be switched by modifying the `--lang` parameter. PP-OCRv3 currently only supports Chinese and English models, and other multilingual models will be updated one after another.
 ``` bash
 paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en --rec_image_shape 3,48,320
@@ -156,48 +154,7 @@ Commonly used multilingual abbreviations include
 | Chinese Traditional | chinese_cht | | Italian | it | | Russian | ru |
 A list of all languages and their corresponding abbreviations can be found in [Multi-Language Model Tutorial](./multi_languages_en.md)
-
-#### 2.1.3 Layout Analysis
-
-Layout analysis refers to the division of 5 types of areas of the document, including text, title, list, picture and table. For the first three types of regions, directly use the OCR model to complete the text detection and recognition of the corresponding regions, and save the results in txt. For the table area, after the table structuring process, the table picture is converted into an Excel file of the same table style. The picture area will be individually cropped into an image.
-
-To use the layout analysis function of PaddleOCR, you need to specify `--type=structure`
-
-```bash
-paddleocr --image_dir=../doc/table/1.png --type=structure
-```
-
-- **Results Format**
-
-  The returned results of PP-Structure is a list composed of a dict, an example is as follows
-
-  ```shell
-  [
-    { 'type': 'Text',
-      'bbox': [34, 432, 345, 462],
-      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
-              [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)])
-    }
-  ]
-  ```
-
-  The description of each field in dict is as follows
-
-  | Parameter | Description |
-  | --------- | ------------------------------------------------------------ |
-  | type | Type of image area |
-  | bbox | The coordinates of the image area in the original image, respectively [left upper x, left upper y, right bottom x, right bottom y] |
-  | res | OCR or table recognition result of image area。 Table: HTML string of the table; OCR: A tuple containing the detection coordinates and recognition results of each single line of text |
-
-- **Parameter Description:**
-
-  | Parameter | Description | Default value |
-  | --------------- | ------------------------------------------------------------ | -------------------------------------------- |
-  | output | The path where excel and recognition results are saved | ./output/table |
-  | table_max_len | The long side of the image is resized in table structure model | 488 |
-  | table_model_dir | inference model path of table structure model | None |
-  | table_char_dict_path | dict path of table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
@@ -245,40 +202,12 @@ Visualization of results
-
-
-#### 2.2.2 Layout Analysis
-```python
-import os
-import cv2
-from paddleocr import PPStructure,draw_structure_result,save_structure_res
-
-table_engine = PPStructure(show_log=True)
-
-save_folder = './output/table'
-img_path = './table/1.png'
-img = cv2.imread(img_path)
-result = table_engine(img)
-save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
-
-for line in result:
-    line.pop('img')
-    print(line)
-
-from PIL import Image
-
-font_path = './fonts/simfang.ttf'
-image = Image.open(img_path).convert('RGB')
-im_show = draw_structure_result(image, result,font_path=font_path)
-im_show = Image.fromarray(im_show)
-im_show.save('result.jpg')
-```
 ## 3. Summary
-In this section, you have mastered the use of PaddleOCR whl packages and obtained results.
+In this section, you have mastered the use of PaddleOCR whl package.
-PaddleOCR is a rich and practical OCR tool library that opens up the whole process of data, model training, compression and inference deployment, so in the [next section](./paddleOCR_overview_en.md) we will first introduce you to the overview of PaddleOCR, and then clone the PaddleOCR project to start the application journey of PaddleOCR.
+PaddleOCR is a rich and practical OCR tool library that get through the whole process of data production, model training, compression, inference and deployment, please refer to the [tutorials](../../README.md#tutorials) to start the journey of PaddleOCR.
\ No newline at end of file