diff --git a/PPOCRLabel/README.md b/PPOCRLabel/README.md
index 2d6e7f98bb2c2dc4d1c696628e45f4649bf84c1c..9e5b3245b0cfb56d300155a94f64d38edcdbb599 100644
--- a/PPOCRLabel/README.md
+++ b/PPOCRLabel/README.md
@@ -34,10 +34,10 @@ PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, w
pip3 install --upgrade pip
# If you have cuda9 or cuda10 installed on your machine, please run the following command to install
-python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
# If you only have cpu on your machine, please run the following command to install
-python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
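With the `==2.0.0` pin removed, pip installs whatever release the mirror currently serves, so the installed Paddle version may differ from the one a given PPOCRLabel release was tested with. The following is a minimal stdlib-only sketch for checking which Paddle wheel, if any, ended up installed. It assumes Python 3.8+ for `importlib.metadata`, and the helper name `installed_paddle_versions` is illustrative.

```python
from importlib import metadata

# Distribution names taken from the two pip commands above.
PADDLE_DISTRIBUTIONS = ("paddlepaddle", "paddlepaddle-gpu")


def installed_paddle_versions():
    """Return {distribution: version string, or None if not installed}."""
    versions = {}
    for name in PADDLE_DISTRIBUTIONS:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # this wheel is not installed here
    return versions


if __name__ == "__main__":
    for name, version in installed_paddle_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This only reads package metadata. It does not check whether a GPU build can actually see CUDA.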
diff --git a/PPOCRLabel/README_ch.md b/PPOCRLabel/README_ch.md
index ecc2ab600eaf6bcfe71923f7fc6a9de82fa54ba7..7f9351dfe185be2417162f2c786f5eec0b58816a 100644
--- a/PPOCRLabel/README_ch.md
+++ b/PPOCRLabel/README_ch.md
@@ -37,11 +37,11 @@ PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,内置P
pip3 install --upgrade pip
如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
-python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
如果您的机器是CPU,请运行以下命令安装
-python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
diff --git a/README.md b/README.md
index e3d0ff4eff3d22950a56159add6a87f350b3d78f..19b848772d8ac5ee91f76c00ff2f6a89f77c226c 100644
--- a/README.md
+++ b/README.md
@@ -82,7 +82,7 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr
-## PP-OCR series model list(Update on September 8th)
+## PP-OCR Series Model List (Updated on September 8th)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
@@ -106,7 +106,7 @@ For a new language request, please refer to [Guideline for new language_requests
- [PP-OCR Training](./doc/doc_en/training_en.md)
- [Text Detection](./doc/doc_en/detection_en.md)
- [Text Recognition](./doc/doc_en/recognition_en.md)
- - [Direction Classification](./doc/doc_en/angle_class_en.md)
+ - [Text Direction Classification](./doc/doc_en/angle_class_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
- Inference and Deployment
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
@@ -174,7 +174,7 @@ For a new language request, please refer to [Guideline for new language_requests
-## Guideline for new language requests
+## Guideline for New Language Requests
If you want to request a new language support, a PR with 2 following files are needed:
diff --git a/README_ch.md b/README_ch.md
index 799898e5de71a53b1dda9622e11c0aa7e716b1ca..f58bd3c711cd93b1e1d3b6a02f339d4512a3b872 100755
--- a/README_ch.md
+++ b/README_ch.md
@@ -98,7 +98,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- [PP-OCR模型训练](./doc/doc_ch/training.md)
- [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md)
- - [方向分类器](./doc/doc_ch/angle_class.md)
+ - [文本方向分类器](./doc/doc_ch/angle_class.md)
- [配置文件内容与生成](./doc/doc_ch/config.md)
- PP-OCR模型推理部署
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
diff --git a/deploy/cpp_infer/readme.md b/deploy/cpp_infer/readme.md
index 9bdd54669faec874e3cdad59f604882ab0bce010..f88d021d0a050aeecf859981cc2de1cee8f3a2c0 100644
--- a/deploy/cpp_infer/readme.md
+++ b/deploy/cpp_infer/readme.md
@@ -4,15 +4,32 @@
C++在性能计算上优于python,因此,在大多数CPU、GPU部署场景,多采用C++的部署方式,本节将介绍如何在Linux\Windows (CPU\GPU)环境下配置C++环境并完成
PaddleOCR模型部署。
+* [1. 准备环境](#1)
+ + [1.0 运行准备](#10)
+ + [1.1 编译opencv库](#11)
+ + [1.2 下载或者编译Paddle预测库](#12)
+ - [1.2.1 直接下载安装](#121)
+ - [1.2.2 预测库源码编译](#122)
+* [2 开始运行](#2)
+ + [2.1 将模型导出为inference model](#21)
+ + [2.2 编译PaddleOCR C++预测demo](#22)
+ + [2.3 运行demo](#23)
+
+
## 1. 准备环境
-### 运行准备
+
+
+### 1.0 运行准备
+
- Linux环境,推荐使用docker。
- Windows环境,目前支持基于`Visual Studio 2019 Community`进行编译。
* 该文档主要介绍基于Linux环境的PaddleOCR C++预测流程,如果需要在Windows下基于预测库进行C++预测,具体编译方法请参考[Windows下编译教程](./docs/windows_vs2019_build.md)
+
+
### 1.1 编译opencv库
* 首先需要从opencv官网上下载在Linux环境下源码编译的包,以opencv3.4.7为例,下载命令如下。
@@ -71,6 +88,8 @@ opencv3/
|-- share
```
+
+
### 1.2 下载或者编译Paddle预测库
* 有2种方式获取Paddle预测库,下面进行详细介绍。
@@ -132,9 +151,12 @@ build/paddle_inference_install_dir/
其中`paddle`就是C++预测所需的Paddle库,`version.txt`中包含当前预测库的版本信息。
+
## 2 开始运行
+
+
### 2.1 将模型导出为inference model
* 可以参考[模型预测章节](../../doc/doc_ch/inference.md),导出inference model,用于模型预测。模型导出之后,假设放在`inference`目录下,则目录结构如下。
@@ -149,6 +171,7 @@ inference/
| |--inference.pdmodel
```
+
### 2.2 编译PaddleOCR C++预测demo
@@ -172,13 +195,14 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
* 编译完成之后,会在`build`文件夹下生成一个名为`ppocr`的可执行文件。
+
-### 运行demo
+### 2.3 运行demo
运行方式:
```shell
./build/ppocr [--param1] [--param2] [...]
-```
+```
其中,`mode`为必选参数,表示选择的功能,取值范围['det', 'rec', 'system'],分别表示调用检测、识别、检测识别串联(包括方向分类器)。具体命令如下:
##### 1. 只调用检测:
@@ -258,6 +282,4 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
-### 2.3 注意
-
-* 在使用Paddle预测库时,推荐使用2.0.0版本的预测库。
+**注意:在使用Paddle预测库时,推荐使用2.0.0版本的预测库。**
diff --git a/deploy/cpp_infer/readme_en.md b/deploy/cpp_infer/readme_en.md
index 039aecf1ba3d6c1c717bafbecdb117416a1acc32..48de51ae726e662f48d465b8489a494448dafac1 100644
--- a/deploy/cpp_infer/readme_en.md
+++ b/deploy/cpp_infer/readme_en.md
@@ -1,4 +1,4 @@
-# Server-side C++ inference
+# Server-side C++ Inference
This chapter introduces the C++ deployment method of the PaddleOCR model, and the corresponding python predictive deployment method refers to [document](../../doc/doc_ch/inference.md).
C++ is better than python in terms of performance calculation. Therefore, in most CPU and GPU deployment scenarios, C++ deployment is mostly used.
@@ -6,14 +6,14 @@ This section will introduce how to configure the C++ environment and complete it
PaddleOCR model deployment.
-## 1. Prepare the environment
+## 1. Prepare the Environment
### Environment
- Linux, docker is recommended.
-### 1.1 Compile opencv
+### 1.1 Compile OpenCV
* First of all, you need to download the source code compiled package in the Linux environment from the opencv official website. Taking opencv3.4.7 as an example, the download command is as follows.
@@ -73,7 +73,7 @@ opencv3/
|-- share
```
-### 1.2 Compile or download or the Paddle inference library
+### 1.2 Compile or Download the Paddle Inference Library
* There are 2 ways to obtain the Paddle inference library, described in detail below.
@@ -136,7 +136,7 @@ build/paddle_inference_install_dir/
Among them, `paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library.
-## 2. Compile and run the demo
+## 2. Compile and Run the Demo
### 2.1 Export the inference model
@@ -183,7 +183,7 @@ or the generated Paddle inference library path (`build/paddle_inference_install_
Execute the built executable file:
```shell
./build/ppocr [--param1] [--param2] [...]
-```
+```
Here, `mode` is a required parameter,and the value range is ['det', 'rec', 'system'], representing using detection only, using recognition only and using the end-to-end system respectively. Specifically,
##### 1. run det demo:
diff --git a/doc/doc_ch/add_new_algorithm.md b/doc/doc_ch/add_new_algorithm.md
index f66e26b4c13ae19460c44d80b85eb253c2accfde..79c29249dd7dd0b25ffa7625d11ed2378bfafec4 100644
--- a/doc/doc_ch/add_new_algorithm.md
+++ b/doc/doc_ch/add_new_algorithm.md
@@ -2,16 +2,18 @@
PaddleOCR将一个算法分解为以下几个部分,并对各部分进行模块化处理,方便快速组合出新的算法。
-* 数据加载和处理
-* 网络
-* 后处理
-* 损失函数
-* 指标评估
-* 优化器
+* [1. 数据加载和处理](#1)
+* [2. 网络](#2)
+* [3. 后处理](#3)
+* [4. 损失函数](#4)
+* [5. 指标评估](#5)
+* [6. 优化器](#6)
下面将分别对每个部分进行介绍,并介绍如何在该部分里添加新算法所需模块。
-## 数据加载和处理
+
+
+## 1. 数据加载和处理
数据加载和处理由不同的模块(module)组成,其完成了图片的读取、数据增强和label的制作。这一部分在[ppocr/data](../../ppocr/data)下。 各个文件及文件夹作用说明如下:
@@ -64,7 +66,9 @@ transforms:
keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
```
-## 网络
+
+
+## 2. 网络
网络部分完成了网络的组网操作,PaddleOCR将网络划分为四部分,这一部分在[ppocr/modeling](../../ppocr/modeling)下。 进入网络的数据将按照顺序(transforms->backbones->
necks->heads)依次通过这四个部分。
@@ -123,7 +127,9 @@ Architecture:
args1: args1
```
-## 后处理
+
+
+## 3. 后处理
后处理实现解码网络输出获得文本框或者识别到的文字。这一部分在[ppocr/postprocess](../../ppocr/postprocess)下。
PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的后处理模块,对于没有内置的组件可通过如下步骤添加:
@@ -171,7 +177,9 @@ PostProcess:
args2: args2
```
-## 损失函数
+
+
+## 4. 损失函数
损失函数用于计算网络输出和label之间的距离。这一部分在[ppocr/losses](../../ppocr/losses)下。
PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的损失函数模块,对于没有内置的模块可通过如下步骤添加:
@@ -208,7 +216,9 @@ Loss:
args2: args2
```
-## 指标评估
+
+
+## 5. 指标评估
指标评估用于计算网络在当前batch上的性能。这一部分在[ppocr/metrics](../../ppocr/metrics)下。 PaddleOCR内置了检测,分类和识别等算法相关的指标评估模块,对于没有内置的模块可通过如下步骤添加:
@@ -262,7 +272,9 @@ Metric:
main_indicator: acc
```
-## 优化器
+
+
+## 6. 优化器
优化器用于训练网络。优化器内部还包含了网络正则化和学习率衰减模块。 这一部分在[ppocr/optimizer](../../ppocr/optimizer)下。 PaddleOCR内置了`Momentum`,`Adam`
和`RMSProp`等常用的优化器模块,`Linear`,`Cosine`,`Step`和`Piecewise`等常用的正则化模块与`L1Decay`和`L2Decay`等常用的学习率衰减模块。
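In each of the yml snippets above, a component is selected by `name` and followed by its own arguments (`args1`, `args2`). The following is a minimal sketch of that config-driven pattern, which assumes, as those snippets suggest, that the remaining keys are passed to the module's constructor. The dict-based registry and the names `BACKBONES`, `register_backbone`, `build_backbone` and `MyBackbone` are hypothetical and are not PaddleOCR's actual builder code.

```python
# Hypothetical registry mapping a yml `name` to a class.
BACKBONES = {}


def register_backbone(cls):
    """Class decorator that makes a backbone selectable by its class name."""
    BACKBONES[cls.__name__] = cls
    return cls


@register_backbone
class MyBackbone:  # hypothetical new module
    def __init__(self, in_channels=3, args1=None):
        self.in_channels = in_channels
        self.args1 = args1


def build_backbone(config):
    """Instantiate a backbone from a dict shaped like the yml `Backbone:` block."""
    config = dict(config)  # copy so the caller's config is not mutated
    name = config.pop("name")
    if name not in BACKBONES:
        raise ValueError(f"backbone {name!r} is not registered; "
                         f"available: {sorted(BACKBONES)}")
    return BACKBONES[name](**config)  # remaining keys become constructor args


backbone = build_backbone({"name": "MyBackbone", "args1": "args1"})
print(type(backbone).__name__, backbone.args1)  # MyBackbone args1
```

With this pattern, a new module only needs to be defined and registered. The yml then selects it by name without any change to the builder.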
diff --git a/doc/doc_ch/angle_class.md b/doc/doc_ch/angle_class.md
index 321b32ba48e599fb6e72f697fa438a3a54e33337..723d0d2ce5a8c25699c085c32004aa827b188fd9 100644
--- a/doc/doc_ch/angle_class.md
+++ b/doc/doc_ch/angle_class.md
@@ -1,14 +1,14 @@
# 文本方向分类器
-- [方法介绍](#方法介绍)
-- [数据准备](#数据准备)
-- [启动训练](#启动训练)
-- [训练](#训练)
-- [评估](#评估)
-- [预测](#预测)
+- [1. 方法介绍](#1-方法介绍)
+- [2. 数据准备](#2-数据准备)
+- [3. 启动训练](#3-启动训练)
+- [4. 训练](#4-训练)
+- [5. 评估](#5-评估)
+- [6. 预测](#6-预测)
-## 方法介绍
+## 1. 方法介绍
文本方向分类器主要用于图片非0度的场景下,在这种场景下需要对图片里检测到的文本行进行一个转正的操作。在PaddleOCR系统内,
文字检测之后得到的文本行图片经过仿射变换之后送入识别模型,此时只需要对文字进行一个0和180度的角度分类,因此PaddleOCR内置的
文本方向分类器**只支持了0和180度的分类**。如果想支持更多角度,可以自己修改算法进行支持。
@@ -18,7 +18,7 @@
![](../imgs_results/angle_class_example.jpg)
-## 数据准备
+## 2. 数据准备
请按如下步骤设置数据集:
@@ -70,7 +70,7 @@ train/cls/train/word_002.jpg 180
| ...
```
-## 启动训练
+## 3. 启动训练
将准备好的txt文件和图片文件夹路径分别写入配置文件的 `Train/Eval.dataset.label_file_list` 和 `Train/Eval.dataset.data_dir` 字段下,`Train/Eval.dataset.data_dir`字段下的路径和文件里记载的图片名构成了图片的绝对路径。
@@ -99,7 +99,7 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入
*由于OpenCV的兼容性问题,扰动操作暂时只支持linux*
-## 训练
+## 4. 训练
PaddleOCR支持训练和评估交替进行, 可以在 `configs/cls/cls_mv3.yml` 中修改 `eval_batch_step` 设置评估频率,默认每1000个iter评估一次。训练过程中将会保存如下内容:
```bash
@@ -118,7 +118,7 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/cls/cls_mv3.yml`
**注意,预测/评估时的配置文件请务必与训练一致。**
-## 评估
+## 5. 评估
评估数据集可以通过修改`configs/cls/cls_mv3.yml`文件里的`Eval.dataset.label_file_list` 字段设置。
@@ -129,7 +129,7 @@ python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/
```
-## 预测
+## 6. 预测
* 训练引擎的预测
diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md
index a729b900d4419706c35fa029f163fba3b4afec1e..600d5bdb120444ec89222360af02adb3f96a8640 100644
--- a/doc/doc_ch/config.md
+++ b/doc/doc_ch/config.md
@@ -1,5 +1,11 @@
# 配置文件内容与生成
+* [1. 可选参数列表](#1)
+* [2. 配置文件参数介绍](#2)
+* [3. 多语言配置文件生成](#3)
+
+
+
## 1. 可选参数列表
以下列表可以通过`--help`查看
@@ -9,11 +15,12 @@
| -c | ALL | 指定配置文件 | None | **配置模块说明请参考 参数介绍** |
| -o | ALL | 设置配置文件里的参数内容 | None | 使用-o配置相较于-c选择的配置文件具有更高的优先级。例如:`-o Global.use_gpu=false` |
+
## 2. 配置文件参数介绍
以 `rec_chinese_lite_train_v2.0.yml ` 为例
-### 2.1 Global
+### Global
| 字段 | 用途 | 默认值 | 备注 |
| :----------------------: | :---------------------: | :--------------: | :--------------------: |
@@ -124,6 +131,8 @@
| drop_last | 是否丢弃因数据集样本数不能被 batch_size 整除而产生的最后一个不完整的mini-batch | True | \ |
| num_workers | 用于加载数据的子进程个数,若为0即为不开启子进程,在主进程中进行数据加载 | 8 | \ |
+
+
## 3. 多语言配置文件生成
PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi_languages` 路径下提供了一个多语言的配置文件模版: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)。
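The `-o` option in the parameter list above overrides values from the `-c` file using dotted keys such as `Global.use_gpu=false`, and it takes priority over the file. The following is a minimal sketch of how such overrides could be merged into an already-loaded config dict, assuming simple scalar or literal values. The helpers `parse_value` and `apply_overrides` are hypothetical and are not PaddleOCR's own argument parser.

```python
import ast


def parse_value(text):
    """Turn an override string into a Python value where possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"  # yaml-style booleans such as `false`
    try:
        return ast.literal_eval(text)  # numbers, lists, quoted strings
    except (ValueError, SyntaxError, TypeError):
        return text  # fall back to a plain string, e.g. a directory path


def apply_overrides(config, overrides):
    """Apply `Section.key=value` strings on top of `config` in place."""
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = parse_value(raw)  # the -o value wins over the -c value
    return config


# Stand-in for a config loaded from the -c yml file.
config = {"Global": {"use_gpu": True, "epoch_num": 500}}
apply_overrides(config, ["Global.use_gpu=false", "Global.save_model_dir=./output/rec"])
print(config)
```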
diff --git a/doc/doc_ch/quickstart.md b/doc/doc_ch/quickstart.md
index 8f57489059e7f8ac1fde11d9e5c382e2f7e85a18..1896d7a137f0768c6b2a8e0c02b18ff61fbfd03c 100644
--- a/doc/doc_ch/quickstart.md
+++ b/doc/doc_ch/quickstart.md
@@ -90,10 +90,10 @@ cd /path/to/ppocr_img
```
-如需使用2.0模型,请指定参数`--version 2.0`,paddleocr默认使用2.1模型。更多whl包使用可参考[whl包文档](./whl.md)
-
+如需使用2.0模型,请指定参数`--version PP-OCR`,paddleocr默认使用2.1模型(`--version PP-OCRv2`)。更多whl包使用可参考[whl包文档](./whl.md)
+
#### 2.1.2 多语言模型
Paddleocr目前支持80个语种,可以通过修改`--lang`参数进行切换,对于英文模型,指定`--lang=en`。
diff --git a/doc/doc_en/angle_class_en.md b/doc/doc_en/angle_class_en.md
index dd7cc1e4b916b9cdb7f99600710bcb844e790f90..b7fcd63e070318d3aab37714a1213ad9f56cb6fc 100644
--- a/doc/doc_en/angle_class_en.md
+++ b/doc/doc_en/angle_class_en.md
@@ -1,13 +1,14 @@
-# TEXT ANGLE CLASSIFICATION
+# Text Direction Classification
-- [Method Introduction](#method-introduction)
-- [Data Preparation](#data-preparation)
-- [Training](#training)
-- [Evaluation](#evaluation)
-- [Prediction](#prediction)
+- [1. Method Introduction](#1-method-introduction)
+- [2. Data Preparation](#2-data-preparation)
+- [3. Training](#3-training)
+- [4. Evaluation](#4-evaluation)
+- [5. Prediction](#5-prediction)
-## Method Introduction
+
+## 1. Method Introduction
The angle classification is used in the scene where the image is not 0 degrees. In this scene, it is necessary to perform a correction operation on the text line detected in the picture. In the PaddleOCR system,
The text line image obtained after text detection is sent to the recognition model after affine transformation. At this time, only a 0 and 180 degree angle classification of the text is required, so the built-in PaddleOCR text angle classifier **only supports 0 and 180 degree classification**. If you want to support more angles, you can modify the algorithm yourself to support.
@@ -16,7 +17,7 @@ Example of 0 and 180 degree data samples:
![](../imgs_results/angle_class_example.jpg)
-## Data Preparation
+## 2. Data Preparation
Please organize the dataset as follows:
@@ -72,7 +73,7 @@ containing all images (test) and a cls_gt_test.txt. The structure of the test se
| ...
```
-## Training
+## 3. Training
Write the prepared txt file and image folder path into the configuration file under the `Train/Eval.dataset.label_file_list` and `Train/Eval.dataset.data_dir` fields, the absolute path of the image consists of the `Train/Eval.dataset.data_dir` field and the image name recorded in the txt file.
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts.
@@ -117,7 +118,7 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
-## Evaluation
+## 4. Evaluation
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/cls/cls_mv3.yml` file.
@@ -127,7 +128,7 @@ export CUDA_VISIBLE_DEVICES=0
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
-## Prediction
+## 5. Prediction
* Training engine prediction
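The classifier described above only decides between 0 and 180 degrees. As a result, correcting a text line reduces to a 180-degree rotation whenever the `180` label (the label value used in the annotation file) is predicted. The following is a minimal pure-Python sketch that treats the image as a list of pixel rows. The names `rotate_180` and `correct_orientation` and the 0.9 confidence threshold are assumptions for illustration and are not PaddleOCR's predictor code.

```python
def rotate_180(image):
    """Rotate an image stored as a list of pixel rows by 180 degrees."""
    # Reversing the row order flips it vertically; reversing each row flips it horizontally.
    return [list(reversed(row)) for row in reversed(image)]


def correct_orientation(image, label, score, threshold=0.9):
    """Flip a cropped text line if the classifier confidently says it is upside down.

    `threshold` is an assumed confidence cut-off, not a value taken from this document.
    """
    if label == "180" and score > threshold:
        return rotate_180(image)
    return image


line = [[1, 2, 3],
        [4, 5, 6]]
print(correct_orientation(line, "180", 0.98))  # [[6, 5, 4], [3, 2, 1]]
print(correct_orientation(line, "180", 0.55))  # [[1, 2, 3], [4, 5, 6]]
```

The threshold means a low-confidence `180` prediction leaves the line untouched, which avoids flipping correctly oriented text on a doubtful prediction.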
diff --git a/doc/doc_en/benchmark_en.md b/doc/doc_en/benchmark_en.md
index 0d3ffaecc5bdffc4adeffdecf98b2978759cb4a5..5d3acd59560b6f966ecaacc3698f830e3bc5b149 100755
--- a/doc/doc_en/benchmark_en.md
+++ b/doc/doc_en/benchmark_en.md
@@ -1,8 +1,8 @@
-# BENCHMARK
+# Benchmark
This document gives the performance of the series models for Chinese and English recognition.
-## TEST DATA
+## Test Data
We collected 300 images for different real application scenarios to evaluate the overall OCR system, including contract samples, license plates, nameplates, train tickets, test sheets, forms, certificates, street view images, business cards, digital meter, etc. The following figure shows some images of the test set.
@@ -10,7 +10,7 @@ We collected 300 images for different real application scenarios to evaluate the
-## MEASUREMENT
+## Measurement
Explanation:
diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md
index 4ac6758ff642a58e265e12a0be8308d1fb8251c0..aa78263e4b73a3ac35250e5483a394ab77450c90 100644
--- a/doc/doc_en/config_en.md
+++ b/doc/doc_en/config_en.md
@@ -1,4 +1,12 @@
-## Optional parameter list
+# Configuration
+
+- [1. Optional Parameter List](#1-optional-parameter-list)
+- [2. Introduction to Global Parameters of Configuration File](#2-introduction-to-global-parameters-of-configuration-file)
+- [3. Multilingual Config File Generation](#3-multilingual-config-file-generation)
+
+
+
+## 1. Optional Parameter List
The following list can be viewed through `--help`
@@ -7,7 +15,9 @@ The following list can be viewed through `--help`
| -c | ALL | Specify configuration file to use | None | **Please refer to the parameter introduction for configuration file usage** |
| -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: -o Global.use_gpu=false |
-## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE
+
+
+## 2. Introduction to Global Parameters of Configuration File
Take rec_chinese_lite_train_v2.0.yml as an example
### Global
@@ -121,8 +131,9 @@ In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck
| drop_last | Whether to discard the last incomplete mini-batch because the number of samples in the data set cannot be divisible by batch_size | True | \ |
| num_workers | The number of sub-processes used to load data, if it is 0, the sub-process is not started, and the data is loaded in the main process | 8 | \ |
+
-## 3. MULTILINGUAL CONFIG FILE GENERATION
+## 3. Multilingual Config File Generation
PaddleOCR currently supports 80 (except Chinese) language recognition. A multi-language configuration file template is
provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)。
@@ -187,21 +198,21 @@ Italian is made up of Latin letters, so after executing the command, you will ge
...
character_type: it # language
character_dict_path: {path/of/dict} # path of dict
-
+
Train:
dataset:
name: SimpleDataSet
data_dir: train_data/ # root directory of training data
label_file_list: ["./train_data/train_list.txt"] # train label path
...
-
+
Eval:
dataset:
name: SimpleDataSet
data_dir: train_data/ # root directory of val data
label_file_list: ["./train_data/val_list.txt"] # val label path
...
-
+
```
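The tables of contents added in this PR link to headings through GitHub's auto-generated anchors. A typo fixed in a heading must therefore also be fixed in the link that points to it. The following is a rough approximation of GitHub's slug rule (lowercase, drop punctuation other than `-` and `_`, turn spaces into `-`) that can be used to check such links. It ignores the suffixes GitHub adds for duplicate headings and is not an official implementation. The function name `github_slug` is illustrative.

```python
import re


def github_slug(heading):
    """Approximate the anchor GitHub generates for a Markdown heading."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # \w is Unicode-aware, so CJK text is kept
    return slug.replace(" ", "-")


for heading in ["1. Optional Parameter List",
                "3. Multilingual Config File Generation",
                "1. 方法介绍"]:
    print(f"[{heading}](#{github_slug(heading)})")
```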
diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md
index 016cf929283c0b8dac5cd1f0b3c808c398186917..14180c6faa01ee2d5ba3e34986dd4a55facc4f25 100644
--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
@@ -1,23 +1,23 @@
-# TEXT DETECTION
+# Text Detection
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
-- [1. DATA AND WEIGHTS PREPARATIO](#1-data-and-weights-preparatio)
- * [1.1 DATA PREPARATION](#11-data-preparation)
- * [1.2 DOWNLOAD PRETRAINED MODEL](#12-download-pretrained-model)
-- [2. TRAINING](#2-training)
- * [2.1 START TRAINING](#21-start-training)
- * [2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING](#22-load-trained-model-and-continue-training)
- * [2.3 TRAINING WITH NEW BACKBONE](#23-training-with-new-backbone)
-- [3. EVALUATION AND TEST](#3-evaluation-and-test)
- * [3.1 EVALUATION](#31-evaluation)
- * [3.2 TEST](#32-test)
-- [4. INFERENCE](#4-inference)
-- [2. FAQ](#2-faq)
+- [1. Data and Weights Preparation](#1-data-and-weights-preparation)
+ * [1.1 Data Preparation](#11-data-preparation)
+ * [1.2 Download Pretrained Model](#12-download-pretrained-model)
+- [2. Training](#2-training)
+ * [2.1 Start Training](#21-start-training)
+ * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
+ * [2.3 Training with New Backbone](#23-training-with-new-backbone)
+- [3. Evaluation and Test](#3-evaluation-and-test)
+ * [3.1 Evaluation](#31-evaluation)
+ * [3.2 Test](#32-test)
+- [4. Inference](#4-inference)
+- [5. FAQ](#5-faq)
-# 1 DATA AND WEIGHTS PREPARATIO
+## 1. Data and Weights Preparation
-## 1.1 DATA PREPARATION
+### 1.1 Data Preparation
The icdar2015 dataset contains train set which has 1000 images obtained with wearable cameras and test set which has 500 images obtained with wearable cameras. The icdar2015 can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
@@ -59,7 +59,7 @@ The `points` in the dictionary represent the coordinates (x, y) of the four poin
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
-## 1.2 DOWNLOAD PRETRAINED MODEL
+### 1.2 Download Pretrained Model
First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) to replace backbone according to your needs.
And the responding download link of backbone pretrain weights can be found in (https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).
@@ -77,7 +77,7 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dyg
-# 2. TRAINING
+## 2. Training
-## 2.1 START TRAINING
+### 2.1 Start Training
*If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.*
```shell
@@ -101,7 +101,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
```
-## 2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING
+### 2.2 Load Trained Model and Continue Training
If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
For example:
@@ -112,7 +112,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
-## 2.3 TRAINING WITH NEW BACKBONE
+### 2.3 Training with New Backbone
The network part completes the construction of the network, and PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). The data entering the network will pass through these four parts in sequence(transforms->backbones->
necks->heads).
@@ -162,9 +162,9 @@ After adding the four-part modules of the network, you only need to configure th
**NOTE**: More details about replace Backbone and other mudule can be found in [doc](add_new_algorithm_en.md).
-# 3. EVALUATION AND TEST
+## 3. Evaluation and Test
-## 3.1 EVALUATION
+### 3.1 Evaluation
PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean(F-Score).
@@ -179,7 +179,7 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{pat
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST and SAST model.
-## 3.2 TEST
+### 3.2 Test
Test the detection result on a single image:
```shell
@@ -197,7 +197,7 @@ Test the detection result on all images in the folder:
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
```
-# 4. INFERENCE
+## 4. Inference
The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
@@ -220,7 +220,7 @@ If it is other detection algorithms, such as the EAST, the det_algorithm paramet
python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```
-# 2. FAQ
+## 5. FAQ
Q1: The prediction results of trained model and inference model are inconsistent?
**A**: Most of the problems are caused by the inconsistency of the pre-processing and post-processing parameters during the prediction of the trained model and the pre-processing and post-processing parameters during the prediction of the inference model. Taking the model trained by the det_mv3_db.yml configuration file as an example, the solution to the problem of inconsistent prediction results between the training model and the inference model is as follows:
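Section 3.1 reports Precision, Recall and Hmean (F-score) for the detection task. Hmean is the harmonic mean of precision and recall. The following is a minimal sketch that computes all three from match counts. The counts are made-up illustration values and not PaddleOCR results. The rule for deciding whether a predicted box matches a ground-truth box (for example, an IoU threshold) is outside the scope of this sketch.

```python
def detection_metrics(num_matched, num_pred, num_gt):
    """Precision, recall and Hmean (harmonic mean / F-score) from box counts."""
    precision = num_matched / num_pred if num_pred else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    hmean = 2 * precision * recall / (precision + recall)
    return precision, recall, hmean


# Hypothetical counts: 80 matched boxes, 100 predicted boxes, 90 ground-truth boxes.
p, r, h = detection_metrics(80, 100, 90)
print(f"precision={p:.4f} recall={r:.4f} hmean={h:.4f}")  # precision=0.8000 recall=0.8889 hmean=0.8421
```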
diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md
index e30355fb8e29031bd4ce040a86ad0f57d18ce398..b445232feeefadc355e0f38b329050e26ccc0368 100755
--- a/doc/doc_en/inference_en.md
+++ b/doc/doc_en/inference_en.md
@@ -1,5 +1,5 @@
-# Reasoning based on Python prediction engine
+# Inference Based on Python Prediction Engine
The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
@@ -10,37 +10,36 @@ For more details, please refer to the document [Classification Framework](https:
Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, angle class, and the concatenation of them based on inference model.
-- [CONVERT TRAINING MODEL TO INFERENCE MODEL](#CONVERT)
- - [Convert detection model to inference model](#Convert_detection_model)
- - [Convert recognition model to inference model](#Convert_recognition_model)
- - [Convert angle classification model to inference model](#Convert_angle_class_model)
+- [1. Convert Training Model to Inference Model](#CONVERT)
+ - [1.1 Convert Detection Model to Inference Model](#Convert_detection_model)
+ - [1.2 Convert Recognition Model to Inference Model](#Convert_recognition_model)
+ - [1.3 Convert Angle Classification Model to Inference Model](#Convert_angle_class_model)
-- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
- - [1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE](#LIGHTWEIGHT_DETECTION)
- - [2. DB TEXT DETECTION MODEL INFERENCE](#DB_DETECTION)
- - [3. EAST TEXT DETECTION MODEL INFERENCE](#EAST_DETECTION)
- - [4. SAST TEXT DETECTION MODEL INFERENCE](#SAST_DETECTION)
- - [5. Multilingual model inference](#Multilingual model inference)
+- [2. Text Detection Model Inference](#DETECTION_MODEL_INFERENCE)
+ - [2.1 Lightweight Chinese Detection Model Inference](#LIGHTWEIGHT_DETECTION)
+ - [2.2 DB Text Detection Model Inference](#DB_DETECTION)
+ - [2.3 EAST Text Detection Model Inference](#EAST_DETECTION)
+ - [2.4 SAST Text Detection Model Inference](#SAST_DETECTION)
+
+- [3. Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE)
+ - [3.1 Lightweight Chinese Text Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION)
+ - [3.2 CTC-Based Text Recognition Model Inference](#CTC-BASED_RECOGNITION)
+ - [3.3 SRN-Based Text Recognition Model Inference](#SRN-BASED_RECOGNITION)
+ - [3.4 Text Recognition Model Inference Using Custom Characters Dictionary](#USING_CUSTOM_CHARACTERS)
+ - [3.5 Multilingual Model Inference](#MULTILINGUAL_MODEL_INFERENCE)
-- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
- - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
- - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
- - [3. SRN-BASED TEXT RECOGNITION MODEL INFERENCE](#SRN-BASED_RECOGNITION)
- - [3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
- - [4. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
+- [4. Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE)
-- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- - [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
-
-- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
- - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_CHINESE_MODEL)
- - [2. OTHER MODELS](#OTHER_MODELS)
+- [5. Text Detection, Angle Classification and Recognition Inference Concatenation](#CONCATENATION)
+ - [5.1 Lightweight Chinese Model](#LIGHTWEIGHT_CHINESE_MODEL)
+ - [5.2 Other Models](#OTHER_MODELS)
-## CONVERT TRAINING MODEL TO INFERENCE MODEL
+## 1. Convert Training Model to Inference Model
-### Convert detection model to inference model
+
+### 1.1 Convert Detection Model to Inference Model
Download the lightweight Chinese detection model:
```
@@ -67,7 +66,7 @@ inference/det_db/
```
-### Convert recognition model to inference model
+### 1.2 Convert Recognition Model to Inference Model
Download the lightweight Chinese recognition model:
```
@@ -95,7 +94,7 @@ inference/det_db/
```
-### Convert angle classification model to inference model
+### 1.3 Convert Angle Classification Model to Inference Model
Download the angle classification model:
```
@@ -122,13 +121,13 @@ inference/det_db/
-## TEXT DETECTION MODEL INFERENCE
+## 2. Text Detection Model Inference
The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model.
Because EAST and DB algorithms are very different, when inference, it is necessary to **adapt the EAST text detection algorithm by passing in corresponding parameters**.
-### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE
+### 2.1 Lightweight Chinese Detection Model Inference
For lightweight Chinese detection model inference, you can execute the following commands:
@@ -163,7 +162,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_di
```
-### 2. DB TEXT DETECTION MODEL INFERENCE
+### 2.2 DB Text Detection Model Inference
First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)), you can use the following command to convert:
@@ -184,7 +183,7 @@ The visualized text detection results are saved to the `./inference_results` fol
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images.
-### 3. EAST TEXT DETECTION MODEL INFERENCE
+### 2.3 EAST Text Detection Model Inference
First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)), you can use the following command to convert:
@@ -205,7 +204,7 @@ The visualized text detection results are saved to the `./inference_results` fol
-### 4. SAST TEXT DETECTION MODEL INFERENCE
+### 2.4 SAST Text Detection Model Inference
#### (1). Quadrangle text detection model (ICDAR2015)
First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)), you can use the following command to convert:
@@ -243,13 +242,13 @@ The visualized text detection results are saved to the `./inference_results` fol
**Note**: The locality-aware NMS used in SAST post-processing has two versions, Python and C++. The C++ version is noticeably faster than the Python version. Because of a compilation-version issue with the C++ NMS, the C++ version is only called in a Python 3.5 environment; the Python version is called in all other cases.
-## TEXT RECOGNITION MODEL INFERENCE
+## 3. Text Recognition Model Inference
The following sections introduce inference with the lightweight Chinese recognition model and with other CTC-based and Attention-based text recognition models. For Chinese text recognition, it is recommended to choose a recognition model based on CTC loss; in practice, models based on Attention loss have been found to perform worse than those based on CTC loss. In addition, if the character dictionary was modified during training, make sure you use the same character set during inference. See below for details.
-### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE
+### 3.1 Lightweight Chinese Text Recognition Model Inference
For lightweight Chinese recognition model inference, you can execute the following commands:
@@ -269,7 +268,7 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658)
```
-### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE
+### 3.2 CTC-Based Text Recognition Model Inference
Taking CRNN as an example, we introduce the recognition model inference based on CTC loss. Rosetta and Star-Net are used in a similar way, and there is no need to set the recognition algorithm parameter `rec_algorithm`.
@@ -292,6 +291,7 @@ After executing the command, the recognition result of the above image is as fol
```bash
Predicts of ./doc/imgs_words_en/word_336.png:('super', 0.9999073)
```
+
**Note**: Since the above model follows the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, its training differs from that of the lightweight Chinese recognition model in two aspects:
- The image resolution used in training is different: the above model was trained with an image resolution of [3, 32, 100], while our Chinese model was trained with [3, 32, 320] to ensure good recognition of long text. The default shape parameter at the inference stage is the resolution used in training, that is, [3, 32, 320]. Therefore, when running inference with the above English model, you need to set the shape of the recognition image through the parameter `rec_image_shape`.
@@ -304,7 +304,7 @@ dict_character = list(self.character_str)
```
-### 3. SRN-BASED TEXT RECOGNITION MODEL INFERENCE
+### 3.3 SRN-Based Text Recognition Model Inference
The recognition model based on SRN additionally requires the recognition algorithm parameter
`--rec_algorithm="SRN"` to be set. At the same time, it is necessary to ensure that the predicted shape is consistent
@@ -319,7 +319,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
```
-### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
+### 3.4 Text Recognition Model Inference Using Custom Character Dictionary
If the text dictionary was modified during training, you need to specify the dictionary path with `--rec_char_dict_path` and set `rec_char_type=ch` when predicting with the inference model.
```
@@ -327,7 +327,8 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
```
-### 5. MULTILINGAUL MODEL INFERENCE
+
+### 3.5 Multilingual Model Inference
To run inference with models for other languages, you need to specify the dictionary path with `--rec_char_dict_path`. At the same time, to get correct visualization results,
you need to specify the font path with `--vis_font_path`. Fonts for several minor languages are provided by default under the `doc/fonts` path. For example, for Korean recognition:
@@ -343,13 +344,7 @@ Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904)
```
-## ANGLE CLASSIFICATION MODEL INFERENCE
-
-The following will introduce the angle classification model inference.
-
-
-
-### 1.ANGLE CLASSIFICATION MODEL INFERENCE
+## 4. Angle Classification Model Inference
For angle classification model inference, you can execute the following commands:
@@ -371,10 +366,10 @@ After executing the command, the prediction results (classification angle and sc
```
-## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION
+## 5. Text Detection, Angle Classification and Recognition Inference Concatenation
-### 1. LIGHTWEIGHT CHINESE MODEL
+### 5.1 Lightweight Chinese Model
When performing prediction, specify the path of a single image or a folder of images with the parameter `image_dir`. The parameter `det_model_dir` specifies the path of the detection inference model, `cls_model_dir` specifies the path of the angle classification inference model, and `rec_model_dir` specifies the path of the recognition inference model. The parameter `use_angle_cls` controls whether the angle classification model is enabled. The parameter `use_mp` specifies whether to use multi-process inference, and `total_process_num` specifies the number of processes when multi-process inference is used. The visualized recognition results are saved to the `./inference_results` folder by default.
@@ -388,14 +383,14 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --de
# use multi-process
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6
```
-```
+
After executing the command, the recognition result image is as follows:
![](../imgs_results/system_res_00018069.jpg)
-### 2. OTHER MODELS
+### 5.2 Other Models
If you want to try other detection or recognition algorithms, please refer to the text detection and text recognition model inference sections above and update the corresponding configuration and model.
diff --git a/doc/doc_en/models_en.md b/doc/doc_en/models_en.md
index 7226f76498eddcb42bc41d63250d12b6f46f94c8..37c4a174563abc68085a103e11e2ddb3bd954714 100644
--- a/doc/doc_en/models_en.md
+++ b/doc/doc_en/models_en.md
@@ -7,15 +7,13 @@ This section contains two parts. Firstly, [PP-OCR Model Download](./models_list_
Let's first understand some basic concepts.
-- [INTRODUCTION ABOUT OCR](#introduction-about-ocr)
- * [Basic concepts of OCR detection model](#basic-concepts-of-ocr-detection-model)
- * [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model)
- * [PP-OCR model](#pp-ocr-model)
- * [And a table of contents](#and-a-table-of-contents)
- * [On the right](#on-the-right)
+- [Introduction about OCR](#introduction-about-ocr)
+ * [Basic Concepts of OCR Detection Model](#basic-concepts-of-ocr-detection-model)
+ * [Basic Concepts of OCR Recognition Model](#basic-concepts-of-ocr-recognition-model)
+ * [PP-OCR Model](#pp-ocr-model)
-## 1. INTRODUCTION ABOUT OCR
+## 1. Introduction about OCR
This section briefly introduces the basic concepts of OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model.
@@ -24,7 +22,7 @@ OCR (Optical Character Recognition, Optical Character Recognition) is currently
OCR generally includes two parts: text detection and text recognition. The text detection module first uses a detection algorithm to locate text lines in the image, and then the recognition algorithm identifies the specific text in each text line.
-### 1.1 Basic concepts of OCR detection model
+### 1.1 Basic Concepts of OCR Detection Model
Text detection locates text areas in the image and usually marks each word or text line with a bounding box. Traditional text detection algorithms mostly rely on hand-crafted features; they are fast and work well in simple scenes, but their performance drops sharply in natural scenes. Currently, deep learning methods are mostly used.
@@ -34,14 +32,14 @@ Text detection algorithms based on deep learning can be roughly divided into the
3. Hybrid target detection and segmentation method.
-### 1.2 Basic concepts of OCR recognition model
+### 1.2 Basic Concepts of OCR Recognition Model
The input of an OCR recognition algorithm is generally a text line image with little background, in which the text occupies most of the image. Recognition algorithms can be divided into two types:
1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used algorithm combination is CNN+RNN+CTC. Some algorithms also try adding transformer modules to the network.
2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used algorithm combination is CNN+RNN+Attention.
-### 1.3 PP-OCR model
+### 1.3 PP-OCR Model
PaddleOCR integrates many OCR algorithms: text detection algorithms include DB, EAST, and SAST, and text recognition algorithms include CRNN, RARE, StarNet, Rosetta, and SRN, among others.
diff --git a/doc/doc_en/paddleOCR_overview_en.md b/doc/doc_en/paddleOCR_overview_en.md
index 403cd99415e08de198270fb5bfe1a43f297c5156..073c3ec889b2f21e9e40f5f7d1d6dc719e3dcac9 100644
--- a/doc/doc_en/paddleOCR_overview_en.md
+++ b/doc/doc_en/paddleOCR_overview_en.md
@@ -36,4 +36,4 @@ If you getting this error `OSError: [WinError 126] The specified module could no
Please try downloading the Shapely whl file from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
-Reference: [Solve shapely installation on windows](
\ No newline at end of file
+Reference: [Solve shapely installation on Windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
\ No newline at end of file
diff --git a/doc/doc_en/quickstart_en.md b/doc/doc_en/quickstart_en.md
index c4fb5068197c8fb655c1e3ddf4aa6143e7d558e2..0055d8f7a89d0d218d001ea94fd4c620de5d037f 100644
--- a/doc/doc_en/quickstart_en.md
+++ b/doc/doc_en/quickstart_en.md
@@ -5,7 +5,7 @@
+ [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
- + [2.1 Use by command line](#21-use-by-command-line)
+ + [2.1 Use by Command Line](#21-use-by-command-line)
- [2.1.1 English and Chinese Model](#211-english-and-chinese-model)
- [2.1.2 Multi-language Model](#212-multi-language-model)
- [2.1.3 Layout Analysis](#213-layoutAnalysis)
@@ -39,7 +39,7 @@ pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
-### 2.1 Use by command line
+### 2.1 Use by Command Line
PaddleOCR provides a series of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, and then switch to the corresponding directory in the terminal.
@@ -95,7 +95,7 @@ If you do not use the provided test image, you can replace the following `--imag
['PAIN', 0.990372]
```
-If you need to use the 2.0 model, please specify the parameter `--version 2.0`, paddleocr uses the 2.1 model by default. More whl package usage can be found in [whl package](./whl_en.md)
+If you need to use the 2.0 model, please specify the parameter `--version PP-OCR`; paddleocr uses the 2.1 model (`--version PP-OCRv2`) by default. For more whl package usage, see [whl package](./whl_en.md).
#### 2.1.2 Multi-language Model
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index 7ee0cb5fc084e10658fe02b03910431a074e84ce..0d42f3a768da7bf39e0e1512ce61f9d1965da6fe 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -1,24 +1,23 @@
-# TEXT RECOGNITION
+# Text Recognition
-- [1 DATA PREPARATION](#DATA_PREPARATION)
+- [1. Data Preparation](#DATA_PREPARATION)
    - [1.1 Custom Dataset](#Costom_Dataset)
- [1.2 Dataset Download](#Dataset_download)
- [1.3 Dictionary](#Dictionary)
- [1.4 Add Space Category](#Add_space_category)
-- [2 TRAINING](#TRAINING)
+- [2. Training](#TRAINING)
- [2.1 Data Augmentation](#Data_Augmentation)
- [2.2 General Training](#Training)
- [2.3 Multi-language Training](#Multi_language)
-- [3 EVALUATION](#EVALUATION)
+- [3. Evaluation](#EVALUATION)
-- [4 PREDICTION](#PREDICTION)
- - [4.1 Training engine prediction](#Training_engine_prediction)
-- [5 CONVERT TO INFERENCE MODEL](#Inference)
+- [4. Prediction](#PREDICTION)
+- [5. Convert to Inference Model](#Inference)
-## 1 DATA PREPARATION
+## 1. Data Preparation
PaddleOCR supports two data formats:
@@ -37,7 +36,7 @@ mklink /d /train_data/dataset
```
-### 1.1 Costom dataset
+### 1.1 Custom Dataset
If you want to use your own data for training, please refer to the following to organize your data.
@@ -85,7 +84,7 @@ Similar to the training set, the test set also needs to be provided a folder con
```
-### 1.2 Dataset download
+### 1.2 Dataset Download
- ICDAR2015
@@ -167,14 +166,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize the dictionary file, add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.
-### 1.4 Add space category
+### 1.4 Add Space Category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: use_space_char only takes effect when character_type=ch**
-## 2 TRAINING
+## 2. Training
### 2.1 Data Augmentation
@@ -363,7 +362,7 @@ Eval:
-## 3 EVALUATION
+## 3. Evaluation
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
@@ -373,7 +372,7 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
```
-## 4 PREDICTION
+## 4. Prediction
With a model trained by PaddleOCR, you can quickly get predictions using the following script.
@@ -437,7 +436,7 @@ infer_img: doc/imgs_words/ch/word_1.jpg
-## 5 CONVERT TO INFERENCE MODEL
+## 5. Convert to Inference Model
The recognition model is converted to the inference model in the same way as the detection, as follows:
diff --git a/doc/doc_en/training_en.md b/doc/doc_en/training_en.md
index bd82bd279a6849e5ba7c0a0f13f927a7158669c7..cb7996ee0c17a4e964727ccfd193ba65c1415ba3 100644
--- a/doc/doc_en/training_en.md
+++ b/doc/doc_en/training_en.md
@@ -1,14 +1,14 @@
-# MODEL TRAINING
+# Model Training
- [1. Yml Configuration](#1-Yml-Configuration)
-- [2. Basic concepts](#1-basic-concepts)
- * [2.1 Learning rate](#11-learning-rate)
+- [2. Basic Concepts](#1-basic-concepts)
+ * [2.1 Learning Rate](#11-learning-rate)
* [2.2 Regularization](#12-regularization)
- * [2.3 Evaluation indicators](#13-evaluation-indicators-)
-- [3. Data and vertical scenes](#2-data-and-vertical-scenes)
- * [3.1 Training data](#21-training-data)
- * [3.2 Vertical scene](#22-vertical-scene)
- * [3.3 Build your own data set](#23-build-your-own-data-set)
+ * [2.3 Evaluation Indicators](#13-evaluation-indicators-)
+- [3. Data and Vertical Scenes](#2-data-and-vertical-scenes)
+ * [3.1 Training Data](#21-training-data)
+ * [3.2 Vertical Scene](#22-vertical-scene)
+ * [3.3 Build Your Own Dataset](#23-build-your-own-data-set)
* [4. FAQ](#3-faq)
@@ -18,7 +18,7 @@ At the same time, it will briefly introduce the components of the PaddleOCR mode
-## 1. Yml configuration
+## 1. Yml Configuration
The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file and then builds a complete training process to train the model. To tune the model, you only need to modify the parameters in the configuration file, which is simple and convenient.
@@ -26,12 +26,12 @@ For the complete configuration file description, please refer to [Configuration
-## 2. Basic concepts
+## 2. Basic Concepts
During model training, some hyperparameters need to be adjusted manually to help the model reach the best metrics with the lowest loss. Different data volumes may require different hyperparameters. When you want to fine-tune on your own data or improve model performance, the following parameter adjustment strategies can serve as references:
-### 2.1 Learning rate
+### 2.1 Learning Rate
The learning rate is one of the most important hyperparameters for training neural networks. It represents the step size taken along the gradient toward the optimal solution of the loss function in each iteration.
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example:
@@ -68,7 +68,7 @@ Optimizer:
factor: 2.0e-05
```
-### 2.3 Evaluation indicators
+### 2.3 Evaluation Indicators
(1) Detection stage: First, evaluate according to the IoU between the detection box and the labeled box. If the IoU is greater than a certain threshold, the detection is judged to be correct. Unlike the boxes in general object detection, the detection boxes and labeled boxes here are represented by polygons. Detection precision: the percentage of correct detection boxes among all detection boxes, which mainly reflects false detections. Detection recall: the percentage of correct detection boxes among all labeled boxes, which mainly reflects missed detections.
@@ -78,11 +78,11 @@ Optimizer:
-## 3. Data and vertical scenes
+## 3. Data and Vertical Scenes
-### 3.1 Training data
+### 3.1 Training Data
The current open-source models, datasets, and data volumes are as follows:
@@ -99,14 +99,14 @@ Among them, the public data sets are all open source, users can search and downl
-### 3.2 Vertical scene
+### 3.2 Vertical Scene
PaddleOCR mainly focuses on general OCR. If you have requirements for a vertical scene, you can train your own model with PaddleOCR and vertical-domain data.
If labeled data is lacking, or you do not want to invest in research and development, it is recommended to call the open API directly, which covers some of the more common vertical categories.
-### 3.3 Build your own data set
+### 3.3 Build Your Own Dataset
Here are some tips for reference when building a dataset:
diff --git a/ppstructure/table/README_ch.md b/ppstructure/table/README_ch.md
index e580debaebd2425786e84bedb13301c2f0bb09d3..2e90ad33423da347b5a51444f2be53ed2eb67a7a 100644
--- a/ppstructure/table/README_ch.md
+++ b/ppstructure/table/README_ch.md
@@ -1,6 +1,16 @@
# Table Recognition
+* [1. Table Recognition Pipeline](#1)
+* [2. Performance](#2)
+* [3. Usage](#3)
+    + [3.1 Quick Start](#31)
+    + [3.2 Training](#32)
+    + [3.3 Evaluation](#33)
+    + [3.4 Prediction](#34)
+
+
## 1. Table Recognition Pipeline
+
Table recognition mainly consists of three models:
1. Single-line text detection: DB
2. Single-line text recognition: CRNN
@@ -17,6 +27,8 @@
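3. The coordinates and recognition results of the single-line text are combined with the cell coordinates to produce the recognition result of each cell.
4. The cell recognition results and the table structure are used together to construct the HTML string of the table.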
+
+
## 2. Performance
We evaluated the algorithm on the PubTabNet[1] evaluation dataset. The performance is as follows:
@@ -26,8 +38,9 @@
| EDD[2] | 88.3 |
| Ours | 93.32 |
+
## 3. Usage
-
+
### 3.1 Quick Start
```python
@@ -48,7 +61,7 @@ python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_ta
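After the run finishes, the Excel table for each image is saved to the directory specified by the `output` field.
Note: The above models are table recognition models trained on the PubLayNet dataset and only support English scanned documents. To recognize other scenarios, train your own models and then replace the three fields `det_model_dir`, `rec_model_dir`, and `table_model_dir`.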
-
+
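### 3.2 Training
This section only covers training of the table structure model. For training the [text detection](../../doc/doc_ch/detection.md) and [text recognition](../../doc/doc_ch/recognition.md) models, please refer to the corresponding documents.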
@@ -75,7 +88,7 @@ python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./yo
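**Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`. That is, when both parameters are specified, the model specified by `Global.checkpoints` is loaded first; if the model path specified by `Global.checkpoints` is wrong, the model specified by `Global.pretrain_weights` is loaded instead.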
-
+
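### 3.3 Evaluation
Table recognition uses [TEDS (Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric. Before evaluating, each of the three models in the pipeline must be exported as an inference model (we have already provided them), and the evaluation ground truth (gt) must be prepared. A gt example is as follows: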
@@ -100,7 +113,7 @@ python3 table/eval_table.py --det_model_dir=path/to/det_model_dir --rec_model_di
```bash
teds: 93.32
```
-
+
### 3.4 Prediction
```python