@@ -122,8 +122,7 @@ For a new language request, please refer to [Guideline for new language_requests
...
@@ -122,8 +122,7 @@ For a new language request, please refer to [Guideline for new language_requests
<imgsrc="./doc/ppocr_framework.png"width="800">
<imgsrc="./doc/ppocr_framework.png"width="800">
</div>
</div>
PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). Besides, The implementation of the FPGM Pruner and PACT quantization is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection[2], detection frame correction and CRNN text recognition[7]. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). Besides, The implementation of the FPGM Pruner [8] and PACT quantization [9] is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
The Style-Text data synthesis tool is a tool based on Baidu's self-developed text editing algorithm "Editing Text in the Wild" [https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047).
The Style-Text data synthesis tool is a tool based on Baidu and HUST cooperation research work, "Editing Text in the Wild" [https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047).
Different from the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes:
Different from the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes:
* (1) Text foreground style transfer module.
* (1) Text foreground style transfer module.
...
@@ -69,12 +69,14 @@ fusion_generator:
...
@@ -69,12 +69,14 @@ fusion_generator:
1. You can run `tools/synth_image` and generate the demo image, which is saved in the current folder.
1. You can run `tools/synth_image` and generate the demo image, which is saved in the current folder.
* Note 1: The language options is correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
* Note 1: The language options is correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
* Note 2: Synth-Text is mainly used to generate images for OCR recognition models.
* Note 2: Synth-Text is mainly used to generate images for OCR recognition models.
So the height of style images should be around 32 pixels. Images in other sizes may behave poorly.
So the height of style images should be around 32 pixels. Images in other sizes may behave poorly.
* Note 3: You can modify `use_gpu` in `configs/config.yml` to determine whether to use GPU for prediction.
For example, enter the following image and corpus `PaddleOCR`.
For example, enter the following image and corpus `PaddleOCR`.
...
@@ -122,7 +124,7 @@ In actual application scenarios, it is often necessary to synthesize pictures in
...
@@ -122,7 +124,7 @@ In actual application scenarios, it is often necessary to synthesize pictures in
*`corpus_file`: Filepath of the corpus. Corpus file should be a text file which will be split by line-endings('\n'). Corpus generator samples one line each time.
*`corpus_file`: Filepath of the corpus. Corpus file should be a text file which will be split by line-endings('\n'). Corpus generator samples one line each time.
Example of corpus file:
Example of corpus file:
```
```
PaddleOCR
PaddleOCR
飞桨文字识别
飞桨文字识别
...
@@ -139,9 +141,10 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag
...
@@ -139,9 +141,10 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag
2. You can run the following command to start synthesis task:
2. You can run the following command to start synthesis task:
For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html).
For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html).
* After the compilation process, you can see the following files in the folder of `build/fluid_inference_install_dir/`.
* After the compilation process, you can see the following files in the folder of `build/paddle_inference_install_dir/`.
```
```
build/fluid_inference_install_dir/
build/paddle_inference_install_dir/
|-- CMakeCache.txt
|-- CMakeCache.txt
|-- paddle
|-- paddle
|-- third_party
|-- third_party
...
@@ -130,10 +130,10 @@ Among them, `paddle` is the Paddle library required for C++ prediction later, an
...
@@ -130,10 +130,10 @@ Among them, `paddle` is the Paddle library required for C++ prediction later, an
* After downloading, use the following method to uncompress.
* After downloading, use the following method to uncompress.
```
```
tar -xf fluid_inference.tgz
tar -xf paddle_inference.tgz
```
```
Finally you can see the following files in the folder of `fluid_inference/`.
Finally you can see the following files in the folder of `paddle_inference/`.
## 2. Compile and run the demo
## 2. Compile and run the demo
...
@@ -145,11 +145,11 @@ Finally you can see the following files in the folder of `fluid_inference/`.
...
@@ -145,11 +145,11 @@ Finally you can see the following files in the folder of `fluid_inference/`.
```
```
inference/
inference/
|-- det_db
|-- det_db
| |--model
| |--inference.pdparams
| |--params
| |--inference.pdimodel
|-- rec_rcnn
|-- rec_rcnn
| |--model
| |--inference.pdparams
| |--params
| |--inference.pdparams
```
```
...
@@ -188,7 +188,9 @@ cmake .. \
...
@@ -188,7 +188,9 @@ cmake .. \
make -j
make -j
```
```
`OPENCV_DIR` is the opencv installation path; `LIB_DIR` is the download (`fluid_inference` folder) or the generated Paddle inference library path (`build/fluid_inference_install_dir` folder); `CUDA_LIB_DIR` is the cuda library file path, in docker; it is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cudnn library file path, in docker it is `/usr/lib/x86_64-linux-gnu/`.
`OPENCV_DIR` is the opencv installation path; `LIB_DIR` is the download (`paddle_inference` folder)
or the generated Paddle inference library path (`build/paddle_inference_install_dir` folder);
`CUDA_LIB_DIR` is the cuda library file path, in docker; it is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cudnn library file path, in docker it is `/usr/lib/x86_64-linux-gnu/`.
* After the compilation is completed, an executable file named `ocr_system` will be generated in the `build` folder.
* After the compilation is completed, an executable file named `ocr_system` will be generated in the `build` folder.
...
@@ -211,7 +213,6 @@ gpu_id 0 # GPU id when use_gpu is 1
...
@@ -211,7 +213,6 @@ gpu_id 0 # GPU id when use_gpu is 1
gpu_mem 4000 # GPU memory requested
gpu_mem 4000 # GPU memory requested
cpu_math_library_num_threads 10 # Number of threads when using CPU inference. When machine cores is enough, the large the value, the faster the inference speed
cpu_math_library_num_threads 10 # Number of threads when using CPU inference. When machine cores is enough, the large the value, the faster the inference speed
use_mkldnn 1 # Whether to use mkdlnn library
use_mkldnn 1 # Whether to use mkdlnn library
use_zero_copy_run 1 # Whether to use use_zero_copy_run for inference
max_side_len 960 # Limit the maximum image height and width to 960
max_side_len 960 # Limit the maximum image height and width to 960
det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result
det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result
...
@@ -244,4 +245,4 @@ The detection results will be shown on the screen, which is as follows.
...
@@ -244,4 +245,4 @@ The detection results will be shown on the screen, which is as follows.
### 2.3 Notes
### 2.3 Notes
* Paddle2.0.0-beta0 inference model library is recommanded for this tuturial.
* Paddle2.0.0-beta0 inference model library is recommended for this toturial.
**A**:超分辨率方法分为传统方法和基于深度学习的方法。基于深度学习的方法中,比较经典的有SRCNN,另外CVPR2020也有一篇超分辨率的工作可以参考文章:Unpaired Image Super-Resolution using Pseudo-Supervision,但是没有充分的实践验证过,需要看实际场景下的效果。
**A**:超分辨率方法分为传统方法和基于深度学习的方法。基于深度学习的方法中,比较经典的有SRCNN,另外CVPR2020也有一篇超分辨率的工作可以参考文章:Unpaired Image Super-Resolution using Pseudo-Supervision,但是没有充分的实践验证过,需要看实际场景下的效果。
#### Q3.1.23: ImportError: /usr/lib/x86_64_linux-gnu/libstdc++.so.6:version `CXXABI_1.3.11` not found (required by /usr/lib/python3.6/site-package/paddle/fluid/core+avx.so)
Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embeded and IoT devices',
description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embeded and IoT devices',