- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
- [Inference](./doc/doc_en/inference_en.md)
- [Dataset](./doc/doc_en/datasets_en.md)
## Text detection algorithm
...
* Note: For training and evaluation of the above DB models, the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If you train on different datasets or with different models, these two parameters can be adjusted for better results.
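For reference, a minimal evaluation sketch with these overrides; the script name, checkpoint path, and `Global.*`/`PostProcess.*` keys follow the detection doc's conventions and should be checked against your version:

```
python3 tools/eval.py -c configs/det/det_db_mv3.yml \
    -o Global.checkpoints="./output/det_db/best_accuracy" \
       PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```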
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
## Text recognition algorithm
...
Baidu self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient.
[more](./doc/doc_en/FAQ_en.md)
## Welcome to the PaddleOCR technical exchange group
Add WeChat: paddlehelp, with the remark "OCR", and the assistant will invite you into the group~
...
6. **How to run on Windows or Mac?**
PaddleOCR has been adapted to Windows and macOS. Two points should be noted when running it:
1. In [Quick installation](./installation_en.md), if you do not want to install Docker, you can skip the first step and start from the second step.
2. When downloading the inference model, if wget is not installed, you can click the model link directly or copy the link address into your browser to download it, then extract it and place it in the corresponding directory.
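With wget available, a download-and-extract sketch might look like the following; the model URL here is illustrative, so substitute the actual link from the model list:

```
wget -P ./inference/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar
tar xf ./inference/ch_det_mv3_db_infer.tar -C ./inference/
```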
7. **The difference between the ultra-lightweight model and the general OCR model**
...
English dataset: the MJSynth and SynthText synthetic datasets, with tens of millions of samples.
Chinese dataset: the LSVT street-view dataset with cropped text regions, 300k images in total. In addition, about 5 million images were synthesized based on the LSVT corpus.
Among them, the public datasets are open source; users can search for and download them themselves, or refer to [Chinese dataset](./datasets_en.md). The synthetic data is not open source, but users can synthesize data themselves with open-source tools. Currently available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md)
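As a quick orientation, a typical detection training launch looks like the sketch below; the pretrained-weights path is an assumption based on the MobileNetV3 backbone and may differ in your setup:

```
python3 tools/train.py -c configs/det/det_db_mv3.yml \
    -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
```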
## step2: Train text recognition model
...
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md)
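Similarly, a recognition training sketch; the config file name rec_chinese_lite_train.yml is assumed here, so pick the config matching your model:

```
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```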
## step3: Concatenate predictions
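Concatenation runs detection first and feeds the cropped text regions to recognition. A minimal sketch, assuming both models have been exported as inference models to the placeholder directories below:

```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" \
    --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
```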
...
- **Introduction**: A total of 450k Chinese street-view images, including 50k fully annotated images (20k test + 30k training) with both text coordinates and text content, and 400k weakly annotated images (text content only), as shown in the following figure:
...
#### 2. ICDAR2017-RCTW-17
- **Data source**: https://rctw.vlrlab.net/
- **Introduction**: It contains more than 12,000 images, most of them collected in the wild with mobile-phone cameras and some taken as screenshots. The images cover a variety of scenes, including street views, posters, menus, indoor scenes and screenshots of mobile applications.
- **Introduction**: A total of 290,000 images are included, of which 210,000 are used as the training set (with labels) and 80,000 as the test set (without labels). The dataset is collected from Chinese street views and is formed by cutting out the text-line regions (such as shop signs and landmarks) from street-view images. All images are preprocessed: using an affine transform, each text region is proportionally mapped to an image with a height of 48 pixels, as shown in the figure:
- 5990 characters including Chinese characters, English letters, numbers and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt)
- Each sample is fixed at 10 characters, randomly cropped from sentences in the corpus
- **Introduction**: It includes 10,166 images, 5,603 in the training set and 4,563 in the test set. It is composed of three parts: Total-Text, SCUT-CTW1500 and Baidu curved scene text, and includes text of various shapes: horizontal, multi-oriented and curved.
In the above command, `-c` selects configs/det/det_db_mv3.yml as the training configuration file.
For a detailed explanation of the configuration file, please refer to [link](./config_en.md).
You can also use the `-o` parameter to change training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001:
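A sketch of that override, assuming the learning rate is exposed as Optimizer.base_lr in this config:

```
python3 tools/train.py -c configs/det/det_db_mv3.yml -o Optimizer.base_lr=0.0001
```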
The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_2.jpg)
The parameter `det_max_side_len` sets the maximum side length used for image normalization in the detection algorithm. When both the height and width of the image are smaller than det_max_side_len, the original image is used for prediction; otherwise, the image is scaled down so that its longer side equals this maximum value. The default is det_max_side_len=960. If the input images have a relatively high resolution and you want to predict at a larger resolution, you can execute the following command:
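(A sketch; the image path and detection model directory are placeholders for your own files.)

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" \
    --det_model_dir="./inference/det_db/" --det_max_side_len=1200
```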
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_img_10_db.jpg)
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model gives very poor detection results on Chinese text images.
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_img_10_east.jpg)
**Note**: This codebase uses the Python version of NMS in EAST post-processing, so the prediction speed is quite slow. If you use the C++ version, there will be a significant speedup.
...