diff --git a/doc/doc_en/PP-OCRv3_introduction_en.md b/doc/doc_en/PP-OCRv3_introduction_en.md
index 74b6086837148260742417b9471ac2dc4efeab9e..9ab25653e219c18e1acaaf7c99b050f790bcb1b9 100644
--- a/doc/doc_en/PP-OCRv3_introduction_en.md
+++ b/doc/doc_en/PP-OCRv3_introduction_en.md
@@ -99,7 +99,7 @@ Considering that the features of some channels will be suppressed if the convolu
 ## 3. Optimization for Text Recognition Model

-The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability.
+The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability. The recognition accuracy of SVTR_Tiny outperforms the PP-OCRv2 recognition model by 5.3%, but the prediction speed is nearly 11 times slower: it takes nearly 100 ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
@@ -151,7 +151,7 @@ Due to the limited model structure supported by the MKLDNN acceleration library,
 3. The experiment found that the prediction speed of the Global Mixing Block is related to the shape of the input features. Therefore, after the Global Mixing Block was moved behind the pooling layer, the accuracy dropped to 71.9%, and the speed surpassed the CNN-based PP-OCRv2-baseline by 22%. The network structure is as follows:
- +
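The speed gain from repositioning the Global Mixing Block follows from self-attention's quadratic cost in the number of feature positions: pooling shrinks the feature map before the block sees it. A back-of-envelope sketch (illustrative shapes and dimensions of my own choosing, not figures from the SVTR paper):

```python
# Rough cost model: a Global Mixing Block applies self-attention over all
# h*w feature positions, so its cost scales as (h*w)^2 * dim. Placing it
# behind a 2x2 pooling layer quarters the positions, cutting the attention
# cost by roughly 16x.

def attn_positions(h, w):
    """Number of sequence positions self-attention operates over."""
    return h * w

def attn_cost(h, w, dim):
    """Rough O(n^2 * d) operation count for one self-attention layer."""
    n = attn_positions(h, w)
    return n * n * dim

before = attn_cost(8, 80, 64)  # block placed before pooling (hypothetical shape)
after = attn_cost(4, 40, 64)   # block placed after 2x2 pooling
print(before // after)         # positions drop 4x, attention cost drops ~16x
```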
 The ablation experiments are as follows:
@@ -172,7 +172,7 @@ Note: When testing the speed, the input image shape of 01-05 are all (3, 32, 320
 [GTC](https://arxiv.org/pdf/2002.01276.pdf) (Guided Training of CTC), which uses the Attention module to guide the training of CTC and fuse multiple features, is an effective strategy to improve text recognition accuracy. No extra time is consumed in the inference process, as the Attention module is completely removed during prediction. The accuracy of the recognition model is further improved to 75.8% (+1.82%). The training process is as follows:
- +
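The GTC idea above can be sketched as a dual-head recognizer: during training the Attention branch contributes a guiding loss alongside CTC, and at inference only the CTC branch runs, so prediction cost is unchanged. A minimal sketch with hypothetical names (not PaddleOCR's actual API):

```python
# Minimal GTC sketch (hypothetical class, not PaddleOCR code): an Attention
# head guides training via an auxiliary loss term; inference uses only the
# CTC branch, so the Attention module adds no prediction-time cost.

class GTCHead:
    def __init__(self, attn_weight=1.0):
        self.attn_weight = attn_weight  # weight of the guiding Attention loss

    def loss(self, ctc_loss, attn_loss):
        # Combined training objective: CTC loss plus weighted Attention loss.
        return ctc_loss + self.attn_weight * attn_loss

    def predict(self, ctc_logits):
        # Inference path: greedy CTC decode only (Attention head removed).
        # Argmax per time step, collapse repeats, drop blanks (id 0).
        ids = [max(range(len(step)), key=step.__getitem__) for step in ctc_logits]
        out, prev = [], None
        for i in ids:
            if i != prev and i != 0:
                out.append(i)
            prev = i
        return out
```

For example, `GTCHead(attn_weight=0.5).loss(2.0, 1.0)` yields `2.5` during training, while `predict` ignores the Attention branch entirely.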
**(3) TextConAug: Data Augmentation Strategy for Mining Text Context Information**
diff --git a/doc/ppocr_v3/GTC_en.png b/doc/ppocr_v3/GTC_en.png
new file mode 100644
index 0000000000000000000000000000000000000000..a1a7fc52505f3f7f84f484fb1ee07d462e9e0648
Binary files /dev/null and b/doc/ppocr_v3/GTC_en.png differ
diff --git a/doc/ppocr_v3/LCNet_SVTR_en.png b/doc/ppocr_v3/LCNet_SVTR_en.png
new file mode 100644
index 0000000000000000000000000000000000000000..7890448470957cc7866a0b4e2cd09c36a788e213
Binary files /dev/null and b/doc/ppocr_v3/LCNet_SVTR_en.png differ