@@ -151,7 +151,7 @@ Due to the limited model structure supported by the MKLDNN acceleration library,
3. Experiments showed that the prediction speed of the Global Mixing Block depends on the shape of its input features. The Global Mixing Block was therefore moved to after the pooling layer. Accuracy dropped to 71.9%, and the speed surpassed the CNN-based PP-OCRv2-baseline by 22%. The network structure is as follows:
<div align="center">
-    <img src="../ppocr_v3/LCNet_SVTR.png" width=800>
+    <img src="../ppocr_v3/LCNet_SVTR_en.png" width=800>
</div>
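One way to see why the input shape matters is a rough cost estimate for global self-attention, whose token-mixing term grows with the square of the number of tokens. The sketch below is only an illustration: the function `attention_macs` and the feature-map shapes are hypothetical and are not taken from the PP-OCRv3 code, and operation counts are only a proxy for measured speed.

```python
def attention_macs(height, width, dim):
    """Approximate multiply-accumulate count for one global self-attention layer.

    Assumes a single head and ignores softmax, normalization, and the MLP.
    """
    tokens = height * width
    # Q, K, V, and output projections: linear in the number of tokens.
    projections = 4 * tokens * dim * dim
    # QK^T and the attention-weighted sum over V: quadratic in the number of tokens.
    mixing = 2 * tokens * tokens * dim
    return projections + mixing


# Hypothetical feature-map shapes (assumptions for illustration only).
before_pool = attention_macs(height=2, width=80, dim=120)
after_pool = attention_macs(height=1, width=40, dim=120)

print(f"before pooling: {before_pool:,} MACs")
print(f"after pooling:  {after_pool:,} MACs")
print(f"ratio: {before_pool / after_pool:.2f}x")
```

Under these assumed shapes, placing the block after pooling cuts the token count by a factor of four, so the quadratic mixing term shrinks by a factor of sixteen.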
The ablation experiments are as follows:
...
@@ -172,7 +172,7 @@ Note: When testing the speed, the input image shape of 01-05 are all (3, 32, 320
[GTC](https://arxiv.org/pdf/2002.01276.pdf) (Guided Training of CTC) uses the Attention module to guide the training of CTC and fuse multiple features, which is an effective strategy for improving text recognition accuracy. Because the Attention module is completely removed during prediction, it adds no extra time cost at inference. The accuracy of the recognition model is further improved to 75.8% (+1.82%). The training process is as follows:
<div align="center">
-    <img src="../ppocr_v3/GTC.png" width=800>
+    <img src="../ppocr_v3/GTC_en.png" width=800>
</div>
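The train-with-two-heads, predict-with-one idea can be sketched in a few lines. This is a minimal structural sketch, not the PaddleOCR implementation: the class `GuidedCTCModel`, the toy callables, and the weighted-sum loss with `attention_weight` are all assumptions made for illustration.

```python
class GuidedCTCModel:
    """Toy recognizer with a CTC branch and an auxiliary attention branch."""

    def __init__(self, backbone, ctc_loss_fn, ctc_decode_fn,
                 attention_loss_fn, attention_weight=1.0):
        self.backbone = backbone
        self.ctc_loss_fn = ctc_loss_fn
        self.ctc_decode_fn = ctc_decode_fn
        self.attention_loss_fn = attention_loss_fn
        self.attention_weight = attention_weight

    def training_loss(self, image, label):
        # Both branches read the same shared features, so gradients from the
        # attention branch also shape the features the CTC branch relies on.
        features = self.backbone(image)
        ctc_loss = self.ctc_loss_fn(features, label)
        attention_loss = self.attention_loss_fn(features, label)
        # Assumed combination: a simple weighted sum of the two losses.
        return ctc_loss + self.attention_weight * attention_loss

    def strip_attention_branch(self):
        # The attention branch is only needed for training.
        self.attention_loss_fn = None

    def predict(self, image):
        # Inference touches only the backbone and the CTC branch.
        return self.ctc_decode_fn(self.backbone(image))


# Hypothetical stand-ins so the sketch runs without a deep learning framework.
model = GuidedCTCModel(
    backbone=lambda image: [sum(column) for column in image],
    ctc_loss_fn=lambda features, label: 2.0,
    ctc_decode_fn=lambda features: "".join("a" if f > 0 else "-" for f in features),
    attention_loss_fn=lambda features, label: 1.0,
    attention_weight=0.5,
)

loss = model.training_loss([[1, 0], [0, 0]], "a")
model.strip_attention_branch()
prediction = model.predict([[1, 0], [0, 0]])
```

Since `predict` never calls the attention branch, removing it leaves the inference path and its cost unchanged.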
**(3) TextConAug: Data Augmentation Strategy for Mining Text Context Information**