From 19c8ce7c1142405fb1b961a988da9a6033df69cb Mon Sep 17 00:00:00 2001
From: tink2123
Date: Thu, 11 Jun 2020 11:15:34 +0800
Subject: [PATCH] add tps instructions in FAQ

---
 doc/doc_ch/FAQ.md    | 7 ++++++-
 doc/doc_en/FAQ_en.md | 8 +++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md
index f734e4df..2bae57ab 100644
--- a/doc/doc_ch/FAQ.md
+++ b/doc/doc_ch/FAQ.md
@@ -4,7 +4,7 @@
 The installed paddle version is incorrect. This project currently supports only paddle1.7; support for 1.8 is coming soon.
 
 2. **Error when converting the attention recognition model: KeyError: 'predict'**
-Inference for recognition models based on Attention loss is still being debugged. For Chinese text recognition, prefer a recognition model based on CTC loss; in practice, models based on Attention loss have also been found to perform worse than those based on CTC loss.
+The issue has been resolved; please update to the latest code.
 
 3. **About inference speed**
 When an image contains a lot of text, prediction time increases. Use --rec_batch_num to set a smaller prediction batch num; the default is 30 and can be changed to 10 or another value.
@@ -41,3 +41,8 @@ PaddleOCR has been adapted to Windows and Mac; note two points when running: 1. In [
 Chinese dataset: images are cropped from the LSVT street view dataset according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5 million synthetic samples are generated from the LSVT corpus.
 
 The public datasets are all open source; users can search for and download them, or refer to [Chinese datasets](./datasets.md). The synthetic data is not open source for now; users can generate their own with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
+
+10. **Error when predicting with a recognition model that uses TPS**
+
+Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](320) != Grid dimension[2](100)
+Cause: The TPS module does not currently support variable-length input. Set --rec_image_shape='3,32,100' --rec_char_type='en' to fix the input shape.
diff --git a/doc/doc_en/FAQ_en.md b/doc/doc_en/FAQ_en.md
index 9e426486..cdbc6bf7 100644
--- a/doc/doc_en/FAQ_en.md
+++ b/doc/doc_en/FAQ_en.md
@@ -4,7 +4,7 @@
 The installed version of paddle is incorrect. Currently, this project only supports paddle1.7, which will be adapted to 1.8 in the near future.
 
 2. **Error when converting attention recognition model: KeyError: 'predict'**
-The inference of recognition model based on attention loss is still in debugging. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, it is also found that the recognition model based on attention loss is not as effective as that based on CTC loss.
+Solved. Please update to the latest version of the code.
 
 3. **About inference speed**
 When there are many words in the picture, the prediction time will increase. You can use `--rec_batch_num` to set a smaller prediction batch num. The default value is 30, which can be changed to 10 or other values.
@@ -43,3 +43,9 @@ At present, the open source model, dataset and magnitude are as follows:
 Chinese dataset: LSVT street view dataset with cropped text area, a total of 30w images. In addition, the synthesized data based on LSVT corpus is 500w.
 
 Among them, the public datasets are opensourced, users can search and download by themselves, or refer to [Chinese data set](./datasets_en.md), synthetic data is not opensourced, users can use open-source synthesis tools to synthesize data themselves. Current available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.
+
+10. **Error in using the model with TPS module for prediction**
+
+Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100)
+
+Solution:TPS does not support variable shape. Please set --rec_image_shape='3,32,100' and --rec_char_type='en'
-- 
GitLab
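
Note on FAQ item 10: the quoted error is a width mismatch between the input tensor (dims[3]) and the TPS grid (dims[2]), which is why the patch recommends a fixed `--rec_image_shape='3,32,100'`. The following is a minimal sketch of that check, not PaddleOCR code. `variable_width` and `check_tps_width` are hypothetical helpers, and the aspect-ratio sizing policy is an assumption chosen only to reproduce the widths 320 and 108 that appear in the error messages.

```python
import math

# Width the TPS grid expects, taken from the error message in the FAQ.
TPS_GRID_WIDTH = 100


def variable_width(img_h, wh_ratio):
    """Hypothetical variable-length policy: width grows with the aspect ratio."""
    return int(math.ceil(img_h * wh_ratio))


def check_tps_width(input_w, grid_w=TPS_GRID_WIDTH):
    """Mimic the dimension check behind the error quoted in the FAQ."""
    if input_w != grid_w:
        raise ValueError(
            "Input(X) dims[3] and Input(Grid) dims[2] should be equal, "
            f"but received X dimension[3]({input_w}) != Grid dimension[2]({grid_w})"
        )
    return True


if __name__ == "__main__":
    # Fixed shape '3,32,100': the width matches the grid, so the check passes.
    print(check_tps_width(100))
    # Variable width derived from the image aspect ratio: the check fails.
    try:
        check_tps_width(variable_width(32, 10))
    except ValueError as err:
        print(err)
```

Under this assumption, any input whose width is not exactly 100 fails the check, so fixing the shape is the only way to satisfy it until TPS supports variable-length input.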