Merge pull request #181 from tink2123/update_faq

add tps instructions in FAQ

a03f9e14 · xiaoting · GitHub · 35387291 · 19c8ce7c · a03f9e14

Showing 2 changed files with 13 additions and 2 deletions

doc/doc_ch/FAQ.md +6 -1

doc/doc_en/FAQ_en.md +7 -1

--- a/doc/doc_ch/FAQ.md
+++ b/doc/doc_ch/FAQ.md
@@ -4,7 +4,7 @@
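 The installed version of paddle is incorrect. Currently this project only supports paddle1.7; support for 1.8 will be added soon.
 2. **Error when converting the attention recognition model: KeyError: 'predict'**  
-Inference for recognition models based on Attention loss is still being debugged. For Chinese text recognition, prefer a recognition model based on CTC loss; in practice, models based on Attention loss have also been found to perform worse than those based on CTC loss.
+The problem has been solved. Please update to the latest code.
 3. **About inference speed**  
 When an image contains a lot of text, prediction time increases. You can use --rec_batch_num to set a smaller prediction batch num; the default value is 30, and it can be changed to 10 or another value.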
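@@ -41,3 +41,8 @@ PaddleOCR has been adapted to Windows and Mac systems; note two points when running: 1. In [
    Chinese dataset: the LSVT street view dataset, with text regions cropped according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5 million synthetic samples were generated from the LSVT corpus.
    Of these, the public datasets are all open source; users can search for and download them themselves, or refer to [Chinese datasets](./datasets.md). The synthetic data is not open source for now; users can generate their own with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
+10. **Error when predicting with a recognition model that uses TPS**
+Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](320) != Grid dimension[2](100)
+Cause: the TPS module does not yet support variable-length input. Please set --rec_image_shape='3,32,100' --rec_char_type='en' to fix the input shape.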
--- a/doc/doc_en/FAQ_en.md
+++ b/doc/doc_en/FAQ_en.md
@@ -4,7 +4,7 @@
 The installed version of paddle is incorrect. Currently, this project only supports paddle1.7, which will be adapted to 1.8 in the near future.
 2. **Error when converting attention recognition model: KeyError: 'predict'**  
-The inference of recognition model based on attention loss is still in debugging. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, it is also found that the recognition model based on attention loss is not as effective as that based on CTC loss.
+Solved. Please update to the latest version of the code.
 3. **About inference speed**  
 When there are many words in the picture, the prediction time will increase. You can use `--rec_batch_num` to set a smaller prediction batch num. The default value is 30, which can be changed to 10 or other values.
@@ -43,3 +43,9 @@ At present, the open source model, dataset and magnitude are as follows:
    Chinese dataset: LSVT street view dataset with cropped text areas, 300,000 images in total. In addition, 5 million synthetic samples were generated from the LSVT corpus.
    Of these, the public datasets are open source; users can search for and download them themselves, or refer to [Chinese data set](./datasets_en.md). The synthetic data is not open source; users can generate their own with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
+10. **Error when using a model with the TPS module for prediction**
+Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100)
+Solution: The TPS module does not support variable-length input. Please set --rec_image_shape='3,32,100' and --rec_char_type='en' to fix the input shape.
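
Note on FAQ entry 10: the error message compares the width of the input tensor (108 or 320 in the reported cases) against a fixed grid width of 100. The sketch below is a simplified model of that mismatch, not PaddleOCR's actual preprocessing code. The function names, the `TPS_GRID_WIDTH` constant, and the rule that the `'ch'` character type lets the width follow the widest image in the batch are all assumptions made for illustration; only the two flag values and the numbers 100 and 108 come from the FAQ text.

```python
# Simplified model of why a variable-width input breaks a fixed-size TPS grid.
# Everything here is illustrative; none of these names come from PaddleOCR.

TPS_GRID_WIDTH = 100  # assumption: the grid width reported in the error message


def parse_image_shape(spec):
    """Parse a 'C,H,W' string such as '3,32,100' into three ints."""
    channels, height, width = (int(part) for part in spec.split(","))
    return channels, height, width


def batch_tensor_width(image_sizes, rec_image_shape, rec_char_type):
    """Return the width of the padded batch tensor.

    Assumption: with rec_char_type == 'ch' the width follows the widest
    aspect ratio in the batch; otherwise it stays at the configured width.
    image_sizes is a list of (width, height) pairs in pixels.
    """
    _, img_h, img_w = parse_image_shape(rec_image_shape)
    if rec_char_type == "ch":
        max_ratio = max(w / h for w, h in image_sizes)
        return int(img_h * max_ratio)
    return img_w


def tps_accepts(width):
    """The grid sampler needs the input width to equal the grid width."""
    return width == TPS_GRID_WIDTH


if __name__ == "__main__":
    batch = [(108, 32), (64, 32)]
    for shape, char_type in [("3,32,320", "ch"), ("3,32,100", "en")]:
        width = batch_tensor_width(batch, shape, char_type)
        print(shape, char_type, width, tps_accepts(width))
```

Under these assumptions, the variable-width configuration produces a width of 108 for this batch and is rejected, while the fixed `'3,32,100'` shape with `'en'` always produces 100, which matches the grid.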