diff --git a/configs/rec/multi_language/rec_en_number_lite_train.yml b/configs/rec/multi_language/rec_en_number_lite_train.yml
index cee0512114fe9d488004a71cf6f0a0409822a4b5..13eda8481cad8ca308cd0629214b52146c3ebf13 100644
--- a/configs/rec/multi_language/rec_en_number_lite_train.yml
+++ b/configs/rec/multi_language/rec_en_number_lite_train.yml
@@ -16,7 +16,7 @@ Global:
infer_img:
# for data or label process
character_dict_path: ppocr/utils/dict/en_dict.txt
- character_type: ch
+ character_type: EN
max_text_length: 25
infer_mode: False
use_space_char: False
diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md
index ab5487037e69d40e38dde96fc8006022054f31df..c4601e1526d29e0a8c62030a4b47d2b2cc193d5d 100755
--- a/doc/doc_ch/inference.md
+++ b/doc/doc_ch/inference.md
@@ -306,10 +306,10 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
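### 4. Multilingual model inference
If you need to predict with a model for another language, you must specify the dictionary path with `--rec_char_dict_path` when predicting with the inference model. To get correct visualization results,
-you also need to specify the visualization font path with `--vis_font_path`. Fonts for less common languages are provided by default under the `doc/` path. For example, for Korean recognition:
+you also need to specify the visualization font path with `--vis_font_path`. Fonts for less common languages are provided by default under the `doc/fonts/` path. For example, for Korean recognition: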
```
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/korean.ttf"
+python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
```
![](../imgs_words/korean/1.jpg)
diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md
index b473f3ac0a5007dee6ac5773e2b989454d4b8983..c5f459bdb88558b1cdea93b9b85eed0e4bb8433b 100644
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -195,8 +195,6 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
| [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| [rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
-| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
-| rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc |
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
@@ -272,16 +270,109 @@ Eval:
- Multi-language
-PaddleOCR also provides multi-language support. Multi-language configuration files are provided under the `configs/rec/multi_languages` path. The multi-language algorithms currently supported by PaddleOCR are:
+PaddleOCR currently supports recognition for 26 languages (not counting Chinese). A multi-language configuration file template is provided under the `configs/rec/multi_language` path: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
-| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
-| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
-| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English |
-| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
-| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
-| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
-| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
+There are two ways to create the required configuration file:
+1. Automatically generated by script
+
+[generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) can help you generate configuration files for multi-language models.
+
+- Taking Italian as an example, if your data is prepared in the following format:
+ ```
+ |-train_data
+ |- it_train.txt # training set labels
+ |- it_val.txt # validation set labels
+ |- data
+ |- word_001.jpg
+ |- word_002.jpg
+ |- word_003.jpg
+ | ...
+ ```
+
+ You can generate the configuration file with the default parameters:
+
+ ```bash
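+ # This command must be run in the specified directory
+ cd PaddleOCR/configs/rec/multi_language/
+ # Use the -l or --language parameter to choose the language whose configuration file is generated; this command writes the default parameters into the configuration file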
+ python3 generate_multi_language_configs.py -l it
+ ```
+
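+- If your data is stored in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters: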
+
+ ```bash
+ # -l or --language is required
+ # --train: path of the training label file
+ # --val: path of the validation label file
+ # --data_dir: root directory of the training data
+ # --dict: path of the dictionary file
+ # -o: override the corresponding default parameters, e.g. whether to use the GPU
+ cd PaddleOCR/configs/rec/multi_language/
+ python3 generate_multi_language_configs.py -l it \
+ --train {path/of/train_label.txt} \
+ --val {path/of/val_label.txt} \
+ --data_dir {train_data/path} \
+ --dict {path/of/dict} \
+ -o Global.use_gpu=False
+ ...
+
+ ```
+
+2. Manually modify the configuration file
+
+ You can also manually modify the following fields in the template:
+
+ ```
+ Global:
+ use_gpu: True
+ epoch_num: 500
+ ...
+ character_type: it # language to recognize
+ character_dict_path: {path/of/dict} # path of the dictionary file
+
+ Train:
+ dataset:
+ name: SimpleDataSet
+ data_dir: train_data/ # root directory of the data
+ label_file_list: ["./train_data/train_list.txt"] # training set label path
+ ...
+
+ Eval:
+ dataset:
+ name: SimpleDataSet
+ data_dir: train_data/ # root directory of the data
+ label_file_list: ["./train_data/val_list.txt"] # validation set label path
+ ...
+
+ ```
+
+The multi-language algorithms currently supported by PaddleOCR are:
+
+| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type |
+| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
+| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Traditional Chinese | chinese_cht|
+| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English (case sensitive) | EN |
+| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french |
+| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german |
+| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan |
+| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean |
+| rec_it_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Italian | it |
+| rec_xi_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Spanish | xi |
+| rec_pu_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Portuguese | pu |
+| rec_ru_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Russian | ru |
+| rec_ar_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Arabic | ar |
+| rec_hi_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Hindi | hi |
+| rec_ug_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Uyghur | ug |
+| rec_fa_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Persian | fa |
+| rec_ur_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Urdu | ur |
+| rec_rs_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Serbian (Latin) | rs |
+| rec_oc_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Occitan | oc |
+| rec_mr_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Marathi | mr |
+| rec_ne_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Nepali | ne |
+| rec_rsc_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Serbian (Cyrillic) | rsc |
+| rec_bg_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Bulgarian | bg |
+| rec_uk_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Ukrainian | uk |
+| rec_be_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Belarusian | be |
+| rec_te_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Telugu | te |
+| rec_ka_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Kannada | ka |
+| rec_ta_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Tamil | ta |
The multi-language models are trained the same way as the Chinese model. Each training data set consists of 1 million synthetic samples. A small number of fonts can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.
diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md
index 98e3ef6378480022baaf6e82843294dab3fbcaf4..ccbb71847d5946e854b88817a162957af0e6ed00 100755
--- a/doc/doc_en/inference_en.md
+++ b/doc/doc_en/inference_en.md
@@ -315,10 +315,10 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
### 4. MULTILINGUAL MODEL INFERENCE
If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
-You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition:
+you need to specify the visualization font path through `--vis_font_path`. Fonts for less common languages are provided by default under the `doc/fonts` path. For example, for Korean recognition:
```
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/korean.ttf"
+python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
```
![](../imgs_words/korean/1.jpg)
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index 7723d20b9f982bc2c121965f4cd7996c81aa42d5..22f89cdef080afe0b119d08d1e88f02ede5932c1 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -266,15 +266,116 @@ Eval:
- Multi-language
-PaddleOCR also provides multi-language. The configuration file in `configs/rec/multi_languages` provides multi-language configuration files. Currently, the multi-language algorithms supported by PaddleOCR are:
-
-| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
-| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
-| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English |
-| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
-| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
-| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
-| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
+PaddleOCR currently supports recognition for 26 languages (not counting Chinese). A multi-language configuration file template is
+provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
+
+There are two ways to create the required configuration file:
+
+1. Automatically generated by script
+
+[generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) can help you generate configuration files for multi-language models.
+
+- Take Italian as an example, if your data is prepared in the following format:
+ ```
+ |-train_data
+ |- it_train.txt # train_set label
+ |- it_val.txt # val_set label
+ |- data
+ |- word_001.jpg
+ |- word_002.jpg
+ |- word_003.jpg
+ | ...
+ ```
+
+ You can use the default parameters to generate a configuration file:
+
+ ```bash
+ # The code needs to be run in the specified directory
+ cd PaddleOCR/configs/rec/multi_language/
+ # Set the configuration file of the language to be generated through the -l or --language parameter.
+ # This command will write the default parameters into the configuration file
+ python3 generate_multi_language_configs.py -l it
+ ```
+
+- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:
+
+ ```bash
+ # -l or --language is required
+ # --train: path of the training label file
+ # --val: path of the validation label file
+ # --data_dir: root directory of the training data
+ # --dict: path of the dictionary file
+ # -o: override the corresponding default parameters, e.g. whether to use the GPU
+ cd PaddleOCR/configs/rec/multi_language/
+ python3 generate_multi_language_configs.py -l it \
+ --train {path/of/train_label.txt} \
+ --val {path/of/val_label.txt} \
+ --data_dir {train_data/path} \
+ --dict {path/of/dict} \
+ -o Global.use_gpu=False
+ ...
+
+ ```
+
+2. Manually modify the configuration file
+
+ You can also manually modify the following fields in the template:
+
+ ```
+ Global:
+ use_gpu: True
+ epoch_num: 500
+ ...
+ character_type: it # language
+ character_dict_path: {path/of/dict} # path of dict
+
+ Train:
+ dataset:
+ name: SimpleDataSet
+ data_dir: train_data/ # root directory of training data
+ label_file_list: ["./train_data/train_list.txt"] # train label path
+ ...
+
+ Eval:
+ dataset:
+ name: SimpleDataSet
+ data_dir: train_data/ # root directory of val data
+ label_file_list: ["./train_data/val_list.txt"] # val label path
+ ...
+
+ ```
+
+Currently, the multi-language algorithms supported by PaddleOCR are:
+
+| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type |
+| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
+| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Traditional Chinese | chinese_cht|
+| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English (case sensitive) | EN |
+| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french |
+| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german |
+| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan |
+| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean |
+| rec_it_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Italian | it |
+| rec_xi_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Spanish | xi |
+| rec_pu_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Portuguese | pu |
+| rec_ru_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Russian | ru |
+| rec_ar_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Arabic | ar |
+| rec_hi_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Hindi | hi |
+| rec_ug_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Uyghur | ug |
+| rec_fa_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Persian (Farsi) | fa |
+| rec_ur_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Urdu | ur |
+| rec_rs_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Serbian (Latin) | rs |
+| rec_oc_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Occitan | oc |
+| rec_mr_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Marathi | mr |
+| rec_ne_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Nepali | ne |
+| rec_rsc_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Serbian (Cyrillic) | rsc |
+| rec_bg_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Bulgarian | bg |
+| rec_uk_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Ukrainian | uk |
+| rec_be_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Belarusian | be |
+| rec_te_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Telugu | te |
+| rec_ka_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Kannada | ka |
+| rec_ta_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Tamil | ta |
+
The multi-language models are trained the same way as the Chinese model. Each training data set consists of 1 million synthetic samples. A small number of fonts and test data can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.
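Note on the `-o Global.use_gpu=False` option documented above: this diff does not show how `generate_multi_language_configs.py` parses it, so the following is only a minimal sketch of how dotted `Section.key=value` overrides could be merged into a nested config dictionary. The helper name `apply_overrides` and the sample `template` dict are hypothetical and not taken from PaddleOCR.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'Section.key=value' strings to a nested dict, in place.

    Hypothetical helper for illustration; not PaddleOCR's implementation.
    """
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        try:
            # Parse Python-style literals such as False or 500.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything that is not a literal (e.g. a path) stays a string.
            value = raw_value
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Stand-in for the YAML template fields listed in the documentation above.
template = {
    "Global": {"use_gpu": True, "epoch_num": 500, "character_type": "ch"},
    "Train": {"dataset": {"data_dir": "train_data/"}},
}

apply_overrides(
    template,
    [
        "Global.use_gpu=False",
        "Global.character_type=it",
        "Train.dataset.data_dir=my_data/",
    ],
)
print(template["Global"])
```

Values that parse as Python literals keep their type, so `False` becomes a boolean rather than the string `"False"`, while language codes and paths remain plain strings.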
diff --git a/doc/fonts/chinese_cht.TTF b/doc/fonts/chinese_cht.TTF
deleted file mode 100644
index fae13bc26d07c124441232b611a4a2c72e2c7868..0000000000000000000000000000000000000000
Binary files a/doc/fonts/chinese_cht.TTF and /dev/null differ
diff --git a/doc/fonts/chinese_cht.ttf b/doc/fonts/chinese_cht.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..3416754fd35aecd6eb0d9acfc730ae10a408bffd
Binary files /dev/null and b/doc/fonts/chinese_cht.ttf differ
diff --git a/ppocr/data/imaug/label_ops.py b/ppocr/data/imaug/label_ops.py
index af3308a553b40b61fb206a8970038e91b718c56f..14c1cc9c60989300c86e9965e68c4c663d2425d9 100644
--- a/ppocr/data/imaug/label_ops.py
+++ b/ppocr/data/imaug/label_ops.py
@@ -18,6 +18,7 @@ from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
+import string
class ClsLabelEncode(object):
@@ -92,7 +93,10 @@ class BaseRecLabelEncode(object):
character_type='ch',
use_space_char=False):
support_character_type = [
- 'ch', 'en', 'en_sensitive', 'french', 'german', 'japan', 'korean'
+ 'ch', 'en', 'EN_symbol', 'french', 'german', 'japan', 'korean',
+ 'EN', 'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs',
+ 'oc', 'rsc', 'bg', 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi',
+ 'mr', 'ne'
]
assert character_type in support_character_type, "Only {} are supported now but get {}".format(
support_character_type, character_type)
@@ -101,9 +105,14 @@ class BaseRecLabelEncode(object):
if character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
- elif character_type in ["ch", "french", "german", "japan", "korean"]:
+ elif character_type == "EN_symbol":
+ # same with ASTER setting (use 94 char).
+ self.character_str = string.printable[:-6]
+ dict_character = list(self.character_str)
+ elif character_type in support_character_type:
self.character_str = ""
- assert character_dict_path is not None, "character_dict_path should not be None when character_type is ch"
+ assert character_dict_path is not None, "character_dict_path should not be None when character_type is {}".format(
+ character_type)
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
@@ -112,11 +121,6 @@ class BaseRecLabelEncode(object):
if use_space_char:
self.character_str += " "
dict_character = list(self.character_str)
- elif character_type == "en_sensitive":
- # same with ASTER setting (use 94 char).
- import string
- self.character_str = string.printable[:-6]
- dict_character = list(self.character_str)
self.character_type = character_type
dict_character = self.add_special_char(dict_character)
self.dict = {}
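Note on the `label_ops.py` hunk above: after this change only `en` and `EN_symbol` use a built-in alphabet, and every other supported `character_type` is expected to come with a dictionary file. The sketch below reproduces just the two built-in branches to make the alphabet sizes concrete. The function name `builtin_charset` is hypothetical; the character strings are the ones shown in the diff.

```python
import string


def builtin_charset(character_type):
    """Return the built-in alphabet for a character_type, or None.

    None means the type relies on a dictionary file (character_dict_path).
    Hypothetical helper that mirrors the branches shown in the diff.
    """
    if character_type == "en":
        # Digits plus lowercase letters only.
        return "0123456789abcdefghijklmnopqrstuvwxyz"
    if character_type == "EN_symbol":
        # string.printable ends with 6 whitespace characters; dropping them
        # leaves digits, upper/lowercase letters and punctuation.
        return string.printable[:-6]
    return None


en = builtin_charset("en")
en_symbol = builtin_charset("EN_symbol")
print(len(en), len(en_symbol))  # 36 94
print(builtin_charset("it"))    # None: Italian needs a dictionary file
```

Because the space character is among the six dropped whitespace characters, `EN_symbol` by itself does not include a space.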
diff --git a/ppocr/postprocess/rec_postprocess.py b/ppocr/postprocess/rec_postprocess.py
index 4d078994ad6b0020280b8a7ec5eec3626e7075cc..65ed467191caa7f2093859ff35a20a9ba6a9a08e 100644
--- a/ppocr/postprocess/rec_postprocess.py
+++ b/ppocr/postprocess/rec_postprocess.py
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
+import string
import paddle
from paddle.nn import functional as F
@@ -24,9 +25,10 @@ class BaseRecLabelDecode(object):
character_type='ch',
use_space_char=False):
support_character_type = [
- 'ch', 'en', 'en_sensitive', 'french', 'german', 'japan', 'korean', 'it',
- 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs', 'oc', 'rsc', 'bg',
- 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi', 'mr', 'ne'
+ 'ch', 'en', 'EN_symbol', 'french', 'german', 'japan', 'korean',
+ 'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs', 'oc',
+ 'rsc', 'bg', 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi', 'mr',
+ 'ne', 'EN'
]
assert character_type in support_character_type, "Only {} are supported now but get {}".format(
support_character_type, character_type)
@@ -34,9 +36,14 @@ class BaseRecLabelDecode(object):
if character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
- elif character_type in ["ch", "french", "german", "japan", "korean"]:
+ elif character_type == "EN_symbol":
+ # same with ASTER setting (use 94 char).
+ self.character_str = string.printable[:-6]
+ dict_character = list(self.character_str)
+ elif character_type in support_character_type:
self.character_str = ""
- assert character_dict_path is not None, "character_dict_path should not be None when character_type is ch"
+ assert character_dict_path is not None, "character_dict_path should not be None when character_type is {}".format(
+ character_type)
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
@@ -45,11 +52,7 @@ class BaseRecLabelDecode(object):
if use_space_char:
self.character_str += " "
dict_character = list(self.character_str)
- elif character_type == "en_sensitive":
- # same with ASTER setting (use 94 char).
- import string
- self.character_str = string.printable[:-6]
- dict_character = list(self.character_str)
+
else:
raise NotImplementedError
self.character_type = character_type
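Note on the `rec_postprocess.py` hunk above: every dictionary-based `character_type` now goes through the same file-loading branch. The lines between the two hunks are not shown in this diff, so the decoding step in the sketch below (UTF-8, one entry per line, line endings stripped) is an assumption, and `load_character_dict` is a hypothetical standalone function rather than the class method.

```python
import os
import tempfile


def load_character_dict(character_dict_path, use_space_char=False):
    """Read a dictionary file into a list of characters.

    Sketch only: the per-line decoding is assumed, not taken from the diff.
    """
    assert character_dict_path is not None, "character_dict_path should not be None"
    character_str = ""
    with open(character_dict_path, "rb") as fin:
        for line in fin.readlines():
            # Assumption: UTF-8 encoded file with one entry per line.
            character_str += line.decode("utf-8").strip("\r\n")
    if use_space_char:
        # Matches the diff: the space is appended after the file contents.
        character_str += " "
    return list(character_str)


# Build a tiny throwaway dictionary so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    dict_path = os.path.join(tmp_dir, "demo_dict.txt")
    with open(dict_path, "w", encoding="utf-8") as fout:
        fout.write("a\nb\né\n")
    chars = load_character_dict(dict_path, use_space_char=True)

print(chars)  # ['a', 'b', 'é', ' ']
```

Since the assertion on `support_character_type` runs before this branch, the trailing `else: raise NotImplementedError` in the patched code can no longer be reached for any accepted `character_type`.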
diff --git a/tools/infer/utility.py b/tools/infer/utility.py
index 966fa3cc4c8c4e721fa83e440c9c6181937c7e96..4171a29bdd4194813638b72f0aae015da48fbcb1 100755
--- a/tools/infer/utility.py
+++ b/tools/infer/utility.py
@@ -70,7 +70,7 @@ def parse_args():
default="./ppocr/utils/ppocr_keys_v1.txt")
parser.add_argument("--use_space_char", type=str2bool, default=True)
parser.add_argument(
- "--vis_font_path", type=str, default="./doc/simfang.ttf")
+ "--vis_font_path", type=str, default="./doc/fonts/simfang.ttf")
parser.add_argument("--drop_score", type=float, default=0.5)
# params for text classifier