Commit 9b6abc12, authored by tink2123

add dict and corpus folder

Parent d75761b1
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: french character_type: french
character_dict_path: ./ppocr/utils/french_dict.txt character_dict_path: ./ppocr/utils/dict/french_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: german character_type: german
character_dict_path: ./ppocr/utils/german_dict.txt character_dict_path: ./ppocr/utils/dict/german_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: japan character_type: japan
character_dict_path: ./ppocr/utils/japan_dict.txt character_dict_path: ./ppocr/utils/dict/japan_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -12,7 +12,7 @@ Global: ...@@ -12,7 +12,7 @@ Global:
image_shape: [3, 32, 320] image_shape: [3, 32, 320]
max_text_length: 25 max_text_length: 25
character_type: korean character_type: korean
character_dict_path: ./ppocr/utils/korean_dict.txt character_dict_path: ./ppocr/utils/dict/korean_dict.txt
loss_type: ctc loss_type: ctc
distort: true distort: true
use_space_char: false use_space_char: false
......
...@@ -325,7 +325,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png ...@@ -325,7 +325,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
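You need to specify the visualization font path via `--vis_font_path`; fonts for minority languages are provided by default under the `doc/` path. For example, Korean recognition: You need to specify the visualization font path via `--vis_font_path`; fonts for minority languages are provided by default under the `doc/` path. For example, Korean recognition: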
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/korean_dict.txt" --vis_font_path="doc/korean.ttf" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/korean.ttf"
``` ```
![](../imgs_words/korean/1.jpg) ![](../imgs_words/korean/1.jpg)
......
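...@@ -120,19 +120,19 @@ word_dict.txt has one character per line, mapping characters to numeric indices, ...@@ -120,19 +120,19 @@ word_dict.txt has one character per line, mapping characters to numeric indices,
`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters, `ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters,
`ppocr/utils/french_dict.txt` is a French dictionary with 118 characters `ppocr/utils/dict/french_dict.txt` is a French dictionary with 118 characters
`ppocr/utils/japan_dict.txt` is a French dictionary with 4399 characters `ppocr/utils/dict/japan_dict.txt` is a French dictionary with 4399 characters
`ppocr/utils/korean_dict.txt` is a French dictionary with 3636 characters `ppocr/utils/dict/korean_dict.txt` is a French dictionary with 3636 characters
`ppocr/utils/german_dict.txt` is a French dictionary with 131 characters `ppocr/utils/dict/german_dict.txt` is a French dictionary with 131 characters
You can use them as needed. You can use them as needed.
The current multi-language models are still at the demo stage; we will keep optimizing the models and adding languages. **You are very welcome to provide us with dictionaries and fonts for other languages** The current multi-language models are still at the demo stage; we will keep optimizing the models and adding languages. **You are very welcome to provide us with dictionaries and fonts for other languages**
If you like, you can submit dictionary files to [utils](../../ppocr/utils), and we will thank you in the Repo. If you like, you can submit dictionary files to [dict](../../ppocr/utils/dict) and corpus files to [corpus](../../ppocr/utils/corpus), and we will thank you in the Repo.
- Custom dictionary - Custom dictionary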
...@@ -269,7 +269,7 @@ PaddleOCR also provides multi-language support; under the `configs/rec/multi_languages` path ...@@ -269,7 +269,7 @@ PaddleOCR also provides multi-language support; under the `configs/rec/multi_languages` path
Global: Global:
... ...
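# Add a custom dictionary; if you modify the dictionary, point the path to the new dictionary # Add a custom dictionary; if you modify the dictionary, point the path to the new dictionary
character_dict_path: ./ppocr/utils/french_dict.txt character_dict_path: ./ppocr/utils/dict/french_dict.txt
# Add data augmentation during training # Add data augmentation during training
distort: true distort: true
# Recognize spaces # Recognize spaces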
......
...@@ -330,7 +330,7 @@ If you need to predict other language models, when using inference model predict ...@@ -330,7 +330,7 @@ If you need to predict other language models, when using inference model predict
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition: You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition:
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/ utils/korean_dict.txt" --vis_font_path="doc/korean.ttf" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/korean.ttf"
``` ```
![](../imgs_words/korean/1.jpg) ![](../imgs_words/korean/1.jpg)
......
...@@ -112,18 +112,18 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a ...@@ -112,18 +112,18 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters `ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
`ppocr/utils/french_dict.txt` is a French dictionary with 118 characters `ppocr/utils/dict/french_dict.txt` is a French dictionary with 118 characters
`ppocr/utils/japan_dict.txt` is a French dictionary with 4399 characters `ppocr/utils/dict/japan_dict.txt` is a French dictionary with 4399 characters
`ppocr/utils/korean_dict.txt` is a French dictionary with 3636 characters `ppocr/utils/dict/korean_dict.txt` is a French dictionary with 3636 characters
`ppocr/utils/german_dict.txt` is a French dictionary with 131 characters `ppocr/utils/dict/german_dict.txt` is a French dictionary with 131 characters
You can use it on demand. You can use it on demand.
The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**, The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**,
If you like, you can submit the dictionary file to [utils](../../ppocr/utils) and we will thank you in the Repo. If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) or corpus file to [corpus](../../ppocr/utils/corpus) and we will thank you in the Repo.
To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
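The hunk above describes the dictionary format only in words: one character per line, each mapped to a numeric index. A minimal sketch of that mapping follows, assuming the index is simply the zero-based line position. `load_char_dict` is a hypothetical helper written for illustration; it is not PaddleOCR's own loader, which may reserve extra indices (for example a CTC blank).

```python
def load_char_dict(dict_path, use_space_char=False):
    """Return {character: index} for a one-character-per-line dictionary file.

    Hypothetical helper: the index is the zero-based position of the
    character among the non-empty lines of the file.
    """
    chars = []
    with open(dict_path, encoding="utf-8") as f:
        for line in f:
            ch = line.rstrip("\r\n")  # strip only the newline, keep other whitespace
            if ch:                    # skip empty lines
                chars.append(ch)
    # Assumption: when spaces should be recognized, the space character is
    # appended after the characters read from the file.
    if use_space_char and " " not in chars:
        chars.append(" ")
    return {ch: idx for idx, ch in enumerate(chars)}
```

With the paths from this commit, a call would look like `load_char_dict("./ppocr/utils/dict/french_dict.txt")`.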
...@@ -259,7 +259,7 @@ Global: ...@@ -259,7 +259,7 @@ Global:
... ...
# Add a custom dictionary, if you modify the dictionary # Add a custom dictionary, if you modify the dictionary
# please point the path to the new dictionary # please point the path to the new dictionary
character_dict_path: ./ppocr/utils/french_dict.txt character_dict_path: ./ppocr/utils/dict/french_dict.txt
# Add data augmentation during training # Add data augmentation during training
distort: true distort: true
# Identify spaces # Identify spaces
......
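Every hunk in this commit applies the same rewrite: the four language dictionaries move from `ppocr/utils/` to `ppocr/utils/dict/`, while `ic15_dict.txt` keeps its old location. For local configs or scripts that still use the old paths, the rewrite could be sketched as below. `migrate_dict_path` is a hypothetical helper, not part of the repository, and the list of moved files is taken only from the hunks shown here.

```python
import re

# Old location of the four dictionaries moved in this commit.
# ic15_dict.txt is deliberately not listed: the diff leaves it in ppocr/utils/.
_OLD_DICT = re.compile(r"ppocr/utils/((?:french|german|japan|korean)_dict\.txt)")


def migrate_dict_path(text):
    """Rewrite old dictionary paths in `text` to the new dict/ subfolder."""
    return _OLD_DICT.sub(r"ppocr/utils/dict/\1", text)


print(migrate_dict_path("character_dict_path: ./ppocr/utils/french_dict.txt"))
```

The pattern does not match paths that already contain `dict/`, so running the helper twice on the same text leaves it unchanged.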