Merge pull request #2449 from tink2123/add_multi_doc

polish multilingual doc

606e7912 · xiaoting · GitHub · 7ed2628a · 03250463 · 606e7912
8 changed files
--- a/doc/doc_ch/multi_languages.md
+++ b/doc/doc_ch/multi_languages.md
@@ -5,6 +5,25 @@
 - 2021.4.9 支持**80种**语言的检测和识别
 - 2021.4.9 支持**轻量高精度**英文模型检测识别
+PaddleOCR 旨在打造一套丰富、领先、且实用的OCR工具库，不仅提供了通用场景下的中英文模型，也提供了专门在英文场景下训练的模型，
+和覆盖[80个语言](#语种缩写)的小语种模型。
+其中英文模型支持，大小写字母和常见标点的检测识别，并优化了空格字符的识别：
+<div align="center">
+    <img src="../imgs_results/multi_lang/en_1.jpg" width="400" height="600">
+</div>
+小语种模型覆盖了拉丁语系、阿拉伯语系、中文繁体、韩语、日语等等：
+<div align="center">
+    <img src="../imgs_results/multi_lang/japan_2.jpg" width="600" height="300">
+    <img src="../imgs_results/multi_lang/french_0.jpg" width="300" height="300">
+</div>
+本文档将简要介绍小语种模型的使用方法。
 - [1 安装](#安装)
    - [1.1 paddle 安装](#paddle安装)
    - [1.2 paddleocr package 安装](#paddleocr_package_安装)  
@@ -68,7 +87,11 @@ Paddleocr目前支持80个语种，可以通过修改--lang参数进行切换，
 paddleocr --image_dir doc/imgs/japan_2.jpg --lang=japan
 ```
-![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs/japan_2.jpg)
+<div align="center">
+    <img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs/japan_2.jpg" width="800">
+</div>
 结果是一个list，每个item包含了文本框，文字和识别置信度
 ```text
@@ -138,8 +161,10 @@ im_show.save('result.jpg')
 ```
 结果可视化:
-![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs_results/korean.jpg)
+<div align="center">
+    <img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_results/korean.jpg" width="800">
+</div>
 * 识别预测
@@ -152,7 +177,8 @@ for line in result:
    print(line)
 ```
-![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs_words/german/1.jpg)
+![](../imgs_words/german/1.jpg)
 结果是一个tuple，只包含识别结果和识别置信度
@@ -187,7 +213,10 @@ im_show.save('result.jpg')
 ```
 结果可视化 ：
-![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs_results/whl/12_det.jpg)
+<div align="center">
+    <img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_results/whl/12_det.jpg" width="800">
+</div>
 ppocr 还支持方向分类， 更多使用方式请参考：[whl包使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_ch/whl.md)。
@@ -233,7 +262,7 @@ ppocr 支持使用自己的数据进行自定义训练或finetune, 其中识别
 |卡纳达文|Kannada |kn|
 |泰米尔文|Tamil |ta|
 |南非荷兰文 |Afrikaans |af|
-|阿塞拜疆文 |Azerbaijani	|az|
+|阿塞拜疆文 |Azerbaijani    |az|
 |波斯尼亚文|Bosnian|bs|
 |捷克文|Czech|cs|
 |威尔士文 |Welsh |cy|

--- a/doc/doc_en/multi_languages_en.md
+++ b/doc/doc_en/multi_languages_en.md
@@ -5,21 +5,41 @@
 -2021.4.9 supports the detection and recognition of 80 languages
 -2021.4.9 supports **lightweight high-precision** English model detection and recognition
- [1 Installation](#Install)
-    - [1.1 paddle installation](#paddleinstallation)
-    - [1.2 paddleocr package installation](#paddleocr_package_install)
- [2 Quick Use](#Quick_Use)
-    - [2.1 Command line operation](#Command_line_operation)
-        - [2.1.1 Prediction of the whole image](#bash_detection+recognition)
-        - [2.1.2 Recognition](#bash_Recognition)
-        - [2.1.3 Detection](#bash_detection)
-    - [2.2 python script running](#python_Script_running)
-        - [2.2.1 Whole image prediction](#python_detection+recognition)
-        - [2.2.2 Recognition](#python_Recognition)
-        - [2.2.3 Detection](#python_detection)
- [3 Custom Training](#Custom_Training)
- [4 Supported languages and abbreviations](#language_abbreviations)
+PaddleOCR aims to create a rich, leading, and practical OCR tool library, which not only provides
+Chinese and English models in general scenarios, but also provides models specifically trained
+in English scenarios. And multilingual models covering [80 languages](#language_abbreviations).
+Among them, the English model supports the detection and recognition of uppercase and lowercase
+letters and common punctuation, and the recognition of space characters is optimized:
+<div align="center">
+    <img src="../imgs_results/multi_lang/en_1.jpg" width="400" height="600">
+</div>
+The multilingual models cover Latin, Arabic, Traditional Chinese, Korean, Japanese, etc.:
+<div align="center">
+    <img src="../imgs_results/multi_lang/japan_2.jpg" width="600" height="300">
+    <img src="../imgs_results/multi_lang/french_0.jpg" width="300" height="300">
+</div>
+This document will briefly introduce how to use the multilingual model.
+-[1 Installation](#Install)
+    -[1.1 paddle installation](#paddleinstallation)
+    -[1.2 paddleocr package installation](#paddleocr_package_install)
+-[2 Quick Use](#Quick_Use)
+    -[2.1 Command line operation](#Command_line_operation)
+     -[2.1.1 Prediction of the whole image](#bash_detection+recognition)
+     -[2.1.2 Recognition](#bash_Recognition)
+     -[2.1.3 Detection](#bash_detection)
+    -[2.2 python script running](#python_Script_running)
+     -[2.2.1 Whole image prediction](#python_detection+recognition)
+     -[2.2.2 Recognition](#python_Recognition)
+     -[2.2.3 Detection](#python_detection)
+-[3 Custom Training](#Custom_Training)
+-[4 Supported languages and abbreviations](#language_abbreviations)
 <a name="Install"></a>
 ## 1 Installation

--- a/doc/imgs_results/multi_lang/en_1.jpg
+++ b/doc/imgs_results/multi_lang/en_1.jpg
--- a/doc/imgs_results/multi_lang/en_2.jpg
+++ b/doc/imgs_results/multi_lang/en_2.jpg
--- a/doc/imgs_results/multi_lang/en_3.jpg
+++ b/doc/imgs_results/multi_lang/en_3.jpg
--- a/doc/imgs_results/multi_lang/french_0.jpg
+++ b/doc/imgs_results/multi_lang/french_0.jpg
--- a/doc/imgs_results/multi_lang/japan_2.jpg
+++ b/doc/imgs_results/multi_lang/japan_2.jpg
--- a/ppocr/utils/en_dict.txt
+++ b/ppocr/utils/en_dict.txt
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+:
+;
+<
+=
+>
+?
+@
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+[
+\
+]
+^
+_
+`
+a
+b
+c
+d
+e
+f
+g
+h
+i
+j
+k
+l
+m
+n
+o
+p
+q
+r
+s
+t
+u
+v
+w
+x
+y
+z
+{
+|
+}
+~
+!
+"
+#
+$
+%
+&
+'
+(
+)
+*
++
+,
+-
+.
+/
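
The 94 lines added to ppocr/utils/en_dict.txt appear to follow a simple pattern: the printable ASCII characters from `0` through `~`, followed by `!` through `/`. The sketch below rebuilds that sequence and turns it into a character-to-index table. It assumes one character per line, as the hunk suggests. The helper names `build_en_dict` and `char_to_index` are hypothetical and are not part of PaddleOCR; how the library itself reads this file is not shown in this diff.

```python
def build_en_dict():
    """Rebuild the character order seen in the en_dict.txt hunk.

    Assumption: the file lists ASCII '0'..'~' and then '!'..'/',
    one character per line, which is what the added lines suggest.
    """
    first = [chr(c) for c in range(ord("0"), ord("~") + 1)]   # '0' .. '~'
    second = [chr(c) for c in range(ord("!"), ord("/") + 1)]  # '!' .. '/'
    return first + second


def char_to_index(chars):
    """Map each character to its position in the list (hypothetical helper)."""
    return {ch: i for i, ch in enumerate(chars)}


if __name__ == "__main__":
    chars = build_en_dict()
    table = char_to_index(chars)
    # Print the size of the table and where a few characters land.
    print(len(chars), table["0"], table["A"], table["a"], table["/"])
```

Comparing the output of `build_en_dict()` line by line with the file is a quick way to spot a missing or duplicated entry in a dictionary of this kind.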