diff --git a/PPOCRLabel/README.md b/PPOCRLabel/README.md
index 624e9324a8573074a169200894b10d161a820c7c..f72ae96679d9ca3ba1585c49ce8762cd73e97d24 100644
--- a/PPOCRLabel/README.md
+++ b/PPOCRLabel/README.md
@@ -1,21 +1,27 @@
+English | [简体中文](README_ch.md)
+
 # PPOCRLabel
 
-PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,使用python3和pyqt5编写,支持矩形框标注和四点标注模式,导出格式可直接用于PPOCR检测和识别模型的训练。
+PPOCRLabel is a semi-automatic graphic annotation tool for the OCR field. It is written in Python3 and PyQt5 and supports rectangular box annotation and four-point annotation modes. The annotations can be used directly for training PPOCR detection and recognition models.
+
+
+
+## Installation
 
-
+### 1. Install PaddleOCR
 
-## 安装
+Refer to the [PaddleOCR installation document](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/installation.md) to prepare PaddleOCR.
 
-### 1. 安装PaddleOCR
-参考[PaddleOCR安装文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/installation.md)准备好PaddleOCR
+### 2. Install PPOCRLabel
 
-### 2. 安装PPOCRLabel
 #### Windows + Anaconda
 
+Download and install [Anaconda](https://www.anaconda.com/download/#download) (Python 3+)
+
 ```
 pip install pyqt5
-cd ./PPOCRLabel # 将目录切换到PPOCRLabel文件夹下
-python PPOCRLabel.py --lang ch
+cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
+python PPOCRLabel.py
 ```
 
 #### Ubuntu Linux
 
 ```
 pip3 install pyqt5
 pip3 install trash-cli
-cd ./PPOCRLabel # 将目录切换到PPOCRLabel文件夹下
-python3 PPOCRLabel.py --lang ch
+cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
+python3 PPOCRLabel.py
 ```
 
 #### macOS
 ```
 pip3 install pyqt5
-pip3 uninstall opencv-python # 由于mac版本的opencv与pyqt有冲突,需先手动卸载opencv
-pip3 install opencv-contrib-python-headless # 安装headless版本的open-cv
-cd ./PPOCRLabel # 将目录切换到PPOCRLabel文件夹下
-python3 PPOCRLabel.py --lang ch
+pip3 uninstall opencv-python # Uninstall opencv manually as it conflicts with pyqt
+pip3 install opencv-contrib-python-headless # Install the headless version of opencv
+cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
+python3 PPOCRLabel.py
 ```
 
-## 使用
+## Usage
+
+### Steps
+
+1. Install and launch the program using the instructions above.
+
+2. Click 'Open Dir' in Menu/File to select the folder of the pictures to be labeled [1].
+
+3. Click 'Auto recognition' to use the PPOCR model to automatically annotate the images whose status [2], shown before the file name, is 'X'.
+
+4. Create Box:
+
+   4.1 Click 'Create RectBox' or press 'W' in English keyboard mode to draw a new rectangular detection box. Click and release the left mouse button to select the text region to annotate.
 
-### 操作步骤
+   4.2 Press 'P' to enter four-point labeling mode, which lets you create any four-point shape by clicking four points with the left mouse button in succession; DOUBLE-CLICK the left mouse button to signal that labeling is complete.
 
-1. 安装与运行:使用上述命令安装与运行程序。
-2. 打开文件夹:在菜单栏点击 “文件” - "打开目录" 选择待标记图片的文件夹[1].
-3. 自动标注:点击 ”自动标注“,使用PPOCR超轻量模型对图片文件名前图片状态[2]为 “X” 的图片进行自动标注。
-4. 手动标注:点击 “矩形标注”(推荐直接在英文模式下点击键盘中的 “W”),用户可对当前图片中模型未检出的部分进行手动绘制标记框。点击键盘P,则使用四点标注模式(或点击“编辑” - “四点标注”),用户依次点击4个点后,双击左键表示标注完成。
-5. 标记框绘制完成后,用户点击 “确认”,检测框会先被预分配一个 “待识别” 标签。
-6. 重新识别:将图片中的所有检测画绘制/调整完成后,点击 “重新识别”,PPOCR模型会对当前图片中的**所有检测框**重新识别[3]。
-7. 内容更改:双击识别结果,对不准确的识别结果进行手动更改。
-8. 确认标记:点击 “确认”,图片状态切换为 “√”,跳转至下一张(此时不会直接将结果写入文件)。
-9. 删除:点击 “删除图像”,图片将会被删除至回收站。
-10. 保存结果:用户可以通过菜单中“文件-保存标记结果”手动保存,同时程序也会在用户每确认10张图片后自动保存一次。手动确认过的标记将会被存放在所打开图片文件夹下的*Label.txt*中。在菜单栏点击 “文件” - "保存识别结果"后,会将此类图片的识别训练数据保存在*crop_img*文件夹下,识别标签保存在*rec_gt.txt*中[4]。
+5. After the box is drawn, the user clicks "OK", and the detection box will be pre-assigned a "TEMPORARY" label.
 
-### 注意
 
+6. Click 're-Recognition', and the model will overwrite ALL recognition results in ALL detection boxes [3].
 
-[1] PPOCRLabel以文件夹为基本标记单位,打开待标记的图片文件夹后,不会在窗口栏中显示图片,而是在点击 "选择文件夹" 之后直接将文件夹下的图片导入到程序中。
 
+7. Double click a result in the 'recognition result' list to manually change inaccurate recognition results.
 
-[2] 图片状态表示本张图片用户是否手动保存过,未手动保存过即为 “X”,手动保存过为 “√”。点击 “自动标注”按钮后,PPOCRLabel不会对状态为 “√” 的图片重新标注。
 
+8. Click "Check"; the image status switches to "√" and the program automatically jumps to the next image (the results are not written to the file at this point).
 
-[3] 点击“重新识别”后,模型会对图片中的识别结果进行覆盖。因此如果在此之前手动更改过识别结果,有可能在重新识别后产生变动。
 
+9. Click "Delete Image" and the image will be moved to the recycle bin.
 
-[4] PPOCRLabel产生的文件放置于标记图片文件夹下,包括一下几种,请勿手动更改其中内容,否则会引起程序出现异常。
 
+10. Labeling result: the user can save manually through the menu "File - Save Label", and the program also saves automatically after every 10 images confirmed by the user. The manually checked labels are stored in *Label.txt* under the opened picture folder.
+    Click "PaddleOCR"-"Save Recognition Results" in the menu bar to save the recognition training data of such pictures in the *crop_img* folder and the recognition labels in *rec_gt.txt* [4].
 
-| 文件名 | 说明 |
+### Note
+
+[1] PPOCRLabel uses the opened folder as the project. After opening the image folder, the pictures will not be displayed in a dialog. Instead, the pictures under the folder are imported into the program directly after clicking "Open Dir".
+
+[2] The image status indicates whether the user has saved the image manually. If it has not been saved manually it is "X"; otherwise it is "√". PPOCRLabel will not relabel pictures with a status of "√".
+
+[3] After clicking "Re-recognize", the model will overwrite ALL recognition results in the picture.
+Therefore, if a recognition result has been manually changed before, it may change after re-recognition.
+
+[4] The files produced by PPOCRLabel, listed below, are placed under the opened picture folder. Please do not change their contents manually; otherwise the program may behave abnormally. Illustrative examples of the two label formats are shown after the table.
+
+| File name | Description |
 | :-----------: | :----------------------------------------------------------: |
-| Label.txt | 检测标签,可直接用于PPOCR检测模型训练。用户每保存10张检测结果后,程序会进行自动写入。当用户关闭应用程序或切换文件路径后同样会进行写入。 |
-| fileState.txt | 图片状态标记文件,保存当前文件夹下已经被用户手动确认过的图片名称。 |
-| Cache.cach | 缓存文件,保存模型自动识别的结果。 |
-| rec_gt.txt | 识别标签。可直接用于PPOCR识别模型训练。需用户手动点击菜单栏“文件” - "保存识别结果"后产生。 |
-| crop_img | 识别数据。按照检测框切割后的图片。与rec_gt.txt同时产生。 |
+| Label.txt | The detection label file, which can be used directly for PPOCR detection model training. The file is written automatically after the user saves 10 label results, and also when the user closes the application or changes the file folder. |
+| fileState.txt | The picture status file, which saves the names of the images in the current folder that have already been manually confirmed by the user. |
+| Cache.cach | A cache file that saves the results of the automatic model recognition. |
+| rec_gt.txt | The recognition label file, which can be used directly for PPOCR recognition model training; it is generated after the user clicks "File"-"Save recognition result" in the menu bar. |
+| crop_img | The recognition data: images cropped according to the detection boxes, generated at the same time as *rec_gt.txt*. |
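+
+Both label files use simple tab-separated text formats. As a purely illustrative sketch (the file name, box coordinates and crop name below are made up), a *Label.txt* line and a *rec_gt.txt* line look roughly like this:
+
+```
+# One image per line in Label.txt: <image name>\t<JSON list of boxes>
+test.jpg	[{"transcription": "PaddleOCR", "points": [[31, 10], [310, 10], [310, 46], [31, 46]]}]
+
+# One crop per line in rec_gt.txt: <cropped image path>\t<text label>
+crop_img/test_crop_0.jpg	PaddleOCR
+```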
+
+## Explanation
+
+### Built-in Model
+
+- Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, which supports Chinese, English and digit recognition and multilingual text detection.
+
+- Model language switching: You can switch the built-in model language by clicking "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese.
+  For specific model download links, please refer to the [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating).
 
-## 说明
-### 内置模型
+- Custom model: You can replace the built-in model with your own trained model by modifying the [PaddleOCR class instantiation](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110) in PPOCRLabel.py, referring to the [Custom Model Code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md#use-custom-model).
 
- - 默认模型:PPOCRLabel默认使用PaddleOCR中的中英文超轻量OCR模型,支持中英文与数字识别,多种语言检测。
+### Export partial recognition results
 
- - 模型语言切换:用户可通过菜单栏中 "PaddleOCR" - "选择模型" 切换内置模型语言,目前支持的语言包括法文、德文、韩文、日文。具体模型下载链接可参考[PaddleOCR模型列表](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md).
+For data that are difficult to recognize, **uncheck** the corresponding checkboxes in the recognition results; those recognition results will then not be exported.
 
- - 自定义模型:用户可根据[自定义模型代码使用](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%A8%A1%E5%9E%8B),通过修改PPOCRLabel.py中针对[PaddleOCR类的实例化](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110)替换成自己训练的模型。
+*Note: The status of the checkboxes in the recognition results is only preserved after the user manually clicks the Save button.*
 
-### 导出部分识别结果
+### Error message
 
-针对部分难以识别的数据,通过在识别结果的复选框中**取消勾选**相应的标记,其识别结果不会被导出。
+- If paddleocr is installed via the whl package, it takes priority over calling the PaddleOCR class through paddleocr.py, which may cause an exception if the whl package is outdated.
 
-*注意:识别结果中的复选框状态仍需用户手动点击保存后才能保留*
+- For Linux users, if you get an error starting with **objc[XXXXX]** when opening the software, your opencv version is too high. It is recommended to install version 4.2:
 
-### 错误提示
-- 如果同时使用whl包安装了paddleocr,其优先级大于通过paddleocr.py调用PaddleOCR类,whl包未更新时会导致程序异常。
-- PPOCRLabel**不支持对中文文件名**的图片进行自动标注。
-- 针对Linux用户::如果您在打开软件过程中出现**objc[XXXXX]**开头的错误,证明您的opencv版本太高,建议安装4.2版本:
-  ```
-  pip install opencv-python==4.2.0.32
-  ```
-- 如果出现''Missing string id '开头的错误,需要重新编译资源:
-  ```
-  pyrcc5 -o libs/resources.py resources.qrc
-  ```
-### 参考资料
+  ```
+  pip install opencv-python==4.2.0.32
+  ```
+- If you get an error starting with **Missing string id**, you need to recompile the resources:
+  ```
+  pyrcc5 -o libs/resources.py resources.qrc
+  ```
+### Related
 
 1.[Tzutalin. LabelImg. Git code (2015)](https://github.com/tzutalin/labelImg)
diff --git a/PPOCRLabel/README_ch.md b/PPOCRLabel/README_ch.md
new file mode 100644
index 0000000000000000000000000000000000000000..334cb2860848dd7def22c537b0e10cd5a9435289
--- /dev/null
+++ b/PPOCRLabel/README_ch.md
@@ -0,0 +1,102 @@
+[English](README.md) | 简体中文
+
+# PPOCRLabel
+
+PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,使用python3和pyqt5编写,支持矩形框标注和四点标注模式,导出格式可直接用于PPOCR检测和识别模型的训练。
+
+
+
+## 安装
+
+### 1.
安装PaddleOCR +参考[PaddleOCR安装文档](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/installation.md)准备好PaddleOCR + +### 2. 安装PPOCRLabel +#### Windows + Anaconda + +``` +pip install pyqt5 +cd ./PPOCRLabel # 将目录切换到PPOCRLabel文件夹下 +python PPOCRLabel.py --lang ch +``` + +#### Ubuntu Linux + +``` +pip3 install pyqt5 +pip3 install trash-cli +cd ./PPOCRLabel # 将目录切换到PPOCRLabel文件夹下 +python3 PPOCRLabel.py --lang ch +``` + +#### macOS +``` +pip3 install pyqt5 +pip3 uninstall opencv-python # 由于mac版本的opencv与pyqt有冲突,需先手动卸载opencv +pip3 install opencv-contrib-python-headless # 安装headless版本的open-cv +cd ./PPOCRLabel # 将目录切换到PPOCRLabel文件夹下 +python3 PPOCRLabel.py --lang ch +``` + +## 使用 + +### 操作步骤 + +1. 安装与运行:使用上述命令安装与运行程序。 +2. 打开文件夹:在菜单栏点击 “文件” - "打开目录" 选择待标记图片的文件夹[1]. +3. 自动标注:点击 ”自动标注“,使用PPOCR超轻量模型对图片文件名前图片状态[2]为 “X” 的图片进行自动标注。 +4. 手动标注:点击 “矩形标注”(推荐直接在英文模式下点击键盘中的 “W”),用户可对当前图片中模型未检出的部分进行手动绘制标记框。点击键盘P,则使用四点标注模式(或点击“编辑” - “四点标注”),用户依次点击4个点后,双击左键表示标注完成。 +5. 标记框绘制完成后,用户点击 “确认”,检测框会先被预分配一个 “待识别” 标签。 +6. 重新识别:将图片中的所有检测画绘制/调整完成后,点击 “重新识别”,PPOCR模型会对当前图片中的**所有检测框**重新识别[3]。 +7. 内容更改:双击识别结果,对不准确的识别结果进行手动更改。 +8. 确认标记:点击 “确认”,图片状态切换为 “√”,跳转至下一张(此时不会直接将结果写入文件)。 +9. 删除:点击 “删除图像”,图片将会被删除至回收站。 +10. 保存结果:用户可以通过菜单中“文件-保存标记结果”手动保存,同时程序也会在用户每确认10张图片后自动保存一次。手动确认过的标记将会被存放在所打开图片文件夹下的*Label.txt*中。在菜单栏点击 “文件” - "保存识别结果"后,会将此类图片的识别训练数据保存在*crop_img*文件夹下,识别标签保存在*rec_gt.txt*中[4]。 + +### 注意 + +[1] PPOCRLabel以文件夹为基本标记单位,打开待标记的图片文件夹后,不会在窗口栏中显示图片,而是在点击 "选择文件夹" 之后直接将文件夹下的图片导入到程序中。 + +[2] 图片状态表示本张图片用户是否手动保存过,未手动保存过即为 “X”,手动保存过为 “√”。点击 “自动标注”按钮后,PPOCRLabel不会对状态为 “√” 的图片重新标注。 + +[3] 点击“重新识别”后,模型会对图片中的识别结果进行覆盖。因此如果在此之前手动更改过识别结果,有可能在重新识别后产生变动。 + +[4] PPOCRLabel产生的文件放置于标记图片文件夹下,包括一下几种,请勿手动更改其中内容,否则会引起程序出现异常。 + +| 文件名 | 说明 | +| :-----------: | :----------------------------------------------------------: | +| Label.txt | 检测标签,可直接用于PPOCR检测模型训练。用户每保存10张检测结果后,程序会进行自动写入。当用户关闭应用程序或切换文件路径后同样会进行写入。 | +| fileState.txt | 图片状态标记文件,保存当前文件夹下已经被用户手动确认过的图片名称。 | +| Cache.cach | 缓存文件,保存模型自动识别的结果。 | +| rec_gt.txt | 识别标签。可直接用于PPOCR识别模型训练。需用户手动点击菜单栏“文件” - "保存识别结果"后产生。 | +| crop_img | 识别数据。按照检测框切割后的图片。与rec_gt.txt同时产生。 | + +## 说明 +### 内置模型 + + - 默认模型:PPOCRLabel默认使用PaddleOCR中的中英文超轻量OCR模型,支持中英文与数字识别,多种语言检测。 + + - 模型语言切换:用户可通过菜单栏中 "PaddleOCR" - "选择模型" 切换内置模型语言,目前支持的语言包括法文、德文、韩文、日文。具体模型下载链接可参考[PaddleOCR模型列表](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md). + + - 自定义模型:用户可根据[自定义模型代码使用](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%A8%A1%E5%9E%8B),通过修改PPOCRLabel.py中针对[PaddleOCR类的实例化](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110)替换成自己训练的模型。 + +### 导出部分识别结果 + +针对部分难以识别的数据,通过在识别结果的复选框中**取消勾选**相应的标记,其识别结果不会被导出。 + +*注意:识别结果中的复选框状态仍需用户手动点击保存后才能保留* + +### 错误提示 +- 如果同时使用whl包安装了paddleocr,其优先级大于通过paddleocr.py调用PaddleOCR类,whl包未更新时会导致程序异常。 +- PPOCRLabel**不支持对中文文件名**的图片进行自动标注。 +- 针对Linux用户::如果您在打开软件过程中出现**objc[XXXXX]**开头的错误,证明您的opencv版本太高,建议安装4.2版本: + ``` + pip install opencv-python==4.2.0.32 + ``` +- 如果出现''Missing string id '开头的错误,需要重新编译资源: + ``` + pyrcc5 -o libs/resources.py resources.qrc + ``` +### 参考资料 + +1.[Tzutalin. LabelImg. Git code (2015)](https://github.com/tzutalin/labelImg) diff --git a/PPOCRLabel/README_en.md b/PPOCRLabel/README_en.md deleted file mode 100644 index 42ded6b0eacb643469eb6869fa6ff5dddf85f9b7..0000000000000000000000000000000000000000 --- a/PPOCRLabel/README_en.md +++ /dev/null @@ -1,123 +0,0 @@ -# PPOCRLabel - -PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field. 
It is written in python3 and pyqt5, supporting rectangular box annotation and four-point annotation modes. Annotations can be directly used for the training of PPOCR detection and recognition models. - - - -## Installation - -### 1. Install PaddleOCR - -Refer to [PaddleOCR installation document](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/installation.md) to prepare PaddleOCR - -### 2. Install PPOCRLabel - -#### Windows + Anaconda - -Download and install [Anaconda](https://www.anaconda.com/download/#download) (Python 3+) - -``` -pip install pyqt5 -cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder -python PPOCRLabel.py -``` - -#### Ubuntu Linux - -``` -pip3 install pyqt5 -pip3 install trash-cli -cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder -python3 PPOCRLabel.py -``` - -#### macOS -``` -pip3 install pyqt5 -pip3 uninstall opencv-python # Uninstall opencv manually as it conflicts with pyqt -pip3 install opencv-contrib-python-headless # Install the headless version of opencv -cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder -python3 PPOCRLabel.py -``` - -## Usage - -### Steps - -1. Build and launch using the instructions above. - -2. Click 'Open Dir' in Menu/File to select the folder of the picture.[1] - -3. Click 'Auto recognition', use PPOCR model to automatically annotate images which marked with 'X' [2]before the file name. - -4. Create Box: - - 4.1 Click 'Create RectBox' or press 'W' in English keyboard mode to draw a new rectangle detection box. Click and release left mouse to select a region to annotate the text area. - - 4.2 Press 'P' to enter four-point labeling mode which enables you to create any four-point shape by clicking four points with the left mouse button in succession and DOUBLE CLICK the left mouse as the signal of labeling completion. - -5. After the marking frame is drawn, the user clicks "OK", and the detection frame will be pre-assigned a "TEMPORARY" label. - -6. Click 're-Recognition', model will rewrite ALL recognition results in ALL detection box[3]. - -7. Double click the result in 'recognition result' list to manually change inaccurate recognition results. - -8. Click "Check", the image status will switch to "√",then the program automatically jump to the next(The results will not be written directly to the file at this time). - -9. Click "Delete Image" and the image will be deleted to the recycle bin. - -10. Labeling result: the user can save manually through the menu "File - Save Label", while the program will also save automatically after every 10 images confirmed by the user.the manually checked label will be stored in *Label.txt* under the opened picture folder. - Click "PaddleOCR"-"Save Recognition Results" in the menu bar, the recognition training data of such pictures will be saved in the *crop_img* folder, and the recognition label will be saved in *rec_gt.txt*[4]. - -### Note - -[1] PPOCRLabel uses the opened folder as the project. After opening the image folder, the picture will not be displayed in the dialog. Instead, the pictures under the folder will be directly imported into the program after clicking "Open Dir". - -[2] The image status indicates whether the user has saved the image manually. If it has not been saved manually it is "X", otherwise it is "√", PPOCRLabel will not relabel pictures with a status of "√". - -[3] After clicking "Re-recognize", the model will overwrite ALL recognition results in the picture. 
-Therefore, if the recognition result has been manually changed before, it may change after re-recognition. - -[4] The files produced by PPOCRLabel can be found under the opened picture folder including the following, please do not manually change the contents, otherwise it will cause the program to be abnormal. - -| File name | Description | -| :-----------: | :----------------------------------------------------------: | -| Label.txt | The detection label file can be directly used for PPOCR detection model training. After the user saves 10 label results, the file will be automatically saved. It will also be written when the user closes the application or changes the file folder. | -| fileState.txt | The picture status file save the image in the current folder that has been manually confirmed by the user. | -| Cache.cach | Cache files to save the results of model recognition. | -| rec_gt.txt | The recognition label file, which can be directly used for PPOCR identification model training, is generated after the user clicks on the menu bar "File"-"Save recognition result". | -| crop_img | The recognition data, generated at the same time with *rec_gt.txt* | - -## Explanation - -### Built-in Model - -- Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, supports Chinese, English and number recognition, and multiple language detection. - -- Model language switching: Changing the built-in model language is supportable by clicking "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languages​include French, German, Korean, and Japanese. - For specific model download links, please refer to [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating) - -- Custom model: The model trained by users can be replaced by modifying PPOCRLabel.py in [PaddleOCR class instantiation](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110) referring [Custom Model Code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md#use-custom-model) - -### Export partial recognition results - -For some data that are difficult to recognize, the recognition results will not be exported by **unchecking** the corresponding tags in the recognition results checkbox. - -*Note: The status of the checkboxes in the recognition results still needs to be saved manually by clicking Save Button.* - -### Error message - -- If paddleocr is installed with whl, it has a higher priority than calling PaddleOCR class with paddleocr.py, which may cause an exception if whl package is not updated. - -- For Linux users, if you get an error starting with **objc[XXXXX]** when opening the software, it proves that your opencv version is too high. It is recommended to install version 4.2: - - ``` - pip install opencv-python==4.2.0.32 - ``` -- If you get an error starting with **Missing string id **,you need to recompile resources: - ``` - pyrcc5 -o libs/resources.py resources.qrc - ``` -### Related - -1.[Tzutalin. LabelImg. Git code (2015)](https://github.com/tzutalin/labelImg) diff --git a/README.md b/README.md index 03bc26dd47b6dd742ca99da6cd29f9baca4dcc83..3f6737f8343d4d03be98d76fe941482c5de8397f 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,28 @@ English | [简体中文](README_ch.md) ## Introduction -PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice. 
+PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.
+
+## Notice
+PaddleOCR supports both dynamic graph and static graph programming paradigms
+- Dynamic graph: dygraph branch (default), **supported by paddle 2.0rc1+ ([installation](./doc/doc_en/installation_en.md))**
+- Static graph: develop branch
 
 **Recent updates**
+- 2020.12.15 Update the data synthesis tool, i.e., [Style-Text](./StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene image.
+- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](./PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used directly in the training of the PP-OCR system.
 - 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
-- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipline](#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
-- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](#Supported-Chinese-model-list)
-- 2020.9.17 update [English recognition model](./doc/doc_en/models_list_en.md#english-recognition-model) and [Multilingual recognition model](doc/doc_en/models_list_en.md#english-recognition-model), `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
-- 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](./doc/doc_en/whl_en.md)
-- 2020.8.21 Update the replay and PPT of the live lesson at Bilibili on August 18, lesson 2, easy to learn and use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
-
 [more](./doc/doc_en/update_en.md)
 
 ## Features
 - PPOCR series of high-quality pre-trained models, comparable to commercial effects
-  - Ultra lightweight ppocr_mobile series models: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M
-  - General ppocr_server series models: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M
-  - Ultra lightweight compression ppocr_mobile_slim series models: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M
-- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
-- Support multi-language recognition: Korean, Japanese, German, French
+  - Ultra lightweight ppocr_mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
+  - General ppocr_server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
+  - Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
+  - Support multi-language recognition: Korean, Japanese, German, French
+- Rich toolkits related to the OCR area
+  - Semi-automatic data annotation tool, i.e., PPOCRLabel: supports fast and efficient data annotation
+  - Data synthesis tool, i.e., Style-Text: easily synthesize a large number of images similar to the target scene image
 - Support user-defined training, provides rich predictive inference deployment solutions
 - Support PIP installation, easy to use
 - Support Linux, Windows, MacOS and other systems
@@ -26,12 +30,21 @@
 
 ## Visualization
 
 <div align="center">
- - + +
 The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md).
+
+## Community
+- Scan the QR code below with your Wechat to access the official technical exchange group. We look forward to your participation.
+
+<div align="center">
+ +
+ + ## Quick Experience You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr) @@ -48,55 +61,62 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr -## PP-OCR 1.1 series model list(Update on Sep 17) + +## PP-OCR 2.0 series model list(Update on Dec 15) +**Note** : Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance. | Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model | | ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | -| Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | -| Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) | +| Chinese and English ultra-lightweight OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained 
model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
+| Chinese and English general OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
+
 
-For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_en/models_list_en.md)
+For more model downloads (including multiple languages), please refer to [PP-OCR v2.0 series model downloads](./doc/doc_en/models_list_en.md).
+For a new language request, please refer to [Guideline for new language_requests](#language_requests).
 
 ## Tutorials
 - [Installation](./doc/doc_en/installation_en.md)
 - [Quick Start](./doc/doc_en/quickstart_en.md)
 - [Code Structure](./doc/doc_en/tree_en.md)
-- Algorithm introduction
+- Algorithm Introduction
   - [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
   - [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
-  - [PP-OCR Pipline](#PP-OCR-Pipline)
-- Model training/evaluation
+  - [PP-OCR Pipeline](#PP-OCR-Pipeline)
+- Model Training/Evaluation
   - [Text Detection](./doc/doc_en/detection_en.md)
   - [Text Recognition](./doc/doc_en/recognition_en.md)
   - [Direction Classification](./doc/doc_en/angle_class_en.md)
   - [Yml Configuration](./doc/doc_en/config_en.md)
 - Inference and Deployment
-  - [Quick inference based on pip](./doc/doc_en/whl_en.md)
+  - [Quick Inference Based on PIP](./doc/doc_en/whl_en.md)
   - [Python Inference](./doc/doc_en/inference_en.md)
   - [C++ Inference](./deploy/cpp_infer/readme_en.md)
   - [Serving](./deploy/hubserving/readme_en.md)
-  - [Mobile](./deploy/lite/readme_en.md)
-  - [Model Quantization](./deploy/slim/quantization/README_en.md)
-  - [Model Compression](./deploy/slim/prune/README_en.md)
-  - [Benchmark](./doc/doc_en/benchmark_en.md)
+  - [Mobile](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme_en.md)
+  - [Benchmark](./doc/doc_en/benchmark_en.md)
+- Data Annotation and Synthesis
+  - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
+  - [Data Synthesis Tool: Style-Text](./StyleText/README.md)
+  - [Other Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
+  - [Other Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
 - Datasets
   - [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md)
   - [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md)
   - [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
-  - [Data Annotation
Tools](./doc/doc_en/data_annotation_en.md) - - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md) - [Visualization](#Visualization) +- [New language requests](#language_requests) - [FAQ](./doc/doc_en/FAQ_en.md) - [Community](#Community) - [References](./doc/doc_en/reference_en.md) - [License](#LICENSE) - [Contribution](#CONTRIBUTION) - -## PP-OCR Pipline + + + +## PP-OCR Pipeline
@@ -109,30 +129,41 @@ PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of thr ## Visualization [more](./doc/doc_en/visualization_en.md) - Chinese OCR model
- - - - + + + +
- English OCR model
- +
- Multilingual OCR model
- - + +
- -## Community -Scan the QR code below with your Wechat and completing the questionnaire, you can access to official technical exchange group. -
- -
+
+## Guideline for new language requests
+
+If you want to request support for a new language, a PR with the following 2 files is needed:
+
+1. In folder [ppocr/utils/dict](./ppocr/utils/dict),
+it is necessary to submit the dict text to this path and name it `{language}_dict.txt`, containing a list of all characters. Please see the format example from other files in that folder.
+
+2. In folder [ppocr/utils/corpus](./ppocr/utils/corpus),
+it is necessary to submit the corpus to this path and name it `{language}_corpus.txt`, containing a list of words in your language.
+At least 50,000 words per language are recommended; of course, the more, the better.
+
+If your language has unique elements, please tell us about them in advance in any way, such as useful links, Wikipedia, and so on.
+
+For more details, please refer to the [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
+
 
 ## License
@@ -149,3 +180,7 @@ We welcome all the contributions to PaddleOCR and appreciate for your feedback v
 - Thanks [authorfu](https://github.com/authorfu) for contributing Android demo and [xiadeye](https://github.com/xiadeye) contributing iOS demo, respectively.
 - Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
 - Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services.
+- Thanks [lijinhan](https://github.com/lijinhan) for contributing a new way, i.e., Java SpringBoot, to serve requests for the Hubserving deployment.
+- Thanks [Mejans](https://github.com/Mejans) for contributing the Occitan corpus and character set.
+- Thanks [LKKlein](https://github.com/LKKlein) for contributing a new deployment package in the Golang programming language.
+- Thanks [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself) and [1084667371](https://github.com/1084667371) for contributing a new data annotation tool, i.e., PPOCRLabel。 diff --git a/README_ch.md b/README_ch.md index fa400d31307e83831628bc45559845e7adef6ffd..d4383eb4989d746ba4fbf124324f45abfb06302a 100644 --- a/README_ch.md +++ b/README_ch.md @@ -2,16 +2,16 @@ ## 简介 PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 +## 注意 +PaddleOCR同时支持动态图与静态图两种编程范式 +- 动态图版本:dygraph分支(默认),需将paddle版本升级至2.0rc1+([快速安装](./doc/doc_ch/installation.md)) +- 静态图版本:develop分支 **近期更新** +- 2020.12.15 更新数据合成工具[Style-Text](./StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。 - 2020.12.07 [FAQ](./doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。 -- 2020.11.25 更新半自动标注工具[PPOCRLabel](./PPOCRLabel/README.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。 +- 2020.11.25 更新半自动标注工具[PPOCRLabel](./PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。 - 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941 -- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipeline](#PP-OCR)),适合在移动端部署使用。[模型下载](#模型下载) -- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载) -- 2020.9.17 更新[英文识别模型](./doc/doc_ch/models_list.md#英文识别模型)和[多语言识别模型](doc/doc_ch/models_list.md#多语言识别模型),已支持`德语、法语、日语、韩语`,更多语种识别模型将持续更新。 -- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md) -- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519) - [More](./doc/doc_ch/update.md) @@ -19,11 +19,13 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 特性 - PPOCR系列高质量预训练模型,准确的识别效果 - - 超轻量ppocr_mobile移动端系列:检测(2.6M)+方向分类器(0.9M)+ 识别(4.6M)= 8.1M - - 通用ppocr_server系列:检测(47.2M)+方向分类器(0.9M)+ 识别(107M)= 155.1M - - 超轻量压缩ppocr_mobile_slim系列:检测(1.4M)+方向分类器(0.5M)+ 识别(1.6M)= 3.5M -- 支持中英文数字组合识别、竖排文本识别、长文本识别 -- 支持多语言识别:韩语、日语、德语、法语 + - 超轻量ppocr_mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M + - 通用ppocr_server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M + - 支持中英文数字组合识别、竖排文本识别、长文本识别 + - 支持多语言识别:韩语、日语、德语、法语 +- 丰富易用的OCR相关工具组件 + - 半自动数据标注工具PPOCRLabel:支持快速高效的数据标注 + - 数据合成工具Style-Text:批量合成大量与目标场景类似的图像 - 支持用户自定义训练,提供丰富的预测推理部署方案 - 支持PIP快速安装使用 - 可运行于Linux、Windows、MacOS等多种系统 @@ -31,8 +33,8 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ## 效果展示
- - + +
上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。 @@ -47,15 +49,15 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
-- 代码体验:从[快速安装](./doc/doc_ch/installation.md) 开始 +- 代码体验:从[快速安装](./doc/doc_ch/quickstart.md) 开始 ## PP-OCR 2.0系列模型列表(更新中) - +**说明** :2.0版模型和[1.1版模型](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md)的主要区别在于动态图训练vs.静态图训练,模型性能上无明显差距。 | 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | | ------------ | --------------- | ----------------|---- | ---------- | -------- | -| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v2.0_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | -| 中英文通用OCR模型(143M) |ch_ppocr_server_v2.0_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | +| 中英文超轻量OCR模型(9.4M) | ch_ppocr_mobile_v2.0_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | +| 中英文通用OCR模型(143.4M) |ch_ppocr_server_v2.0_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | 更多模型下载(包括多语言),可以参考[PP-OCR v2.0 系列模型下载](./doc/doc_ch/models_list.md) @@ -78,27 +80,26 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 - [基于C++预测引擎推理](./deploy/cpp_infer/readme.md) - [服务化部署](./deploy/hubserving/readme.md) - [端侧部署](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme.md) - - [模型量化](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README.md) - - [模型裁剪](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/README.md) - [Benchmark](./doc/doc_ch/benchmark.md) - 数据集 - [通用中英文OCR数据集](./doc/doc_ch/datasets.md) - [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md) - [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md) - - 
[常用数据标注工具](./doc/doc_ch/data_annotation.md) - - [常用数据合成工具](./doc/doc_ch/data_synthesis.md) +- 数据标注与合成 + - [半自动标注工具PPOCRLabel](./PPOCRLabel/README_ch.md) + - [数据合成工具Style-Text](./StyleText/README_ch.md) + - [其它数据标注工具](./doc/doc_ch/data_annotation.md) + - [其它数据合成工具](./doc/doc_ch/data_synthesis.md) - [效果展示](#效果展示) - FAQ - [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md) - - [【理论篇】OCR通用21个问题](./doc/doc_ch/FAQ.md) - - [【实战篇】PaddleOCR实战53个问题](./doc/doc_ch/FAQ.md) + - [【理论篇】OCR通用30个问题](./doc/doc_ch/FAQ.md) + - [【实战篇】PaddleOCR实战84个问题](./doc/doc_ch/FAQ.md) - [技术交流群](#欢迎加入PaddleOCR技术交流群) - [参考文献](./doc/doc_ch/reference.md) - [许可证书](#许可证书) - [贡献代码](#贡献代码) -***注意:动态图端侧部署仍在开发中,目前仅支持动态图训练、python端预测,C++预测, -如果您有需要移动端部署案例或者量化裁剪,请切换到静态图分支;*** ## PP-OCR Pipline @@ -112,10 +113,10 @@ PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框 ## 效果展示 [more](./doc/doc_ch/visualization.md) - 中文模型
- - - - + + + +
- 英文模型 @@ -125,8 +126,8 @@ PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框 - 其他语言模型
- - + +
diff --git a/README_en.md b/README_en.md deleted file mode 100644 index 0b5eb1c60b5591c889275f31f4d5952727ae7645..0000000000000000000000000000000000000000 --- a/README_en.md +++ /dev/null @@ -1,186 +0,0 @@ -English | [简体中文](README_ch.md) - -## Introduction -PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice. - -**Recent updates** -- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](./PPOCRLabel/README_en.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly. -- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941 -- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR-Pipeline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list) -- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](#Supported-Chinese-model-list) -- 2020.9.17 update [English recognition model](./doc/doc_en/models_list_en.md#english-recognition-model) and [Multilingual recognition model](doc/doc_en/models_list_en.md#english-recognition-model), `English`, `Chinese`, `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated. -- 2020.8.24 Support the use of PaddleOCR through whl package installation,please refer [PaddleOCR Package](./doc/doc_en/whl_en.md) -- 2020.8.21 Update the replay and PPT of the live lesson at Bilibili on August 18, lesson 2, easy to learn and use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519) -- [more](./doc/doc_en/update_en.md) - -## Features -- PPOCR series of high-quality pre-trained models, comparable to commercial effects - - Ultra lightweight ppocr_mobile series models: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M - - General ppocr_server series models: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M - - Ultra lightweight compression ppocr_mobile_slim series models: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M -- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition -- Support multi-language recognition: Korean, Japanese, German, French -- Support user-defined training, provides rich predictive inference deployment solutions -- Support PIP installation, easy to use -- Support Linux, Windows, MacOS and other systems - -## Visualization - -
- - -
- -The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md). - - -## Community -- Scan the QR code below with your Wechat, you can access to official technical exchange group. Look forward to your participation. - -
- -
- - -## Quick Experience - -You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr) - -Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) - - Also, you can scan the QR code below to install the App (**Android support only**) - -
- -
- -- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md) - - - -## PP-OCR 2.0 series model list(Update on Sep 17) - -| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model | -| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | -| Chinese and English general OCR model (143M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_traingit.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | - - -For more model downloads (including multiple languages), please refer to [PP-OCR v2.0 series model downloads](./doc/doc_en/models_list_en.md). - -For a new language request, please refer to [Guideline for new language_requests](#language_requests). 
- -## Tutorials -- [Installation](./doc/doc_en/installation_en.md) -- [Quick Start](./doc/doc_en/quickstart_en.md) -- [Code Structure](./doc/doc_en/tree_en.md) -- Algorithm Introduction - - [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md) - - [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md) - - [PP-OCR Pipeline](#PP-OCR-Pipeline) -- Model Training/Evaluation - - [Text Detection](./doc/doc_en/detection_en.md) - - [Text Recognition](./doc/doc_en/recognition_en.md) - - [Direction Classification](./doc/doc_en/angle_class_en.md) - - [Yml Configuration](./doc/doc_en/config_en.md) -- Inference and Deployment - - [Quick Inference Based on PIP](./doc/doc_en/whl_en.md) - - [Python Inference](./doc/doc_en/inference_en.md) - - [C++ Inference](./deploy/cpp_infer/readme_en.md) - - [Serving](./deploy/hubserving/readme_en.md) - - [Mobile](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme_en.md) - - [Model Quantization](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README_en.md) - - [Model Compression](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/README_en.md) - - [Benchmark](./doc/doc_en/benchmark_en.md) -- Data Annotation and Synthesis - - [Semi-automatic Annotation Tool](./PPOCRLabel/README_en.md) - - [Data Annotation Tools](./doc/doc_en/data_annotation_en.md) - - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md) -- Datasets - - [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md) - - [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md) - - [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md) -- [Visualization](#Visualization) -- [New language requests](#language_requests) -- [FAQ](./doc/doc_en/FAQ_en.md) -- [Community](#Community) -- [References](./doc/doc_en/reference_en.md) -- [License](#LICENSE) -- [Contribution](#CONTRIBUTION) - -***Note: The dynamic graphs branch is still under development. -Currently, only dynamic graph training, python-end prediction, and C++ prediction are supported. -If you need mobile-end deployment cases or quantitative demo, -please use the static graph branch.*** - - - - -## PP-OCR Pipeline - -
- -
- -PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). Besides, The implementation of the FPGM Pruner and PACT quantization is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim). - - - -## Visualization [more](./doc/doc_en/visualization_en.md) -- Chinese OCR model -
- - - - -
- -- English OCR model -
- -
- -- Multilingual OCR model -
- - -
- - - -## Guideline for new language requests - -If you want to request a new language support, a PR with 2 following files are needed: - -1. In folder [ppocr/utils/dict](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/ppocr/utils/dict), -it is necessary to submit the dict text to this path and name it with `{language}_dict.txt` that contains a list of all characters. Please see the format example from other files in that folder. - -2. In folder [ppocr/utils/corpus](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/ppocr/utils/corpus), -it is necessary to submit the corpus to this path and name it with `{language}_corpus.txt` that contains a list of words in your language. -Maybe, 50000 words per language is necessary at least. -Of course, the more, the better. - -If your language has unique elements, please tell me in advance within any way, such as useful links, wikipedia and so on. - -More details, please refer to [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048). - - - -## License -This project is released under Apache 2.0 license - - -## Contribution -We welcome all the contributions to PaddleOCR and appreciate for your feedback very much. - -- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation. -- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualize function、add .gitgnore and discard set PYTHONPATH manually. -- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure. -- Thanks [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets. -- Thanks [authorfu](https://github.com/authorfu) for contributing Android demo and [xiadeye](https://github.com/xiadeye) contributing iOS demo, respectively. -- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style. -- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services. -- Thanks [lijinhan](https://github.com/lijinhan) for contributing a new way, i.e., java SpringBoot, to achieve the request for the Hubserving deployment. -- Thanks [Mejans](https://github.com/Mejans) for contributing the Occitan corpus and character set. -- Thanks [LKKlein](https://github.com/LKKlein) for contributing a new deploying package with the Golang program language. -- Thanks [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself) and [1084667371](https://github.com/1084667371) for contributing a new data annotation tool, i.e., PPOCRLabel。 diff --git a/StyleText/README.md b/StyleText/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bbce7c5eba280e95452933f713524b19ff5778e2 --- /dev/null +++ b/StyleText/README.md @@ -0,0 +1,195 @@ +English | [简体中文](README_ch.md) + +## Style Text + +### Contents +- [1. Introduction](#Introduction) +- [2. Preparation](#Preparation) +- [3. Quick Start](#Quick_Start) +- [4. Applications](#Applications) +- [5. Code Structure](#Code_structure) + + + +### Introduction + +
+ +
+ +
+ +
+
+
+The Style-Text data synthesis tool is based on Baidu's self-developed text editing algorithm "Editing Text in the Wild" ([https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047)).
+
+Different from the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes:
+* (1) A text foreground style transfer module.
+* (2) A background extraction module.
+* (3) A fusion module.
+
+After these three steps, you can quickly realize image text style transfer. The following figure shows some results of the data synthesis tool.
+
+<div align="center">
+ +
+
+
+### Preparation
+
+1. Please refer to the [QUICK INSTALLATION](../doc/doc_en/installation_en.md) to install PaddlePaddle. A Python3 environment is strongly recommended.
+2. Download the pretrained models and unzip them:
+
+```bash
+cd StyleText
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
+unzip style_text_models.zip
+```
+
+If you save the models in another location, please modify the model file paths in `configs/config.yml`; you need to modify these three configurations at the same time:
+
+```
+bg_generator:
+  pretrain: style_text_models/bg_generator
+...
+text_generator:
+  pretrain: style_text_models/text_generator
+...
+fusion_generator:
+  pretrain: style_text_models/fusion_generator
+```
+
+### Quick Start
+
+#### Synthesize a single image
+
+1. You can run `tools/synth_image` to generate a demo image, which is saved in the current folder:
+
+```bash
+python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
+```
+
+* Note: The language option must correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
+
+For example, enter the following image and the corpus `PaddleOCR`.
+
+<div align="center">
+ +
+ +The result `fake_fusion.jpg` will be generated. + +
+ +
+
+What's more, the intermediate result `fake_bg.jpg` will also be saved, which is the extracted background output.
+
+<div align="center">
+ +
+
+
+* `fake_text.jpg` is the generated image with the same font style as `Style Input`.
+
+
+<div align="center">
+ +
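+
+If you want to drive the single-image tool from a script, a minimal sketch is shown below. It simply shells out to the CLI documented above with the same flags; the corpus strings are made-up examples, and it assumes the working directory is `StyleText`:
+
+```python
+import subprocess
+
+# Hypothetical corpus strings to render; replace them with your own text.
+for text in ["PaddleOCR", "StyleText"]:
+    subprocess.run(
+        ["python3", "-m", "tools.synth_image",
+         "-c", "configs/config.yml",
+         "--style_image", "examples/style_images/2.jpg",
+         "--text_corpus", text,
+         "--language", "en"],
+        check=True,  # raise an error if the synthesis command fails
+    )
+```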
+
+
+#### Batch synthesis
+
+In actual application scenarios, it is often necessary to synthesize pictures in batches and add them to the training set. StyleText can use a batch of style pictures and a corpus to synthesize data in batches. The synthesis process is as follows:
+
+1. The referenced dataset can be specified in `configs/dataset_config.yml` (a sketch of the file is shown below):
+
+   * `Global`:
+     * `output_dir`: Output path of the synthesized data.
+   * `StyleSampler`:
+     * `image_home`: Folder of the style images.
+     * `label_file`: File list of the style images; if labels are provided, it is the path of the label file.
+     * `with_label`: Whether `label_file` is a label file list.
+   * `CorpusGenerator`:
+     * `method`: Method of CorpusGenerator; supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used, no other configuration is needed; otherwise you need to set `corpus_file` and `language`.
+     * `language`: Language of the corpus.
+     * `corpus_file`: File path of the corpus.
+
+We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)).
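+
+Putting the keys above together, a minimal `configs/dataset_config.yml` might look like the following sketch. It is illustrative only: the paths reuse the `examples/` files shipped in the repository, and the shipped config may contain additional engine settings not shown here.
+
+```
+Global:
+  output_dir: ./synth_output/
+StyleSampler:
+  image_home: examples/style_images
+  label_file: examples/image_list.txt
+  with_label: true
+CorpusGenerator:
+  method: FileCorpus
+  language: en
+  corpus_file: examples/corpus/example.txt
+```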
+ +
+
+2. You can run the following command to start the synthesis task:
+
+   ```bash
+   python3 -m tools.synth_dataset -c configs/dataset_config.yml
+   ```
+
+
+
+### Applications
+We take two scenarios as examples, English and number recognition on metal surfaces and general Korean recognition, to illustrate practical cases of using StyleText to synthesize data for improving text recognition. The following figure shows some examples of real scene images and composite images:
+
+ +
+
+
+After adding the above synthetic data for training, the accuracy of the recognition model improves, as shown in the following table (the improvement is given in absolute accuracy points):
+
+| Scenario | Characters | Raw Data | Test Data | Accuracy (Raw Data Only) | New Synthetic Data | Accuracy (Raw + Synthetic Data) | Improvement |
+| -------- | ---------- | -------- | --------- | ------------------------ | ------------------ | ------------------------------- | ----------- |
+| Metal surface | English and numbers | 2203 | 650 | 0.5938 | 20000 | 0.7546 | 16% |
+| Random background | Korean | 5631 | 1230 | 0.3012 | 100000 | 0.5057 | 20% |
+
+
+
+### Code Structure
+
+```
+StyleText
+|-- arch                        // Network module files.
+|   |-- base_module.py
+|   |-- decoder.py
+|   |-- encoder.py
+|   |-- spectral_norm.py
+|   `-- style_text_rec.py
+|-- configs                     // Config files.
+|   |-- config.yml
+|   `-- dataset_config.yml
+|-- engine                      // Synthesis engines.
+|   |-- corpus_generators.py    // Sample corpus from file or generate random corpus.
+|   |-- predictors.py           // Predict using network.
+|   |-- style_samplers.py       // Sample style images.
+|   |-- synthesisers.py         // Manage other engines to synthesize images.
+|   |-- text_drawers.py         // Generate standard input text images.
+|   `-- writers.py              // Write synthesis images and labels into files.
+|-- examples                    // Example files.
+|   |-- corpus
+|   |   `-- example.txt
+|   |-- image_list.txt
+|   `-- style_images
+|       |-- 1.jpg
+|       `-- 2.jpg
+|-- fonts                       // Font files.
+|   |-- ch_standard.ttf
+|   |-- en_standard.ttf
+|   `-- ko_standard.ttf
+|-- tools                       // Program entrance.
+|   |-- __init__.py
+|   |-- synth_dataset.py        // Synthesize a dataset.
+|   `-- synth_image.py          // Synthesize an image.
+`-- utils                       // Modules of basic functions.
+    |-- config.py
+    |-- load_params.py
+    |-- logging.py
+    |-- math_functions.py
+    `-- sys_funcs.py
+```
diff --git a/StyleText/README_ch.md b/StyleText/README_ch.md
new file mode 100644
index 0000000000000000000000000000000000000000..eb557ff24547f228610ffa2cbbaf993e2b4569c3
--- /dev/null
+++ b/StyleText/README_ch.md
@@ -0,0 +1,179 @@
+简体中文 | [English](README.md)
+
+## Style Text
+
+
+### 目录
+- [一、工具简介](#工具简介)
+- [二、环境配置](#环境配置)
+- [三、快速上手](#快速上手)
+- [四、应用案例](#应用案例)
+- [五、代码结构](#代码结构)
+
+
+### 一、工具简介
+
+ +
+ +
+ +
+
+
+Style-Text数据合成工具是基于百度自研的文本编辑算法《Editing Text in the Wild》(https://arxiv.org/abs/1908.03047)。
+
+不同于常用的基于GAN的数据合成工具,Style-Text主要框架包括:1. 文本前景风格迁移模块;2. 背景抽取模块;3. 融合模块。经过这样三步,就可以迅速实现图像文本风格迁移。下图是一些该数据合成工具效果图。
+
+ +
+
+
+### 二、环境配置
+
+1. 参考[快速安装](../doc/doc_ch/installation.md),安装PaddleOCR。
+2. 进入`StyleText`目录,下载模型,并解压:
+
+```bash
+cd StyleText
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
+unzip style_text_models.zip
+```
+
+如果您将模型保存在其他位置,请在`configs/config.yml`中修改模型文件的地址,修改时需要同时修改这三个配置:
+
+```
+bg_generator:
+  pretrain: style_text_models/bg_generator
+...
+text_generator:
+  pretrain: style_text_models/text_generator
+...
+fusion_generator:
+  pretrain: style_text_models/fusion_generator
+```
+
+### 三、快速上手
+
+#### 合成单张图
+输入一张风格图和一段文字语料,运行`tools/synth_image`,合成单张图片,结果图像保存在当前目录下:
+
+```bash
+python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
+```
+* 注意:语言选项和语料相对应,目前该工具只支持英文、简体中文和韩语。
+
+例如,输入如下图片和语料"PaddleOCR":
+
+ +
+ +生成合成数据`fake_fusion.jpg`: +
+ +
+ +除此之外,程序还会生成并保存中间结果`fake_bg.jpg`:为风格参考图去掉文字后的背景; + +
+ +
+ +`fake_text.jpg`:是用提供的字符串,仿照风格参考图中文字的风格,生成在灰色背景上的文字图片。 + +
+ +
+ +#### 批量合成 +在实际应用场景中,经常需要批量合成图片,补充到训练集中。Style-Text可以使用一批风格图片和语料,批量合成数据。合成过程如下: + +1. 在`configs/dataset_config.yml`中配置目标场景风格图像和语料的路径,具体如下: + + * `Global`: + * `output_dir:`:保存合成数据的目录。 + * `StyleSampler`: + * `image_home`:风格图片目录; + * `label_file`:风格图片路径列表文件,如果所用数据集有label,则label_file为label文件路径; + * `with_label`:标志`label_file`是否为label文件。 + * `CorpusGenerator`: + * `method`:语料生成方法,目前有`FileCorpus`和`EnNumCorpus`可选。如果使用`EnNumCorpus`,则不需要填写其他配置,否则需要修改`corpus_file`和`language`; + * `language`:语料的语种; + * `corpus_file`: 语料文件路径。 + + Style-Text也提供了一批中英韩5万张通用场景数据用作文本风格图像,便于合成场景丰富的文本图像,下图给出了一些示例。 + + 中英韩5万张通用场景数据: [下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar) + +
+ +
+ +2. 运行`tools/synth_dataset`合成数据: + + ``` bash + python -m tools.synth_dataset -c configs/dataset_config.yml + ``` + + +### 四、应用案例 +下面以金属表面英文数字识别和通用韩语识别两个场景为例,说明使用Style-Text合成数据,来提升文本识别效果的实际案例。下图给出了一些真实场景图像和合成图像的示例: + +
+ +
+
+在添加上述合成数据进行训练后,识别模型的效果提升,如下表所示(指标提升为识别准确率的绝对提升):
+
+| 场景 | 字符 | 原始数据 | 测试数据 | 识别准确率(仅原始数据) | 新增合成数据 | 识别准确率(原始+合成数据) | 指标提升 |
+| -------- | ---------- | -------- | -------- | ------------------------ | ------------ | ---------------------------- | -------- |
+| 金属表面 | 英文和数字 | 2203 | 650 | 0.5938 | 20000 | 0.7546 | 16% |
+| 随机背景 | 韩语 | 5631 | 1230 | 0.3012 | 100000 | 0.5057 | 20% |
+
+
+
+### 五、代码结构
+
+```
+StyleText
+|-- arch                        // 网络结构定义文件
+|   |-- base_module.py
+|   |-- decoder.py
+|   |-- encoder.py
+|   |-- spectral_norm.py
+|   `-- style_text_rec.py
+|-- configs                     // 配置文件
+|   |-- config.yml
+|   `-- dataset_config.yml
+|-- engine                      // 数据合成引擎
+|   |-- corpus_generators.py    // 从文本采样或随机生成语料
+|   |-- predictors.py           // 调用网络生成数据
+|   |-- style_samplers.py       // 采样风格图片
+|   |-- synthesisers.py         // 调度各个模块,合成数据
+|   |-- text_drawers.py         // 生成标准文字图片,用作输入
+|   `-- writers.py              // 将合成的图片和标签写入本地目录
+|-- examples                    // 示例文件
+|   |-- corpus
+|   |   `-- example.txt
+|   |-- image_list.txt
+|   `-- style_images
+|       |-- 1.jpg
+|       `-- 2.jpg
+|-- fonts                       // 字体文件
+|   |-- ch_standard.ttf
+|   |-- en_standard.ttf
+|   `-- ko_standard.ttf
+|-- tools                       // 程序入口
+|   |-- __init__.py
+|   |-- synth_dataset.py        // 批量合成数据
+|   `-- synth_image.py          // 合成单张图片
+`-- utils                       // 其他基础功能模块
+    |-- config.py
+    |-- load_params.py
+    |-- logging.py
+    |-- math_functions.py
+    `-- sys_funcs.py
+```
diff --git a/StyleText/__init__.py b/StyleText/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/StyleText/arch/__init__.py b/StyleText/arch/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/StyleText/arch/base_module.py b/StyleText/arch/base_module.py
new file mode 100644
index 0000000000000000000000000000000000000000..da2b6b834c6a86b1c3efeb5cef4cb9d02e44e405
--- /dev/null
+++ b/StyleText/arch/base_module.py
@@ -0,0 +1,255 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle +import paddle.nn as nn + +from arch.spectral_norm import spectral_norm + + +class CBN(nn.Layer): + def __init__(self, + name, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + use_bias=False, + norm_layer=None, + act=None, + act_attr=None): + super(CBN, self).__init__() + if use_bias: + bias_attr = paddle.ParamAttr(name=name + "_bias") + else: + bias_attr = None + self._conv = paddle.nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + weight_attr=paddle.ParamAttr(name=name + "_weights"), + bias_attr=bias_attr) + if norm_layer: + self._norm_layer = getattr(paddle.nn, norm_layer)( + num_features=out_channels, name=name + "_bn") + else: + self._norm_layer = None + if act: + if act_attr: + self._act = getattr(paddle.nn, act)(**act_attr, + name=name + "_" + act) + else: + self._act = getattr(paddle.nn, act)(name=name + "_" + act) + else: + self._act = None + + def forward(self, x): + out = self._conv(x) + if self._norm_layer: + out = self._norm_layer(out) + if self._act: + out = self._act(out) + return out + + +class SNConv(nn.Layer): + def __init__(self, + name, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + use_bias=False, + norm_layer=None, + act=None, + act_attr=None): + super(SNConv, self).__init__() + if use_bias: + bias_attr = paddle.ParamAttr(name=name + "_bias") + else: + bias_attr = None + self._sn_conv = spectral_norm( + paddle.nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + weight_attr=paddle.ParamAttr(name=name + "_weights"), + bias_attr=bias_attr)) + if norm_layer: + self._norm_layer = getattr(paddle.nn, norm_layer)( + num_features=out_channels, name=name + "_bn") + else: + self._norm_layer = None + if act: + if act_attr: + self._act = getattr(paddle.nn, act)(**act_attr, + name=name + "_" + act) + else: + self._act = getattr(paddle.nn, act)(name=name + "_" + act) + else: + self._act = None + + def forward(self, x): + out = self._sn_conv(x) + if self._norm_layer: + out = self._norm_layer(out) + if self._act: + out = self._act(out) + return out + + +class SNConvTranspose(nn.Layer): + def __init__(self, + name, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + output_padding=0, + dilation=1, + groups=1, + use_bias=False, + norm_layer=None, + act=None, + act_attr=None): + super(SNConvTranspose, self).__init__() + if use_bias: + bias_attr = paddle.ParamAttr(name=name + "_bias") + else: + bias_attr = None + self._sn_conv_transpose = spectral_norm( + paddle.nn.Conv2DTranspose( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + output_padding=output_padding, + dilation=dilation, + groups=groups, + weight_attr=paddle.ParamAttr(name=name + "_weights"), + bias_attr=bias_attr)) + if norm_layer: + self._norm_layer = getattr(paddle.nn, norm_layer)( + num_features=out_channels, name=name + "_bn") + else: + self._norm_layer = None + if act: + if act_attr: + self._act = getattr(paddle.nn, act)(**act_attr, + name=name + "_" + act) + else: + self._act = getattr(paddle.nn, act)(name=name + "_" + act) + else: + self._act = None + + def forward(self, x): + out = self._sn_conv_transpose(x) + if self._norm_layer: + out = self._norm_layer(out) + if self._act: + 
out = self._act(out) + return out + + +class MiddleNet(nn.Layer): + def __init__(self, name, in_channels, mid_channels, out_channels, + use_bias): + super(MiddleNet, self).__init__() + self._sn_conv1 = SNConv( + name=name + "_sn_conv1", + in_channels=in_channels, + out_channels=mid_channels, + kernel_size=1, + use_bias=use_bias, + norm_layer=None, + act=None) + self._pad2d = nn.Pad2D(padding=[1, 1, 1, 1], mode="replicate") + self._sn_conv2 = SNConv( + name=name + "_sn_conv2", + in_channels=mid_channels, + out_channels=mid_channels, + kernel_size=3, + use_bias=use_bias) + self._sn_conv3 = SNConv( + name=name + "_sn_conv3", + in_channels=mid_channels, + out_channels=out_channels, + kernel_size=1, + use_bias=use_bias) + + def forward(self, x): + + sn_conv1 = self._sn_conv1.forward(x) + pad_2d = self._pad2d.forward(sn_conv1) + sn_conv2 = self._sn_conv2.forward(pad_2d) + sn_conv3 = self._sn_conv3.forward(sn_conv2) + return sn_conv3 + + +class ResBlock(nn.Layer): + def __init__(self, name, channels, norm_layer, use_dropout, use_dilation, + use_bias): + super(ResBlock, self).__init__() + if use_dilation: + padding_mat = [1, 1, 1, 1] + else: + padding_mat = [0, 0, 0, 0] + self._pad1 = nn.Pad2D(padding_mat, mode="replicate") + + self._sn_conv1 = SNConv( + name=name + "_sn_conv1", + in_channels=channels, + out_channels=channels, + kernel_size=3, + padding=0, + norm_layer=norm_layer, + use_bias=use_bias, + act="ReLU", + act_attr=None) + if use_dropout: + self._dropout = nn.Dropout(0.5) + else: + self._dropout = None + self._pad2 = nn.Pad2D([1, 1, 1, 1], mode="replicate") + self._sn_conv2 = SNConv( + name=name + "_sn_conv2", + in_channels=channels, + out_channels=channels, + kernel_size=3, + norm_layer=norm_layer, + use_bias=use_bias, + act="ReLU", + act_attr=None) + + def forward(self, x): + pad1 = self._pad1.forward(x) + sn_conv1 = self._sn_conv1.forward(pad1) + pad2 = self._pad2.forward(sn_conv1) + sn_conv2 = self._sn_conv2.forward(pad2) + return sn_conv2 + x diff --git a/StyleText/arch/decoder.py b/StyleText/arch/decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..36f07c5998a8f6b400997eacae0b44860312f432 --- /dev/null +++ b/StyleText/arch/decoder.py @@ -0,0 +1,251 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import paddle +import paddle.nn as nn + +from arch.base_module import SNConv, SNConvTranspose, ResBlock + + +class Decoder(nn.Layer): + def __init__(self, name, encode_dim, out_channels, use_bias, norm_layer, + act, act_attr, conv_block_dropout, conv_block_num, + conv_block_dilation, out_conv_act, out_conv_act_attr): + super(Decoder, self).__init__() + conv_blocks = [] + for i in range(conv_block_num): + conv_blocks.append( + ResBlock( + name="{}_conv_block_{}".format(name, i), + channels=encode_dim * 8, + norm_layer=norm_layer, + use_dropout=conv_block_dropout, + use_dilation=conv_block_dilation, + use_bias=use_bias)) + self.conv_blocks = nn.Sequential(*conv_blocks) + self._up1 = SNConvTranspose( + name=name + "_up1", + in_channels=encode_dim * 8, + out_channels=encode_dim * 4, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up2 = SNConvTranspose( + name=name + "_up2", + in_channels=encode_dim * 4, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up3 = SNConvTranspose( + name=name + "_up3", + in_channels=encode_dim * 2, + out_channels=encode_dim, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate") + self._out_conv = SNConv( + name=name + "_out_conv", + in_channels=encode_dim, + out_channels=out_channels, + kernel_size=3, + use_bias=use_bias, + norm_layer=None, + act=out_conv_act, + act_attr=out_conv_act_attr) + + def forward(self, x): + if isinstance(x, (list, tuple)): + x = paddle.concat(x, axis=1) + output_dict = dict() + output_dict["conv_blocks"] = self.conv_blocks.forward(x) + output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"]) + output_dict["up2"] = self._up2.forward(output_dict["up1"]) + output_dict["up3"] = self._up3.forward(output_dict["up2"]) + output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"]) + output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"]) + return output_dict + + +class DecoderUnet(nn.Layer): + def __init__(self, name, encode_dim, out_channels, use_bias, norm_layer, + act, act_attr, conv_block_dropout, conv_block_num, + conv_block_dilation, out_conv_act, out_conv_act_attr): + super(DecoderUnet, self).__init__() + conv_blocks = [] + for i in range(conv_block_num): + conv_blocks.append( + ResBlock( + name="{}_conv_block_{}".format(name, i), + channels=encode_dim * 8, + norm_layer=norm_layer, + use_dropout=conv_block_dropout, + use_dilation=conv_block_dilation, + use_bias=use_bias)) + self._conv_blocks = nn.Sequential(*conv_blocks) + self._up1 = SNConvTranspose( + name=name + "_up1", + in_channels=encode_dim * 8, + out_channels=encode_dim * 4, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up2 = SNConvTranspose( + name=name + "_up2", + in_channels=encode_dim * 8, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up3 = SNConvTranspose( + name=name + "_up3", + in_channels=encode_dim * 4, + out_channels=encode_dim, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + 
act=act, + act_attr=act_attr) + self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate") + self._out_conv = SNConv( + name=name + "_out_conv", + in_channels=encode_dim, + out_channels=out_channels, + kernel_size=3, + use_bias=use_bias, + norm_layer=None, + act=out_conv_act, + act_attr=out_conv_act_attr) + + def forward(self, x, y, feature2, feature1): + output_dict = dict() + output_dict["conv_blocks"] = self._conv_blocks( + paddle.concat( + (x, y), axis=1)) + output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"]) + output_dict["up2"] = self._up2.forward( + paddle.concat( + (output_dict["up1"], feature2), axis=1)) + output_dict["up3"] = self._up3.forward( + paddle.concat( + (output_dict["up2"], feature1), axis=1)) + output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"]) + output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"]) + return output_dict + + +class SingleDecoder(nn.Layer): + def __init__(self, name, encode_dim, out_channels, use_bias, norm_layer, + act, act_attr, conv_block_dropout, conv_block_num, + conv_block_dilation, out_conv_act, out_conv_act_attr): + super(SingleDecoder, self).__init__() + conv_blocks = [] + for i in range(conv_block_num): + conv_blocks.append( + ResBlock( + name="{}_conv_block_{}".format(name, i), + channels=encode_dim * 4, + norm_layer=norm_layer, + use_dropout=conv_block_dropout, + use_dilation=conv_block_dilation, + use_bias=use_bias)) + self._conv_blocks = nn.Sequential(*conv_blocks) + self._up1 = SNConvTranspose( + name=name + "_up1", + in_channels=encode_dim * 4, + out_channels=encode_dim * 4, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up2 = SNConvTranspose( + name=name + "_up2", + in_channels=encode_dim * 8, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up3 = SNConvTranspose( + name=name + "_up3", + in_channels=encode_dim * 4, + out_channels=encode_dim, + kernel_size=3, + stride=2, + padding=1, + output_padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate") + self._out_conv = SNConv( + name=name + "_out_conv", + in_channels=encode_dim, + out_channels=out_channels, + kernel_size=3, + use_bias=use_bias, + norm_layer=None, + act=out_conv_act, + act_attr=out_conv_act_attr) + + def forward(self, x, feature2, feature1): + output_dict = dict() + output_dict["conv_blocks"] = self._conv_blocks.forward(x) + output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"]) + output_dict["up2"] = self._up2.forward( + paddle.concat( + (output_dict["up1"], feature2), axis=1)) + output_dict["up3"] = self._up3.forward( + paddle.concat( + (output_dict["up2"], feature1), axis=1)) + output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"]) + output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"]) + return output_dict diff --git a/StyleText/arch/encoder.py b/StyleText/arch/encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..b884cda2934477082a1ed98c94e33b736d1f96b4 --- /dev/null +++ b/StyleText/arch/encoder.py @@ -0,0 +1,186 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import paddle +import paddle.nn as nn + +from arch.base_module import SNConv, SNConvTranspose, ResBlock + + +class Encoder(nn.Layer): + def __init__(self, name, in_channels, encode_dim, use_bias, norm_layer, + act, act_attr, conv_block_dropout, conv_block_num, + conv_block_dilation): + super(Encoder, self).__init__() + self._pad2d = paddle.nn.Pad2D([3, 3, 3, 3], mode="replicate") + self._in_conv = SNConv( + name=name + "_in_conv", + in_channels=in_channels, + out_channels=encode_dim, + kernel_size=7, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._down1 = SNConv( + name=name + "_down1", + in_channels=encode_dim, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._down2 = SNConv( + name=name + "_down2", + in_channels=encode_dim * 2, + out_channels=encode_dim * 4, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._down3 = SNConv( + name=name + "_down3", + in_channels=encode_dim * 4, + out_channels=encode_dim * 4, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + conv_blocks = [] + for i in range(conv_block_num): + conv_blocks.append( + ResBlock( + name="{}_conv_block_{}".format(name, i), + channels=encode_dim * 4, + norm_layer=norm_layer, + use_dropout=conv_block_dropout, + use_dilation=conv_block_dilation, + use_bias=use_bias)) + self._conv_blocks = nn.Sequential(*conv_blocks) + + def forward(self, x): + out_dict = dict() + x = self._pad2d(x) + out_dict["in_conv"] = self._in_conv.forward(x) + out_dict["down1"] = self._down1.forward(out_dict["in_conv"]) + out_dict["down2"] = self._down2.forward(out_dict["down1"]) + out_dict["down3"] = self._down3.forward(out_dict["down2"]) + out_dict["res_blocks"] = self._conv_blocks.forward(out_dict["down3"]) + return out_dict + + +class EncoderUnet(nn.Layer): + def __init__(self, name, in_channels, encode_dim, use_bias, norm_layer, + act, act_attr): + super(EncoderUnet, self).__init__() + self._pad2d = paddle.nn.Pad2D([3, 3, 3, 3], mode="replicate") + self._in_conv = SNConv( + name=name + "_in_conv", + in_channels=in_channels, + out_channels=encode_dim, + kernel_size=7, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._down1 = SNConv( + name=name + "_down1", + in_channels=encode_dim, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._down2 = SNConv( + name=name + "_down2", + in_channels=encode_dim * 2, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._down3 = SNConv( + name=name + "_down3", + in_channels=encode_dim * 2, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._down4 = SNConv( + name=name + 
"_down4", + in_channels=encode_dim * 2, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up1 = SNConvTranspose( + name=name + "_up1", + in_channels=encode_dim * 2, + out_channels=encode_dim * 2, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + self._up2 = SNConvTranspose( + name=name + "_up2", + in_channels=encode_dim * 4, + out_channels=encode_dim * 4, + kernel_size=3, + stride=2, + padding=1, + use_bias=use_bias, + norm_layer=norm_layer, + act=act, + act_attr=act_attr) + + def forward(self, x): + output_dict = dict() + x = self._pad2d(x) + output_dict['in_conv'] = self._in_conv.forward(x) + output_dict['down1'] = self._down1.forward(output_dict['in_conv']) + output_dict['down2'] = self._down2.forward(output_dict['down1']) + output_dict['down3'] = self._down3.forward(output_dict['down2']) + output_dict['down4'] = self._down4.forward(output_dict['down3']) + output_dict['up1'] = self._up1.forward(output_dict['down4']) + output_dict['up2'] = self._up2.forward( + paddle.concat( + (output_dict['down3'], output_dict['up1']), axis=1)) + output_dict['concat'] = paddle.concat( + (output_dict['down2'], output_dict['up2']), axis=1) + return output_dict diff --git a/StyleText/arch/spectral_norm.py b/StyleText/arch/spectral_norm.py new file mode 100644 index 0000000000000000000000000000000000000000..21d0afc8d4a8fd4e2262db5c8461d6ffc3dadd45 --- /dev/null +++ b/StyleText/arch/spectral_norm.py @@ -0,0 +1,150 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import paddle +import paddle.nn as nn +import paddle.nn.functional as F + + +def normal_(x, mean=0., std=1.): + temp_value = paddle.normal(mean, std, shape=x.shape) + x.set_value(temp_value) + return x + + +class SpectralNorm(object): + def __init__(self, name='weight', n_power_iterations=1, dim=0, eps=1e-12): + self.name = name + self.dim = dim + if n_power_iterations <= 0: + raise ValueError('Expected n_power_iterations to be positive, but ' + 'got n_power_iterations={}'.format( + n_power_iterations)) + self.n_power_iterations = n_power_iterations + self.eps = eps + + def reshape_weight_to_matrix(self, weight): + weight_mat = weight + if self.dim != 0: + # transpose dim to front + weight_mat = weight_mat.transpose([ + self.dim, + * [d for d in range(weight_mat.dim()) if d != self.dim] + ]) + + height = weight_mat.shape[0] + + return weight_mat.reshape([height, -1]) + + def compute_weight(self, module, do_power_iteration): + weight = getattr(module, self.name + '_orig') + u = getattr(module, self.name + '_u') + v = getattr(module, self.name + '_v') + weight_mat = self.reshape_weight_to_matrix(weight) + + if do_power_iteration: + with paddle.no_grad(): + for _ in range(self.n_power_iterations): + v.set_value( + F.normalize( + paddle.matmul( + weight_mat, + u, + transpose_x=True, + transpose_y=False), + axis=0, + epsilon=self.eps, )) + + u.set_value( + F.normalize( + paddle.matmul(weight_mat, v), + axis=0, + epsilon=self.eps, )) + if self.n_power_iterations > 0: + u = u.clone() + v = v.clone() + + sigma = paddle.dot(u, paddle.mv(weight_mat, v)) + weight = weight / sigma + return weight + + def remove(self, module): + with paddle.no_grad(): + weight = self.compute_weight(module, do_power_iteration=False) + delattr(module, self.name) + delattr(module, self.name + '_u') + delattr(module, self.name + '_v') + delattr(module, self.name + '_orig') + + module.add_parameter(self.name, weight.detach()) + + def __call__(self, module, inputs): + setattr( + module, + self.name, + self.compute_weight( + module, do_power_iteration=module.training)) + + @staticmethod + def apply(module, name, n_power_iterations, dim, eps): + for k, hook in module._forward_pre_hooks.items(): + if isinstance(hook, SpectralNorm) and hook.name == name: + raise RuntimeError( + "Cannot register two spectral_norm hooks on " + "the same parameter {}".format(name)) + + fn = SpectralNorm(name, n_power_iterations, dim, eps) + weight = module._parameters[name] + + with paddle.no_grad(): + weight_mat = fn.reshape_weight_to_matrix(weight) + h, w = weight_mat.shape + + # randomly initialize u and v + u = module.create_parameter([h]) + u = normal_(u, 0., 1.) + v = module.create_parameter([w]) + v = normal_(v, 0., 1.) + u = F.normalize(u, axis=0, epsilon=fn.eps) + v = F.normalize(v, axis=0, epsilon=fn.eps) + + # delete fn.name form parameters, otherwise you can not set attribute + del module._parameters[fn.name] + module.add_parameter(fn.name + "_orig", weight) + # still need to assign weight back as fn.name because all sorts of + # things may assume that it exists, e.g., when initializing weights. + # However, we can't directly assign as it could be an Parameter and + # gets added as a parameter. Instead, we register weight * 1.0 as a plain + # attribute. 
+ setattr(module, fn.name, weight * 1.0) + module.register_buffer(fn.name + "_u", u) + module.register_buffer(fn.name + "_v", v) + + module.register_forward_pre_hook(fn) + return fn + + +def spectral_norm(module, + name='weight', + n_power_iterations=1, + eps=1e-12, + dim=None): + + if dim is None: + if isinstance(module, (nn.Conv1DTranspose, nn.Conv2DTranspose, + nn.Conv3DTranspose, nn.Linear)): + dim = 1 + else: + dim = 0 + SpectralNorm.apply(module, name, n_power_iterations, dim, eps) + return module diff --git a/StyleText/arch/style_text_rec.py b/StyleText/arch/style_text_rec.py new file mode 100644 index 0000000000000000000000000000000000000000..599927ce3edefc90f14191ef3d29b1221355867e --- /dev/null +++ b/StyleText/arch/style_text_rec.py @@ -0,0 +1,285 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import paddle +import paddle.nn as nn + +from arch.base_module import MiddleNet, ResBlock +from arch.encoder import Encoder +from arch.decoder import Decoder, DecoderUnet, SingleDecoder +from utils.load_params import load_dygraph_pretrain +from utils.logging import get_logger + + +class StyleTextRec(nn.Layer): + def __init__(self, config): + super(StyleTextRec, self).__init__() + self.logger = get_logger() + self.text_generator = TextGenerator(config["Predictor"][ + "text_generator"]) + self.bg_generator = BgGeneratorWithMask(config["Predictor"][ + "bg_generator"]) + self.fusion_generator = FusionGeneratorSimple(config["Predictor"][ + "fusion_generator"]) + bg_generator_pretrain = config["Predictor"]["bg_generator"]["pretrain"] + text_generator_pretrain = config["Predictor"]["text_generator"][ + "pretrain"] + fusion_generator_pretrain = config["Predictor"]["fusion_generator"][ + "pretrain"] + load_dygraph_pretrain( + self.bg_generator, + self.logger, + path=bg_generator_pretrain, + load_static_weights=False) + load_dygraph_pretrain( + self.text_generator, + self.logger, + path=text_generator_pretrain, + load_static_weights=False) + load_dygraph_pretrain( + self.fusion_generator, + self.logger, + path=fusion_generator_pretrain, + load_static_weights=False) + + def forward(self, style_input, text_input): + text_gen_output = self.text_generator.forward(style_input, text_input) + fake_text = text_gen_output["fake_text"] + fake_sk = text_gen_output["fake_sk"] + bg_gen_output = self.bg_generator.forward(style_input) + bg_encode_feature = bg_gen_output["bg_encode_feature"] + bg_decode_feature1 = bg_gen_output["bg_decode_feature1"] + bg_decode_feature2 = bg_gen_output["bg_decode_feature2"] + fake_bg = bg_gen_output["fake_bg"] + + fusion_gen_output = self.fusion_generator.forward(fake_text, fake_bg) + fake_fusion = fusion_gen_output["fake_fusion"] + return { + "fake_fusion": fake_fusion, + "fake_text": fake_text, + "fake_sk": fake_sk, + "fake_bg": fake_bg, + } + + +class TextGenerator(nn.Layer): + def __init__(self, config): + super(TextGenerator, self).__init__() + name = config["module_name"] + encode_dim = config["encode_dim"] + 
norm_layer = config["norm_layer"] + conv_block_dropout = config["conv_block_dropout"] + conv_block_num = config["conv_block_num"] + conv_block_dilation = config["conv_block_dilation"] + if norm_layer == "InstanceNorm2D": + use_bias = True + else: + use_bias = False + self.encoder_text = Encoder( + name=name + "_encoder_text", + in_channels=3, + encode_dim=encode_dim, + use_bias=use_bias, + norm_layer=norm_layer, + act="ReLU", + act_attr=None, + conv_block_dropout=conv_block_dropout, + conv_block_num=conv_block_num, + conv_block_dilation=conv_block_dilation) + self.encoder_style = Encoder( + name=name + "_encoder_style", + in_channels=3, + encode_dim=encode_dim, + use_bias=use_bias, + norm_layer=norm_layer, + act="ReLU", + act_attr=None, + conv_block_dropout=conv_block_dropout, + conv_block_num=conv_block_num, + conv_block_dilation=conv_block_dilation) + self.decoder_text = Decoder( + name=name + "_decoder_text", + encode_dim=encode_dim, + out_channels=int(encode_dim / 2), + use_bias=use_bias, + norm_layer=norm_layer, + act="ReLU", + act_attr=None, + conv_block_dropout=conv_block_dropout, + conv_block_num=conv_block_num, + conv_block_dilation=conv_block_dilation, + out_conv_act="Tanh", + out_conv_act_attr=None) + self.decoder_sk = Decoder( + name=name + "_decoder_sk", + encode_dim=encode_dim, + out_channels=1, + use_bias=use_bias, + norm_layer=norm_layer, + act="ReLU", + act_attr=None, + conv_block_dropout=conv_block_dropout, + conv_block_num=conv_block_num, + conv_block_dilation=conv_block_dilation, + out_conv_act="Sigmoid", + out_conv_act_attr=None) + + self.middle = MiddleNet( + name=name + "_middle_net", + in_channels=int(encode_dim / 2) + 1, + mid_channels=encode_dim, + out_channels=3, + use_bias=use_bias) + + def forward(self, style_input, text_input): + style_feature = self.encoder_style.forward(style_input)["res_blocks"] + text_feature = self.encoder_text.forward(text_input)["res_blocks"] + fake_c_temp = self.decoder_text.forward([text_feature, + style_feature])["out_conv"] + fake_sk = self.decoder_sk.forward([text_feature, + style_feature])["out_conv"] + fake_text = self.middle(paddle.concat((fake_c_temp, fake_sk), axis=1)) + return {"fake_sk": fake_sk, "fake_text": fake_text} + + +class BgGeneratorWithMask(nn.Layer): + def __init__(self, config): + super(BgGeneratorWithMask, self).__init__() + name = config["module_name"] + encode_dim = config["encode_dim"] + norm_layer = config["norm_layer"] + conv_block_dropout = config["conv_block_dropout"] + conv_block_num = config["conv_block_num"] + conv_block_dilation = config["conv_block_dilation"] + self.output_factor = config.get("output_factor", 1.0) + + if norm_layer == "InstanceNorm2D": + use_bias = True + else: + use_bias = False + + self.encoder_bg = Encoder( + name=name + "_encoder_bg", + in_channels=3, + encode_dim=encode_dim, + use_bias=use_bias, + norm_layer=norm_layer, + act="ReLU", + act_attr=None, + conv_block_dropout=conv_block_dropout, + conv_block_num=conv_block_num, + conv_block_dilation=conv_block_dilation) + + self.decoder_bg = SingleDecoder( + name=name + "_decoder_bg", + encode_dim=encode_dim, + out_channels=3, + use_bias=use_bias, + norm_layer=norm_layer, + act="ReLU", + act_attr=None, + conv_block_dropout=conv_block_dropout, + conv_block_num=conv_block_num, + conv_block_dilation=conv_block_dilation, + out_conv_act="Tanh", + out_conv_act_attr=None) + + self.decoder_mask = Decoder( + name=name + "_decoder_mask", + encode_dim=encode_dim // 2, + out_channels=1, + use_bias=use_bias, + norm_layer=norm_layer, + act="ReLU", 
+ act_attr=None, + conv_block_dropout=conv_block_dropout, + conv_block_num=conv_block_num, + conv_block_dilation=conv_block_dilation, + out_conv_act="Sigmoid", + out_conv_act_attr=None) + + self.middle = MiddleNet( + name=name + "_middle_net", + in_channels=3 + 1, + mid_channels=encode_dim, + out_channels=3, + use_bias=use_bias) + + def forward(self, style_input): + encode_bg_output = self.encoder_bg(style_input) + decode_bg_output = self.decoder_bg(encode_bg_output["res_blocks"], + encode_bg_output["down2"], + encode_bg_output["down1"]) + + fake_c_temp = decode_bg_output["out_conv"] + fake_bg_mask = self.decoder_mask.forward(encode_bg_output[ + "res_blocks"])["out_conv"] + fake_bg = self.middle( + paddle.concat( + (fake_c_temp, fake_bg_mask), axis=1)) + return { + "bg_encode_feature": encode_bg_output["res_blocks"], + "bg_decode_feature1": decode_bg_output["up1"], + "bg_decode_feature2": decode_bg_output["up2"], + "fake_bg": fake_bg, + "fake_bg_mask": fake_bg_mask, + } + + +class FusionGeneratorSimple(nn.Layer): + def __init__(self, config): + super(FusionGeneratorSimple, self).__init__() + name = config["module_name"] + encode_dim = config["encode_dim"] + norm_layer = config["norm_layer"] + conv_block_dropout = config["conv_block_dropout"] + conv_block_dilation = config["conv_block_dilation"] + if norm_layer == "InstanceNorm2D": + use_bias = True + else: + use_bias = False + + self._conv = nn.Conv2D( + in_channels=6, + out_channels=encode_dim, + kernel_size=3, + stride=1, + padding=1, + groups=1, + weight_attr=paddle.ParamAttr(name=name + "_conv_weights"), + bias_attr=False) + + self._res_block = ResBlock( + name="{}_conv_block".format(name), + channels=encode_dim, + norm_layer=norm_layer, + use_dropout=conv_block_dropout, + use_dilation=conv_block_dilation, + use_bias=use_bias) + + self._reduce_conv = nn.Conv2D( + in_channels=encode_dim, + out_channels=3, + kernel_size=3, + stride=1, + padding=1, + groups=1, + weight_attr=paddle.ParamAttr(name=name + "_reduce_conv_weights"), + bias_attr=False) + + def forward(self, fake_text, fake_bg): + fake_concat = paddle.concat((fake_text, fake_bg), axis=1) + fake_concat_tmp = self._conv(fake_concat) + output_res = self._res_block(fake_concat_tmp) + fake_fusion = self._reduce_conv(output_res) + return {"fake_fusion": fake_fusion} diff --git a/StyleText/configs/config.yml b/StyleText/configs/config.yml new file mode 100644 index 0000000000000000000000000000000000000000..3b10b3d2761a4aa40c28abe10134a2f276e1af9d --- /dev/null +++ b/StyleText/configs/config.yml @@ -0,0 +1,54 @@ +Global: + output_num: 10 + output_dir: output_data + use_gpu: false + image_height: 32 + image_width: 320 +TextDrawer: + fonts: + en: fonts/en_standard.ttf + ch: fonts/ch_standard.ttf + ko: fonts/ko_standard.ttf +Predictor: + method: StyleTextRecPredictor + algorithm: StyleTextRec + scale: 0.00392156862745098 + mean: + - 0.5 + - 0.5 + - 0.5 + std: + - 0.5 + - 0.5 + - 0.5 + expand_result: false + bg_generator: + pretrain: style_text_models/bg_generator + module_name: bg_generator + generator_type: BgGeneratorWithMask + encode_dim: 64 + norm_layer: null + conv_block_num: 4 + conv_block_dropout: false + conv_block_dilation: true + output_factor: 1.05 + text_generator: + pretrain: style_text_models/text_generator + module_name: text_generator + generator_type: TextGenerator + encode_dim: 64 + norm_layer: InstanceNorm2D + conv_block_num: 4 + conv_block_dropout: false + conv_block_dilation: true + fusion_generator: + pretrain: style_text_models/fusion_generator + module_name: 
fusion_generator + generator_type: FusionGeneratorSimple + encode_dim: 64 + norm_layer: null + conv_block_num: 4 + conv_block_dropout: false + conv_block_dilation: true +Writer: + method: SimpleWriter diff --git a/StyleText/configs/dataset_config.yml b/StyleText/configs/dataset_config.yml new file mode 100644 index 0000000000000000000000000000000000000000..e047489e5d82e4c561a835ccf4de1b385e4f5c08 --- /dev/null +++ b/StyleText/configs/dataset_config.yml @@ -0,0 +1,64 @@ +Global: + output_num: 10 + output_dir: output_data + use_gpu: false + image_height: 32 + image_width: 320 + standard_font: fonts/en_standard.ttf +TextDrawer: + fonts: + en: fonts/en_standard.ttf + ch: fonts/ch_standard.ttf + ko: fonts/ko_standard.ttf +StyleSampler: + method: DatasetSampler + image_home: examples + label_file: examples/image_list.txt + with_label: true +CorpusGenerator: + method: FileCorpus + language: ch + corpus_file: examples/corpus/example.txt +Predictor: + method: StyleTextRecPredictor + algorithm: StyleTextRec + scale: 0.00392156862745098 + mean: + - 0.5 + - 0.5 + - 0.5 + std: + - 0.5 + - 0.5 + - 0.5 + expand_result: false + bg_generator: + pretrain: models/style_text_rec/bg_generator + module_name: bg_generator + generator_type: BgGeneratorWithMask + encode_dim: 64 + norm_layer: null + conv_block_num: 4 + conv_block_dropout: false + conv_block_dilation: true + output_factor: 1.05 + text_generator: + pretrain: models/style_text_rec/text_generator + module_name: text_generator + generator_type: TextGenerator + encode_dim: 64 + norm_layer: InstanceNorm2D + conv_block_num: 4 + conv_block_dropout: false + conv_block_dilation: true + fusion_generator: + pretrain: models/style_text_rec/fusion_generator + module_name: fusion_generator + generator_type: FusionGeneratorSimple + encode_dim: 64 + norm_layer: null + conv_block_num: 4 + conv_block_dropout: false + conv_block_dilation: true +Writer: + method: SimpleWriter diff --git a/StyleText/doc/images/1.png b/StyleText/doc/images/1.png new file mode 100644 index 0000000000000000000000000000000000000000..8f7574ba2f723ac82241fec6dc52828713a5d293 Binary files /dev/null and b/StyleText/doc/images/1.png differ diff --git a/StyleText/doc/images/10.png b/StyleText/doc/images/10.png new file mode 100644 index 0000000000000000000000000000000000000000..6123cff27c6b7a89abc5cd318e4bf30a1aec767c Binary files /dev/null and b/StyleText/doc/images/10.png differ diff --git a/StyleText/doc/images/11.png b/StyleText/doc/images/11.png new file mode 100644 index 0000000000000000000000000000000000000000..ebfa09331984ac8bed285f631c7db2df4c0e62a6 Binary files /dev/null and b/StyleText/doc/images/11.png differ diff --git a/StyleText/doc/images/2.png b/StyleText/doc/images/2.png new file mode 100644 index 0000000000000000000000000000000000000000..ce9bf4712a551b9d9d27eae00f9c7b9b5845d8b3 Binary files /dev/null and b/StyleText/doc/images/2.png differ diff --git a/StyleText/doc/images/3.png b/StyleText/doc/images/3.png new file mode 100644 index 0000000000000000000000000000000000000000..0fb73a31f58c1c476cf84f3c507f0af6523385f4 Binary files /dev/null and b/StyleText/doc/images/3.png differ diff --git a/StyleText/doc/images/4.jpg b/StyleText/doc/images/4.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d881074a13a5320035e739b91ce4b98f78191301 Binary files /dev/null and b/StyleText/doc/images/4.jpg differ diff --git a/StyleText/doc/images/5.png b/StyleText/doc/images/5.png new file mode 100644 index 
0000000000000000000000000000000000000000..b7d28b7a1ed9519a284487be620c4f180b612aa8 Binary files /dev/null and b/StyleText/doc/images/5.png differ diff --git a/StyleText/doc/images/6.png b/StyleText/doc/images/6.png new file mode 100644 index 0000000000000000000000000000000000000000..75af7275a009ec01c4bc0903a57d559daf93101b Binary files /dev/null and b/StyleText/doc/images/6.png differ diff --git a/StyleText/doc/images/7.jpg b/StyleText/doc/images/7.jpg new file mode 100644 index 0000000000000000000000000000000000000000..887094fb3a005e4649bf355fe9e61acf628fceca Binary files /dev/null and b/StyleText/doc/images/7.jpg differ diff --git a/StyleText/doc/images/8.jpg b/StyleText/doc/images/8.jpg new file mode 100644 index 0000000000000000000000000000000000000000..234d7f33e7a3a29201fda2f8b844128c8e730e06 Binary files /dev/null and b/StyleText/doc/images/8.jpg differ diff --git a/StyleText/doc/images/9.png b/StyleText/doc/images/9.png new file mode 100644 index 0000000000000000000000000000000000000000..179780250a563537188b336069b91c2472291a16 Binary files /dev/null and b/StyleText/doc/images/9.png differ diff --git a/StyleText/engine/__init__.py b/StyleText/engine/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/StyleText/engine/corpus_generators.py b/StyleText/engine/corpus_generators.py new file mode 100644 index 0000000000000000000000000000000000000000..186d15f36d16971d9e7700535b50b1f724a80fe7 --- /dev/null +++ b/StyleText/engine/corpus_generators.py @@ -0,0 +1,66 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import random + +from utils.logging import get_logger + + +class FileCorpus(object): + def __init__(self, config): + self.logger = get_logger() + self.logger.info("using FileCorpus") + + self.char_list = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" + + corpus_file = config["CorpusGenerator"]["corpus_file"] + self.language = config["CorpusGenerator"]["language"] + with open(corpus_file, 'r') as f: + corpus_raw = f.read() + self.corpus_list = corpus_raw.split("\n")[:-1] + assert len(self.corpus_list) > 0 + random.shuffle(self.corpus_list) + self.index = 0 + + def generate(self, corpus_length=0): + if self.index >= len(self.corpus_list): + self.index = 0 + random.shuffle(self.corpus_list) + corpus = self.corpus_list[self.index] + if corpus_length != 0: + corpus = corpus[0:corpus_length] + if corpus_length > len(corpus): + self.logger.warning("generated corpus is shorter than expected.") + self.index += 1 + return self.language, corpus + + +class EnNumCorpus(object): + def __init__(self, config): + self.logger = get_logger() + self.logger.info("using NumberCorpus") + self.num_list = "0123456789" + self.en_char_list = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" + self.height = config["Global"]["image_height"] + self.max_width = config["Global"]["image_width"] + + def generate(self, corpus_length=0): + corpus = "" + if corpus_length == 0: + corpus_length = random.randint(5, 15) + for i in range(corpus_length): + if random.random() < 0.2: + corpus += "{}".format(random.choice(self.en_char_list)) + else: + corpus += "{}".format(random.choice(self.num_list)) + return "en", corpus diff --git a/StyleText/engine/predictors.py b/StyleText/engine/predictors.py new file mode 100644 index 0000000000000000000000000000000000000000..d9f4afe4a18bd1e0a96ac37aa0359f26434ddb3d --- /dev/null +++ b/StyleText/engine/predictors.py @@ -0,0 +1,115 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import numpy as np +import cv2 +import math +import paddle + +from arch import style_text_rec +from utils.sys_funcs import check_gpu +from utils.logging import get_logger + + +class StyleTextRecPredictor(object): + def __init__(self, config): + algorithm = config['Predictor']['algorithm'] + assert algorithm in ["StyleTextRec" + ], "Generator {} not supported.".format(algorithm) + use_gpu = config["Global"]['use_gpu'] + check_gpu(use_gpu) + self.logger = get_logger() + self.generator = getattr(style_text_rec, algorithm)(config) + self.height = config["Global"]["image_height"] + self.width = config["Global"]["image_width"] + self.scale = config["Predictor"]["scale"] + self.mean = config["Predictor"]["mean"] + self.std = config["Predictor"]["std"] + self.expand_result = config["Predictor"]["expand_result"] + + def predict(self, style_input, text_input): + style_input = self.rep_style_input(style_input, text_input) + tensor_style_input = self.preprocess(style_input) + tensor_text_input = self.preprocess(text_input) + style_text_result = self.generator.forward(tensor_style_input, + tensor_text_input) + fake_fusion = self.postprocess(style_text_result["fake_fusion"]) + fake_text = self.postprocess(style_text_result["fake_text"]) + fake_sk = self.postprocess(style_text_result["fake_sk"]) + fake_bg = self.postprocess(style_text_result["fake_bg"]) + bbox = self.get_text_boundary(fake_text) + if bbox: + left, right, top, bottom = bbox + fake_fusion = fake_fusion[top:bottom, left:right, :] + fake_text = fake_text[top:bottom, left:right, :] + fake_sk = fake_sk[top:bottom, left:right, :] + fake_bg = fake_bg[top:bottom, left:right, :] + + # fake_fusion = self.crop_by_text(img_fake_fusion, img_fake_text) + return { + "fake_fusion": fake_fusion, + "fake_text": fake_text, + "fake_sk": fake_sk, + "fake_bg": fake_bg, + } + + def preprocess(self, img): + img = (img.astype('float32') * self.scale - self.mean) / self.std + img_height, img_width, channel = img.shape + assert channel == 3, "Please use an rgb image." 
+ ratio = img_width / float(img_height) + if math.ceil(self.height * ratio) > self.width: + resized_w = self.width + else: + resized_w = int(math.ceil(self.height * ratio)) + img = cv2.resize(img, (resized_w, self.height)) + + new_img = np.zeros([self.height, self.width, 3]).astype('float32') + new_img[:, 0:resized_w, :] = img + img = new_img.transpose((2, 0, 1)) + img = img[np.newaxis, :, :, :] + return paddle.to_tensor(img) + + def postprocess(self, tensor): + img = tensor.numpy()[0] + img = img.transpose((1, 2, 0)) + img = (img * self.std + self.mean) / self.scale + img = np.maximum(img, 0.0) + img = np.minimum(img, 255.0) + img = img.astype('uint8') + return img + + def rep_style_input(self, style_input, text_input): + rep_num = int(1.2 * (text_input.shape[1] / text_input.shape[0]) / + (style_input.shape[1] / style_input.shape[0])) + 1 + style_input = np.tile(style_input, reps=[1, rep_num, 1]) + max_width = int(self.width / self.height * style_input.shape[0]) + style_input = style_input[:, :max_width, :] + return style_input + + def get_text_boundary(self, text_img): + img_height = text_img.shape[0] + img_width = text_img.shape[1] + bounder = 3 + text_canny_img = cv2.Canny(text_img, 10, 20) + edge_num_h = text_canny_img.sum(axis=0) + no_zero_list_h = np.where(edge_num_h > 0)[0] + edge_num_w = text_canny_img.sum(axis=1) + no_zero_list_w = np.where(edge_num_w > 0)[0] + if len(no_zero_list_h) == 0 or len(no_zero_list_w) == 0: + return None + left = max(no_zero_list_h[0] - bounder, 0) + right = min(no_zero_list_h[-1] + bounder, img_width) + top = max(no_zero_list_w[0] - bounder, 0) + bottom = min(no_zero_list_w[-1] + bounder, img_height) + return [left, right, top, bottom] diff --git a/StyleText/engine/style_samplers.py b/StyleText/engine/style_samplers.py new file mode 100644 index 0000000000000000000000000000000000000000..e171d58db7527ffb37972524991e58ac59c6bb0a --- /dev/null +++ b/StyleText/engine/style_samplers.py @@ -0,0 +1,62 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import numpy as np +import random +import cv2 + + +class DatasetSampler(object): + def __init__(self, config): + self.image_home = config["StyleSampler"]["image_home"] + label_file = config["StyleSampler"]["label_file"] + self.dataset_with_label = config["StyleSampler"]["with_label"] + self.height = config["Global"]["image_height"] + self.index = 0 + with open(label_file, "r") as f: + label_raw = f.read() + self.path_label_list = label_raw.split("\n")[:-1] + assert len(self.path_label_list) > 0 + random.shuffle(self.path_label_list) + + def sample(self): + if self.index >= len(self.path_label_list): + random.shuffle(self.path_label_list) + self.index = 0 + if self.dataset_with_label: + path_label = self.path_label_list[self.index] + rel_image_path, label = path_label.split('\t') + else: + rel_image_path = self.path_label_list[self.index] + label = None + img_path = "{}/{}".format(self.image_home, rel_image_path) + image = cv2.imread(img_path) + origin_height = image.shape[0] + ratio = self.height / origin_height + width = int(image.shape[1] * ratio) + height = int(image.shape[0] * ratio) + image = cv2.resize(image, (width, height)) + + self.index += 1 + if label: + return {"image": image, "label": label} + else: + return {"image": image} + + +def duplicate_image(image, width): + image_width = image.shape[1] + dup_num = width // image_width + 1 + image = np.tile(image, reps=[1, dup_num, 1]) + cropped_image = image[:, :width, :] + return cropped_image diff --git a/StyleText/engine/synthesisers.py b/StyleText/engine/synthesisers.py new file mode 100644 index 0000000000000000000000000000000000000000..177e3e049a695ecd06f5d2271f21336dd4eff997 --- /dev/null +++ b/StyleText/engine/synthesisers.py @@ -0,0 +1,71 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import os + +from utils.config import ArgsParser, load_config, override_config +from utils.logging import get_logger +from engine import style_samplers, corpus_generators, text_drawers, predictors, writers + + +class ImageSynthesiser(object): + def __init__(self): + self.FLAGS = ArgsParser().parse_args() + self.config = load_config(self.FLAGS.config) + self.config = override_config(self.config, options=self.FLAGS.override) + self.output_dir = self.config["Global"]["output_dir"] + if not os.path.exists(self.output_dir): + os.mkdir(self.output_dir) + self.logger = get_logger( + log_file='{}/predict.log'.format(self.output_dir)) + + self.text_drawer = text_drawers.StdTextDrawer(self.config) + + predictor_method = self.config["Predictor"]["method"] + assert predictor_method is not None + self.predictor = getattr(predictors, predictor_method)(self.config) + + def synth_image(self, corpus, style_input, language="en"): + corpus, text_input = self.text_drawer.draw_text(corpus, language) + synth_result = self.predictor.predict(style_input, text_input) + return synth_result + + +class DatasetSynthesiser(ImageSynthesiser): + def __init__(self): + super(DatasetSynthesiser, self).__init__() + self.tag = self.FLAGS.tag + self.output_num = self.config["Global"]["output_num"] + corpus_generator_method = self.config["CorpusGenerator"]["method"] + self.corpus_generator = getattr(corpus_generators, + corpus_generator_method)(self.config) + + style_sampler_method = self.config["StyleSampler"]["method"] + assert style_sampler_method is not None + self.style_sampler = style_samplers.DatasetSampler(self.config) + self.writer = writers.SimpleWriter(self.config, self.tag) + + def synth_dataset(self): + for i in range(self.output_num): + style_data = self.style_sampler.sample() + style_input = style_data["image"] + corpus_language, text_input_label = self.corpus_generator.generate( + ) + text_input_label, text_input = self.text_drawer.draw_text( + text_input_label, corpus_language) + + synth_result = self.predictor.predict(style_input, text_input) + fake_fusion = synth_result["fake_fusion"] + self.writer.save_image(fake_fusion, text_input_label) + self.writer.save_label() + self.writer.merge_label() diff --git a/StyleText/engine/text_drawers.py b/StyleText/engine/text_drawers.py new file mode 100644 index 0000000000000000000000000000000000000000..8aaac06ec50816bb6e2774972644c0a7dfb908c6 --- /dev/null +++ b/StyleText/engine/text_drawers.py @@ -0,0 +1,57 @@ +from PIL import Image, ImageDraw, ImageFont +import numpy as np +from utils.logging import get_logger + + +class StdTextDrawer(object): + def __init__(self, config): + self.logger = get_logger() + self.max_width = config["Global"]["image_width"] + self.char_list = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" + self.height = config["Global"]["image_height"] + self.font_dict = {} + self.load_fonts(config["TextDrawer"]["fonts"]) + self.support_languages = list(self.font_dict) + + def load_fonts(self, fonts_config): + for language in fonts_config: + font_path = fonts_config[language] + font_height = self.get_valid_height(font_path) + font = ImageFont.truetype(font_path, font_height) + self.font_dict[language] = font + + def get_valid_height(self, font_path): + font = ImageFont.truetype(font_path, self.height - 4) + _, font_height = font.getsize(self.char_list) + if font_height <= self.height - 4: + return self.height - 4 + else: + return int((self.height - 4)**2 / font_height) + + def draw_text(self, corpus, language="en", crop=True): + if 
language not in self.support_languages: + self.logger.warning( + "language {} not supported, use en instead.".format(language)) + language = "en" + if crop: + width = min(self.max_width, len(corpus) * self.height) + 4 + else: + width = len(corpus) * self.height + 4 + bg = Image.new("RGB", (width, self.height), color=(127, 127, 127)) + draw = ImageDraw.Draw(bg) + + char_x = 2 + font = self.font_dict[language] + for i, char_i in enumerate(corpus): + char_size = font.getsize(char_i)[0] + draw.text((char_x, 2), char_i, fill=(0, 0, 0), font=font) + char_x += char_size + if char_x >= width: + corpus = corpus[0:i + 1] + self.logger.warning("corpus length exceed limit: {}".format( + corpus)) + break + + text_input = np.array(bg).astype(np.uint8) + text_input = text_input[:, 0:char_x, :] + return corpus, text_input diff --git a/StyleText/engine/writers.py b/StyleText/engine/writers.py new file mode 100644 index 0000000000000000000000000000000000000000..0df75e7234812c3fbab69ceed50040aa16cd83bc --- /dev/null +++ b/StyleText/engine/writers.py @@ -0,0 +1,71 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import os +import cv2 +import glob + +from utils.logging import get_logger + + +class SimpleWriter(object): + def __init__(self, config, tag): + self.logger = get_logger() + self.output_dir = config["Global"]["output_dir"] + self.counter = 0 + self.label_dict = {} + self.tag = tag + self.label_file_index = 0 + + def save_image(self, image, text_input_label): + image_home = os.path.join(self.output_dir, "images", self.tag) + if not os.path.exists(image_home): + os.makedirs(image_home) + + image_path = os.path.join(image_home, "{}.png".format(self.counter)) + # todo support continue synth + cv2.imwrite(image_path, image) + self.logger.info("generate image: {}".format(image_path)) + + image_name = os.path.join(self.tag, "{}.png".format(self.counter)) + self.label_dict[image_name] = text_input_label + + self.counter += 1 + if not self.counter % 100: + self.save_label() + + def save_label(self): + label_raw = "" + label_home = os.path.join(self.output_dir, "label") + if not os.path.exists(label_home): + os.mkdir(label_home) + for image_path in self.label_dict: + label = self.label_dict[image_path] + label_raw += "{}\t{}\n".format(image_path, label) + label_file_path = os.path.join(label_home, + "{}_label.txt".format(self.tag)) + with open(label_file_path, "w") as f: + f.write(label_raw) + self.label_file_index += 1 + + def merge_label(self): + label_raw = "" + label_file_regex = os.path.join(self.output_dir, "label", + "*_label.txt") + label_file_list = glob.glob(label_file_regex) + for label_file_i in label_file_list: + with open(label_file_i, "r") as f: + label_raw += f.read() + label_file_path = os.path.join(self.output_dir, "label.txt") + with open(label_file_path, "w") as f: + f.write(label_raw) diff --git a/StyleText/examples/corpus/example.txt b/StyleText/examples/corpus/example.txt new file mode 100644 index 
0000000000000000000000000000000000000000..78451cc3d92a3353f5de0c74c2cb0a06e6197653 --- /dev/null +++ b/StyleText/examples/corpus/example.txt @@ -0,0 +1,2 @@ +PaddleOCR +飞桨文字识别 diff --git a/StyleText/examples/image_list.txt b/StyleText/examples/image_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..b07be0353516f7822e4994d5dcddcd85766035dc --- /dev/null +++ b/StyleText/examples/image_list.txt @@ -0,0 +1,2 @@ +style_images/1.jpg NEATNESS +style_images/2.jpg 锁店君和宾馆 diff --git a/StyleText/examples/style_images/1.jpg b/StyleText/examples/style_images/1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4da7838e5d3c711cdeab60df63ae4c7af7b475ae Binary files /dev/null and b/StyleText/examples/style_images/1.jpg differ diff --git a/StyleText/examples/style_images/2.jpg b/StyleText/examples/style_images/2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0ab932b1d9348ab41ad8ea153740e86e6477fdeb Binary files /dev/null and b/StyleText/examples/style_images/2.jpg differ diff --git a/StyleText/fonts/ch_standard.ttf b/StyleText/fonts/ch_standard.ttf new file mode 100755 index 0000000000000000000000000000000000000000..cdb7fa5907587b8dbe0ad1da7442d3e4f8bd7488 Binary files /dev/null and b/StyleText/fonts/ch_standard.ttf differ diff --git a/StyleText/fonts/en_standard.ttf b/StyleText/fonts/en_standard.ttf new file mode 100755 index 0000000000000000000000000000000000000000..2e31d02424ed50b9e05c19b5d82500699a6edbb0 Binary files /dev/null and b/StyleText/fonts/en_standard.ttf differ diff --git a/StyleText/fonts/ko_standard.ttf b/StyleText/fonts/ko_standard.ttf new file mode 100755 index 0000000000000000000000000000000000000000..982bd879c27c731d2601ea8da988784e06f4b5b3 Binary files /dev/null and b/StyleText/fonts/ko_standard.ttf differ diff --git a/StyleText/tools/__init__.py b/StyleText/tools/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/StyleText/tools/synth_dataset.py b/StyleText/tools/synth_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..4a0e6d5e1f701c49558cfe1ea1df61e9b4180a89 --- /dev/null +++ b/StyleText/tools/synth_dataset.py @@ -0,0 +1,23 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from engine.synthesisers import DatasetSynthesiser + + +def synth_dataset(): + dataset_synthesiser = DatasetSynthesiser() + dataset_synthesiser.synth_dataset() + + +if __name__ == '__main__': + synth_dataset() diff --git a/StyleText/tools/synth_image.py b/StyleText/tools/synth_image.py new file mode 100644 index 0000000000000000000000000000000000000000..7b4827b825e4a28dd1fb2eba722d23e64e8ce0be --- /dev/null +++ b/StyleText/tools/synth_image.py @@ -0,0 +1,82 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import sys
+import glob
+
+import cv2
+
+# The path setup must run before the project-level imports below; otherwise
+# "utils" and "engine" cannot be resolved when this script is run directly.
+__dir__ = os.path.dirname(os.path.abspath(__file__))
+sys.path.append(__dir__)
+sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
+
+from utils.config import ArgsParser
+from engine.synthesisers import ImageSynthesiser
+
+
+def synth_image():
+    args = ArgsParser().parse_args()
+    image_synthesiser = ImageSynthesiser()
+    style_image_path = args.style_image
+    img = cv2.imread(style_image_path)
+    text_corpus = args.text_corpus
+    language = args.language
+
+    synth_result = image_synthesiser.synth_image(text_corpus, img, language)
+    fake_fusion = synth_result["fake_fusion"]
+    fake_text = synth_result["fake_text"]
+    fake_bg = synth_result["fake_bg"]
+    cv2.imwrite("fake_fusion.jpg", fake_fusion)
+    cv2.imwrite("fake_text.jpg", fake_text)
+    cv2.imwrite("fake_bg.jpg", fake_bg)
+
+
+def batch_synth_images():
+    image_synthesiser = ImageSynthesiser()
+
+    corpus_file = "../StyleTextRec_data/test_20201208/test_text_list.txt"
+    style_data_dir = "../StyleTextRec_data/test_20201208/style_images/"
+    save_path = "./output_data/"
+    if not os.path.exists(save_path):
+        os.makedirs(save_path)
+    corpus_list = []
+    with open(corpus_file, "rb") as fin:
+        lines = fin.readlines()
+        for line in lines:
+            substr = line.decode("utf-8").strip("\n").split("\t")
+            corpus_list.append(substr)
+    style_img_list = glob.glob("{}/*.jpg".format(style_data_dir))
+    corpus_num = len(corpus_list)
+    style_img_num = len(style_img_list)
+    for cno in range(corpus_num):
+        for sno in range(style_img_num):
+            corpus, lang = corpus_list[cno]
+            style_img_path = style_img_list[sno]
+            img = cv2.imread(style_img_path)
+            synth_result = image_synthesiser.synth_image(corpus, img, lang)
+            fake_fusion = synth_result["fake_fusion"]
+            fake_text = synth_result["fake_text"]
+            fake_bg = synth_result["fake_bg"]
+            # save each result twice, keyed corpus-first and style-first
+            for tp in range(2):
+                if tp == 0:
+                    prefix = "%s/c%d_s%d" % (save_path, cno, sno)
+                else:
+                    prefix = "%s/s%d_c%d" % (save_path, sno, cno)
+                cv2.imwrite("%s_fake_fusion.jpg" % prefix, fake_fusion)
+                cv2.imwrite("%s_fake_text.jpg" % prefix, fake_text)
+                cv2.imwrite("%s_fake_bg.jpg" % prefix, fake_bg)
+                cv2.imwrite("%s_input_style.jpg" % prefix, img)
+            print(cno, corpus_num, sno, style_img_num)
+
+
+if __name__ == '__main__':
+    # batch_synth_images()
+    synth_image()
diff --git a/StyleText/utils/__init__.py b/StyleText/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/StyleText/utils/config.py b/StyleText/utils/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..b2f8a618a838db361da4867e00df8dcd619f9f3d
--- /dev/null
+++ b/StyleText/utils/config.py
@@ -0,0 +1,224 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import yaml
+import os
+from argparse import ArgumentParser, RawDescriptionHelpFormatter
+
+from utils.logging import get_logger
+
+
+def override(dl, ks, v):
+    """
+    Recursively replace a value inside a dict or list
+
+    Args:
+        dl(dict or list): dict or list to be replaced
+        ks(list): list of keys leading to the value
+        v(str): value to be written
+    """
+
+    def str2num(v):
+        try:
+            return eval(v)
+        except Exception:
+            return v
+
+    assert isinstance(dl, (list, dict)), (
+        "{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+    if isinstance(dl, list):
+        k = str2num(ks[0])
+        if len(ks) == 1:
+            assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+            dl[k] = str2num(v)
+        else:
+            override(dl[k], ks[1:], v)
+    else:
+        if len(ks) == 1:
+            if ks[0] not in dl:
+                get_logger().warning(
+                    'A new field ({}) detected!'.format(ks[0]))
+            dl[ks[0]] = str2num(v)
+        else:
+            assert ks[0] in dl, (
+                '({}) doesn\'t exist in {}, a new dict field is invalid'.
+                format(ks[0], dl))
+            override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+    """
+    Recursively override the config
+
+    Args:
+        config(dict): dict to be replaced
+        options(list): list of pairs(key0.key1.idx.key2=value)
+            such as: [
+                'topk=2',
+                'VALID.transforms.1.ResizeImage.resize_short=300'
+            ]
+
+    Returns:
+        config(dict): replaced config
+    """
+    if options is not None:
+        for opt in options:
+            assert isinstance(opt, str), (
+                "option({}) should be a str".format(opt))
+            assert "=" in opt, (
+                "option({}) should contain a = "
+                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in an option")
+            key, value = pair
+            keys = key.split('.')
+            override(config, keys, value)
+
+    return config
+
+
+class ArgsParser(ArgumentParser):
+    def __init__(self):
+        super(ArgsParser, self).__init__(
+            formatter_class=RawDescriptionHelpFormatter)
+        self.add_argument("-c", "--config", help="configuration file to use")
+        self.add_argument(
+            "-t", "--tag", default="0", help="tag for marking worker")
+        self.add_argument(
+            '-o',
+            '--override',
+            action='append',
+            default=[],
+            help='config options to be overridden')
+        self.add_argument(
+            "--style_image",
+            default="examples/style_images/1.jpg",
+            help="path of the style image")
+        self.add_argument(
+            "--text_corpus",
+            default="PaddleOCR",
+            help="text to be synthesized onto the style image")
+        self.add_argument(
+            "--language",
+            default="en",
+            help="language of the text corpus")
+
+    def parse_args(self, argv=None):
+        args = super(ArgsParser, self).parse_args(argv)
+        assert args.config is not None, \
+            "Please specify --config=configure_file_path."
+        return args
+
+
+def load_config(file_path):
+    """
+    Load config from yml/yaml file.
+    Args:
+        file_path (str): Path of the config file to be loaded.
+ Returns: config + """ + ext = os.path.splitext(file_path)[1] + assert ext in ['.yml', '.yaml'], "only support yaml files for now" + with open(file_path, 'rb') as f: + config = yaml.load(f, Loader=yaml.Loader) + + return config + + +def gen_config(): + base_config = { + "Global": { + "algorithm": "SRNet", + "use_gpu": True, + "start_epoch": 1, + "stage1_epoch_num": 100, + "stage2_epoch_num": 100, + "log_smooth_window": 20, + "print_batch_step": 2, + "save_model_dir": "./output/SRNet", + "use_visualdl": False, + "save_epoch_step": 10, + "vgg_pretrain": "./pretrained/VGG19_pretrained", + "vgg_load_static_pretrain": True + }, + "Architecture": { + "model_type": "data_aug", + "algorithm": "SRNet", + "net_g": { + "name": "srnet_net_g", + "encode_dim": 64, + "norm": "batch", + "use_dropout": False, + "init_type": "xavier", + "init_gain": 0.02, + "use_dilation": 1 + }, + # input_nc, ndf, netD, + # n_layers_D=3, norm='instance', use_sigmoid=False, init_type='normal', init_gain=0.02, gpu_id='cuda:0' + "bg_discriminator": { + "name": "srnet_bg_discriminator", + "input_nc": 6, + "ndf": 64, + "netD": "basic", + "norm": "none", + "init_type": "xavier", + }, + "fusion_discriminator": { + "name": "srnet_fusion_discriminator", + "input_nc": 6, + "ndf": 64, + "netD": "basic", + "norm": "none", + "init_type": "xavier", + } + }, + "Loss": { + "lamb": 10, + "perceptual_lamb": 1, + "muvar_lamb": 50, + "style_lamb": 500 + }, + "Optimizer": { + "name": "Adam", + "learning_rate": { + "name": "lambda", + "lr": 0.0002, + "lr_decay_iters": 50 + }, + "beta1": 0.5, + "beta2": 0.999, + }, + "Train": { + "batch_size_per_card": 8, + "num_workers_per_card": 4, + "dataset": { + "delimiter": "\t", + "data_dir": "/", + "label_file": "tmp/label.txt", + "transforms": [{ + "DecodeImage": { + "to_rgb": True, + "to_np": False, + "channel_first": False + } + }, { + "NormalizeImage": { + "scale": 1. / 255., + "mean": [0.485, 0.456, 0.406], + "std": [0.229, 0.224, 0.225], + "order": None + } + }, { + "ToCHWImage": None + }] + } + } + } + with open("config.yml", "w") as f: + yaml.dump(base_config, f) + + +if __name__ == '__main__': + gen_config() diff --git a/StyleText/utils/load_params.py b/StyleText/utils/load_params.py new file mode 100644 index 0000000000000000000000000000000000000000..be0561363eb21483d267ff6557c1d453d330c5f8 --- /dev/null +++ b/StyleText/utils/load_params.py @@ -0,0 +1,27 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
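Before moving on: the `-o/--override` machinery in `utils/config.py` above walks dotted key paths and coerces values through `eval` when it can, so numeric and boolean strings come back as numbers and booleans. A small sketch of that behaviour on an in-memory dict (the keys here are made up for illustration; any dotted path into the loaded YAML works the same way):

from utils.config import override_config

config = {"Global": {"output_num": 10, "use_gpu": True}}
config = override_config(
    config, options=["Global.output_num=100", "Global.use_gpu=False"])
assert config["Global"]["output_num"] == 100  # "100" coerced to int
assert config["Global"]["use_gpu"] is False   # "False" coerced to bool

The same plumbing backs command lines such as `python3 tools/synth_image.py -c <config.yml> -o Global.output_dir=output`, where `<config.yml>` stands for whichever configuration file is used.
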
+import os
+import paddle
+
+__all__ = ['load_dygraph_pretrain']
+
+
+def load_dygraph_pretrain(model, logger, path=None, load_static_weights=False):
+    if not os.path.exists(path + '.pdparams'):
+        raise ValueError("Model pretrain path {} does not "
+                         "exist.".format(path))
+    param_state_dict = paddle.load(path + '.pdparams')
+    model.set_state_dict(param_state_dict)
+    logger.info("load pretrained model from {}".format(path))
+    return
diff --git a/StyleText/utils/logging.py b/StyleText/utils/logging.py
new file mode 100644
index 0000000000000000000000000000000000000000..f700fe26bc9bfda21d39a0bddd89180f5de442ab
--- /dev/null
+++ b/StyleText/utils/logging.py
@@ -0,0 +1,65 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import sys
+import logging
+import functools
+import paddle.distributed as dist
+
+logger_initialized = {}
+
+
+@functools.lru_cache()
+def get_logger(name='srnet', log_file=None, log_level=logging.INFO):
+    """Initialize and get a logger by name.
+    If the logger has not been initialized, this method will initialize the
+    logger by adding one or two handlers, otherwise the initialized logger will
+    be directly returned. During initialization, a StreamHandler will always be
+    added. If `log_file` is specified a FileHandler will also be added.
+    Args:
+        name (str): Logger name.
+        log_file (str | None): The log filename. If specified, a FileHandler
+            will be added to the logger.
+        log_level (int): The logger level. Note that only the process of
+            rank 0 is affected, and other processes will set the level to
+            "Error" thus be silent most of the time.
+    Returns:
+        logging.Logger: The expected logger.
+    """
+    logger = logging.getLogger(name)
+    if name in logger_initialized:
+        return logger
+    for logger_name in logger_initialized:
+        if name.startswith(logger_name):
+            return logger
+
+    formatter = logging.Formatter(
+        '[%(asctime)s] %(name)s %(levelname)s: %(message)s',
+        datefmt="%Y/%m/%d %H:%M:%S")
+
+    stream_handler = logging.StreamHandler(stream=sys.stdout)
+    stream_handler.setFormatter(formatter)
+    logger.addHandler(stream_handler)
+    if log_file is not None and dist.get_rank() == 0:
+        log_file_folder = os.path.split(log_file)[0]
+        os.makedirs(log_file_folder, exist_ok=True)
+        file_handler = logging.FileHandler(log_file, 'a')
+        file_handler.setFormatter(formatter)
+        logger.addHandler(file_handler)
+    if dist.get_rank() == 0:
+        logger.setLevel(log_level)
+    else:
+        logger.setLevel(logging.ERROR)
+    logger_initialized[name] = True
+    return logger
diff --git a/StyleText/utils/math_functions.py b/StyleText/utils/math_functions.py
new file mode 100644
index 0000000000000000000000000000000000000000..3dc8d9160f8941f825d7aade79afc99035577bca
--- /dev/null
+++ b/StyleText/utils/math_functions.py
@@ -0,0 +1,45 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import paddle + + +def compute_mean_covariance(img): + batch_size = img.shape[0] + channel_num = img.shape[1] + height = img.shape[2] + width = img.shape[3] + num_pixels = height * width + + # batch_size * channel_num * 1 * 1 + mu = img.mean(2, keepdim=True).mean(3, keepdim=True) + + # batch_size * channel_num * num_pixels + img_hat = img - mu.expand_as(img) + img_hat = img_hat.reshape([batch_size, channel_num, num_pixels]) + # batch_size * num_pixels * channel_num + img_hat_transpose = img_hat.transpose([0, 2, 1]) + # batch_size * channel_num * channel_num + covariance = paddle.bmm(img_hat, img_hat_transpose) + covariance = covariance / num_pixels + + return mu, covariance + + +def dice_coefficient(y_true_cls, y_pred_cls, training_mask): + eps = 1e-5 + intersection = paddle.sum(y_true_cls * y_pred_cls * training_mask) + union = paddle.sum(y_true_cls * training_mask) + paddle.sum( + y_pred_cls * training_mask) + eps + loss = 1. - (2 * intersection / union) + return loss diff --git a/StyleText/utils/sys_funcs.py b/StyleText/utils/sys_funcs.py new file mode 100644 index 0000000000000000000000000000000000000000..203d91d83630e41fbe931a055e81e65cf0fb2e7d --- /dev/null +++ b/StyleText/utils/sys_funcs.py @@ -0,0 +1,67 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import sys +import os +import errno +import paddle + + +def get_check_global_params(mode): + check_params = [ + 'use_gpu', 'max_text_length', 'image_shape', 'image_shape', + 'character_type', 'loss_type' + ] + if mode == "train_eval": + check_params = check_params + [ + 'train_batch_size_per_card', 'test_batch_size_per_card' + ] + elif mode == "test": + check_params = check_params + ['test_batch_size_per_card'] + return check_params + + +def check_gpu(use_gpu): + """ + Log error and exit when set use_gpu=true in paddlepaddle + cpu version. + """ + err = "Config use_gpu cannot be set as true while you are " \ + "using paddlepaddle cpu version ! \nPlease try: \n" \ + "\t1. Install paddlepaddle-gpu to run model on GPU \n" \ + "\t2. 
Set use_gpu as false in config file to run " \ + "model on CPU" + if use_gpu: + try: + if not paddle.is_compiled_with_cuda(): + print(err) + sys.exit(1) + except: + print("Fail to check gpu state.") + sys.exit(1) + + +def _mkdir_if_not_exist(path, logger): + """ + mkdir if not exists, ignore the exception when multiprocess mkdir together + """ + if not os.path.exists(path): + try: + os.makedirs(path) + except OSError as e: + if e.errno == errno.EEXIST and os.path.isdir(path): + logger.warning( + 'be happy if some process has already created {}'.format( + path)) + else: + raise OSError('Failed to mkdir {}'.format(path)) diff --git a/configs/det/det_r50_vd_sast_icdar15.yml b/configs/det/det_r50_vd_sast_icdar15.yml old mode 100644 new mode 100755 index 7ca93cecec3dbf7152c1d509c4d8ca614dec388f..a989bc8fc754ca88e3bff2de2a6db1060301fdd5 --- a/configs/det/det_r50_vd_sast_icdar15.yml +++ b/configs/det/det_r50_vd_sast_icdar15.yml @@ -61,8 +61,8 @@ Train: dataset: name: SimpleDataSet data_dir: ./train_data/ - label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt] - data_ratio_list: [0.5, 0.5] + label_file_list: [./train_data/icdar2013/train_label_json.txt, ./train_data/icdar2015/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt] + ratio_list: [0.1, 0.45, 0.3, 0.15] transforms: - DecodeImage: # load image img_mode: BGR diff --git a/configs/det/det_r50_vd_sast_totaltext.yml b/configs/det/det_r50_vd_sast_totaltext.yml old mode 100644 new mode 100755 index a9a037c8bd940d04d08d0dae3a90139ead68cdba..257ecf2490bdde6280cf4b20bb66f2457b4b833b --- a/configs/det/det_r50_vd_sast_totaltext.yml +++ b/configs/det/det_r50_vd_sast_totaltext.yml @@ -60,8 +60,8 @@ Metric: Train: dataset: name: SimpleDataSet - label_file_list: [./train_data/icdar2013/train_label_json.txt, ./train_data/icdar2015/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt] - ratio_list: [0.1, 0.45, 0.3, 0.15] + label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt] + data_ratio_list: [0.5, 0.5] transforms: - DecodeImage: # load image img_mode: BGR diff --git a/configs/rec/rec_icdar15_train.yml b/configs/rec/rec_icdar15_train.yml index 7efbd5cf0d963229a94aa43558589b828d17cbd0..3de0ce7741cc8086b41cd1f5b98f6a8bbced90fa 100644 --- a/configs/rec/rec_icdar15_train.yml +++ b/configs/rec/rec_icdar15_train.yml @@ -36,12 +36,13 @@ Architecture: algorithm: CRNN Transform: Backbone: - name: ResNet - layers: 34 + name: MobileNetV3 + scale: 0.5 + model_name: large Neck: name: SequenceEncoder encoder_type: rnn - hidden_size: 256 + hidden_size: 96 Head: name: CTCHead fc_decay: 0 diff --git a/deploy/hubserving/ocr_cls/params.py b/deploy/hubserving/ocr_cls/params.py old mode 100644 new mode 100755 index bcdb2d6e3800c0ba7897b71f0b0999cafdc223af..72a7a10249176d86f75b5d3c3adae7f1021a75a8 --- a/deploy/hubserving/ocr_cls/params.py +++ b/deploy/hubserving/ocr_cls/params.py @@ -12,7 +12,7 @@ def read_params(): cfg = Config() #params for text classifier - cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/" + cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v2.0_cls_infer/" cfg.cls_image_shape = "3, 48, 192" cfg.label_list = ['0', '180'] cfg.cls_batch_num = 30 diff --git a/deploy/hubserving/ocr_det/params.py 
b/deploy/hubserving/ocr_det/params.py old mode 100644 new mode 100755 index 4d4a9fc27b727034d8185c82dad3e542659fd463..e50decbbc8ee604863c5965aa95bf1f79fa71d0a --- a/deploy/hubserving/ocr_det/params.py +++ b/deploy/hubserving/ocr_det/params.py @@ -13,7 +13,7 @@ def read_params(): #params for text detector cfg.det_algorithm = "DB" - cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/" + cfg.det_model_dir = "./inference/ch_ppocr_mobile_v2.0_det_infer/" cfg.det_limit_side_len = 960 cfg.det_limit_type = 'max' @@ -27,16 +27,6 @@ def read_params(): # cfg.det_east_cover_thresh = 0.1 # cfg.det_east_nms_thresh = 0.2 - # #params for text recognizer - # cfg.rec_algorithm = "CRNN" - # cfg.rec_model_dir = "./inference/ch_det_mv3_crnn/" - - # cfg.rec_image_shape = "3, 32, 320" - # cfg.rec_char_type = 'ch' - # cfg.rec_batch_num = 30 - # cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt" - # cfg.use_space_char = True - cfg.use_zero_copy_run = False cfg.use_pdserving = False diff --git a/deploy/hubserving/ocr_system/params.py b/deploy/hubserving/ocr_system/params.py old mode 100644 new mode 100755 index 1f6a07bcc0167e90564edab9c4719b9192233b4c..a0e1960b2857630780f6b34773d7760279f862a2 --- a/deploy/hubserving/ocr_system/params.py +++ b/deploy/hubserving/ocr_system/params.py @@ -13,7 +13,7 @@ def read_params(): #params for text detector cfg.det_algorithm = "DB" - cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/" + cfg.det_model_dir = "./inference/ch_ppocr_mobile_v2.0_det_infer/" cfg.det_limit_side_len = 960 cfg.det_limit_type = 'max' @@ -29,7 +29,7 @@ def read_params(): #params for text recognizer cfg.rec_algorithm = "CRNN" - cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/" + cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v2.0_rec_infer/" cfg.rec_image_shape = "3, 32, 320" cfg.rec_char_type = 'ch' @@ -41,7 +41,7 @@ def read_params(): #params for text classifier cfg.use_angle_cls = True - cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/" + cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v2.0_cls_infer/" cfg.cls_image_shape = "3, 48, 192" cfg.label_list = ['0', '180'] cfg.cls_batch_num = 30 @@ -49,5 +49,6 @@ def read_params(): cfg.use_zero_copy_run = False cfg.use_pdserving = False + cfg.drop_score = 0.5 return cfg diff --git a/deploy/hubserving/readme.md b/deploy/hubserving/readme.md old mode 100644 new mode 100755 index f64bd372569f12ea52214e3e89927df0c859a17f..ce55b0f0da42b706cef30ab9b7a4c06f02e7c8eb --- a/deploy/hubserving/readme.md +++ b/deploy/hubserving/readme.md @@ -2,7 +2,7 @@ PaddleOCR提供2种服务部署方式: - 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",按照本教程使用; -- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",使用方法参考[文档](../../deploy/pdserving/readme.md)。 +- (coming soon)基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",使用方法参考[文档](../../deploy/pdserving/readme.md)。 # 基于PaddleHub Serving的服务部署 @@ -33,11 +33,11 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple ``` ### 2. 
下载推理模型 -安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是v1.1版的超轻量模型,默认模型路径为: +安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是v2.0版的超轻量模型,默认模型路径为: ``` -检测模型:./inference/ch_ppocr_mobile_v1.1_det_infer/ -识别模型:./inference/ch_ppocr_mobile_v1.1_rec_infer/ -方向分类器:./inference/ch_ppocr_mobile_v1.1_cls_infer/ +检测模型:./inference/ch_ppocr_mobile_v2.0_det_infer/ +识别模型:./inference/ch_ppocr_mobile_v2.0_rec_infer/ +方向分类器:./inference/ch_ppocr_mobile_v2.0_cls_infer/ ``` **模型路径可在`params.py`中查看和修改。** 更多模型可以从PaddleOCR提供的[模型库](../../doc/doc_ch/models_list.md)下载,也可以替换成自己训练转换好的模型。 diff --git a/deploy/hubserving/readme_en.md b/deploy/hubserving/readme_en.md old mode 100644 new mode 100755 index c6cf53413bc3eac45f933fead66356d1491cc60c..95223ffd82f8264d56158e8e7917c983b07f679d --- a/deploy/hubserving/readme_en.md +++ b/deploy/hubserving/readme_en.md @@ -2,7 +2,7 @@ English | [简体中文](readme.md) PaddleOCR provides 2 service deployment methods: - Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial. -- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../../deploy/pdserving/readme.md) for usage. +- (coming soon)Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../../deploy/pdserving/readme.md) for usage. # Service deployment based on PaddleHub Serving @@ -34,11 +34,11 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple ``` ### 2. Download inference model -Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v1.1 is used, and the default model path is: +Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v2.0 is used, and the default model path is: ``` -detection model: ./inference/ch_ppocr_mobile_v1.1_det_infer/ -recognition model: ./inference/ch_ppocr_mobile_v1.1_rec_infer/ -text direction classifier: ./inference/ch_ppocr_mobile_v1.1_cls_infer/ +detection model: ./inference/ch_ppocr_mobile_v2.0_det_infer/ +recognition model: ./inference/ch_ppocr_mobile_v2.0_rec_infer/ +text direction classifier: ./inference/ch_ppocr_mobile_v2.0_cls_infer/ ``` **The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself. diff --git a/deploy/pdserving/det_local_server.py b/deploy/pdserving/det_local_server.py deleted file mode 100644 index eb7948daadd018810997bba78367e86aa3398e31..0000000000000000000000000000000000000000 --- a/deploy/pdserving/det_local_server.py +++ /dev/null @@ -1,79 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from paddle_serving_client import Client -import cv2 -import sys -import numpy as np -import os -from paddle_serving_client import Client -from paddle_serving_app.reader import Sequential, ResizeByFactor -from paddle_serving_app.reader import Div, Normalize, Transpose -from paddle_serving_app.reader import DBPostProcess, FilterBoxes -if sys.argv[1] == 'gpu': - from paddle_serving_server_gpu.web_service import WebService -elif sys.argv[1] == 'cpu': - from paddle_serving_server.web_service import WebService -import time -import re -import base64 - - -class OCRService(WebService): - def init_det(self): - self.det_preprocess = Sequential([ - ResizeByFactor(32, 960), Div(255), - Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( - (2, 0, 1)) - ]) - self.filter_func = FilterBoxes(10, 10) - self.post_func = DBPostProcess({ - "thresh": 0.3, - "box_thresh": 0.5, - "max_candidates": 1000, - "unclip_ratio": 1.5, - "min_size": 3 - }) - - def preprocess(self, feed=[], fetch=[]): - data = base64.b64decode(feed[0]["image"].encode('utf8')) - data = np.fromstring(data, np.uint8) - im = cv2.imdecode(data, cv2.IMREAD_COLOR) - self.ori_h, self.ori_w, _ = im.shape - det_img = self.det_preprocess(im) - _, self.new_h, self.new_w = det_img.shape - return {"image": det_img[np.newaxis, :].copy()}, ["concat_1.tmp_0"] - - def postprocess(self, feed={}, fetch=[], fetch_map=None): - det_out = fetch_map["concat_1.tmp_0"] - ratio_list = [ - float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w - ] - dt_boxes_list = self.post_func(det_out, [ratio_list]) - dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w]) - return {"dt_boxes": dt_boxes.tolist()} - - -ocr_service = OCRService(name="ocr") -ocr_service.load_model_config("ocr_det_model") -ocr_service.init_det() -if sys.argv[1] == 'gpu': - ocr_service.set_gpus("0") - ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) - ocr_service.run_debugger_service(gpu=True) -elif sys.argv[1] == 'cpu': - ocr_service.prepare_server(workdir="workdir", port=9292) - ocr_service.run_debugger_service() -ocr_service.init_det() -ocr_service.run_web_service() diff --git a/deploy/pdserving/det_web_server.py b/deploy/pdserving/det_web_server.py deleted file mode 100644 index 14be74130dcb413c31a3e76c150d74f65575f451..0000000000000000000000000000000000000000 --- a/deploy/pdserving/det_web_server.py +++ /dev/null @@ -1,78 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from paddle_serving_client import Client -import cv2 -import sys -import numpy as np -import os -from paddle_serving_client import Client -from paddle_serving_app.reader import Sequential, ResizeByFactor -from paddle_serving_app.reader import Div, Normalize, Transpose -from paddle_serving_app.reader import DBPostProcess, FilterBoxes -if sys.argv[1] == 'gpu': - from paddle_serving_server_gpu.web_service import WebService -elif sys.argv[1] == 'cpu': - from paddle_serving_server.web_service import WebService -import time -import re -import base64 - - -class OCRService(WebService): - def init_det(self): - self.det_preprocess = Sequential([ - ResizeByFactor(32, 960), Div(255), - Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( - (2, 0, 1)) - ]) - self.filter_func = FilterBoxes(10, 10) - self.post_func = DBPostProcess({ - "thresh": 0.3, - "box_thresh": 0.5, - "max_candidates": 1000, - "unclip_ratio": 1.5, - "min_size": 3 - }) - - def preprocess(self, feed=[], fetch=[]): - data = base64.b64decode(feed[0]["image"].encode('utf8')) - data = np.fromstring(data, np.uint8) - im = cv2.imdecode(data, cv2.IMREAD_COLOR) - self.ori_h, self.ori_w, _ = im.shape - det_img = self.det_preprocess(im) - _, self.new_h, self.new_w = det_img.shape - print(det_img) - return {"image": det_img}, ["concat_1.tmp_0"] - - def postprocess(self, feed={}, fetch=[], fetch_map=None): - det_out = fetch_map["concat_1.tmp_0"] - ratio_list = [ - float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w - ] - dt_boxes_list = self.post_func(det_out, [ratio_list]) - dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w]) - return {"dt_boxes": dt_boxes.tolist()} - - -ocr_service = OCRService(name="ocr") -ocr_service.load_model_config("ocr_det_model") -if sys.argv[1] == 'gpu': - ocr_service.set_gpus("0") - ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) -elif sys.argv[1] == 'cpu': - ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") -ocr_service.init_det() -ocr_service.run_rpc_service() -ocr_service.run_web_service() diff --git a/deploy/pdserving/ocr_local_server.py b/deploy/pdserving/ocr_local_server.py deleted file mode 100644 index de5b3d13f12afd4a84c5d46625682c42f418d6bb..0000000000000000000000000000000000000000 --- a/deploy/pdserving/ocr_local_server.py +++ /dev/null @@ -1,114 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from paddle_serving_client import Client -from paddle_serving_app.reader import OCRReader -import cv2 -import sys -import numpy as np -import os -from paddle_serving_client import Client -from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor -from paddle_serving_app.reader import Div, Normalize, Transpose -from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes -if sys.argv[1] == 'gpu': - from paddle_serving_server_gpu.web_service import WebService -elif sys.argv[1] == 'cpu': - from paddle_serving_server.web_service import WebService -from paddle_serving_app.local_predict import Debugger -import time -import re -import base64 - - -class OCRService(WebService): - def init_det_debugger(self, det_model_config): - self.det_preprocess = Sequential([ - ResizeByFactor(32, 960), Div(255), - Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( - (2, 0, 1)) - ]) - self.det_client = Debugger() - if sys.argv[1] == 'gpu': - self.det_client.load_model_config( - det_model_config, gpu=True, profile=False) - elif sys.argv[1] == 'cpu': - self.det_client.load_model_config( - det_model_config, gpu=False, profile=False) - self.ocr_reader = OCRReader() - - def preprocess(self, feed=[], fetch=[]): - data = base64.b64decode(feed[0]["image"].encode('utf8')) - data = np.fromstring(data, np.uint8) - im = cv2.imdecode(data, cv2.IMREAD_COLOR) - ori_h, ori_w, _ = im.shape - det_img = self.det_preprocess(im) - _, new_h, new_w = det_img.shape - det_img = det_img[np.newaxis, :] - det_img = det_img.copy() - det_out = self.det_client.predict( - feed={"image": det_img}, fetch=["concat_1.tmp_0"]) - filter_func = FilterBoxes(10, 10) - post_func = DBPostProcess({ - "thresh": 0.3, - "box_thresh": 0.5, - "max_candidates": 1000, - "unclip_ratio": 1.5, - "min_size": 3 - }) - sorted_boxes = SortedBoxes() - ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w] - dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list]) - dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w]) - dt_boxes = sorted_boxes(dt_boxes) - get_rotate_crop_image = GetRotateCropImage() - img_list = [] - max_wh_ratio = 0 - for i, dtbox in enumerate(dt_boxes): - boximg = get_rotate_crop_image(im, dt_boxes[i]) - img_list.append(boximg) - h, w = boximg.shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - if len(img_list) == 0: - return [], [] - _, w, h = self.ocr_reader.resize_norm_img(img_list[0], - max_wh_ratio).shape - imgs = np.zeros((len(img_list), 3, w, h)).astype('float32') - for id, img in enumerate(img_list): - norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) - imgs[id] = norm_img - feed = {"image": imgs.copy()} - fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] - return feed, fetch - - def postprocess(self, feed={}, fetch=[], fetch_map=None): - rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) - res_lst = [] - for res in rec_res: - res_lst.append(res[0]) - res = {"res": res_lst} - return res - - -ocr_service = OCRService(name="ocr") -ocr_service.load_model_config("ocr_rec_model") -ocr_service.init_det_debugger(det_model_config="ocr_det_model") -if sys.argv[1] == 'gpu': - ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) - ocr_service.run_debugger_service(gpu=True) -elif sys.argv[1] == 'cpu': - ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") - ocr_service.run_debugger_service() -ocr_service.run_web_service() diff --git 
a/deploy/pdserving/ocr_web_client.py b/deploy/pdserving/ocr_web_client.py deleted file mode 100644 index e2a92eb8ee4aa62059be184dd7e67237ed460f13..0000000000000000000000000000000000000000 --- a/deploy/pdserving/ocr_web_client.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -*- coding: utf-8 -*- - -import requests -import json -import cv2 -import base64 -import os, sys -import time - -def cv2_to_base64(image): - #data = cv2.imencode('.jpg', image)[1] - return base64.b64encode(image).decode( - 'utf8') #data.tostring()).decode('utf8') - -headers = {"Content-type": "application/json"} -url = "http://127.0.0.1:9292/ocr/prediction" -test_img_dir = "../../doc/imgs/" -for img_file in os.listdir(test_img_dir): - with open(os.path.join(test_img_dir, img_file), 'rb') as file: - image_data1 = file.read() - image = cv2_to_base64(image_data1) - data = {"feed": [{"image": image}], "fetch": ["res"]} - r = requests.post(url=url, headers=headers, data=json.dumps(data)) - print(r.json()) diff --git a/deploy/pdserving/ocr_web_server.py b/deploy/pdserving/ocr_web_server.py deleted file mode 100644 index 6c0de44661958a6425f57039261969551ff552c5..0000000000000000000000000000000000000000 --- a/deploy/pdserving/ocr_web_server.py +++ /dev/null @@ -1,105 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from paddle_serving_client import Client -from paddle_serving_app.reader import OCRReader -import cv2 -import sys -import numpy as np -import os -from paddle_serving_client import Client -from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor -from paddle_serving_app.reader import Div, Normalize, Transpose -from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes -if sys.argv[1] == 'gpu': - from paddle_serving_server_gpu.web_service import WebService -elif sys.argv[1] == 'cpu': - from paddle_serving_server.web_service import WebService -import time -import re -import base64 - - -class OCRService(WebService): - def init_det_client(self, det_port, det_client_config): - self.det_preprocess = Sequential([ - ResizeByFactor(32, 960), Div(255), - Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose( - (2, 0, 1)) - ]) - self.det_client = Client() - self.det_client.load_client_config(det_client_config) - self.det_client.connect(["127.0.0.1:{}".format(det_port)]) - self.ocr_reader = OCRReader() - - def preprocess(self, feed=[], fetch=[]): - data = base64.b64decode(feed[0]["image"].encode('utf8')) - data = np.fromstring(data, np.uint8) - im = cv2.imdecode(data, cv2.IMREAD_COLOR) - ori_h, ori_w, _ = im.shape - det_img = self.det_preprocess(im) - det_out = self.det_client.predict( - feed={"image": det_img}, fetch=["concat_1.tmp_0"]) - _, new_h, new_w = det_img.shape - filter_func = FilterBoxes(10, 10) - post_func = DBPostProcess({ - "thresh": 0.3, - "box_thresh": 0.5, - "max_candidates": 1000, - "unclip_ratio": 1.5, - "min_size": 3 - }) - sorted_boxes = SortedBoxes() - ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w] - dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list]) - dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w]) - dt_boxes = sorted_boxes(dt_boxes) - get_rotate_crop_image = GetRotateCropImage() - feed_list = [] - img_list = [] - max_wh_ratio = 0 - for i, dtbox in enumerate(dt_boxes): - boximg = get_rotate_crop_image(im, dt_boxes[i]) - img_list.append(boximg) - h, w = boximg.shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - for img in img_list: - norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) - feed = {"image": norm_img} - feed_list.append(feed) - fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] - return feed_list, fetch - - def postprocess(self, feed={}, fetch=[], fetch_map=None): - rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) - res_lst = [] - for res in rec_res: - res_lst.append(res[0]) - res = {"res": res_lst} - return res - - -ocr_service = OCRService(name="ocr") -ocr_service.load_model_config("ocr_rec_model") -if sys.argv[1] == 'gpu': - ocr_service.set_gpus("0") - ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) -elif sys.argv[1] == 'cpu': - ocr_service.prepare_server(workdir="workdir", port=9292) -ocr_service.init_det_client( - det_port=9293, - det_client_config="ocr_det_client/serving_client_conf.prototxt") -ocr_service.run_rpc_service() -ocr_service.run_web_service() diff --git a/deploy/pdserving/readme.md b/deploy/pdserving/readme.md deleted file mode 100644 index f9ad80b896be0be29e3a7bb17e4aa119af81d5c4..0000000000000000000000000000000000000000 --- a/deploy/pdserving/readme.md +++ /dev/null @@ -1,132 +0,0 @@ -# Paddle Serving 服务部署(Beta) - -本教程将介绍基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)部署PaddleOCR在线预测服务的详细步骤。 - -## 快速启动服务 - -### 1. 
准备环境 -我们先安装Paddle Serving相关组件 -我们推荐用户使用GPU来做Paddle Serving的OCR服务部署 - -**CUDA版本:9.0** - -**CUDNN版本:7.0** - -**操作系统版本:CentOS 6以上** - -**Python3操作指南:** -``` -#以下提供beta版本的paddle serving whl包,欢迎试用,正式版会在8月中正式上线 -#GPU用户下载server包使用这个链接 -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server_gpu-0.3.2-py3-none-any.whl -python -m pip install paddle_serving_server_gpu-0.3.2-py3-none-any.whl -#CPU版本使用这个链接 -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server-0.3.2-py3-none-any.whl -python -m pip install paddle_serving_server-0.3.2-py3-none-any.whl -#客户端和App包使用以下链接(CPU,GPU通用) -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_client-0.3.2-cp36-none-any.whl -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_app-0.1.2-py3-none-any.whl -python -m pip install paddle_serving_app-0.1.2-py3-none-any.whl paddle_serving_client-0.3.2-cp36-none-any.whl -``` - -**Python2操作指南:** -``` -#以下提供beta版本的paddle serving whl包,欢迎试用,正式版会在8月中正式上线 -#GPU用户下载server包使用这个链接 -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server_gpu-0.3.2-py2-none-any.whl -python -m pip install paddle_serving_server_gpu-0.3.2-py2-none-any.whl -#CPU版本使用这个链接 -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server-0.3.2-py2-none-any.whl -python -m pip install paddle_serving_server-0.3.2-py2-none-any.whl - -#客户端和App包使用以下链接(CPU,GPU通用) -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_app-0.1.2-py2-none-any.whl -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_client-0.3.2-cp27-none-any.whl -python -m pip install paddle_serving_app-0.1.2-py2-none-any.whl paddle_serving_client-0.3.2-cp27-none-any.whl -``` - -### 2. 模型转换 -可以使用`paddle_serving_app`提供的模型,执行下列命令 -``` -python -m paddle_serving_app.package --get_model ocr_rec -tar -xzvf ocr_rec.tar.gz -python -m paddle_serving_app.package --get_model ocr_det -tar -xzvf ocr_det.tar.gz -``` -执行上述命令会下载`db_crnn_mobile`的模型,如果想要下载规模更大的`db_crnn_server`模型,可以在下载预测模型并解压之后。参考[如何从Paddle保存的预测模型转为Paddle Serving格式可部署的模型](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md)。 - -我们以`ch_rec_r34_vd_crnn`模型作为例子,下载链接在: - -``` -wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar -tar xf ch_rec_r34_vd_crnn_infer.tar -``` -因此我们按照Serving模型转换教程,运行下列python文件。 -``` -from paddle_serving_client.io import inference_model_to_serving -inference_model_dir = "ch_rec_r34_vd_crnn" -serving_client_dir = "serving_client_dir" -serving_server_dir = "serving_server_dir" -feed_var_names, fetch_var_names = inference_model_to_serving( - inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params") -``` -最终会在`serving_client_dir`和`serving_server_dir`生成客户端和服务端的模型配置。 - -### 3. 启动服务 -启动服务可以根据实际需求选择启动`标准版`或者`快速版`,两种方式的对比如下表: - -|版本|特点|适用场景| -|-|-|-| -|标准版|稳定性高,分布式部署|适用于吞吐量大,需要跨机房部署的情况| -|快速版|部署方便,预测速度快|适用于对预测速度要求高,迭代速度快的场景| - -#### 方式1. 启动标准版服务 - -``` -# cpu,gpu启动二选一,以下是cpu启动 -python -m paddle_serving_server.serve --model ocr_det_model --port 9293 -python ocr_web_server.py cpu -# gpu启动 -python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0 -python ocr_web_server.py gpu -``` - -#### 方式2. 
启动快速版服务 - -``` -# cpu,gpu启动二选一,以下是cpu启动 -python ocr_local_server.py cpu -# gpu启动 -python ocr_local_server.py gpu -``` - -## 发送预测请求 - -``` -python ocr_web_client.py -``` - -## 返回结果格式说明 - -返回结果是json格式 -``` -{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}} -``` -我们也可以打印结果json串中`res`字段的每一句话 -``` -土地整治与土壤修复研究中心 -华南农业大学1素图 -``` - -## 自定义修改服务逻辑 - -在`ocr_web_server.py`或是`ocr_local_server.py`当中的`preprocess`函数里面做了检测服务和识别服务的前处理,`postprocess`函数里面做了识别的后处理服务,可以在相应的函数中做修改。调用了`paddle_serving_app`库提供的常见CV模型的前处理/后处理库。 - -如果想要单独启动Paddle Serving的检测服务和识别服务,参见下列表格, 执行对应的脚本即可,并且在命令行参数注明用的CPU或是GPU来提供服务。 - -| 模型 | 标准版 | 快速版 | -| ---- | ----------------- | ------------------- | -| 检测 | det_web_server.py | det_local_server.py | -| 识别 | rec_web_server.py | rec_local_server.py | - -更多信息参见[Paddle Serving](https://github.com/PaddlePaddle/Serving) diff --git a/deploy/pdserving/rec_local_server.py b/deploy/pdserving/rec_local_server.py deleted file mode 100644 index ba021c1cd5054071eb115b3e6e9c64cb572ff871..0000000000000000000000000000000000000000 --- a/deploy/pdserving/rec_local_server.py +++ /dev/null @@ -1,79 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from paddle_serving_client import Client -from paddle_serving_app.reader import OCRReader -import cv2 -import sys -import numpy as np -import os -from paddle_serving_client import Client -from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor -from paddle_serving_app.reader import Div, Normalize, Transpose -from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes -if sys.argv[1] == 'gpu': - from paddle_serving_server_gpu.web_service import WebService -elif sys.argv[1] == 'cpu': - from paddle_serving_server.web_service import WebService -import time -import re -import base64 - - -class OCRService(WebService): - def init_rec(self): - self.ocr_reader = OCRReader() - - def preprocess(self, feed=[], fetch=[]): - img_list = [] - for feed_data in feed: - data = base64.b64decode(feed_data["image"].encode('utf8')) - data = np.fromstring(data, np.uint8) - im = cv2.imdecode(data, cv2.IMREAD_COLOR) - img_list.append(im) - max_wh_ratio = 0 - for i, boximg in enumerate(img_list): - h, w = boximg.shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - _, w, h = self.ocr_reader.resize_norm_img(img_list[0], - max_wh_ratio).shape - imgs = np.zeros((len(img_list), 3, w, h)).astype('float32') - for i, img in enumerate(img_list): - norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) - imgs[i] = norm_img - feed = {"image": imgs.copy()} - fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] - return feed, fetch - - def postprocess(self, feed={}, fetch=[], fetch_map=None): - rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) - res_lst = [] - for res in rec_res: - res_lst.append(res[0]) - res = {"res": res_lst} - return res - - -ocr_service = OCRService(name="ocr") -ocr_service.load_model_config("ocr_rec_model") -ocr_service.init_rec() -if sys.argv[1] == 'gpu': - ocr_service.set_gpus("0") - ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) - ocr_service.run_debugger_service(gpu=True) -elif sys.argv[1] == 'cpu': - ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") - ocr_service.run_debugger_service() -ocr_service.run_web_service() diff --git a/deploy/pdserving/rec_web_server.py b/deploy/pdserving/rec_web_server.py deleted file mode 100644 index 0f4e9f6d264ed602f387bfaf0303cd59af7823fa..0000000000000000000000000000000000000000 --- a/deploy/pdserving/rec_web_server.py +++ /dev/null @@ -1,77 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from paddle_serving_client import Client -from paddle_serving_app.reader import OCRReader -import cv2 -import sys -import numpy as np -import os -from paddle_serving_client import Client -from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor -from paddle_serving_app.reader import Div, Normalize, Transpose -from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes -if sys.argv[1] == 'gpu': - from paddle_serving_server_gpu.web_service import WebService -elif sys.argv[1] == 'cpu': - from paddle_serving_server.web_service import WebService -import time -import re -import base64 - - -class OCRService(WebService): - def init_rec(self): - self.ocr_reader = OCRReader() - - def preprocess(self, feed=[], fetch=[]): - # TODO: to handle batch rec images - img_list = [] - for feed_data in feed: - data = base64.b64decode(feed_data["image"].encode('utf8')) - data = np.fromstring(data, np.uint8) - im = cv2.imdecode(data, cv2.IMREAD_COLOR) - img_list.append(im) - feed_list = [] - max_wh_ratio = 0 - for i, boximg in enumerate(img_list): - h, w = boximg.shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - for img in img_list: - norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) - feed = {"image": norm_img} - feed_list.append(feed) - fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] - return feed_list, fetch - - def postprocess(self, feed={}, fetch=[], fetch_map=None): - rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True) - res_lst = [] - for res in rec_res: - res_lst.append(res[0]) - res = {"res": res_lst} - return res - - -ocr_service = OCRService(name="ocr") -ocr_service.load_model_config("ocr_rec_model") -ocr_service.init_rec() -if sys.argv[1] == 'gpu': - ocr_service.set_gpus("0") - ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0) -elif sys.argv[1] == 'cpu': - ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu") -ocr_service.run_rpc_service() -ocr_service.run_web_service() diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md old mode 100644 new mode 100755 index 440b392227182f50396f0e66ca8250bc6bfc1c0d..a23bfcb112d54719298709d5e253f609ec9dea74 --- a/doc/doc_ch/algorithm_overview.md +++ b/doc/doc_ch/algorithm_overview.md @@ -1,6 +1,6 @@ ## 算法介绍 -本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v1.1 系列模型下载](./models_list.md)。 +本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v2.0 系列模型下载](./models_list.md)。 - [1.文本检测算法](#文本检测算法) - [2.文本识别算法](#文本识别算法) @@ -9,25 +9,25 @@ ### 1.文本检测算法 PaddleOCR开源的文本检测算法列表: -- [x] DB([paper](https://arxiv.org/abs/1911.08947))(ppocr推荐) +- [x] DB([paper]( https://arxiv.org/abs/1911.08947) )(ppocr推荐) - [x] EAST([paper](https://arxiv.org/abs/1704.03155)) - [x] SAST([paper](https://arxiv.org/abs/1908.05498)) 在ICDAR2015文本检测公开数据集上,算法效果如下: |模型|骨干网络|precision|recall|Hmean|下载链接| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](link)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](link)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](link)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](link)| -|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](link))| +| --- | --- | --- | --- | --- | --- | +|EAST|ResNet50_vd|88.76%|81.36%|84.90%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| 
+|EAST|MobileNetV3|78.24%|79.15%|78.69%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)|
+|DB|ResNet50_vd|86.41%|78.72%|82.38%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
+|DB|MobileNetV3|77.29%|73.08%|75.12%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
+|SAST|ResNet50_vd|91.83%|81.80%|86.52%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|

在Total-text文本检测公开数据集上,算法效果如下:

|模型|骨干网络|precision|recall|Hmean|下载链接|
-|-|-|-|-|-|
-|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](link)|
+| --- | --- | --- | --- | --- | --- |
+|SAST|ResNet50_vd|89.05%|76.80%|82.47%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)|

**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)

@@ -38,9 +38,9 @@ PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训

### 2.文本识别算法

PaddleOCR基于动态图开源的文本识别算法列表:
-- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))(ppocr推荐)
+- [x] CRNN([paper](https://arxiv.org/abs/1507.05717) )(ppocr推荐)
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
-- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
+- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) coming soon
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) coming soon

@@ -48,12 +48,9 @@ PaddleOCR基于动态图开源的文本识别算法列表:

|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接|
|-|-|-|-|-|
-|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](link)|
-|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](link)|
-|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](link)|
-|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](link)|
-|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](link)|
-|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](link)|
-
+|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
+|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
+|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
+|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|

PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。

diff --git a/doc/doc_ch/angle_class.md b/doc/doc_ch/angle_class.md
index d6a36b86b476f15b7b34f67e888ceb781b2ed7a0..3f2027b9ddff331b3259ed62c7c7b43e686efcce 100644
--- a/doc/doc_ch/angle_class.md
+++ b/doc/doc_ch/angle_class.md
@@ -62,9 +62,9 @@ PaddleOCR提供了训练脚本、评估脚本和预测脚本。

*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*

```
-# GPU训练 支持单卡,多卡训练,通过selected_gpus指定卡号
+# GPU训练 支持单卡,多卡训练,通过 '--gpus' 指定卡号,如果使用的paddle版本小于2.0rc1,请使用'--selected_gpus'参数选择要使用的GPU
# 启动训练,下面的命令已经写入train.sh文件中,只需修改文件里的配置文件路径即可
-python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
+python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
```

- 数据增强

@@ -74,7 +74,7 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入
默认的扰动方式有:颜色空间转换(cvtColor)、模糊(blur)、抖动(jitter)、噪声(Gasuss noise)、随机切割(random crop)、透视(perspective)、颜色反转(reverse),随机数据增强(RandAugment)。

训练过程中除随机数据增强外每种扰动方式以50%的概率被选择,具体代码实现请参考:
-[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
+[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)、[randaugment.py](../../ppocr/data/imaug/randaugment.py)

*由于OpenCV的兼容性问题,扰动操作暂时只支持linux*

diff --git a/doc/doc_ch/detection.md b/doc/doc_ch/detection.md
index a31907015de3c6a119764917893ade29a0ff5493..08b94a9c838cb265a1e6145e29db676bf52c7de7 100644
--- a/doc/doc_ch/detection.md
+++ b/doc/doc_ch/detection.md
@@ -76,8 +76,8 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
# 单机单卡训练 mv3_db 模型
python3 tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
-# 单机多卡训练,通过 --select_gpus 参数设置使用的GPU ID;
-python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
+# 单机多卡训练,通过 --gpus 参数设置使用的GPU ID;如果使用的paddle版本小于2.0rc1,请使用'--selected_gpus'参数选择要使用的GPU
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
```

@@ -107,17 +107,13 @@ PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall

运行如下代码,根据配置文件`det_db_mv3.yml`中`save_res_path`指定的测试集检测结果文件,计算评估指标。

-评估时设置后处理参数`box_thresh=0.6`,`unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化
-```shell
-python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
-```
+评估时设置后处理参数`box_thresh=0.5`,`unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化

训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。
-
-比如:
```shell
-python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
+python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.5 PostProcess.unclip_ratio=1.5
```
+
* 注:`box_thresh`、`unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置

## 测试检测效果

diff --git a/doc/doc_ch/inference.md b/doc/doc_ch/inference.md
old mode 100644
new mode 100755
index 663533c492ab5dc0bd22cc79bd95c9d1d194d854..aea7ff1de242dec75cae26a2bf3d6838d7559882
--- a/doc/doc_ch/inference.md
+++ b/doc/doc_ch/inference.md
@@ -22,9 +22,8 @@ inference 模型(`paddle.jit.save`保存的模型)
- [三、文本识别模型推理](#文本识别模型推理)
    - [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
    - [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
-    - [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理)
-    - [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
-    - [5. 多语言模型的推理](#多语言模型的推理)
+    - [3. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
+    - [4. 多语言模型的推理](#多语言模型的推理)
- [四、方向分类模型推理](#方向识别模型推理)
    - [1. 
方向分类模型推理](#方向分类模型推理)

@@ -129,24 +128,32 @@ python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.pretrained_mo

超轻量中文检测模型推理,可以执行如下命令:

```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
+# 下载超轻量中文检测模型:
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
+tar xf ch_ppocr_mobile_v2.0_det_infer.tar
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
```

可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

-![](../imgs_results/det_res_2.jpg)
+![](../imgs_results/det_res_22.jpg)

-通过参数`limit_type`和`det_limit_side_len`来对图片的尺寸进行限制限,`limit_type=max`为限制长边长度<`det_limit_side_len`,`limit_type=min`为限制短边长度>`det_limit_side_len`,
-图片不满足限制条件时(`limit_type=max`时长边长度>`det_limit_side_len`或`limit_type=min`时短边长度<`det_limit_side_len`),将对图片进行等比例缩放。
-该参数默认设置为`limit_type='max',det_max_side_len=960`。 如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以执行如下命令:
+通过参数`limit_type`和`det_limit_side_len`来对图片的尺寸进行限制,
+`limit_type`可选参数为[`max`, `min`],
+`det_limit_side_len` 为正整数,一般设置为32 的倍数,比如960。
+参数默认设置为`limit_type='max', det_limit_side_len=960`。表示网络输入图像的最长边不能超过960,
+如果超过这个值,会对图像做等比例的resize操作,确保最长边为`det_limit_side_len`。
+设置为`limit_type='min', det_limit_side_len=960` 则表示限制图像的最短边为960。
+
+如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以设置det_limit_side_len 为想要的值,比如1216:

```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1200
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216
```

如果想使用CPU进行预测,执行命令如下
```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
```

@@ -173,7 +180,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_

### 3. EAST文本检测模型推理

-首先将EAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( [模型下载地址 (coming soon)](link) ),可以使用如下命令进行转换:
+首先将EAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( [模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar) ),可以使用如下命令进行转换:

```
python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_east
@@ -186,7 +193,7 @@ python3 tools/infer/predict_det.py --det_algorithm="EAST" --image_dir="./doc/img
```

可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

-(coming soon)
+![](../imgs_results/det_res_img_10_east.jpg)

**注意**:本代码库中,EAST后处理Locality-Aware NMS有python和c++两种版本,c++版速度明显快于python版。由于c++版本nms编译版本问题,只有python3.5环境下会调用c++版nms,其他情况将调用python版nms。

@@ -194,7 +201,7 @@
### 4. SAST文本检测模型推理

#### (1). 
四边形文本检测模型(ICDAR2015) -首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例([模型下载地址(coming soon)](link)),可以使用如下命令进行转换: +首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)),可以使用如下命令进行转换: ``` python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_ic15 @@ -205,10 +212,10 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img ``` 可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: -(coming soon) +![](../imgs_results/det_res_img_10_sast.jpg) #### (2). 弯曲文本检测模型(Total-Text) -首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在Total-Text英文数据集训练的模型为例([模型下载地址(coming soon)](link)),可以使用如下命令进行转换: +首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在Total-Text英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)),可以使用如下命令进行转换: ``` python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_tt @@ -221,7 +228,7 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img ``` 可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: -(coming soon) +![](../imgs_results/det_res_img623_sast.jpg) **注意**:本代码库中,SAST后处理Locality-Aware NMS有python和c++两种版本,c++版速度明显快于python版。由于c++版本nms编译版本问题,只有python3.5环境下会调用c++版nms,其他情况将调用python版nms。 @@ -268,16 +275,6 @@ CRNN 文本识别模型推理,可以执行如下命令: python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` - -### 3. 基于Attention损失的识别模型推理 - -基于Attention损失的识别模型与ctc不同,需要额外设置识别算法参数 --rec_algorithm="RARE" -RARE 文本识别模型推理,可以执行如下命令: -``` -python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE" - -``` - ![](../imgs_words_en/word_336.png) 执行命令后,上面图像的识别结果如下: @@ -297,7 +294,7 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) ``` -### 4. 自定义文本识别字典的推理 +### 3. 自定义文本识别字典的推理 如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch` ``` @@ -305,7 +302,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png ``` -### 5. 多语言模型的推理 +### 4. 
多语言模型的推理 如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果, 需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别: diff --git a/doc/doc_ch/installation.md b/doc/doc_ch/installation.md index eb9a5c9f1323feadc3bea3017586d1004ee75cc2..0dddfec0a6e17e26a73d284ac98c9c95e449c378 100644 --- a/doc/doc_ch/installation.md +++ b/doc/doc_ch/installation.md @@ -2,7 +2,7 @@ 经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glic 2.23 PaddleOCR 工作环境 -- PaddlePaddle 2.0rc0+ ,推荐使用 PaddlePaddle 2.0rc0 +- PaddlePaddle 1.8+ ,推荐使用 PaddlePaddle 2.0rc1 - python3.7 - glibc 2.23 - cuDNN 7.6+ (GPU) @@ -35,11 +35,11 @@ sudo docker container exec -it ppocr /bin/bash pip3 install --upgrade pip 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 -python3 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple +python3 -m pip install paddlepaddle-gpu==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple 如果您的机器是CPU,请运行以下命令安装 -python3 -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple +python3 -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple 更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 ``` diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index b281e1e736f6c3747c2ae07188dc6f87abfc67a8..4995cf8522c1741bca6d26aa582eb2484442f6d3 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -1,4 +1,5 @@ ## OCR模型列表(V2.0,2020年12月12日更新) +**说明** :2.0版模型和[1.1版模型](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md)的主要区别在于动态图训练vs.静态图训练,模型性能上无明显差距。 - [一、文本检测模型](#文本检测模型) - [二、文本识别模型](#文本识别模型) @@ -21,7 +22,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |[推理模型 (coming soon)](link) / [slim模型 (coming soon)](link)| +|ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |推理模型 (coming soon) / slim模型 (coming soon)| |ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| @@ -34,7 +35,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |[推理模型 (coming soon)](link) / [slim模型 (coming soon)](link) | +|ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |推理模型 (coming soon) / slim模型 (coming soon) | 
|ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|3.71M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | @@ -45,7 +46,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| |[推理模型 (coming soon )](link) / [slim模型 (coming soon)](link) | +|en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| | 推理模型 (coming soon) / slim模型 (coming soon) | |en_number_mobile_v2.0_rec|原始超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.56M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) | @@ -64,11 +65,5 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |模型名称|模型简介|配置文件|推理模型大小|下载地址| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |[推理模型 (coming soon)](link) / [训练模型](link) / [slim模型](link) | +|ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |推理模型 (coming soon) / 训练模型 / slim模型 | |ch_ppocr_mobile_v2.0_cls|原始模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | - - -## OCR模型列表(V1.1,2020年9月22日更新) - -[1.1系列模型地址](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/models_list.md) - diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md index 87d60c5504d28c3cae660ebfd3765bb6893f163e..dc06365c6ef66fe5539887a19042dfdbfb45efa3 100644 --- a/doc/doc_ch/recognition.md +++ b/doc/doc_ch/recognition.md @@ -7,7 +7,7 @@ - [字典](#字典) - [支持空格](#支持空格) -- [二、启动训练](#文本检测模型推理) +- [二、启动训练](#启动训练) - [1. 数据增强](#数据增强) - [2. 训练](#训练) - [3. 
小语种](#小语种) @@ -167,7 +167,7 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc ``` # GPU训练 支持单卡,多卡训练,通过--gpus参数指定卡号 -# 训练icdar15英文数据 并将训练日志保存为 tain_rec.log +# 训练icdar15英文数据 训练日志会自动保存为 "{save_model_dir}" 下的train.log python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml ``` @@ -200,11 +200,8 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | -| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc | -| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention | | rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc | | rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc | -| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | 训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件: @@ -356,8 +353,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkp ``` infer_img: doc/imgs_words/en/word_1.png - index: [19 24 18 23 29] - word : joint + result: ('joint', 0.9998967) ``` 预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml` 完成了中文模型的训练, @@ -376,6 +372,5 @@ python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v ``` infer_img: doc/imgs_words/ch/word_1.jpg - index: [2092 177 312 2503] - word : 韩国小馆 + result: ('韩国小馆', 0.997218) ``` diff --git a/doc/doc_ch/update.md b/doc/doc_ch/update.md index 81a5e68b99f40809d5de4e13c349c974c1dfb28c..3fe8a0c9ace4be31882b22fe75b88f18848e1ad9 100644 --- a/doc/doc_ch/update.md +++ b/doc/doc_ch/update.md @@ -1,7 +1,10 @@ # 更新 +- 2020.12.15 更新数据合成工具[Style-Text](../../StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。 +- 2020.12.07 [FAQ](../../doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。 +- 2020.11.25 更新半自动标注工具[PPOCRLabel](../../PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。 - 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941 -- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](../../README_ch.md#PP-OCR)),适合在移动端部署使用。[模型下载](../../README_ch.md#模型下载) -- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](../../README_ch.md#模型下载) +- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见PP-OCR Pipline),适合在移动端部署使用。 +- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。 - 2020.9.17 更新[英文识别模型](./models_list.md#english-recognition-model)和[多语种识别模型](./models_list.md#english-recognition-model),已支持`德语、法语、日语、韩语`,更多语种识别模型将持续更新。 - 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./FAQ.md) - 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md) diff --git a/doc/doc_ch/visualization.md b/doc/doc_ch/visualization.md index dc7b0b9cdcf8ac1abea8163b40ef99bfbf9d7d94..f2ea2b09d9431ebd710f2d7ccac0bd73c50b558e 100644 --- a/doc/doc_ch/visualization.md +++ b/doc/doc_ch/visualization.md @@ -1,49 +1,32 @@ # 效果展示 - -## 通用ppocr_server_1.1效果展示 + +## 通用ppocr_server_2.0 效果展示
-[image tags removed]
+[image tags added]
</div>

## 英文识别模型效果展示

<div align="center">
-[image tag removed]
+[image tag added]
</div>

## 多语言识别模型效果展示

<div align="center">
-[image tags removed]
+[image tags added]
</div>

-## 超轻量ppocr_mobile_1.0效果展示
-
-<div align="center">
-[image tags removed]
-</div>

-## 通用ppocr_server_1.0效果展示
-
-<div align="center">
-[image tags removed]
-</div>
diff --git a/doc/doc_ch/whl.md b/doc/doc_ch/whl.md index 587b443baf2ed92c1913b29f2dad45b812b44928..6b218e31ecdc3d07132a2a88ad22528ed6ef23b4 100644 --- a/doc/doc_ch/whl.md +++ b/doc/doc_ch/whl.md @@ -6,7 +6,7 @@ pip安装 ```bash -pip install paddleocr +pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本 ``` 本地构建并安装 @@ -166,7 +166,7 @@ paddleocr -h * 检测+分类+识别全流程 ```bash -paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --cls true +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true ``` 结果是一个list,每个item包含了文本框,文字和识别置信度 ```bash @@ -190,7 +190,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg * 分类+识别 ```bash -paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false +paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --det false ``` 结果是一个list,每个item只包含识别结果和识别置信度 @@ -222,7 +222,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false * 单独执行分类 ```bash -paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false --rec false +paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --det false --rec false ``` 结果是一个list,每个item只包含分类结果和分类置信度 @@ -258,7 +258,7 @@ im_show.save('result.jpg') ### 通过命令行使用 ```bash -paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true +paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true ``` ### 使用网络图片或者numpy数组作为输入 diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md old mode 100644 new mode 100755 index 532ebd90cf149813acc9ad929840e1611766f652..7f1afd027b9b56ad9a1f7a10f3f6b1fc34587252 --- a/doc/doc_en/algorithm_overview_en.md +++ b/doc/doc_en/algorithm_overview_en.md @@ -1,7 +1,7 @@ ## Algorithm introduction -This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v1.1 models list](./models_list_en.md). +This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md). - [1. 
Text Detection Algorithm](#TEXTDETECTIONALGORITHM) @@ -13,27 +13,27 @@ This tutorial lists the text detection algorithms and text recognition algorithm PaddleOCR open source text detection algorithms list: - [x] EAST([paper](https://arxiv.org/abs/1704.03155)) - [x] DB([paper](https://arxiv.org/abs/1911.08947)) -- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research) +- [x] SAST([paper](https://arxiv.org/abs/1908.05498) )(Baidu Self-Research) On the ICDAR2015 dataset, the text detection result is as follows: |Model|Backbone|precision|recall|Hmean|Download link| -|-|-|-|-|-|-| -|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](link)| -|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](link)| -|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](link)| -|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](link)| -|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](link)| +| --- | --- | --- | --- | --- | --- | +|EAST|ResNet50_vd|88.76%|81.36%|84.90%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| +|EAST|MobileNetV3|78.24%|79.15%|78.69%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)| +|DB|ResNet50_vd|86.41%|78.72%|82.38%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)| +|DB|MobileNetV3|77.29%|73.08%|75.12%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)| +|SAST|ResNet50_vd|91.83%|81.80%|86.52%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)| On Total-Text dataset, the text detection result is as follows: |Model|Backbone|precision|recall|Hmean|Download link| -|-|-|-|-|-|-| -|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](link)| +| --- | --- | --- | --- | --- | --- | +|SAST|ResNet50_vd|89.05%|76.80%|82.47%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)| **Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi). -For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) +For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md) ### 2. 
Text Recognition Algorithm

PaddleOCR open-source text recognition algorithms list:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
-- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
+- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) coming soon
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
-- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research) coming soon
+- [ ] SRN([paper](https://arxiv.org/abs/2003.12294) )(Baidu Self-Research) coming soon

Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:

|Model|Backbone|Avg Accuracy|Module combination|Download link|
|-|-|-|-|-|
-|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
-|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
-|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
-|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
-|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
-|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
+|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
+|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
+|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
+|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|

-
-Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
+For the training guide and use of PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./recognition_en.md)

diff --git a/doc/doc_en/angle_class_en.md b/doc/doc_en/angle_class_en.md
index defdff3ccbbad9d0201305529073bdc80abd5d29..4c479e7b22e7caea6bf5f864d32b57197b925dd9 100644
--- a/doc/doc_en/angle_class_en.md
+++ b/doc/doc_en/angle_class_en.md
@@ -65,9 +65,9 @@ Start training:

```
# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
-# GPU training Support single card and multi-card training, specify the card number through selected_gpus
+# GPU training supports single-card and multi-card training; specify the card number through --gpus. 
If your paddle version is less than 2.0rc1, please use '--selected_gpus' # Start training, the following command has been written into the train.sh file, just modify the configuration file path in the file -python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml +python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml ``` - Data Augmentation @@ -77,7 +77,7 @@ PaddleOCR provides a variety of data augmentation methods. If you want to add di The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse, RandAugment. Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: -[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) +[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) [randaugment.py](../../ppocr/data/imaug/randaugment.py) diff --git a/doc/doc_en/benchmark_en.md b/doc/doc_en/benchmark_en.md old mode 100644 new mode 100755 diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md index 83e949344f1821aae2dcb57911aff7173246076f..7638315ae9991c909d7079c904d646a656173dca 100644 --- a/doc/doc_en/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -76,8 +76,10 @@ You can also use `-o` to change the training parameters without modifying the ym python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 # multi-GPU training -# Set the GPU ID used by the '--select_gpus' parameter; -python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 +# Set the GPU ID used by the '--gpus' parameter; If your paddle version is less than 2.0rc1, please use '--selected_gpus' +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 + + ``` #### load trained model and continue training @@ -99,15 +101,11 @@ Run the following code to calculate the evaluation indicators. The result will b When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result. +The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file. ```shell python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 ``` -The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file. -Such as: -```shell -python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 -``` * Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. 
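To make the `box_thresh`/`unclip_ratio` evaluation parameters above more concrete, here is a hedged sketch of the unclip step they control in DB post-processing. The offset distance follows the DB paper's formula (offset = area × unclip_ratio / perimeter); the authoritative implementation is ppocr's `DBPostProcess`, which this sketch simplifies.

```python
# Hedged sketch of the "unclip" step that unclip_ratio controls in DB
# post-processing; the offset distance follows the DB paper's formula.
# The real implementation is ppocr's DBPostProcess, which differs in detail.
import pyclipper
from shapely.geometry import Polygon

def unclip(box, unclip_ratio=1.5):
    """Expand a detected (shrunk) text polygon back outwards."""
    poly = Polygon(box)
    # Larger unclip_ratio -> larger offset -> looser, wider boxes.
    distance = poly.area * unclip_ratio / poly.length
    offset = pyclipper.PyclipperOffset()
    offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    return offset.Execute(distance)

# Example: a 100x20 rectangle grows on all sides.
print(unclip([(0, 0), (100, 0), (100, 20), (0, 20)]))
# box_thresh, by contrast, simply filters out candidate boxes whose average
# probability inside the region falls below the threshold.
```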
diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md
old mode 100644
new mode 100755
index 411a733dd062cf347d7a2e5d5d067739bda36819..db86b109d1a13d00aab833aa31d0279622e7c7f8
--- a/doc/doc_en/inference_en.md
+++ b/doc/doc_en/inference_en.md
@@ -25,9 +25,8 @@ Next, we first introduce how to convert a trained model into an inference model,

- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
    - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
    - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
-    - [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
-    - [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
-    - [5. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
+    - [3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
+    - [4. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)

- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
    - [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)

@@ -135,24 +134,33 @@ Because EAST and DB algorithms are very different, when inference, it is necessa

For lightweight Chinese detection model inference, you can execute the following commands:

```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
+# download DB text detection inference model
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
+tar xf ch_ppocr_mobile_v2.0_det_infer.tar
+# predict
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/"
```

The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows:

-![](../imgs_results/det_res_2.jpg)
+![](../imgs_results/det_res_22.jpg)

-The size of the image is limited by the parameters `limit_type` and `det_limit_side_len`, `limit_type=max` is to limit the length of the long side <`det_limit_side_len`, and `limit_type=min` is to limit the length of the short side>`det_limit_side_len`,
-When the picture does not meet the restriction conditions (for `limit_type=max`and long side >`det_limit_side_len` or for `min` and short side <`det_limit_side_len`), the image will be scaled proportionally.
-This parameter is set to `limit_type='max', det_max_side_len=960` by default. If the resolution of the input picture is relatively large, and you want to use a larger resolution prediction, you can execute the following command:
+You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image.
+The optional parameters of `limit_type` are [`max`, `min`], and
+`det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960.
+The default setting of the parameters is `limit_type='max', det_limit_side_len=960`. This indicates that the longest side of the network input image cannot exceed 960.
+If this value is exceeded, the image will be resized while keeping the aspect ratio to ensure that the longest side is `det_limit_side_len`.
+Setting `limit_type='min', det_limit_side_len=960` instead limits the shortest side of the image to 960. 
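As a rough illustration of the resize rule just described, the sketch below mirrors the `limit_type`/`det_limit_side_len` behavior. It is an assumption-laden simplification of ppocr's detection preprocessing (which also rounds both sides to multiples of 32 for the network stride), not the operator itself.

```python
# Sketch of the limit_type / det_limit_side_len resize rule described above;
# assumes the real operator also rounds H and W to multiples of 32,
# which the detection network's stride requires.
def det_resize_shape(h, w, limit_type="max", limit_side_len=960):
    if limit_type == "max":
        # Only shrink: the longest side must not exceed limit_side_len.
        ratio = float(limit_side_len) / max(h, w) if max(h, w) > limit_side_len else 1.0
    else:
        # "min": only enlarge: the shortest side must reach limit_side_len.
        ratio = float(limit_side_len) / min(h, w) if min(h, w) < limit_side_len else 1.0
    resize_h = max(int(round(h * ratio / 32) * 32), 32)
    resize_w = max(int(round(w * ratio / 32) * 32), 32)
    return resize_h, resize_w

print(det_resize_shape(1080, 1920))           # long side capped near 960 -> (544, 960)
print(det_resize_shape(480, 640, "min"))      # short side pushed up to 960 -> (960, 1280)
```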
+ +If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can set det_limit_side_len to the desired value, such as 1216: ``` -python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1200 +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216 ``` If you want to use the CPU for prediction, execute the command as follows ``` -python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False ``` @@ -179,7 +187,7 @@ The visualized text detection results are saved to the `./inference_results` fol ### 3. EAST TEXT DETECTION MODEL INFERENCE -First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link (coming soon)](link)), you can use the following command to convert: +First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)), you can use the following command to convert: ``` python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_east @@ -192,7 +200,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_ The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: -(coming soon) +![](../imgs_results/det_res_img_10_east.jpg) **Note**: EAST post-processing locality aware NMS has two versions: Python and C++. The speed of C++ version is obviously faster than that of Python version. Due to the compilation version problem of NMS of C++ version, C++ version NMS will be called only in Python 3.5 environment, and python version NMS will be called in other cases. @@ -200,7 +208,7 @@ The visualized text detection results are saved to the `./inference_results` fol ### 4. SAST TEXT DETECTION MODEL INFERENCE #### (1). Quadrangle text detection model (ICDAR2015) -First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link (coming soon)](link)), you can use the following command to convert: +First, convert the model saved in the SAST text detection training process into an inference model. 
Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)), you can use the following command to convert: ``` python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_ic15 @@ -214,10 +222,10 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: -(coming soon) +![](../imgs_results/det_res_img_10_sast.jpg) #### (2). Curved text detection model (Total-Text) -First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the Total-Text English dataset as an example ([model download link (coming soon)](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)), you can use the following command to convert: +First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)), you can use the following command to convert: ``` python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_tt @@ -231,7 +239,7 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: -(coming soon) +![](../imgs_results/det_res_img623_sast.jpg) **Note**: SAST post-processing locality aware NMS has two versions: Python and C++. The speed of C++ version is obviously faster than that of Python version. Due to the compilation version problem of NMS of C++ version, C++ version NMS will be called only in Python 3.5 environment, and python version NMS will be called in other cases. @@ -275,15 +283,6 @@ For CRNN text recognition model inference, execute the following commands: python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` - -### 3. 
ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE
-
-The recognition model based on Attention loss is different from ctc, and additional recognition algorithm parameters need to be set --rec_algorithm="RARE"
-After executing the command, the recognition result of the above image is as follows:
-```bash
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
-```
-
![](../imgs_words_en/word_336.png)

After executing the command, the recognition result of the above image is as follows:

@@ -303,7 +302,7 @@ dict_character = list(self.character_str)
```

-### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
+### 3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY

If the text dictionary is modified during training, when using the inference model to predict, you need to specify the dictionary path used by `--rec_char_dict_path`, and set `rec_char_type=ch`

```
@@ -311,7 +310,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
```

-### 5. MULTILINGAUL MODEL INFERENCE
+### 4. MULTILINGUAL MODEL INFERENCE

If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results, You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition:

diff --git a/doc/doc_en/installation_en.md b/doc/doc_en/installation_en.md
index b49c3f02851eb540c3b8e78beb0dfa1a4b08ce09..073b67b04d10cc2ae4b20f0ca38b604ab95bc09f 100644
--- a/doc/doc_en/installation_en.md
+++ b/doc/doc_en/installation_en.md
@@ -3,7 +3,7 @@ After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility.

PaddleOCR working environment:
-- PaddlePaddle1.8+, Recommend PaddlePaddle 2.0rc0
+- PaddlePaddle 1.8+, Recommend PaddlePaddle 2.0rc1
- python3.7
- glibc 2.23

@@ -38,10 +38,10 @@ sudo docker container exec -it ppocr /bin/bash

pip3 install --upgrade pip

# If you have cuda9 or cuda10 installed on your machine, please run the following command to install
-python3 -m pip install paddlepaddle-gpu==2.0rc0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle-gpu==2.0rc1 -i https://mirror.baidu.com/pypi/simple

# If you only have cpu on your machine, please run the following command to install
-python3 -m pip install paddlepaddle==2.0rc0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle==2.0rc1 -i https://mirror.baidu.com/pypi/simple
```

For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
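After installing one of the wheels above, a quick sanity check can confirm the interpreter sees the expected version. `paddle.__version__` is standard; `paddle.utils.run_check()` is assumed to be available in the 2.0rc wheels — if it is missing in your build, the version print alone is enough.

```python
# Post-install sanity check for the PaddlePaddle wheel installed above.
# run_check() is assumed to exist in the 2.0rc builds; if it does not,
# importing paddle and printing the version already verifies the install.
import paddle

print(paddle.__version__)   # expect something like 2.0.0-rc1
paddle.utils.run_check()    # also verifies the GPU setup when present
```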
diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md index 63d8c598bbe4e3b37ae47804e595438ee79905c8..4b4b393d3477133a2d493b7271ae257004e62c83 100644 --- a/doc/doc_en/models_list_en.md +++ b/doc/doc_en/models_list_en.md @@ -1,4 +1,5 @@ -## OCR model list(V1.1, updated on 2020.12.12) +## OCR model list(V2.0, updated on 2020.12.12) +**Note** : Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance. - [1. Text Detection Model](#Detection) - [2. Text Recognition Model](#Recognition) @@ -20,7 +21,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |[inference model (coming soon)](link) / [slim model (coming soon)](link)| +|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |inference model (coming soon) / slim model (coming soon)| |ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| @@ -32,7 +33,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |[inference model (coming soon)](link) / [slim model (coming soon)](link) | +|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |inference model (coming soon) / slim model (coming soon) | |ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|3.71M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | 
|ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |

@@ -44,7 +45,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
-|en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| |[inference model (coming soon )](link) / [slim model (coming soon)](link) |
+|en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| |inference model (coming soon ) / slim model (coming soon) |
|en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.56M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |

@@ -62,10 +63,6 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
-|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |[inference model (coming soon)](link) / [trained model](link) / [slim model](link) |
+|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |inference model (coming soon) / trained model / slim model|
|ch_ppocr_mobile_v2.0_cls|Original model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
-
-## OCR model list (V1.1,updated on 2020.9.22)
-
-[1.1 series model address](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/models_list.md)

diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index 1539b288da2518bf5441adea7983135f3c46619f..bc8faa0fc3df936855ead965f1e22107b576bc7a 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -162,7 +162,7 @@ Start training:

```
# GPU training Support single card and multi-card training, specify the card number through --gpus
-# Training icdar15 English data and saving the log as train_rec.log
+# Training icdar15 English data; the training log will be automatically saved as train.log under "{save_model_dir}"
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
```

@@ -193,11 +193,8 @@ If the evaluation set is large, the test will be time-consuming. 
 | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
 | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
 | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
-| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
-| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
 | rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
 | rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
-| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
 
 For training Chinese data, it is recommended to use [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
@@ -350,8 +347,7 @@ Get the prediction result of the input image:
 ```
 infer_img: doc/imgs_words/en/word_1.png
- index: [19 24 18 23 29]
- word : joint
+ result: ('joint', 0.9998967)
 ```
 The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml`, you can use the following command to predict the Chinese model:
@@ -369,6 +365,5 @@ Get the prediction result of the input image:
 ```
 infer_img: doc/imgs_words/ch/word_1.jpg
- index: [2092 177 312 2503]
- word : 韩国小馆
+ result: ('韩国小馆', 0.997218)
 ```
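Note that the printed prediction changed here from separate `index`/`word` lines to a single `(text, score)` tuple. For a downstream script that reads this log, a minimal parsing sketch (assuming exactly the `result:` line format shown in the hunk above) could look like:

```python
import ast

# parse one "result:" line as printed by the updated prediction output above
# (illustrative only; assumes the tuple repr format shown in the hunk)
line = "result: ('joint', 0.9998967)"
text, score = ast.literal_eval(line.split("result:", 1)[1].strip())
print(text, score)  # joint 0.9998967
```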
diff --git a/doc/doc_en/update_en.md b/doc/doc_en/update_en.md
index 71f784812bcac9ff55aa0523831ba9b1a5849403..1e80012e0608f0e28291d0f57b5a5d0beffe2e8c 100644
--- a/doc/doc_en/update_en.md
+++ b/doc/doc_en/update_en.md
@@ -1,8 +1,9 @@
 # RECENT UPDATES
+- 2020.12.15 Add the data synthesis tool [Style-Text](../../StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene images.
+- 2020.11.25 Add a new data annotation tool, [PPOCRLabel](../../PPOCRLabel/README.md), which helps improve labeling efficiency. Moreover, the labeling results can be used directly in training of the PP-OCR system.
 - 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
-- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipline](../../README.md#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](../../README.md#Supported-Chinese-model-list)
-- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](../../README.md#Supported-Chinese-model-list)
-- 2020.9.17 update [English recognition model](./models_list_en.md#english-recognition-model) and [Multilingual recognition model](./models_list_en.md#english-recognition-model), `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
+- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models; the overall model size is 3.5M, suitable for mobile deployment.
+- 2020.9.17 Update the English and multilingual recognition models; `English`, `Chinese`, `German`, `French`, `Japanese` and `Korean` are now supported. Models for more languages will continue to be updated.
 - 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
 - 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
 - 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
diff --git a/doc/doc_en/visualization_en.md b/doc/doc_en/visualization_en.md
index 2b88b1de6042b6f63cefb85134f58d83d4c625f8..f9c455e5b3510a9f262c6bf59b8adfbaef3fa01d 100644
--- a/doc/doc_en/visualization_en.md
+++ b/doc/doc_en/visualization_en.md
@@ -1,49 +1,34 @@
 # Visualization
-## ch_ppocr_server_1.1
-## en_ppocr_mobile_1.1
-## (multilingual)_ppocr_mobile_1.1
-## ppocr_mobile_1.0
-## ppocr_server_1.0
+## ch_ppocr_server_2.0
+## en_ppocr_mobile_2.0
+## (multilingual)_ppocr_mobile_2.0

[The `<img>` lines of this hunk did not survive extraction; the hunk replaces the example result images of the five 1.x sections above with new result images for the three 2.0 sections.]
diff --git a/doc/doc_en/whl_en.md b/doc/doc_en/whl_en.md
index c25999d4d5c57ee272de20dd4b5e47ebf41abd52..1ef14f1427eb4c1a2a504f4a420cd43c8444aeac 100644
--- a/doc/doc_en/whl_en.md
+++ b/doc/doc_en/whl_en.md
@@ -4,7 +4,7 @@
 ### install package
 install by pypi
 ```bash
-pip install paddleocr
+pip install "paddleocr>=2.0.1" # version 2.0.1+ is recommended
 ```
 build own whl package and install
@@ -172,7 +172,7 @@ paddleocr -h
 * detection classification and recognition
 ```bash
-paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true -cls true --lang en
+paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --lang en
 ```
 Output will be a list, each item contains bounding box, text and recognition confidence
@@ -198,7 +198,7 @@ Output will be a list, each item contains text and recognition con
 * classification and recognition
 ```bash
-paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true -cls true --det false --lang en
+paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --lang en
 ```
 Output will be a list, each item contains text and recognition confidence
@@ -221,7 +221,7 @@ Output will be a list, each item only contains bounding box
 * only recognition
 ```bash
-paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --cls false --lang en
+paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --lang en
 ```
 Output will be a list, each item contains text and recognition confidence
@@ -231,7 +231,7 @@ Output will be a list, each item contains text and recognition confidence
 * only classification
 ```bash
-paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true -cls true --det false --rec false
+paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --rec false
 ```
 Output will be a list, each item contains classification result and confidence
@@ -268,7 +268,7 @@ im_show.save('result.jpg')
 ### Use by command line
 ```bash
-paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
+paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
 ```
 ### Use web images or numpy array as input
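For reference, the Python API of the whl package exposes the same switches as the updated CLI calls above; a minimal sketch (the image path is a placeholder, and models are downloaded on first use):

```python
from paddleocr import PaddleOCR

# detection + angle classification + recognition, mirroring the CLI example above;
# note the classifier switch is passed to ocr() rather than as a separate -cls flag
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('PaddleOCR/doc/imgs_en/img_12.jpg', cls=True)
for box, (text, confidence) in result:
    print(box, text, confidence)
```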
[Binary image changes, condensed from ~50 file-level hunks: doc/imgs/korean_1.jpg and the det_res_img623_sast / det_res_img_10_east / det_res_img_10_sast result images are updated; the old loose result images under doc/imgs_results/ (1.jpg–22.jpg, 1101.jpg–1112.jpg, img_10.jpg, img_11.jpg) and the doc/imgs_results/chinese_db_crnn_server/ and doc/imgs_results_vis2/ directories are deleted; new example outputs are added under doc/imgs_results/ch_ppocr_mobile_v2.0/ (with img_12.jpg renamed into that folder); doc/imgs_results/det_res_22.jpg, french_0.jpg and korean.jpg are added; and a stray doc/imgs_words_en/.DS_Store is removed.]
diff --git a/ppocr/modeling/transforms/tps.py b/ppocr/modeling/transforms/tps.py
index 50c1740ee4a3c687405c4d28818543c043e53227..86665bedfff726c174e676cb544000a37ada0dad 100644
--- a/ppocr/modeling/transforms/tps.py
+++ b/ppocr/modeling/transforms/tps.py
@@ -128,7 +128,7 @@ class LocalizationNetwork(nn.Layer):
         i = 0
         for block in self.block_list:
             x = block(x)
-        x = x.reshape([B, -1])
+        x = x.squeeze(axis=2).squeeze(axis=2)
         x = self.fc1(x)
 
         x = F.relu(x)
@@ -176,14 +176,14 @@ class GridGenerator(nn.Layer):
         Return:
             batch_P_prime: the grid for the grid_sampler
         """
-        C = self.build_C()
-        P = self.build_P(I_r_size)
-        inv_delta_C = self.build_inv_delta_C(C).astype('float32')
-        P_hat = self.build_P_hat(C, P).astype('float32')
+        C = self.build_C_paddle()
+        P = self.build_P_paddle(I_r_size)
+
+        inv_delta_C_tensor = self.build_inv_delta_C_paddle(C).astype('float32')
+        P_hat_tensor = self.build_P_hat_paddle(
+            C, paddle.to_tensor(P)).astype('float32')
 
-        inv_delta_C_tensor = paddle.to_tensor(inv_delta_C)
         inv_delta_C_tensor.stop_gradient = True
-        P_hat_tensor = paddle.to_tensor(P_hat)
         P_hat_tensor.stop_gradient = True
         batch_C_ex_part_tensor = self.get_expand_tensor(batch_C_prime)
@@ -196,71 +196,80 @@ class GridGenerator(nn.Layer):
         batch_P_prime = paddle.matmul(P_hat_tensor, batch_T)
         return batch_P_prime
 
-    def build_C(self):
+    def build_C_paddle(self):
         """ Return coordinates of fiducial points in I_r; C """
         F = self.F
-        ctrl_pts_x = np.linspace(-1.0, 1.0, int(F / 2))
-        ctrl_pts_y_top = -1 * np.ones(int(F / 2))
-        ctrl_pts_y_bottom = np.ones(int(F / 2))
-        ctrl_pts_top = np.stack([ctrl_pts_x, ctrl_pts_y_top], axis=1)
-        ctrl_pts_bottom = np.stack([ctrl_pts_x, ctrl_pts_y_bottom], axis=1)
-        C = np.concatenate([ctrl_pts_top, ctrl_pts_bottom], axis=0)
+        ctrl_pts_x = paddle.linspace(-1.0, 1.0, int(F / 2))
+        ctrl_pts_y_top = -1 * paddle.ones([int(F / 2)])
+        ctrl_pts_y_bottom = paddle.ones([int(F / 2)])
+        ctrl_pts_top = paddle.stack([ctrl_pts_x, ctrl_pts_y_top], axis=1)
+        ctrl_pts_bottom = paddle.stack([ctrl_pts_x, ctrl_pts_y_bottom], axis=1)
+        C = paddle.concat([ctrl_pts_top, ctrl_pts_bottom], axis=0)
         return C  # F x 2
 
-    def build_P(self, I_r_size):
-        I_r_width, I_r_height = I_r_size
-        I_r_grid_x = (np.arange(-I_r_width, I_r_width, 2) + 1.0) \
-            / I_r_width  # self.I_r_width
-        I_r_grid_y = (np.arange(-I_r_height, I_r_height, 2) + 1.0) \
-            / I_r_height  # self.I_r_height
+    def build_P_paddle(self, I_r_size):
+        I_r_height, I_r_width = I_r_size
+        I_r_grid_x = (
+            paddle.arange(-I_r_width, I_r_width, 2).astype('float32') + 1.0
+        ) / I_r_width  # self.I_r_width
+        I_r_grid_y = (
+            paddle.arange(-I_r_height, I_r_height, 2).astype('float32') + 1.0
+        ) / I_r_height  # self.I_r_height
         # P: self.I_r_width x self.I_r_height x 2
-        P = np.stack(np.meshgrid(I_r_grid_x, I_r_grid_y), axis=2)
+        P = paddle.stack(paddle.meshgrid(I_r_grid_x, I_r_grid_y), axis=2)
+        P = paddle.transpose(P, perm=[1, 0, 2])
         # n (= self.I_r_width x self.I_r_height) x 2
         return P.reshape([-1, 2])
 
-    def build_inv_delta_C(self, C):
+    def build_inv_delta_C_paddle(self, C):
         """ Return inv_delta_C which is needed to calculate T """
         F = self.F
-        hat_C = np.zeros((F, F), dtype=float)  # F x F
+        hat_C = paddle.zeros((F, F), dtype='float32')  # F x F
         for i in range(0, F):
             for j in range(i, F):
-                r = np.linalg.norm(C[i] - C[j])
-                hat_C[i, j] = r
-                hat_C[j, i] = r
-        np.fill_diagonal(hat_C, 1)
-        hat_C = (hat_C**2) * np.log(hat_C)
-        # print(C.shape, hat_C.shape)
-        delta_C = np.concatenate(  # F+3 x F+3
+                if i == j:
+                    hat_C[i, j] = 1
+                else:
+                    r = paddle.norm(C[i] - C[j])
+                    hat_C[i, j] = r
+                    hat_C[j, i] = r
+        hat_C = (hat_C**2) * paddle.log(hat_C)
+        delta_C = paddle.concat(  # F+3 x F+3
             [
-                np.concatenate(
-                    [np.ones((F, 1)), C, hat_C], axis=1),  # F x F+3
-                np.concatenate(
-                    [np.zeros((2, 3)), np.transpose(C)], axis=1),  # 2 x F+3
-                np.concatenate(
-                    [np.zeros((1, 3)), np.ones((1, F))], axis=1)  # 1 x F+3
+                paddle.concat(
+                    [paddle.ones((F, 1)), C, hat_C], axis=1),  # F x F+3
+                paddle.concat(
+                    [paddle.zeros((2, 3)), paddle.transpose(
+                        C, perm=[1, 0])],
+                    axis=1),  # 2 x F+3
+                paddle.concat(
+                    [paddle.zeros((1, 3)), paddle.ones((1, F))],
+                    axis=1)  # 1 x F+3
             ],
             axis=0)
-        inv_delta_C = np.linalg.inv(delta_C)
+        inv_delta_C = paddle.inverse(delta_C)
         return inv_delta_C  # F+3 x F+3
 
-    def build_P_hat(self, C, P):
+    def build_P_hat_paddle(self, C, P):
         F = self.F
         eps = self.eps
         n = P.shape[0]  # n (= self.I_r_width x self.I_r_height)
         # P_tile: n x 2 -> n x 1 x 2 -> n x F x 2
-        P_tile = np.tile(np.expand_dims(P, axis=1), (1, F, 1))
-        C_tile = np.expand_dims(C, axis=0)  # 1 x F x 2
+        P_tile = paddle.tile(paddle.unsqueeze(P, axis=1), (1, F, 1))
+        C_tile = paddle.unsqueeze(C, axis=0)  # 1 x F x 2
         P_diff = P_tile - C_tile  # n x F x 2
         # rbf_norm: n x F
-        rbf_norm = np.linalg.norm(P_diff, ord=2, axis=2, keepdims=False)
+        rbf_norm = paddle.norm(P_diff, p=2, axis=2, keepdim=False)
+
         # rbf: n x F
-        rbf = np.multiply(np.square(rbf_norm), np.log(rbf_norm + eps))
-        P_hat = np.concatenate([np.ones((n, 1)), P, rbf], axis=1)
+        rbf = paddle.multiply(
+            paddle.square(rbf_norm), paddle.log(rbf_norm + eps))
+        P_hat = paddle.concat([paddle.ones((n, 1)), P, rbf], axis=1)
         return P_hat  # n x F+3
 
     def get_expand_tensor(self, batch_C_prime):
-        B = batch_C_prime.shape[0]
-        batch_C_prime = batch_C_prime.reshape([B, -1])
+        B, H, C = batch_C_prime.shape
+        batch_C_prime = batch_C_prime.reshape([B, H * C])
         batch_C_ex_part_tensor = self.fc(batch_C_prime)
         batch_C_ex_part_tensor = batch_C_ex_part_tensor.reshape([-1, 3, 2])
         return batch_C_ex_part_tensor
@@ -277,10 +286,8 @@ class TPS(nn.Layer):
 
     def forward(self, image):
         image.stop_gradient = False
-        I_r_size = [image.shape[3], image.shape[2]]
-
         batch_C_prime = self.loc_net(image)
-        batch_P_prime = self.grid_generator(batch_C_prime, I_r_size)
+        batch_P_prime = self.grid_generator(batch_C_prime, image.shape[2:])
         batch_P_prime = batch_P_prime.reshape(
             [-1, image.shape[2], image.shape[3], 2])
         batch_I_r = F.grid_sample(x=image, grid=batch_P_prime)
diff --git a/ppocr/postprocess/east_postprocess.py b/ppocr/postprocess/east_postprocess.py
old mode 100644
new mode 100755
index 0b669405562aef9812b9771977bf82f362beb75e..ceee727aa3df052041aee925c6c856773c8a288e
--- a/ppocr/postprocess/east_postprocess.py
+++ b/ppocr/postprocess/east_postprocess.py
@@ -19,12 +19,10 @@ from __future__ import print_function
 import numpy as np
 from .locality_aware_nms import nms_locality
 import cv2
+import paddle
 
 import os
 import sys
-# __dir__ = os.path.dirname(os.path.abspath(__file__))
-# sys.path.append(__dir__)
-# sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
 
 class EASTPostProcess(object):
@@ -113,11 +111,14 @@ class EASTPostProcess(object):
     def __call__(self, outs_dict, shape_list):
         score_list = outs_dict['f_score']
         geo_list = outs_dict['f_geo']
+        if isinstance(score_list, paddle.Tensor):
+            score_list = score_list.numpy()
+            geo_list = geo_list.numpy()
         img_num = len(shape_list)
         dt_boxes_list = []
         for ino in range(img_num):
-            score = score_list[ino].numpy()
-            geo = geo_list[ino].numpy()
+            score = score_list[ino]
+            geo = geo_list[ino]
             boxes = self.detect(
                 score_map=score,
                 geo_map=geo,
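The same tensor-or-ndarray guard is applied in the SAST post-processor below; as a standalone pattern it amounts to the following sketch (illustrative helper, not part of the patch):

```python
import numpy as np
import paddle

def to_numpy(x):
    # dygraph eager execution hands the postprocess a paddle.Tensor, while the
    # inference engine already returns np.ndarray; normalize to numpy either way
    return x.numpy() if isinstance(x, paddle.Tensor) else x

assert isinstance(to_numpy(paddle.ones([2, 2])), np.ndarray)
assert isinstance(to_numpy(np.ones([2, 2])), np.ndarray)
```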
diff --git a/ppocr/postprocess/sast_postprocess.py b/ppocr/postprocess/sast_postprocess.py
old mode 100644
new mode 100755
index 03b0e8f17d60a8863a2d1d5900e01dcd4874d5a9..f011e7e571cf4c2297a81a7f7772aa0c09f0aaf1
--- a/ppocr/postprocess/sast_postprocess.py
+++ b/ppocr/postprocess/sast_postprocess.py
@@ -24,7 +24,7 @@ sys.path.append(os.path.join(__dir__, '..'))
 
 import numpy as np
 from .locality_aware_nms import nms_locality
-# import lanms
+import paddle
 import cv2
 import time
@@ -276,14 +276,19 @@ class SASTPostProcess(object):
         border_list = outs_dict['f_border']
         tvo_list = outs_dict['f_tvo']
         tco_list = outs_dict['f_tco']
+        if isinstance(score_list, paddle.Tensor):
+            score_list = score_list.numpy()
+            border_list = border_list.numpy()
+            tvo_list = tvo_list.numpy()
+            tco_list = tco_list.numpy()
 
         img_num = len(shape_list)
         poly_lists = []
         for ino in range(img_num):
-            p_score = score_list[ino].transpose((1,2,0)).numpy()
-            p_border = border_list[ino].transpose((1,2,0)).numpy()
-            p_tvo = tvo_list[ino].transpose((1,2,0)).numpy()
-            p_tco = tco_list[ino].transpose((1,2,0)).numpy()
+            p_score = score_list[ino].transpose((1,2,0))
+            p_border = border_list[ino].transpose((1,2,0))
+            p_tvo = tvo_list[ino].transpose((1,2,0))
+            p_tco = tco_list[ino].transpose((1,2,0))
             src_h, src_w, ratio_h, ratio_w = shape_list[ino]
             poly_list = self.detect_sast(p_score, p_tvo, p_border, p_tco, ratio_w, ratio_h, src_w, src_h,
diff --git a/setup.py b/setup.py
index bef6dbbfc41f6f0ced3a6dc0cf8c6a8c7270dedf..f92074be1274bb44b3f2b8fdc621554df88d054f 100644
--- a/setup.py
+++ b/setup.py
@@ -32,7 +32,7 @@ setup(
     package_dir={'paddleocr': ''},
     include_package_data=True,
     entry_points={"console_scripts": ["paddleocr= paddleocr.paddleocr:main"]},
-    version='2.0',
+    version='2.0.1',
     install_requires=requirements,
     license='Apache License 2.0',
     description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embeded and IoT devices',
diff --git a/tools/infer/predict_det.py b/tools/infer/predict_det.py
index 6f98ded8295dabbd5edf05913245e5d94d856689..d389ca393dc94a7ece69e6f59f999073ae4b1773 100755
--- a/tools/infer/predict_det.py
+++ b/tools/infer/predict_det.py
@@ -37,33 +37,51 @@ class TextDetector(object):
     def __init__(self, args):
         self.det_algorithm = args.det_algorithm
         self.use_zero_copy_run = args.use_zero_copy_run
+        pre_process_list = [{
+            'DetResizeForTest': {
+                'limit_side_len': args.det_limit_side_len,
+                'limit_type': args.det_limit_type
+            }
+        }, {
+            'NormalizeImage': {
+                'std': [0.229, 0.224, 0.225],
+                'mean': [0.485, 0.456, 0.406],
+                'scale': '1./255.',
+                'order': 'hwc'
+            }
+        }, {
+            'ToCHWImage': None
+        }, {
+            'KeepKeys': {
+                'keep_keys': ['image', 'shape']
+            }
+        }]
         postprocess_params = {}
         if self.det_algorithm == "DB":
-            pre_process_list = [{
-                'DetResizeForTest': {
-                    'limit_side_len': args.det_limit_side_len,
-                    'limit_type': args.det_limit_type
-                }
-            }, {
-                'NormalizeImage': {
-                    'std': [0.229, 0.224, 0.225],
-                    'mean': [0.485, 0.456, 0.406],
-                    'scale': '1./255.',
-                    'order': 'hwc'
-                }
-            }, {
-                'ToCHWImage': None
-            }, {
-                'KeepKeys': {
-                    'keep_keys': ['image', 'shape']
-                }
-            }]
             postprocess_params['name'] = 'DBPostProcess'
             postprocess_params["thresh"] = args.det_db_thresh
             postprocess_params["box_thresh"] = args.det_db_box_thresh
             postprocess_params["max_candidates"] = 1000
             postprocess_params["unclip_ratio"] = args.det_db_unclip_ratio
             postprocess_params["use_dilation"] = True
+        elif self.det_algorithm == "EAST":
+            postprocess_params['name'] = 'EASTPostProcess'
+            postprocess_params["score_thresh"] = args.det_east_score_thresh
+            postprocess_params["cover_thresh"] = args.det_east_cover_thresh
+            postprocess_params["nms_thresh"] = args.det_east_nms_thresh
+        elif self.det_algorithm == "SAST":
+            postprocess_params['name'] = 'SASTPostProcess'
+            postprocess_params["score_thresh"] = args.det_sast_score_thresh
+            postprocess_params["nms_thresh"] = args.det_sast_nms_thresh
+            self.det_sast_polygon = args.det_sast_polygon
+            if self.det_sast_polygon:
+                postprocess_params["sample_pts_num"] = 6
+                postprocess_params["expand_scale"] = 1.2
+                postprocess_params["shrink_ratio_of_width"] = 0.2
+            else:
+                postprocess_params["sample_pts_num"] = 2
+                postprocess_params["expand_scale"] = 1.0
+                postprocess_params["shrink_ratio_of_width"] = 0.3
         else:
             logger.info("unknown det_algorithm:{}".format(self.det_algorithm))
             sys.exit(0)
@@ -149,12 +167,25 @@ class TextDetector(object):
         for output_tensor in self.output_tensors:
             output = output_tensor.copy_to_cpu()
             outputs.append(output)
-        preds = outputs[0]
-        # preds = self.predictor(img)
+
+        preds = {}
+        if self.det_algorithm == "EAST":
+            preds['f_geo'] = outputs[0]
+            preds['f_score'] = outputs[1]
+        elif self.det_algorithm == 'SAST':
+            preds['f_border'] = outputs[0]
+            preds['f_score'] = outputs[1]
+            preds['f_tco'] = outputs[2]
+            preds['f_tvo'] = outputs[3]
+        else:
+            preds = outputs[0]
+
         post_result = self.postprocess_op(preds, shape_list)
         dt_boxes = post_result[0]['points']
-        dt_boxes = self.filter_tag_det_res(dt_boxes, ori_im.shape)
+        if self.det_algorithm == "SAST" and self.det_sast_polygon:
+            dt_boxes = self.filter_tag_det_res_only_clip(dt_boxes, ori_im.shape)
+        else:
+            dt_boxes = self.filter_tag_det_res(dt_boxes, ori_im.shape)
         elapse = time.time() - starttime
         return dt_boxes, elapse
diff --git a/tools/test_hubserving.py b/tools/test_hubserving.py
old mode 100644
new mode 100755
index f28ff39e441e9f0d8a4c6e1081827daf8aff9792..0548726417699855a3905fa1a3fb679d69c85fc8
--- a/tools/test_hubserving.py
+++ b/tools/test_hubserving.py
@@ -17,8 +17,9 @@ __dir__ = os.path.dirname(os.path.abspath(__file__))
 sys.path.append(__dir__)
 sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
 
-from ppocr.utils.utility import initial_logger
-logger = initial_logger()
+from ppocr.utils.logging import get_logger
+logger = get_logger()
+
 import cv2
 import numpy as np
 import time
diff --git a/train.sh b/train.sh
index a0483e4dc81a4530822850f5a0a20ec9b4c94764..c511c51600cc2d939f0bc8c7f52a3f3c6ce52d58 100644
--- a/train.sh
+++ b/train.sh
@@ -1 +1,5 @@
- python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
\ No newline at end of file
+# for paddle.__version__ >= 2.0rc1
+python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
+
+# for paddle.__version__ < 2.0rc1
+# python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
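Taken together, the predict_det.py changes dispatch the raw inference outputs to whichever post-processor the chosen algorithm needs; condensed into a standalone sketch (hypothetical helper name, with the output ordering used in `TextDetector.__call__` above):

```python
def pack_det_outputs(det_algorithm, outputs):
    # map the inference engine's output list to the dict each postprocess expects
    if det_algorithm == "EAST":
        return {"f_geo": outputs[0], "f_score": outputs[1]}
    if det_algorithm == "SAST":
        return {"f_border": outputs[0], "f_score": outputs[1],
                "f_tco": outputs[2], "f_tvo": outputs[3]}
    return outputs[0]  # DB emits a single probability map
```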